Provenance-aware tracing of worm break-in and contaminations: A process coloring approach

Xuxian Jiang†, Aaron Walters†, Florian Buchholz†, Dongyan Xu†, Yi-Min Wang‡, Eugene H. Spafford†

† CERIAS and Department of Computer Science, Purdue University, West Lafayette, IN 47907
‡ Microsoft Research, Redmond, WA 98052
{jiangx, arwalter, buchholf, dxu, spaf}

Abstract

To investigate the exploitation and contamination by self-propagating Internet worms, a provenance-aware tracing mechanism is highly desirable. Provenance unawareness causes difficulties in fast and accurate identification of a worm’s break-in point (namely, a remotely-accessible vulnerable service running in the infected host), and incurs significant log data inspection overhead. This paper presents the design, implementation, and evaluation of process coloring, an efficient provenance-aware approach to worm break-in and contamination tracing. More specifically, process coloring assigns a “color”, a unique system-wide identifier, to each remotely-accessible server or process. The color will then be either inherited by spawned child processes or diffused indirectly through process actions (e.g., read or write operations).

Process coloring brings two major advantages: (1) It enables fast color-based identification of the break-in point exploited by a worm even before detailed log analysis; (2) It naturally partitions log data according to their associated colors, effectively reducing the volume of log data that need to be examined and, correspondingly, the log processing overhead for worm investigation. A tamper-resistant log collection method is developed based on the virtual machine introspection technique. Our experiments with a number of real-world worms demonstrate the advantages of process coloring. For example, to reveal detailed SARS worm contamination, only 12.1% of the entire log data need to be processed.
Beyond the virtual machine platform of our prototype, process coloring and logging mechanisms only incur a very small additional performance penalty.

Keywords: Intrusion Detection, Worm Infection and Investigation, Process Coloring

1 Introduction

Internet worms have become more stealthy and sophisticated in their infection, exploitation, and contamination. The recent absence of large-scale worm outbreaks does not indicate that Internet worms have been eliminated. On the contrary, there have been reports [8, 9] suggesting that worms may deliberately avoid fast massive propagation. Instead, they attempt to lurk in infected machines and surreptitiously inflict malign contaminations such as rootkit and backdoor installation [1, 34, 45]. In the combat against worms, the following tasks are critical to the understanding of a worm’s exploitation details and to the recovery of an infected host from worm contaminations: (1) identifying the break-in point, namely the vulnerable, remotely accessible service via which the worm infects the victim, and (2) determining all contaminations and damages inflicted by the worm during its residence in the victim.

To perform these tasks, various intrusion analysis tools can be used [12, 31, 35, 36]. For example, BackTracker [36] is an advanced forensic tool that traces back an intrusion starting from a “detection point” and identifies files and processes that could have affected that detection point. The tool takes the entire log file of the host as input for the back-tracking.

Log-based intrusion analysis tools face the following challenges: (1) Many tools [13, 36, 56] rely on an externally-determined detection point, from which a forensic investigation will be initiated towards the break-in point of the intrusion. However, due to a worm’s possibly long “infection-to-detection” duration, it may be days or even weeks later when such a detection point is identified.
It is therefore desirable that the log data carry more information and provide “leads” to initiate more timely investigations. (2) Current operating systems lack a provenance-aware mechanism to pre-classify the log data before log analysis. On the other hand, log data generated by the system may be of large volume. As reported in [36], log data as large as 1.2 GB can be generated within one day and need to be examined for an intrusion back-track. The uncategorized bulk log data are likely to result in long duration and high overhead in worm investigation. Although human investigators can provide heuristics (such as the “filtering rules” in [36]) to reduce the log space to be examined, such heuristics may lead to inaccuracy or incompleteness in worm investigation results. (3) Many log-based tools do not address tamper-resistant log collection, which is essential in dealing with advanced worms. As shown in Section 2.3, a commonly adopted mechanism for collecting system call traces, i.e., syscall-wrapping, can be easily circumvented during an attack.

In this paper, we present the design, implementation, and evaluation of process coloring, an efficient provenance-aware approach to worm break-in and contamination investigation. More specifically, process coloring associates a “color”, a unique system-wide identifier, with each remotely-accessible server or process, each of which is a potential worm break-in point. The color will be either inherited directly by any spawned child process, or diffused indirectly through the processes’ actions (e.g., read or write operations). As a result, any process or object (e.g., a file or directory) affected by a colored process will be tainted with the same color, as recorded in the corresponding log entry.
Process coloring naturally leads to the following two key advantages:

Color-based identification of a worm’s break-in point. All worm-infected processes and contaminated objects will be tainted with the same color as the original vulnerable service, which is exploited by the worm as the break-in point. By simply examining the color of any worm-related log entry or any worm-affected object, the break-in point of the corresponding worm can be immediately identified before detailed log analysis.

Natural partition of log data. The colors of log entries provide a natural way to partition the log. To reveal the contaminations caused by a worm, it is no longer necessary to examine the entire log file. Instead, only log entries with the same color as the worm’s entry point will need to be inspected. Such partition can substantially reduce the volume of relevant log data, and thereby improve the efficiency of worm investigation.

The practicality and effectiveness of process coloring are demonstrated using a number of real-world self-propagating worms and their variants. For each of these worms, we are able to quickly identify the vulnerable networked service exploited by the worm. Moreover, reduction of inspected log data is achieved in each worm experiment. For example, for a detailed SARS worm [11] break-in and contamination investigation, only 12.1% of the entire log data need to be inspected. Our prototype also addresses the important requirement of tamper-resistant log data recording. Virtual machine techniques such as VMware [10], Denali [54], Xen [24], and User-Mode Linux (UML) [22] provide a better instrumentation facility than the system call hooking mechanism to safely obtain and collect internal states, including the worm exploitation and contamination information.
We adopt a technique similar to Livewire [28] and develop an extension to the UML virtual machine monitor (VMM) for tamper-resistant logging.

In this paper, we focus on the application of process coloring to the investigation of Internet worms. However, we also note that process coloring is a generic, extensible mechanism, and some of its potential uses will be presented in Section 5.1. The rest of the paper is organized as follows: Section 2 provides an overview of the process coloring scheme, whose implementation is presented in Section 3. Experimental evaluation results are presented in Section 4. Other applications and possible attacks are addressed in Section 5. Section 6 discusses related work. Finally, Section 7 concludes this paper.

2 Process Coloring Approach

2.1 Initial Coloring

Figure 1 shows a process coloring view of a networked host system running multiple servers. A unique system-wide identifier called color is assigned to each server process. The color assignment takes place after the server processes have started but before they serve client requests. A worm breaking into the system will need to exploit a certain vulnerability of a (colored) server process. Because any action performed by the exploited process will lead to a corresponding color diffusion in the host (Section 2.2), the break-in and contaminations by the worm will be evidenced by the color of the affected processes and system resources and by the color of the corresponding log entries.

Each remotely-accessible service is performed by one or more active processes in the host. For example, the Samba service will start with two different processes, smbd and nmbd; and both the portmap and rpc.statd processes belong to the NFS/RPC service. Such processes can be assigned the same color.
However, if we need to further differentiate each individual process (e.g., “which Apache process is exploited by a Slapper worm?”), different colors can be assigned to processes belonging to the same service. One benefit of such assignment is that it provides a finer granularity in log data partition. Alternatively, it is possible to define a color with two fields: a major field indicating the service and a minor field differentiating between individual working processes of the same service. For simplicity, we consider each color as having only one single field in this paper.

Figure 1: Process coloring view of a system running multiple servers (Apache, Sendmail, NFS/RPC, MySQL)

We note that although the process identifier (PID) uniquely identifies a process, it is not suitable for coloring purposes. Firstly, PIDs are generated without any awareness of break-in points. Consider a zombie process: it is not possible to tell its break-in point simply from its PID or its parent’s PID. Secondly, it is possible that a process dynamically injects customized code (e.g., a whole library) into the code space of another active process. In this case, the PID is not capable of reflecting the impact of the former process on the latter. Such an attack has become popular on the Windows platform (e.g., the hxdef rootkit [1]), and there exist open-source libraries (e.g., Injectso [2]) which provide similar functionality for Linux and Solaris platforms. In our design, a new field is defined in the operating system kernel to record the current colors of active processes.

2.2 Color Diffusion Model

After the service processes are initially colored, the colors will be diffused to other processes according to the operations performed by the processes. To reveal worm contaminations, we are especially interested in process color diffusion via system-wide shared resources, such as files, directories, and sockets.
For a worm to inflict contamination (e.g., backdoor installation), it needs to go through a number of system calls. Hence the process colors are diffused to the affected system resources via the operations performed by the system calls. Table 1 shows a simplified color diffusion model with respect to several abstract operations. A worm contamination example will be described later in this section.

| Abstract Operation | Color Diffusion | Description | Example Events/Actions |
|---|---|---|---|
| $\mathit{create}\langle s_1, o \rangle$ | $\mathit{color}(o) = \mathit{color}(s_1)$ | Subject $s_1$ creates a new object $o$ | create, mkdir, link, mknod, pipe, symlink |
| $\mathit{create}\langle s_1, s_2 \rangle$ | $\mathit{color}(s_2) = \mathit{color}(s_1)$ | Subject $s_1$ creates a new subject $s_2$ | fork, vfork, clone, execve |
| $\mathit{read}\langle s_1, o \rangle$ | $\mathit{color}(s_1) \mathrel{\cup=} \mathit{color}(o)$ | Subject $s_1$ reads from object $o$ | read, readv, recv, access, stat, fstat, msgrcv |
| $\mathit{read}\langle s_1, s_2 \rangle$ | $\mathit{color}(s_1) \mathrel{\cup=} \mathit{color}(s_2)$ | Subject $s_1$ reads from subject $s_2$ | ptrace |
| $\mathit{write}\langle s_1, o \rangle$ | $\mathit{color}(o) \mathrel{\cup=} \mathit{color}(s_1)$ | Subject $s_1$ writes into object $o$ | write, writev, truncate, chmod, chown, fchown, send, sendfile |
| $\mathit{write}\langle s_1, s_2 \rangle$ | $\mathit{color}(s_2) \mathrel{\cup=} \mathit{color}(s_1)$ | Subject $s_1$ writes into subject $s_2$ | ptrace, kill |
| $\mathit{destroy}\langle s_1, o \rangle$ | - | Subject $s_1$ destroys the object $o$ | unlink, rmdir, close |
| $\mathit{destroy}\langle s_1, s_2 \rangle$ | - | Subject $s_1$ destroys the subject $s_2$ | kill, exit |

Table 1: A simplified color diffusion model. A subject is a process while an object is a shared resource.

The color diffusion model is based on our more general process label framework [15], where audit information (defined as process label) is propagated and preserved in a system. We also note that process color diffusion reflects various information flow models [14, 20, 21] in many aspects such as explicit/implicit information flows [30]. In this paper, we only consider the information flow through syscall interfaces, with the processes as subjects and intermediate resources as objects.
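To make the rules of Table 1 concrete, the following is a minimal user-space sketch in Python. It is only an illustration under stated assumptions, not the kernel-level implementation described in this paper: the names `Entity`, `ColorTracker`, and `partition`, the color labels "RED" and "BLUE", and the file paths are all hypothetical, and the choice to tag each log entry with the acting subject's colors after the operation is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A subject (process) or an object (file, socket, ...) carrying a set of colors."""
    name: str
    colors: set = field(default_factory=set)


class ColorTracker:
    """Toy model of the Table 1 rules; every operation is also appended to a log."""

    def __init__(self):
        # Each entry: (operation, subject name, target name, colors of the subject).
        self.log = []

    def _record(self, op, subject, target):
        # Assumption: a log entry carries the subject's colors after the operation.
        self.log.append((op, subject.name, target.name, frozenset(subject.colors)))

    def create(self, s1, name):
        # create<s1, o> / create<s1, s2>: the new entity inherits color(s1).
        new = Entity(name, set(s1.colors))
        self._record("create", s1, new)
        return new

    def read(self, s1, source):
        # read<s1, o> / read<s1, s2>: color(s1) is extended by color(source).
        s1.colors |= source.colors
        self._record("read", s1, source)

    def write(self, s1, target):
        # write<s1, o> / write<s1, s2>: color(target) is extended by color(s1).
        target.colors |= s1.colors
        self._record("write", s1, target)

    def destroy(self, s1, target):
        # destroy: no color changes; the action is only logged.
        self._record("destroy", s1, target)

    def partition(self, color):
        """Return only the log entries tainted with the given color."""
        return [entry for entry in self.log if color in entry[3]]


# Hypothetical scenario: two initially colored services, one of them exploited.
tracker = ColorTracker()
apache = Entity("apache", {"RED"})
sendmail = Entity("sendmail", {"BLUE"})

shell = tracker.create(apache, "sh")               # spawned child inherits RED
backdoor = tracker.create(shell, "/tmp/backdoor")  # new file inherits RED
tracker.write(shell, backdoor)

mail = tracker.create(sendmail, "/var/spool/mail/user")  # unrelated BLUE activity
tracker.write(sendmail, mail)

cron = Entity("cron")         # initially uncolored process
tracker.read(cron, backdoor)  # reading the tainted file colors it RED

red_entries = tracker.partition("RED")
```

In this toy run, four of the six log entries carry "RED", so an investigation that starts from any RED-tainted process or file only needs to inspect that subset, and the color itself points back to the hypothetical `apache` service as the break-in point.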
Other means of conveying information, such as CPU utilization or disk space availability, are beyond the scope of this paper. In the following, we describe two types of syscall-based color diffusion:

Direct diffusion involves one process directly affecting the color of another process. It can happen in a number of ways: (1) Process spawning: If a process issues the fork, vfork, or clone system call, a new child process will usually be spawned and it will inherit the color of the parent process. (2) Code injection: A process may use code injection (e.g., via the ptrace system call) to modify the memory space of another process to change its functionality. The color of the injected process will be updated accordingly. (3) Signal processing: A process may send a special signal (e.g., the kill command) to another process. If received and authorized, the signal will invoke corresponding signal handling and thus affect the execution flow of the signaled process.

Indirect diffusion from process $s_1$ to $s_2$ can be represented as $s_1 \Rightarrow o \Rightarrow s_2$, where $o$ is an intermediate resource (object). Various types of intermediate resources exist: some resources are dynamically created and will not exist after the process is terminated (e.g., UNIX sockets); other resources such as files can persistently exist and may later affect another process if that process acquires some input from these resources. To support indirect diffusion, the system data structure for an intermediate resource will be enhanced to record the influence of a process (i.e., its color). Later, when another process gets input from the “tainted” resource, the process will be tainted with the same color.¹ Common resource types supported in current Linux systems

¹ To determine which input actually leads to an output, we show in [16] that such problem is equivalent to solving the Halting