Sebek 3: tracking the attackers, part one
by Raul Siles, GSE
It has become increasingly important for security professionals to deploy new detection mechanisms to track and capture an attacker's activities. Third Generation (GenIII) Honeynets provide all the components and tools required to gather this information at the deepest level. Sebek is the primary data capture tool for GenIII Honeynets.
The first article of this two-part series discusses what Sebek is and what makes it so interesting. We'll start by looking at the latest Sebek release, version 3, its new capabilities, the Sebek protocol specification and how it integrates with GenIII Honeynet infrastructures. The second article will briefly address how to install and use Sebek on Linux and Windows. It will then focus on a Sebek patch, developed by this article's author, that makes it possible to capture not only what the attacker types but also the response received.
Sebek, a kernel based data capture tool
In May 2005, honeynet technologies notably improved their functionality with the newly released third generation. This latest generation is based on a new Honeywall version called Roo [ref 1] and on the new Sebek release, version 3.
Sebek [ref 2] is the most advanced and complex honeynet data capture tool. It is an open-source tool whose purpose is to capture as much information as possible about the attacker's activities on a honeypot by intercepting specific system calls (syscalls) at the kernel level.
Sebek is based on a client-server architecture. The client is installed on the honeypots and the server is typically deployed on the Honeywall, the honeynet gateway through which all traffic entering and leaving the honeynet passes. The Sebek client component uses techniques similar to those used by kernel-based rootkits. [ref 3] Sebek is implemented as a Linux Kernel Module (LKM) on Linux, as an OS kernel driver on Windows, and as a kernel patch on the various *BSD operating systems. The server module contains user-level tools that gather and display the information captured and exported by the Sebek clients.
The first Sebek version 3 client (v3.0.3) was released for Linux 2.4 kernels in May 2005. This version also included the Linux Sebek server (v3.0.3). Then in October 2005, new client versions were released for other operating systems, such as Windows (v3.0.3), Linux 2.6 kernels (v3.1.2b) and *BSD (v3.0).
The purpose of this paper is to discuss the new Sebek release, version 3, which will be referred to simply as Sebek from now on. Sebek version 2, its purpose, architecture, features, tools and protocol specification are extensively covered in the original Sebek KYE paper. [ref 4]
Sebek's new capabilities
Sniffing network traffic has long been the traditional way of inspecting the actions performed by an attacker remotely accessing a compromised resource. However, this is not possible if the attacker is protecting his communication channel through encryption and the key used is unknown.
The first Sebek version intercepted all "read" kernel syscalls with a length of one byte, which makes it possible to capture the keystrokes typed by the honeypot intruder before they are encrypted, including the commands executed and the passwords used. This initial data capture functionality was later improved in version 2 to capture all "read" data. The second version also makes it possible to recover entire files copied with SCP, or complete IRC and mail messages.
Sebek version 3 extends this functionality by intercepting a new set of system calls. Additionally, it retrieves the parent process id (PPID) and the inode associated with any file-related event. These two fields are added to each Sebek record. Apart from intercepting the standard "read" syscall, the new version hijacks additional "read" syscalls, the "socket" syscall, the "open" syscall, and the "fork" and "clone" syscalls. The following descriptions use the Linux version as a reference. The same ideas also apply to the Windows version.
The previous Sebek version simply intercepted the standard "NR_read" syscall. The current Sebek version also intercepts the "NR_readv" and "NR_pread" syscalls, which allows one to capture all "read" activities in the system and counteract evasion performed by tools like "sebekill". [ref 5] This tool is a library that maps every "read" syscall to the "readv" syscall, thus avoiding previous Sebek versions' "read" data capture.
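The evasion is easy to picture from user space. The following minimal Python sketch (Unix only) is not taken from sebekill; the pipe, the buffer size and the function names are purely illustrative. It fetches the same bytes once through "read" and once through "readv", which shows why a monitor that hooks only "read" never sees the second path.

```python
import os

def read_via_read(fd, size):
    """Fetch data with the plain read syscall."""
    return os.read(fd, size)

def read_via_readv(fd, size):
    """Fetch data with the readv syscall, which a hook placed only on read would miss."""
    buf = bytearray(size)
    count = os.readv(fd, [buf])
    return bytes(buf[:count])

def demo():
    """Send the same command through a pipe twice and read it back both ways."""
    r, w = os.pipe()
    try:
        os.write(w, b"uname -a\n")
        first = read_via_read(r, 64)
        os.write(w, b"uname -a\n")
        second = read_via_readv(r, 64)
    finally:
        os.close(r)
        os.close(w)
    return first, second
```

Both calls return identical data, so a capture tool has to intercept every variant of the syscall to obtain full coverage.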
The "socket" syscall (NR_socketcall) maps network activities to process activities. Therefore, Sebek is able to collect information about any communication and to establish its relationship with the specific process running on the honeypot. This information will also help to correlate Sebek data with network traffic data captured by the Honeywall using a sniffer. Additionally, if a network connection is associated with an intrusion attempt, it is possible to directly identify the compromised process.
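As a rough illustration of this correlation, the Python sketch below matches socket records against sniffed flows using the connection endpoints. The record layouts (SocketRecord, Flow) and function names are hypothetical simplifications, not the actual Sebek or Honeywall formats.

```python
from collections import namedtuple

# Hypothetical, simplified record shapes; the real Sebek and Honeywall formats differ.
SocketRecord = namedtuple(
    "SocketRecord", "pid proto local_ip local_port remote_ip remote_port")
Flow = namedtuple("Flow", "proto src_ip src_port dst_ip dst_port")

def flow_key(proto, ip_a, port_a, ip_b, port_b):
    """Build a direction-independent key for a connection."""
    return (proto, frozenset([(ip_a, port_a), (ip_b, port_b)]))

def map_flows_to_pids(socket_records, flows):
    """Return {flow: pid} for every sniffed flow that matches a socket record."""
    index = {}
    for rec in socket_records:
        key = flow_key(rec.proto, rec.local_ip, rec.local_port,
                       rec.remote_ip, rec.remote_port)
        index[key] = rec.pid
    matched = {}
    for flow in flows:
        key = flow_key(flow.proto, flow.src_ip, flow.src_port,
                       flow.dst_ip, flow.dst_port)
        if key in index:
            matched[flow] = index[key]
    return matched
```

The key ignores direction because the honeypot sees the connection from its local side, while the sniffer records it as source and destination.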
The "open" syscall (NR_open) maps any read action to a filename and its inode in the honeypot filesystems. This allows one to identify all files accessed during an intrusion. The recording of inodes is needed to avoid the "dup2" based obfuscation techniques implemented in sepabek. [ref 6] In this technique, a modified version of the "read" library function is invoked. The modified function duplicates the "read" file descriptor and reads random data from the same descriptor but from a different inode. Using inode tracking capabilities, Sebek is able to screen the invalid data read.
The "fork" ("NR_fork" and "NR_vfork") and "clone" ("NR_clone") syscalls track the process identifier (PID) and the parent process identifier (PPID). Monitoring process creation makes it possible to identify the relationships between system processes and to rebuild the entire process execution tree.
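Rebuilding the tree from (PID, PPID) pairs is straightforward. The Python sketch below assumes the records have already been reduced to such pairs; the function names and the sample PIDs are illustrative and are not part of Sebek or its analysis tools.

```python
from collections import defaultdict

def build_process_tree(records):
    """Map each parent PID to the sorted list of its child PIDs."""
    children = defaultdict(set)
    for pid, ppid in records:
        children[ppid].add(pid)
    return {ppid: sorted(kids) for ppid, kids in children.items()}

def ancestry(records, pid):
    """Return the chain of PIDs from pid up to its oldest known ancestor."""
    parent_of = {child: parent for child, parent in records}
    chain = [pid]
    seen = {pid}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against malformed, cyclic data
            break
        chain.append(parent)
        seen.add(parent)
    return chain
```

For example, given the pairs (100, 1), (200, 100), (201, 200) and (202, 200), the ancestry of process 202 is 202, 200, 100, 1, which is the path an analyst follows back from an attacker's shell to the exploited daemon.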
In response to the NoSEBrEaK tool techniques [ref 7], the new Sebek version tries to avoid the extraction of its internal state by initializing its block structures with random values. Additionally, in order not to be removed from the system, Sebek disables the cleanup function when it is not running in testing mode.
The Windows version of Sebek implements techniques similar to those of the Linux client. The new version hides itself from the list of running modules and from filesystem and registry queries. It avoids some of the detection methods used by KprocCheck [ref 8], such as the technique based on traversing the PsLoadedModuleList, implemented by the "-d" option.
The Windows module also logs all socket connections, including multiple network-based syscalls such as open, accept and bind. This version records the PPID and PID of each process, which are required to generate Sebek version 3 protocol packets. Finally, the previous Sebek Windows version only captured command line activities from the "cmd.exe" command prompt. The current version logs all console application input and output, even for applications that specify a socket for stdin, stdout or stderr. One of the main benefits of the Windows version over the Linux one is that it survives system reboots.
Although the latest Sebek version has notably improved its functionality and counterhacking capabilities, there are still challenges in both fields. From a functionality perspective, it would be very interesting to be able to capture the response received by the attacker. This improvement will be covered in the second part of this article. From a protection perspective, Sebek should defend itself against common techniques that detect and defeat kernel-based rootkits. Ongoing Sebek development should address issues such as:
- The presence of the Sebek module on Linux can be detected. The hiding method currently used by Sebek simply unlinks the module from the linked list of kernel modules. [ref 3] Figure 1, below, illustrates the "module_hunter" tool detecting the Sebek module running with the "sebek" name. [ref 9]
    # insmod ./module_hunter.o
    # cat /proc/showmodules
    address module
    0xc880d000 scsi_mod size: 0x1a278
    0xc8827000 * size: 0x12
    0xc8829000 sd_mod size: 0x348c
    0xc882e000 BusLogic size: 0x189bc
    0xc8848000 jbd size: 0xcab4
    0xc8856000 ext3 size: 0x11480
    0xc888a000 cdrom size: 0x83c0
    0xc8894000 ide-cd size: 0x8b7c
    0xc889e000 ide-scsi size: 0x2fb0
    0xc88a2000 sr_mod size: 0x46d8
    0xc88a6000 net size: 0x219
    0xc88a8000 sg size: 0x8eac
    0xc88b2000 ip_tables size: 0x3af8
    0xc88b7000 iptable_filter size: 0x96c
    0xc88bb000 mii size: 0xf88
    0xc88bd000 module_hunter size: 0x5ec
    0xc88bf000 ipt_REJECT size: 0xf58
    0xc88c1000 pcnet32 size: 0x4740
    0xc88c7000 autofs size: 0x33d4
    0xc88cc000 sebek size: 0x5e50
    #
Figure 1. Sebek Linux 3.0.3 detection - module_hunter.
- Sebek can also be identified on Windows by looking at the native APIs it hooks in the SDT (Service Descriptor Table). [ref 8] This technique is implemented by the "KprocCheck -t" option. Figure 2 illustrates the hooked SDT entries identified and owned by an unknown module, in this case, Sebek.
    C:\>KProcCheck.exe -t
    KProcCheck Version 0.2-beta2 Proof-of-Concept by SIG^2 (www.security.org.sg)
    Checks SDT for Hooked Native APIs
    KeServiceDescriptorTable 80559B80
    KeServiceDescriptorTable.ServiceTable 804E2D20
    KeServiceDescriptorTable.ServiceLimit 284
    Entry 19 - [hooked by unknown at FA881498]
    Entry 25 - [hooked by unknown at FA881E16]
    Entry 29 - [hooked by unknown at FA882266]
    Entry 35 - [hooked by unknown at FA880F8E]
    Entry 47 - [hooked by unknown at FA882360]
    Entry 49 - [hooked by unknown at FA881EDE]
    Entry 74 - [hooked by unknown at FA881D6C]
    Entry 77 - [hooked by unknown at FA8822E2]
    Entry 91 - [hooked by unknown at FA881924]
    Entry AD - [hooked by unknown at FA881A4A]
    Entry B7 - [hooked by unknown at FA8810EE]
    Entry C8 - [hooked by unknown at FA881310]
    Entry D2 - [hooked by unknown at FA8813EA]
    Entry 112 - [hooked by unknown at FA881146]
    Number of Service Table entries hooked = 14
    C:\>
Figure 2. Sebek Windows 3.0.3 detection - KProcCheck.
- Other methods rely on knowledge of the default Windows driver name, sebek.sys, to detect Sebek's presence. [ref 8] This can be avoided by changing the default name. The Sebek Windows installer can be run with the "/N=NAME" command line argument, where NAME is the name of the driver you want to use, without the ".sys" extension appended.
- The Sebek Linux syscall table modifications can be discovered. Additionally, the Sebek syscall pointers could be overwritten, disabling Sebek. [ref 7] [ref 3]
- The Sebek Windows version can also be disabled following a similar procedure based on restoring the ServiceTable entries on the SDT. [ref 10] This method is implemented by the SDTrestore tool and is illustrated below in Figure 3.
    C:\>SDTrestore.exe
    SDTrestore Version 0.2 Proof-of-Concept by SIG^2 G-TEC (www.security.org.sg)
    KeServiceDescriptorTable 80559B80
    KeServiceDecriptorTable.ServiceTable 804E2D20
    KeServiceDescriptorTable.ServiceLimit 284
    ZwClose 19 --[hooked by unknown at FA881498]--
    ZwCreateFile 25 --[hooked by unknown at FA881E16]--
    ZwCreateKey 29 --[hooked by unknown at FA882266]--
    ZwCreateThread 35 --[hooked by unknown at FA880F8E]--
    ZwEnumerateKey 47 --[hooked by unknown at FA882360]--
    ZwEnumerateValueKey 49 --[hooked by unknown at FA881EDE]--
    ZwOpenFile 74 --[hooked by unknown at FA881D6C]--
    ZwOpenKey 77 --[hooked by unknown at FA8822E2]--
    ZwQueryDirectoryFile 91 --[hooked by unknown at FA881924]--
    ZwQuerySystemInformation AD --[hooked by unknown at FA881A4A]--
    ZwReadFile B7 --[hooked by unknown at FA8810EE]--
    ZwRequestWaitReplyPort C8 --[hooked by unknown at FA881310]--
    ZwSecureConnectPort D2 --[hooked by unknown at FA8813EA]--
    ZwWriteFile 112 --[hooked by unknown at FA881146]--
    Number of Service Table entries hooked = 14
    WARNING: THIS IS EXPERIMENTAL CODE. FIXING THE SDT MAY HAVE GRAVE
    CONSEQUENCES, SUCH AS SYSTEM CRASH, DATA LOSS OR SYSTEM CORRUPTION.
    PROCEED AT YOUR OWN RISK. YOU HAVE BEEN WARNED.
    Fix SDT Entries (Y/N)? : Y
    [+] Patched SDT entry 19 to 805675D9
    [+] Patched SDT entry 25 to 8057164C
    [+] Patched SDT entry 29 to 8056F063
    [+] Patched SDT entry 35 to 8057F262
    [+] Patched SDT entry 47 to 8056F76A
    [+] Patched SDT entry 49 to 805801FE
    [+] Patched SDT entry 74 to 805715E7
    [+] Patched SDT entry 77 to 805684D5
    [+] Patched SDT entry 91 to 80574DAD
    [+] Patched SDT entry AD to 8057CC27
    [+] Patched SDT entry B7 to 80571B30
    [+] Patched SDT entry C8 to 8057860F
    [+] Patched SDT entry D2 to 80585D7D
    [+] Patched SDT entry 112 to 8057A125
    C:\>
Figure 3. Disabling Sebek Windows 3.0.3 - SDTrestore.
- The data that is accessed through the "mmap" syscall cannot be recorded by Sebek. [ref 7]
- The Sebek Linux version does not survive system reboots. [ref 3]
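One plausible way to perform the SDT check shown in Figure 2 is a simple range test: an entry whose handler address lies outside the kernel image is reported as hooked. The Python sketch below is illustrative only and is not taken from KProcCheck. KERNEL_BASE and KERNEL_SIZE are hypothetical values chosen to cover the restored addresses in Figure 3, and a real tool would obtain the kernel's load address and size from the running system.

```python
# Hypothetical kernel image range (assumption); a real tool would query the system.
KERNEL_BASE = 0x804D7000
KERNEL_SIZE = 0x00200000

def find_hooked_entries(service_table, base=KERNEL_BASE, size=KERNEL_SIZE):
    """Return {index: address} for entries that point outside the kernel image."""
    end = base + size
    return {
        index: address
        for index, address in service_table.items()
        if not (base <= address < end)
    }
```

Applied to a table containing entry 0x19 at FA881498 (the hooked value from Figure 2) and entry 0x25 at 8057164C (the restored value from Figure 3), the function reports only entry 0x19. The same idea applies to the Linux syscall table, where handlers are expected to live inside the kernel text segment.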
Some of the above issues, such as survival after reboot or Sebek syscall disabling, could be solved by implementing Sebek as a kernel patch. However, the choice between an LKM and a kernel patch is really more of a usability question. An LKM (or driver) is much easier to install, more flexible and best suited for incident response, where monitoring has to be enabled after an intrusion. For honeypots, it would be very interesting to have a reliable kernel patch implementation.
Sebek is an open-source tool, so it is strongly recommended that you modify it to meet your needs. A customized Sebek version will decrease the likelihood of detection because it differs from the publicly available one.
Sebek's protocol specification
Apart from its data capture capabilities, Sebek also implements advanced logging mechanisms. The data captured by the kernel is sent to the Sebek server using a UDP covert channel. Sebek uses its own kernel-based implementation of the Raw Socket interface.
The new Sebek version uses a redesigned Sebek protocol specification, version 3, which is not backwards compatible. Therefore, a GenIII Honeynet must use Sebek version 3 both on the Roo Honeywall (where it is available by default) and on all the honeypots. Figure 4 illustrates the details of the Sebek protocol binary record format. Each record has a 56-byte header.
Figure 4. Sebek protocol version 3 packet header.
The new protocol header accommodates the extra information collected at the kernel level, including the parent process identifier (PPID) and the filesystem inode. Both are 32-bit fields and complement the fields of the previous protocol version. [ref 2] Additionally, the version value currently used is 3 and the "Type" field now supports the new syscalls: read (0), write (1), socket (2) and open (3).
All this information is used by the new GenIII Honeywall advanced data analysis tools to correlate the actions taking place on the honeypot, track the attacker's activities at the process level and generate graphical flowcharts representing these events.
Ethereal has included a specific Sebek protocol dissector since version 0.10.0. This dissector is capable of inspecting Sebek protocol version 1, the one used by the previous Sebek releases. Unfortunately, there is no dissector available for Sebek protocol version 3 yet, though this article's author plans to develop one very soon.
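A header of this kind can be decoded in a few lines. The Python sketch below is an assumption-laden illustration: the article only states the 56-byte size, the 32-bit PPID and inode fields, the version value and the type codes, so the field order, the 12-byte command name field and network byte order used here are assumptions that happen to add up to 56 bytes. Check them against the Sebek sources before relying on them.

```python
import struct
from collections import namedtuple

# ASSUMED layout: magic(32) version(16) type(16) counter(32) time_sec(32)
# time_usec(32) ppid(32) pid(32) uid(32) fd(32) inode(32) command(12 bytes)
# length(32), all in network byte order.
HEADER_FORMAT = "!IHH8I12sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

SebekHeader = namedtuple(
    "SebekHeader",
    "magic version rec_type counter time_sec time_usec "
    "ppid pid uid fd inode command length",
)

# Type codes as listed in the article.
TYPE_NAMES = {0: "read", 1: "write", 2: "socket", 3: "open"}

def parse_header(packet):
    """Decode the fixed-size header found at the start of a Sebek record."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than the Sebek header")
    header = SebekHeader(*struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE]))
    # The command name is a NUL-padded byte string; keep only the text part.
    name = header.command.split(b"\x00", 1)[0].decode("ascii", "replace")
    return header._replace(command=name)
```

The bytes following the header are the captured data, whose size is given by the length field.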
Sebek and GenIII Honeynets integration
One key goal for today's security infrastructures is to provide the ability to collect and easily analyze the malicious activities taking place in the IT environment. In September 2004, the Honeynet Research Alliance team members got together to design, architect and develop a new honeynet model. The main concern was the need for a powerful and easy-to-use data analysis tool. The primary purpose of a honeynet is to collect data for gathering information about threats - but how good is that data if it cannot be analyzed?
The result was the new GenIII Honeynet technologies based on the Roo Honeywall. Roo's main purpose was to add advanced data analysis capabilities to the previous GenII version. Sebek is the fundamental tool for advanced data capture and perfectly integrates with the new GenIII Honeynets model and its advanced data analysis features.
GenIII Honeynets implement a new data model independent of the data source. [ref 11] The model establishes the relationships between four conceptual objects: hosts, representing the honeypots; processes, the programs executing on the hosts; files, representing data stored on a hard drive; and network flows, representing communications between hosts.
Sebek data helps to bind these different objects. Processes are identified by the syscalls they invoke. The "read", "write" and "open" syscalls link processes with files, the "socket" syscall links processes with network flows and the "fork" and "clone" syscalls link processes with other processes.
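These links can be pictured as edges in a graph. The Python sketch below is a simplified illustration and is not the Hflow schema: the (pid, syscall, target) record shape and the names LINKS and build_edges are hypothetical.

```python
# Kind of object each intercepted syscall links a process to, as described above.
LINKS = {
    "read": "file",
    "write": "file",
    "open": "file",
    "socket": "flow",
    "fork": "process",
    "clone": "process",
}

def build_edges(records):
    """Turn (pid, syscall, target) records into (process, object) edges."""
    edges = []
    for pid, syscall, target in records:
        kind = LINKS.get(syscall)
        if kind is None:
            continue  # syscall is not part of the model
        edges.append((("process", pid), (kind, target)))
    return edges
```

Following the edges outwards from a single process yields the files it touched, the flows it took part in and the processes it created.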
Sebek data is correlated with data captured from the network traffic. The network activities are collected by the Honeywall using the tcpdump network sniffer. These events are processed by different tools: the Snort IDS provides malicious traffic identification, the p0f tool performs OS fingerprinting, and the Argus tool is used for flow monitoring. The data from all these various sources is unified and correlated in a relational database. The data correlation is supported by an Hflow database schema and by the pcap-api interface, used for packet capture manipulation.
The Roo Web-based graphical interface known as Walleye allows one to display and analyze all the data captured and correlated by the honeynet. Typically, an intrusion is initially discovered through the detection of suspicious network events. Figure 5 illustrates Walleye's capabilities to display network traffic details detected by the Honeywall.
Figure 5. Walleye's network flow - Sebek "socket" syscall.
The incident handler should start the incident investigation with a network traffic analysis. In this example, some interaction between system 192.168.100.66 (owned by the attacker) and the honeypot at 192.168.100.150 was detected. This network flow corresponds to TCP traffic with a source port of 1135 and a destination port of 45295. Several packets were exchanged in each direction and the traffic generated two different Snort IDS alerts. The traffic seems to be related to process number 2340 on the honeypot.
Walleye allows one to increase the granularity of the data collected. Figure 6 illustrates the level of detail provided by GenIII Honeynets and Sebek by showing the system process flowchart associated with the previous network flow. Please note that the image below is a small, truncated version of the original figure and should be clicked on to be viewed properly.
Figure 6. Truncated illustration of Walleye's process flowchart - Sebek "fork" syscall. Follow the link to view the actual flowchart.
This example shows a Linux honeypot compromised through the "trans2open" Samba buffer overflow. [ref 12] The first Linux process, "init" (PID 1), forked the Samba daemon, "smbd" (PID 1525), which in turn forked a new "smbd" child process (from PID 2320 to 2339) for each new connection received and served. Every connection corresponded to a remote buffer overflow attempt trying to exploit this Samba vulnerability. Finally, the weakness was successfully exploited by the connection associated with the process with PID 2340. The compromised process generated two different Unix shell processes, "sh" (PID 2341 and 2342). The second shell was the one used by the attacker to execute several commands (and processes) on the honeypot, such as "uname", "id", "cat", "ls" or "passwd". The flowchart provides a detailed view of the complete intrusion sequence.
All the information required to build the process flowchart is supplied by Sebek. The Walleye interface also allows one to inspect additional process details collected by Sebek. By using these capabilities, it is even possible to identify the files accessed during the incident and retrieve the specific commands executed by the attacker. Figure 7 illustrates the activities associated with the shell process (PID 2342) previously referenced in the Figure 6 flowchart. This process opened various library files, such as "/etc/ld.so.cache" or "lib/libtermcap.so.2.0.8", and executed several commands typed by the attacker and captured by Sebek through the "read" syscall, like "uname -a", "id", "cat /etc/passwd" or "ls -l /".
Figure 7. Walleye's process details - Sebek "read" and "open" syscalls.
The new GenIII Honeynets data model provides improved data analysis capabilities that allow one to easily inspect all the activities taking place on the honeynet. Sebek is the key component used to obtain such a detailed level of information.
Concluding part one
Honeynet technologies have existed since 1999. Now, in early 2006, Roo-based GenIII Honeynets have moved out of the world of academic research and expanded into a real production solution for a variety of organizations. The new Sebek release has certainly motivated these improvements.
The first part of this article series has described the features and enhancements of the current Sebek version, the latest Sebek protocol specification, and how this tool integrates with GenIII Honeynets. The article has pointed out Sebek's strengths and weaknesses and has hinted at an improvement for one of Sebek's current limitations: it is possible to capture what the attacker typed but not the response received. In part two we shall introduce an advanced Sebek version that overcomes this limitation.
[ref 1] "Know Your Enemy: Honeywall CDROM Roo. 3rd Generation Technology". Honeynet Project & Research Alliance. August, 2005. http://www.honeynet.org/papers/cdrom/roo/
[ref 2] "Sebek Homepage". The Honeynet Project.
[ref 3] "Linux kernel rootkits: protecting the system’s "Ring-Zero"". Raul Siles. GCUX whitepaper. May, 2004. http://www.giac.org/certified_professionals/practicals/gcux/0243.php
[ref 4] "Know Your Enemy: Sebek. A kernel based data capture tool". The Honeynet Project. November, 2003. http://www.honeynet.org/papers/sebek.pdf
[ref 5] "sebekill". Ilja van Sprundel. http://ilja.netric.org/files/sebekill.c
[ref 6] "sepabek". Philippe Biondi. 2004. http://www.secdev.org/c/sepabek.c
[ref 7] "NoSEBrEaK". M. Dornseif, T. Holz, C. N. Klein. June, 2004. http://md.hudora.de/publications/2004-NoSEBrEaK.pdf
[ref 8] "Detecting Sebek Win32 Client" and "KProcCheck". Tan Chew Keong. June 2004. http://www.security.org.sg/vuln/sebek215.html, http://www.security.org.sg/code/kproccheck.html
[ref 9] "Finding hidden kernel modules (the extreme way)". madsys. 2003. http://www.phrack.org/phrack/61/p61-0x03_Linenoise.txt
[ref 10] "Win2K/XP SDT Restore 0.2". Tan Chew Keong. July 2004. http://www.security.org.sg/code/sdtrestore.html
[ref 11] "Towards a Third Generation Data Capture Architecture for Honeynets". Edward Balas and Camilo Viecco. Proc. 6th IEEE Information Assurance Workshop. June 2005. http://www.honeynet.org/papers/individual/hflow.pdf
[ref 12] "trans2open() buffer overflow vulnerability". CAN-2003-0201. CVE. 2003. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2003-0201
About the author
Raul Siles is a senior security consultant with Hewlett-Packard. His current research interests include honeynet technologies, kernel rootkits and wireless security. He is one of the few individuals who have earned the GIAC Security Expert (GSE) designation. More information can be found on his website, www.raulsiles.com.
(C) Copyright 2006, SecurityFocus.
This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.