Defeating Honeypots: System Issues, Part 1
by Thorsten Holz, Frederic Raynal
To learn about attack patterns and attacker behavior, the concept of electronic decoys, or honeypots, is often used. These look like regular network resources (computers, routers, switches, etc.) that are deployed to be probed, attacked, and compromised. This electronic bait lures in attackers and helps with the assessment of vulnerabilities. As honeypots are deployed more and more often within computer networks, blackhats have started to devise techniques to detect, circumvent, and disable the logging mechanisms used on honeypots.
This paper will explain how an attacker typically proceeds as he attacks a honeypot for fun and profit. We will introduce several publicly known (and perhaps some lesser-known) techniques and present a number of tools that help blackhats discover and interact with honeypots. The article aims to show security teams and practitioners who would like to set up or harden their own lines of deception-based defense what the current limitations of honeypot-based research are. After a brief theoretical introduction, we present several technical examples of different methodologies. This two-part paper focuses on the system world and the application layer, as opposed to our first paper, "Defeating Honeypots: Network Issues," [ref 1] which concentrated purely on network issues.
Honeypots versus steganography
Before going any further, let us talk briefly about steganography. Its goal is to hide the existence of a communication channel from anyone but the intended recipient of a message. As an art and science, it came to the forefront a few years ago when Simmons introduced his classic prisoners' problem. [ref 2] Assume two prisoners are jailed in different cells. A warden has been authorized to carry messages from the one to the other. If the messages are ciphered -- which means the warden cannot understand the content of the message -- he will become suspicious, and the communication channel will be stopped. But if the prisoners have agreed on a code (for instance, a red sun on a painting is a code to mean something, while a yellow sun means something else), the message will not be noticed by the warden, and the prisoners will have the chance to covertly plot their escape.
When we configure a high interaction honeypot, we hope to capture a great deal of information about the attacker's activity. Even if he notices he is on a honeypot, learning how he noticed it to be a fake system is still valuable information. So, honeypots do need to be covert, but not necessarily completely covert.
Steganography and honeypots share some characteristics: mainly, that once you are discovered, the game is almost over. Also, in both steganography and honeypots you have to hide the presence of something as best you can. But there are always signs left behind that inevitably allow for detection. For example, let's use our analogy with the warden again. He may examine the image he's carrying, and if he looks closely he will notice differences between several pictures, and perhaps become suspicious. For honeypots, the situation is comparable: if an attacker carefully watches for signs of deception, he will sooner or later find some.
Since honeypots are being deployed all across the Internet, more and more blackhats' tools are starting to include automatic detection of suspect environments. This has already begun with the backdoor-virus-worm known as AgoBot (also known as Gaobot). [ref 3]
Let's start with some technical examples that show some of the different techniques that attackers can use to detect honeypots.
Many tools are available for building a high interaction honeypot. We will focus on some of the best known and help show you the inside of the matrix.
User Mode Linux (UML)
Some people have tried to use UML [ref 4] as a honeypot, but in order to gauge its effectiveness, we need to first recall what UML is. Basically, UML is a way to have a Linux system running inside another Linux system. We will call the initial Linux kernel the host kernel (or host OS), while the one started by the command linux will be called the guest OS. It runs "above" the host kernel, entirely in userland. Note that UML is only a hacked kernel that is able to run in userland. Thus, you have to provide the filesystem containing your preferred Linux distribution.
By default, UML executes in TT (Tracing Thread) mode. One main thread will ptrace() each new process that is started in the guest OS. On the host OS, you can see this tracing with the help of ps:
You can identify the main thread (PID 1039) and several threads which are ptrace()d: several kernel threads (PID 1044 -- 1066), init (PID 1068), ile (PID 1268), a shell (PID 1272), and dd (PID 1348).
We quickly discover that when used "by default," UML is not designed to be hidden:
The red lines are ones specific to UML in its default mode. Also note that the network device 0 uses a TUN/TAP (blue line) which is not that common on a real system.
One of the big issues with UML is that it does not use a real hard disk; it uses a fake IDE device called ubd*. If you take a look at the /etc/fstab, execute the command mount, or check the directory /dev/ubd/, you will notice that you are inside a UML system. To hide that information, it is possible to start UML with the options fake_ide and fakehd. But don't forget that what you read may not, in fact, be true: have a look at the UML's root device ubd to see that it is 98 (0x62).
UML can also be easily identified by taking a look at the /proc tree. Most of the entries in this directory will show signs of UML if you just take a closer look:
In addition, the entries iomem, ioports, interrupts, and many others look suspicious. To counter this way of fingerprinting UML, you can use hppfs (Honeypot procfs, [ref 5]) and customize the entries in the /proc hierarchy.
Another place to look for UML is the address space of a process. On the host OS, the address space looks as follows:
In contrast, the address space inside the guest OS looks like this:
What one should notice, and what is not that common, is the topmost address which indicates the end of the stack (forget about the mapping of the dynamic libraries). Depending on the amount of memory available on your host, it is usually 0xc0000000. However, on the UML, we have 0xbefff000. In fact, the address space between 0xbefff000 and 0xc0000000 on a UML contains the mapping of the UML kernel. This means that each process can access, change, or do whatever it wants with the UML kernel.
To fix most of these problems, you can start UML either with the argument honeypot [ref 6, ref 7] or in skas mode (Separate Kernel Address Space) [ref 8]. However, getting skas mode running is not that easy, and the host kernel is not very stable when it is running (hung processes, and so on, lead to reboots).
VMware
VMware is a very efficient virtual machine which provides a virtual x86 system. Thus, you can install (almost) any operating system you want, from Linux or Windows to Solaris 10.
The first step to detect a VMware is to look at the hardware it is supposed to emulate. Prior to version 4.5, there were some specific pieces of hardware that were not configurable:
It is possible to patch the VMware binary to change these default values, however. Kostya Kortchinsky from the French Honeynet Project has written such a patch, which is able to set these values to some other values. This patch is publicly available. [ref 9]
Furthermore, the VMware binary also has an I/O backdoor. This backdoor is used to configure VMware during runtime. The following sequence is used to call the backdoor functions:
At first, register EAX is loaded with a magic number that is used to "authenticate" the backdoor commands. Register EBX stores parameters for the commands. In register ECX the command itself is loaded. The following table gives an overview of some possible commands:
In total, there are at least 15 implemented commands.
Register DX stores the I/O backdoor port, and with the help of the IN instruction, the backdoor command finally gets executed. It is clear that with the help of the VMware I/O backdoor it is possible to interfere with a running VMware.
With the help of Kostya Kortchinsky's patch, you can change the magic number and thus "hide" the backdoor from an attacker. More information about the backdoor in VMware is also available. [ref 10]
Detecting additional lines of defense: chroot and jails
chroot() was never designed for security, but it is considered to be a necessity when one wants to protect a sensitive server. Detecting that you are in a chroot environment, or even circumventing it, is not really that difficult.
Unless the chroot directory is on a specific partition, and placed at the top of it, the inode numbers are not those expected of a real root directory:
Here, the inode numbers of the . and .. directories are the same, and are equal to 2 (which is the normal value for a root directory on a partition). In the current directory, we have:
Then, when we chroot a shell into the current directory, we retrieve the same inode numbers:
While .. has been changed to match the . directory, it is still not the expected value.
Note that there is much more to do in a chroot. For instance, you can send signals to any process outside the chroot(), or even attach to outside processes with ptrace(). Since ptrace() can be executed from inside the chroot on any process that is outside the chroot(), the attacker has an easy way to inject whatever he wants on the host. Such evasions are also possible through mount(), fchdir(), sysctl() and so many others [ref 11].
When we think about virtual environments and security, it's pretty clear that chroot() is definitely not something to rely upon. Another option to enforce confinement, provided by FreeBSD and based on chroot() but more reliable, is the jail(). A jail() lets you create a virtual host, bound to an IP address, with its own tools, users, and more. It is very convenient for virtual hosting, and it could be used for honeypots too.
However, even though FreeBSD's jail() is more reliable, it is not really much more covert. There are several tests one can perform to detect if you are in a jail:
In this section, we focused on detecting whether we were in a confined environment with chroot() and jail(). However, are these really even issues for a hacker inside a honeypot? Learning that we are on a "restricted host" is not all that important anymore, as such systems are spreading all across the Internet. The real issue here is the leaking of security from the guest to the host. And currently, there are very few (if any) systems out there that have proved to be sufficiently well confined.
Concluding part one
In the first of this two-part series, we compared honeypots to steganography and then looked at three common techniques for virtualizing honeypots. For each of these methods, which included User Mode Linux, VMware environments, and chroot/jail environments, we looked at weaknesses that lead to their detection. It's clear that while each of these has its advantages, they can all be easily detected by an experienced hacker.
Next time, we'll continue our look at honeypot virtualization tools by discussing the Sebek data capture tool in detail, along with some of the ways it too can be detected. Then we'll discuss some other techniques available for detecting honeypots, such as x86-specific ones and time-based analysis. Stay tuned.
Thanks to Kelly Martin, Lance Spitzner, Dragos Ruiu, Maximillian Dornseif, Christian Klein, Felix Gärtner, Lutz Böhne, Laurent Oudot, Philippe Biondi, and folks from the German and the French Honeynet Projects.
About the authors
Thorsten Holz is a research student at the Laboratory for Dependable Distributed Systems at RWTH Aachen University. He will presumably graduate next fall and continue his studies as a Ph.D. student. His research interests include the practical aspects of secure systems, but he is also interested in more theoretical considerations of dependable systems. He is one of the founders of the German Honeynet Project.
Frédéric Raynal is head of the Software Security Research and Development team at the Common Research Center (CRC) of EADS. He is also the Chief Editor of the first french magazine dealing with computer and information security (MISC), and Head of the Organisation Committee of SSTIC (Symposium sur la Sécurité des Technologies de l'Information et de la Communication). He worked on information hiding and cryptography as he earned his PhD. Now, he deals with (in)secure programming and security of operating systems. He also contributes to several open source projects and is part of the French Honeynet Project.
This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.