by Jamie Riden and Christian Seifert
Honeypots come in many shapes and sizes and are available to mimic lots of different kinds of applications and protocols. We shall take the definition of a honeypot as "a security resource whose value lies in being probed, attacked, or compromised"[Spitzner02]. That is, a honeypot is a system we can monitor to observe how attackers behave, a system which is designed to lure attackers away from more valuable systems and/or a system which is designed to provide early warning of an intrusion to the target network. A honeypot may be used for all three applications at the same time.
The first appearances of honeypots in computer science are possibly in "The Cuckoo's Egg" by Clifford Stoll and in "An Evening with Berferd" by Bill Cheswick. In the former, fake military reports were used as bait for the attackers. The latter is more recognisable as the sort of honeypot we know today, where an attacker is monitored and diverted away from production systems. In this paper we give a brief overview of what is available and highlight some of the key differences between today's honeypots.
Are honeypots high maintenance?
A major difference between types of honeypots lies in how far an attacker can interact with the application or service. Truly vulnerable systems allow an attacker to interact with the system on all levels. The bad guys can probe, attack, and compromise it and, upon successful exploitation, use it as a tool for further attacks. These systems are therefore called high interaction honeypots; examples include those built with the Honeynet Project's 3rd Generation Honeywall ('roo') framework.
These require a lot of close monitoring and detailed analysis to see what the attacker is up to. We might expect these honeypots to be used to monitor an attacker who is trying to break in by guessing an SSH username and password. It is not, in general, possible to provide an emulation of a UNIX shell that will convince an attacker for long, and so a high interaction system is preferred. This is also likely to be what many people think of as a typical honeypot system; a vulnerable system with additional instrumentation to help the owner in monitoring.
Theoretically, any vulnerable machine can act as a high interaction honeypot. Connect it to the network and soon you will observe the first attacks. However, it would be very difficult to perform a full forensic analysis on such a system, and this is why the honeywall, which sits between the honeynet and the outside world, is often used. It collects network data and the attacker's keystrokes and logs it all in a central repository for forensic analysis at a later time. Key-logging is done so that the attacker cannot conceal their actions by using an encrypted protocol like SSH.
The second type, the low interaction honeypot, generally emulates vulnerabilities rather than presenting a real vulnerable system, and therefore the attacker is not able to interact with it on all levels. For this reason, such honeypots are safer, in that you do not have to worry about the actions of the attacker on the system, but they are considerably less flexible. An example of a (relatively) low interaction honeypot is nepenthes, which will automatically collect samples of Windows-based worms with minimal user intervention. Nepenthes is a fantastic tool for collecting malware samples, but doesn't provide a complete simulation of a Windows host. Another program, honeyd, allows the user to create a simulated network of over 60,000 hosts which can appear to be running different operating systems and services.
The honeypot tool honeytrap is designed to capture unknown attacks. It does this by listening on all TCP ports and dynamically loading handlers for each port. By contrast, nepenthes can capture unknown attacks against an existing service, such as microsoft-ds on 445/tcp, but it will not deal with a connection attempt to a previously unknown port. Honeytrap has simple handlers which will record information about the TCP session, replay previously captured responses, download remote files when given (T)FTP commands, or proxy connections to another program.
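To make the idea of a handler that records information about a TCP session concrete, here is a minimal sketch in Python. It is not honeytrap's code and does not reproduce its dynamic per-port loading: it listens on a single port and stores the first bytes a client sends. The function names and the record layout are assumptions made for illustration.

```python
import socket
import threading
from datetime import datetime, timezone


def open_listener(host="127.0.0.1", port=0):
    """Open a listening TCP socket; port 0 lets the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(5)
    return server


def record_session(conn, addr, dst_port, log, max_bytes=4096, timeout=5.0):
    """Read the first bytes the client sends and append one record to log."""
    conn.settimeout(timeout)
    try:
        payload = conn.recv(max_bytes)
    except OSError:  # includes timeouts and connection resets
        payload = b""
    finally:
        conn.close()
    log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "src": addr[0],
        "src_port": addr[1],
        "dst_port": dst_port,
        "payload": payload,
    })


def listen_once(server, log):
    """Accept a single connection and record it (hypothetical helper)."""
    conn, addr = server.accept()
    record_session(conn, addr, server.getsockname()[1], log)
```

A real deployment would loop over `accept()`, cover many ports, and write records to durable storage; the sketch only shows the recording step.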
At the very lowest level of interaction, we have a tool known as a network telescope, or darknet. This is IP address space which is advertised but does not have any hosts connected to it. Instead of faking a network using a tool such as honeyd, the operator just observes traffic going to this network segment. Since there are no real machines on it, scans of the address space are very easy to spot. The network telescope may also show evidence of 'backscatter': when machines elsewhere on the Internet forge some of its addresses as source addresses, the telescope receives the resulting RST or SYN+ACK packets (in the case of TCP) or replies or ICMP unreachable messages (in the case of UDP).
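The difference between a scan and TCP backscatter can be sketched as a simple classification on TCP flags. This is a rough heuristic for illustration, not the method used by any particular telescope; the function name and the category labels are assumptions.

```python
# TCP flag bits as laid out in the TCP header.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


def classify_tcp(flags):
    """Roughly classify a TCP packet arriving at an unused address.

    A bare SYN looks like someone trying to open a connection (a scan).
    SYN+ACK or RST looks like a reply to a packet we never sent, which
    suggests our address was forged as the source (backscatter).
    """
    if flags & RST:
        return "backscatter"
    if flags & SYN and flags & ACK:
        return "backscatter"
    if flags & SYN:
        return "scan"
    return "other"
```

For example, `classify_tcp(SYN)` is treated as a scan, while `classify_tcp(SYN | ACK)` is treated as backscatter. Some scanning techniques deliberately send unusual flag combinations, so a real analysis would need more context than the flags alone.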
Probably the most famous network telescope is CAIDA's, which accounts for around 1 in every 256 IP addresses in terms of size. Data from it has been used to analyse the spread of worms such as Witty and Slammer. (See also descriptions of Witty and SQL Slammer/Sapphire.)
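The 1-in-256 figure is what you get if the telescope is about the size of a single /8 prefix of IPv4 space, which is an assumption here. The short calculation below uses a placeholder prefix purely for its size, and the helper `expected_hits` is hypothetical.

```python
import ipaddress

# Placeholder prefix chosen only for its size; NOT the telescope's real block.
telescope = ipaddress.ip_network("10.0.0.0/8")
ipv4_total = 2 ** 32

# Fraction of the whole IPv4 space covered by a /8.
share = telescope.num_addresses / ipv4_total


def expected_hits(probes):
    """Expected number of uniformly random probes landing in the telescope."""
    return probes * share
```

Under these assumptions a worm that picks targets uniformly at random would, on average, send one probe in every 256 to the telescope, which is why such a large block is useful for studying how worms spread.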
Looking for trouble: Client honeypots
One relatively new distinction is between traditional (server) honeypots, such as Niels Provos' honeyd, and client honeypots. Instead of passively waiting for an attack, client honeypots will actively search out malicious servers; typically this has centered on web servers that deliver client-side browser exploits, but is certainly not limited to such. Recently, client honeypots have expanded to investigate attacks on office applications.
Examples of client honeypots are the MITRE HoneyClient, Shelia, Honeymonkey, and CaptureHPC. These client honeypots all work on the same principle. We start with a dedicated system, which is usually based on some virtualization technology so it can be automatically reset to a clean state after a successful infection. The client honeypot interacts with potentially malicious servers and monitors the system for unauthorized state changes that occur during or after the interaction with the server. If, for example, we observe extra files in C:\Windows\system32 and additional registry keys in HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run, we know that the server we have just interacted with must have been malicious and manipulated our machine to run some code upon the next system restart. Unauthorized state changes that can occur on a machine range from the mentioned changes on the file system and registry to changes to network connections, memory, processes, et cetera.
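The state-change principle can be sketched for the file system alone: take a snapshot before visiting a server, take another afterwards, and compare. This is a simplified illustration, not how any of the named tools is implemented; `snapshot` and `diff` are hypothetical helpers, and the registry, processes and network connections are not covered.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (by relative path) to a SHA-256 of its content."""
    state = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            state[str(path.relative_to(root))] = digest
    return state


def diff(before, after):
    """Report files added, removed or modified between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(
            p for p in before.keys() & after.keys() if before[p] != after[p]
        ),
    }
```

Any non-empty result after visiting a server would mark that server as suspicious. A snapshot comparison only sees the end state, so changes that are made and then undone during the visit would be missed.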
Since we originally wrote this article, Capture HPC has reached version 2.0, which allows the use of different clients, such as Firefox, RealPlayer and Microsoft Word, and adds an option to collect pushed malware and log tcpdump captures of the interactions between client and webserver. A paper on initial results using this tool is now available as Know Your Enemy: Malicious Web Servers.
Client honeypots need to interact with servers in order to determine whether they are malicious or not. With high interaction client honeypots, this is quite expensive, and therefore careful selection of which servers to interact with can greatly increase the success rate of finding malicious servers on a network. There are several sources one can use. A crawler is probably the most traditional way to access a large quantity of web servers; combined with filtering and link scoring, a method implemented by HoneyClient, it can yield good results. Alternatively, one can mine links directly from known bad sources, such as spam email messages. Search engine integration, in which keywords are submitted to obtain links from specific content areas (for example adult content), also yields good results.
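Mining links from spam messages can be sketched with Python's standard email parser. This is a minimal illustration that looks only at the text parts of a message and uses a deliberately simple URL pattern; the function name and the pattern are assumptions, not part of any of the tools mentioned.

```python
import re
from email import message_from_string

# Simple pattern for http/https links; real-world extraction needs more care.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_urls(raw_message):
    """Return the unique URLs found in the text parts of a raw email, in order."""
    msg = message_from_string(raw_message)
    seen = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        for url in URL_RE.findall(text):
            url = url.rstrip(".,;)")  # drop trailing sentence punctuation
            if url not in seen:
                seen.append(url)
    return seen
```

The resulting list could then be fed to a client honeypot as its queue of servers to visit.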
The distinction between interaction levels also applies to client honeypots. Client honeypots that drive a vulnerable client to interact with servers and classify a server as malicious based on state changes are high interaction client honeypots. On the other hand, a low interaction client honeypot uses a simulated client, such as wget in place of Internet Explorer, and assesses the malicious nature of a server via static analysis, such as signatures. The danger of spreading infections, which is very real on high interaction client honeypots, is greatly reduced with these low interaction client honeypots, because vulnerable clients are only emulated. SpyBye and HoneyC are available low interaction client honeypots that perform simple rule-based and signature matching to detect client-side attacks.
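The signature-matching step of a low interaction client honeypot can be sketched as follows. The two signatures are invented for illustration and are not taken from SpyBye or HoneyC; fetching the page (for instance with a simple HTTP client) is left out so that the example stays self-contained.

```python
import re

# Hypothetical signatures, for illustration only.
SIGNATURES = {
    # An iframe with a zero width or height is often used to hide content.
    "hidden-iframe": re.compile(
        rb"<iframe[^>]+(width|height)\s*=\s*[\"']?0", re.IGNORECASE
    ),
    # A long run of %uXXXX escapes passed to unescape() can indicate shellcode.
    "unescape-blob": re.compile(
        rb"unescape\(\s*[\"'](%u[0-9a-f]{4}){8,}", re.IGNORECASE
    ),
}


def classify(body):
    """Return the sorted names of all signatures that match a response body."""
    return sorted(name for name, rx in SIGNATURES.items() if rx.search(body))
```

An empty result means only that no known pattern matched, which is exactly why this approach cannot flag attacks it has no signature for.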
Increased speed and lower resource consumption are the greatest advantages of these low interaction client honeypots. However, since they are usually rule and signature based, they are not able to detect previously unseen attacks (0-days). High interaction client honeypots are better at detecting this sort of attack since they do not need prior knowledge of the attack in order to detect it. Microsoft is said to have identified and patched several 0-day flaws in Internet Explorer based on results from their farm of high interaction Honeymonkey machines.
Niche players: Application-specific honeypots
As well as general purpose honeypots which provide or mimic vulnerable systems, there are application or protocol specific honeypots. There are many honeypots designed to catch spam by masquerading as open email relays or open proxies. Jackpot is written in Java and pretends to be a misconfigured SMTP server which allows relaying. Instead of relaying, however, it presents a list of messages to the user, who can then pass the spammer's test message and hold the rest of the spam run. (Usually, spammers will attempt to deliver a test email to verify the host in question is actually an open relay.)
Another example is Proxypot, although this appears to be no longer maintained. See Fighting Spammers With Honeypots: Part 1 and Part 2 for more details on using honeypots to block spam or to discover more about it.
Another protocol which has been given attention recently is HTTP, specifically web application honeypots. The Google Hack Honeypot provides various different modules, one of which looks like a misconfigured version of PHPShell. PHPShell allows an administrator to execute shell commands via a web interface, but access to it should be restricted using a password at the very least. In the Google Hack Database, there is a search which will match on unprotected PHPShell applications and the GHH module attempts to reproduce this interface. GHH has a central web interface which allows the operator to monitor commands users are trying to execute.
A common exploitation technique is remote file inclusion. Some configurations of PHP allow the executing program to include files that reside on other web or ftp servers. When a particular program doesn't take care to check its input before it makes an include($mylibrary); statement, the attacker can often execute code of their choosing on the webserver. Alternatively, part of PHPHoP is a PHP script which attempts to analyse and download payloads as a direct response to the malicious request. The former may prove more useful as it is designed to run on production servers without interfering with their operation. There are many other flaws in web applications, however, such as SQL injection, and direct command injection as in a commonly exploited AWStats bug.
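A remote file inclusion attempt usually shows up in the request as a parameter whose value is itself a URL. The sketch below flags such parameters in a request target; it is an illustrative heuristic written for this article, not code from PHPHoP or any other tool, and the function name is an assumption.

```python
from urllib.parse import parse_qsl, urlsplit

# Schemes that suggest a parameter points at a file on another server.
REMOTE_SCHEMES = ("http://", "https://", "ftp://")


def find_rfi_params(request_target):
    """Return (name, value) pairs whose value looks like a remote URL."""
    query = urlsplit(request_target).query
    hits = []
    # parse_qsl also decodes percent-encoded values such as http%3A%2F%2F...
    for name, value in parse_qsl(query, keep_blank_values=True):
        if value.lower().startswith(REMOTE_SCHEMES):
            hits.append((name, value))
    return hits
```

A honeypot could log each hit and, in a suitably isolated environment, retrieve the referenced file for later analysis. Legitimate applications sometimes pass URLs as parameters too, so hits need to be reviewed rather than trusted blindly.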
Some honeypots are better at catching particular attacks than others, and ideally a mixture of honeypots would provide the best insight into current attacks. For example, the catch-all ErrorDocument handler used in part of PHPHoP will not trap POST data properly, as Apache doesn't pass this information on to ErrorDocument handlers. To capture it, you either need a custom-written webserver, or to create PHP scripts with the exact names of the files you are looking to emulate. An example of the latter would be a script to catch an attempt to exploit a PHPXMLRPC vulnerability by a POST to /xmlsrv/xmlrpc.php. More detail on some of the tools and methods for monitoring web application attacks can be found in Know Your Enemy: Web Application Threats. The paper 'Web Server Botnets and Hosting Farms as Attack Platforms' (pdf), first published in Virus Bulletin, February 2007, also goes into detail about web application attacks, and subsequent use of compromised hosts. It also describes The Web Honeynet Task Force, an attempt to measure, share information about and counteract this threat.
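The custom-written webserver option can be sketched with Python's standard http.server module: a handler that records the body of a POST to any path. This is a minimal illustration rather than a hardened honeypot; the class name, the in-memory list and the choice of a 404 reply are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class PostLogger(BaseHTTPRequestHandler):
    """Record the path and body of every POST request, whatever the path."""

    captured = []  # shared in-memory log; a real tool would persist this

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length > 0 else b""
        self.captured.append({
            "client": self.client_address[0],
            "path": self.path,
            "body": body,
        })
        # Reply as if the file did not exist; this choice is arbitrary.
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        """Silence the default per-request logging to stderr."""
        pass
```

It could be started with `HTTPServer(("127.0.0.1", 8080), PostLogger).serve_forever()`, where the address and port are placeholders. Because the handler does not depend on the requested file existing, a POST to a path such as /xmlsrv/xmlrpc.php is captured along with its payload.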
More recently, a more sophisticated method of building web application honeypots was described in Michael Mueter's MSc thesis. This toolkit allows arbitrary PHP applications to be turned into high-interaction honeypots and has been tested with software such as PHPMyAdmin, PHP-Nuke and PHPBB.
Potential issues with honeypots
Secrecy is paramount when deploying a honeypot or honeynet. If everyone knows it is a trap, no-one will attempt to attack it at all, except perhaps automated tools such as worms. Some honeypots, especially low interaction ones, may be easily identified as honeypots by an attacker due to their emulation of services. Any emulation of a complex system will always differ from the real thing; for example, there are a variety of ways for a program to check if it is running within a virtual machine and malware is increasingly using these techniques to hamper analysis. There will always be an arms race between those trying to develop ways of detecting honeypots, and those who are trying to improve honeypots so they are harder to fingerprint.
Client-side attack frameworks exist, such as MPack, that contain automated mechanisms that make detection and analysis of malicious web servers with client honeypots more difficult (see KYE: Behind the Scenes of Malicious Web Servers for details). For example, client-side attacks might not trigger if the client honeypot accesses a malicious web server from a specific network (for example, from our research lab) and/or client-side attacks might only trigger once. Upon repeated interaction, the malicious web server might no longer launch client-side attacks, making tracking and analysis of the malicious server and its attacks difficult.
Another concern is that if a high interaction honeypot is compromised, the attacker may attempt to use this as a stepping stone to damage or take over other systems. Ideally the honeypot should use several mechanisms to prevent this, and the operator should pay close attention so no harm comes to innocent third-parties. In some jurisdictions, legal liability for the actions of users of the honeypot may be a concern, as may local electronic interception laws.
A large amount of data about attackers and their methods has been gathered by the use of honeypots of various sorts over many years, and we expect to see this trend continuing. Honeypots are now being used increasingly in mainstream applications and an ever increasing array of tools is available to the amateur and professional. In particular, we expect to see significant developments in the field of client honeypots this year, as Internet Explorer flaws remain among the most critical Windows vulnerabilities according to the current SANS Top 20 and IPv6 is slowly but inevitably being adopted. Similarly, web applications are the most critical of the cross-platform vulnerabilities in the same list. We may also see newer applications, such as VoIP and SCADA honeypots, starting to become widespread (although a few groups are already deploying these) as abuse of these protocols becomes more important to the community.
As honeypots gain importance in detecting and analyzing attacks, it is to be expected that attackers will develop techniques to identify and avoid honeypots. The MPack web exploitation framework is already going down this route. As these techniques become more prevalent, honeynet technology is likely to respond to make such detection more difficult. Distributed honeynets and honeynet implementations that are not based on virtualization technology, which is another vector for detecting honeypots, are likely to gain importance. The arms race between attackers and security researchers is continuing, but at this point in time, honeypots still provide us with invaluable data about the attackers and attacks of the real world.
[Spitzner02] Spitzner, L. Honeypots: Tracking Hackers, Addison-Wesley, Boston, 2002.
For questions specifically about honeypots, the SecurityFocus.com honeypots mailing list "is dedicated to the research, development, and understanding of honeypots and honeypot related technologies."
Niels Provos and Thorsten Holz' new book Virtual Honeypots: From Botnet Tracking to Intrusion Detection gives an in-depth account of all kinds of honeypots.
This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.