by Holt Sorenson
This is the second article in a three-part series on tools that are useful during incident response and investigation after a compromise has occurred on a Linux, OpenBSD, or Solaris system. The first article focused on system tools, this one focuses on file system tools, and the next article will discuss network and other tools. The information used in these articles is based on OpenBSD 3.2, Debian GNU/Linux 3.0 (woody), RedHat 8.0 (psyche), and Solaris 9 (aka Solaris 2.9 or SunOS 5.9). The tools covered are generally ones that are available with the operating system, although some tools that may not be native to a given system are discussed as well. If a tool that is discussed isn't available on the operating system you're using, the information on acquiring tools in the references section might help you out.
The tools that are covered in this article all execute in user-space. If an attacker has compromised the system and installed a kernel module that hides his activities, or a rootkit that changes the binaries on the system, the results that the tools below provide are unlikely to be accurate. This is one of the reasons that offline analysis should be performed after data from the live system has been secured. This live data shouldn't be trusted as valid until it has been corroborated. Part of the reason that so many tools are covered is to familiarize the reader with multiple tools that do similar things, so that one tool's version of reality can be checked against others. It's important that those responding to incidents not prematurely rule out possibilities and that they remain skeptical.
It can't be emphasized enough that these tools should be executed from read-only media or on a secure system that is reserved for offline analysis. Using read-only media ensures that the tools haven't changed since they were stored on the read-only media, making it less likely that an attacker has compromised the tools.
When responding to an incident, security personnel need to have decided in advance what their priorities are for capturing data. The data on the system has a propensity for change. Incident response teams need to debate whether it is more important to immediately take a machine offline and image the file system, or to capture live data such as processes that are currently executing, established network connections, memory allocated by processes, and users that are logged in. The longer a system is kept on the network after a compromise, the more likely it is that the attacker will cause further damage to remote or local systems. Note that these statements aren't asserting that one should capture live data and not create file system images--if you decide the live data is important, you still need to image the file system after capturing the live data.
Law enforcement should be consulted to verify that the procedures that the security team has developed will facilitate any necessary law enforcement investigation. Organizations such as the HTCIA (US) and the NHTCU (UK) can help technical personnel get into contact with law enforcement officials that are responsible for computer related criminal investigations in their area.
The previous article in the series finished at the point where a password cracker, John the Ripper, had been found on a compromised web server after a webmaster had reported performance issues. In this article we explore how file and file system tools can be used to capture information that may shed light on how the host got compromised.
Sum packages need to be checked
The attacker began to use the web server to crack passwords at 03:17. 03:17 is a time from which to start the search, but it shouldn't necessarily be viewed as the time that the attacker compromised the system. Attackers frequently use automated programs that scan and attack vulnerable systems. It is possible that the attacker is just now getting around to visiting a system that an automated tool hacked some time ago.
Let's dig into some tools that can be used to help detect changes made to the file system by an attacker. Some of the tools that will be covered are included with a stock operating system. Others are third-party tools that are installed and customized for the operating system.
On Solaris and Linux, there are native commands that compare the current operating system install against the metadata contained in the package database. The commands are: rpm -Vva (RedHat 8.0), pkgchk -vn (Solaris 9), and debsums -ac (Debian 3.0). Both the rpm and Debian package formats can use RFC 2440 (OpenPGP) compliant signatures to authenticate packages, and both use MD5 hashes as their "checksums". On Solaris, pkgchk uses the System V (SYSV) sum algorithm to verify that binaries haven't been modified. The SYSV sum algorithm will generally detect changes to file contents that don't change the file size. However, it is unacceptable to depend on the SYSV sum algorithm for integrity; it's only slightly more robust than relying on file size and timestamps. Anyone who has spent some time in a hex editor or has used the touch command knows how easy it is to change file contents without changing the size, or to change the timestamps on a file's inode. On Debian GNU/Linux distributions, you can specify a set of packages to check against. If you mounted a read-only set of packages on /mnt, the command to verify against those package files is: debsums -cagp /mnt/*. The equivalent rpm command is: rpm -Vvp /mnt/RedHat/RPMS/*. rpm can also verify against a different rpm database than the one on your system: use rpm -Vva --dbpath <some_path> to verify against the alternate database. The equivalent command on a Debian system is: debsums -cagd <some_path>. These databases should reside on read-only media and should come from a trusted system. If you want a ridiculous amount of verbosity from rpm, add another -v. The output of these commands can be used to find changes that might have been caused by an attacker.
Comparing the package database to the currently installed system generally only gets you so far, however. On systems that have been heavily customized, the metadata in the packaging system and the actual state of the file system can provide two disparate views of reality, so using package tools to compare current file system state against the package database might not net you the results you desire. OpenBSD has a tool called mtree that can use hash functions to give assurance that files on the file system haven't changed since its database was last updated. If an attacker trojans or modifies files whose hashes have been stored in mtree's database, the changes will be detected by mtree. Similar functionality can be achieved on Solaris and Linux by installing third-party tools such as AIDE, integrit, Osiris, and tripwire. Tools such as changedfiles, dnotify, and FAM use kernel modules or poll the file system to detect changes as (or, when the tool uses polling, close to when) they are made. These tools provide more rapid notification of changes than the file system integrity checkers, which are run only periodically because the hash functions they use are computationally intensive. While it should be obvious, all of these tools need to be installed and configured prior to a compromise.
When these automated tools aren't available or when you need to dig deeper, there are some other tools that can be used during a response to an incident. These tools are usually used on files, although on Solaris they can run on directories as well. The values that these tools display aren't useful unless you have a trusted system on which you can run the same tool and compare the results. If you trust The Shmoo Group, you can consult the Known Goods database to find hashes and file sizes of binaries. The tool that provides the MD5 hash function on Linux systems is called md5sum (part of the FSF's coreutils software package). On OpenBSD, the analogous tool is md5. OpenBSD also provides sha1 and rmd160, based on the SHA1 and RIPEMD-160 hash functions, respectively. The OpenSSL library provides hash functions that can be run from the command line application openssl. openssl dgst <some_file> uses the MD5 hash function by default. Use openssl dgst -? to see the usage for the openssl dgst command and the different hash algorithms it supports. One can also run the commands without including the token 'dgst': for example, if you required the MD5 hash of a file, you could run openssl md5 <filename>. Another utility, shash, provides "checksums" using the hash function algorithms previously mentioned (see shash -l for the list), and can compute checksums using a couple of other algorithms as well.
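As an illustration of how these hash tools are used in practice, the following sketch records the MD5 hash of a file in a manifest and later verifies it with md5sum. The file name and contents here are hypothetical; the point is that any modification, even one that preserves the file's size, changes the hash:

```shell
# Record MD5 hashes of the binaries we care about (hypothetical file).
mkdir -p /tmp/hashdemo && cd /tmp/hashdemo
printf 'trusted binary contents\n' > sshd
md5sum sshd > manifest.md5

# While the file is unchanged, verification succeeds:
md5sum -c manifest.md5                      # prints "sshd: OK"

# After a modification that preserves the file's size (both strings
# are 24 bytes), verification fails with a non-zero exit status:
printf 'trojaned binary content\n' > sshd
md5sum -c manifest.md5 || echo "sshd has been modified"
```

In a real response, the manifest would be generated on a trusted system and stored on read-only media.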
Commands that provide checksums that aren't based on hash functions shouldn't be relied on for incident response. Two examples of such commands are cksum and sum. Checksum algorithms (CRC*, BSD, and SYSV sum) can't be trusted to provide the level of assurance that hash functions are accorded; in fact, checksum algorithms shouldn't be trusted for forensics purposes at all. They are designed to detect errors in data being transmitted or stored, not to protect data from being altered by a persistent and/or malicious attacker. They don't have the property of being one-way, and it is easy to find multiple sets of data that have the same checksum. Even if you took a checksum of some data and then stored it with the data on read-only media to ensure that the data being stored wasn't tampered with, the defense in a court case could trivially create other sets of data that had the same checksum. The defense could then argue that the data you stored wasn't the data relevant to the case being heard, because the checksum couldn't provide assurance that it was. The checksum algorithms that sum and cksum depend on map an infinity of possibilities to a set that is too small. Tools that rely on MD5 map the infinity of possibilities to 2^128 possibilities; tools that rely on SHA1 map them to 2^160. Both MD5 and SHA1 map in such a way that, in practical terms, it is difficult to find two sets of data for which the hash function produces the same result. This is why using a tool that depends on a hash function such as MD5 or SHA1 is a necessity.
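The weakness is easy to demonstrate with the SYSV algorithm (GNU coreutils' sum -s). Because the checksum is essentially a folded sum of byte values, two different files whose bytes are merely reordered produce the same checksum, while MD5 distinguishes them immediately. A minimal sketch:

```shell
# Two different two-byte files whose bytes are permutations of each other.
printf 'AB' > one
printf 'BA' > two

# The SYSV checksum is identical (65 + 66 = 131 in both cases)...
sum -s one        # e.g. "131 1 one"
sum -s two        # e.g. "131 1 two"

# ...but the MD5 hashes differ, so md5sum detects the difference.
md5sum one two
```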
Hey MAC, what time ya got?
Unix (and unix-like) file systems store a set of timestamps in their file system metadata. These timestamps are called the modify, access, and change (MAC) times. They are all stored in a file system structure called an inode. In addition to these timestamps, the inode contains information about the file such as its file type, permissions, owner, group, size, number of hard links, and the data blocks that the contents of the file occupy. Inodes also exist for directories. A directory's entries map the names of the files inside the directory to the inode numbers that hold each file's information.
The mtime reflects the time at which the file's data was last modified; system calls such as write, truncate, and mknod change mtime. The ctime reflects the time at which the file's inode was last changed. The atime reflects the time at which the file's data was last accessed; atime is changed by system calls such as execve, read, mknod, utime, and pipe. More details on atime, mtime, and ctime can be found in your system's stat(2) man page, which also lists the system calls that change a given timestamp.
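On a Linux system with GNU stat, the distinction between mtime and ctime is easy to observe. A sketch (atime is omitted here because its behavior depends on mount options such as noatime):

```shell
touch demo_file
stat -c 'mtime=%Y ctime=%Z' demo_file   # epoch seconds; initially equal

sleep 1
chmod 600 demo_file                     # touches inode metadata only
stat -c 'mtime=%Y ctime=%Z' demo_file   # ctime advanced, mtime unchanged

sleep 1
echo data >> demo_file                  # writes to the file's contents
stat -c 'mtime=%Y ctime=%Z' demo_file   # both mtime and ctime advanced
```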
Various commands change the MAC times in different ways. The table below shows the effects that some common commands have on MAC times. These tables were created on Debian 3.0 using an ext2 file system contained in a flat file mounted on a loopback device. If you find that you are mounting lots of images over the loopback device on your Linux system, you can have your bootloader pass the parameter 'max_loop=255' to the kernel prior to booting; you will then be able to mount up to 255 images instead of the default eight. Once the file system was mounted on the loopback device, it was examined with debugfs. Experimenting with your own system to verify the information in the tables below is encouraged. These tables can serve as a general guide, however.
The ls command can be used to show the modify, access, or change times of files. The following table shows various ls commands that sort in reverse order by mtime, atime, or ctime, causing ls to list the most recent times last.
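On a GNU system the invocations are as follows (BSD and Solaris ls accept the same -t, -u, -c, and -r flags):

```shell
ls -lrt     # sort (reversed) by mtime: most recently modified listed last
ls -lrtu    # sort (reversed) by atime: most recently accessed listed last
ls -lrtc    # sort (reversed) by ctime: most recently changed listed last
```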
The find command can be quite useful. find can search the file system for files that have been changed, accessed, or modified, using the -ctime, -atime, and -mtime arguments. Be aware that the atime of directories will change as you run find. If the purpose of using find is more formal than exploration, you should be working on a read-only image.
Most Linux systems include GNU find, from the findutils package. The GNU version of find is quite powerful because of the variety of features it offers. A time range can be specified, and find will print the timestamps of the files in that range. If you want to search for files modified more than two, but less than seven, days ago and print their mtime, ctime, and inode, you can use:
# find / \( -mtime +2 -a -mtime -7 \) -a -printf "M:%t C:%c %i\t%p\n"
If you don't have GNU find, you can use the stat command, assuming you are using Linux. If you don't have access to the stat command, but do have perl, you can use perl as a stat(2) wrapper:
# find / \( -mtime +2 -a -mtime -7 \)|perl -ne 'chomp;($i,$m,$c)=(stat)[1,9,10];printf"M:%s\t$i\t$_\n",localtime($m)." C:".localtime($c)'
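Where GNU stat is available, it can print similar fields directly; a minimal sketch (the format string here is illustrative):

```shell
# %y = mtime, %z = ctime (human readable), %i = inode, %n = file name
find /etc -mtime -7 -exec stat -c 'M:%y C:%z %i %n' {} \;
```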
One possible attack vector for compromising a *nix host is to use a broken suid or sgid binary to escalate privilege. Attackers also sometimes leave suid root shells on the file system for their convenience. find can be used to hunt down such files. To search for suid and sgid files and display their information, use:
find / \( -perm -4000 -o -perm -2000 \) -ls
A few more examples of using find appear below:
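For instance (the paths and timestamps here are illustrative, not taken from the compromised host):

```shell
# Files under /etc whose inodes changed in the last two days
find /etc -ctime -2 -ls

# Files modified more recently than a reference file; the reference
# timestamp below is a hypothetical time of compromise
touch -t 202302170317 /tmp/compromise_marker
find /home -newer /tmp/compromise_marker -type f

# World-writable regular files owned by root
find / -type f -user root -perm -0002 -ls
```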
If I had a photograph of you... something to remind me...
The command most commonly used to create file system images on Linux and Unix is the dd command. dd creates a bit-by-bit binary image of its input and saves it to its output. dd doesn't display its progress and can be slow when its input block size is different from its output block size. A friendlier version of dd, sdd, doesn't suffer from these drawbacks. sdd can also display the current statistics when it is sent a SIGQUIT (usually bound to ^\) and it displays a realistic view of the amount of data processed (partial blocks aren't considered full blocks).
Prior to invoking dd on a file system, you should run a utility that will generate a hash of the file system to be imaged. Assuming you are on Linux and are interested in creating an image of /dev/hda, you can run md5sum /dev/hda and then start dd. After dd completes, the same utility should be run on the saved image. If the values don't match, the examiner should look into why there is a discrepancy. On OpenBSD and Solaris, the raw devices should be used when engaging in incident response work.
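A sketch of this workflow, using an ordinary file as a stand-in for /dev/hda so the example is safe to run (device and image names are illustrative):

```shell
# Create a stand-in for the device to be imaged
dd if=/dev/zero of=/tmp/source.bin bs=1024 count=64

# Hash the source before imaging
md5sum /tmp/source.bin

# Create the bit-for-bit image
dd if=/tmp/source.bin of=/tmp/source.img bs=4096

# Hash the image; the two digests must match, or the image
# can't be trusted and the discrepancy must be investigated
md5sum /tmp/source.img
```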
On Linux, the utility e2image can be used to create images of the ext2 and ext3 file systems. e2image interprets the file system that is being imaged, instead of saving raw bits. e2image may not save data that clever attackers have stored outside of file system structures on the disk. e2image creates "raw" and "normal" images, both of which are created as sparse files to conserve disk space. This causes a file system image that is created with e2image to have a different hash than the file system stored on the hard disk, making it difficult to have assurance that the data you want has been captured. These reasons lead to the conclusion that e2image shouldn't be used in place of dd for forensic images.
Another utility that can be used to create system images, partimage, can save image files to a partimage server via SSL. It currently supports ext2, ext3, ReiserFS, JFS and XFS. It also supports FAT16/32 and HPFS (OS/2). Work is in progress on UFS (Solaris, *BSD), HFS (MacOS), and NTFS.
partimage only saves used blocks of the partition being imaged. Like e2image, it is possible that an image that is created using partimage doesn't accurately represent the state of the disk at the time the image was created. Examiners should also be aware that the popular PC product, Ghost, can create images that are not suitable for forensics. One should carefully review the manual for the version of Ghost you have to make sure you can use it and then test with it, prior to using it in a critical situation.
One of the article reviewers asked why I would include tools that aren't appropriate for forensics work in this article. I chose to include such tools to make the following point:
When a tool claims to make an image of a file system, you need to figure out how the tool's author defines "image". An empirical method for discovering that definition is to create a test file system and record its hash, image it with the tool, restore the image, and then hash the restored file system and compare the two fingerprints. If the fingerprints don't match, don't use the tool for forensics purposes.
Would you like some file system to go with that debugger?
Sometimes command line tools such as ls, stat, and find don't provide enough visibility into the state of the file system you're trying to analyze. Perhaps you want to look at an image of a file system that you've created with dd, maybe you're worried that a rootkit has been installed and that ls has been trojaned, or you think there are some deleted, yet recoverable, files left on a file system that have interesting data.
All three operating systems include a file system debugger. The debugger is called fsdb on OpenBSD and Solaris. Solaris's debugger is the most arcane and takes some time to get used to. fsdb on OpenBSD, and debugfs for ext2 and ext3, offer integrated help that can be accessed by typing 'help'. Interactive file system debuggers are currently not available for other popular Linux file systems such as JFS and reiserfs. xfs_db is a file system debugger available for XFS; it is not covered in this article because XFS isn't yet in wide use on the operating systems discussed here.
Linux's ability to mount foreign file systems is quite useful for read-only forensics work. *BSD's ffs and Solaris's ufs can be mounted using the ufs kernel module. Filesystems that are available for Linux such as ext2, ext3, JFS, reiserfs, and XFS can be mounted. MSDOS (fat16), fat32, and most NTFS images are also supported. Linux doesn't include file system debuggers for foreign file systems, however.
You can find out which file system modules are available on your currently running Linux instance using find /lib/modules/`uname -r`/kernel/fs/* -type f|grep -v 'nls\/'. To see which file systems are registered with the running kernel (built in or currently loaded), use grep -v '^nodev' /proc/filesystems. For more information on mounting file systems, see mount(8).
To mount an image in read-only mode use: mount -t ext2 -o ro,loop=/dev/loop0 /var/tmp/2003_02_17_attack.bin /mnt. You can now examine the newly mounted file system using tools supplied by the OS. As you examine the image, you will be changing the access times of files and directories that you manipulate. These changes will only take place in memory. It is critical that you remember to include the read-only (ro) option to -o with your mount command so that changes aren't committed to the image file. In addition, it is wise to work with copies of images that you have verified are bit-for-bit copies. A tool like md5sum is sufficient for making sure that the copy is the same as the original. Working with a copy ensures that the original image won't be modified. More information on working with loop devices is available in the "THE LOOP DEVICE" section of mount(8).
To use debugfs, you type debugfs <path_to_image>. Many of the commands that you are used to using while in a shell are emulated. If you get stuck, use the help command at the debugfs: prompt to get pointed in the right direction. If that isn't sufficiently helpful, use man debugfs. debugfs starts in read-only mode, so it is less likely to change your image as you work with it. However, you should still work with a copy and treat the original in a sacrosanct manner. An example of using debugfs follows:
Working with OpenBSD images on OpenBSD takes a few more steps. The images need to be configured on the vnode disk device before they can be mounted. An example follows:
fsdb is the ffs (fast file system) editor in OpenBSD. fsdb doesn't have a read-only mode, so it is important to only work on copies of the image. Also, the 'cd' command in fsdb might more appropriately be written as 'ci', as it really means change [active] inode. You can make any inode on the file system the active inode that you are examining by 'cd'ing into it.
Like OpenBSD, working with Solaris images on Solaris requires a few extra steps. Solaris has a driver called lofi that is a contraction of the words 'loopback file'. Prior to first using lofiadm, the kernel won't show the lofi module as installed. After the first invocation of lofiadm, you should find the lofi driver loaded into the kernel. Use the modinfo command to display the currently loaded kernel modules. The binary lofiadm is included in the SUNWcsu package, one of the core packages of the OS. You shouldn't have to install any extra packages to use the lofi driver or the utilities associated with it. An example of mounting an image via the lofi driver follows:
fsdb is a program that you need to spend some time with prior to getting into a situation where you have to be thinking quickly. Its command syntax is arcane enough that if you master it, you should probably get a medal for perseverance or an award for spending too much time on computers. The most useful documentation that the OS provides is the fsdb_ufs(1M) man page.
Sometimes it is necessary to see the information that is contained in the image without any interpretation by file system debuggers. Several tools can be used for this purpose.
A hex-editor, such as hexedit, is useful for browsing file system images without the constraints of a file system debugger. hexedit is available as part of the RedHat distribution and can be installed on OpenBSD using the ports tree in /usr/ports.
When a hex-editor isn't available, you can use emacs or vim. To use emacs to view a file system image, start emacs using emacs <image_name>. Turn on read-only mode using 'Esc-x toggle-read-only'. Next, change to the hex-edit view using 'Esc-x hexl-mode'. To exit, use 'Ctrl-x Ctrl-c'. For more information on how to use emacs, type 'Ctrl-h t' to start the emacs tutorial.
vim's hex-editing support is a bit more clunky than emacs'. To start vim in read-only binary mode with no swap file, run vim -nRb <image_name>. Next, type 'Esc:%!xxd' to convert the file to a hex display. To exit, type 'Esc:q!'. For a tutorial on using vim, type vimtutor from the command line. For more help with vim, hit the F1 key while in vim.
The amount of memory that RAM and swap provide is a practical limitation on using editors to view large files. Once the machine starts swapping (which occurs when RAM is used up and swap starts to be used), your session can slow down significantly. Use split or csplit to divide the file system image into multiple parts when you run into this problem. To reiterate: you need to be using a copy of the image when doing anything that will, or has the potential to, change your image. Even if you've specified options that make your tools operate in read-only mode, you shouldn't be working on the only copy you have. If you are gathering evidence for an investigation and the defense can question your methodology because you might have tainted the evidence by writing to it, you can lose your case. One should also be cognizant of the changes that are made as you manipulate the copy. The more you change it (split it, compress it, fiddle around in an editor), the more careful you have to be about the conclusions you're drawing. Check, check, and re-check. When you're done, re-assemble the copy and compare its hash to the hash of the original; this will help you verify that your transforms didn't introduce any changes.
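A sketch of splitting a copy of an image and verifying that reassembly reproduces it exactly (file names are illustrative, and a random file stands in for the image):

```shell
# Stand-in for a verified copy of the image
dd if=/dev/urandom of=image_copy.bin bs=1024 count=1200

# Hash the copy before doing anything to it
md5sum image_copy.bin > image_copy.md5

# Split into 512 KB pieces named part_aa, part_ab, ...
split -b 524288 image_copy.bin part_

# ... examine the pieces individually ...

# Reassemble (shell globbing preserves split's lexical naming order)
# and verify against the recorded digest
cat part_* > reassembled.bin
md5sum reassembled.bin
```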
When a hex editor, emacs, and vim aren't available, you can use od, along with a pager such as less, more or, on Solaris, pg. od is a tool that is present on OpenBSD, RedHat Linux, and Solaris. od dumps data in various representations. Using a pager such as less is recommended. more is ok, too, but less has a fuller feature set and is more efficient at dealing with the amount of data that can be generated when od dumps a file system image. You might say that less is more than more. od -vxca dumps in hex, and displays characters with C style escapes, their ASCII equivalents, and prints duplicate lines. You get three rows of dump data for every 32 (0x20) bytes of data that od is processing. od on OpenBSD and RedHat displays the dump in big-endian order (the highest byte has the lowest address) and od on Solaris displays the dump in little-endian order (the lowest byte has the lowest address).
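For example, dumping a small file with the flags described above (pipe the output through less when dumping anything larger; exact spacing and byte ordering vary by platform, as noted):

```shell
printf 'ABCDEFGH' > sample
od -vxca sample    # hex shorts, C-style escapes, and named-character rows
```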
Transferring data from the compromised host
If you need to transfer data off a host that is on-line prior to creating a forensic image, there are a couple of issues to consider.
There are a number of ways of saving the most important data. The first covered is using a hub or crossover cable to transport data from the compromised host to a second host (such as a laptop). Using ssh, and combining several tools to move data via TLS/SSL, are also discussed.
Connecting the compromised host to a private network that has a host on which the forensic data can be stored can provide a secure medium with high bandwidth. This method has some drawbacks, however:
As always, this is yet another trade off where some thinking and planning needs to be done prior to making a decision.
Another tool, socat, can be used and removes the need for dd. socat is significantly more flexible than netcat because it can use a variety of methods to move data. Note the ignoreeof option used in each of the commands below, which makes it possible to create an image of /dev/hda.
If you can't or don't want to pull the machine off the network, you can use ssh to securely transport data off the host. The following script saves current network connection state, the current process list, various system information, and a file system image to a remote host.
Another way of propagating information safely is to use socat. socat currently doesn't support receiving TLS/SSL connections, so stunnel is used to provide that functionality on the remote host.
The following steps show how to move data using socat and stunnel: