Data Recovery on Linux and ext3
by Abe Getchell
This article discusses the process of recovering deleted data from an ext3 partition, on a system running Linux, using a process called data carving. This basic technique is useful in any number of situations, such as recovering data that has been accidentally deleted by a user, information removed in an attempt to erase signs of a system intrusion that could be used to track the source, or data erased by an end-user attempting to cover up an acceptable use policy infraction.
This article assumes that you have a basic understanding of ext3 and the inner workings of filesystems. It is important to note that there is a certain amount of risk associated with this process. When performed improperly, the data you are attempting to recover, or other data stored on the system, could be permanently lost. While this technique is quite accurate most of the time, and very useful in any number of different situations, it is not "forensically sound" and will not hold up legally for use in court. Special software, hardware, and procedures -- or professional services -- are a must in situations when legal action is required.
The tools used in this article are freely available and can be downloaded from their respective websites.
The basic recovery process
In this section we will go step-by-step through the data recovery process and describe the tools, and their options, in detail. We start by listing a directory below.
[abe@abe-laptop test]$ ls -al
drwxrwxr-x 2 abe abe 4096 2008-03-29 17:48 .
drwx------ 71 abe abe 4096 2008-03-29 17:47 ..
-rwxr--r-- 1 abe abe 42736 2008-03-29 17:47 weimaraner1.jpg
In the listing above we can see that there is a file named weimaraner1.jpg in the test directory. This is a picture of my dog. I don't want to delete it. I like my dog.
[abe@abe-laptop test]$ rm -f *
Here we can see I am deleting it. Whoops! Sorry buddy. Let's gather some basic information about the system so we can begin the recovery process.
[abe@abe-laptop test]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 71G 14G 53G 21% /
/dev/sda1 99M 19M 76M 20% /boot
tmpfs 1007M 12K 1007M 1% /dev/shm
/dev/sdb1 887M 152M 735M 18% /media/PUBLIC
Here we see that the full path to the test directory (which is /home/abe/test) is part of the / filesystem, represented by the device file /dev/sda2.
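The same lookup can be scripted instead of read off the table by eye. The following is only a minimal sketch: the helper name device_for_path is made up for this example, and it relies on the portable output format that df produces with -P.

```shell
#!/bin/sh
# Print the device that backs the filesystem containing a given path.
# "df -P" forces the portable one-line-per-filesystem output format,
# so the device name is field 1 of the second line.
# (device_for_path is a hypothetical helper name, not a standard tool.)
device_for_path() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Example, using the path from this article:
#   device_for_path /home/abe/test
```

Passing a path to df limits the report to the filesystem that holds that path, which avoids guessing from the mount points when several partitions are mounted.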
[abe@abe-laptop test]$ su -
[root@abe-laptop ~]# debugfs /dev/sda2
After using su to gain root access, we can start the debugfs program, giving it the target of /dev/sda2. The debugfs program is an interactive filesystem debugger that is installed by default with most common Linux distributions. This program is used to manually examine and change the state of a filesystem; unless it is started with the -w option, it opens the filesystem read-only, which is what we want here. In our situation, we're going to use this program to determine the inode that stored information about the deleted file and the block group to which the deleted file belonged.
debugfs 1.40.4 (31-Dec-2007)
debugfs: cd /home/abe/test
debugfs: ls -d
1835327 (12) . 65538 (4084) .. <1835328> (4072) weimaraner1.jpg
Once debugfs starts, we cd into /home/abe/test and run the ls -d command. This command lists the current directory including its deleted entries, which are shown with their inode number between angle brackets. The output shows us that we have one deleted entry and that its inode number is 1835328.
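When a directory holds many deleted entries, picking the bracketed numbers out by eye gets tedious. As a rough sketch, the filter below pulls them out of the ls -d output; the function name deleted_inodes is invented for this example, and grep -o is assumed to be available (it is in GNU, BSD, and BusyBox grep).

```shell
#!/bin/sh
# Read debugfs "ls -d" output on stdin and print the inode numbers of
# deleted entries, i.e. the numbers enclosed in angle brackets.
# (deleted_inodes is a hypothetical helper name.)
deleted_inodes() {
    grep -o '<[0-9][0-9]*>' | tr -d '<>'
}

# Line copied from the listing above; prints 1835328.
printf '%s\n' '1835327 (12) . 65538 (4084) .. <1835328> (4072) weimaraner1.jpg' | deleted_inodes
```

The filter could also be fed from a non-interactive run such as debugfs -R 'ls -d /home/abe/test' /dev/sda2, where -R tells debugfs to execute a single request and exit.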
debugfs: imap <1835328>
Inode 1835328 is part of block group 56
located at block 1835019, offset 0x0f80
The next command we want to run is imap, giving it the inode number above so we can determine to which block group the file belonged. The output shows that it belonged to block group 56.
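For readers curious how imap arrives at these numbers, the arithmetic can be sketched as below. The inodes-per-group value (32768), inode size (128 bytes), and block size (4096 bytes) are assumptions that do not appear in the output above; they are typical ext3 values that happen to reproduce the group and offset reported for this inode. The real values are listed in the superblock.

```shell
#!/bin/sh
# Block group that holds an inode. Inode numbers start at 1, hence the "- 1".
# usage: group_of_inode INODE INODES_PER_GROUP
group_of_inode() {
    echo $(( ($1 - 1) / $2 ))
}

# Byte offset of the inode within its inode-table block.
# usage: inode_offset INODE INODES_PER_GROUP INODE_SIZE BLOCK_SIZE
inode_offset() {
    index=$(( ($1 - 1) % $2 ))          # position inside the group's inode table
    printf '0x%04x\n' $(( (index * $3) % $4 ))
}

# Assumed geometry: 32768 inodes per group, 128-byte inodes, 4096-byte blocks.
group_of_inode 1835328 32768            # prints 56
inode_offset 1835328 32768 128 4096     # prints 0x0f80
```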
debugfs: stats
[...lots of output...]
Blocks per group: 32768
[...lots of output...]
Running the stats command will generate a lot of output. The only value we are interested in, however, is the number of blocks per group. In this case, as in most cases, it's 32768. Now we have enough data to determine the specific range of blocks in which the data resided. We're done with debugfs now, so we type q to quit.
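Rather than scrolling through the stats output, the one field we need can be filtered out. This is a sketch under the assumption that each superblock field is printed as "Name: value" on its own line, as in the excerpt above; stats_field is a name invented for this example.

```shell
#!/bin/sh
# Print the value of one named field from "stats" style output on stdin.
# usage: ... | stats_field "Blocks per group"
# (stats_field is a hypothetical helper name.)
stats_field() {
    awk -F ':' -v key="$1" '$1 == key { gsub(/^[ \t]+/, "", $2); print $2 }'
}

# Sample input in the same "Name: value" layout; prints 32768.
printf '%s\n' 'Blocks per group:         32768' | stats_field 'Blocks per group'
```

The same filter should work on the output of debugfs -R stats /dev/sda2 or dumpe2fs -h /dev/sda2, both of which print the superblock fields without an interactive session.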
[root@abe-laptop ~]# dls /dev/sda2 1835008-1867775 > /media/PUBLIC/block.dat
The next thing we need to do is pull all unallocated blocks from block group 56 so we can examine their content. The dls program, from The Sleuth Kit (TSK), allows us to do just that (TSK 3.0 and later ship the same tool under the name blkls). We simply need to know the device file and a range of blocks, and have enough space in an appropriate place to store the output. Using the information above, we can calculate the block range: the first block is the block group number multiplied by the blocks per group, and the last block is the block group number plus one, multiplied by the blocks per group, minus one. In this case, the formula would look like this:
(56 x 32768) through ((56 + 1) x 32768 - 1)
This would give us a range of 1835008 through 1867775. It's very important that the destination of the output does not reside on the same partition as the data you're attempting to recover. This command will most likely write a large amount of data, and because the blocks that stored the deleted file have already been marked unallocated, that output could overwrite the very data you are trying to recover. You want as little disk activity as possible on the partition you're working with. In this example, I'm using a USB thumb drive (mounted on /media/PUBLIC) as a location to store this data.
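The range calculation above is easy to get wrong by one block, so it may be worth scripting. The sketch below uses two made-up helper names, block_range and max_dump_bytes; the 4096-byte block size passed to the second one is an assumption (check the "Block size" field in the stats output for the real value).

```shell
#!/bin/sh
# First and last block of a block group, in the START-END form dls expects.
# usage: block_range GROUP BLOCKS_PER_GROUP
block_range() {
    first=$(( $1 * $2 ))
    last=$(( ($1 + 1) * $2 - 1 ))
    echo "${first}-${last}"
}

# Upper bound on the dump size: every block in the group, if all were unallocated.
# usage: max_dump_bytes BLOCKS_PER_GROUP BLOCK_SIZE
max_dump_bytes() {
    echo $(( $1 * $2 ))
}

block_range 56 32768          # prints 1835008-1867775
max_dump_bytes 32768 4096     # prints 134217728 (128 MiB), assuming 4 KB blocks
```

Comparing the upper bound against the free space reported by df -h for the destination is a quick way to confirm the dump will fit before starting it.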
[root@abe-laptop ~]# mkdir /media/PUBLIC/output
[root@abe-laptop ~]# foremost -dv -t jpg -i /media/PUBLIC/block.dat -o /media/PUBLIC/output/
Next we need to attempt to recover the file from the unallocated blocks we extracted with the dls command above. To do this, we are going to use Foremost. This program recovers files based on header information, footer information, and internal data structures. This is the process, mentioned earlier, called data carving. First we create a directory to store the foremost output (again, this should be on a separate partition). Next we run the foremost command, giving it the file type of jpg with -t (an internally recognized type; more on custom types below), the input file with -i, and the output directory with -o. The -d option turns on indirect block detection, which helps on UNIX filesystems, and -v enables verbose output. The output from this command is listed below.
Foremost version 1.5.3 by Jesse Kornblum, Kris Kendall, and Nick Mikus
Foremost started at Sat Mar 29 18:02:29 2008
Invocation: foremost -dv -t jpg -i /media/PUBLIC/block.dat -o /media/PUBLIC/output/
Output directory: /media/PUBLIC/output
Configuration file: /usr/local/etc/foremost.conf
Start: Sat Mar 29 18:02:29 2008
Length: 110 MB (115941376 bytes)
Num Name (bs=512) Size File Offset Comment
0: 00033272.jpg 26 KB 17035264
1: 00033328.jpg 184 KB 17063936
2: 00033704.jpg 58 KB 17256448
3: 00033824.jpg 62 KB 17317888
[...lots of output...]
*46: 00210136.jpg 2 KB 107589632
47: 00210144.jpg 3 KB 107593728
48: 00210392.jpg 6 KB 107720704
Finish: Sat Mar 29 18:02:29 2008
49 FILES EXTRACTED
Foremost finished at Sat Mar 29 18:02:29 2008
As we can see, Foremost found forty-nine previously deleted jpg files (this output is also saved in a file named audit.txt in the root of the specified output directory). How do we know which is the file we are trying to recover? We could, as is most commonly done, open all of these files and look at their contents. Another option is to simply compare file sizes. We know from our directory listing above that the jpg file we are looking for is 42736 bytes, or about 41 KB, in size. There's only one file that foremost extracted into the output directory that's 41 KB, and indeed, 00114144.jpg is the file we are attempting to recover. Comparing sizes only works, of course, if you "know your data". Integrity checking programs such as Tripwire play a big role in a recovery operation, as their stored checksums let you identify the recovered data, and verify its integrity, without ever inspecting the content. This becomes quite useful if the information you're attempting to recover is confidential and you are not authorized to view the data.
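The size comparison can be done with find instead of by reading the listing. This is a minimal sketch that assumes the carved file comes out at exactly the original byte count, which is not guaranteed for every file type; files_of_size is a name invented for this example.

```shell
#!/bin/sh
# List regular files under a directory whose size is exactly BYTES bytes.
# The "c" suffix makes find count bytes instead of 512-byte blocks.
# usage: files_of_size DIR BYTES
# (files_of_size is a hypothetical helper name.)
files_of_size() {
    find "$1" -type f -size "${2}c"
}

# Example, using the output directory and byte count from this article:
#   files_of_size /media/PUBLIC/output 42736
```

If no file matches exactly, falling back to a small size window, or to comparing checksums against a known-good record, would be the next thing to try.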
Defining custom types in Foremost
As of Foremost v1.5.3, the internally supported data types that the program will recover without custom rules are jpg, gif, png, bmp, avi, exe, mpg, wav, riff, wmv, mov, pdf, ole, doc, zip, rar, htm, and cpp. If you need to recover data beyond these built-in data types, you will need to define custom types in Foremost's configuration file (foremost.conf).
An entry that defines a type in the foremost configuration file (as explained in the documentation at the beginning of foremost.conf or in the manpage) consists of several columns: extension, case sensitivity, maximum size, header and footer (optional), and special keywords (optional). As an example that most should be familiar with, here is the entry for an html file:
htm n 50000 <html </html>
We see here that the file extension is htm (NONE can be specified if no file extension should be used when writing extracted data), the header and footer are not case sensitive, and the maximum file size is 50,000 bytes. The maximum size means that 50,000 bytes after the header will be recovered if no footer is specified, or if that much data is read before the defined footer is detected. Finally, the recovered file should start with "<html" (header) and end with "</html>" (footer).
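To make the column layout concrete, the sketch below splits an entry on whitespace and labels each field in the order described above. It is only an illustration of the layout, not a parser for every feature of foremost.conf, and describe_type is a name invented for this example.

```shell
#!/bin/sh
# Label the whitespace-separated columns of one foremost.conf type entry.
# usage: describe_type 'htm n 50000 <html </html>'
# (describe_type is a hypothetical helper name.)
describe_type() {
    printf '%s\n' "$1" | awk '{
        printf "extension=%s\n", $1
        printf "case_sensitive=%s\n", ($2 == "y" ? "yes" : "no")
        printf "max_bytes=%s\n", $3
        printf "header=%s\n", $4
        printf "footer=%s\n", (NF >= 5 ? $5 : "(none)")
        printf "keyword=%s\n", (NF >= 6 ? $6 : "(none)")
    }'
}

describe_type 'htm n 50000 <html </html>'
```

Running an entry through a check like this before adding it to the configuration file can catch a missing column, such as a forgotten case-sensitivity flag.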
The ASCII keyword can also be used when attempting to recover ASCII files. Specifying this keyword at the end of an entry will tell Foremost to extract all ASCII printable characters before and after the keyword defined. An example of this would be a type to recover a perl script. If, for example, you need to recover a perl script that you know included Crypt::CBC, you could use the following type definition:
pl y 100000 Crypt::CBC Crypt::CBC ASCII
Note that Crypt::CBC is listed in both the header and footer fields. This is done so that Foremost will recognize this as the string to search around when the ASCII keyword is used. A more general type to find perl scripts could be defined as follows:
pl n 100000 #!/usr/bin/perl #!/usr/bin/perl ASCII
When attempting to recover files that are not ASCII, hexadecimal and octal notation can be used by specifying \x[0-f][0-f] or \[0-3][0-7][0-7], respectively. Below is an example of hexadecimal notation describing the header and footers of a gif file:
gif y 155000000 \x47\x49\x46\x38\x37\x61 \x00\x3b
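The header in this entry is simply the ASCII string GIF87a written out byte by byte. As a sketch of how such a header could be produced for other formats, the helper below converts a string into the \x notation; to_foremost_hex is a name invented for this example, and it relies only on od, tr, and sed.

```shell
#!/bin/sh
# Convert a string into the \xNN notation used in foremost.conf.
# od prints each byte as two hex digits, tr strips the separators,
# and sed prefixes every pair of digits with "\x".
# (to_foremost_hex is a hypothetical helper name.)
to_foremost_hex() {
    { printf '%s' "$1" | od -An -v -tx1 | tr -d ' \t\n'; echo; } | sed 's/../\\x&/g'
}

to_foremost_hex GIF87a    # prints \x47\x49\x46\x38\x37\x61
```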
As you may have realized by now, Foremost is a very powerful tool. Learn its intricacies and it can be a wonderfully flexible tool in data recovery and computer security forensic operations. Read the Foremost man page or consult the configuration file for a complete guide to creating custom data types.
ext2 vs ext3 Data Recovery
You may be asking yourself why this process is so much more difficult with ext3 than it is with ext2. This question is answered by one of the ext3 developers in the Linux ext3 FAQ:
Q: How can I recover (undelete) deleted files from my ext3 partition?
Actually, you can't! This is what one of the developers, Andreas Dilger, said about it:
In order to ensure that ext3 can safely resume an unlink after a crash, it actually zeros out the block pointers in the inode, whereas ext2 just marks these blocks as unused in the block bitmaps and marks the inode as "deleted" and leaves the block pointers alone.
Your only hope is to "grep" for parts of your files that have been deleted and hope for the best.
The process described in this article is the "grep" that Andreas is referring to. Hopefully, as ext3 is developed further, some effort will be put into making this process easier and more reliable.
While going through this process may be necessary to recover information lost in any number of situations, it's not a process you want to go through on a Monday morning to recover your organization's payroll data after an administrator fat-fingers an rm command. The single most important piece of information you should take away from this article, in that vein, is to keep current, tested backups of business critical data that reside on the systems you manage. Regardless of the reason for its use, the process covered in this article is something that every system administrator and security analyst should have in their toolbelt.
This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.