How to force a garbage collection of the deduplication folder

Article:TECH129151  |  Created: 2010-01-23  |  Updated: 2013-01-03  |  Article URL http://www.symantec.com/docs/TECH129151
Article Type
Technical Solution


Issue



How to force a garbage collection of the deduplication folder.


Solution



Suppose you are concerned that the disk holding your deduplication folder is running out of space, and you want to recover some of that space. How do you do it? You don't.

Deduplication, by its nature, does not leave redundant copies of data that you can afford to delete. When data is backed up into the deduplication folder, it is broken into 128 KB segments that are stored in 256 MB container files. As you keep backing up more and more data, more and more references are linked to the segments in these container files. Eventually, as backups age out of the system and are deleted by Backup Exec, some of those references go away, but the current backups are likely to still reference content in the same containers. The probability that every single segment in a 256 MB container file (roughly 2,048 segments of 128 KB) will age out and no longer be needed is low.

Add to that the fact that unused segments inside the 256 MB containers are reused for new data, and you can see why you are unlikely to be able to delete any of the container files and return that space to the file system.
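To make this concrete, the following is a minimal, purely illustrative sketch in Python (a model of the idea, not the actual PDDE code). Each segment slot in a container carries a reference count; freed slots are reused for new backups, and the container file could only be removed once every one of its roughly 2,048 slots is unreferenced.

# Illustrative model only -- not the actual PDDE/content router implementation.
SEGMENT_SIZE_KB = 128
CONTAINER_SIZE_KB = 256 * 1024
SEGMENTS_PER_CONTAINER = CONTAINER_SIZE_KB // SEGMENT_SIZE_KB   # ~2,048 slots

class ContainerModel:
    def __init__(self):
        # One reference count per segment slot; 0 means the slot is free.
        self.refcounts = [0] * SEGMENTS_PER_CONTAINER

    def add_reference(self, slot):
        self.refcounts[slot] += 1     # another backup set points at this segment

    def remove_reference(self, slot):
        self.refcounts[slot] -= 1     # a backup set aged out and was deleted

    def free_slots(self):
        # Freed slots are reused for new backups instead of shrinking the file.
        return [i for i, count in enumerate(self.refcounts) if count == 0]

    def file_could_be_deleted(self):
        # Only if *every* segment in the container has aged out -- unlikely,
        # because current backups usually still reference some of them.
        return all(count == 0 for count in self.refcounts)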

Note that because of the way the PureDisk Deduplication Engine (PDDE) allocates space (in 256 MB container files), it is not likely to cause disk fragmentation issues.

Garbage collection runs once a week to reclaim space within the content router (CR), but do not expect that space to show up as free space at the file system level.

To see how much space is available on a deduplication device, check the device's properties in Backup Exec.

Maintenance of the deduplication folder runs on a fixed schedule. Queue processing (adding and removing records and references) runs every 12 hours, at 0:20 and 12:20 each day. Garbage collection (removing stale objects that could not be removed during queue processing; such objects are rare) runs at 2:40 on Sundays. Both processes can be forced from the Backup Exec installation directory (for example, x:\Program Files\Symantec\Backup Exec\) with the following commands:

To force queue processing:

crcontrol --processqueue

To force garbage collection:

crcollect -v -m +1,+2 --noreport

Note: For detailed steps on how to run these commands, refer to the articles in the Related Articles section.
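If you prefer to script the two steps, the snippet below is a minimal sketch rather than a supported procedure: it assumes the default installation path shown above and the executable names crcontrol.exe and crcollect.exe, and it simply runs queue processing followed by garbage collection with the exact arguments listed above.

# Sketch only: forces queue processing and then garbage collection.
# Adjust INSTALL_DIR to the actual Backup Exec installation directory.
import os
import subprocess

INSTALL_DIR = r"C:\Program Files\Symantec\Backup Exec"   # example path

def run(*args):
    exe = os.path.join(INSTALL_DIR, args[0])
    result = subprocess.run([exe, *args[1:]])
    if result.returncode != 0:
        raise SystemExit(f"{args[0]} exited with code {result.returncode}")

run("crcontrol.exe", "--processqueue")                    # queue processing
run("crcollect.exe", "-v", "-m", "+1,+2", "--noreport")   # garbage collection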

More detail on these processes is given below.

The CR stores each unique segment, or data object, only once. For each object there is a matching record in the CR database. When an existing object is stored again, the object itself is not stored a second time; instead, a reference is added to the object's record in the database. Likewise, removing an object means removing a reference from the object's record, and the object is only truly removed when the last reference in its record is removed. For performance reasons, the commands that result in database operations (creating and removing records, adding references to and removing references from records) are not executed immediately; they are queued up instead. During queue processing, all of these queued database operations are applied in one pass, going over the database sequentially to optimize disk access.

Queue processing runs twice a day. The other maintenance process, garbage collection, removes unreferenced objects that could not be removed immediately when their last reference was removed. There are only a few rare occasions in which such garbage objects can be created, so garbage collection only needs to run infrequently. It is currently configured to run once a week, which should be suitable for any setup, large or small.
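As an illustration of the bookkeeping described above (using hypothetical names, not the content router's actual database code), the model below queues reference updates, applies them in one sequential pass during queue processing, and leaves a separate garbage-collection sweep for the rare unreferenced objects that slip through.

# Illustration only: reference-counted objects with queued database updates.
from collections import defaultdict

class ContentRouterModel:
    def __init__(self):
        self.refcount = defaultdict(int)   # object fingerprint -> reference count
        self.pending = []                  # queued (operation, fingerprint) pairs

    # Storing or removing an object only queues the database work.
    def store(self, fingerprint):
        self.pending.append(("add_ref", fingerprint))

    def remove(self, fingerprint):
        self.pending.append(("remove_ref", fingerprint))

    # Queue processing (twice a day): apply every queued operation in one
    # sequential pass and drop objects whose last reference has gone away.
    def process_queue(self):
        for op, fp in self.pending:
            self.refcount[fp] += 1 if op == "add_ref" else -1
        self.pending.clear()
        for fp in [fp for fp, count in self.refcount.items() if count <= 0]:
            del self.refcount[fp]

    # Garbage collection (weekly): sweep any unreferenced objects left behind.
    def collect_garbage(self):
        for fp in [fp for fp, count in self.refcount.items() if count <= 0]:
            del self.refcount[fp]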

These two maintenance processes - CR queue processing and garbage collection - cannot be scheduled by end users. The defaults will work for almost any user.



Legacy ID



351380



