How to reclaim deduplication storage space manually (PureDisk Storage Pool and NetBackup Media Server Deduplication Pool)
Article: TECH124914 | Created: 2010-01-04 | Updated: 2012-02-21 | Article URL: http://www.symantec.com/docs/TECH124914
Problem
Users may want to reclaim deduplication storage space manually in PureDisk 6.6.x and NetBackup 6.5.x and 7.x environments.
To reclaim storage space manually on a NetBackup appliance, see http://www.symantec.com/docs/TECH180659.
Solution
- Symantec NetBackup PureDisk is a product that provides a complete backup and deduplication environment. It can be used as a stand-alone product (PureDisk writes data to the storage pools) or as a back end for NetBackup.
- NetBackup 7.0 and later integrates deduplication into NetBackup at the media server and supports a dedicated deduplication pool called a Media Server Deduplication Pool (MSDP).
The crcollect and crcontrol commands
The crcollect and crcontrol commands are required for manual operations; these commands are located in the following directories:
- On a PureDisk Storage Pool server - /opt/pdcr/bin
- On a UNIX or Linux MSDP Media Server - /usr/openv/pdde
- On a Windows MSDP media server - <install path>\pdde
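These directories are not necessarily on the shell's PATH. As a minimal sketch for UNIX or Linux servers, assuming the default install directories listed above, a shell function such as the hypothetical find_pdde_bin below can report which directory holds crcontrol:

```shell
#!/bin/sh
# find_pdde_bin (hypothetical helper): prints the first directory among its
# arguments that contains an executable crcontrol, or fails if none does.
find_pdde_bin() {
  for dir in "$@"; do
    if [ -x "$dir/crcontrol" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Usage on a PureDisk Storage Pool server or a UNIX/Linux MSDP media server:
#   bindir=$(find_pdde_bin /opt/pdcr/bin /usr/openv/pdde) && PATH="$bindir:$PATH"
```

A Windows media server uses <install path>\pdde, which this sketch does not cover.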
Display the storage statistics
% crcontrol --dsstat
Data storage      Size    Used   Avail   Use%
                  35.1T   5.3T   29.8T   15%
Number of containers : 98125
Average container size : 268211087 bytes (255.79MB)
Space allocated for containers : 26318212931722 bytes (23.94TB)
Space used within containers : 24563636296449 bytes (22.34TB)
Space available within containers: 1754576635273 bytes (1.60TB)
Space needs compaction : 18875042669860 bytes (17.17TB)
Records marked for compaction : 139455139
Active records : 43562983
Total records : 183018122
From the crcontrol --dsstat output, you can determine the following:
- The total size is obtained from the operating system: 35.1T in the preceding example output.
- The used space = the file system used space - the space available within containers - the space that needs compaction. NetBackup / PureDisk obtains the file system used space and uses that value for X. For the preceding example output, X - 1.6TB - 17.17TB = 5.3TB, so X is about 24.1TB.
- The available space = the total size - the used space: 35.1TB - 5.3TB = 29.8TB in this example.
- The percentage used = the used space / the total size: 5.3TB / 35.1TB = 15%.
- The space actually used to store data = the space used within containers - the space that needs compaction: 22.34TB - 17.17TB = 5.17TB. This value differs from the 5.3TB in the example output because the Used value includes overhead space in the data store.
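The arithmetic above can be checked with a short script. The sketch below is a hypothetical helper, dsstat_summary, that reads the crcontrol --dsstat text and recomputes the available space, the percentage used, the file system used space (X), and the space actually used to store data. It assumes the layout of the sample output (Size and Used reported in TB with a T suffix, container lines reported in bytes); other unit suffixes would need extra parsing.

```shell
#!/bin/sh
# dsstat_summary (hypothetical helper): reads "crcontrol --dsstat" output on
# stdin and recomputes the figures explained above. Assumes Size/Used carry
# a "T" suffix and the container lines are reported in bytes.
dsstat_summary() {
  awk '
    function bytes(line,   parts, words) {
      # "Space ... : 24563636296449 bytes (22.34TB)" -> 24563636296449
      split(line, parts, ":")
      split(parts[2], words, " ")
      return words[1] + 0
    }
    /Data storage/ {
      getline                       # next line holds Size Used Avail Use%
      size = $1; used = $2
      sub(/T$/, "", size); sub(/T$/, "", used)
    }
    /Space used within containers/      { within  = bytes($0) }
    /Space available within containers/ { free_in = bytes($0) }
    /Space needs compaction/            { compact = bytes($0) }
    END {
      tib = 1024 ^ 4
      printf "avail_tb=%.1f\n",   size - used                       # total size - used space
      printf "use_pct=%.0f\n",    used / size * 100                 # used space / total size
      printf "fs_used_tb=%.1f\n", used + (free_in + compact) / tib  # X
      printf "data_tb=%.2f\n",    (within - compact) / tib          # data actually stored
    }'
}
```

For example, crcontrol --dsstat | dsstat_summary. For the sample output above it prints avail_tb=29.8, use_pct=15, fs_used_tb=24.1 and data_tb=5.17.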
Display the compaction state
% crcontrol --compactstate
Data store compaction: ON, DeleteSpaceThreshold: 30%, CompactLBound: 4MB
Compaction busy: No [or Yes]
If storage space is not reclaimed as expected, check for the following causes:
- Queue processing is not running or is failing. Check your deduplication activity logs.
- PDDO data removal is not running or is failing (PureDisk storage pools only).
- Content router modes disallow deref and delete operations (PureDisk storage pools only, after rerouting of Disaster Recovery Backup jobs). To check and set content router modes, see the Symantec NetBackup PureDisk Administrator's Guide.
- Compaction is disabled, or the CompactWakeUpInterval is set too high.
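To rule out the last cause in the list, the compaction state can be read from crcontrol --compactstate. The hypothetical compaction_report function below is a sketch that assumes the two-line format shown in the sample output; it does not check CompactWakeUpInterval, which that output does not display.

```shell
#!/bin/sh
# compaction_report (hypothetical helper): reads "crcontrol --compactstate"
# output on stdin, prints the compaction state and the busy flag, and
# returns non-zero when compaction is not ON.
compaction_report() {
  awk -F': *' '
    /^Data store compaction/ { split($2, opts, ","); state = opts[1] }
    /^Compaction busy/       { busy = $2 }
    END {
      printf "compaction=%s busy=%s\n", state, busy
      exit (toupper(state) == "ON" ? 0 : 1)
    }'
}

# Usage: crcontrol --compactstate | compaction_report || echo "compaction is disabled" >&2
```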
How storage space is reclaimed
Use case 1: NetBackup images in a Media Server Deduplication Pool (MSDP)
- Based on the configured retention, NetBackup expires the images in the NetBackup image catalog (image cleanup job) and then creates corresponding expiration transactions in the deduplication transaction queue.
- The deduplication daemons process the transactions in the queue every 12 hours, at 20 minutes past the hour, and dereference the corresponding data pointers from storage.
- A garbage collection runs weekly to clean up any orphaned data, which can occur in certain edge cases.
Use case 2: NetBackup images in a PureDisk storage pool (PDDO)
- Based on the configured retention, NetBackup expires the images in the NetBackup image catalog (image cleanup job). NetBackup also notifies the PureDisk MetaBase (catalog) of the images to be deleted.
- Weekly, the PDDO Data Removal Policy removes the corresponding NetBackup image data from the PureDisk MetaBase Engine (catalog) and creates corresponding expiration transactions in the PureDisk deduplication transaction queue. If needed, the schedule of the PDDO Data Removal Policy can be changed to daily in the PureDisk WebUI.
- The Queue Processing Policy processes the transactions in the queue and dereferences the corresponding data pointers from storage. By default, Queue Processing runs twice a day; the schedule can be changed in the PureDisk WebUI.
- Monthly, a Content Router (CR) garbage collection policy runs automatically to clean up any orphaned data, which can occur in certain edge cases. If your retention periods are very short (less than 5 days), consider scheduling CR Garbage Collection more frequently (once or twice a week).
Use case 3: PureDisk backup images
- The data removal policies, which run at a user-scheduled time, automatically dereference the affected files in the MetaBase Engine (catalog) and add the necessary expiration transactions to the Content Router (CR) queue for removal.
- The Queue Processing Policy processes the transactions in the queue and dereferences the corresponding data pointers from storage. By default, Queue Processing runs twice a day; the schedule can be changed in the PureDisk WebUI.
- Monthly, a CR garbage collection policy runs automatically to clean up any orphaned data, which can occur in certain edge cases. If your retention periods are very short (less than 5 days), consider scheduling CR Garbage Collection more frequently (once or twice a week).
How to reclaim space more quickly
1. Identify what to expire: NetBackup backup images (use cases 1 and 2) or PureDisk backup images (use case 3).
2. Expire backup images that are no longer required. When you expire images in either NetBackup or PureDisk, transactions that remove the deduplicated image fragments are generated and placed in the deduplication transaction queue: the NetBackup queue (use case 1) or the PureDisk queue (use cases 2 and 3).
   - See Expire NetBackup images (use cases 1 and 2 only).
   - See Expire PureDisk images in the metabase (use case 3 only).
3. Process the transaction queue twice so that unnecessary fragments are removed from storage.
   - See Process the transaction queue.
You can also remove garbage data at any time. If you remove garbage data, you must also process the transaction queue twice. Any time that you do something manually that generates transactions, you should process the transaction queue twice to complete the operation. The crcollect and crcontrol commands are required for these manual operations. See Remove garbage data.
Expire NetBackup images (use cases 1 and 2 only)
1. Expire the backup images by using the NetBackup GUI or the command line.
a) Use the bpimagelist command to determine the backup IDs of the backups to be expired.
b) Run bpexpdate -backupid <backup ID> -d 0 -force -notimmediate for each image to be expired. The -notimmediate option prevents bpexpdate from calling the nbdelete command, which deletes the image. Without this option, bpexpdate calls nbdelete for every image, and each call creates a job in the Activity Monitor, allocates resources, and launches processes on the media server.
c) After you expire the last image, delete all of the images by running the nbdelete command once with the -allvolumes option. This creates only one job in the Activity Monitor, allocates fewer resources, and starts fewer processes on the media servers, so the entire process of expiring and deleting images takes less time.
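Steps a) to c) can be scripted. The sketch below assumes that you have already written the backup IDs to be expired, one per line, to a file (for example, from bpimagelist output) and that bpexpdate and nbdelete are on PATH; expire_images and the DRY_RUN switch are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# run: executes a command, or only prints it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# expire_images (hypothetical helper): expires each backup ID read from stdin
# with -notimmediate (step b), then deletes all expired images with a single
# nbdelete -allvolumes call (step c).
expire_images() {
  count=0
  while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    # </dev/null keeps bpexpdate from consuming the list of IDs on stdin
    run bpexpdate -backupid "$id" -d 0 -force -notimmediate </dev/null || return 1
    count=$((count + 1))
  done
  # One deletion job after the last expiration instead of one per image
  if [ "$count" -gt 0 ]; then
    run nbdelete -allvolumes
  fi
}

# Usage: DRY_RUN=1 expire_images < backup_ids.txt   (review, then rerun without DRY_RUN)
```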
Expire PureDisk images in the metabase (use case 3 only)
1. Create (if necessary) and run a data removal policy. Configure the removal policy to define which versions to remove: select a data retention in days, or select a version retention to be more specific.
2. Process the transaction queue twice. See Process the transaction queue.
Remove garbage data
In a few rare scenarios, some data segments may become orphaned. Garbage collection cleans these segments up by removing them.
1. Run the following command:
crcollect -v -m +1,+2
2. Process the transaction queue twice. See Process the transaction queue.
Process the transaction queue
1. Check whether another queue processing run is in progress:
crcontrol --processqueueinfo
2. Repeat step 1 until both the pending and busy status show "no".
3. Start the CR queue processing:
crcontrol --processqueue
4. Check whether the queue processing has finished:
crcontrol --processqueueinfo
5. Repeat step 4 until both the pending and busy status show "no".
Remark: one queue processing run can take from a few minutes up to about a day. Progress can be seen in the storaged.log.
The storaged.log is located in the following directories:
- On a PureDisk Storage Pool server - /Storage/log/spoold/storaged.log
- On a UNIX or Linux MSDP Media Server - <storage path>/log/spoold/storaged.log
- On a Windows MSDP media server - <storage path>\log\spoold\storaged.log
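Because every manual operation that generates transactions should be followed by processing the queue twice, the queue processing steps (check, start, and check again with crcontrol) can be wrapped in a loop. This is a sketch only: the output format of crcontrol --processqueueinfo is assumed here (lines such as Busy : no and Pending : no), so verify it against your version, and CRCONTROL and POLL_SECONDS are hypothetical override variables.

```shell
#!/bin/sh
# Runs CR queue processing twice, waiting for an idle queue before and
# after each pass. Assumes "--processqueueinfo" prints "Busy : no" and
# "Pending : no" style lines (not confirmed by this article).
CRCONTROL=${CRCONTROL:-crcontrol}
POLL_SECONDS=${POLL_SECONDS:-60}

queue_idle() {
  # Succeeds only when both the busy and pending values start with "no".
  "$CRCONTROL" --processqueueinfo | awk -F': *' '
    tolower($1) ~ /busy|pending/ {
      seen++
      if (tolower($2) !~ /^no/) active = 1
    }
    END { exit (seen >= 2 && !active ? 0 : 1) }'
}

wait_for_idle() {
  until queue_idle; do
    sleep "$POLL_SECONDS"
  done
}

process_queue_twice() {
  command -v "$CRCONTROL" >/dev/null || return 1
  for pass in 1 2; do
    wait_for_idle                            # steps 1-2
    echo "queue processing pass $pass" >&2
    "$CRCONTROL" --processqueue || return 1  # step 3
    sleep "$POLL_SECONDS"                    # let the run register as busy/pending
    wait_for_idle                            # steps 4-5
  done
}

# Usage: POLL_SECONDS=300; process_queue_twice
```

Progress of each pass can be followed in the storaged.log at the locations listed above.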
Legacy ID
346797