Data Lifecycle Management in BE 2012
Let me start with a little of my background, since this is my very first blog. I have been with the Symantec Backup Exec team for more than 12 years. Over that time, I have worked on the development of both the NetBackup and Backup Exec products, focusing on the design and development of the shared catalog components. In recent years, I have been heavily involved in designing many different Backup Exec features. In 2010, after more than 10 years of service, I was recognized as a Distinguished Engineer for my overall contributions to the company. During the development of BE 2012, I led the design and development of the Data Lifecycle Management, Metadata Web Service and Simplified Restore Workflow features, and I was also heavily involved in the design and development of many other new features.
BE 2012 introduces many new features, so it seems like a good idea to blog about some of them. Before I dive into some highlights of one of them, Data Lifecycle Management (DLM), let me share some insight into why we created it. Before Backup Exec 2012, Backup Exec always managed the lifecycle of backup data using a tape-centric method, regardless of the type of backup data. This included data backed up to tape and data backed up to .BKF container files (tape-emulated disk container files) on disk. Media sets were designed to manage removable media such as tape, and that design has worked well with tape for years. In recent years, however, disk storage has been adopted as the primary backup target because it enables advanced recovery capabilities (e.g. GRT), offers fast performance, and has a continuously decreasing cost per megabyte.
Media-centric management using media sets has two key issues, especially for disk-based backup:
1. No guarantee of backup data integrity for recovery. Media sets have no knowledge of data dependencies, yet incremental backup data is of little use if the previous incremental or full backup data it builds on is gone. Users therefore have to know how to configure two media sets, with correct overwrite and append periods, for their full and incremental backup jobs, so that the media containing the full backup data is not overwritten before the media containing its associated incremental backup data.
2. Lazy disk storage reclamation. The disk space used by expired backup data is not reclaimed promptly, because the lifecycle of backup data is managed per media rather than per backup data. The media won't be overwritten until all backup sets on it have expired, and the append period of the media set determines how large this gap is (e.g. if the overwrite period is 4 weeks and the append period is 1 week, the retention of the media is 5 weeks). With limited disk space, expired backup data should be deleted immediately to free up space for new backup jobs, so this poses a real issue for customers today. Plus, you cannot add disk storage as easily as you can add another tape. Media-centric management of backup data on disk storage is therefore no longer a desirable solution. Data-centric management is the key concept behind the new Data Lifecycle Management feature introduced in BE 2012.
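To make the two issues concrete, here is a minimal Python sketch of the media-set arithmetic. It assumes, as the 4 + 1 = 5 weeks example implies, that the overwrite period is counted from the end of the append period; the helper names are hypothetical and are not part of Backup Exec.

```python
from datetime import date, timedelta


# Hypothetical helpers; assumption: the overwrite period starts when the
# append period ends, matching the "4 weeks + 1 week = 5 weeks" example.
def media_overwritable_on(first_write: date, append: timedelta, overwrite: timedelta) -> date:
    """Earliest date media managed by a media set can be overwritten."""
    return first_write + append + overwrite


def backup_set_expires_on(written: date, retention: timedelta) -> date:
    """Data-centric model: each backup set carries its own retention."""
    return written + retention


append = timedelta(weeks=1)
overwrite = timedelta(weeks=4)

# Issue 2: a backup set written on the first day of the append window
# keeps its disk space for 5 weeks, although its retention is only 4.
media_free = media_overwritable_on(date(2012, 3, 5), append, overwrite)
set_free = backup_set_expires_on(date(2012, 3, 5), overwrite)
print(media_free, set_free, (media_free - set_free).days)  # 2012-04-09 2012-04-02 7

# Issue 1: full and incremental media sets are configured independently,
# so nothing stops the full backup's media from becoming overwritable
# while incrementals that depend on it are still being kept.
full_media_free = media_overwritable_on(date(2012, 3, 5), append, timedelta(weeks=2))
incr_media_free = media_overwritable_on(date(2012, 3, 6), append, overwrite)
print(full_media_free < incr_media_free)  # True: the chain can break
```

In the data-centric model the 7-day gap disappears, because the space is tied to the backup set's own retention rather than to the media it happens to share with later sets.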
We introduced DLM in BE 2012 to address these issues. DLM is designed to manage all backup data stored on all types of disk storage except removable disk cartridges. Here are some highlights of the feature that you should be aware of:
· Data retention:
o The retention of backup data is associated with the backup set instead of the media.
o The retention of backup data is a property of the job definition and is configured using a single retention value (e.g. 4 weeks). It is no longer a property of a media set configured with both an overwrite period and an append period.
o Single backup set BKF:
§ A single BKF file contains only one backup set, or part of one.
§ A backup set can span more than one BKF file.
· DLM grooming process:
o The DLM grooming process proactively checks for expired backup sets and grooms them, together with their associated backup data, to free the disk space they occupy.
o The DLM grooming process runs every 4 hours, and also whenever a low disk space event is received.
o Dependencies between full and associated incremental backup sets are checked to prevent a broken full/incremental chain. In other words, DLM won't groom an expired full or incremental backup set if an incremental backup set that depends on it has not expired.
o DLM won't delete the last copy of the latest recovery point chain.
§ The definition of the recovery point chain:
· Delete operation from Backup set view:
o Deleting a backup set from the Backup set view deletes the backup set and its associated backup data from disk storage.
o Full/incremental dependencies are checked on deletion. Any dependent incremental backup sets are listed in a prompt, and the user can either cancel the delete operation or delete the selected backup set and all of the listed dependent backup sets, along with their associated backup data.
· Retain operation from Backup set view:
o Users can manually retain a backup set and assign it a reason code and description.
o The DLM grooming process won't groom a retained backup set or its data, nor the backup sets (and data) it depends on.
o The delete operation in the Backup set view is disabled for retained backup sets.
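The retention, grooming, delete and retain rules above can be modeled in a few lines. The following is a minimal, hypothetical Python sketch, not Backup Exec's implementation: the `BackupSet` and `Catalog` names are made up, each backup set is assumed to exist as a single copy, and because the post does not define the recovery point chain, the sketch assumes it is the newest backup set plus the sets it depends on. It also assumes a delete is refused if any set it would remove is retained, which is one way to honor the retain rule. Each call to `groom` stands for one run of the grooming process; the 4-hour and low-disk-space triggers are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class BackupSet:
    set_id: str
    written: date
    retention: timedelta           # single retention value from the job definition
    parent: Optional[str] = None   # None for a full; otherwise the set this one depends on
    retained: bool = False         # set by a manual "Retain" in the Backup set view

    def expired(self, now: date) -> bool:
        # A retained backup set never counts as expired.
        return not self.retained and self.written + self.retention <= now


class Catalog:
    """Hypothetical in-memory catalog, one copy per backup set."""

    def __init__(self, sets: list[BackupSet]) -> None:
        self.sets = {s.set_id: s for s in sets}

    def dependents(self, set_id: str) -> list[str]:
        """Backup sets that depend on set_id, directly or transitively."""
        found: list[str] = []
        frontier = [set_id]
        while frontier:
            current = frontier.pop()
            for s in self.sets.values():
                if s.parent == current:
                    found.append(s.set_id)
                    frontier.append(s.set_id)
        return found

    def latest_chain(self) -> set[str]:
        """Assumption: newest backup set plus every set it depends on."""
        if not self.sets:
            return set()
        current: Optional[str] = max(self.sets.values(), key=lambda s: s.written).set_id
        chain: set[str] = set()
        while current is not None:
            chain.add(current)
            current = self.sets[current].parent
        return chain

    def groom(self, now: date) -> list[str]:
        """One grooming pass: remove expired sets that no unexpired set depends on."""
        protected = self.latest_chain()
        victims = [
            sid for sid, s in self.sets.items()
            if s.expired(now)
            and sid not in protected
            and all(self.sets[d].expired(now) for d in self.dependents(sid))
        ]
        for sid in victims:
            del self.sets[sid]
        return sorted(victims)

    def delete(self, set_id: str) -> list[str]:
        """User delete: the selected set plus all dependents listed in the prompt."""
        doomed = [set_id] + self.dependents(set_id)
        # Assumption: refuse if any affected set is retained, not only the selected one.
        if any(self.sets[sid].retained for sid in doomed):
            raise PermissionError(f"{set_id}: a retained backup set would be deleted")
        for sid in doomed:
            del self.sets[sid]
        return doomed


week = timedelta(weeks=1)
today = date(2012, 6, 1)
cat = Catalog([
    BackupSet("F0", date(2012, 3, 1), 4 * week),
    BackupSet("F1", date(2012, 4, 1), 4 * week),
    BackupSet("I1", date(2012, 4, 2), 4 * week, parent="F1"),
    BackupSet("I2", date(2012, 4, 3), 10 * week, parent="I1"),
    BackupSet("F2", date(2012, 5, 1), 4 * week),
    BackupSet("I3", date(2012, 5, 2), 4 * week, parent="F2"),
])

print(cat.groom(today))   # ['F0']: F1/I1 wait for I2; F2/I3 are the latest chain
print(cat.delete("F1"))   # ['F1', 'I1', 'I2']
cat.sets["I3"].retained = True
try:
    cat.delete("F2")
except PermissionError as err:
    print(err)
```

Because a retained set never counts as expired, the dependency check in `groom` automatically protects the sets it depends on, so the retain rule needs no separate code path.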
There are other details about the DLM grooming process and its design philosophy that I can blog about if there is enough interest. Hopefully, today's blog is helpful.
Backup Exec Team