Nuts and Bolts in NetBackup for VMware: Handling Orphaned Snapshots
Recently, one of our customers asked me if NetBackup for VMware supports the use of a dedicated data store for snapshots. That triggered this blog.
Snapshots are great. Among their many uses, NetBackup employs one to create a consistent point-in-time image of the virtual machine for the purpose of backup. While a snapshot is active, writes to the VMDK files are directed to redo logs. At the end of the backup, the snapshot is released and the redo log is played back into the VMDK.
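To make the redo log idea concrete, here is a toy model in Python. It is only a conceptual sketch: the class ToyVirtualDisk and its methods are invented for illustration and have nothing to do with how VMware or NetBackup actually implement snapshots. It simply shows writes landing in a separate redo log while a snapshot is active, and being merged back on release.

```python
class ToyVirtualDisk:
    """Hypothetical, simplified model of a disk with one snapshot redo log."""

    def __init__(self, blocks):
        self.base = dict(blocks)  # stands in for the VMDK contents
        self.redo = None          # no snapshot active yet

    def create_snapshot(self):
        # From now on the base stays frozen and writes go to the redo log.
        self.redo = {}

    def write(self, block, data):
        target = self.base if self.redo is None else self.redo
        target[block] = data

    def read(self, block):
        # The newest data wins: check the redo log before the base.
        if self.redo is not None and block in self.redo:
            return self.redo[block]
        return self.base[block]

    def release_snapshot(self):
        # Play the redo log back into the base, then discard it.
        if self.redo is None:
            return
        self.base.update(self.redo)
        self.redo = None


disk = ToyVirtualDisk({0: "a", 1: "b"})
disk.create_snapshot()
disk.write(1, "B")       # lands in the redo log, base is untouched
frozen_view = disk.base[1]
live_view = disk.read(1)
disk.release_snapshot()  # redo log merged back into the base
```

If release_snapshot is never called, the redo dictionary just keeps growing with every write, which is the orphaned snapshot problem in miniature.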
The world is less than ideal. What happens if a backup ends prematurely and the snapshot is left behind? Now the redo log grows. What if such situations arise frequently? Now you have multiple redo logs growing in the data store. There are two major issues here.
- Storage space on the data store gets used up quickly. If the data store fills up, every VM using that data store is affected.
- The more snapshots a virtual machine accumulates, the worse its performance becomes.
It is tempting to mitigate the first issue by dedicating a separate data store just for storing snapshots. However, that solution has shortcomings:
- It does not address the second issue. VM performance still degrades as more and more orphaned snapshots stay in the data store.
- You have just created a bottleneck in the infrastructure. The number of VMs you can back up concurrently is now limited by the I/O performance of the snapshot data store.
- There is some cost involved in creating and managing this snapshot LUN. You would need to monitor the LUN for orphaned snapshots and clean them up manually.
How does NetBackup for VMware mitigate this nightmare? With a simple yet powerful strategy: it gives you the flexibility to control those orphaned snapshots. It is documented in the NetBackup for VMware System Administrator’s guide under the title "Existing snapshot handling".
NetBackup for VMware lets you decide what should happen if an existing (orphaned) snapshot is found for a VM. You can have NetBackup abort the backup job for that VM so that the problem does not escalate further. Or you can use the Remove NBU option, which automatically removes the orphaned snapshot if it was originally created by a NetBackup backup job.
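The decision described above can be sketched in a few lines of Python. This is not NetBackup code or its API. Every name here (ExistingSnapshotAction, Snapshot, created_by_netbackup, handle_existing_snapshots) is made up for illustration, and the sketch assumes that snapshots not created by NetBackup are simply left alone under the Remove NBU option.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ExistingSnapshotAction(Enum):
    """Hypothetical names for the two choices described in this post."""
    ABORT = "abort"
    REMOVE_NBU = "remove_nbu"


@dataclass(frozen=True)
class Snapshot:
    name: str
    created_by_netbackup: bool  # assumed flag, for illustration only


def handle_existing_snapshots(
    snapshots: List[Snapshot], action: ExistingSnapshotAction
) -> Tuple[bool, List[Snapshot]]:
    """Return (proceed_with_backup, snapshots_left_on_the_vm)."""
    if not snapshots:
        # Nothing was left behind, so the backup can go ahead.
        return True, []
    if action is ExistingSnapshotAction.ABORT:
        # Do not pile a new snapshot on top of existing ones.
        return False, list(snapshots)
    # REMOVE_NBU: drop only snapshots that a NetBackup job created.
    remaining = [s for s in snapshots if not s.created_by_netbackup]
    return True, remaining


found = [Snapshot("nbu-orphan", True), Snapshot("admin-manual", False)]
abort_result = handle_existing_snapshots(found, ExistingSnapshotAction.ABORT)
proceed, remaining = handle_existing_snapshots(
    found, ExistingSnapshotAction.REMOVE_NBU
)
```

In this toy run, the abort choice stops the job and leaves both snapshots in place, while the Remove NBU choice clears the NetBackup-created orphan and lets the backup continue.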
The strength of this approach is that you resolve both issues listed above proactively. Furthermore, you avoid all the shortcomings of using a dedicated snapshot data store.
Do you still have a business use case where you want to use a dedicated snapshot data store? If yes, please post it as an idea with the business use case. Let the community members vote. We are listening!