Netting Out NetBackup

Nuts and Bolts in NetBackup for VMware: Virtual machine snapshots for backing up Business Critical Applications?

Created: 02 Jul 2012 • Updated: 22 Jan 2013 • 5 comments
AbdulRasheed

 

Recently, there were discussions in social media and blogs on using hypervisor-level snapshots of virtual machines hosting business-critical applications like Microsoft Exchange. Some of the confusion stems from a statement from Microsoft documented here (the emphasis is mine): http://technet.microsoft.com/en-us/library/aa996719.aspx

“Some hypervisors include features for taking snapshots of virtual machines. Virtual machine snapshots capture the state of a virtual machine while it’s running. This feature enables you to take multiple snapshots of a virtual machine and then revert the virtual machine to any of the previous states by applying a snapshot to the virtual machine. However, virtual machine snapshots aren’t application aware, and using them can have unintended and unexpected consequences for a server application that maintains state data, such as Exchange. As a result, making virtual machine snapshots of an Exchange guest virtual machine isn’t supported.”

I also got a few questions on this during the VMware User Group (VMUG) conference in Minneapolis while I was talking about strategies for bringing business-critical applications into vSphere environments. I want to use this blog to clarify what the statement above really means for a data protection strategy for business-critical workloads on vSphere.

First, let us define a few snapshot operations to avoid confusion. As we are talking about data protection, let us focus on how virtual machine disk files are affected by a VM snapshot operation, using VMware vSphere VM snapshots as the example.

When a snapshot exists and an application on the virtual machine writes data to disk, that data is written to a set of redo-log files. Newly saved data continues to accumulate in the redo-log files until you take an action that affects the snapshot. The two actions we need to discuss are:

Delete the snapshot - When you delete the snapshot, the changes accumulated in the redo-log files are written permanently to the base disks, i.e. VMDK files. Thus the VMDK files become ‘current’.

Revert to the snapshot - When you revert to the snapshot, the contents of the redo-log files are discarded. Now the virtual machine ‘rolls back’ to the point in time when the snapshot was created.
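
The difference between the two operations can be sketched with a toy model. This is only an illustration under loud assumptions: the "disk" is a plain text file of key=value lines rather than a real VMDK, and every function name below is hypothetical, not part of any VMware tool.

```shell
#!/bin/sh
# Toy model only: base.disk and redo.log are text files, not real VMDK files.

# take_snapshot DIR: start an empty redo log next to the base disk.
take_snapshot() { : > "$1/redo.log"; }

# write_block DIR KEY VALUE: while a snapshot exists, writes go to the redo log.
write_block() {
    if [ -f "$1/redo.log" ]; then
        echo "$2=$3" >> "$1/redo.log"
    else
        echo "$2=$3" >> "$1/base.disk"
    fi
}

# read_block DIR KEY: the newest write wins, with the redo log read after the base.
read_block() {
    cat "$1/base.disk" "$1/redo.log" 2>/dev/null | grep "^$2=" | tail -n 1 | cut -d= -f2-
}

# delete_snapshot DIR: commit the redo log into the base disk, then drop it.
delete_snapshot() {
    cat "$1/redo.log" >> "$1/base.disk"
    rm -f "$1/redo.log"
}

# revert_snapshot DIR: discard the accumulated changes; the base disk is untouched.
revert_snapshot() { : > "$1/redo.log"; }
```

In this model, a value written after `take_snapshot` survives `delete_snapshot` but disappears after `revert_snapshot`, which is why a backup product that relies on revert is a different proposition from one that only creates and deletes snapshots.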

First of all, if your backup application uses the Revert to the snapshot operation, then that solution is unsupported by Microsoft, as I have shown with the emphasis in the statement quoted above.

Backup Exec 2012, NetBackup 7.5 and the NetBackup 5220 appliance are examples of data protection solutions that do not use the Revert to the snapshot operation and hence are not impacted by that part of the statement.

Now we need to address the application unawareness of VM snapshots, which Microsoft frowns upon because of genuine concerns.

VM snapshots (aka hypervisor snapshots) are indeed application unaware. Some level of awareness can be achieved by using a VSS provider that works with the VSS writers of the application, but that can be quite cumbersome in large environments. As we are talking about business-critical applications (the lifeblood of the organization), such sub-par solutions carry risks, which is why Microsoft released such a statement.

Symantec solves this problem by providing agent-assisted backups in Backup Exec 2012, NetBackup 7.5 and NetBackup 5220 appliances. An agent sitting in the VM discovers and quiesces the application as if it were an agent-based backup. This brings the full-fledged application awareness and consistency needed for business-critical workloads. Then a VM snapshot is created using VMware APIs for Data Protection (VADP). After that, the application is released from its state of quiescence. The VM data is copied using VADP transports. Thus, Symantec provides the best of both worlds when it comes to protecting mission-critical applications on VMware vSphere: the agent is used for application discovery and quiescence, thereby meeting Microsoft's requirements, and then an agentless data movement (backup) is performed through VADP!
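
The ordering of those steps can be sketched as follows. This is only an outline of the sequence, not how Backup Exec or NetBackup is implemented: all four step functions are hypothetical placeholders that just print the step they stand for.

```shell
#!/bin/sh
# Hypothetical placeholders: each function only prints the step it represents.
quiesce_application() { echo "quiesce $1"; }
create_vm_snapshot()  { echo "snapshot $1"; }
release_application() { echo "release $1"; }
copy_vm_data()        { echo "copy $1"; }

# agent_assisted_backup VM: quiesce, snapshot, release, then copy.
agent_assisted_backup() {
    vm=$1
    quiesce_application "$vm" || return 1
    create_vm_snapshot "$vm"
    snap_status=$?
    # Release the application even if the snapshot failed, so it is
    # never left quiesced.
    release_application "$vm"
    [ "$snap_status" -eq 0 ] || return 1
    # The data copy happens after release, so the application is only
    # held for the duration of the snapshot creation.
    copy_vm_data "$vm"
}
```

Calling `agent_assisted_backup vm1` prints the four steps in order. The point of the ordering is that the application is quiesced only while the snapshot is created, not for the whole data copy.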

This agent-assisted backup is currently available from Symantec for Microsoft Exchange, Microsoft SQL Server and Microsoft SharePoint. In addition to supporting these business-critical workloads, you also get any-level recovery for these applications from a single backup. For example, if you are using Backup Exec 2012, NetBackup 7.5 or a NetBackup 5220 appliance to protect a virtualized Microsoft Exchange environment on vSphere, you get the following from a single backup.

1.     Recover entire virtual machine

2.     Recover individual files

3.     Recover specific database availability groups or information stores

4.     Recover specific mailboxes or mailbox items

 

5 Comments

Stuart Green

Great article.

Thanks for outlining some key terms and specifically the use of an application-aware agent to quiesce the application inside a VM before a VM backup takes place.

This is all well and good for Windows.

On Linux, VMware Tools can trigger a couple of scripts: /usr/sbin/pre-freeze-script and /usr/sbin/post-thaw-script.

We use these scripts, for example, to put Oracle DB tablespaces into backup mode prior to the VM snapshot and then take them out of backup mode after the snapshot on our Linux VMs.

(We are of course also writing out archive redo logs to another area.)

Tip: Get overview/document your NBU environment. Run 'nbsu' and review the output.

• If this provides help, please vote or mark appropriate solution.

J. Bartke

Hi Stuart,

we are facing the same challenge. We have virtualized SAP systems running on SLES for SAP Applications 11 SP2 with Oracle 11gR2; the platform is vSphere 5. The backup product is Symantec NetBackup 7.5.0.6. Snapping the SAP VM, even with the NetBackup agent installed inside the VM, doesn't quiesce the Oracle database (i.e. the application). What are the commands you run against Oracle before and after the backup? And I "heard" that 'alter tablespace blabla begin backup' is not entirely safe and that it is advised to use 'alter system suspend' at the beginning of the snapshot and 'alter system resume' immediately at the end of the snapshot operation (not the whole backup...).

What are your experiences regarding this? Furthermore, what is your whole strategy behind that? What I'm asking myself is this: you take a snapshot of your machine at, let's say, 8:00 in the morning. Meanwhile you back up the archived redo logs of Oracle with the NetBackup agent elsewhere. Let's say at 11:00 the machine crashes and you restore the whole VM from the snapshot you took at 8:00. Is it possible to easily recover the database with the backed-up redo logs by NetBackup means?

Thank you in advance & regards from Cologne/Germany

Joerg Bartke

 

Stuart Green

Joerg,

 What are the commands you run against the oracle pre and post of the backup?

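$ cat /usr/sbin/pre-freeze-script
#!/bin/sh
# VMware Freeze Script: VMware Tools runs this just before a quiesced snapshot.

echo Freeze: $(date) >>/var/log/snap

# OS user that owns the Oracle instance
orauser=orasmp

# Switch the log file, then put the whole database into backup mode.
sudo -i -u ${orauser} <<BOF 2>&1
export ORACLE_SID=SMP
sqlplus /nolog <<EOF 2>&1
connect / as sysdba
alter system switch logfile;
alter database begin backup;
EOF
BOF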

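$ cat /usr/sbin/post-thaw-script
#!/bin/sh
# VMware Thaw Script: VMware Tools runs this right after the snapshot is taken.

echo Thaw: $(date) >>/var/log/snap

# OS user that owns the Oracle instance
orauser=orasmp

# Take the database out of backup mode again.
sudo -i -u ${orauser} <<BOF 2>&1
export ORACLE_SID=SMP
sqlplus /nolog <<EOF 2>&1
connect / as sysdba
alter database end backup;
EOF
BOF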
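
Since both scripts append a line to /var/log/snap, a quick sanity check is to confirm that every Freeze has a matching Thaw. The following is a minimal sketch, not part of the scripts above; the function name `check_snap_log` is hypothetical, and it only assumes the "Freeze:" / "Thaw:" line prefixes that the two echo commands write.

```shell
#!/bin/sh
# check_snap_log [FILE]: compare the number of Freeze and Thaw entries.
# FILE defaults to /var/log/snap, the log written by the freeze/thaw scripts.
check_snap_log() {
    log=${1:-/var/log/snap}
    [ -r "$log" ] || { echo "cannot read $log"; return 2; }
    # grep -c prints 0 when nothing matches, so both counts are always numbers.
    freezes=$(grep -c '^Freeze:' "$log")
    thaws=$(grep -c '^Thaw:' "$log")
    if [ "$freezes" -eq "$thaws" ]; then
        echo "ok: $freezes freeze/thaw pairs"
    else
        echo "mismatch: $freezes freezes, $thaws thaws"
        return 1
    fi
}
```

A mismatch would suggest that a thaw script did not run, in which case the database may still be in backup mode.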

 

 I "heard" that 'alter tablespace blabla begin backup' is not entirely safe and it is advised to use 'alter system suspend' at the beginning of the snapshot and 'alter system resume' immediately at the end of the snapshot operation (not the whole backup...)

As you can see above, we use 'alter database begin backup' and 'alter database end backup'. Is it safe, and how can I prove it is safe? Well...
I use Veeam to back up these SAP Central Instance VMs (daily) and use Veeam's SureBackup technology to confirm my backups are good. I just need to take the database out of backup mode in the SureBackup lab environment, create a network route on my PC, fire up SAPGUI and log on to my BACKUP.
This takes less than 30 minutes to prove a success, start to finish.
(My R3 instance is 900 GB in size.)

• Archive redo logs.

The amount of redo logs and the rate/schedule of creation is down to your business's RPO, I guess.

We use brtools brarchive and, in SAP, schedule it to run 4 times a day to write archive redo logs to an NFS-mounted directory that is conveniently on our Linux NetBackup master server. All our SAP instances are done this way. So we back up our VM with a consistent database and perform a filesystem backup of our archive redo logs. We have the archive redo logs on disk and then move them to tape.

All pieces of the VM go off to tape via NetBackup. Having the 2 solutions is possibly more management. But we have had NetBackup for over 10 years and, so far, it has always handled the to-tape function.

No reason NetBackup can't be used for similar.

However, we also use the backups to perform landscape refreshes. Each sapdata area is on its own filesystem and ultimately its own VMDK. We just blow away the DEV sapdata areas and do an alternate restore of the production backup VMDKs (sapdata areas) onto the DEV VM. This is neat in SLES 11 if you have your disks mounted by label: when you power up the DEV VM, it doesn't matter that you didn't match the same SCSI IDs, as long as your disk labels are consistent across your landscape.
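
Mounting by label could look like the sketch below. The label `sapdata1`, the mount point `/oracle/SMP/sapdata1` and the `ext3` filesystem type are assumptions for illustration, not details of the setup described above, and the helper `fstab_entry` is hypothetical: it only prints an /etc/fstab line. On ext filesystems a label can be set with `e2label`.

```shell
#!/bin/sh
# fstab_entry LABEL MOUNTPOINT [FSTYPE]
# Print an /etc/fstab line that mounts a filesystem by label, so the mount
# does not depend on which SCSI ID or device name the disk ends up with.
fstab_entry() {
    printf 'LABEL=%s %s %s defaults 0 2\n' "$1" "$2" "${3:-ext3}"
}

# Example (assumed names); setting the label itself would be something like:
#   e2label /dev/sdb1 sapdata1
# fstab_entry sapdata1 /oracle/SMP/sapdata1
```

With identical labels on the production and DEV disks, the same fstab line works on both VMs regardless of how the restored VMDKs are attached.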

NetBackup 7.6 may offer some interesting features for checking the consistency of VM backups; I haven't checked the release notes.

 

NOTE: Currently I face an issue with SLES 11 SP1, SP2 and SP3 VMs.

VMware KB: Snapshot quiescing fails on Linux guests after upgrading to ESXi and VMware Tools 5.1
 

With the quiesce option enabled in the backup software for those VMs, the result is hung VMs. The same happens when just testing in the vSphere client by taking a snapshot with the option to quiesce the guest OS selected.
I have traced it down to a problem with ioctls. Red Hat Linux has a similar bug, acknowledged by Red Hat and discussed in this VMware KB article. I have an outstanding SR with Novell regarding it.
The workaround is a very good tip, as I need to run the scripts, and the quiesce setting on the VM is what initiates them; without it they are not run.

 

KDob

So, I configured my SharePoint backups with NetBackup for SharePoint version 7.5.0.4.

To the heart of the matter: Do I need a VSS provider from Symantec to use this feature or not? And if so, where do I get it to install on the VMs in question? Yes, I want the GRT feature to work as well.

Currently getting conflicting messages in the Details Tab of the Backup Jobs:

"No VSS provider has been found on this VM.  This may not be a quiesced snapshot."

"Host successfullly protected SharePoint and Cataloged under Host"

PS. I thought Symantec made a business decision NOT to be in the business of writing VSS providers and was totally relying on Microsoft to do that for them? Remember VSP from the Veritas days??? Much better product than VSS from Microsoft! LOL!

Let me know what the story is on SharePoint backups when all servers are VMs...documentation is not much help :)

KDob

One last thing: The messages above occurred on the Application State Capture job, which failed/succeeded with a Code 1. Was it a failure or a success? I don't know at this point.

 
