
VM snapshots not deleting

Created: 29 Jun 2012 | 14 comments

Hi, we are experiencing a problem with VM backups. In the policy, Existing snapshot handling is set to NBU remove, but on a few machines the snapshot is still not deleted. The number of snapshots keeps increasing, we have to delete these snapshots manually every time, and this degrades VM performance and creates a risk of the VM crashing. Please advise.


Andy Welburn

Query was originally tagged to a Blog.

I have moved this to a new discussion for greater exposure, i.e. you're more likely to get a response here!

It's probably a good idea to provide a bit more info about your environment.

ChAmp35

We discussed this with Symantec. The snapshots that are not getting deleted are named consolidate_helper_<number>, whereas NBU creates snapshots named NBU_Snapshot, so these are not NBU's snapshots. NBU also deletes the last NBU_Snapshot if one exists. However, if there is another snapshot with a different name after the NBU_Snapshot, NBU cannot delete any of them and the backups fail. In that case we have to delete these snapshots manually.
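The cleanup behaviour reported above can be written down as a small decision function. This is a hypothetical sketch that models only what is described in this post; it is not NetBackup code, and the names `Action`, `classify_chain`, and the example snapshot name `consolidate_helper_0` are illustrative assumptions.

```python
from enum import Enum

NBU_NAME = "NBU_Snapshot"  # snapshot name NetBackup uses, per this thread


class Action(Enum):
    NONE = "no snapshots present"
    NBU_REMOVES = "NetBackup can remove the trailing NBU_Snapshot"
    MANUAL = "another snapshot follows NBU_Snapshot; delete manually"
    NOT_NBU = "no NBU_Snapshot; leftover snapshots are not NetBackup's"


def classify_chain(chain: list[str]) -> Action:
    """Classify a VM snapshot chain ordered oldest -> newest (hypothetical model)."""
    if not chain:
        return Action.NONE
    if chain[-1] == NBU_NAME:
        # NBU's own snapshot is the latest one, so NBU can delete it.
        return Action.NBU_REMOVES
    if NBU_NAME in chain:
        # Something was created after NBU_Snapshot: NBU deletes none of them.
        return Action.MANUAL
    return Action.NOT_NBU


if __name__ == "__main__":
    print(classify_chain(["NBU_Snapshot", "consolidate_helper_0"]).value)
```

The chain order matters here because, as described, NBU only handles an NBU_Snapshot that is last in the chain.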

Environment details:

Master: Sun Solaris 10, NBU 7.1.0.2

Media: Sun Solaris 9, NBU 6.5.6

Backup Proxy Host: virtual machine, Windows 2003, NBU 7.1.0.4 (recommended by Symantec)

Stuart Green

Can you clarify:

Your Backup Proxy Host is running 7.1.0.4

Your Master is running the prior release, 7.1.0.2

If so, install 7.1.0.4 on the master ASAP.

Then rerun the backup.

Also consider the requirements for hotadd as your transfer type, since you have chosen to use a VM as your backup proxy. Please make sure your setup complies with the "Notes on hotadd transfer type" section of the NetBackup for VMware Guide.

VMware Transport Modes: Best practices and troubleshooting

http://www.symantec.com/docs/TECH183072

snippet>>>

Troubleshooting for some common transport mode related failures

Backups or restores failing with status 6, 13, or 11, together with the following indications in the Activity Monitor, may point to an issue with the transport mode:

  • "ERR - Error opening the snapshot disks using given transport mode: Status 23" indicates a problem accessing the VMDK using the given transport mode.

    Here are some tips on handling this kind of error:

    • If you are using NBD, make sure the VMware Backup Host has connectivity to the ESX server hosting the virtual machine.
    • If you are using SAN, make sure the datastore LUNs are accessible to the VMware Backup Host.
    • If you are using Hotadd, make sure your backup host is a virtual machine and that the following conditions are satisfied:
      • The VM should not contain IDE disks.
      • Ensure that there are sufficient SCSI controllers attached to the Backup Host VM.
      • The Backup Host VM has access to the datastores where the VM being backed up resides.
      • The Backup Host VM and the VM being backed up should be in the same datacenter.
      • If the previous backup failed, it might have left some disks of the backed-up VM attached to the Backup Host. These disks need to be removed manually before attempting the next backup.
    • If a non-default port is used for vCenter, that port needs to be defined when adding the vCenter credentials to NetBackup.
    • If using NBD, make sure the VMware Backup Host can communicate with port 902 of the ESX server hosting the VM.
  • "file read failed" indicates a possible problem reading the VMDK using the given transport mode.
  • "file write failed" indicates a possible problem writing to the VMDK using the given transport mode.
    • If using SAN for restores, make sure the datastore LUNs are accessible to the VMware Backup Host and in an online state.
    • If using Hotadd for restores, make sure the SAN policy on the Backup Host is set to OnlineAll.
    • If using SAN for restores, make sure the size of the VMDK is a multiple of the datastore block size. Otherwise, the write of the last block fails. In this case, a workaround is to use NBD for the restore.
    • Make sure you assign the necessary privileges to the user configured in NetBackup to log on to vSphere.
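Several of the checks in this list are mechanical enough to script before rerunning a backup. The following is a minimal preflight sketch, assuming you gather the inventory facts yourself (by hand or from vCenter); the `HotaddSetup` fields, the one-free-SCSI-slot-per-disk rule, and the host name `esx01.example.com` are hypothetical, and none of this calls a NetBackup or VMware API.

```python
import socket
from dataclasses import dataclass


@dataclass
class HotaddSetup:
    """Hypothetical inventory facts, gathered by hand or from vCenter."""
    backup_host_is_vm: bool
    target_has_ide_disks: bool
    free_scsi_slots: int      # assumption: one free slot per disk to hot-add
    target_disk_count: int
    same_datacenter: bool
    datastores_visible: bool  # backup host VM can see the target's datastores


def hotadd_problems(s: HotaddSetup) -> list[str]:
    """Return the hotadd conditions from the checklist that are not met."""
    problems = []
    if not s.backup_host_is_vm:
        problems.append("backup host is not a virtual machine")
    if s.target_has_ide_disks:
        problems.append("VM being backed up contains IDE disks")
    if s.free_scsi_slots < s.target_disk_count:
        problems.append("not enough SCSI controller capacity on the backup host VM")
    if not s.same_datacenter:
        problems.append("backup host VM and target VM are in different datacenters")
    if not s.datastores_visible:
        problems.append("backup host VM cannot access the target VM's datastores")
    return problems


def vmdk_fits_block_size(vmdk_bytes: int, block_bytes: int) -> bool:
    """SAN restore caveat: VMDK size must be a multiple of the datastore block size."""
    return vmdk_bytes % block_bytes == 0


def port_reachable(host: str, port: int = 902, timeout: float = 3.0) -> bool:
    """NBD needs the backup host to reach port 902 on the ESX server."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    setup = HotaddSetup(True, False, 1, 2, True, True)
    print(hotadd_problems(setup))
    print(vmdk_fits_block_size(40 * 1024**3, 1024**2))
    print(port_reachable("esx01.example.com"))  # hypothetical host name
```

A TCP connect only shows that port 902 is open from the backup host; it does not prove that NBD transfers will succeed.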

Also consult VMware's hotadd documentation for background.

http://www.vmware.com/support/developer/vddk/VDDK-...

Tip: Get overview/document your NBU environment. Run 'nbsu' and review the output.

• If this provides help, please vote or mark appropriate solution.

ChAmp35

Now we are facing an EC 156 issue on one of the VMs. Its backups are failing because NBU is not able to quiesce the VM. When we disable quiescing in the policy, backups for this VM run fine, but Symantec does not recommend this practice.

  • Earlier we were facing the same issue. At that time we found some VSS writer errors in the event logs, and fixing them resolved the issue. This time there are no VSS errors and everything looks fine, but backups are still failing with EC 156.
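When VSS writer problems are suspected, `vssadmin list writers` (run as administrator inside the Windows guest) lists each writer with its state and last error. The sketch below assumes the usual "Writer name / State / Last error" layout of that output and flags writers that are not Stable or that report an error. The `SAMPLE` text is illustrative only and is not taken from the attached bpfis log; `collect()` only works on Windows.

```python
import subprocess

# Illustrative output in the usual vssadmin layout (IDs elided).
SAMPLE = """\
Writer name: 'System Writer'
   Writer Id: {...}
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   Writer Id: {...}
   State: [8] Failed
   Last error: Timed out
"""


def collect() -> str:
    """Run vssadmin on the Windows guest (requires administrator rights)."""
    return subprocess.run(["vssadmin", "list", "writers"],
                          capture_output=True, text=True).stdout


def unhealthy_writers(text: str) -> list[tuple[str, str, str]]:
    """Return (name, state, last error) for writers not Stable or with an error."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            current = {"name": name, "state": "", "error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["error"] = line.split(":", 1)[1].strip()
    return [(w["name"], w["state"], w["error"]) for w in writers
            if "Stable" not in w["state"] or w["error"] != "No error"]


if __name__ == "__main__":
    print(unhealthy_writers(SAMPLE))
```

Run it before and after a failed backup. A writer that is Stable afterwards may still have failed during the freeze, so check the Event Viewer as well.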

Please find the bpfis logs attached, and advise.

Attachment: bpfis.txt (163.67 KB)
ChAmp35

Version 7.1.0.4 was suggested by Symantec, because with 7.1.0.2 we were facing issues with restores.

Also, if this is the reason, I am a little confused, because backups of the rest of the VMs are running fine.

Stuart Green

OK, it's just normal best practice to upgrade the Master first, then the Media Servers, and then the clients, whatever the version and maintenance release. (I may be incorrect in this instance.)

captain jack sparrow

Looking at the bpfis log, I hope the technotes below will be helpful to you:

status 156 - snapshot failed - vixapi freeze failed with 36 - vc error: creating quiesced snapshot failed because snapshot operation exceeded timeout limit.

http://www.symantec.com/docs/TECH137001

Vmware backup fails 'unable to quiesce file system'

http://www.symantec.com/docs/TECH145960

156:Receive "snapshot creation failed, status 156" when attempting to backup a VM.

http://www.symantec.com/docs/TECH154889

Not sure if the third one applies in your case, but hopefully the workaround will help you.

 Cheers !!!

CJS

captain jack sparrow

I also found one of the errors: "Cannot create a quiesced snapshot because the snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine".

Refer to the VMware technote for this:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1018194

Stuart Green

I found the above KB as well.

You did not say what these problem VMs are running. Could it be a highly transactional application with a DBMS running in it, such as MS SQL, Oracle, or Exchange? That would mean a high I/O load, and the snapshot cannot commit back.

captain jack sparrow

NetBackup Master Server must be at the highest version level within the NetBackup domain

NetBackup Media Servers and Clients must be at the same or a lower version compared to the NetBackup Master Server

This is technically no longer true.

If the master, media and client are all at 7.1.0.x, for example, it doesn't matter if the client's "x" is higher than the servers', so long as they're all at the same minor version (in this case, 7.1).
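The two statements above (the traditional "same or lower than the master" rule and the relaxation for matching minor versions) can be encoded as a quick check. This is a hypothetical sketch of the rule as stated in this thread only, not Symantec's support matrix. The function names are illustrative, and the versions come from the environment posted earlier.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn '7.1.0.4' into (7, 1, 0, 4)."""
    return tuple(int(part) for part in v.split("."))


def client_version_ok(server: str, client: str) -> bool:
    """Hypothetical check of the two statements quoted above."""
    s, c = parse_version(server), parse_version(client)
    if c <= s:
        return True  # traditional rule: same or lower version than the server
    # Relaxation: a higher maintenance level is fine if major.minor match (e.g. 7.1).
    return c[:2] == s[:2]


if __name__ == "__main__":
    master = "7.1.0.2"  # versions from this thread's environment
    for name, version in [("backup proxy", "7.1.0.4"), ("media", "6.5.6")]:
        print(name, version, client_version_ok(master, version))
```

Tuple comparison is used so that "6.5.6" and "7.1.0.2" compare correctly even though they have different numbers of components.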

CRZ

EDIT: Sorry, I hit Edit instead of Reply!


bit.ly/76LBN | APPLBN | 75LBN

Stuart Green

Chris, thanks for clearing that up.

Stuart Green

Also consider this with hotadd:

VM-based backups impact the host ESX server. For a severely overtaxed ESX environment, physical host-based backups should be considered as an alternative to VM-based backups.

So the ESX server that these specific VMs are on may be pushed hard in terms of resources and performance.

I recommend using the vCenter performance charts to rule this out while the backups are running.

Try vMotioning the VMs to another 'beefier' ESX host, but keep in mind that the ESX host must still be able to access the datastores that the VM being backed up is on.

ChAmp35

So if I understood correctly, having the Backup Proxy Host at NBU 7.1.0.4 and the master at NBU 7.1.0.2 hardly causes any problem?

Also, regarding this issue:

>> The IP of the problematic VM is static.

>> We can't use pre/post scripts to stop/freeze applications, as that would create application downtime, which is not possible. Disabling quiescing for the whole system is not recommended by Symantec, and if we disable quiescing for the application then there is no point in backing up that VM, because recovery of the application is not guaranteed, and that is required in this case.

>> We have tried restarting the VMware Tools services; it was of no use.

>> We have increased the value of snapshottimout on the proxy host; again, no use.

>> We reinstalled the VMware Tools and still have the same issue.

But here is one catch:

Backups have been failing for the last week, and until today there was no VSS-related error in Event Viewer, but today we found a VSS writer error there. I am puzzled: if everything was fine earlier, why was it not working then, and if it is still not working now, how come it has only now encountered an error with the VSS writers?

Also, please let me know if there is any way to increase the timeout value for deleting snapshots in NBU.