
Snapshots cause Linux VMs to lock up

Created: 07 Aug 2014 • Updated: 06 Oct 2014 | 12 comments
This issue has been solved. See solution.

Our vSphere environment is 5.1

NBU is running 7.5.0.6 on WS2008R2 x64

Long story behind this one. We were having a terrible time with OHBs causing RHEL VMs to lock up. At that time we had the SYMCquiesce utility on all VMs. The Linux admin investigating this decided that the utility was the source of the problem and removed it from the environment. Backup success from the console view of the world is ~99%, but users still frequently complain of VMs locking up, apparently as a result of the snapshot process: the snapshot is attempted, then VMware Tools dies, and periodically the VMs have to be reset to get back to normal.

The other thing is that they removed the ISO that was used for the install years back from the datastore. On our master/media server, the only thing under \NetBackup\bin\goodies\vmware-quiesce is SYMCquiesce.1.0.0-001.iso, but my read of the 7.5.0.7 release notes says this on p. 45:

Warning: Multiple versions of SYMCquiesce.iso may exist in these directories with names such as SYMCquiesce.1.0.0-001.iso. Unless otherwise specified in the NetBackup documentation, only install SYMCquiesce.iso. Do not install SYMCquiesce.1.0.0-001.iso.

So, one question is: where is this special SYMCquiesce.iso it references? I cannot find it anywhere in any of the client installs or on the master, and it does not appear to be downloadable by itself.

The other big question is: has anyone else run into a situation where, regardless of whether the utility is installed, VMs randomly get hosed up as part of the snapshot/backup process?

In the older releases, was SYMCquiesce.1.0.0-001.iso the only one to choose from?

thanks!


Will Restore's picture

SYMCquiesce is not part of the 7.5.0.x or 7.6 install on Linux. To obtain a copy, please download the attachment on this technote.

Article URL http://www.symantec.com/docs/HOWTO92298

Will Restore -- where there is a Will there is a way

Marianne's picture

... the snapshot is attempted, then vmware tools dies ...

This sounds like a problem with vmtools...  
Have you tried to take a snapshot from within vSphere?

Have a look at this post from Stuart Green: 

https://www-secure.symantec.com/connect/forums/cannot-snapshot-sles-11-sp2-netbackup-7601#comment-10382151

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows

SOLUTION
schrammd's picture

Hi Marianne,

yeah, we had cases with VMware, RH, and Symantec open on this at the time (until the OS admin pulled the plug on it). We can manually snap these VMs all day long and the OS never gets hosed up. Tools are all up to date and functional.

thanks

schrammd's picture

Thanks, but that is not the issue: we are not installing the NB clients in the VMs, this is pure OHB. But the utility still has to be installed for things to be clean. I see the ISO on that page has numbers before the .iso. Can someone from Symantec answer why the release notes say NOT to use those versions of the ISO and only the "symcquiesce.iso"? I cannot find such a file anywhere, and to my knowledge the VM backup admin guide makes no mention of NOT using the "versioned" utility; in fact, all it says is look "here" for the ISO.

Seems a bit more confusing than maybe it's worth. I guess we will use the SYMCquiesce.1.0.0-002.iso and see how it goes.

CRZ's picture

This response is a little late, but that "symcquiesce.iso" referenced in the 7.5.0.7 RN should be the "1.0.0-002" version. The key was to make sure you were installing that one and not the earlier "-001" version (which apparently had defects to the point that we produced the "-002" version). So if you've already pulled it from HOWTO92298, you should already be good to go... and since you haven't posted in a few days, I'm going to hope that means you saw no issues! (?)
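
As a rough illustration of the version point above, here is a minimal Python sketch (not a Symantec tool) that pulls the version out of a SYMCquiesce ISO or package name and checks that it is at least 1.0.0-002. The function names and the name pattern are assumptions, based only on the names quoted in this thread; an unversioned name such as SYMCquiesce.iso cannot be checked this way.

```python
import re

# Pattern is an assumption based on names quoted in this thread, e.g.
# "SYMCquiesce.1.0.0-001.iso" and "SYMCquiesce-1.0.0-002".
_NAME_RE = re.compile(r"SYMCquiesce[.-](\d+)\.(\d+)\.(\d+)-(\d+)", re.IGNORECASE)


def parse_symcquiesce_version(name):
    """Return (major, minor, patch, build) as ints, or None if no version is found."""
    match = _NAME_RE.search(name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_at_least(name, minimum=(1, 0, 0, 2)):
    """True if the name carries a version that is >= minimum (default 1.0.0-002)."""
    version = parse_symcquiesce_version(name)
    return version is not None and version >= minimum
```

For example, `is_at_least("SYMCquiesce.1.0.0-001.iso")` is False, while the same call with the "-002" name is True.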


bit.ly/76LBN | APPLBN | 761LBN

schrammd's picture

OK, more on this. We've done some analysis. With the SYMCquiesce utility installed on these RHEL 5.4 VMs, the VM running VMware Tools v8 does NOT lock up, ever, but the VMs on version 9 lock up immediately upon creating the snapshot. We have debugging enabled in the vmtools, but they tell me the debug log is empty, so maybe they don't quite have it set up right. Today, to eliminate doubt and for currency reasons, I applied the 7.5.0.7 update; same result, the VMs still lock up. Seems to me this is not a Symantec issue but more likely VMware or RedHat. We don't seem to have a direct match for the case Stuart references, but it does appear very similar. I'm going to get a 3-way going between all you vendors so we can hopefully get to the bottom of it. The customers are getting really cranky with the lockups, and I don't think downreving the vmtools is a long-term solution, obviously.

thanks

wsutton's picture

Questions for schrammd:

1. has your problem been resolved?

2. have you seen this issue with NB 7.6 and/or RHEL 6?

We have an NB 7.6 + RHEL 5/6 environment and are thinking about moving from individual host backups to using the VMware backups.

schrammd's picture

We are not running 7.6 yet so I cannot say. The issue could affect RHEL VMs up to 6.3 as I recall, but in our case, only the 5.4 clients are suffering. Since patching them is off the table (upgrading to 6.x is not an option) and we did not want to be forced into running ancient vmtools "forever", our workaround was simply to put the affected hosts into a policy where there is no quiescing. Yes, not ideal, but the app owners signed off on it since it's a lab environment. We have tested restores and such with no issues detected, so we think the risk is relatively low for the handful of clients affected by this. All newer builds with the newer tools work fine, no lockups at all these days (99.99% successful backups day after day).

Also, I should note that between 5.3 and 5.4 I think there were some fundamental changes in the driver (on the RHEL side, that is), and it just so happens that our builds, which are frozen, are affected. It doesn't happen to any other 5.x machines, nor to any 6.x ones, FWIW.

Travis Ecc's picture

I noticed this thread is still open.

We have been experiencing something very similar since upgrading our vCenter, ESXi, VM hardware and tools. We had cases open with VMware and RedHat, and investigations with Symantec.

The problem is with SYMCquiesce but is not a Symantec issue. According to VMware and RedHat, the issue is a compatibility one between the Linux O/S and VM Tools. Further information from RedHat confirmed this across the RHEL 5.x and 6.x versions that we have.

Disabling the Quiesce option in the VMware policy VM Tab stops the servers from crashing or going into kernel panic. This is a short-term solution only. We have not updated our SYMCquiesce utility to the -002 version.

Does anyone (CRZ?) have a link to a document detailing defects in -001 that are fixed in -002?

Thanks

Travis

CRZ's picture

Hi Travis,

We don't have any such documentation beyond what's in the VMware Guide and various Release Notes. 

From scouring Etrack, it appears that -002 came about to resolve some "hang" issues in -001.

The only other separate thing I found in the KB was *this* doc, which says that as of 7.6.0.3, everyone should now have version -003 installed - and THAT version came about due to an install issue in -002:

When upgrading SYMCquiesce from SYMCquiesce-1.0.0-001 to SYMCquiesce-1.0.0-002, the /usr/sbin/pre-freeze-script contents are removed
 http://symantec.com/docs/TECH218512

That ALSO got a couple entries in the EEB guide (DOC6085) but they're not terribly helpful.

Hope this wasn't AS terribly unhelpful.  ;-)
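
For anyone who wants to spot-check a guest after the -001 to -002 upgrade described in TECH218512, a minimal Python sketch along these lines could report whether the script has ended up missing or empty. Only the path comes from the technote title; the function name and the three status strings are assumptions for illustration, and "present" only means the file is non-empty, not that its contents are correct.

```python
import os

# Path quoted from the TECH218512 title; everything else here is hypothetical.
PRE_FREEZE_SCRIPT = "/usr/sbin/pre-freeze-script"


def pre_freeze_script_status(path=PRE_FREEZE_SCRIPT):
    """Return 'missing', 'empty', or 'present' for the given script path."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "r", errors="replace") as handle:
        # Treat a file containing only whitespace as empty.
        if handle.read().strip() == "":
            return "empty"
    return "present"


if __name__ == "__main__":
    print(PRE_FREEZE_SCRIPT, "->", pre_freeze_script_status())
```

Run inside the guest, it prints the path and one of the three statuses; an "empty" result would be consistent with the install issue the technote describes.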



Travis Ecc's picture

Hi Chris

Apologies for the late reply. This has been great information. It at least gives me something to take to management with a push to upgrade the version. Will likely go straight to -003 if that TECHNOTE has a link. It also gives some clarity as to the process and reasoning behind the version changes, which is always nice to see.

Given we've had some hang issues here and there, an upgrade is deserved, I think.

Thanks again

Travis