
5220 PureDisk Volume Down

Created: 25 Mar 2013 | 11 comments

Hi,

I have two 5220 Master/Media appliances running 2.5.1 with external storage units. The PureDisk Volume goes down on one or the other of the appliances every few weeks. It usually takes multiple reboots or a full shutdown to get it back up, but it's hit and miss, and I have no definitive process for bringing it back online. How do I troubleshoot this? Which logs should I be looking at? Is there a correct way of bringing a PureDisk Volume back up?

Thanks,

Dec


Mark_Solutions's picture

Lots of questions there, and it's a little like asking how long a piece of string is!

How to stop it and how to bring it back up both depend on why it goes down.

The first places to look are spoold.log and storaged.log, for clues.
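
As a first pass over either log, something like the sketch below pulls out every line for the day of the alert that is not logged at INFO level. It assumes the "Month DD HH:MM:SS LEVEL [thread]: message" layout seen in the log excerpts quoted in this thread, and non_info_lines is just an illustrative helper name, not an appliance tool.

```shell
#!/bin/sh
# Print every line from one day that is NOT logged at INFO level.
# $1 = log file
# $2 = date prefix exactly as it appears in the log, e.g. "March 23"
# Assumption: lines look like "March 23 06:51:18 INFO [thread]: message".
non_info_lines() {
    grep "^$2 " "$1" | grep -v "^$2 [0-9:]* INFO "
}
```

For example: non_info_lines /disk/log/spoold/spoold.log "March 23" (that spoold.log location is the one used elsewhere in this thread; the storaged.log location is not given here, so pass whatever path applies on your system).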

There are a few issues on 2.5.1 ...

For example, if you do VMware backups and your LUNs have multiple paths, it can cause a memory leak that makes the appliance run out of memory. This is fixed in 2.5.2.

There is also a possible bug in spoold. An EEB is available, but I need to see what is happening in spoold.log first.

If you can post those two logs here for a date when the volume went down, we may be able to assist, but I would recommend raising a case with support, as there are a lot of variables here.

Hope this helps

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someones advice has solved your issue - and please bring back the Thumbs Up!!.

DecKy's picture

I've attached spoold.log and storaged.log. This is the initial alert: Alert Raised on: 23 March 2013 06:50 - Alert Policy: Disk Storage Unit - DOWN

Is there a fix in 2.5.1 for the memory leak with multiple paths?

Many Thanks,

Dec

 

Attachment                        Size
storaged.zip                      1.16 MB
spoold.log_.20130325083951.zip    809.64 KB

Mark_Solutions's picture

For the memory leak in 2.5.1, as long as it is just a Media Server and not a Master Server you can do the following:

To see if you have leaked semaphores run this single line command (copy and paste into putty!!)

for semid in `ipcs -s | awk '/^0x/ {if ($1=="0x00000000") print $2}'`; do ipcs -s -i $semid; done | egrep -v "semaphore|uid|mode|nsems|otime|ctime|semnum" | awk '{if ($5 != "0" && $5 != "") print $5}' | uniq | xargs ps -p

If this shows orphaned processes you can add this to the crontab (crontab -e) - again just one line so copy and paste from notepad:

*/15 * * * * /usr/bin/ipcs -s | grep 0x00000000 | awk '{print $2}' | while read sem; do /usr/bin/ipcrm -s $sem; done

Don't do this on a Master appliance, as it takes EMM down!
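
Before scheduling the removal, a dry run may be worthwhile. The sketch below only lists the semaphore array IDs whose key is 0x00000000 and removes nothing; zero_key_semids is a hypothetical helper name, and it assumes the usual Linux ipcs -s column order (key first, semid second), which is the same order the cron entry relies on.

```shell
#!/bin/sh
# Read `ipcs -s` output on stdin and print the semid of every array
# whose key column is exactly 0x00000000.
# Note: the cron entry uses a plain grep, which matches the string anywhere
# on the line; this matches only the key column.
zero_key_semids() {
    awk '$1 == "0x00000000" { print $2 }'
}

# Dry run (lists only, never calls ipcrm):
#   ipcs -s | zero_key_semids
```

Listing is read-only, so it is the ipcrm step, not this check, that the Master appliance warning applies to.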

I will have a look at the logs and get back to you....

Mark_Solutions's picture

Just one other thing: multiple paths for VMware LUNs are not actually supported at the moment. I'm not sure whether that has changed in 2.5.2.

The spoold log has lots of these:

March 23 06:51:18 INFO [47685560046816]: Storage Manager: initializing
March 23 06:51:18 INFO [47685560046816]: Database Manager: initializing
March 23 06:51:18 INFO [47685560046816]: Database Manager: initialization complete
March 23 06:51:18 INFO [47685560046816]: Database Manager: closing storage database connection
March 23 06:51:18 INFO [47685560046816]: Database Manager: shutdown
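
To get a feel for how often spoold is going through this start-up cycle, a rough count per day can help when talking to support. This is only a sketch: it assumes every restart logs the "Storage Manager: initializing" line shown above and that the first two fields of each line are the month and day; init_count_per_day is an illustrative name.

```shell
#!/bin/sh
# Count "Storage Manager: initializing" lines per day in a spoold log.
# $1 = log file
# Output: one "Month DD: count" line per day, sorted as text.
init_count_per_day() {
    grep 'Storage Manager: initializing' "$1" |
        awk '{ n[$1 " " $2]++ } END { for (d in n) print d ": " n[d] }' |
        sort
}
```

For example: init_count_per_day /disk/log/spoold/spoold.log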

It's worth raising a case and asking them whether ET2962020 (a new spoold binary) applies to your appliance.

The storaged log also has lots of entries like this:

March 25 15:28:17 INFO [1082194240]: fakeFPCheck: DO fd00355bc39b497510379a4eb9ab2f49 is corrupt

So it's worth asking them to check this out, as you may need some maintenance done as well. This could be causing spoold to crash during queue processing or rebasing.
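
When raising the case, it may help to hand over a de-duplicated list of the affected objects rather than the whole log. A minimal sketch, assuming the corrupt entries always follow the "fakeFPCheck: DO <hex fingerprint> is corrupt" wording shown above (corrupt_dos is an illustrative name):

```shell
#!/bin/sh
# Print each distinct fingerprint reported as corrupt by fakeFPCheck.
# $1 = storaged log file
corrupt_dos() {
    sed -n 's/.*fakeFPCheck: DO \([0-9a-f]*\) is corrupt.*/\1/p' "$1" | sort -u
}
```

Piping the result through wc -l gives a quick count of distinct corrupt objects.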

Log a call and see if they can get you sorted out.

wan2see's picture

Just want to point out that this is not just an appliance issue. We have three PD pools, all attached and using 7.5.x, and we encounter the same challenges.

NBU support has no answer, and we also have to reboot the masters multiple times.

As stated above, spoold and spad are the culprits.

Mark_Solutions's picture

The original EEB was for MSDP and was converted to an RPM for me when I identified this issue on an appliance.

Support should be able to provide you with both for your system to put it right

Tahir Maqbool's picture

I hope NBU support can help you. Make sure that support checks the logs mentioned above, and ask the assigned TSE to forward them to back-line support for verification before applying any EEB.

I have the same sort of issue and am still in touch with NBU support to resolve it, as one of my appliances is useless these days.

-matt-'s picture

PureDisk Volume has been marked down

This is a known issue (resolved in 7.5.0.5) with the following workaround:
1. Stop NetBackup.
2. Start NetBackup.
3. When spoold is accepting connections, run the following to check, then turn off image rebasing:

grep rebase /disk/log/spoold/spoold.log

/usr/openv/pdde/pdcr/bin/crcontrol --rebasestate
      Image rebasing: ON
      Rebasing busy: Yes

/usr/openv/pdde/pdcr/bin/crcontrol --rebaseoff (takes a few minutes to complete)
      Data store conversion turned off

http://www.symantec.com/business/support/index?page=content&pmv=print&impressions=&viewlocale=&id=TECH199067
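
If the check-then-disable step has to be repeated after each restart, it could be wrapped roughly as below. This is a sketch only: it assumes crcontrol --rebasestate prints the "Image rebasing: ON" text exactly as quoted above, and rebasing_on is a hypothetical helper name, not part of NetBackup.

```shell
#!/bin/sh
# Read `crcontrol --rebasestate` output on stdin.
# Exit status 0 if it reports image rebasing as ON, non-zero otherwise.
# Assumption: the output contains the literal text "Image rebasing: ON".
rebasing_on() {
    grep -q 'Image rebasing: ON'
}

# Possible use, once spoold is accepting connections:
#   CRCONTROL=/usr/openv/pdde/pdcr/bin/crcontrol
#   if "$CRCONTROL" --rebasestate | rebasing_on; then
#       "$CRCONTROL" --rebaseoff
#   fi
```

Keeping the parsing separate from the crcontrol calls means the check can be tried against saved output before it is pointed at a live system.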

Mark_Solutions's picture

-matt-

This is the same issue I mentioned, with the same EEB that is available for 7.5.0.4, though that tech note gives a different ET number from the original. Maybe it has been renamed for the appliance version now.

Good spot though

DecKy's picture

The EEB didn't work for me; I still had the problem after it was installed. I'll use the above workaround if it happens again. Hopefully it won't.

Mark_Solutions's picture

OK, I'm surprised at that, as I assume it is the 2.5.2 spoold file. But yes, turn off rebasing for now. It will eventually affect performance, though, as rebasing is basically a de-dupe defrag.
