
vmoprcmd extall command hangs for specific media server

Created: 05 Mar 2013 • Updated: 08 Mar 2013 | 5 comments
This issue has been solved. See solution.

Before I detail the problem, here are some details about my setup. The master server is AIX 5.3 running NetBackup 7.5.0.3, and the media server in question is RHEL 6.3, also running 7.5.0.3. This is a new media server install that hasn't yet been used for anything.

When I run "vmoprcmd -h mediaserver -extall" on the master server, it hangs for a long time before exiting with "network protocol error (39)". I enabled verbose logging for volmgr in vm.conf, created the log directories under the debug directory, and touched a bunch of "touch files" to enable more debug output, but when this error occurs the only log file created on the media server is under the daemon folder, and there isn't anything useful inside.

If I run tpconfig to display the device config on the media server (via the -l, -d or -dl options; I haven't tried any others), the problem seems to go away temporarily. After some undetermined amount of time, though, if I again run "vmoprcmd -h mediaserver -extall", it hangs and times out again.

When I first attempted to configure the robot and drives connected to this new media server, my client had issues and crashed. I'm not sure if this problem is a result of that or the cause. After that first failed attempt, I deleted the robot and drives from NetBackup and reinstalled the media server. After that I was able to configure the robot and drives successfully.

 

Here is the output from running the 'vmoprcmd -h mediaserver -extall' command:

 

513: vmoprcmd -h mediaserver -extall
ROBOTIC 0 none NONE 1070595959 507721 13 11 -1 -1 -1 -1 1 0 0 NONE - Not Robotic
ROBOTIC 1 acs ACS 251918401 200200 6 6 0 0 -1 -1 7 -2 131063 ACS - Automated Cartridge System
ROBOTIC 2 ts8 TS8 806092849 274432 16 4 0 20 21 2 0 0 0 TS8 - Tape Stacker 8MM
ROBOTIC 5 odl ODL 7 64 9 1 1 490 980 12 0 0 0 ODL - Optical Disk Library
ROBOTIC 6 tl8 TL8 806092849 274432 16 4 0 16000 16000 -1 1 0 0 TL8 - Tape Library 8MM
ROBOTIC 7 tl4 TL4 1537 256 12 9 1 15 15 2 0 0 0 TL4 - Tape Library 4MM
ROBOTIC 8 tld TLD 1070594417 507401 13 11 1 32767 32767 -1 3 0 0 TLD - Tape Library DLT
ROBOTIC 10 tsd TSD 201529345 133632 13 11 0 13 14 1 0 0 0 TSD - Tape Stacker DLT
ROBOTIC 11 tsh TSH 50389057 66568 6 6 1 10 10 1 0 0 0 TSH - Tape Stacker Half-inch
ROBOTIC 12 tlh TLH 50389057 66568 6 6 0 0 -1 256 5 4094 0 TLH - Tape Library Half-inch
network protocol error (39)
 


watsons

Which directories did you create under the debug dir? The usual ones are reqlib, tpcommand, daemon & ltid.

Check vmd & ltid on this media server to see if they go up and down intermittently. They shouldn't, but if they do, check the logs above to find out why.

Are you sharing tape devices (SSO) with other media servers?

Run nbemmcmd -listhosts -verbose from this media server to check whether there is any setting/connectivity issue.

Eric Engberg

I created the following directories in the debug directory: acsssi, daemon, ltid, reqlib, robots, tpcommand.  When the problem occurs, no log files are created in any of those directories other than the daemon directory, and that one only contains entries for the 'vmoprcmd -h mediaserver -extall' command, with no errors or information useful to the problem.

The robot and tape drives are local only, not shared.

The output from 'nbemmcmd -listhosts -verbose' looks no different for the media server in question than any other.

watsons

It's unusual you can't see any logs in those debug dirs. Have you got VERBOSE in vm.conf?

Check bp.conf to find out if there are any SERVER or MEDIA_SERVER entries that you can't ping or reach with bpclntcmd.
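A rough sketch of that bp.conf check, assuming the standard UNIX location /usr/openv/netbackup/bp.conf and the usual "SERVER = host" entry format. The function names (list_servers, check_servers) are made up for illustration, and a failed ping may only mean ICMP is filtered, so treat FAIL as a hint rather than proof.

```shell
#!/bin/sh
# Assumed default path; override with BP_CONF=/some/other/bp.conf
BP_CONF="${BP_CONF:-/usr/openv/netbackup/bp.conf}"

# Print the host name from every SERVER / MEDIA_SERVER entry in the file.
list_servers() {
    awk -F'=' '/^[[:space:]]*(SERVER|MEDIA_SERVER)[[:space:]]*=/ {
        gsub(/[[:space:]]/, "", $2)   # strip blanks around the host name
        print $2
    }' "$1"
}

# Ping each listed host twice and report which ones answer.
check_servers() {
    for host in $(list_servers "$1"); do
        if ping -c 2 "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
        fi
    done
}

# Only run the check when the file is actually present and readable.
if [ -r "$BP_CONF" ]; then
    check_servers "$BP_CONF"
fi
```

The same loop could call bpclntcmd instead of ping to test name resolution as NetBackup sees it.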

Also check this technote: http://www.symantec.com/docs/TECH126537

 

Eric Engberg

This looks like it may be a network issue.  I see a packet leaving the media server, but it never arrives at the master server.  I'm not sure how running 'tpconfig' first keeps the packet from disappearing.
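One way to test for a packet that vanishes only when it is large is to ping with the Don't Fragment bit set, using the Linux iputils option "-M do". The sketch below is an illustration only: the host name "master" and the MTU values in the usage comments are placeholders, and the helper names are made up.

```shell
#!/bin/sh
IP_HDR=20    # IPv4 header without options, in bytes
ICMP_HDR=8   # ICMP echo header, in bytes

# Largest ICMP payload that fits in one unfragmented packet of the given MTU.
probe_payload() {
    echo $(( $1 - IP_HDR - ICMP_HDR ))
}

# Send three DF-flagged pings sized to exactly fill the given MTU.
# $1 = target host, $2 = MTU to test
df_probe() {
    ping -M do -c 3 -s "$(probe_payload "$2")" "$1"
}

# Usage (placeholder host name):
#   df_probe master 1500   # full Ethernet-sized packet
#   df_probe master 1476   # try again with a smaller size
```

If the larger probe gets no reply at all while the smaller one succeeds, something on the path is dropping big packets without reporting it.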

I'm attempting to engage our network guys and will hopefully be able to dig into this some more tomorrow.

Eric Engberg

This turned out to be a fragmentation problem on an intersite link between data centers that uses a GRE tunnel. Either of two fixes worked: lower the MTU on the network, or enable tcp_mtu_probing in the Linux kernel.
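For anyone landing here later, a sketch of what the two fixes might look like on the RHEL media server. The interface name (eth0) and the 1500-byte underlying MTU are assumptions, not values from this thread, and the arithmetic covers only a basic GRE header (4 bytes) plus the outer IPv4 header (20 bytes); GRE keys or checksums add more. The script only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
IFACE="${IFACE:-eth0}"         # assumed interface name
LINK_MTU="${LINK_MTU:-1500}"   # assumed MTU of the underlying link
OUTER_IP_HDR=20                # outer IPv4 header added by the tunnel
GRE_HDR=4                      # basic GRE header, no key/checksum/sequence

# MTU that still fits once the GRE encapsulation overhead is added.
gre_mtu() {
    echo $(( $1 - OUTER_IP_HDR - GRE_HDR ))
}

# Print (not execute) one command per fix.
print_fixes() {
    echo "ip link set dev $IFACE mtu $(gre_mtu "$LINK_MTU")"
    # 1 = start probing only after a black hole is detected, 2 = always probe
    echo "sysctl -w net.ipv4.tcp_mtu_probing=1"
}

print_fixes
```

To keep the sysctl setting across reboots, add "net.ipv4.tcp_mtu_probing = 1" to /etc/sysctl.conf.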

SOLUTION