Several issues on 5.1 mp7....
Master = Solaris 9, NBU 5.1 mp7
Patched the environment to MP7 in preparation for migrating to 6.5.4.
Getting server failures:
1:: Jobs queueing up and staying queued with tape drives available...
2:: Getting error 213's on several media servers....
3:: bpdbm initiates every 20 mins. then dies
The following messages come up every 20 mins:
1258109653 1 2 4 cscnb01 0 0 0 *NULL* bpdbm INITIATING bpdbm: NetBackup 5.1 2008032813 on cscnb01 IDIRSTRUCT=2 (VERBOSE = 0)
1258109653 1 2 32 cscnb01 0 0 0 *NULL* bpdbm cannot get bound socket: Address already in use (125)
1258109653 1 2 4 cscnb01 0 0 0 *NULL* bpdbm bpdbm TERMINATED
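To confirm the 20-minute pattern, the bind failures can be counted straight from the error-log text. This is only a minimal sketch: it assumes the entries have been saved to a file in the raw format shown above (epoch timestamp in the first field), and the function name is made up for illustration.

```shell
# Count bpdbm bind failures in NetBackup error-log text read from stdin
# and print the epoch timestamp (first field) of each one.
# count_bind_failures is a hypothetical helper, not a NetBackup command.
count_bind_failures() {
    awk '
        / bpdbm cannot get bound socket: / {
            n++
            print "bind failure at epoch " $1
        }
        END { print "total: " (n + 0) }
    '
}
```

Usage would be something like `count_bind_failures < saved_errors.txt`, where the file name is a placeholder. Comparing consecutive epoch values shows whether the failures really line up with the bpsched wakeup interval.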
Surprisingly enough, this is the same interval at which bpsched wakes up...
I would like to clean this up prior to the upgrade.....
Any ideas?
Thanks
Joe Despres
Comments
All the old 'bad memories' are coming back...
1st of all, decrease tcp_time_wait_interval: seer.entsupport.symantec.com/docs/230050.htm
Next, work through these 2 TechNotes :
http://seer.support.veritas.com/docs/237534.htm
seer.entsupport.symantec.com/docs/264705.htm
ALL of these issues should motivate you to upgrade! No more bpsched in 6.x!
Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows.
Will check out..
I'll generate documentation to make a few changes...
I've only had these issues since I patched to MP7!
Thanks!
Joe Despres
tcp_close_wait_interval
Hmm... my tcp_close_wait_interval is set to 1000. I wonder if I need to bump it back up to 60000?
Here's another issue within this issue:
Here's another issue.....
Running this command does not show all errors!
/usr/openv/netbackup/bin/admincmd/bperror -by_statcode -U -backstat
We are getting swamped with error 213's in the gui....
Yet that command doesn't show them!
-* sigh *-
Joe Despres
Status 213 means NO storage units (STUs) are available for use...
This could mean network comms problems to media servers.
Check bpsched log for evidence of the master connecting to media servers and counting UP drives.
Look for something like this:
09:12:54.546 [1556574] <2> nb_getsockconnected: host=mediaserver service=bpcd address=mediaserver-ip protocol=tcp reserved port=13782
09:12:54.546 [1556574] <2> nb_getsockconnected: Connect to mediaserver on port 544
09:12:54.546 [1556574] <2> logconnections: BPCD CONNECT FROM master-ip.544 TO mediaserver-ip.13782
09:12:54.854 [1556574] <2> start_bptm: /usr/openv/netbackup/bin/bptm bptm -count -cmd -rt 8 -rn 7 -stunit stunit-name -den 20 -mt 2 -masterversion 510000
09:12:54.854 [1556574] <2> start_bptm: Received BPCD success message
09:12:54.962 [1556574] <2> get_num_avail_drives: NUM UP 2 0 0 0 2 0 Drive0 Drive1
09:12:54.962 [1556574] <2> ?: available drives = 2, shared drives = 0, allow_mult_ret = 0
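If the bpsched log is large, the drive-count lines can be pulled out on their own. The following is a rough sketch that assumes the log lines look exactly like the sample above; the function name is invented for this example.

```shell
# Print "<time> available=<n> shared=<m>" for every drive-count line
# read from stdin. drive_counts is a hypothetical helper name, and the
# pattern assumes the "available drives = N, shared drives = M," layout
# shown in the sample log lines.
drive_counts() {
    sed -n 's/^\([0-9:.]*\) .*available drives = \([0-9]*\), shared drives = \([0-9]*\),.*/\1 available=\2 shared=\3/p'
}
```

Feed it the current bpsched log on stdin. A media server whose storage units never show up in the output, or show `available=0` while the drives are UP, is a likely source of the 213s.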
Verify that you have the CLEAN_IN_BACKGROUND and DISABLE_COUNTMEDIA touch-files in place.
According to the TechNote the DISABLE_COUNTMEDIA problem was fixed in 4.5, but it re-appeared in one of the 5.1 patches...
What was your patch level prior to MP7?
We were at mp5...
We were at MP5 prior to patching... I have put in for a change of the following:
1:: Create the following touch file on cscnb01...
/usr/openv/netbackup/DISABLE_COUNTMEDIA
2:: Rename the following file....
/usr/openv/netbackup/bin/backup_exit_notify
3:: Change prep level from 4 hrs to 8 hrs....
bpconfig -prep 8
4:: Add the following to bp.conf on cscnb01.....
BPTM_QUERY_TIMEOUT = 60
5:: Change the following entry in bp.conf on cscnb01.....
CLIENT_CONNECT_TIMEOUT = 60
6:: Remove any server(s) from bp.conf & vm.conf that are gone!
for all the servers in the env., both Master & Media servers
7:: Need to set the following tcp variables:
ndd -set /dev/tcp tcp_xmit_hiwat 262144
ndd -set /dev/tcp tcp_recv_hiwat 262144
ndd -set /dev/tcp tcp_time_wait_interval 10000
Note: These settings need to survive a reboot!
Note: tcp_time_wait_interval is currently set to 1000 (1 sec). Is this OK?
8:: Need to add the following to "/etc/system" file:
set sq_max_size=100
One of the many things I have noticed: when running support on the media servers, getting the Media info hangs or takes a long time!!!
Thanks....
Joe Despres
Master server model? CPU? Memory? Seems your master is taking serious strain...
Size of environment? Number of media servers and stunits? # of records in volDB? # of images? Is the master also a media server?
Maybe a good idea to re-read the Planning & Performance Tuning Guide before upgrading to 6.x...
Also not sure why you want to change tcp_xmit_hiwat and tcp_recv_hiwat to 256k when TechNote 264705 recommends 64k?
We have always stayed with the recommended tcp_close_wait_interval of 60000.
To survive reboot, add new tcp settings to /etc/rc2.d/S69inet.
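A fragment along these lines could be appended to the startup script. It is only a sketch: the function name and the NDD override variable are made up so the commands can be previewed without touching the system, and the three values are passed in rather than hard-coded because this thread has not settled on them.

```shell
# apply_tcp_tuning XMIT_HIWAT RECV_HIWAT TIME_WAIT_MS
# Re-applies the TCP settings at boot. Set NDD="echo ndd" for a dry run
# that only prints the commands (NDD is a hypothetical override used
# for this sketch, not a Solaris convention).
apply_tcp_tuning() {
    ndd_cmd=${NDD:-ndd}
    $ndd_cmd -set /dev/tcp tcp_xmit_hiwat "$1"
    $ndd_cmd -set /dev/tcp tcp_recv_hiwat "$2"
    $ndd_cmd -set /dev/tcp tcp_time_wait_interval "$3"
}
```

For example, `apply_tcp_tuning 262144 262144 10000` would apply the values proposed earlier in this thread, while `apply_tcp_tuning 65536 65536 60000` would follow the 64k and 60000 figures mentioned in this reply.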
Have you had a look at bpsched log yet to see if all Media Servers (for which stunits are configured) are successfully contacted to count UP drives?
To force a drive count on all media servers, issue following 2 commands:
bpschedreq -read_stunits
bpschedreq -read_stu_config
Check bottom of bpsched log for connection to media servers and drive counts.
Master = v440, 8gig, 4 cpu's
All drives are presently up.....
The 262144 matches the block size of the tape drives...
Thanks....
Joe Despres
Found the solution on this issue...
One of my Co-workers found this:
http://seer.entsupport.symantec.com/docs/317014.htm
This matches one of the issues we have to the letter...
We also tuned bpsched a bit as well.... with the settings as noted in this discussion..
Still in monitoring phase.... If it goes thru the weekend without a hit ....
Then we will be free and clear....
Thanks....
Joe Despres
Bummer
Looks like 2 out of 3 are solved......
2 & 3 were solved when we did the following:
1:: Create the following touch file on the master...
/usr/openv/netbackup/DISABLE_COUNTMEDIA
2:: Rename the following file....
/usr/openv/netbackup/bin/backup_exit_notify
3:: Change prep level from 4 hrs to 8 hrs....
bpconfig -prep 8
4:: Add the following to bp.conf on the master.....
BPTM_QUERY_TIMEOUT = 60
5:: Change the following entry in bp.conf on the master.....
CLIENT_CONNECT_TIMEOUT = 60
6:: Remove any server(s) from bp.conf & vm.conf that are gone!
for all the servers in the env. Both Master & Media servers...
7:: Added the following variable in bp.conf on the master...
TIMEOUT_IN_QUEUE = 72000
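For anyone repeating the bp.conf changes in steps 4, 5 and 7 on more than one server, a small helper keeps the edits repeatable. This is a sketch under assumptions: the function name is invented, it only suits single-valued entries (it would collapse repeated keys such as SERVER), and the key must not contain regex metacharacters. Back up bp.conf first.

```shell
# set_bpconf_entry FILE KEY VALUE
# Replace (or add) a single-valued "KEY = VALUE" line in a bp.conf-style
# file. Hypothetical helper; not suitable for multi-valued keys like SERVER.
set_bpconf_entry() {
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"
    # Copy every line except an existing entry for KEY.
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" || true
    # Append the new entry.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `set_bpconf_entry /usr/openv/netbackup/bp.conf TIMEOUT_IN_QUEUE 72000` would apply step 7, and running it twice leaves exactly one entry.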
Issue #1 is still in play. I do have a workaround: I put a script in cron
on the master to run the following command once an hour:
/usr/openv/netbackup/bin/admincmd/bpschedreq -read_stunits
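A wrapper like the one below makes it easier to see afterwards whether the hourly refresh actually ran. It is a sketch only: the function name, the log location, and the BPSCHEDREQ and LOG override variables are assumptions made for illustration.

```shell
# refresh_stunits: run the storage-unit re-read and log the outcome.
# BPSCHEDREQ and LOG are hypothetical overrides; the default log path
# is a placeholder, not a NetBackup location.
refresh_stunits() {
    cmd=${BPSCHEDREQ:-/usr/openv/netbackup/bin/admincmd/bpschedreq}
    log=${LOG:-/var/tmp/read_stunits.log}
    if "$cmd" -read_stunits; then
        status=ok
    else
        status="failed($?)"
    fi
    printf '%s read_stunits %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$log"
}

# Example crontab entry (script path is a placeholder):
# 0 * * * * /path/to/refresh_stunits.sh
```

A run of `failed(...)` lines in the log would suggest the workaround itself is not reaching bpsched, which is worth knowing before the upgrade.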
I guess at this point.... Can I still upgrade to 6.5.4 even with this one issue?
It's actually only affecting several STU's.... the new ones plus a virtual STU..
I created the Virtual Storage Unit with the help of this doc:
ftp://exftpp.symantec.com/pub/support/products/NetBackup_Enterprise_Server/276122.pdf
Thanks..........
Joe Despres
Ensure that nbcc runs without any inconsistencies. If there are any problems with the NBU databases, nbcc will point them out. Also confirm compatibility and system resources.
One more thing: the document that you've used to create stu's for virtual servers contains the following warning:
NOTE: It is best to allocate specific tape devices for this configuration, as to avoid storage unit over-commit.
NBU 6.x is cluster-aware and resource allocation works a lot better. This TechNote explains how it's done in 6.x: seer.entsupport.symantec.com/docs/285451.htm
NBCC checked out OK...
According to Symantec..... Output from NBCC is OK....
Well, there's some frozen turkey there :) .... But I'm sure that can be taken care of.
Thanks!
Joe Despres