Several issues on 5.1 mp7....

Updated: 23 May 2010 | 13 comments
Posted by Joe Despres

Master = Solaris 9, NBU 5.1 mp7

Patched the environment to MP7 in preparation for migrating to 6.5.4.

Getting server failures:

1::  Jobs queueing up and staying queued with tape drives available...

2::  Getting error 213's on several media servers....

3::  bpdbm initiates every 20 mins. then dies

The following messages come up every 20 mins:
1258109653 1 2 4 cscnb01 0 0 0 *NULL* bpdbm INITIATING bpdbm: NetBackup 5.1 2008032813 on cscnb01 IDIRSTRUCT=2 (VERBOSE = 0)
1258109653 1 2 32 cscnb01 0 0 0 *NULL* bpdbm cannot get bound socket: Address already in use (125)
1258109653 1 2 4 cscnb01 0 0 0 *NULL* bpdbm bpdbm TERMINATED
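
One way to confirm the 20-minute cadence is to difference the timestamps of the bind failures. This is a minimal sketch, assuming the messages above have been saved to a file (the name bpdbm_messages.txt is hypothetical) and that the first field is a Unix epoch timestamp; 1258109653 falls on 13 Nov 2009 (UTC), which fits the dates in this thread.

```shell
# Print the gap, in seconds, between successive bpdbm bind failures.
# Reads message lines on stdin; field 1 is treated as an epoch timestamp.
bpdbm_bind_gaps() {
    awk '/bpdbm cannot get bound socket/ {
        if (prev != "") print $1 - prev
        prev = $1
    }'
}

# Usage (hypothetical file name):
#   bpdbm_bind_gaps < bpdbm_messages.txt
```

A steady run of values near 1200 would match the 20-minute interval described above.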

Surprisingly enough, this is the same interval at which bpsched wakes up ......

I would like to clean this up prior to the upgrade.....

Any ideas?

Thanks

Joe Despres

Comments

Marianne van den Berg, 13 Nov 2009

All the old 'bad memories' are coming back...
1st of all, decrease tcp_time_wait_interval: seer.entsupport.symantec.com/docs/230050.htm

Next, work through these 2 TechNotes :
http://seer.support.veritas.com/docs/237534.htm

seer.entsupport.symantec.com/docs/264705.htm

ALL of these issues should motivate you to upgrade! No more bpsched in 6.x!

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows.
Handy NBU links

Joe Despres, 13 Nov 2009

Will check out..

I'll generate documentation to make a few changes...

I've only gotten these issues since I patched to MP7!

Thanks!

Joe Despres

Joe Despres, 15 Nov 2009

tcp_close_wait_interval

Hmm....  my tcp_close_wait_interval is set to 1000....  I wonder if I need to bump it back to 60000?

Joe Despres, 16 Nov 2009

Here's another issue within this issue:

Here's another issue.....

Running this command does not show all errors!
/usr/openv/netbackup/bin/admincmd/bperror -by_statcode -U -backstat

We are getting swamped with status 213 errors in the GUI....

Yet that command doesn't show them!

-* sigh *-

Joe Despres

Marianne van den Berg, 16 Nov 2009

Status 213 means NO storage units (STUs) are available for use...
This could mean network comms problems to the media servers.
Check the bpsched log for evidence of the master connecting to the media servers and counting UP drives.
Look for something like this:
09:12:54.546 [1556574] <2> nb_getsockconnected: host=mediaserver service=bpcd address=mediaserver-ip protocol=tcp reserved port=13782
09:12:54.546 [1556574] <2> nb_getsockconnected: Connect to mediaserver on port 544
09:12:54.546 [1556574] <2> logconnections: BPCD CONNECT FROM master-ip.544 TO mediaserver-ip.13782
09:12:54.854 [1556574] <2> start_bptm: /usr/openv/netbackup/bin/bptm bptm -count -cmd -rt 8 -rn 7 -stunit stunit-name -den 20 -mt 2 -masterversion 510000
09:12:54.854 [1556574] <2> start_bptm: Received BPCD success message
09:12:54.962 [1556574] <2> get_num_avail_drives: NUM UP 2 0 0 0 2 0 Drive0 Drive1
09:12:54.962 [1556574] <2> ?:       available drives = 2, shared drives = 0, allow_mult_ret = 0
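
To review drive counts across a long bpsched log, the two line types in the sample can be paired up. This is a minimal sketch, assuming each "available drives" line follows the start_bptm line for the same storage unit, as in the excerpt above. Output from concurrent processes may interleave in a real log, so treat the pairing as approximate. The function name and the log path in the usage comment are made up for illustration.

```shell
# Print "<storage unit> <available drives>" for each drive count found
# in bpsched log lines read from stdin.
# Note: sub() needs a POSIX awk; on Solaris that likely means nawk or
# /usr/xpg4/bin/awk rather than the old /usr/bin/awk.
stu_drive_counts() {
    awk '
    /start_bptm:.* -stunit / {
        # Remember the storage unit named after the -stunit flag.
        for (i = 1; i < NF; i++) if ($i == "-stunit") stu = $(i + 1)
    }
    /available drives = / {
        n = $0
        sub(/.*available drives = /, "", n)   # strip everything before the count
        sub(/,.*/, "", n)                     # strip everything after it
        name = (stu == "") ? "unknown" : stu
        print name, n
        stu = ""
    }'
}

# Usage (hypothetical path):
#   stu_drive_counts < /path/to/bpsched/log
```

A storage unit that reports 0, or never appears at all, would be a candidate for the status 213 failures.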

Verify that you have the CLEAN_IN_BACKGROUND and DISABLE_COUNTMEDIA touch-files in place.
According to the TechNote the DISABLE_COUNTMEDIA problem was fixed in 4.5, but it re-appeared in one of the 5.1 patches...
What was your patch level prior to MP7?

Joe Despres, 16 Nov 2009

We were at mp5...

We were at MP5 prior to patching...  I have put in for a change of the following:

1:: Create the following touch file on cscnb01...
/usr/openv/netbackup/DISABLE_COUNTMEDIA

2:: Rename the following file....
/usr/openv/netbackup/bin/backup_exit_notify

3:: Change prep level from 4 hrs to 8 hrs....
bpconfig -prep 8

4:: Add the following to bp.conf on cscnb01.....
BPTM_QUERY_TIMEOUT = 60

5:: Change the following entry in bp.conf on cscnb01.....
CLIENT_CONNECT_TIMEOUT = 60

6:: Remove any server(s) from bp.conf & vm.conf that are gone!
for all the servers in the env. Both Master & Media servers

7:: Need to set the following tcp variables:

ndd -set  /dev/tcp tcp_xmit_hiwat   262144
ndd -set  /dev/tcp tcp_recv_hiwat  262144
ndd -set /dev/tcp tcp_time_wait_interval 10000

Note:  These settings need to survive a reboot!
Note:  tcp_time_wait_interval is currently set to 1000 (1 sec).  Is this OK?

8:: Need to add the following to  "/etc/system" file:

set sq_max_size=100

One of the many things I have noticed:  Running support on the media servers...  getting the media info hangs or takes a long time!!!

Thanks....

Joe Despres

Marianne van den Berg, 16 Nov 2009

Master server model? CPU? Memory? Seems your master is taking serious strain...
Size of environment? Number of media servers and stunits? # of records in volDB? # of images? Is the master also a media server?
Maybe a good idea to re-read the Planning & Performance Tuning Guide before upgrading to 6.x...
Also not sure why you want to change tcp_xmit_hiwat and tcp_recv_hiwat to 256k when TechNote 264705 recommends 64k?
We have always stayed with the recommended tcp_close_wait_interval of 60000.
To survive a reboot, add the new tcp settings to /etc/rc2.d/S69inet.
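
The reboot-persistence step might look like the following when appended to the startup script. This is a sketch only: the values are the ones mentioned in this reply (64k buffers, 60000 ms for the wait interval), not independently verified recommendations. The function name is made up for illustration, and the parameter is written as tcp_time_wait_interval to match the ndd command posted earlier in the thread.

```shell
# Re-apply the TCP tuning at boot; ndd settings do not persist on their own.
# Values are taken from the discussion above and are assumptions, not
# verified recommendations for any particular environment.
apply_tcp_tuning() {
    ndd -set /dev/tcp tcp_time_wait_interval 60000 &&
    ndd -set /dev/tcp tcp_xmit_hiwat 65536 &&
    ndd -set /dev/tcp tcp_recv_hiwat 65536
}

# In the startup script, call the function after defining it:
#   apply_tcp_tuning
```

Current values can be checked afterwards with ndd -get /dev/tcp followed by the parameter name.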

Have you had a look at bpsched log yet to see if all Media Servers (for which stunits are configured) are successfully contacted to count UP drives?
To force a drive count on all media servers, issue following 2 commands:
bpschedreq  -read_stunits
bpschedreq  -read_stu_config

Check bottom of bpsched log for connection to media servers and drive counts.

Joe Despres, 17 Nov 2009

Master = v440, 8gig, 4 cpu's

All drives are presently up.....

The 262144 matches the block size of the tape drives...

Thanks....

Joe Despres

Joe Despres, 19 Nov 2009

Found the solution on this issue...

One of my Co-workers found this:

http://seer.entsupport.symantec.com/docs/317014.htm

This matches one of the issues we have to the letter...

We also tuned bpsched a bit as well....  with the settings as noted in this discussion..

Still in monitoring phase....  If it goes thru the weekend without a hit .... 

Then we will be free and clear....

Thanks....

Joe Despres

Joe Despres, 22 Nov 2009

Bummer

Looks like 2 out of 3 are solved......

2 & 3 were solved when we did the following:

1:: Create the following touch file on the master...
/usr/openv/netbackup/DISABLE_COUNTMEDIA

2:: Rename the following file....
/usr/openv/netbackup/bin/backup_exit_notify

3:: Change prep level from 4 hrs to 8 hrs....
bpconfig -prep 8

4:: Add the following to bp.conf on the master.....
BPTM_QUERY_TIMEOUT = 60

5:: Change the following entry in bp.conf on the master.....
CLIENT_CONNECT_TIMEOUT = 60

6:: Remove any server(s) from bp.conf & vm.conf that are gone!
for all the servers in the env. Both Master & Media servers...

7::  Added the following variable in bp.conf on the master...
TIMEOUT_IN_QUEUE = 72000
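
Step 6 in the list above (pruning servers that are gone) can be started by listing the SERVER entries and flagging names that no longer resolve. This is a minimal sketch, assuming the usual "SERVER = hostname" line format; both function names are made up for illustration. A name that fails to resolve is only a hint, since a decommissioned server may still have a DNS or hosts entry.

```shell
# Print the unique host names from "SERVER = host" lines in a
# bp.conf-style file given as $1.
# Note: gsub() needs a POSIX awk; on Solaris that likely means nawk or
# /usr/xpg4/bin/awk rather than the old /usr/bin/awk.
list_conf_servers() {
    awk -F'=' '$1 ~ /^[ \t]*SERVER[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # trim blanks around the host name
        print $2
    }' "$1" | sort -u
}

# Print the listed servers that getent cannot resolve.
report_unresolvable() {
    list_conf_servers "$1" | while read -r host; do
        getent hosts "$host" >/dev/null 2>&1 || echo "$host"
    done
}

# Usage:
#   report_unresolvable /usr/openv/netbackup/bp.conf
```

Anything reported still needs a manual check before the entry is removed.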

Issue #1 is still in play.....  I do have a workaround.  I put a script in cron
on the master to run the following command once an hour:

/usr/openv/netbackup/bin/admincmd/bpschedreq -read_stunits

I guess at this point....  Can I still upgrade to 6.5.4 even with this one issue?
It's actually only affecting several STU's....  the new ones plus a virtual STU..

I created the Virtual Storage Unit with the help of this doc:
ftp://exftpp.symantec.com/pub/support/products/NetBackup_Enterprise_Server/276122.pdf

Thanks..........

Joe Despres

Marianne van den Berg, 22 Nov 2009

Ensure that nbcc runs without reporting any inconsistencies. If there are any problems with the NBU databases, nbcc will point them out. Also confirm compatibility and system resources.

Marianne van den Berg, 23 Nov 2009

One more thing: the document that you've used to create stu's for virtual servers contains the following warning:
NOTE: It is best to allocate specific tape devices for this configuration, as to avoid storage unit over-commit.

NBU 6.x is cluster-aware and resource allocation works a lot better. This TechNote explains how it's done in 6.x: seer.entsupport.symantec.com/docs/285451.htm

Joe Despres, 23 Nov 2009

NBCC checked out OK...

According to Symantec.....  Output from NBCC is OK.... 

Well, there's some frozen turkey there  :) ....  But I'm sure that can be taken care of..

Thanks!

Joe Despres