
Unix box and 41 error

Updated: 11 Aug 2010 | 15 comments
smwoodcrafts | 0 votes

I have a Unix client that is failing with a 41 error. Following the troubleshooting guide, we created the log folder for bpbkar and set VERBOSE in bp.conf. When it failed, we looked at the end of the log and this is what we found:

14:37:00.461 [25159] <16> bpbkar sighandler: ERR - bpbkar killed by SIGPIPE
14:37:00.461 [25159] <2> bpbkar sighandler: INF - ignoring additional SIGPIPE signals
14:37:00.461 [25159] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 40: network connection broken
14:37:00.461 [25159] <4> bpbkar Exit: INF - EXIT STATUS 40: network connection broken
14:37:00.461 [25159] <2> bpbkar Exit: INF - Close of stdout complete
14:37:00.462 [25159] <4> bpbkar Exit: INF - setenv FINISHED=0
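For anyone who has to dig through a long bpbkar log, here is a minimal Python sketch that pulls the final exit status out of lines shaped like the ones quoted above. The field names (time, pid, level, source, message) are my own labels for what appears in the excerpt, not official NetBackup terminology, and the pattern is only known to fit these six lines.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# Layout assumed from the excerpt above:
#   HH:MM:SS.mmm [pid] <level> source: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\] "
    r"<(?P<level>\d+)> "
    r"(?P<source>[^:]+): "
    r"(?P<message>.*)$"
)
# Matches the upper-case summary line, e.g. "INF - EXIT STATUS 40: ..."
STATUS_RE = re.compile(r"EXIT STATUS (\d+): (.*)")


class LogEntry(NamedTuple):
    time: str
    pid: int
    level: int
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into fields; return None if it does not fit the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["time"], int(m["pid"]), int(m["level"]), m["source"], m["message"])


def find_exit_status(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (code, text) from the last 'EXIT STATUS' line, or None if there is none."""
    result = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        m = STATUS_RE.search(entry.message)
        if m:
            result = (int(m.group(1)), m.group(2))
    return result
```

Fed the excerpt above, find_exit_status returns (40, "network connection broken"), which is the client-side status behind the 41 reported for the job.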

I had the Unix team put LOCKED_FILE_ACTION = SKIP in bp.conf just to see what happened.
I am not a Unix person, so I am not sure what any of this means. These are the particulars:

Master and Media servers running Windows 2003 and 6.5.2A

Unix client running 6.5
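To double-check what actually ended up in the client's bp.conf, a small sketch like the one below can list the entries. It assumes the file is plain "KEY = value" lines, that a keyword may appear on its own with no value, and that a key can repeat; the function name is made up for illustration and the path in the usage note is an assumption to confirm with the Unix team.

```python
from collections import defaultdict
from typing import Dict, List


def parse_bp_conf(text: str) -> Dict[str, List[str]]:
    """Parse 'KEY = value' lines into {KEY: [value, ...]}.

    A bare keyword with no '=' is stored with an empty-string value.
    Repeated keys keep every value in file order.
    """
    entries = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        entries[key.strip()].append(value.strip() if sep else "")
    return dict(entries)


def is_set(conf: Dict[str, List[str]], key: str) -> bool:
    """True if the key appears at all, with or without a value."""
    return key in conf
```

Usage would be along the lines of parse_bp_conf(open(path).read()), where path is wherever bp.conf lives on the client (commonly /usr/openv/netbackup/bp.conf, but that is an assumption here), followed by a look at the "VERBOSE" and "LOCKED_FILE_ACTION" entries.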

As I was typing this it failed again. The next step in troubleshooting is to increase the CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT. They are set at default at the moment (300 seconds). Does anyone have a suggestion as to what I should set them to?

Thanks so much for your help.

Dan Seymour

Comments

Peter Jakobs | 19 Jan 2010 | +1 vote

error 41

Dan,
Error 41 and exit status 40 are almost always network problems. Check your NIC settings (auto-negotiation vs. fixed speed, full duplex, drivers, etc.).
Check with netstat whether you see any errors.
Changing the client read timeout is only necessary in rare cases.

Peter

J.Hinchcliffe | 19 Jan 2010 | 0 votes

I agree

Make sure that the port on the switch is set the same as the NIC on the server.

I don't have to know how to spell....I work on Unix.
NetBackup 7.0.1 - AIX & Windows

smwoodcrafts | 29 Jan 2010 | 0 votes

I have an update on this. We checked the network ports and they seem to be fine. The box has 4 network ports grouped with 1 IP address.

This job always fails on the same directory. We put the directory on the exclude list and it ran fine. When we took it off the exclude list, it failed again. While talking to Sys. Support, they told me that this directory has 130,000 files in it. We increased the timeouts but it still failed.

Thanks,

Dan
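Since the failure tracks one directory with a very large file count, it can help to confirm which directories are the heavy ones before tuning anything. The sketch below is a generic Python walk, not a NetBackup tool; the function name and the top parameter are illustrative only.

```python
import os
from typing import List, Tuple


def largest_directories(root: str, top: int = 10) -> List[Tuple[int, str]]:
    """Return [(entry_count, path), ...] for the `top` fullest directories under root.

    entry_count is the number of direct children (files plus subdirectories).
    Directories that cannot be read are silently skipped by os.walk.
    """
    counts = []
    for dirpath, dirnames, filenames in os.walk(root):
        counts.append((len(dirnames) + len(filenames), dirpath))
    counts.sort(reverse=True)  # biggest counts first
    return counts[:top]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python bigdirs.py /path/to/check
    for count, path in largest_directories(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{count:>10}  {path}")
```

On a tree with hundreds of thousands of entries the walk itself takes a while, which gives a rough feel for how long the backup client may spend in that directory too.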

David McMullin | 29 Jan 2010 | +1 vote

LOL, my read timeout is 19200!

My client read timeout is 19200.

I have several large directories (larger than yours) and this helps.

NBU 7.0.1 on Solaris 10
writing to EMC 4206 VTL
duplicating to LTO2 in SL8500
(Soon to be LTO5)
using ACSLS 7.3.1

J.Hinchcliffe | 29 Jan 2010 | +1 vote

Increase client timeout

I have mine set to about 30000.

Are you backing up all local drives as one stream or as multiple streams?

If you are using multiple streams, can you check how many jobs are running on the server at once?

If there are too many, and one of them is this big directory, the server may get overloaded and not respond quickly enough for NetBackup to know it is still up. If that is the case, you can limit the number of jobs allowed to run on the server at once so that it does not get bogged down.


smwoodcrafts | 02 Feb 2010 | 0 votes

Please excuse my ignorance of UNIX speak. This is one of three boxes that have many mount points. We have Cross Mount Points checked and use exclude lists on each. We excluded the directory on the box that was failing and included it on another box. The failure followed the directory. The engineer tells me that the client timeout is not the problem. From what I'm reading here, I should increase it even more than I have. The highest I went was 2500.

Question: could this be a memory problem? Could the file list be too big to fit in cache?

Thanks for your help on this problem.

Dan Seymour

wrobbins | 02 Feb 2010 | +2 votes

exit status = 40: network connection broken

"We excluded the directory on the box that was failing and included it on another box. the failure followed the directory"

Which server owns the data? Back it up from there rather than from a remote mount.

~ Bill

J.Hinchcliffe | 02 Feb 2010 | +2 votes

I agree

Backing up across mount points means you have to go:

from nb media
lan
to client
lan
to server it is mounted from
lan
get the data
lan
back to client
lan
back to media server
lan
to tape

Heavy traffic on the LAN can slow you down.
If the client that owns the data is slow, it will slow you down.

If you can, it is best to back it up from the original owner rather than from the mount point.


Andy Welburn | 03 Feb 2010 | 0 votes

I'm tired now!

;)

Regards Andy

"It's not too late to panic ..."

rsamora@eprod.com | 09 Feb 2010 | 0 votes

Same error but only on "/" directive.

I know it's not a port or NIC setting error because all of the other directives back up successfully. The "/" directive gets to 2 to 2.5 MB each time, exactly 250 files, and then slows down to less than 10 KB/sec before failing with a status 41. Each time, the backup fails around the 5 minute mark. I have tried running that job by itself with the same results.

I don't know a thing about Unix other than how to spell it. What is in the "/" selection? Can I exclude it?

Thanks,
Randy
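A failure at roughly the 5 minute mark lines up with the 300 second default quoted for the timeouts at the top of this thread, so it may be worth checking how close the run time is to one timeout period. This is only a sketch of the arithmetic; the timestamps in the usage note and the 30 second tolerance are made-up illustration values, and a match is a hint, not proof, that a timeout is what ends the job.

```python
from datetime import datetime

DEFAULT_TIMEOUT_S = 300  # default value quoted earlier in this thread


def elapsed_seconds(start: str, end: str, fmt: str = "%H:%M:%S") -> float:
    """Seconds between two clock times on the same day (end must not be before start)."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def near_timeout(elapsed_s: float,
                 timeout_s: float = DEFAULT_TIMEOUT_S,
                 tolerance_s: float = 30) -> bool:
    """True if the job ran for roughly one timeout period before failing."""
    return abs(elapsed_s - timeout_s) <= tolerance_s
```

For example, a job that started at 14:32:00 and failed at 14:37:05 ran for 305 seconds, which near_timeout treats as a match for a 300 second timeout. If the failure time moves when the timeout is raised, that points at the timeout; if it stays at 5 minutes, something else is cutting the connection.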

wrobbins | 09 Feb 2010 | 0 votes

"/" is your root filesystem

It may be small, but if your NetBackup policy has Cross mount points selected, "/" means everything from root on down.

~ Bill

rsamora@eprod.com | 09 Feb 2010 | 0 votes

These are my directives

/
/opt
/var
NEW_STREAM
/home
/usr
/stand
/tmp

Allow multiple data streams is checked and Cross mount points is not checked.

wrobbins | 09 Feb 2010 | 0 votes

/  /opt  /var  /home  /usr  are typically on separate filesystems; that's fine.
/stand must be something unique to your system; that's fine.

I have never - I mean NEVER - seen /tmp in a backup policy since, by definition, those are temporary files. Unless someone can explain a good reason (oxymoron?) to keep it, I would remove it and add it to the NetBackup exclude list instead.

Back to your root cause (pun intended). Normally "/" would have a few (dozen?) small files, plus mount points for the directories mentioned above. I would have your server admin check why you apparently have hundreds of files adding up to megabytes in the root filesystem. That is usually bad form in Unix.

~ Bill
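One way for the server admin to see what really lives on the root filesystem is to walk it without descending into other mounted filesystems. The Python sketch below does that by comparing device numbers (st_dev); it assumes a mount point shows up as a change in st_dev, which will not catch every case (bind mounts of the same filesystem, for instance). The function names are illustrative.

```python
import os
import stat
from typing import Iterator, Optional, Tuple


def _device(path: str) -> Optional[int]:
    """Device number of path, or None if it cannot be examined."""
    try:
        return os.lstat(path).st_dev
    except OSError:
        return None


def files_on_same_filesystem(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for regular files under root that share root's filesystem."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that sit on a different device (other mounts).
        dirnames[:] = [d for d in dirnames
                       if _device(os.path.join(dirpath, d)) == root_dev]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                yield path, st.st_size


def summarize(root: str) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for regular files on root's own filesystem."""
    count = total = 0
    for _path, size in files_on_same_filesystem(root):
        count += 1
        total += size
    return count, total
```

Running summarize("/") as a user who can read the whole tree gives a file count and byte total to compare against the roughly 250 files and 2 to 2.5 MB the job reaches before it stalls.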

rsamora@eprod.com | 09 Feb 2010 | +1 vote

Thank you Bill.  Now that you mentioned the number of files under /, that seems to ring a bell with an issue I had a few years back.  I have slept (and drank) a lot since then so who knows.

I really appreciate the help.

p.s.  And I'm getting rid of the tmp directive.  All temp files are excluded from my Windows servers so this only makes sense.

rsamora@eprod.com | 10 Feb 2010 | 0 votes

Update...

I received this from my Unix guy.

We have upgraded the OS from 11.31 v3 to 11.31 v5.

This issue may or may not be related to the OS upgrade.