Unix box and 41 error
I have a Unix client that is failing with a 41 error. Following the troubleshooting guide, we created the log folder for bpbkar and set VERBOSE in bp.conf. When it failed, we looked at the end of the log and this is what we found:
I had the Unix team put LOCKED_FILE_ACTION = SKIP in bp.conf just to see what happened.
I am not a Unix person, so I am not sure what any of this is. Here are the particulars:
Master and media servers running Windows 2003 and NetBackup 6.5.2A
Unix client running NetBackup 6.5
As I was typing this it failed again. The next step in troubleshooting is to increase the CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT. They are set at default at the moment (300 seconds). Does anyone have a suggestion as to what I should set them to?
Thanks so much for your help.
Dan Seymour
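For anyone else who is not a Unix person: bp.conf is a plain text file of "KEY = value" lines, and some keys (SERVER, for example) can appear more than once. A minimal Python sketch for confirming what actually ended up in the file is below. The sample content, including the host names and the VERBOSE level, is made up for illustration; only the VERBOSE and LOCKED_FILE_ACTION entries come from this thread.

```python
def parse_bp_conf(text):
    """Parse 'KEY = value' lines into {KEY: [value, ...]}.

    Values are kept in a list because a key may be repeated.
    Lines without an '=' are ignored in this sketch.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


# Hypothetical file content; host names and the VERBOSE level are invented.
SAMPLE_BP_CONF = """\
SERVER = master1
SERVER = media1
CLIENT_NAME = unixclient1
VERBOSE = 5
LOCKED_FILE_ACTION = SKIP
"""

conf = parse_bp_conf(SAMPLE_BP_CONF)
for key in ("VERBOSE", "LOCKED_FILE_ACTION"):
    print(key, "=", conf.get(key, ["(not set)"]))
```

In practice you would read the text from the client's real bp.conf instead of the sample string.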
Comments
error 41
Dan
Error 41 and exit status 40 are almost always network problems. Check your NIC settings (auto/fixed speed, full duplex, drivers, etc.).
Check with netstat to see whether there are any errors.
Changing the client read timeout is only necessary in rare cases.
Peter
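If you would rather not eyeball the netstat output, here is a rough Python sketch that flags interfaces with non-zero error counters. It assumes the output of "netstat -i" has a header row containing "Ierrs" and "Oerrs" columns, which varies by operating system, so check your own header first. The sample output and interface names are invented for the example.

```python
def interfaces_with_errors(netstat_output):
    """Return {interface: (input_errors, output_errors)} for non-zero rows.

    Columns are located by counting from the right-hand end of the header,
    so a row with an empty Network or Address column is still read correctly.
    """
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    if "Ierrs" not in header or "Oerrs" not in header:
        raise ValueError("no Ierrs/Oerrs columns in header: %r" % lines[0])
    ierrs_off = header.index("Ierrs") - len(header)  # negative offset
    oerrs_off = header.index("Oerrs") - len(header)
    flagged = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < -ierrs_off:
            continue  # too short to contain the counters
        try:
            ierrs = int(fields[ierrs_off])
            oerrs = int(fields[oerrs_off])
        except ValueError:
            continue  # not a counter row
        if ierrs or oerrs:
            flagged[fields[0]] = (ierrs, oerrs)
    return flagged


# Invented sample in the assumed layout.
SAMPLE = """\
Name  Mtu   Network   Address     Ipkts    Ierrs Opkts    Oerrs Coll
lan0  1500  10.0.0.0  client1     1234567  0     7654321  0     0
lan1  1500  10.0.1.0  client1-bk  99999    42    88888    3     0
lo0   32808 loopback  localhost   5555     0     5555     0     0
"""

print(interfaces_with_errors(SAMPLE))
```

Counters that keep climbing between two runs are more telling than a single non-zero value.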
I agree
Make sure that the port on the switch is set the same as the NIC on the server.
I don't have to know how to spell....I work on Unix.
NetBackup 7.0.1 - AIX & Windows
I have an update on this. We checked the network ports and they seem to be fine. The box has 4 network ports grouped under 1 IP address.
This job always fails on the same directory. We put the directory on the exclude list and it ran fine. When we took it off the exclude list, it failed again. While talking to Sys. Support, they told me that this directory has 130,000 files in it. We increased the timeouts but it still failed.
Thanks,
Dan
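For anyone chasing a similar problem, a quick way to find out which directories hold an unusually large number of files is to count them per directory. The Python sketch below does that with os.walk; the function name and the "top" parameter are just names picked for this example.

```python
import os


def largest_directories(root, top=5):
    """Return the `top` directories under `root` with the most files.

    Each result is (file_count, directory_path). Only non-directory
    entries directly inside each directory are counted.
    """
    counts = []
    for dirpath, dirnames, filenames in os.walk(root):
        counts.append((len(filenames), dirpath))
    counts.sort(reverse=True)  # biggest counts first
    return counts[:top]


# Example use (the path is a placeholder, not one from this thread):
# for count, path in largest_directories("/some/mount/point"):
#     print(count, path)
```

A directory that shows up with a six-figure count is a good candidate for its own stream or its own policy.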
LOL, my read timeout is 19200!
My client read timeout is 19200.
I have several large directories (larger than yours) and this helps.
NBU 7.0.1 on Solaris 10
writing to EMC 4206 VTL
duplicating to LTO2 in SL8500
(Soon to be LTO5)
using ACSLS 7.3.1
Increase client timeout
I have mine set to about 30000.
Are you doing all local drives as one stream or as multiple streams?
If you are using multiple streams, can you check how many jobs are running on the server at once?
If you have too many, and one of them is this big directory, the server may get overloaded and not be able to respond quickly enough for NetBackup to know it is still up. If that is the case, you can limit the number of jobs allowed to run on the server at once so that it does not get bogged down.
I don't have to know how to spell....I work on Unix.
NetBackup 7.0.1 - AIX & Windows
Please excuse my ignorance of UNIX speak. This is one of three boxes that have many mount points. We have Cross Mount Points checked and use exclude lists on each. We excluded the directory on the box that was failing and included it on another box. The failure followed the directory. The engineer tells me that the client timeout is not the problem. From what I'm reading here, I should increase it even more than I have. The highest I went was 2500.
Question: could this be a memory problem, with the file list too big to fit in cache?
Thanks for your help on this problem.
Dan Seymour
exit status = 40: network connection broken
"We excluded the directory on the box that was failing and included it on another box. the failure followed the directory"
Which server owns the data? Back it up from there rather than through a remote mount.
~ Bill
I agree
Backing up across mount points means you have to go
from nb media
lan
to client
lan
to server it is mounted from
lan
get the data
lan
back to client
lan
back to media server
lan
to tape
Heavy traffic on the LAN can slow you down.
If the client that owns the data is slow, it will slow you down.
If you can, it is best to back it up from the original owner rather than from the mount point.
I don't have to know how to spell....I work on Unix.
NetBackup 7.0.1 - AIX & Windows
I'm tired now!
;)
Regards Andy
"It's not too late to panic ..."
Same error but only on "/" directive.
I know it's not a port or NIC setting error because all of the other directives back up successfully. The "/" directive gets to 2 to 2.5 MB each time, exactly 250 files, and then slows down to less than 10 KB/sec before failing with a Status 41. Each time, the backup fails around the 5 minute mark. I have tried running that job by itself with the same results.
I don't know a thing about unix other than knowing how to spell it. What is in the "/" selection? Can I exclude it?
Thanks,
Randy
"/" is your root filesystem
It may be small, but if your NetBackup policy has Cross mount points selected, "/" means "everything" from root on down.
~ Bill
These are my directives
/
/opt
/var
NEW_STREAM
/home
/usr
/stand
/tmp
Allow multiple data streams is checked and Cross mount points is not checked.
/ /opt /var /home /usr are typically on separate filesystems; that's fine.
/stand must be something unique to your system; that's fine.
I have never - I mean NEVER - seen /tmp in a backup policy since, by definition, those are temporary files. Unless someone can explain a good reason (oxymoron?) to keep it, I would remove it and add it to the NetBackup exclude list instead.
Back to your root cause (pun intended). Normally "/" would have a few (dozen?) small files, and mount points for the directories mentioned above. I would have your server admin check why you apparently have hundreds of files adding up to megabytes in the root filesystem. That is usually bad form in Unix.
~ Bill
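To see what actually lives in the root filesystem itself, the admin can count files without descending into other mounted filesystems. Here is a minimal Python sketch of that idea: it compares each directory's device number (st_dev) with the starting directory's and skips anything on a different device. This only approximates what a backup with Cross mount points unchecked would visit; it is not how NetBackup itself decides, and the function name is made up for the example.

```python
import os


def summarize_one_filesystem(root):
    """Count files and bytes under `root`, staying on root's filesystem.

    Returns (file_count, total_bytes). Directories whose device number
    differs from root's (that is, mount points) are not entered.
    """
    root_dev = os.lstat(root).st_dev
    file_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        kept = []
        for name in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev:
                    kept.append(name)
            except OSError:
                pass  # unreadable entry, skip it
        dirnames[:] = kept  # prune so os.walk does not cross mount points
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue
            file_count += 1
            total_bytes += st.st_size
    return file_count, total_bytes


# Example use (run as a user who can read the tree):
# files, size = summarize_one_filesystem("/")
# print(files, "files,", size, "bytes on the root filesystem")
```

If the count comes back in the hundreds or more, that points to stray files sitting directly on the root filesystem.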
Thank you Bill. Now that you mentioned the number of files under /, that seems to ring a bell with an issue I had a few years back. I have slept (and drank) a lot since then so who knows.
I really appreciate the help.
p.s. And I'm getting rid of the tmp directive. All temp files are excluded from my Windows servers so this only makes sense.
Update...
I received this from my Unix guy.
We have upgraded the OS from 11.31 v3 to 11.31 v5.
This issue may or may not be related to the OS upgrade.