6.0 MP6 Backup Problem
I have an AIX 5.3 master running 6.0 MP6. One of my SAN Media servers is a Solaris 9 box also running 6.0 MP6. The largest backup on this client is 4+TB and it fails EVERY single week. I set the checkpoints to 15 mins so when it restarts it doesn't have too far back to go. Some weeks I get one status code (50), some weeks I get 2 or 3.
08/17/2009 00:19:45 - current media -- complete, awaiting next media Any. Waiting for resources.
Reason: Drives are in use, Media server: xxxxxxxxxxxxx,
Robot Type(Number): TLD(0), Media ID: N/A, Drive Name: N/A,
Volume Pool: MainPool, Storage Unit: xxxxxxxxxxxxx-hcart2-robot-tld-0, Drive Scan Host: N/A
08/17/2009 00:19:52 - granted resource XXXXXX
08/17/2009 00:19:52 - granted resource IBM.ULT3580-TD2.003
08/17/2009 00:19:52 - granted resource xxxxxxxxxxxxx-hcart2-robot-tld-0
08/17/2009 00:19:53 - end writing; write time: 4:48:08
client process aborted (50)
It ALWAYS fails during a tape change but not every tape change. It will go through 5-6 tapes then fail.
From the bpbkar log:
00:19:50.197 [15736] <16> bpbkar shm_addr: ERR - bpbkar exiting because backup is aborting
00:19:50.197 [15736] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 40: network connection broken
00:19:50.197 [15736] <4> bpbkar Exit: INF - EXIT STATUS 40: network connection broken
00:19:50.201 [15736] <4> bpbkar Exit: INF - setenv FINISHED=0
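For anyone comparing several weeks of failures, a small script can pull the fatal exit lines out of a bpbkar log. This is only a sketch: the regular expression is built from the four lines quoted above and assumes other log lines follow the same layout, and the function name `find_fatal_exits` is made up for illustration.

```python
import re

# Pattern derived from the log excerpt above, e.g.:
# 00:19:50.197 [15736] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 40: network connection broken
FATAL_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> "
    r"bpbkar Exit: ERR - bpbkar FATAL exit status = (?P<status>\d+): (?P<text>.*)$"
)


def find_fatal_exits(lines):
    """Return a (time, pid, status, text) tuple for each FATAL exit line."""
    hits = []
    for line in lines:
        match = FATAL_RE.match(line.strip())
        if match:
            hits.append(
                (match["time"], int(match["pid"]), int(match["status"]), match["text"])
            )
    return hits


# The four lines from the post, used here as sample input.
sample = [
    "00:19:50.197 [15736] <16> bpbkar shm_addr: ERR - bpbkar exiting because backup is aborting",
    "00:19:50.197 [15736] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 40: network connection broken",
    "00:19:50.197 [15736] <4> bpbkar Exit: INF - EXIT STATUS 40: network connection broken",
    "00:19:50.201 [15736] <4> bpbkar Exit: INF - setenv FINISHED=0",
]

for hit in find_fatal_exits(sample):
    print(hit)
```

Running it against each week's log and lining the timestamps up with the tape changes in the job details would show whether the abort always lands in the same window.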
None of my other SAN Media servers or clients do this. The data is going from the client straight to the tape drives. I'm beginning to wonder if I have an issue between this client and the SAN. I will be upgrading to 6.5.4 next week and pretty much expect this issue to follow me.
Comments
What sort of backup is that.
What sort of backup is that? I mean, checkpoints won't help you if it's an Oracle or MS SQL backup.
Have you tried adding VERBOSE to vm.conf on that media server and checking the MM logs?
It's a regular standard
It's a regular standard backup backing up RMAN files. I set the checkpoints so when it fails and restarts it doesn't restart from the beginning. I average 3 retries on this backup.
Ok, consider increasing
Ok, consider increasing CLIENT_READ_TIMEOUT on that media server.
Next, check whether the TCP port range settings are the same as on the rest of your media servers. I reckon you might be running out of TCP connections because of a short range of TCP ports used by the media server in question.
http://seer.entsupport.symantec.com/docs/317763.htm
Details:
When running a backup, a network connection broken error occurs.
bpbkar FATAL exit status = 40: network connection broken
The "Network Connection Error" appears when trying to perform a full backup of a large mount point.
Enable the logging with higher verbosity, collect the logs as given below:
http://seer.entsupport.symantec.com/docs/317763.htm
good Will backing-up
One more thing I almost
One more thing I almost missed. Since it is a SAN media server, the LAN is excluded from the backup. Another possible issue is shared memory on the SAN media server. Please check the bpbkar logs for shared memory related errors. Perhaps you should tune /etc/system on that box if it's Solaris 8 or 9, or build a new project if it is Solaris 10.
6.0 MP6 Backup Problem
I appreciate everyone's input. I wanted to wait until I had at least 2 backups before replying. I changed CLIENT_READ_TIMEOUT from 300 to 9600 and the last 2 backups have been successful from start to end, no timeouts or retries needed. Thanks again.
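To compare the timeout across media servers, something like the following can read the setting out of the text of a bp.conf file. It is a sketch under assumptions: it expects the usual `KEY = value` layout, the host names in the sample are placeholders, and the helper `read_bp_conf_values` is hypothetical. It returns every matching entry rather than guessing which one NetBackup would honor if the key were listed twice.

```python
def read_bp_conf_values(text, key):
    """Return every value assigned to key in bp.conf-style 'KEY = value' text."""
    values = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything without an '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            values.append(value.strip())
    return values


# Sample contents; the host names are placeholders, not from the thread.
sample_conf = """SERVER = master.example.com
CLIENT_NAME = media1.example.com
CLIENT_READ_TIMEOUT = 9600
"""

print(read_bp_conf_values(sample_conf, "CLIENT_READ_TIMEOUT"))
```

On a UNIX media server the text would normally come from /usr/openv/netbackup/bp.conf. An empty list means the key is not set in the file, in which case the 300 I started with was presumably the default.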
I forgot to add I am now on
I forgot to add I am now on 6.5.4 :-)