NetBackup and DataDomain: status 800 errors
I'm hoping someone has some ideas. I have searched these forums and haven't found anything similar to the issue I'm having.
Here is my NetBackup setup:
- 1 master/media server: NetBackup 6.5.4
- 1 media server: NetBackup 6.5.4 (recently added as a 2nd media server)
- 1 DataDomain: DD690
Most of the time everything works fine, but about once or twice a month my jobs start failing with error 800 ("disk volume is down", resource request failed).
The only fix I have found is to reboot all of the NetBackup servers. After that, everything works fine for another month or so.
When the error appears, jobs that are already running finish without a problem; only new jobs fail. I can still browse and copy files back and forth between the media servers and the DataDomain, so I don't believe anything is wrong with the DD. When it happens, the error occurs on both media servers.
I have used the nbdevquery command to force the volume up. This works for about a minute before the volume goes back down (or at least NetBackup claims it is down). If I am quick, I can start a job right after running nbdevquery and it completes successfully, even after NetBackup starts reporting the volume as down again.
Some more info:
My master and media servers are all connected to the DataDomain over 1 Gb (Cat5) connections.
Each media server also has 2 SAS-attached LTO4 tape drives.
I use Vault to run automatic daily duplications that copy the data from disk (the DataDomain) to tape.
I suspect the duplications have something to do with the error, for the following reasons:
About 3 months ago, I added a 3rd media server. It was connected only to the DataDomain: no tape drives, no duplications. It never got the 800 errors, while the other 2 servers continued to get them 1-2 times a month.
About 2 weeks ago, I started using this new server for duplications, and last night, when jobs started failing with 800 errors, this server was also reporting the volume as down.
The failures always occur around 4-5 pm, which is when my duplication jobs (writing to tape) are finishing up and my nightly backup jobs (writing to disk) are starting.
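A quick way to check this timing theory is to compare the start and end times of the duplication jobs and the backup jobs and list the ones that overlap. Below is a minimal Python sketch, assuming the job times have already been collected and parsed into datetime pairs; the function name `overlapping` and the sample times are hypothetical, not taken from my logs.

```python
from datetime import datetime
from typing import List, Tuple

# A job window is a (start, end) pair of datetimes.
Window = Tuple[datetime, datetime]


def overlapping(dup_jobs: List[Window], backup_jobs: List[Window]) -> List[Tuple[Window, Window]]:
    """Return every (duplication, backup) pair whose time windows overlap."""
    pairs = []
    for d_start, d_end in dup_jobs:
        for b_start, b_end in backup_jobs:
            # Two intervals overlap when each one starts before the other ends.
            if d_start < b_end and b_start < d_end:
                pairs.append(((d_start, d_end), (b_start, b_end)))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample windows: one duplication ending at 16:45,
    # one backup starting at 16:30 and another starting at 20:00.
    dup = [(datetime(2009, 1, 1, 13, 0), datetime(2009, 1, 1, 16, 45))]
    bk = [
        (datetime(2009, 1, 1, 16, 30), datetime(2009, 1, 1, 18, 0)),
        (datetime(2009, 1, 1, 20, 0), datetime(2009, 1, 1, 22, 0)),
    ]
    for d, b in overlapping(dup, bk):
        print("duplication", d[0], "-", d[1], "overlaps backup", b[0], "-", b[1])
```

Each pair it returns is a duplication and a backup that were running at the same time; if the 800 failures line up with those pairs, that supports the overlap theory.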
Thanks for any help!