Netbackup and DataDomain 800 errors

Created: 17 Jul 2012 | 3 comments

Hello -

I am wondering if someone has any ideas. I have searched these forums and have not found anything similar to the issue I am having.

Here is my Netbackup setup:

1 Master / media server - Netbackup 6.5.4

1 Media server - Netbackup 6.5.4 (recently added a 2nd media server)

1 DataDomain - DD690

Most of the time everything works fine. But about once or twice per month, my jobs start failing with error 800 - disk volume is down, resource request failed.

The only way I have found to fix the issue is by rebooting all netbackup servers. After that it will work fine for another month or so.

When I get the error, existing jobs will finish without a problem. Only new jobs fail. I am still able to browse and copy files back and forth from the media servers to the DataDomain. I do not believe there is anything wrong with the DD. The error occurs on both media servers when it happens.

I have used the nbdevquery command to force the volume up. This works for about a minute before it goes back down (at least netbackup claims it is down). If I am quick, I can kick a job off after running the nbdevquery and it will complete successfully, even after Netbackup starts reporting the volume down again.

 

Some more info:

My Master and Media servers are all connected to the DataDomain with 1 Gb Ethernet connections over Cat5.

Additionally, I have 2 LTO4 tape drives connected to each Media server (SAS connected)

I use Vault to run automatic duplications daily. These duplications copy the data from the disk (datadomain) to tape.

 

I suspect the duplications have something to do with the error.  Here is my reason:

About 3 months ago, I added a 3rd media server. This server was only connected to the DataDomain: no tape drives, no duplications. It never got the 800 errors. The other 2 servers continued to get the error 1-2 times a month, but not the new server.

About 2 weeks ago, I started using this new server to duplicate. Last night, when the jobs started failing with 800 errors, this server was reporting the volume down as well.

The failures always occur around 4-5 pm. This is when my duplicate jobs (writing to tape) are just finishing up and my nightly backup jobs (writing to disk) are just starting.
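
One way to test the timing theory is to compute where the duplication and backup windows overlap and check whether the logged failure times fall inside that overlap. The sketch below is only an illustration: the window boundaries and failure times are hypothetical placeholders, not values from this environment, and the helper names `overlap` and `in_window` are made up for the example.

```python
from datetime import time

def overlap(a_start, a_end, b_start, b_end):
    """Return (start, end) of the overlap of two same-day windows, or None."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return (start, end) if start < end else None

def in_window(t, window):
    """True if time t falls inside the half-open window [start, end)."""
    return window is not None and window[0] <= t < window[1]

# Hypothetical schedule: replace with the real Vault and policy windows.
duplication = (time(9, 0), time(17, 0))   # disk-to-tape duplications
backups = (time(16, 0), time(23, 0))      # nightly backups to disk

shared = overlap(*duplication, *backups)

# Hypothetical failure times: replace with times taken from the job logs.
failures = [time(16, 20), time(16, 45)]

print("overlap window:", shared)
print("all failures inside overlap:", all(in_window(f, shared) for f in failures))
```

If every status 800 lands inside the overlap, that supports the idea that the duplications and the new backups together are what trigger the volume being marked down.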

Thanks for any help!

3 Comments

Nicolai

Make sure you have the latest DD OS installed; the current version is 5.1.1.0. If you are using the OST plugin, upgrade that as well.

Both OST and DD OS are available from my.datadomain.com.

Are you in control of how many streams you are using on the DD? A DD890 can process 180 concurrent streams. Oversubscribing the DD will result in status 800.

You can control how many streams are used by setting "Maximum I/O streams per volume" on the disk pool.
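
To judge whether oversubscription is plausible, you can estimate the peak number of concurrent streams from job start and end times. The sketch below is a rough illustration only: the job list is hypothetical, times are expressed as minutes since midnight, and the limit of 180 is the DD890 figure quoted above, so substitute the documented limit for your own model.

```python
def peak_concurrent_streams(jobs):
    """Return the highest number of streams active at the same time.

    jobs is an iterable of (start, end, streams) tuples.
    """
    events = []
    for start, end, streams in jobs:
        events.append((start, streams))    # streams opened at job start
        events.append((end, -streams))     # streams released at job end
    # Sort by time; at equal times, releases (negative) come before opens.
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak

# Hypothetical jobs as (start_minute, end_minute, streams).
jobs = [
    (540, 1020, 60),    # duplications reading from the DD
    (960, 1380, 100),   # nightly backups writing to the DD
    (970, 1100, 40),    # second batch of nightly backups
]

STREAM_LIMIT = 180  # DD890 figure from the post; assumption for other models

peak = peak_concurrent_streams(jobs)
print("peak streams:", peak, "over limit:", peak > STREAM_LIMIT)
```

Feeding in real start and end times from the Activity Monitor would show whether the peak lines up with the window in which the status 800 errors appear.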

Assumption is the mother of all mess ups.

If this post answered your question, please mark it as a solution.

vedara

Hello

Have you checked that the HBAs connected to the backup servers are compatible with the Data Domain end as well?

 

timko

Nicolai -

Thanks for the info.  I am a revision back on the DD OS so I will look into upgrading.

Unfortunately, I do not seem to be in control of the number of streams we use. I am not using the Advanced Disk option for the DD; I am just using the standard "Disk" option for the storage unit type. I cannot find any place to limit the number of streams with this setup. It seems to be available with the Advanced Disk option only.

I will also be setting up Advanced Disk on one of my media servers to see if it helps.

 

vedara - Thanks. Our HBAs are compatible with the Data Domain.