NetBackup 7.5 - Duplication / Replication Performance
I have a site with a 'strange' requirement (considering multiple 5220 appliances are in place) whereby data must be backed up initially to an AdvancedDisk 'landing zone' on the appliance, and then duplicated to de-dup disk on the same appliance and also to SAN-attached tape. In addition, the de-dup copy is replicated to a remote 5220 appliance, and from there on to tape. All very complicated, and some may say unnecessary - but that is the requirement.
After initially 'seeding' the remote appliance on site, it was relocated to the DR site and connected via an 800Mb link. All appeared well.
SLPs are being utilised to manage the backup/duplication/replication of the data.
Approximately 160 servers are being backed up, totalling around 28TB of data for a full backup. Full backups run mainly at weekends, with incrementals running during the week.
With the standard LIFECYCLE parameters, there were initially a high number of duplication and replication jobs running in order to satisfy the requirements. There is a 3 x LTO5 tape library attached via SAN to both of the local appliances, with two paths from each appliance to the 'tape SAN'. One major issue I came across is that each image being duplicated 'single streams' to a tape drive, and multiplexing is not provided even when configured within the storage units.
This resulted in many duplication jobs queueing for tape drives, some for many hours, and jobs failing with various errors including 83/84/190/191.
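As a rough sanity check on the tape side, the arithmetic can be sketched as below. This is a best-case estimate only, and the assumptions are mine rather than measured figures: LTO-5 native speed of 140 MB/s per drive, no compression, every drive streaming continuously, no mount or positioning overhead, and the whole 28TB full backup going to the three drives.

```python
# Best-case tape write time. All inputs are assumptions, not measurements:
# LTO-5 native rate of 140 MB/s (decimal units), no compression,
# continuous streaming, no mount/positioning/queueing overhead.
LTO5_NATIVE_MB_PER_S = 140.0


def hours_to_write(total_tb: float, drives: int,
                   mb_per_s: float = LTO5_NATIVE_MB_PER_S) -> float:
    """Return the hours needed to write total_tb across `drives` drives."""
    total_mb = total_tb * 1_000_000          # 1 TB = 1,000,000 MB (decimal)
    seconds = total_mb / (drives * mb_per_s)  # drives assumed to run in parallel
    return seconds / 3600


if __name__ == "__main__":
    full_backup_tb = 28  # approximate full backup size from the post
    for drives in (1, 2, 3):
        print(f"{drives} drive(s): {hours_to_write(full_backup_tb, drives):.1f} h")
```

With all three drives kept busy this comes to roughly 18.5 hours for a 28TB full, so even in the ideal case the tape copy takes most of a day, and any time a drive sits idle waiting on allocation pushes it out further.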
I have a support case open at present and they recommended changing LIFECYCLE parameters to 'batch' the jobs. This was done by setting the following parameters:
This appeared to result in duplications / replications overrunning in a big way, so I subsequently reduced these to 128GB / 512GB / 60mins.
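To illustrate why batch size matters for drive contention, here is a minimal sketch of size-capped batching. It is not NetBackup's actual algorithm: `batch_images` is a hypothetical helper, the image sizes are made up, and treating the 512GB figure as an upper cap on batch size is my assumption, since the parameter names are not listed above.

```python
from typing import List


def batch_images(sizes_gb: List[float], max_gb: float) -> List[List[float]]:
    """Greedily pack image sizes into batches of at most max_gb each.

    Hypothetical illustration only. An image larger than max_gb
    ends up in a batch of its own.
    """
    batches: List[List[float]] = []
    current: List[float] = []
    current_total = 0.0
    for size in sizes_gb:
        # Close the current batch if adding this image would exceed the cap.
        if current and current_total + size > max_gb:
            batches.append(current)
            current, current_total = [], 0.0
        current.append(size)
        current_total += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    images = [20.0] * 40  # made-up example: 40 images of 20GB each
    for cap in (20, 128, 512):
        jobs = batch_images(images, cap)
        print(f"cap {cap}GB -> {len(jobs)} duplication job(s)")
```

The trade-off it shows is that a larger cap means fewer jobs competing for the three drives, but each job holds its drive for longer, which may be why very large batches appeared to overrun.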
When I look into the logs for the failing duplications, I see that the jobs have been waiting a long time for the 'logical' tape drive resources, and when they get the 'logical' resource they then wait for the 'physical' tape resource. When they eventually get a drive, it appears not to be allocated correctly: NBU is unable to mount the resource to the tpreq path, so when it comes to write to the tpreq location it gets the 83 error. My feeling is that this is a device allocation / SSO issue.
Based on the above, is there any option for 'streamlining' the duplication process? The duplication step from 'landing zone' to 'de-dup' disk works fine every time - it is the 'landing zone' to tape step which has all of the issues. Failing jobs do retry and eventually succeed, but this can take days....
I understand that the data flow is out of the norm, with the AdvancedDisk landing zone (copy 1) duplicating to de-dup disk (copy 2) and tape (copy 3), followed by replication to a remote appliance (copy 1 remote) and duplication to remote tape (copy 2 remote) - but as above, this is the specific customer requirement.
Both NetBackup Masters are at 220.127.116.11, with all 3 appliances at 2.5.1B.
Any thoughts / input appreciated......