
Protecting an 11TB NetApp Volume

Created: 31 Jul 2013 • Updated: 01 Aug 2013 | 12 comments
This issue has been solved. See solution.

I've seen some older discussions on this topic but I'm asking to see if anyone has any new suggestions.

Master Server - Windows Server 2008 R2

The data is on a NetApp device with tape drives directly connected and I'm using NDMP backup policies.  The volume in question is 11TB and growing, and contains (are you ready for it) millions of small files. 

Depending on which device I use, D2D or VTL, my backups range anywhere from 26 hours to 50 hours.  I'm tasked with replicating everything over to the DR site within 24 hours.  Obviously, if I have a backup that runs 26+ hours, I'm going to miss that 24 hour target.  I want to retain the ability to recover single files.
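For a sense of scale, here is a rough sketch of the arithmetic behind that 24-hour target. It assumes decimal units (1 TB = 1,000,000 MB) and one full pass over the whole 11TB with no allowance for the mapping phases or for growth; the function name is only illustrative.

```python
def required_mb_per_s(volume_tb: float, window_hours: float) -> float:
    """Sustained rate in MB/s needed to move volume_tb within window_hours.

    Assumes decimal units: 1 TB = 1,000,000 MB.
    """
    total_mb = volume_tb * 1_000_000
    return total_mb / (window_hours * 3600)


if __name__ == "__main__":
    # 24 h is the DR target; 26 h and 50 h are the run times seen today.
    for hours in (24, 26, 50):
        rate = required_mb_per_s(11, hours)
        print(f"{hours:>2} h window -> {rate:6.1f} MB/s sustained")
```

Under those assumptions the 24-hour window needs a little over 127 MB/s sustained from start to finish, which is why any long stretch with the tape idle hurts so much.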

Any recommendations?  I don't believe backups were ever meant to protect that much data in a single volume without using some kind of snapshot technology, which would eliminate the ability to restore single files.

Any suggestions (or other job opportunities depending on how this goes) would be greatly appreciated.


teiva-boy

If you can move to a LUN hosted off a server and not CIFS/NFS off your filer, you could use an Enterprise Client and FlashBackup (or is it FlashSnap?).  That's about your only realistic option short of not backing it up and using a snap and replicate strategy.

I do have a customer, though, that does snap and replicate and backs up on the remote end, even though it takes 6 days for a full backup to complete.  They have 500+ million files, use NBU, and rely on way too much NDMP to be comfortable about the recovery aspect of it...

There is an online portal, save yourself the long hold times. Create ticket online, then call in with ticket # in hand :-) "We backup data to restore, we don't backup data just to back it up."

AAlmroth

Today, with the growing size of disk space used on filers, we see an increase in methods of data protection other than backup to tape. In the case of NBU and NetApp, you could use NBU Replication Director to control SnapVault replication to a secondary/n-ary filer, and to NBU storage units. Duplication to tape is too heavy on most filers, so this is seldom a feasible option when talking about many TB and millions of files.

Your other option, if your setup allows, is to use block-level backup of NetApp volumes. More specifically, SM2T (SnapMirror to tape) is one of the most efficient methods to send a lot of data to tape. This method is not geared to single-file restore scenarios though, as you would have to restore a whole volume to get the files you want.

I would recommend a 2nd and 3rd copy of your data on other NetApp filers, with data movement perhaps controlled by NBU. You can then spread your tape backups (if still a requirement) across multiple filers.


SOLUTION

Both of those responses were what I expected.  As we have learned in the backup world, you can run the most efficient backups possible but nothing matters if you can't restore it.  Everyone is thinking, "How can we back this up faster?" with no one considering, "How would we restore this quickly?"

Thanks again.

Omar Villa

The number of files is not an issue for NDMP. What I would look at is the VTLs: I'm sure they are shared with all your other clients over the same pipe. Customers usually think that going to disk is faster, but not if you have hundreds of clients running their backups at the same time.

1. Confirm that SIZE_DATA_BUFFERS_NDMP is there and set to 256KB; small blocks can slow down throughput.

2. No multiplexing to the VTL; be sure you remove it if it is there.

3. What is the total bandwidth of your VTL? If it is Falconstor, it is normally 600MB/sec; if it is DataDomain, around 1200 when configured as a VTL. You can use Falconstor tools such as "ismon -d 1" to see the actual throughput.

4. Using NBU Replication Director is a great option, but it costs money.

Share what you find and maybe we can help you to do a little tuning.
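Item 1 above can be checked with a few lines of script. This is only a sketch: it assumes SIZE_DATA_BUFFERS_NDMP is a plain text touch file holding the buffer size in bytes, and the directory below is an assumed default location on a Windows server, so adjust it for your install. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed default location on a Windows NetBackup server; adjust as needed.
DEFAULT_CONFIG_DIR = Path(r"C:\Program Files\Veritas\NetBackup\db\config")
EXPECTED_BYTES = 256 * 1024  # 256KB, the size suggested in item 1


def read_buffer_size(config_dir: Path,
                     name: str = "SIZE_DATA_BUFFERS_NDMP") -> Optional[int]:
    """Return the integer stored in the touch file, or None if absent/invalid."""
    path = config_dir / name
    if not path.is_file():
        return None
    try:
        return int(path.read_text().strip())
    except ValueError:
        return None


def describe(config_dir: Path, expected: int = EXPECTED_BYTES) -> str:
    """Summarise whether the configured NDMP buffer size matches expected."""
    size = read_buffer_size(config_dir)
    if size is None:
        return "SIZE_DATA_BUFFERS_NDMP missing or not a number"
    if size == expected:
        return f"OK: {size} bytes"
    return f"Found {size} bytes, expected {expected}"


if __name__ == "__main__":
    print(describe(DEFAULT_CONFIG_DIR))
```

Run it on the server that writes to the drives; it only reads the file and changes nothing.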


Omar Villa

Netbackup Expert

Twitter: @omarvillaNBU

Correction.  I do see the information for the NDMP backups and I am using 30 data buffers and 256k data buffer size.

jim dalton

Nice problem.

I think filer-filer replication is about the size of it.

Forget sm2tape... or indeed try it: you can use NetBackup to do smtape. A search should show that you need to provide it as a directive in the file selection.

Then forget about it: slow, despite what you might think... block level, ooh nice. Not for me; slowest thing I ever encountered.

So your filers can replicate and you can get at individual files. It's not a NetBackup problem if you can keep sufficient snapshots.

And if you need to put it on another medium, you and I both need to know how to force our filer (NetApp in my case too) to devote more CPU to NDMP. I have a secondary filer and I do my backups (CIFS, VMs) off it.

VMs = quick, tape speed. 2 drives (LTO4) -> 250M/s aggregate.

CIFS = not nearly so, cruising the inodes and so forth, millions of files,

but the filer never breaks into a sweat. If we knew how to force NDMP to be much higher in the scheduling, I think it could go much faster. It spends a vast amount of time with the tape static while it works out what it needs to do. It's a waste of a filer, really!


jim dalton

07/28/2013 18:36:58 - Info ndmpagent (pid=25848) w01: DUMP: Date of last level 0 dump: the epoch.
07/28/2013 18:36:58 - Info ndmpagent (pid=25848) w01: DUMP: Dumping /vol/xxxxx/appdata to NDMP connection
07/28/2013 18:36:59 - Info ndmpagent (pid=25848) w01: DUMP: mapping (Pass I)[regular files]
07/28/2013 19:30:18 - Info ndmpagent (pid=25848) w01: DUMP: mapping (Pass II)[directories]
07/28/2013 19:45:06 - Info ndmpagent (pid=25848) w01: DUMP: estimated 47804807 KB.
07/28/2013 19:45:06 - Info ndmpagent (pid=25848) w01: DUMP: dumping (Pass III) [directories]

07/28/2013 19:45:45 - Info ndmpagent (pid=25848) w01: DUMP: Sun Jul 28 19:45:45 2013 : We have written 8196 KB.

This is a million-file vol. As you can see, there's over an hour spent in the mapping passes before any data moves, so the number of files matters a great deal... and it ends up pushing data at 4M/s overall.
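To put numbers on the mapping delay, the pass markers in the log above can be timed with a short script. This is a sketch that assumes the job-details line format shown above (timestamp first, then "DUMP: ... (Pass N)"); the three sample lines are copied from that log and the function names are illustrative.

```python
import re
from datetime import datetime

LOG = """\
07/28/2013 18:36:59 - Info ndmpagent (pid=25848) w01: DUMP: mapping (Pass I)[regular files]
07/28/2013 19:30:18 - Info ndmpagent (pid=25848) w01: DUMP: mapping (Pass II)[directories]
07/28/2013 19:45:06 - Info ndmpagent (pid=25848) w01: DUMP: dumping (Pass III) [directories]
"""

# Leading timestamp, then the Roman numeral of the pass that is starting.
PASS_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) .*DUMP: \w+ \(Pass (\w+)\)"
)


def pass_starts(log_text):
    """Return [(pass_name, start_time), ...] in log order."""
    starts = []
    for line in log_text.splitlines():
        m = PASS_RE.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S")
            starts.append((m.group(2), when))
    return starts


def pass_durations(log_text):
    """Seconds spent in each pass that is followed by another pass marker."""
    starts = pass_starts(log_text)
    return {
        name: (nxt - begin).total_seconds()
        for (name, begin), (_, nxt) in zip(starts, starts[1:])
    }


if __name__ == "__main__":
    for name, seconds in pass_durations(LOG).items():
        print(f"Pass {name}: {seconds / 60:.1f} min")
```

The last pass in a log has no following marker, so it is left out rather than guessed at.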

Break it down into smaller lumps...that gets ugly.

Jim

I'm not sure what I'm doing differently, Jim, but I'm not seeing the same delay.  I hope you can see this. [Attached image: Capture.JPG]

If not . . .

7/26/2013 6:01:35 PM fas1a Info 1883280 Backup started backup job for client fas1a, policy FS01_CORP, schedule Weekly on storage unit NDMP-FS01_CORP
7/26/2013 6:01:47 PM fas1a Info 1883280 Media Device begin writing backup id fas1a_1374879695, copy 1, fragment 1, destination path NDMP-FS01_CORP
7/26/2013 6:01:48 PM fas1a Info 1883280 Backup fas1a: DUMP: creating "/vol/fs01corp_d/../snapshot_for_backup.17281" snapshot.
7/26/2013 6:01:49 PM fas1a Info 1883280 Backup fas1a: DUMP: Using Full Volume Dump
7/26/2013 6:01:51 PM fas1a Info 1883280 Backup fas1a: DUMP: Date of this level 0 dump: Fri Jul 26 18:01:48 2013.
7/26/2013 6:01:51 PM fas1a Info 1883280 Backup fas1a: DUMP: Date of last level 0 dump: the epoch.
7/26/2013 6:01:51 PM fas1a Info 1883280 Backup fas1a: DUMP: Dumping /vol/fs01corp_d to NDMP connection
7/26/2013 6:01:52 PM fas1a Info 1883280 Backup fas1a: DUMP: mapping (Pass I)[regular files]
7/26/2013 6:16:58 PM fas1a Info 1883280 Backup fas1a: DUMP: mapping (Pass II)[directories]
7/26/2013 6:17:13 PM fas1a Info 1883280 Backup fas1a: DUMP: estimated 11100337208 KB.
7/26/2013 6:17:13 PM fas1a Info 1883280 Backup fas1a: DUMP: dumping (Pass III) [directories]
7/26/2013 6:21:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:21:58 2013 : We have written 1084166 KB.
7/26/2013 6:26:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:26:58 2013 : We have written 2473980 KB.
7/26/2013 6:31:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:31:58 2013 : We have written 3567005 KB.
7/26/2013 6:36:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:36:58 2013 : We have written 4561776 KB.
7/26/2013 6:41:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:41:58 2013 : We have written 5955849 KB.
7/26/2013 6:46:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:46:58 2013 : We have written 7417926 KB.
7/26/2013 6:51:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:51:58 2013 : We have written 8728638 KB.
7/26/2013 6:56:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:56:58 2013 : We have written 10061973 KB.
7/26/2013 7:12:15 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 19:12:06 2013 : We have written 10657188 KB.
7/26/2013 7:12:59 PM fas1a Info 1883280 Backup fas1a: DUMP: dumping (Pass IV) [regular files]
7/26/2013 7:17:06 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 19:17:06 2013 : We have written 30614878 KB.
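The "We have written" lines make it easy to see how the rate changes between passes. Here is a sketch that takes the filer's own timestamp and KB counter from each progress line and works out KB/s per interval. It assumes the line format shown above and English day/month names; the four sample lines are copied from the log and the names are illustrative.

```python
import re
from datetime import datetime

LOG = """\
7/26/2013 6:21:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:21:58 2013 : We have written 1084166 KB.
7/26/2013 6:26:58 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 18:26:58 2013 : We have written 2473980 KB.
7/26/2013 7:12:15 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 19:12:06 2013 : We have written 10657188 KB.
7/26/2013 7:17:06 PM fas1a Info 1883280 Backup fas1a: DUMP: Fri Jul 26 19:17:06 2013 : We have written 30614878 KB.
"""

# Filer-side timestamp and cumulative KB counter from a progress line.
PROGRESS_RE = re.compile(
    r"DUMP: (\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) : We have written (\d+) KB"
)


def progress_samples(log_text):
    """Return [(time, cumulative_kb), ...] for each progress line."""
    samples = []
    for line in log_text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            when = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            samples.append((when, int(m.group(2))))
    return samples


def interval_rates_kb_s(log_text):
    """KB/s between each pair of consecutive progress lines."""
    samples = progress_samples(log_text)
    rates = []
    for (t0, kb0), (t1, kb1) in zip(samples, samples[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds > 0:
            rates.append((kb1 - kb0) / seconds)
    return rates


if __name__ == "__main__":
    for rate in interval_rates_kb_s(LOG):
        print(f"{rate:10.0f} KB/s  ({rate / 1024:6.1f} MB/s)")
```

On these sample lines the intervals during Pass III come out at a few MB/s, and the first interval of Pass IV is more than ten times faster.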

jim dalton

I'm genuinely interested in this delay, or lack thereof, as I would love to be shot of it in my environment. Which version of ONTAP are you running, rsamora?