Video Screencast Help
Symantec to Separate Into Two Focused, Industry-Leading Technology Companies. Learn more.

MS-Windows Backup to 5230 Netbackup Appliance - Using Accelerator and OpenStorage Direct (CSD)

Created: 10 Sep 2013 | 4 comments
SYMAJ's picture

I have a situation whereby a Physical Windows 2003 R2 Standard SP2 server will backup fine with a Differential Incremental schedule but fails consistently with a FULL schedule - status 87.

It has previously backed up OK on a FULL schedule to the 5230 several times, but all of a sudden fails every time a FULL backup is attempted.  The backup appears to work successfully and backs up around 330GB of the 624GB total before failing.  Log below:

Client replaced with clientname

Master Server replaced with Master

10/09/2013 11:06:13 - Info nbjm(pid=3704) starting backup job (jobid=4323) for client clientname, policy Daily_18.30__Weekly_00.01__Monthly_00.00__Physical_Windows, schedule MONTHLY 
10/09/2013 11:06:13 - Info nbjm(pid=3704) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=4323, request id:{88A16AE2-9817-42E1-8DA8-7A1122A7D995}) 
10/09/2013 11:06:13 - requesting resource DUB-5230-DeDup
10/09/2013 11:06:13 - requesting resource Master.NBU_CLIENT.MAXJOBS.Clientname
10/09/2013 11:06:13 - requesting resource Master.NBU_POLICY.MAXJOBS.Daily_18.30__Weekly_00.01__Monthly_00.00__Physical_Windows
10/09/2013 11:06:13 - granted resource Master.NBU_CLIENT.MAXJOBS.Clientname
10/09/2013 11:06:13 - granted resource Master.NBU_POLICY.MAXJOBS.Daily_18.30__Weekly_00.01__Monthly_00.00__Physical_Windows
10/09/2013 11:06:13 - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=dp_disk_dub-5230;Path=PureDiskVolume;StorageServer=dub-5230;MediaServer=dub-5230
10/09/2013 11:06:13 - granted resource DUB-5230-DeDup
10/09/2013 11:06:13 - estimated 0 Kbytes needed
10/09/2013 11:06:13 - Info nbjm(pid=3704) started backup (backupid=Clientname_1378807573) job for client Clientname, policy Daily_18.30__Weekly_00.01__Monthly_00.00__Physical_Windows, schedule MONTHLY on storage unit DUB-5230-DeDup
10/09/2013 11:06:13 - started
10/09/2013 11:06:14 - started process bpbrm (32523)
10/09/2013 11:06:18 - Info bpbrm(pid=32523) Clientname is the host to backup data from    
10/09/2013 11:06:18 - Info bpbrm(pid=32523) reading file list from client       
10/09/2013 11:06:18 - Info bpbrm(pid=32523) accelerator enabled          
10/09/2013 11:06:21 - connecting
10/09/2013 11:06:22 - Info bpbrm(pid=32523) starting bpbkar on client        
10/09/2013 11:06:22 - connected; connect time: 00:00:01
10/09/2013 11:06:24 - Info bpbkar(pid=2856) Backup started          
10/09/2013 11:06:24 - Info bpbrm(pid=32523) bptm pid: 32530         
10/09/2013 11:06:24 - Info bptm(pid=32530) start           
10/09/2013 11:06:24 - Info bptm(pid=32530) using 1048576 data buffer size       
10/09/2013 11:06:24 - Info bptm(pid=32530) using 64 data buffers        
10/09/2013 11:06:24 - Info dub-5230(pid=32530) Using OpenStorage client direct to backup from client Clientname to dub-5230 
10/09/2013 11:06:26 - begin writing
10/09/2013 11:07:11 - Info bpbkar(pid=2856) change journal NOT enabled for <C:\>      
10/09/2013 11:09:36 - Info bpbkar(pid=2856) change journal NOT enabled for <D:\>      
10/09/2013 11:19:42 - Critical bptm(pid=32530) sts_close_handle failed: 2060002 memory allocation       
10/09/2013 11:19:42 - Critical bptm(pid=32530) cannot write image to disk, media close failed with status 2060002 
10/09/2013 11:19:42 - Info dub-5230(pid=32530) StorageServer=PureDisk:dub-5230; Report=PDDO Stats for (dub-5230): scanned: 330010997 KB, CR sent: 303631 KB, CR sent over FC: 0 KB, dedup: 99.9%
10/09/2013 11:19:42 - Critical bptm(pid=32530) sts_close_server failed: error 2060005 object is busy, cannot be closed  
10/09/2013 11:19:46 - Info bptm(pid=32530) EXITING with status 87 <----------       
10/09/2013 11:19:47 - Info bpbkar(pid=2856) done. status: 87: media close error      
10/09/2013 11:19:47 - end writing; write time: 00:13:21
media close error(87)
termination requested by administrator(150)  - I CANCELLED THE JOB AT THIS POINT AS IT WENT INTO INCOMPLETE STATUS
10/09/2013 11:20:32 - Info bpbrm(pid=2822) Starting delete snapshot processing        
10/09/2013 11:20:32 - Info bpfis(pid=0) Snapshot will not be deleted       
10/09/2013 11:20:51 - Info bpfis(pid=3480) Backup started          
10/09/2013 11:21:16 - Error bpbrm(pid=2822) from client Clientname ERR - Delete bpfis state from Master failed. status = 71
10/09/2013 11:21:21 - Info bpfis(pid=3480) done. status: 0         
10/09/2013 11:21:21 - end operation
10/09/2013 11:21:21 - Info bpfis(pid=0) done. status: 0: the requested operation was successfully completed   

The above job was run during a quiet time on the backup environment with only two other jobs running from different clients.

I have reviewed a number of forum posts around this but am confused as to where to look.  The above is stating memory allocation issues - on the client or on the media server (5230 in this case) ?  I have an almost identical client to this backing up successfully every time (FULL or DIFF), whwereas this one fails a FULL every time now (DIFF OK).

As can be seen I am using ACCELERATOR and Client Side DeDup (OpenStorage Client Direct) here, and all was well with great performance until now.

Is this an issue with previous backup images catalogued on the appliance ?  I have seen similar before where the same client has had upper and lower case backup images and a logiical link had to be created on the appliance which resolved the issue.  Could this be someway related ?

AJ

Operating Systems:

Comments 4 CommentsJump to latest comment

Mark_Solutions's picture

It is not going to be the case sensitivity as that would cause it to fail straight away - it would not be able to write anything.

I would expect the memory issue to be related to the appliance itself so you need to look there first.

Seach the storaged.log for "out of memory"

It may also be that the system just runs out of threads and cannot do what it needs to during an accelerator backup

To resolve this check your /disk/etc/puredisk/contentrouter.cfg file

Look for the WorkerThreads value (line 52  - 5% of the way down the file). The default is 64 but when using Accelerator it needs to be 128

If you change this you will need to re-start its services or re-boot it to take effect.

Hope this helps

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someones advice has solved your issue - and please bring back the Thumbs Up!!.

SYMAJ's picture

Mark,

Thanks - I will look into this in the morning.  I am puzzled as to why the DIFF-INCR always works and the FULL always fails ??

AJ.

D.Flood's picture

I see more than one drive listed so it might be running out of memory or resources on the sending side trying to hold the entire fingerprint cache for all the drives at once. 

You might try checking the "Allow Multiple Data Streams" box but also set the "Limit Jobs Per Policy" to One so that only one drive gets backed up at a time.

 

 

SYMAJ's picture

Multiple Streams is NOT checked and there is only one job running from this policy (during testing). I will check the client as I have other Windows physical clients with much bigger disks / more data running ok.
As you can see it processes 330 GB before failing, sending 303MB. Why would the client run out of memory as surely it is simply sending the data from the accelerator change log (although CSD is also taking place I guess).

AJ