bplabel -erase didn't actually leave assigned tapes alone. Ouch.
This happened. It is kind of a sad story with a moral lesson and a warning to all.
Netbackup 7.0.
So there was this media pool with some 50+ VTL tapes.
(Yes, I know, VTLs are a dying breed.)
Most of the tapes in the pool were assigned and full.
The rest were unassigned because all backup images inside them had expired.
I wanted to do a quick erase on each of the unassigned tapes in the pool.
This is a common practice for VTLs because only then would an unassigned tape actually has its data deleted inside the VTL.
Some VTLs do this automatically but our's isn't one that does.
For more information about erasing unassigned VTL tapes and why it is necessary:
https://www-secure.symantec.com/connect/idea/optio...
https://www-secure.symantec.com/connect/forums/can...
https://www-secure.symantec.com/connect/forums/scr...
Anyhow, the Netbackup admin console GUI doesn't actually allow you to erase an assigned tape,
so I thought its command-line equivalent would behave the same.
If you do this to an assigned tape:
bplabel.exe -erase -m vt0001 -d hcart3 -o -p VTL_pool -host nbumaster
It will say "Media is assigned, and will not be labeled".
And if you check the Activity Monitor, you will find a Media Erase job that says the following:
Info bptm(pid=1076) INF - Waiting for mount of media id VT0001 on server nbumaster for reading.
mounted; mount time: 00:00:07
Info bptm(pid=1076) EXITING with status 0 <----------
Error bplabel(pid=3196) Media is assigned, and will not be labeled
end Erase; elapsed time: 00:00:14
cannot overwrite media, data on it is protected(168)
So it won't let you erase assigned tapes by mistake.
Sounds reassuring.
If you still don't believe it, check the catalog, you will find that all backup images are still there on the tape.
So bplabel -erase does protect you from erasing assigned media.
Except when it doesn't.
What would actually happen - at least in my case - is that even though Netbackup would keep its word on not touching the tape,
the VTL would somehow still receive the erase command and delete all data on the tape. Magic.
You'll know when you try to restore some files from the tape and couldn't.
You'll get some kind of read timeout error.
The Netbackup catalog-data about the backup images on the tape becomes irrelevant,
because the data will not be on the tape anymore.
Now, when you are ready for this:
I scheduled and ran a daily-recurring script that does bplabel -erase against all tapes in the VTL media pool.
It should leave the assigned tapes alone right?
The script didn't get to run the second time when we figured out what happened.
In short: Netbackup kept its promise (apparently) while the VTL destroyed all.
I couldn't even import the tapes because our VTL apparently does a very good job in securely deleting the data associated with the tapes when it is told to erase them.
A stupid destructive script was clearly ran against the media pool,
and Netbackup had clearly skipped all assigned tapes and protected all catalog-data about the backup images on the tapes.
Our VTL vendor says that there is nothing they could do about it because the VTL was clearly told to erase the tapes.
Only that it shouldn't have been told to.
We haven't logged a support case with Symantec yet, because we're pretty sure that "the VTL shouldn't have done it!".
I know, we probably should. But not before we are sober.
Comments 14 Comments • Jump to latest comment
What would actually happen - at least in my case - is that even though Netbackup would keep its word on not touching the tape,
the VTL would somehow still receive the erase command and delete all data on the tape. Magic.
Well, how did you check if the VTL recieve the command?
I could not find any clue in messages you post. If NetBackup did not send erase command, who did it?
Have you already tried to restore after this test?
Authorized Symantec Consultant(ASC) Data Protection in Tokyo, Japan
In my post:
It was actually the restore errors that clued us on checking the VTL.
In the VTL admin interface, we checked all the tapes that the bplabel -erase command ran against.
All of them have had their data deleted, and disk space reclaimed. Unrecoverable.
Meanwhile, Netbackup still happily tells me that all images on the assigned tapes are fine, despite that they were bplabel -erase'd.
Checked both the catalog and the media pool, and Netbackup did not lie.
As said above, we actually checked from the VTL's own interface. All tapes in that pool, assigned or not, have had their data deleted and disk space reclaimed.
I feel your pain....
To be fair, and also to be sure, we tested the whole scenario again, using dummy data.
The same thing happened.
Before bplabel -erase, we could restore the data.
After bplabel -erase, Netbackup would boldly claim that it didn't touch assigned tapes, while the VTL would show that the tape data has just been deleted.
Test a restore now, you could still see all the files in BAR because the catalog image db data about the images is untouched, but when you actually go ahead with the restore, you will get read timeout errors, because as we know, the tape is empty now, but Netbackup doesn't know it.
Sorry, I could not make sense from your first post.
What you would to say is that NetBackup erase tapes althought it looks like failed when you run "bplabel -erase" with "-o" option. Right?
Now, I have tested with NetBackup 7.5 and mhvtl, but backups are not erased and can be restored.
I'm not sure why backups are erased in your environment, but if you have to keep data, you should not use "force" option like "-o".
Authorized Symantec Consultant(ASC) Data Protection in Tokyo, Japan
No worries, I couldn't make sense of any of this at all either, especially now that you are telling me that it doesn't behave that way in 7.5 and mhvtl.
The -o option was used because I run bplabel in a scheduled daily-recurring non-interactive script.
But then, even if I drop the -o and manually press 'yes' to all the promots, I bet the same thing would still happen.
As I said in the top post:
I admit to part of the responsibility of doing something risky to our backed up data. But let me add that it wasn't considered risky at the time because bplabel -erase was proven to skip assigned media.
And it does! But my data is gone. This is so confusing.
What OS are you on ?
I've just run your command on my system (not VTL) but bear with me ...
Eg.
I have two tapes :
R0TP00 - Unassigned
R0TP01 - Assigned with data on it
I ran :
bplabel -erase -m <mediaid> -d 4mm -o -p NetBackup -host womble
Both times the tape was loaded, but, only R0TO00 was 'erased'. For R0TP01 the message came back saying assigned.
Both times, I used a script that used dtrace to 'trace' the sg scsi ioctls. Comparing the outputs, this was the only reall difference, that is , these lines only appeared when the tape was erased for sure ...
Thank you for spending time testing things.
I just want to share some information about this potential pitfall.
We have accepted that whatever happened to our data, it cannot be reversed, and that I don't believe anyone could do anything to help now.
We are on Win 2003 SP2 32bit, connected to a VTL through FC HBAs.
I'm not familiar of anything like DTrace on Windows, and wouldn't know what to make out of the data that such a tool captures.
I'm struggling ... there is no dtrace on windows ... you might need to log a call ....
From the above, I'm actually slighthly wrong , the scsi caomnds sent are just these ...
This is when the tape really was labelled :
The first hex number in the line is the 'command'
pid=18135 SCSI CDB bytes: 0x16, 0x0, 0x0, 0x0, 0x0, 0x0
the 12th line down is :
I fully follow the concept and logic behind your method, and understand the results.
Yea, Windows can be annoying. But I'm not worrying about the pass thru driver thing.
I understand what happened in our environment shouldn't have, because as you have shown, that Netbackup shouldn't send that scsi command to assigned tapes.
I just want to tell people not to do this against a media pool, like I did.
Or, at least script it so that it filters for the unassigned tapes before doing bplabel -erase. (I.e., Script the skipping logic yourself, don't let bplabel do it.)
*** NOTE ***
Cannot edit this post for some reason, I'm not quite right as it happens, so have continued below ...
Martin
man page says
-o Unconditionally overwrites the selected media ID.
and it sounds like it did
will restore -- where there is a Will there is a way
And here is the answer ...
A look through eTracks led to this ..
... which led me to this ...
http://www.symantec.com/docs/TECH144380
The GUI checks if the media is assigned, the command line doesn't ...
So in summary, proir to NBU 7.5 an erase run from the command line will do exactly that, even on assigned media.
"When executed via the command line, the bplabel command will not check the assignment status of the tapes being labeled or erased. "
This fairly major flaw has been fixed at 7.5 (which is what I test on).
I would argue that you shouldn't run a erase command on an assigned media BUT, fully understand that this could happen and you would expect the software to protect against this, esp. if the GUI does (well, mostly as explained in the TN).
Big thanks to RLeon for taking the time to explain his issue, I certainly learnt something I didn't know today. I also appreciate his calm, professional approach.
Martin
Martin:
I started this thread not intending to find a solution but you found me one anyway! Thanks!
It is unfortunate that the incident happened in a nbu domain that just so happened to be running 7.0, not 7.5.
Oh if only we had upgraded it last month when we have the resources to. But welp.
On the bright side, someone else lost 200 tapes! I only lost around 50...Heh.
This is no laughing matter.
wr:
I did read up on bplabel -o from both the 7.0 and 7.5 command guides.
If only I had interpreted it as something more than just a silent mode.
But then again, in 7.5, even with -o it does still skip over and ignore assigned medias, so I guess the victim count must have been pretty high at some point prior to 7.5.
Thanks all,
RLeon
....... For anyone who is here with the same problem, the solution marked here is not the kind that could rescue your data. Get over it. I marked it to be some kind of monument...
Pleased to help - it was bugging me ...
I think clearly, although I 'could' argue that NBU was working as designed, that part is a bad design and as mentioned, has been changed.
Normally we 'know' about these potentail issues, but that one was new to me - perhaps not that many people run the erase command with a VTL from the command line.
Regards,
Martin
Would you like to reply?
Login or Register to post your comment.