NBU Encryption issues - Quantum and Symantec

Created: 08 Nov 2013 • Updated: 17 Jan 2014 | 43 comments
This issue has been solved. See solution.

I could use some advice here - currently have cases open with the vendors but wanted to get some outside perspectives as well.

 

We are using NetBackup for a data migration, nothing fancy: back up the current data at site A, ship the tapes to site B and restore.

Info for both sites:

Site A: Spectra T120, LTO5 drives, NBU 7.0 (I could not convince them to upgrade, and since we are doing catalog recoveries we are stuck at 7.0)

Site B: Quantum i500, LTO5 drives, NBU 7.0

 

It's been problematic. Since day one the restores have been painfully slow: with LTO5 drives and 2-8 Gb connections, I was seeing restore speeds no better than 5 MB/sec. There was a lot of troubleshooting done, cases opened and closed, optimizing, etc., but nothing seems to break that barrier of horrible throughput. We tested outside of NBU: network and disk speeds are just fine, so there is no reason those restores should perform so poorly.

Then I had the idea to rule out the incoming tapes by trying a local backup and restore with fresh tapes. Bingo! Decent speeds, no errors, etc. That is, until I tried to use KMS: writing encrypted backups fails with errors like this:

 

 Error bptm(pid=2272) FREEZING media id <Media ID>, Encryption unavailable for an ENCR pool 

Now this is where it gets odd: KMS has been replicated from site A to site B, and from the beginning there was never any indication that anything was misconfigured. The tapes coming from site A could always be read, and running nbkmsutil on both sites shows identical info. I followed the simple instructions for exporting/importing keys, so no surprises there.

The case I opened with Symantec found errors in bptm that point to the hardware being the issue. Quantum got involved and they are confused as well. Heck, I am confused too: the tapes I am not able to create encrypted backups on are from the same pool we use at our production site, where there are no issues with them. I looked at them personally. They are in fact LTO5 tapes with no damage, and so far six of them give me that ENCR error, yet I can run a non-encrypted backup and restore with them with no problems at respectable speeds.

 

While I wait for Symantec and Quantum to review logs I am still poking around trying to find clues, if anybody has seen anything like this please let me know.

 

 

 

 

 


Will Restore

Found this old technote, which describes the bits used in the bptm log to verify that both the media and the drive support encryption: http://www.symantec.com/business/support/index?pag...

 

Will Restore -- where there is a Will there is a way

mph999

Fairly sure there have been similar cases where KMS restores failed due to a drive firmware issue.

Found it ...

http://www.symantec.com/docs/TECH204085

Now, this may not match your issue, but it at least shows there have been problems in that area, and it is possible you are seeing a slight variation of it.

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
HoldTheLine

Good information, thanks both of you.

The firmware question on the drives (and libraries) came up before, and we knocked that out a week or so ago: both libraries and drives are now at the latest revs. That said, I have not had a chance to test restoring a backup created after the upgrades, so there may be something to that.

HoldTheLine

re: http://www.symantec.com/business/support/index?page=content&id=TECH87444

>>The media can be physically inspected to check the type.

I found this technote early on, and it prompted me to make a personal visit to the data center. I popped all 6 tapes out and looked at them. I didn't see anything that jumped out at me: they are all LTO5 tapes, brand new. I even ran quick erases on them.

Will Restore

On the i500, select Reports > System Information from the web client to confirm tape drive encryption settings.  Sorry, don't know about the other library.

HoldTheLine

Yeah, been down that road; I looked at every Quantum setting there is. Not only is encryption on the library not enabled, it's not even licensed, so there is no proprietary encryption getting in the way.

mph999

Check the basic tuning settings of the OS, e.g. nofiles (should be a minimum of 8192).

We had multiple cases on Linux for a customer where very, very odd things were happening. In the one I was involved with, some of the backup headers on the VTL 'tapes' were in fact a bunch of 0's, which is not good when you try to restore.

Not the same as yours, but you have to agree it's in the same kind of area.

I have absolutely no idea how such a setting could cause such an issue, but once it was changed, all the issues disappeared.

 

Regards,  Martin
 
 
HoldTheLine

>>Check the basic tuning settings of the OS, e.g. nofiles (should be a minimum of 8192).

What/where is this setting?  It's not familiar to me.

 

Thanks

mph999

It depends on the OS. On Solaris it is viewed with ulimit -a and can be set in /etc/system.
Could you confirm the OS involved?

 

Regards,  Martin
 
 
HoldTheLine

>>It depends on the OS. On Solaris it is viewed with ulimit -a and can be set in /etc/system.
>>Could you confirm the OS involved?

 

Oh that would help, wouldn't it! :)

 

All systems involved (both masters and one media server) are Windows Server 2008 R2 Enterprise.

mph999

OK, as far as I know there are no concerns with this on Windows; I think the max is around 16000, which is fine.
This issue is more of a Unix/Linux one, where the default is low and needs increasing to a minimum of 8192.
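On Unix/Linux, one quick way to see the open-files limit a process inherits is Python's standard resource module. This is a minimal sketch, assuming a Unix-like host: it reports the limit of the Python process running it, not the limits the NetBackup daemons were actually started with, and it does not apply to the Windows servers in this thread. The 8192 floor is the minimum suggested above.

```python
import resource

# Floor suggested in this thread for NetBackup hosts on Unix/Linux.
MIN_NOFILE = 8192


def check_nofile(minimum: int = MIN_NOFILE) -> tuple[int, int, bool]:
    """Return (soft, hard, ok) for the open-files limit of *this* process.

    This does not inspect NetBackup's own processes; since child processes
    inherit limits, it only approximates theirs when run from the same
    environment the daemons are started from.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "unlimited", which always satisfies the floor.
    ok = soft == resource.RLIM_INFINITY or soft >= minimum
    return soft, hard, ok


if __name__ == "__main__":
    soft, hard, ok = check_nofile()
    print(f"nofile soft={soft} hard={hard}: {'OK' if ok else 'below ' + str(MIN_NOFILE)}")
```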

 

Regards,  Martin
 
 
jim dalton

There seems to be more than one issue at play here; perhaps you can clarify.

FWIW, having the latest firmware isn't the point; what you need is the latest firmware that works! Been there, done that: backups were clean, but I couldn't restore any data.

So let's deal with encrypted backups. You need to check that encryption is supported for your device and for this version of NB and what the limitations are, check that KMS is configured correctly, check that the policy is configured correctly, and of course check who is doing the encryption, which looks to me like T10 data-at-rest encryption, the same as I use. I would focus on the firmware and NetBackup levels.

You need to explain the comment about "from the beginning there were never any indications that anything was misconfigured; the tapes coming from site A were always able to be read, and running nbkmsutil on both sites shows identical info." Simply put: what's changed? It used to work, now it doesn't.

We'll get there...

Jim

 

HoldTheLine

>>FWIW, having the latest firmware isn't the point; what you need is the latest firmware that works! Been there, done that: backups were clean, but I couldn't restore any data.

I agree. The first time we addressed this, the Symantec tech noticed during troubleshooting that the drive firmware was very old, so they suggested we upgrade it. So we did, and now it's at the latest, but like you say, the most recent is not always the best.

 

>>You need to explain the comment about "from the beginning there were never any indications that anything was misconfigured; the tapes coming from site A were always able to be read, and running nbkmsutil on both sites shows identical info." Simply put: what's changed? It used to work, now it doesn't.

You are right, I do need to explain that; I may have muddled things up here. It never really did work, at least not very well. What I meant by "from the beginning" was the restores: we have so much data to move that since this environment was built, all it has been doing is restores. Each client was taking 3-6 days to complete, so they were running around the clock.

That is what I meant by "from the beginning...". I may be wrong, but my understanding of KMS was this: if it was not configured correctly at both sites, would we be able to read tapes at all? If I had been asked that question 2 months ago, I would have said "Heck no, it's an all-or-nothing deal. If KMS is not set up, you will not be able to read anything at all."

At this point I am not so sure. We can see literally no other reason these restores should trickle out at 2-5 MB/sec when the network, disk and tape are all capable of much greater performance.
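To put those numbers in perspective, here is a back-of-the-envelope sketch. It assumes "mb/sec" means megabytes per second, uses a hypothetical 1 TB client, and takes 140 MB/s as LTO-5's published native (uncompressed) rate; real restores also depend on compression, multiplexing and positioning.

```python
def restore_hours(size_gb: float, rate_mb_s: float) -> float:
    """Hours to move size_gb at a sustained rate_mb_s, using decimal units."""
    size_mb = size_gb * 1000  # decimal GB -> MB, as tape vendors quote rates
    return size_mb / rate_mb_s / 3600


# Hypothetical 1 TB (1000 GB) client at the rates mentioned in this thread,
# plus LTO-5's native rate for comparison.
for rate in (2, 5, 140):
    print(f"{rate:>4} MB/s -> {restore_hours(1000, rate):6.1f} h")
```

At 2 and 5 MB/s that works out to roughly 139 and 56 hours (about 5.8 and 2.3 days) for the hypothetical 1 TB client, against about 2 hours at the drive's native rate.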

 

So what has changed:

 

- LTO5 firmware updates (didn't affect restore speeds)

- The backup tests I have done recently were never done before, because we were concentrating on troubleshooting the performance of the restores. Once I had a lull it occurred to me to try local backups, so that is a very recent development.

I am just as confused as anybody by all this ...

 

 

jim dalton

I would agree about KMS: if the data coming out of the restore are unencrypted, then decryption is working, and what's left is a performance issue.

But that doesn't explain your log snippet: "Encryption unavailable".

Do yourself a favour: on the target master only, find yourself a big fat file of a few GB (an ISO supplied by Symantec, say!), create a policy for it, write it out to tape, restore it from tape to a different location, then compare it with the original and observe.
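The "compare with the original" step can be done with a checksum rather than by eye. A minimal sketch using Python's standard hashlib; this is one way to do the comparison, not a NetBackup feature.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-GB ISO never has to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original: Path, restored: Path) -> bool:
    """True when the restored copy is byte-for-byte identical to the original."""
    if original.stat().st_size != restored.stat().st_size:
        return False  # cheap early exit before hashing gigabytes
    return sha256_of(original) == sha256_of(restored)
```

For example, same_content(Path(r"D:\test\big.iso"), Path(r"E:\restore\big.iso")), where both paths are hypothetical placeholders.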

Bugs and misconfigurations aside, this will be rapid both ways and will be encrypted to tape and decrypted on restore. You might want to check this, i.e. report on the image, or observe during both backup and restore that encryption/decryption is happening.

I would also check the blocking factor on tape, as this can slow things down, but that's easily checked from the backup side: what backup speed do you get? (It's a good idea to increase the block size to 256 KB or more for performance; check what you can use with LTO5.) I don't think this is the issue, as you would have logged it as a backup issue rather than a restore issue. The blocking is (normally) handled automatically on restore, since you can't do anything about it at that point, tape drive issues notwithstanding.

How long have you been running with this setup? Any other odd issues? I ask because name resolution issues can be very detrimental to NetBackup generally.

This sounds like a fun, interesting problem; my rates are very reasonable!

 

Jim

HoldTheLine

>>I would agree about KMS: if the data coming out of the restore are unencrypted, then decryption is working, and what's left is a performance issue.

But the data coming out of the restore is encrypted; at least, KMS is set up and it is using an ENCR_ pool.

>>Do yourself a favour: on the target master only, find yourself a big fat file of a few GB (an ISO supplied by Symantec, say!), create a policy for it, write it out to tape, restore it from tape to a different location, then compare it with the original and observe.

Interesting idea. I assume you mean unencrypted, correct? Well, that's my only option, since any attempt at a new encrypted backup fails :)

>>How long have you been running with this setup? Any other odd issues? I ask because name resolution issues can be very detrimental to NetBackup generally.

The DR site has been configured for about 1 1/2 months. The prod site predates me; as far as I know, since it is still running 7.0, it's been around since 7.0 came out! Before we started this project they were using Spectra's proprietary encryption.

I wonder if we should have tried a backup out there after running a long erase on the media, just in case Spectra left anything weird in the headers.

Of course that doesn't help with the current situation, but something else occurs to me: if the tapes I am trying to use in this 7.0 environment were once upon a time used in a 7.5 encrypted environment, might these be some of the symptoms? I.e. can't encrypt because it's already encrypted via a method that 7.0-era KMS doesn't know about?

 

jim dalton

Interesting point re versions; that could have mileage, and indeed more work for you!

I am not familiar with the on-tape format. dd would be the tool here if you were on Unix; hopefully others can chip in with information on doing this under Windows.

You say the data coming out of the restore is encrypted. How do you know? To clarify my thought: determine that it's encrypted on write and decrypted on read, because once it's back on disk you won't be any the wiser. My drives tell me "encrypting" when I write, and of course the reverse on read. The catalog should also tell you.

For what it's worth, I've done this kind of exercise as part of DR several times and it should work fine.

You are certain the two environments are the same versions/revs/patches?

Jim

mph999

You mentioned earlier that if KMS wasn't configured correctly it wouldn't work at all. I agree: all or nothing. I think it is also 'very unlikely' for the config to give intermittent results (that's my sensitive way of suggesting not a chance ... but with a 0.1% get-out clause in case I'm wrong ... ;0) )

I am wondering if this is something to do with the data held on the internal chip in the cartridge. I don't know if this is used in KMS, but it does hold quite a lot of info about the tape, so there is a possibility it is.

Can you tell us the tape brand? This may be different from what is written on the tape (e.g. Oracle-branded tapes are actually made by Imation, or at least they used to be). If you look in the bptm log, you should see the manufacturer listed; searching for 'man' might narrow down the search.

If they are Fuji, we might be in luck, as I know someone who works there who might be able to confirm a couple of things.

Is there a current Symantec case number, as a matter of interest? Even if not, I can ping an email directly to a few people to see if we can get some more ideas.

To answer Jim's question on 'dd' for Windows: I think we're stuffed there; I am not aware of any third-party equivalent. There is a utility within NBU for Windows, but I can't remember what it's called, and I think it's only useful for positioning. I'll ask about.

 

Regards,  Martin
 
 
HoldTheLine

>>Can you tell us the tape brand? This may be different from what is written on the tape (e.g. Oracle-branded tapes are actually made by Imation, or at least they used to be). If you look in the bptm log, you should see the manufacturer listed; searching for 'man' might narrow down the search.

 

Looks like they are Fuji:

 

 10:08:40.000 [2876.3080] <2> manage_drive_attributes: Reported medium manufacturer [FUJIFILM], sn [EPAMR4GD2U]
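Rather than searching for 'man' by hand, the manufacturer lines can be pulled out with a regular expression. A minimal sketch; the pattern is built from the single bptm line quoted above and is an assumption about the message format, which may differ in other NetBackup versions.

```python
import re
from typing import Iterable, Iterator

# Pattern based on the one bptm line quoted in this thread (an assumption).
MANUFACTURER_RE = re.compile(
    r"^\s*(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*"
    r"Reported medium manufacturer \[(?P<maker>[^\]]*)\], sn \[(?P<sn>[^\]]*)\]"
)


def media_manufacturers(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (time, manufacturer, serial) for each matching bptm log line."""
    for line in lines:
        match = MANUFACTURER_RE.search(line)
        if match:
            yield match.group("time"), match.group("maker"), match.group("sn")


sample = (" 10:08:40.000 [2876.3080] <2> manage_drive_attributes: "
          "Reported medium manufacturer [FUJIFILM], sn [EPAMR4GD2U]")
print(list(media_manufacturers([sample])))
# [('10:08:40.000', 'FUJIFILM', 'EPAMR4GD2U')]
```

Passing an open bptm log file object instead of the one-line list scans the whole log the same way.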

 

There is a case open; I am not sure if it would be kosher to post it publicly. Would it? The tech I am working with is pretty good, so it's not like I want to bash him or anything :)

HoldTheLine

>>If they are Fuji, we might be in luck, as I know someone who works there who might be able to confirm a couple of things.

We are seeing some interesting things here: the LTO5s that we cannot write to are all Fuji. Soon I will have some HP LTO5s to test. It's sort of a shot in the dark, but who knows, maybe that Fuji EEPROM is causing the libraries some problems. If you have any inside information, I am all ears :)

I was able to put one of the LTO5s that failed in the i500 into our i6000, and it wrote encrypted backups with no problems.

The further we get into this, the more it looks like the tape. We will know more over the next couple of days.

Will Restore

>>since any attempt at a new encrypted backup fails :)

Wait a sec...  Does this mean you can't write a new tape? 

HoldTheLine

>>Wait a sec... Does this mean you can't write a new tape?

Sorry if it's confusing. I am still able to write to new tapes, just not encrypted: if I set the volume pool to anything but ENCR_, it works. As soon as I try to write to the encrypted pool that matches the keys in KMS, I get the encryption errors.

jim dalton

WR is confused, me too! It's catching.

I think we all need a rundown of what does what where, followed by a stiff drink.

I had another idea: if the restore does work, is there a chance that you have a monumentally large MPX setting combined with a huge number of small files? This scenario could slow things down majorly, but it would need to be something like MPX 100 and sub-1 KB files. I'm guessing at the numbers, but both slow things down, so in combination you'd get the (un)desired effect.

Jumping ahead a bit, tell us about the lifecycle of a single large file from backup through to restore, preferably carried out in isolation from everything else.

Jim 

HoldTheLine

>>I had another idea: if the restore does work, is there a chance that you have a monumentally large MPX setting combined with a huge number of small files?

At first the backups were configured with an MPX of 8, which was my first suspicion when the restores were going so slowly. Since then it has been knocked down to 2, about as low as it can go in production.

 

Good thought, MPX can be killer on restores but for this particular issue it has already been addressed.
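To see why MPX can hurt restores, here is a deliberately simplified model: it assumes streams of equal size interleaved round-robin one block at a time, and a restore that reads sequentially through every interleaved block until the wanted stream is complete. It illustrates the idea only; it is not NetBackup's actual on-tape layout or restore logic, and it ignores small-file overhead and positioning.

```python
def useful_fraction(mpx: int, blocks_per_stream: int) -> float:
    """Share of blocks read from tape that belong to the stream being restored.

    Simplified model (an assumption, not NetBackup's real on-tape layout):
    `mpx` equal-sized streams are interleaved round-robin one block at a time,
    and the restore reads sequentially from stream 0's first block to its last.
    """
    if mpx < 1 or blocks_per_stream < 1:
        raise ValueError("mpx and blocks_per_stream must be >= 1")
    # Stream 0 sits at positions 0, mpx, 2*mpx, ...; its last block is at
    # (blocks_per_stream - 1) * mpx, so this many blocks pass under the head.
    blocks_read = (blocks_per_stream - 1) * mpx + 1
    return blocks_per_stream / blocks_read


for mpx in (1, 2, 8):
    print(mpx, round(useful_fraction(mpx, 10_000), 3))
# 1 1.0
# 2 0.5
# 8 0.125
```

Under these assumptions, going from MPX 8 to 2 raises the useful share of data read from about 1/8 to about 1/2, which is why MPX was a reasonable first suspect. By the same model, MPX 2 on its own would not account for restores crawling at a few MB/s on an LTO5 drive.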

HoldTheLine

Update: I was finally able to get a good encrypted backup, using the tapes that were sent from the site with the T120. Still investigating why we can write to those tapes and no others; it might be the barcodes.

The barcodes for the media that work are 9 characters in this format:

 

#######LA

 

Where # = a digit

L is static

A = alpha

 

So an example would be

2880248LA

The only way to get NBU to work with them is to set up a media ID generation rule to chop off the first 2 characters and the last character, so the ID above would show up in NBU as 80248L

 

The tapes that I was never able to write to have a format of S51234L5 and would get read in as S51234
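The string arithmetic behind the two barcode formats can be sketched as follows. The function names are hypothetical, and this only illustrates what happens to the label; it is not NetBackup's media ID generation rule syntax. The six-character default is inferred from the S51234L5 example above.

```python
def default_media_id(barcode: str) -> str:
    """Behaviour inferred from the thread: keep the first six characters."""
    return barcode[:6]


def trimmed_media_id(barcode: str, drop_front: int = 2, drop_back: int = 1) -> str:
    """Hypothetical emulation of the custom rule: drop 2 leading, 1 trailing char."""
    core = barcode[drop_front:len(barcode) - drop_back]
    if not core or len(core) > 6:
        raise ValueError(f"{barcode!r} does not yield a 1-6 character ID: {core!r}")
    return core


print(default_media_id("S51234L5"))   # S51234
print(trimmed_media_id("2880248LA"))  # 80248L
```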

Sort of scratching my head here but we are not done looking into it, especially since we have a lot of data to move and need to make sure we get media that the T120 can write to and the i500 can read from....

 

jim dalton

I'm struggling to believe it's the barcode rules. Those determine how the robot recognises media, picks and mounts them, and verifies the labels. If the media weren't the right media they wouldn't be picked or mounted, yet media have been selected, so we are beyond that point... but I'm happy to be enlightened!

I feel there is something else yet to be revealed.

Jim

HoldTheLine

I agree; it seems like either a barcode is readable or it's not. The issues we are seeing are just bizarre. Quantum took it to IBM and they are stumped as well.

 

This is really an odd one.

mph999

Just occasionally, you find an issue with a cause that makes no sense. If there is one thing I have learnt, it's never, ever rule anything out. Take the issue I saw with 0's on the VTL tape where the header should have been: as far as we know, it was caused by OS tuning settings. How the hell does that happen?

 

Regards,  Martin
 
 
Will Restore

>>Quantum took it to IBM and they are stumped as well.

OK, we are 'dying' here. Any progress??

HoldTheLine

Sorry, it's been really crazy around here. No real progress: we are still working with Quantum, and we installed an IBM tool to get logs from the drives after the failures. I did receive a couple of test tapes from them (sort of a "try these tapes that SHOULD work") and am seeing exactly the same results:

Encryption unavailable for an ENCR pool

Is there some double-checking I can do in KMS?

 

 

Will Restore

Verify the bptm log output per the article at http://www.symantec.com/docs/TECH87444:

A backup policy is configured to use media from a pool name with the prefix "ENCR".

This is the trigger for the bptm process to enable encryption in the tape drive. The bptm process mounts its tape, then checks that encryption is possible, given the selected tape and drive.

It logs the results of its checks in its bptm log file; for example:
  16:54:17.552 [8584] <2> manage_drive_attributes: report_attr, fl1 0x00010049, fl2 0x0000000c

<snip>

Check the value for "fl1" in the bptm log. In the example above it is 0x00010049, and this was for LTO3 media. When the correct media is loaded, the value is 0x20000 greater. In this example, if LTO4 media is used, the fl1 value is 0x00030049.

Bit 0x00010000 indicates the drive supports encryption.
Bit 0x00020000 indicates the media supports encryption.

If both the drive and media support encryption, these values will be added together (0x00030000) in the fl1 field.
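The technote's check can be sketched as a bit test. This is a minimal sketch, assuming the report_attr line format quoted above; the helper names and the "ENCR_pool" pool name are hypothetical, and the pool-prefix check only mirrors the technote's description of what triggers encryption, not NetBackup's actual code. Because the other bits of fl1 (the 0x49 part) are not documented in this thread, testing the two encryption bits is safer than comparing the whole value against 0x00030049.

```python
import re
from typing import Optional

# Bit meanings as quoted from TECH87444 above.
DRIVE_SUPPORTS_ENCRYPTION = 0x00010000
MEDIA_SUPPORTS_ENCRYPTION = 0x00020000
BOTH_BITS = DRIVE_SUPPORTS_ENCRYPTION | MEDIA_SUPPORTS_ENCRYPTION  # 0x00030000

FL1_RE = re.compile(r"report_attr, fl1 (0x[0-9A-Fa-f]+)")


def encryption_requested(pool_name: str) -> bool:
    """Per the technote, a pool name starting with "ENCR" triggers encryption."""
    return pool_name.startswith("ENCR")


def encryption_bits(log_line: str) -> Optional[dict[str, bool]]:
    """Decode the two encryption bits from a bptm report_attr line, if present."""
    match = FL1_RE.search(log_line)
    if match is None:
        return None
    fl1 = int(match.group(1), 16)
    return {
        "drive": bool(fl1 & DRIVE_SUPPORTS_ENCRYPTION),
        "media": bool(fl1 & MEDIA_SUPPORTS_ENCRYPTION),
        "both": (fl1 & BOTH_BITS) == BOTH_BITS,
    }


lto3 = ("16:54:17.552 [8584] <2> manage_drive_attributes: "
        "report_attr, fl1 0x00010049, fl2 0x0000000c")
print(encryption_requested("ENCR_pool"), encryption_bits(lto3))
# True {'drive': True, 'media': False, 'both': False}
```

An ENCR pool combined with a line that decodes as drive-capable but not media-capable, like the LTO3 example, would be consistent with the "Encryption unavailable for an ENCR pool" freeze described in this thread.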


HoldTheLine

>>It logs the results of its checks in its bptm log file; for example:
>>16:54:17.552 [8584] <2> manage_drive_attributes: report_attr, fl1 0x00010049, fl2 0x0000000c
>><snip>

This is very familiar to me; we have been looking at these exact entries. The thing is, we always DO see the fl1 entry after the backup failure, but it doesn't really help pinpoint what the problem is. Why? Because:

- These are LTO5 drives that most certainly DO support encryption

- We are using LTO5 tapes; ditto, they DO support encryption, even when that entry says otherwise.

When we see the fl1 entry in bptm saying, in effect, "Sorry, this drive and/or tape does not support encryption", yet we know that the drive and tapes should in fact support encryption, we have a false positive and are back to the drawing board.

Hope I didn't muddle things even more. Short story: in this case, that bptm entry is not useful in shedding any light on the issue, because we know all the components are capable of encryption.

Will Restore

If fl1 in your log is not 0x00030049 then encryption is not supported. It's not a false positive. It's a positive negative.

Computers are very logical and we just have to figure out what they are telling us. We had discovered that some media we thought was LTO4 was really LTO3.

HoldTheLine

>>If fl1 in your log is not 0x00030049 then encryption is not supported. It's not a false positive. It's a positive negative.
>>Computers are very logical and we just have to figure out what they are telling us. We had discovered that some media we thought was LTO4 was really LTO3.

No argument here about the logical nature.

But it just makes no sense: a tape gets encrypted backups written to it in one library. I load that same tape into another library, do a long erase just to be safe, try to run an encrypted backup, and see the bptm entry above.

Since at one point it could be encrypted, and the vendor doesn't see anything wrong with the new library/drives, the only thing left I can think of to look at is KMS.

?

 

mph999

manage_drive_attributes: it looks like that comes from the drives, so my thoughts are:

1. The drive is screwing up somehow, e.g. a firmware issue.
2. Could NBU be getting it wrong? As in, what is in the log isn't what is reported by the drive.

I can only think of one way to check 1.: a SAN analyzer. The problems with this are, one, that you probably don't have one and, two, understanding the output (no offence intended, but they throw out a lot of data ...). The vendors will almost certainly be able to lend one, and if you are not familiar with the output (and I'm not ...), the understanding part can be done between the vendors.

2. If the values come from the drives, I doubt we do any processing on them, but I'll have a look in the code. I'm not a programmer, so if I can't conclude anything, I suggest a case is opened with us (if not done already) and we work together via TSAnet if necessary to see what is going on. I imagine (but cannot promise) that this could go pretty much straight to Engineering.

Just to save me reading the complete thread again: is there anything common that has been spotted, or are the failures pretty much certain, i.e. are there any 'patterns' to the behaviour?

 

Regards,  Martin
 
 
Magpie1888

Looks like you have been digging pretty deep here. Apologies if this is obvious, but have you checked the encryption settings for the library partition on the i500 itself?

There are 3 possible settings; it needs to be ‘Allow Application Managed’. If this is not set correctly, you cannot create an encrypted backup, even if KMS is correctly installed.

There is also a FIPS setting. If this is set, I believe the drives are only allowed to read/write encrypted tapes, so make sure this option is unset.

Clutching at straws here, but this would explain why you can’t create encrypted tapes.

Getting even more desperate: during restore, the drive will make a SCSI call to the media server for the key. It’s obvious that KMS is returning the key (you’re getting data back), so is the library seeing the drive go into encrypted mode and having a nasty library/drive handshaking argument that kills restore performance?

Might be the musings of a madman, but let us know.

HoldTheLine

Sorry for the lack of updates; I have been away a while and am dusting this off again. To answer some of the questions:

>>Just to save me reading the complete thread again: is there anything common that has been spotted, or are the failures pretty much certain, i.e. are there any 'patterns' to the behaviour?

I just finished some testing and do see some common things:

- Backups to LTO5 tapes work with no encryption

- Backups to LTO5 tapes fail using encryption

- Backups to LTO4 tapes are good, with or without encryption.

I am still working with the vendors and have sent them my latest findings. It seems that if I use an LTO4 tape everything works, so that rules out any issues with KMS, library, drive, etc.

>>There are 3 possible settings; it needs to be ‘Allow Application Managed’. If this is not set correctly, you cannot create an encrypted backup, even if KMS is correctly installed.

Confirmed, it's at Application Managed.

>>There is also a FIPS setting. If this is set, I believe the drives are only allowed to read/write encrypted tapes, so make sure this option is unset.

Not sure where this is, but since we can encrypt to LTO4 tapes I would guess this doesn't apply.

 

HoldTheLine

Back to the drawing board: I tested the same backups with a different brand of LTO5 (HP, to be exact) and got the same results :(

 

 

mph999

There is a known firmware bug related to multiplexed backups; that is the only 'issue' I can find that relates to KMS failures. The problem doesn't happen for non-multiplexed backups.

The issue is that under certain conditions the drive firmware reports the tape as not encrypted when in fact it is, namely when we fsf past the tape backup headers. If we read through the backup headers, the issue doesn't happen, which is very odd. Although it is a firmware issue, NBU gets around the problem by reading through the headers as opposed to using fsf.

This surely has to be a firmware issue ...

 

Regards,  Martin
 
 
jim dalton

mph999 is right about the firmware problem with encrypted backups, BUT that issue manifested itself on restore only.

The backup was good; only the restore failed. And indeed the backed-up encrypted data were good.

So it's not that particular firmware issue, since writing backups fails. But it could be another firmware issue with LTO5.

Jim

Will Restore

>>The backup was good; only the restore failed.

Not really, as he stated above:

>>Sorry if it's confusing. I am still able to write to new tapes, just not encrypted...

jim dalton

Yes really, WR: he can't write encrypted tapes. That isn't the firmware bug I encountered and the one referenced by mph999: that bug only surfaced on restore of MPX and encrypted media, i.e. writing the same media was a success.

Jim

HoldTheLine

Finally have an update:

I was able to upgrade the master from 7.0 to 7.5 and run some tests; encrypted backups are working now.

Go figure...

Maybe I should have suggested upgrading before we started.  Oh wait, I did! :) 

 

SOLUTION