Video Screencast Help
Symantec to Separate Into Two Focused, Industry-Leading Technology Companies. Learn more.

Multiplex w/ KMS

Created: 22 Apr 2013 • Updated: 30 Aug 2013 | 30 comments
This issue has been solved. See solution.

NetBackup 7.1.0.4 with the IBM TS3310 tape library.

It would seem I cannot perform Multiplex w/ Encryption (KMS).  The backup runs without error (false positive?). The tape is unreadable afterwards.

Anyone else share this experience?  How did you fix it?

 

Operating Systems:

Comments 30 CommentsJump to latest comment

Nicolai's picture

You can have Netbackup KMS and multiplexing enabled - no problem. Done that myself. If you have configured KMS the intended way backup images will be encrypted wit AES256 in the tape drive.

What tape library does not matter - just a tape drive with that support the T10 encryption standard.

http://www.symantec.com/docs/HOWTO56305

Do not try to mix library based encryption with Netbackup based encryption. It's like oil and water - they do not mix.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

Jevon's picture

I confirmed the drives do support encryption back when I first setup KMS.  Infact, when I don't use multiplexing, the tapes can be read just fine.

Nicolai's picture

When you don't multiplex are the tapes encrypted then ?. Verify with bpimagelist or using the GUI

The volume group must have the prefix ENCR_ else no encryption take place.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

Jevon's picture

Yes, tapes show up encrypted.

"Flags:            0x40  (Tape Encrypted)"

ENCR_ is the prefix.  Even the IBM tape utiltiies show it is encrypted.

I have also not had a problem with mutiplexed tapes until encryption came along.

I'm getting the feeling that what I am experiencing is a little unique.  I was hoping someone else may have had this experience before.

bills's picture

I have used KMS with MPX before as well.  You say the tape is unreadable afterwards - what do you mean?  The restore process can't read it?  Check the KMS logs (unified logs only, if I recall).  It should say something about retrieving keys, etc.

Bill

Nicolai's picture

You are on Linux - right?.

What status code are you seeing - status 85 ?

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

Jevon's picture

Linux, Yes.

Status 85.  Yes.

I'll have to check the logs out tomorrow.  I'll see if I can post something then.

Nicolai's picture

Sound Good.

And you are 100% sure you did not change the encryption key somwhere in time (yes I made that one in a lab - did cost me 2 hours of debugging in vain) ?.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

Jevon's picture

Yup.  Only ever made one key.

[root@bumaster ~]# nbkmsutil -listkgs
 
Key Group Name        : ENCR_PCI-Encrypted
Supported Cipher      : AES_256
Number of Keys        : 1
Has Active Key        : Yes
Creation Time         : Wed Sep  5 16:05:04 2012
Last Modification Time: Wed Sep  5 16:05:04 2012
Description           : -
 
 
Number of Key Groups: 1
 
[root@bumaster ~]# nbkmsutil -listkeys -kgname ENCR_PCI-Encrypted
 
Key Group Name        : ENCR_PCI-Encrypted
Supported Cipher      : AES_256
Number of Keys        : 1
Has Active Key        : Yes
Creation Time         : Wed Sep  5 16:05:04 2012
Last Modification Time: Wed Sep  5 16:05:04 2012
Description           : -
 
  Key Tag    : 20e16a269394992f92db19cbd4bd2ea2b09cbbbb0bdc4838ef7174c6897b773b2f92db
  Key Name              : nbuttckey
  Current State         : ACTIVE
  Creation Time         : Wed Sep  5 16:07:57 2012
  Last Modification Time: Wed Sep  5 16:07:57 2012
  Description           : TTC Backup Key
 
 
Number of Keys: 1
 
bills's picture

One of the verbose image listings will list the key tag for the encrypted image - might be a good idea to make sure that matches what's in your keystore....

Bill

mph999's picture

If you look in bptm for a job of the failed restore you should see lines containing this :

 

establish_decryption_key: next block encryption

This line should appear after this line:

 

io_position_for_read: successfully positioned 

The question:  how many 'establish_decryption' lines are there directly after the postioning line, there should only be 1.

If there are more, you are probably looking at an issue similar to eTrack 3093882, and will need to log a call.

The bptm log, KMS log and nbsu -c -t outputs will be required, as well as a exact description of the issue.

The casue apeears to be firmware related, and an EEB was released to work around the issue.

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
SOLUTION
Nicolai's picture

Interesting !

No TN for eTrack 3093882 ?. I could not find any.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

Jevon's picture

I do get those.

 

 

09:59:53.290 [12822] <2> io_position_for_read: successfully positioned 000170 to file number 5, mpx_header = 2

09:59:53.291 [12822] <2> establish_decryption_key: next block encryption status: LON 0x000000000000331e, algorithm index 1, encryption status 0x3
 

 

This eventually happens in the job.

 

04/23/2013 10:14:54 - Error bptm (pid=12822) cannot read image from media id 000170, drive index 0, Input/output error

04/23/2013 10:14:54 - Error bptm (pid=12828) The following files/folders were not restored:

04/23/2013 10:14:54 - Error bptm (pid=12828) UTF - /root/anaconda-ks.cfg

04/23/2013 10:14:55 - Info bpbrm (pid=12813) got ERROR 85 from media manager

04/23/2013 10:14:55 - Info bpbrm (pid=12813) terminating bpbrm child 12833 jobid=198940

04/23/2013 10:14:55 - Error bpbrm (pid=12833) bpcd on torbk01 exited with status 21: socket open failed

 

This comes up almost immediately in bptm

 

 

09:59:54.509 [12822] <2> mpx_read_data: attempting read error recovery, err = 5

09:59:54.509 [12822] <2> tape_error_rec: error recovery to block 13086 requested

09:59:54.509 [12822] <2> tape_error_rec: attempting error recovery, delay 3 minutes before next attempt, tries left = 5
 
It happens 5 times and then
 
10:14:54.663 [12822] <2> set_job_details: Tfile (198940): LOG 1366726494 16 bptm 12822 cannot read image from media id 000170, drive index 0, Input/output error


10:14:54.696 [12822] <16> mpx_read_data: cannot read image from media id 000170, drive index 0, Input/output error

10:14:54.696 [12822] <2> send_MDS_msg: DEVICE_STATUS 1 83395 bumaster 000170 4001353 IBM.ULT3580-TD4.000 2000283 READ_ERROR 0 0
 
 
 

 

Jevon's picture

Oh, and yes, there is only one 'establish_decryption_key' line.

 

I have just found out that the original admin for this system did use the generic 'SCSI Tape' driver that comes with RHEL 5.3.  Apparently, back when this backup server was created, using the IBM drivers made the drives undetectable by NetBackup.  Fastforward to today, and the result of that is likely biting me in the butt.

Seems like I'll have to do an OS upgrade and/or driver update.

Jevon's picture

The solution was to upgrade the firmware.  I finally got around to getting IBM to provide me with a newer firmware:  C7QH for the ULT3580-TD4 LTO4 tape drive.

Nicolai's picture

I am using the inbox (system default) driver on RH 5.6 - no problems. I think i have read somwhere the inbox driver is the only supported driver by Symantec.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

Jevon's picture

Going to upgrade to OEL6.4 on Monday.  Will update how that goes.

Jevon's picture

Hello again.  Just an update.

 

I went for RHEL 5.3 to Oracle Linux 5.9.  I confirmed with the IBM Express Update utility that I have the latest BIOS on the QLogic HBA.

Still doesn't work.

I have opened up a support ticket with Symantec.  Let's see how they do.

Nicolai's picture

Thanks for the update. If Symantec can help you, Plese post the soloution to the issues.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

Jevon's picture

Months later, and the problem has solved*.

What was the solution?  I'm really not sure.

I did notice during my testing that the data expiration dates were slightly off.  I had made a retention level of 2 hours, and the expiration date of newly assigned media would be set to yesterday.  Retention level of 2 days, expires today.  Somewhere, a time calculation was way off.  If that is the case, and timestamps are factored into the decryption process, than this could be a hint as to the cause of the problem.  But why Multiplexed tapes only?

How did I fix the timestamp issue?  No idea.  I restarted all NetBackup on the master/media server, and that behavior stopped.  For all I know, it could come back again, and when it does, it could cause this to happen again.  I have performed this restart several times prior.

I also upgraded to 7.5.0.6.  Could there be anything in there that fixed it?  I didn't see anything related in the patch notes.

If it comes up again, I'll share.

Edit: It is also very uncomfortable to lose trust in the integrity of ones backup because of something like this.

Rick Stone's picture

It is unlikely that the retention level or timestamps had anything to do with not being able to decrypt your data.  With KMS encryption, during a read process, the tape drive will encounter an encrypted block of data on the tape and notify the BPTM process via SCSI commands.  BPTM will then communicate with NBKMS on the master server to retrieve the encryption key.  Without actually seeing a restore failure from a full BPTM log we can't know if the tape drive is telling us if the data is encrypted or not.

One of the comments earlier mentioned that this is likely a firmware issue, and I would agree to that.
What model of tape drive are you using and what is the firmware version?  It wouldn't be the first time that we have encountered defective firmware that would cause this issue.  Check out the following technote:

www.symantec.com/docs/TECH204085

SOLUTION
Jevon's picture

1st Tape Library: ult3580-td4 Firmware: C7Q4

2nd Tape Library: ULT3580-HH5 - Firmware: B171

I don't recall seeing that doc before.  Yes, it could very much be the cause as the firmware is the same.  Funny, since I upgraded the firware quiet a bit since this problem was first detected.  Also, the only time it came up was when MPX was involved.  All non-multiplexed tapes decrypted fine.

If the problem comes up again, I'll post some BPTM details.

mph999's picture

There is a bug that affects certain makes of drive due to a firmware issue.
The drive is unable to correctly position after reading the backup header for the restore that is requested.
Although this is not a NetBackup fault, there is a EEB that provides a workaround by getting the drive to read trough the headers as opposed to an ' mt -f ' which is what is failing.

I'll try and find the details for his issue tomorrow (Fri)

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
SOLUTION
jim dalton's picture

Oddly I seem to be experiencing this issue...mpx encrypted backup, one key in the keygroup. Backup ( a good while ago) clean

16:39:47.340 [21900] <2> io_position_for_read: positioning 004291 to file number 25
16:39:47.340 [21900] <2> io_position_for_read: locating to absolute block number 291572
16:41:24.021 [21900] <2> io_position_for_read: locate block is done
16:41:24.132 [21900] <2> io_position_for_read: successfully positioned 004291 to file number 25, mpx_header = 3
16:41:24.134 [21900] <2> establish_decryption_key: next block encryption status: LON 0x00000000000472f7, algorithm index 1, encryption status 0x3
 

Restore job fails with permission denied on the device.

Had a firmware update on my LTO4s recently...would love to know the answer.

Im on 7504 Solaris Sparc.

Jim

jim dalton's picture

16:41:25.261 [21900] <16> mpx_read_data: cannot read image from media id 004291, drive index 3, Permission denied
 

Then error 40 from tar, then error 85 from media manager.

Love it.

Jim

Rick Stone's picture

16:41:24.134 [21900] <2> establish_decryption_key: next block encryption status: LON 0x00000000000472f7, algorithm index 1, encryption status 0x3
 

A return of 0x3 for encryption status indicates a response from the tape drive telling us that the data on the tape is not encrypted.  Because the drive is telling us the data is not encrypted, BPTM will not retrieve the encryption key details from NBKMS on the master server and will try to restore the data, which will result in status 85 in BPTM.

Since it is the drive telling NetBackup (via SCSI command) that the data is not encrypted, I would take a look at your firmware.  You say you just updated it recently.  What version are your tape drives currently running?

mph999's picture

Opps apologies, forgot about this, I'll log in later and check the etracks.
M

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
mph999's picture

OK, found it.

eTrack  3093882 - and I have just realised this is the eTrack I posted previously.

This seems to be reported against 3592 drives, I am not sure what type you have.

It might be reasonable to try the etrack, but, depending on how the EEB works, it may be a code change that only affects 3592 drives, if you have another type, then it may need updating, but for this to happen Engineering will have to investigate that the issue is the same.

Perhaps you can post the case number up here, I'll contact the TSE involved and suggest they escalate to BL.

This issue / case has been going on too long.

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
jim dalton's picture

Good,interesting info Rick. I'm usng LTO4, and worried it thinks the data arent encrypted. Where does this info 0x03 originate? From the drive, by looking at the tape? If so then looking like a drive f/ware issues.

My drives are all at c7q0, im ultrium4.

Jim

jim dalton's picture

BBH4 works (IBM ultriums), just tested it.But it sems to me this is 2011 f/ware?