
Too Many Jobs Active = Status 84's

Updated: 22 May 2010 | 24 comments
Randy Samora's picture

Has anyone had any experience with this? It just started happening in the past couple of days, and tonight's Fulls should be an adventure. I have about 600 jobs that begin at 6:00 PM. I have a 20-drive library with multiplexing set to 3 per drive. Even when everything is running great, I don't think I've ever actually seen 60 jobs running at once. For the past couple of nights, there have been over 90 Active jobs trying to write to tapes, and I get Status 84's out the wazoo. That goes on for maybe an hour, and then it all seems to settle down and start running smoothly. Most of the jobs only fail the one time, requeue, and finish fine. Some jobs fail completely, and when I restart them, they finish fine. Everything runs great until the 10 o'clock wave of jobs begins, and it's Status 84 time again for another 30 or 45 minutes. I have 7 media servers sharing the library, and it's as if they have forgotten that the library is being shared and everyone is trying to write at the same time. Each storage unit is set to use only 4 drives max. This is a Windows environment, and I'm on 5.1 MP6 and Windows Server 2003. Any ideas?

Thanks,
Randy


Comments

Rakesh Khandelwal's picture
16 Mar 2007

Something to start with .......


This issue has been seen on Windows 2003 media servers that use a value larger than 64k (65536 bytes) in SIZE_DATA_BUFFERS after upgrading to Windows 2003 SP1. The cause is a change to tape.sys in SP1 that limits the block size to <64k for tape transfers. Microsoft is aware of this issue and has published a Knowledge Base article and hotfix to correct it (see below).

If the patch cannot be applied for any reason, then as a workaround the value in SIZE_DATA_BUFFERS can be set to 64k or below, but this may affect backup performance. The SIZE_DATA_BUFFERS touch file can be found in the netbackup\db\config directory and should contain a byte value that is a multiple of 1024. Below are a few examples.

64k = 65536

128k = 131072

256k = 262144
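
For anyone who would rather compute these values than type them by hand, here is a minimal Python sketch. It only does the arithmetic (1k = 1024 bytes) and checks the multiple-of-1024 rule mentioned above; the function names are made up for this example and are not part of NetBackup.

```python
def buffer_size_bytes(kib: int) -> int:
    """Convert a buffer size in k (1k = 1024 bytes) to the byte value for the touch file."""
    if kib <= 0:
        raise ValueError("buffer size must be positive")
    return kib * 1024


def is_valid_buffer_size(value: int) -> bool:
    """Check the rule stated above: a positive multiple of 1024."""
    return value > 0 and value % 1024 == 0


# Print the byte value for a few common sizes.
for kib in (32, 64, 128, 256):
    print(f"{kib}k = {buffer_size_bytes(kib)}")
```

A check like is_valid_buffer_size() is a cheap way to catch transposed digits before the value goes into the file.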

Here is a link to the Microsoft Knowledge base article

http://support.microsoft.com/?kbid=907418

Message was edited by: RK

Rakesh Khandelwal's picture
16 Mar 2007

STATUS CODE: 84 "Media Write Failed" error occurs consistently on certain media that are not known to be defective.

http://support.veritas.com/docs/277081

Exact Error Message
Media Write Failed (<84>)

<16> io_write_block: write error on media id N00041, drive index 2, writing header block, 19

Details:
Overview:
Status Code 84 "Media Write Failed" error occurs consistently on certain media that are not known to be defective.

Troubleshooting:
Please look for messages similar to the following. Observe the "19" at the end of the line, following the write error on the media header. This is a message number reported by the OS, that can be translated with the net command. Typing net helpmsg 19 from a command prompt reports "The media is write protected." When this message is seen, write protection is the cause.

Master Log Files: N/A

Media Server Log Files:
BPTM:
<16> io_write_block: write error on media id N00041, drive index 2, writing header block, 19

Client Log Files: N/A
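
As a rough illustration of the troubleshooting step above, the trailing OS message number can be pulled out of a bptm line with a short Python sketch. The pattern is modeled only on the single line quoted in this technote, so other bptm formats may not match; the function and dictionary names are hypothetical, and the lookup table holds only the one code confirmed here (19).

```python
import re

# Pattern modeled on the one bptm line quoted above; an assumption, not a full log grammar.
WRITE_ERROR = re.compile(
    r"io_write_block: write error on media id (?P<media_id>\w+), "
    r"drive index (?P<drive_index>\d+), .*?, (?P<os_error>\d+)\s*$"
)

# Only the code confirmed in this thread; look up others with `net helpmsg <n>`.
KNOWN_OS_ERRORS = {19: "The media is write protected."}


def parse_write_error(line: str):
    """Return media id, drive index, and OS error code, or None if the line does not match."""
    match = WRITE_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group("os_error"))
    return {
        "media_id": match.group("media_id"),
        "drive_index": int(match.group("drive_index")),
        "os_error": code,
        "meaning": KNOWN_OS_ERRORS.get(code, "unknown, try net helpmsg"),
    }


sample = ("<16> io_write_block: write error on media id N00041, "
          "drive index 2, writing header block, 19")
print(parse_write_error(sample))
```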

Resolution:
Remove the write protection from the tape by adjusting the write protect notch, or by following the recommended procedure from the media manufacturer on how to disable write protection. If media is intentionally meant to be write protected, either remove the media from the robot or freeze the media in NetBackup (tm) so that it is not available for backup attempts. Media can be frozen and unfrozen using the bpmedia command. For more information on this procedure, please see the NetBackup Commands for Windows Guide.

http://support.veritas.com/docs/275076
http://support.veritas.com/docs/275066

In-depth Troubleshooting Guide for Exit Status Code 84 in NetBackup (tm) Server / Enterprise Server 5.0 / 5.1

http://support.veritas.com/docs/273908

Message was edited by: RK

Stumpr's picture
16 Mar 2007

7 media servers x 4 drives = a total of 28 tape drives.
That simply doesn't add up. You're short 8 tape drives. Oh... you already knew that!
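
For what it's worth, the drive math can be sketched in a few lines of Python. This is only an illustration using the numbers from the original post. It assumes one storage unit per media server, which the thread does not actually state, and the constant names are made up for the example.

```python
# Numbers taken from the original post.
DRIVES_IN_LIBRARY = 20
MPX_PER_DRIVE = 3
MEDIA_SERVERS = 7
MAX_DRIVES_PER_STORAGE_UNIT = 4  # assumption: one storage unit per media server

# Most streams that can write at once if every drive is fully multiplexed.
max_concurrent_streams = DRIVES_IN_LIBRARY * MPX_PER_DRIVE

# Drives the media servers could ask for if they all hit their limit together.
drives_claimable = MEDIA_SERVERS * MAX_DRIVES_PER_STORAGE_UNIT
shortfall = max(0, drives_claimable - DRIVES_IN_LIBRARY)

print(f"max concurrent streams: {max_concurrent_streams}")
print(f"drives claimable: {drives_claimable}, shortfall: {shortfall}")
```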

> I have about 600 jobs that begin at 6:00 PM ... That goes on for maybe an hour

How about starting some of them at 7 PM?

> Everything runs great until the 10 o'clock wave of jobs ... again for another 30 or 45 minutes

How about starting some of these at 11 o'clock?

too simple? j/k

Seriously, try adjusting the backup windows so that you don't get slammed all at once. I think there are also some tuning changes you should probably make if you do subscribe to the big bang theory. I don't remember them offhand, but there is a technote out on it.

Bob Stump VERITAS - "Ain't it the truth?" Incorrigible punster -- Do not incorrige

Randy Samora's picture
16 Mar 2007

I used to have over 1200 that started at 6:00. Sad story but we finally hired someone to start helping out so I could find time to fine tune our installation and then last week one of my guys was killed in a motorcycle wreck. Now we're back to having just enough time to put the fire out and move on to the next task.

I am in the process of prioritizing Production Critical boxes, and I'm going to kick those off first and then move down the food chain of clients. I'm looking into Rakesh's suggestion because I also saw an error about block size and the tape not accepting a certain size, but I don't recall the exact error. I thought it was strange because I hadn't changed anything regarding block size, but we did just roll out the latest MS patches.

Stumpr's picture
16 Mar 2007

> one of my guys was killed in a motorcycle wreck

Randy, I'm so sorry for you. It is hard losing someone like that.

When I was in the Navy, during an overhaul period at the Bremerton, Washington shipyards, I had 2 roommates die in motorcycle accidents. One hit a tree and the other one hit a mailbox. They were both young, in their early to mid twenties. I haven't been on a bike since then.

Bob Stump VERITAS - "Ain't it the truth?" Incorrigible punster -- Do not incorrige

Randy Samora's picture
16 Mar 2007

Last year I FINALLY made up my mind that I was going to buy a motorcycle with my tax refund this year. I have now finally made up my mind that I never will.

Here's the error I'm seeing on one of the media servers.

Error bptm(pid=7912) The tape device at index -1 has a maximum block size of 32768 bytes, a buffer size of 65536 cannot be used
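
In case it helps anyone reading along, here is a small Python sketch that pulls the two numbers out of a message like this one and works out the largest buffer size the drive would accept. The pattern is modeled only on the one message above, and the function name is made up for this example. Rounding down to a multiple of 1024 follows the touch file rule mentioned earlier in the thread.

```python
import re

# Pattern modeled on the single message quoted above; other wordings may not match.
BLOCK_SIZE_ERROR = re.compile(
    r"maximum block size of (?P<max_block>\d+) bytes, "
    r"a buffer size of (?P<buffer>\d+) cannot be used"
)


def largest_usable_buffer(line: str):
    """Return the largest multiple of 1024 not exceeding the reported maximum block size."""
    match = BLOCK_SIZE_ERROR.search(line)
    if match is None:
        return None
    max_block = int(match.group("max_block"))
    return (max_block // 1024) * 1024


sample = ("bptm(pid=7912) The tape device at index -1 has a maximum block size "
          "of 32768 bytes, a buffer size of 65536 cannot be used")
print(largest_usable_buffer(sample))
```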

Stumpr's picture
16 Mar 2007

STATUS CODE 84: After applying Service Pack 1 in Windows 2003, Status 84 errors occur during tape backup. An additional error appears in the Activity Monitor, noting that the buffer size cannot be used.
http://support.veritas.com/docs/278837

Backup fails with a Status Code 84.
http://support.veritas.com/docs/246554

Message was edited by: Bob Stump

Bob Stump VERITAS - "Ain't it the truth?" Incorrigible punster -- Do not incorrige

Randy Samora's picture
16 Mar 2007

Oh great, I just switched to the latest HP driver because it was newer. But it was working fine before the patch. Maybe I need to run the driver install again to make sure?

Randy Samora's picture
16 Mar 2007

From looking through my bptm logs, it looks like all of my drives are set to a 32k limit. How did that happen, or where does that get configured? Would that be the driver? I'm using the latest HP driver now; should I contact HP? Or just reinstall the Veritas driver I was using before?

Stumpr's picture
16 Mar 2007

I would reinstall the Veritas driver

Bob Stump VERITAS - "Ain't it the truth?" Incorrigible punster -- Do not incorrige

Randy Samora's picture
16 Mar 2007

I have a SAN Media server that fails immediately with a Status 84 and gives me the 32k vs. 64k buffer error. If I point it at one of the media servers, the backup runs fine. I have verified that they are all using the same HP driver. Some work, some don't. At this point I'm getting so close to tonight's backups that I'm afraid to change anything. I'd rather just restart the failed jobs and contact HP or Symantec on Monday unless someone has a quick solution. We're patching a remote site tonight and I'm wondering if I'm going to do the same to the backups there.

Randy Samora's picture
16 Mar 2007

Since applying the MS patches last Friday, I've had 586 Status 8x regarding media. The 3 weeks prior to the patch, I had 137. Seems like I found the culprit but I can't do anything about it for now. No reboots allowed until after the weekend. This should be a fun weekend.

Stumpr's picture
16 Mar 2007

Any changes in throughput? Perhaps active jobs are running slower and thus longer?

Bob Stump VERITAS - "Ain't it the truth?" Incorrigible punster -- Do not incorrige

Randy Samora's picture
16 Mar 2007

The SAN media server that was failing immediately with Status 84's had a NUMBER_DATA_BUFFERS file set at 128 but didn't have a SIZE_DATA_BUFFERS file. I created the SIZE_DATA_BUFFERS file and set it at 32768, and the backup is running fine. It's running slower than normal, A LOT slower, but that's to be expected, I would guess. My enterprise is too big to try to fix this now. I'm going to have to wade through the errors over the weekend and fix this on Monday.
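
For reference, creating the touch file amounts to writing one number into a file named SIZE_DATA_BUFFERS. Below is a minimal Python sketch of that step using a 32k (32768-byte) value. It writes into a temporary directory rather than a real install: the actual location is the netbackup\db\config directory mentioned earlier in the thread, and the full install path varies, so it is left as a parameter. The function name is made up, and whether NetBackup cares about the trailing newline is an assumption I have not verified.

```python
import tempfile
from pathlib import Path


def write_size_data_buffers(config_dir: Path, size_bytes: int) -> Path:
    """Write the SIZE_DATA_BUFFERS touch file with a validated byte value."""
    if size_bytes <= 0 or size_bytes % 1024 != 0:
        raise ValueError(f"{size_bytes} is not a positive multiple of 1024")
    target = config_dir / "SIZE_DATA_BUFFERS"
    # Assumption: a plain decimal number followed by a newline is acceptable.
    target.write_text(f"{size_bytes}\n")
    return target


# Demo against a throwaway directory standing in for netbackup\db\config.
with tempfile.TemporaryDirectory() as tmp:
    written = write_size_data_buffers(Path(tmp), 32768)
    content = written.read_text().strip()
    print(written.name, content)
```

The validation step rejects values such as 32668 or 56636 that look right at a glance but are not multiples of 1024.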

Do I ask HP for a new driver; do I lay this on them and make them fix it? Or will installing the old Veritas driver fix it? Why wouldn't the MS patch affect the Veritas driver? Or would it have affected the Veritas driver if that's what I had loaded at the time?

Sorry about all of the questions, but I've had my head stuck so far up NetBackup's tail end that I've forgotten all of my Microsoft basics.

Rakesh Khandelwal's picture
16 Mar 2007

Any reason you are using a buffer size of only 32K instead of 64K?

I would suggest installing the Veritas driver unless the NBU release notes say anything specific about the drive/library type.

Randy Samora's picture
16 Mar 2007

Apparently the HP driver is restricting me to 32k. The normal backups were failing with Status 84 with a message about the drive only being able to handle 32k. I added the SIZE_DATA_BUFFERS file, set it to 32k, and the backups ran fine. I don't want to use 32k, but something is restricting me to that number. I'm sure rolling back the driver to the previous Veritas release will fix it, but that will require a reboot and I'm out of my reboot window. I'll have to hope for the best this weekend and fix it on Monday.

I would assume applying the first patch you recommended would also fix the problem?

Stumpr's picture
19 Mar 2007

Randy,
My current contract expires in 1 month. I am talking with a company called Rackspace about an opportunity in San Antonio. Do you know anything about this company? I thought that since you live only a couple hundred miles away, you may have heard of them.
http://www.rackspace.com/index.php

Bob Stump VERITAS - "Ain't it the truth?" Incorrigible punster -- Do not incorrige

Fred2010's picture
19 Mar 2007

Bob,

Good luck on your future job!

Do me a favor though: Tell 'em to get rid of that 'automatic-chat-to-a-live-person thingy' on their homepage ;)

Message was edited by: Manfred Engels

Randy Samora's picture
20 Mar 2007

I have to give it to Rakesh on this one. The "19" at the end led me to the culprit. When I first read the response I dismissed it because certainly no one would load so many write protected tapes to cause hundreds of Status 84's. If I could list names I would but I doubt they want that kind of publicity. I pulled 30 tapes just from last night's errors and I'm going back through the logs now to see how many more there are. I have jobs running constantly so it's difficult for me to open the door and just look.
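
For going back through the logs without opening the library door, a short Python sketch along these lines could tally which media IDs hit the write-protect code. It assumes the bptm lines look like the one quoted earlier in the thread; the function name is made up, and the sample lines other than the N00041 one are invented purely for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the bptm line quoted earlier in the thread; an assumption.
WRITE_ERROR = re.compile(
    r"write error on media id (?P<media_id>\w+), .*, (?P<os_error>\d+)\s*$"
)

WRITE_PROTECTED = 19  # per `net helpmsg 19`: "The media is write protected."


def write_protected_media(lines):
    """Count write errors with OS code 19 per media id."""
    hits = Counter()
    for line in lines:
        match = WRITE_ERROR.search(line)
        if match and int(match.group("os_error")) == WRITE_PROTECTED:
            hits[match.group("media_id")] += 1
    return hits


# Hypothetical sample; only the first line is taken from the thread.
sample_lines = [
    "<16> io_write_block: write error on media id N00041, drive index 2, writing header block, 19",
    "<16> io_write_block: write error on media id N00041, drive index 5, writing header block, 19",
    "<16> io_write_block: write error on media id X00007, drive index 1, writing header block, 19",
    "<2> some unrelated log line",
]

tapes_to_pull = write_protected_media(sample_lines)
for media_id in sorted(tapes_to_pull):
    print(media_id, tapes_to_pull[media_id])
```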

Other than the failure message, will NetBackup tell me if a tape is write protected BEFORE it tries to use it?

Rakesh Khandelwal's picture
20 Mar 2007

There is no way for NetBackup to find out whether a tape is write protected until it tries to write to the tape.

Other than the error in the bptm logs, I think you may see that the media is write protected, and that NetBackup is freezing it, in your bperror -media logs.

Randy Samora's picture
20 Mar 2007

I can live with that now that I know what the problem is. Thanks again, Rakesh.

Rakesh Khandelwal's picture
20 Mar 2007

Glad I was able to help.

Thanks for the points :)

Stumpr's picture
21 Mar 2007

Believe it or not, I had an operator place the tapes upside down in the library. The inventory went OK. I'm still not sure how the barcode reader could read a tape label that was upside down. But it did! Then when NetBackup tried to use the tape, it couldn't load the tape upside down, and so it would freeze it.
My problem at the time was that I could not afford to have so many tapes unavailable. I needed them for backups.

Bob Stump VERITAS - "Ain't it the truth?" Incorrigible punster -- Do not incorrige

Randy Samora's picture
21 Mar 2007

I used to leave tapes on one guy's desk with a sticky note letting him know the tapes were ready to be put back in the library. And that's exactly what he did; sticky note and all. I'm telling you, there's a cartoon strip in the making here. NBU Peanuts.