Verbose output truncated in bpdbjobs -all_columns?
I have some tools I wrote that trawl through "bpdbjobs -all_columns -jobid <jobid>" to pull out the writing speed over the course of the job. This lets me graph it and see how things are going. (I have a lot of NetApps that, when they do NDMP incrementals, do nothing for a while (often hours) and then write at speed, so an "average speed" isn't very useful.)
I mostly did this back under 6.0. I've noticed that I have a lot of 6.5 jobs where the information in bpdbjobs just stops. It might be something to do with 6.5, or it might be the jobs themselves have changed somehow.
As an example, this job began at 2/17/2011 21:53 and completed at 2/20/2011 08:58 (over 59 hours, ~7TB job). If I look at the end of bpdbjobs -all_columns for it, I get this:
[snip]
02/18/11 16:04:39 - 40005 KB written - 41157.113 KB/sec
02/18/11 16:04:39 - 40005 KB written - 41157.348 KB/sec
02/18/11 16:04:40 - 40005 KB written - 41157.586 KB/sec
02/18/11 16:04:41 - 40005 KB written - 41157.816 KB/sec
02/18/11 16:04:41 - 40005 KB written - 41158.043 KB/sec
02/18/11 16:04:42 - 40005 KB written - 41158.277 KB/sec
02/18/11 16:04:42 - 40005 KB written - 41158.504 KB/sec
... 6967081863 46377009 292129 33078
[snip]
Basically, the last performance/timing bit that it logged was less than 24 hours into the job. I don't have any more performance logs after that point.
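For anyone wanting to do the same kind of extraction, here is a minimal Python sketch (not the actual tool, which isn't posted here). It assumes the entries look exactly like the excerpt above, "MM/DD/YY HH:MM:SS - N KB written - R KB/sec"; the names ENTRY and parse_entries are made up for illustration.

```python
import re
from datetime import datetime

# Matches entries such as "02/18/11 16:04:39 - 40005 KB written - 41157.113 KB/sec".
# The layout is taken from the excerpt above; other releases may format it differently.
ENTRY = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - (\d+) KB written - ([\d.]+) KB/sec"
)

def parse_entries(text):
    """Return (timestamp, kb_written, kb_per_sec) tuples found in text."""
    entries = []
    for stamp, kb, rate in ENTRY.findall(text):
        when = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        entries.append((when, int(kb), float(rate)))
    return entries

sample = (
    "02/18/11 16:04:41 - 40005 KB written - 41158.043 KB/sec "
    "02/18/11 16:04:42 - 40005 KB written - 41158.277 KB/sec"
)
entries = parse_entries(sample)
print(entries[-1][0])  # timestamp of the last logged entry
```

Comparing the last timestamp against the job's end time is a quick way to spot a job whose performance entries stop early.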
Anyone ever seen this or have any idea what limits there might be on how much data is available through bpdbjobs? System is currently Linux, single master/media server, lots of NDMP hosts, 6.5.5, but I've run these tools on 6.0 systems as well.
Darren
Comments
Darren,
I have weekend backups that run for more than 24 hours. I think it's too late for this week but I could check next Monday ...
Can you share your code that extracts the data from bpdbjobs? Just running it from the command line is a little hard to read.
GlenG
NBU master 7.0.1 on Sun X4500, Solaris 10
The code is pretty complex, but talking with someone else I just learned that it's all in the trylogs and much easier to see there:
Under ${install}/netbackup/db/jobs/trylogs/<job>.t, each of the performance lines begins with a "KBW ". Look there at the end of the file.
Now most of my jobs are good and the performance data reaches the end of the job. But some of them don't. I'm going to see if I can find something about the ones that don't (number of lines, size of data, etc.).
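A rough Python sketch of that check, for comparing good and bad jobs. It relies only on what is stated above (performance lines start with "KBW ") and makes no assumption about the fields inside those lines; kbw_summary is a made-up name.

```python
import os

def kbw_summary(path):
    """Count the lines starting with "KBW " in a trylog and keep the last one."""
    count = 0
    last = None
    # errors="replace" so an odd byte in the log does not abort the scan
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            if line.startswith("KBW "):
                count += 1
                last = line.rstrip("\n")
    return {"kbw_lines": count, "bytes": os.path.getsize(path), "last": last}
```

Calling kbw_summary() on a <job>.t file gives the number of performance lines, the file size, and the final "KBW " line, which can then be lined up against what bpdbjobs reports for the same job.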
Darren
trylog differs from 'bpdbjobs'
Well, lo and behold, the "bpdbjobs" output is truncated, but the <job>.t trylog file is not.
So I need to adapt my program to be able to pull a trylog file directly. Slightly more restrictive, but much better than not having the data at all.
Last bits from bpdbjobs:
End of the same job trylog:
And 1299625969 => is March 8, 15:12 in my timezone. So bpdbjobs is just truncating it.
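To double-check an epoch value like that one, a short Python sketch. The UTC-8 offset is only an assumption, inferred from the "15:12" quoted above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = 1299625969  # value quoted above

utc = datetime.fromtimestamp(EPOCH, tz=timezone.utc)
# UTC-8 is an assumed offset, chosen because it matches the 15:12 mentioned above.
local = utc.astimezone(timezone(timedelta(hours=-8)))

print(utc.isoformat())                  # 2011-03-08T23:12:49+00:00
print(local.strftime("%B %d, %H:%M"))   # March 08, 15:12
```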
A little more background
I think we limited it to....50 entries? (Count 'em and see) due to concerns about overflow, memory issues, core dumping, or something equally scary. Or maybe it was 1000 entries? Or I may have this configuration completely mixed up with something else. Come to think of it, this post isn't useful at all. Good thing your question is already answered! ;-)
It's a lot more than that. That's why I wasn't certain what was going on. It's more than 65K.
# bpdbjobs -all_columns -jobid 294895 | perl -F, -lane 'foreach (@F) {print;}' | grep -c "KB written -"
68422
But:
I can well believe truncating it helps a lot of things. (Woe unto those that do a 'bpdbjobs -all_columns' on all jobs. I have to do that to collect some information from remote servers, and it takes more than 5 minutes on some busy guys.)
Thanks, though.