Backup and Recovery Community Blog

NetBackup Scheduling Timelines

Created: 29 Sep 2011 • Updated: 17 Oct 2011
By AlanTLR

I love using NetBackup OpsCenter. Using its reports together with the Administrator Console can greatly reduce triage time. It can also produce reports that your manager or director can understand when you talk to them about cost savings, backup windows, storage availability, and data growth. OpsCenter takes the data from NetBackup and puts it into a nice visual form.

What it doesn't do on its own, though, is produce a tape drive usage report in the format I want (if I had the Analytics license, I could probably build one). So I wrote a not-so-simple Perl program to create my intended usage table.

My intended table is supposed to look like this:

Time   Drive 1   Drive 2   Drive 3   Drive 4
00:00  client1   client2
01:00  client1   client2   client3
02:00  client1             client3
03:00  client4   client5   client3
04:00  client4   client5   client6
05:00  client4   client7
06:00  client4   client8
07:00  client9   client10
08:00
...    ...       ...       ...       ...

And so on. I want this layout because it shows how fully my tape drives are used at any given time. OpsCenter has a Drive utilization report, but it only gives the average utilization over a specified period. To get the output above from it, I would have to run it once for each hour of each day.

So, to start, we need data. Where can we get it? Either from OpsCenter or from NetBackup's Admin console. Because I wanted as much data as I could get, I chose OpsCenter's Tabular Backup Report, as it gives me start and end times. I filtered out all the policies that use disk storage, and I limited the schedule/level type to differential incremental, full, and incremental. This should give me only backups, not restores. Unfortunately, there is no filter on throughput, so I filter that out later, during processing.
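Since the report can't filter on throughput, the zero-throughput rows have to be dropped while processing. Below is a minimal Perl sketch of that step. It assumes each exported line starts with five columns: start time, end time, client name, policy name, and throughput. It also assumes no field contains a quoted comma and that throughput is a plain number. The helper names parse_job() and wrote_data() are hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one exported CSV line into its first five
# columns. Assumes no quoted fields or embedded commas.
sub parse_job {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    my ($start, $end, $client, $policy, $throughput) = split /\s*,\s*/, $line;
    return {
        start      => $start,
        end        => $end,
        client     => $client,
        policy     => $policy,
        throughput => $throughput,
    };
}

# Hypothetical filter: keep a job only if its throughput is a number
# greater than 0.
sub wrote_data {
    my ($job) = @_;
    my $tp = $job->{throughput};
    return defined $tp && $tp =~ /^\d+(?:\.\d+)?$/ && $tp > 0;
}

# Example: only hagar's line survives the filter.
my @lines = (
    "0100,0230,hagar,hagar-windows-full,5120\n",
    "0130,0200,honi,honi-data,0\n",
);
for my $job (grep { wrote_data($_) } map { parse_job($_) } @lines) {
    print "$job->{client}\n";
}
```

A real export with quoted fields would need a proper CSV parser such as Text::CSV instead of split.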

After changing the columns, I export the report to a CSV file for processing by my Perl script. The placement of most columns doesn't matter, except for the first five: start time, end time, client name, policy name, and throughput. I open the CSV in Excel to make sure the rows are sorted the way I want (earliest start time first). When run against the CSV file, the Perl script does the following:

1. Create an "array" of empty "drives."

2. Read the first line and set the "current time" to that line's start time.

3. Check whether any of the drives are in use, and if so,

4. check against the end time for that client/policy, and

5. unallocate the drive if it has reached its end time.

6. Check whether throughput for this backup is greater than 0, and if so, assume a backup ran:

7. allocate a drive,

8. mark that drive as in use by that client/policy, and

9. save the end time.

10. Move to the next line.

11. If the next line is beyond our sample interval (I used 15 minutes), increment the "current time."

 

The process is fairly extensive and may take some time, depending on how you've filtered your data and how large your environment is. It is also only as accurate as your sample data (CSV). You may or may not find it useful. I know that I did, especially when shuffling backups around to fit within backup windows while limited to a fixed number of tape drives.

Now that you have the idea, I'll create some dummy sample data. I know I want my start time and end time, and I'll want to know either the name of the policy or the name of the client. I'll include both in my sample data, but I'll need only one in my report. For now, I'm only going to focus on data within a single 24-hour day.
I'll also exclude backups that have 0 bytes associated with them. These are often parent jobs or duplicates. Alternatively, I can check whether the job has a parent. These exclusions and inclusions are done on the data aggregation side and will not be covered here. To create my sample data, I'll start by creating what I want my chart to look like, then create the data from it.

 

Table 1

Date/Time  Drive 1  Drive 2  Drive 3  Drive 4  Drive 5  Drive 6
0000
0030
0100       hagar
0130       hagar    honi     snert
0200       hagar    helga    snert
0230                helga    snert
0300       hamlet            snert
0330       hamlet            snert
0400       hamlet            snert
0430       hamlet
0500       hamlet   kvack
0530       hamlet   kvack
0600
0630       hernia
0700

From this table, I can see that hagar's backup started at 01:00 and ended by 02:30. In between, honi's backup started at 01:30 and ended at 02:00, at which point helga's backup started on the same drive and ran until 03:00. snert's backup started at the same time as honi's but ended much later, at 04:30. hamlet's backup started at 03:00 and ended at 06:00. kvack ran from 05:00 until 06:00. After 30 minutes with no backups, hernia's backup started at 06:30. Using the start time as my sort key, the following CSV would be an appropriate set of data:

 

0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux

So the format I have for my CSV is “Start Time”, “End Time”, “client”, “policy”.
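I sort the export in Excel, but the parsing and sorting could also be done in the script itself. Below is a minimal Perl sketch that reads lines in this four-column format and trims the spaces around each comma, which the sample data contains. It then sorts the jobs by start time. It assumes same-day HHMM times, and read_jobs() is a hypothetical name. Jobs with the same start time keep their input order, so honi stays ahead of snert at 0130.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader for the sample format:
# "Start Time", "End Time", "client", "policy" (HHMM, same day).
sub read_jobs {
    my @jobs;
    for my $line (@_) {
        (my $copy = $line) =~ s/^\s+|\s+$//g;   # trim, including the newline
        next if $copy eq '';                     # skip blank lines
        my ($start, $end, $client, $policy) = split /\s*,\s*/, $copy;
        push @jobs, { start => $start, end => $end,
                      client => $client, policy => $policy };
    }
    # Sort numerically by start time; the index tie-breaker keeps jobs
    # with the same start time in their original file order.
    my @order = sort {
        $jobs[$a]{start} <=> $jobs[$b]{start} or $a <=> $b
    } 0 .. $#jobs;
    return @jobs[@order];
}

my @sorted = read_jobs(
    "0130, 0200, honi, honi-data\n",
    "0100, 0230, hagar, hagar-windows-full\n",
    "0130, 0430, snert, snert-database-only\n",
);
print join(',', $_->{start}, $_->{end}, $_->{client}), "\n" for @sorted;
```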

From Table 1, I can see that I want Time to be the major axis in gathering data. Time will increment as I sample the data. Notice, too, that two clients start their backup at the same time, so I will have to factor that in, as well.

Now, let's break down Table 1, compare it to my data set, and see what's really going on. I start with the time set to midnight (0000) and increment it by 30 minutes for each table row. I also say that my "current data line" is the first line of my data set. Since I sorted my data by start time, I compare the current data line's start time with the current time. If I haven't yet reached the start time, I increment the time to the next table row, and I keep doing this until the two match. When I reach the start time of my current line, I "claim" the first available drive.

At the same time, I'm also looking for the end times of "claimed" drives. I should actually check these first, so that if a drive is "released" at 0300, that same drive can be claimed again at 0300. This is only a minor preference and can sometimes be erroneous, because I am rounding to the nearest half-hour. Because of this rounding, I will have to be careful not to omit small backups that ran for less than 15 minutes (0200-0210 would be listed as 0200-0200 and would cancel out), but I'll get to that later. In summary, my pseudo-code would look like this:

 

Time starts at 0000

Grab current data line

For each drive, if the drive is claimed and its end time is the current time, release the drive.

If current data line's start time is current time, claim next unused drive, mark its start and end times, and go to the next data line.

Otherwise (current time has not yet reached current data line's start time), increment the time.
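As a cross-check, here is a minimal Perl sketch of this pseudo-code, run in memory against the sample data. It is my own sketch, not the script developed below. It works in minutes since midnight, samples every 30 minutes across one day, and assumes six drives. It also releases or claims a drive when the end or start time is at or before the current time rather than exactly equal to it, so that times off the 30-minute marks are still placed. The helper names are hypothetical. Its rows for 0000 through 0700 match Table 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $N_DRIVES = 6;    # six drives, as in Table 1
my $TINC     = 30;   # sample every 30 minutes

sub hhmm_to_min { my ($t) = @_; return int($t / 100) * 60 + $t % 100; }
sub min_to_hhmm { my ($m) = @_; return sprintf "%02d%02d", int($m / 60), $m % 60; }

# Jobs are [start, end, client, policy] and must be sorted by start time.
# Returns one CSV row (time, then each drive's client) per time step.
sub simulate {
    my @jobs  = @_;
    my @drive = (undef) x $N_DRIVES;   # each entry: [client, end] or undef
    my $next  = 0;                     # index of the current data line
    my @rows;
    for (my $t = 0; $t < 24 * 60; $t += $TINC) {
        # Release first, so a drive freed at $t can be claimed again at $t.
        for my $d (@drive) {
            $d = undef if defined $d && $d->[1] <= $t;
        }
        # Claim the first free drive for each data line starting by $t.
        while ($next < @jobs && hhmm_to_min($jobs[$next][0]) <= $t) {
            my ($free) = grep { !defined $drive[$_] } 0 .. $#drive;
            die "No free drive at " . min_to_hhmm($t) . "\n" unless defined $free;
            $drive[$free] = [ $jobs[$next][2], hhmm_to_min($jobs[$next][1]) ];
            $next++;
        }
        # Record this row; the loop then moves the clock forward by $TINC.
        push @rows, join ',', min_to_hhmm($t), map { defined $_ ? $_->[0] : '' } @drive;
    }
    return @rows;
}

my @sample = (
    [ '0100', '0230', 'hagar',  'hagar-windows-full'  ],
    [ '0130', '0200', 'honi',   'honi-data'           ],
    [ '0130', '0430', 'snert',  'snert-database-only' ],
    [ '0200', '0300', 'helga',  'helga-system-files'  ],
    [ '0300', '0600', 'hamlet', 'hamlet-windows-full' ],
    [ '0500', '0600', 'kvack',  'kvack-policy'        ],
    [ '0630', '0700', 'hernia', 'all-linux'           ],
);
print "$_\n" for (simulate(@sample))[0 .. 14];   # 0000 through 0700
```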

 

Now, let's see how this pseudo-code stacks up against my sample data:

 

Time starts at 0000.

Grab current data line (0100, 0230, hagar, hagar-windows-full)

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0000 = 0100? No)...

If current time has not exceeded current data line's start time, increment the time:

 

0000

Current data line: 0100, 0230, hagar, hagar-windows-full

Current Time: 0030

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0030 = 0100? No)...

If current time has not exceeded current data line's start time, increment the time:

 

0000
0030

Current data line: 0100, 0230, hagar, hagar-windows-full

Current Time: 0100

For each drive, if drive is claimed, ...

is drive 1 claimed? No

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0100 = 0100? YES!), Claim drive #1.

A drive was claimed, so I go to the next data line without incrementing the time. Output so far:

 

0000
0030

Current data line: 0130, 0200, honi, honi-data

Current Time: 0100

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0100 = 0130? No)...

If current time has not exceeded current data line's start time, increment the time:

0000
0030
0100       hagar

Current data line: 0130, 0200, honi, honi-data

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? No

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0130? YES!), claim next available drive (#2).

A drive was claimed, so I go to the next data line without incrementing the time. Output so far:

0000
0030
0100       hagar

Current data line: 0130, 0430, snert, snert-database-only

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? No.

is drive 3 claimed? No

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0130? YES!), claim next available drive (#3), with end time of 0430.

A drive was claimed, so I go to the next data line without incrementing the time. Output so far:

0000
0030
0100       hagar

Current data line: 0200, 0300, helga, helga-system-files

Current Time: 0130

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? No.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0130 = 0200? No)...

If current time has not exceeded current data line's start time, increment the time:

0000
0030
0100       hagar
0130       hagar    honi     snert

Current data line: 0200, 0300, helga, helga-system-files

Current Time: 0200

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0200) yet? Yes! Release this drive.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0200 = 0200? YES!), claim next unclaimed drive (#2), with new end time of 0300.

A drive was claimed, so I go to the next data line without incrementing the time. Output so far:

0000
0030
0100       hagar
0130       hagar    honi     snert

Current data line: 0300, 0600, hamlet, hamlet-windows-full

Current Time: 0200

For each drive, if drive is claimed, ...

is drive 1 claimed? YES!

Have I reached my end time (0230) yet? No.

is drive 2 claimed? YES!

Have I reached my end time (0300) yet? No.

is drive 3 claimed? YES!

Have I reached my end time (0430) yet? No.

is drive 4 claimed? No

is drive 5 claimed? No

is drive 6 claimed? No

If current data line's start time is current time (Is 0200 = 0300? No)...

If current time has not exceeded current data line's start time, increment the time:

0000
0030
0100       hagar
0130       hagar    honi     snert
0200       hagar    helga    snert

Note that I run into a problem when I go to the next day: my clock only goes to 2330, and the next day starts again at 0000. What happens if my time is 2200 and the current data line's start time is 0100 the next day? Because I'm testing with a less than (<), 0100 looks earlier than 2200, so I'll have to either change the current data line's start time (0100 would become 2500) or change the basis of the current time (2330 would be -0030, and 2400 would become 0000).
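One way to implement the second option, changing the basis of the time, is to measure every time in minutes from the start of the backup window, wrapping past midnight. Below is a minimal Perl sketch. It assumes a hypothetical window that opens at 1800 and that no job runs for 24 hours or longer. window_offsets() is a hypothetical name. The sketch relies on Perl's % operator returning a non-negative result when the right operand is positive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub hhmm_to_min { my ($t) = @_; return int($t / 100) * 60 + $t % 100; }

# Return a job's start and end as minutes after $window_start, so a job
# starting at 0100 sorts after one starting at 2200 the night before.
# Assumes every job lasts less than 24 hours.
sub window_offsets {
    my ($start, $end, $window_start) = @_;
    my $offset   = (hhmm_to_min($start) - hhmm_to_min($window_start)) % 1440;
    my $duration = (hhmm_to_min($end) - hhmm_to_min($start)) % 1440;
    return ($offset, $offset + $duration);
}

for my $job ([ '2200', '2330' ], [ '2330', '0100' ], [ '0100', '0230' ]) {
    my ($s, $e) = window_offsets(@$job, '1800');
    print "$job->[0]-$job->[1]: $s to $e minutes after 1800\n";
}
```

With offsets like these, the comparisons and sorting work unchanged across midnight. Converting back to HHMM is only needed when printing.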

 

SOLUTION

The output above is fine, but I worked it out in my head and typed it in by hand. What about generating it? Because each row is a function of the time, I output a row every time the time is incremented. So, starting from my pseudo-code, let's write a simple Perl program to do what I've been doing by hand:

 

#!/usr/bin/perl
use strict;

# Global variables
my $TIME=0000;
my $TINC=30;    # not 0030: a leading zero makes Perl read the number as octal (24)
my $N_DRIVES=6;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;

# First, I want to initialize all my drive usages:
# client, start time, and end time.
sub init_drives()
{
  foreach my $IDX (0..$N_DRIVES-1)
  {
    $DRIVE[$IDX] = "";
    $DRIVE_START[$IDX] = -1;
    $DRIVE_END[$IDX] = -1;
  }
}

# Return the index of the first unclaimed drive.
sub nextempty
{
  foreach my $IDX (0..$N_DRIVES-1)
  {
    if ( $DRIVE[$IDX] eq "" ) { return $IDX; }
  }
}

# Print one CSV row: the time, then each drive's client.
sub print_row()
{
  print "$TIME";
  foreach my $IDX (0..$N_DRIVES-1)
  {
    print ",$DRIVE[$IDX]";
  }
  print "\n";
}

 

sub main()
{
  my $IDX;
  init_drives;
  # Things I need before I can start processing:
  # 1. Current Time - got it (above)
  # 2. Current Line - Need to start on first line
  open(FD, '<', $INFILE) or die "Cannot open $INFILE: $!";
  # I grab the first line of the file and extract the values.
  $CURR_LINE = <FD>;
  my ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) =
    split(',', $CURR_LINE);
  # Now I can start my processing. I do this until the end of the file.
  do
  {
    # First, I'll check to see if each drive is claimed and if it is, I check if
    # it's reached its end time. If so, I release that drive.
    foreach $IDX (0..$N_DRIVES-1)
    {
      if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )
      {
        # To release the drive, I reset all values.
        $DRIVE[$IDX] = "";
        $DRIVE_END[$IDX] = -1;
        $DRIVE_START[$IDX] = -1;
      }
    }
    # Second, I need to check if my data line's start time matches
    # the current time. If it does, I claim a drive. If it doesn't, I increment
    # the time.
    if ( $TIME == $DATA_START )
    {
      $NEXT=nextempty;
      $DRIVE[$NEXT] = $DATA_CLIENT;
      $DRIVE_START[$NEXT] = $DATA_START;
      $DRIVE_END[$NEXT] = $DATA_END;
    } else
    {
      print_row;
      $TIME = $TIME + 30;
    }
  } until (eof(FD));
  close(FD);
}

 

 

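Of course, if you try to run this code, you'll find it still has a few issues: for one, it never moves on to the next data line after claiming a drive, and the time arithmetic doesn't roll minutes over into hours. I'll add a few subroutines and clean it up a bit. Let's break down the subroutines first, though. I initialize the drives with my init_drives subroutine: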

 

#####################################################################################
# init_drives() #
# A routine to initialize (unallocate) all drives #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, blank out the indexed drive (unallocating it), and set start #
# and end times to '-1'. #
#####################################################################################
sub init_drives()
{
  foreach $IDX (0..$N_DRIVES-1)
  {
    $DRIVE[$IDX] = "";
    $DRIVE_START[$IDX] = -1;
    $DRIVE_END[$IDX] = -1;
  }
}

 

Second, I need a subroutine that finds my next empty drive:

#####################################################################################
# nextempty() #
# A routine to find the next empty drive #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, if the indexed drive is blank (i.e., empty), return that #
# index. #
#####################################################################################
sub nextempty
{
  foreach $IDX (0..$N_DRIVES-1)
  {
    if ( $DRIVE[$IDX] eq "" ) { return $IDX; }
  }
}

 

Third, I am incrementing by 30 minutes each time, but with basic arithmetic, so 30 + 30 = 60 rather than 100 (01:00), which isn't what I want. I also want to round every time to the closest increment. So I create one subroutine that does both the rounding and the hour rollover:

#####################################################################################
# round_to_incr() #
# Rounds the argument to the nearest '$TINC' minute mark (up or down) #
# #
# Args: #
# $time - This is the time that needs to be rounded in HHMM format. #
# #
# Pseudocode: #
# - Grab the minutes by getting the modulus of $time and 100 #
# e.g. $time = 1422 --> (1422 % 100) = 22 #
# - Grab the hour by subtracting the minutes from the time. #
# 1400 = 1422 - (22) #
# - Find out how close I am to the $TINC minute mark by creating my $rem #
# variable: my minutes modulus $TINC (e.g., above, 22 % 30 = 22). #
# - If my remainder is less than half of $TINC (15), I round down by #
# subtracting the remainder from the actual minutes. #
# e.g. $minutes = 44 --> 44 - (44 % 30) = 44 - 14 = 30. #
# - Otherwise, I round up by adding the difference of the remainder and $TINC. #
# $minutes = 22 --> 22 + 30 - (22 % 30) = 22 + 30 - 22 = 30 #
# $minutes = 48 --> 48 + 30 - (48 % 30) = 48 + 30 - 18 = 60 #
# - If my minutes component is less than 60 (less than 1 hour), I add that to #
# my hour component. Otherwise, I just add 100 to the hour component to get #
# the rounded time. #
# - If my time has exceeded or is at 2400, I subtract 2400. #
# #
#####################################################################################
sub round_to_incr
{
  my ( $time ) = @_;
  my $minutes = $time % 100;
  my $hour = $time - $minutes;
  my $rem = $minutes % $TINC;
  if ( $rem < $TINC / 2 ) { $minutes = $minutes - $rem; }
  else { $minutes = $minutes + $TINC - $rem; }
  if ( $minutes < 60 ) { $time = $hour + $minutes; }
  else { $time = $hour + 100; }
  if ($time >= 2400) { $time = $time - 2400 ;}
  return $time;
}
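To check the rounding on its own, you can run this stand-alone copy of round_to_incr() against a few sample times. It assumes the two comparisons read `$rem < $TINC / 2` (the "less than 15" from the comment) and `$minutes < 60`. It prints the results zero-padded to HHMM.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $TINC = 30;

# Stand-alone copy of round_to_incr() for testing.
sub round_to_incr {
    my ($time) = @_;
    my $minutes = $time % 100;
    my $hour    = $time - $minutes;
    my $rem     = $minutes % $TINC;
    if ($rem < $TINC / 2) { $minutes = $minutes - $rem; }
    else                  { $minutes = $minutes + $TINC - $rem; }
    if ($minutes < 60) { $time = $hour + $minutes; }
    else               { $time = $hour + 100; }
    if ($time >= 2400) { $time = $time - 2400; }
    return $time;
}

# prints 0100, 1430, 1430, 1500, 0000 (one per line)
printf "%04d\n", round_to_incr($_) for (30 + 30, 1422, 1444, 1448, 2350);
```

The first call shows the rollover this subroutine exists for: 30 + 30 becomes 0100, not 60.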

I also need a subroutine that outputs everything I have in my basic CSV format. Let's call it "print_row", since it prints one CSV row each time.

#####################################################################################
# print_row() #
# The main printing function of my program; prints the time and drive allocations #
# for each drive. #
# #
# Args: none #
# #
# Pseudocode: #
# - Print the time in HHMM format #
# - Starting with the lowest index (0), #
# For each index, print a comma, then the indexed drive allocation. #
# - Print a newline. #
#####################################################################################
sub print_row()
{
  printf "%04d", $TIME;   # zero-pad to HHMM, e.g. 100 -> 0100
  foreach $IDX (0..$N_DRIVES-1)
  {
    print ",$DRIVE[$IDX]";
  }
  print "\n";
}

Finally, I need to find the latest end time across all allocated drives. That is, what's the latest time at which all drives will have been released?

#####################################################################################
# get_last_end_time() #
# Finds the latest end time for the current drive allocations. It does not yet #
# account for end times that fall on the next day. #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# For each index, if the drive end time is greater than the last one I checked, #
# store that value to return. #
# - Return value stored. #
#####################################################################################
sub get_last_end_time()
{
  my $RETURNVAL=0;
  foreach $IDX (0..$N_DRIVES-1)
  {
    if ( $DRIVE_END[$IDX] > $RETURNVAL ) { $RETURNVAL=$DRIVE_END[$IDX]; }
  }
  return $RETURNVAL;
}
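As the comment says, this routine doesn't account for end times that fall on the next day. One possible fix is to rank each end time by how far it lies after the current time, wrapping past midnight. This assumes no job runs 24 hours or longer. last_end_after() is a hypothetical replacement, not the routine used in this post. As in init_drives(), unclaimed drives carry -1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub hhmm_to_min { my ($t) = @_; return int($t / 100) * 60 + $t % 100; }

# Hypothetical replacement for get_last_end_time(): return the end time that
# lies furthest after $now, treating an end time "earlier" than $now as next
# day's. Unclaimed drives carry -1 and are skipped; returns -1 if none claimed.
# Assumes no job runs 24 hours or longer.
sub last_end_after {
    my ($now, @ends) = @_;
    my $now_min = hhmm_to_min($now);
    my ($best, $best_gap) = (-1, -1);
    for my $end (@ends) {
        next if $end < 0;
        my $gap = (hhmm_to_min($end) - $now_min) % 1440;
        ($best, $best_gap) = ($end, $gap) if $gap > $best_gap;
    }
    return $best;
}

# At 1000, a drive ending at 0200 (tomorrow) outlasts one ending at 1400.
printf "%04d\n", last_end_after(1000, 1400, 200, -1);   # prints 0200
```

In the main loop, this would be called as last_end_after($TIME, @DRIVE_END).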

Now, I've added a debug function to help when things get really hairy. This isn't necessary, but it helps me figure out if my variables are getting updated when they're supposed to, or not getting updated when they're not supposed to.

#####################################################################################
# print_debug() #
# A simple subroutine to print out the current variables. #
# #
# Args: none #
# #
#####################################################################################
sub print_debug()
{
  print "TIME: $TIME\n";
  print "DATA_START: $DATA_START\n";
  print "DATA_END: $DATA_END\n";
  print "DATA_CLIENT: $DATA_CLIENT\n";
  print "LAST END TIME: $LAST_END_TIME\n";
  print "CURR_LINE: $CURR_LINE\n";
}

And finally, here's my main program, as seen above but tweaked to use the added functions. Note that the do..until loop no longer stops as soon as I reach the end of the file: it also checks that my current line is blank and that the time has reached the last end time. Otherwise, reaching the end of the file would skip over the last line, which I definitely don't want.

#####################################################################################
# main() #
# The main body of the program. #
# #
# Args: none #
# #
# Pseudocode: #
# - Initialize the drive variables (call sub init_drives). #
# - Open the $INFILE for reading. #
# - Read in the first line. #
# - Split that line into my fields: #
# $DATA_START $DATA_END, $DATA_CLIENT, $DATA_POLICY #
# - Start processing within a do-while-loop: #
# - For each index of drives, if the drive is claimed and the drive end time #
# is at the current time, then I release the drive (reset indexed variables). #
# - If the current time is the same as the start time from the line I just got, #
# then claim the drive, by storing the client name, start, and end times. #
# I also grab the next line from my input file $INFILE. #
# Otherwise, I print out my row and increment the time by $TINC minutes. #
# - I find out when my latest end time for allocation is and I store it. #
# - processing ends when (1) I've reached the end of the file, and (2) the time #
# has passed the latest end time. #
# - I close my input file. #
# #
#####################################################################################
sub main()
{
  init_drives;
  # Things I need before I can start processing:
  # 1. Current Time - got it (above)
  # 2. Current Line - Need to start on first line
  open(FD, '<', $INFILE) or die "Cannot open $INFILE: $!";
  # I grab the first line of the file and extract the values.
  $CURR_LINE=<FD>;
  ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
  # Now I can start my processing. I do this until the end of the file.
  do
  {
    # First, I'll check to see if each drive is claimed and if it is, I check if
    # it's reached its end time. If so, I release that drive.
    foreach $IDX (0..$N_DRIVES-1)
    {
      if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )
      {
        # To release the drive, I reset all values.
        $DRIVE[$IDX] = "";
        $DRIVE_END[$IDX] = -1;
        $DRIVE_START[$IDX] = -1;
      }
    }
    # Second, I need to check if my data line's start time matches
    # the current time.
    # If it is, I claim a drive, and go to the next line.
    # If it isn't, I print the output and increment the time.
    if ( $TIME == $DATA_START )
    {
      $NEXT=nextempty;
      $DRIVE[$NEXT] = $DATA_CLIENT;
      $DRIVE_START[$NEXT] = $DATA_START;
      $DRIVE_END[$NEXT] = $DATA_END;
      $CURR_LINE=<FD>;
      ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
    } else
    {
      print_row();
      $TIME = round_to_incr($TIME + $TINC);
    }
    $LAST_END_TIME = round_to_incr(get_last_end_time());
  } until ( eof(FD) && $TIME == $LAST_END_TIME && $CURR_LINE eq "" );
  # I want to stop processing after (a) I've reached the end of the file, and
  # (b) I've gone past the last end time.
  print_row();
  close(FD);
}

And finally, I need my variables:

# Global variables
my $TIME=0000;
my $TINC=30;
my $N_DRIVES=6;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;
my $IDX;
my $DATA_CLIENT;
my $DATA_START;
my $DATA_END;
my $DATA_POLICY;
my $LAST_END_TIME=-1;

Piecing it all together, I have the following code:

#!/usr/bin/perl
#####################################################################################
# #
# generate_drive_usage.pl() #
# #
# Written By: Alan T. Landucci-Ruiz #
# #
# Abstract: This program generates CSV output of Tape drive usage, based on CSV #
# input. It is designed to help facilitate scheduling of tape drive #
# allocations when creating and moving backup schedules. #
# #
# #
# Args: none #
# #
# Variables: #
# $TIME - The time component of my output CSV. #
# $TINC - The increment component of my CSV. #
# $N_DRIVES - The number of drives I have. #
# @DRIVE - My "DRIVE" array: holds the string of allocation. #
# @DRIVE_START - Time that the drive is allocated. #
# @DRIVE_END - Time that the drive is unallocated. #
# $INFILE - The input file csv. #
# $CURR_LINE - The line being processed from the input CSV. #
# $NEXT - Index of my next empty drive. #
# $IDX - Index counter. #
# $DATA_CLIENT - Client that is allocating the drive. #
# $DATA_START - Start time for the client. #
# $DATA_END - End time for the client. #
# $DATA_POLICY - Policy of the client that is allocating the drive. #
# $LAST_END_TIME - The latest end time of all drives. #
# #
# Known Issues: #
# Currently, if a last end time is the next day's time, but earlier than the #
# currently known last end time, then it will use the currently known last end #
# time instead of the earlier one the next day. #
# e.g., 0200 tomorrow will be considered earlier than 1400 today. #
# #
#####################################################################################
use strict;
# Global variables
my $TIME=0000;
my $TINC=30;
my $N_DRIVES=6;
my @DRIVE;
my @DRIVE_START;
my @DRIVE_END;
my $INFILE='NetBackup_Export.csv';
my $CURR_LINE;
my $NEXT=0;
my $IDX;
my $DATA_CLIENT;
my $DATA_START;
my $DATA_END;
my $DATA_POLICY;
my $LAST_END_TIME=-1;
 
#####################################################################################
# init_drives() #
# A routine to initialize (unallocate) all drives #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, blank out the indexed drive (unallocating it), and set start #
# and end times to '-1'. #
#####################################################################################
sub init_drives()
{
  foreach $IDX (0..$N_DRIVES-1)
  {
    $DRIVE[$IDX] = "";
    $DRIVE_START[$IDX] = -1;
    $DRIVE_END[$IDX] = -1;
  }
}
 
#####################################################################################
# nextempty() #
# A routine to find the next empty drive #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# - For each index, if the indexed drive is blank (i.e., empty), return that #
# index. #
#####################################################################################
sub nextempty
{
  foreach $IDX (0..$N_DRIVES-1)
  {
    if ( $DRIVE[$IDX] eq "" ) { return $IDX; }
  }
}
 
#####################################################################################
# round_to_incr() #
# Rounds the argument to the nearest '$TINC' minute mark (up or down) #
# #
# Args: #
# $time - This is the time that needs to be rounded in HHMM format. #
# #
# Pseudocode: #
# - Grab the minutes by getting the modulus of $time and 100 #
# e.g. $time = 1422 --> (1422 % 100) = 22 #
# - Grab the hour by subtracting the minutes from the time. #
# 1400 = 1422 - (22) #
# - Find out how close I am to the $TINC minute mark by creating my $rem #
# variable: my minutes modulus $TINC (e.g., above, 22 % 30 = 22). #
# - If my remainder is less than half of $TINC (15), I round down by #
# subtracting the remainder from the actual minutes. #
# e.g. $minutes = 44 --> 44 - (44 % 30) = 44 - 14 = 30. #
# - Otherwise, I round up by adding the difference of the remainder and $TINC. #
# $minutes = 22 --> 22 + 30 - (22 % 30) = 22 + 30 - 22 = 30 #
# $minutes = 48 --> 48 + 30 - (48 % 30) = 48 + 30 - 18 = 60 #
# - If my minutes component is less than 60 (less than 1 hour), I add that to #
# my hour component. Otherwise, I just add 100 to the hour component to get #
# the rounded time. #
# - If my time has exceeded or is at 2400, I subtract 2400. #
# #
#####################################################################################
sub round_to_incr
{
  my ( $time ) = @_;
  my $minutes = $time % 100;
  my $hour = $time - $minutes;
  my $rem = $minutes % $TINC;
  if ( $rem < $TINC / 2 ) { $minutes = $minutes - $rem; }
  else { $minutes = $minutes + $TINC - $rem; }
  if ( $minutes < 60 ) { $time = $hour + $minutes; }
  else { $time = $hour + 100; }
  if ($time >= 2400) { $time = $time - 2400 ;}
  return $time;
}
 
#####################################################################################
# print_row() #
# The main printing function of my program; prints the time and drive allocations #
# for each drive. #
# #
# Args: none #
# #
# Pseudocode: #
# - Print the time in HHMM format #
# - Starting with the lowest index (0), #
# For each index, print a comma, then the indexed drive allocation. #
# - Print a newline. #
#####################################################################################
sub print_row()
{
  printf "%04d", $TIME;   # zero-pad to HHMM, e.g. 100 -> 0100
  foreach $IDX (0..$N_DRIVES-1)
  {
    print ",$DRIVE[$IDX]";
  }
  print "\n";
}
 
#####################################################################################
# get_last_end_time() #
# Finds the latest end time for the current drive allocations. It does not yet #
# account for end times that fall on the next day. #
# #
# Args: none #
# #
# Pseudocode: #
# - Starting with the lowest index (0), #
# For each index, if the drive end time is greater than the last one I checked, #
# store that value to return. #
# - Return value stored. #
#####################################################################################
sub get_last_end_time()
{
  my $RETURNVAL=0;
  foreach $IDX (0..$N_DRIVES-1)
  {
    if ( $DRIVE_END[$IDX] > $RETURNVAL ) { $RETURNVAL=$DRIVE_END[$IDX]; }
  }
  return $RETURNVAL;
}
 
#####################################################################################
# print_debug() #
# A simple subroutine to print out the current variables. #
# #
# Args: none #
# #
#####################################################################################
sub print_debug()
{
  print "TIME: $TIME\n";
  print "DATA_START: $DATA_START\n";
  print "DATA_END: $DATA_END\n";
  print "DATA_CLIENT: $DATA_CLIENT\n";
  print "LAST END TIME: $LAST_END_TIME\n";
  print "CURR_LINE: $CURR_LINE\n";
}
 
#####################################################################################
# main() #
# The main body of the program. #
# #
# Args: none #
# #
# Pseudocode: #
# - Initialize the drive variables (call sub init_drives). #
# - Open the $INFILE for reading. #
# - Read in the first line. #
# - Split that line into my fields: #
# $DATA_START $DATA_END, $DATA_CLIENT, $DATA_POLICY #
# - Start processing within a do-while-loop: #
# - For each index of drives, if the drive is claimed and the drive end time #
# is at the current time, then I release the drive (reset indexed variables). #
# - If the current time is the same as the start time from the line I just got, #
# then claim the drive, by storing the client name, start, and end times. #
# I also grab the next line from my input file $INFILE. #
# Otherwise, I print out my row and increment the time by $TINC minutes. #
# - I find out when my latest end time for allocation is and I store it. #
# - processing ends when (1) I've reached the end of the file, and (2) the time #
# has passed the latest end time. #
# - I close my input file. #
# #
#####################################################################################
sub main()
{
  init_drives;
  # Things I need before I can start processing:
  # 1. Current time: got it (above)
  # 2. Current line: need to start on the first line
  open(FD, "<", $INFILE) or die "Cannot open $INFILE: $!";
  # I grab the first line of the file and extract the values.
  $CURR_LINE=<FD>;
  ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
  # Now I can start my processing. I do this until the end of the file.
  do
  {
    # First, I'll check to see if each drive is claimed and if it is, I check if
    # it's reached its end time. If so, I release that drive.
    foreach $IDX (0..$N_DRIVES-1)
    {
      if ( $DRIVE[$IDX] ne "" && $DRIVE_END[$IDX] == $TIME )
      {
        # To release the drive, I reset all values.
        $DRIVE[$IDX] = "";
        $DRIVE_END[$IDX] = -1;
        $DRIVE_START[$IDX] = -1;
      }
    }
    # Second, I check whether my data line's start time matches
    # the current time.
    # If it does, I claim a drive and read the next line.
    # If it doesn't, I print the output and increment the time.
    if ( $TIME == $DATA_START )
    {
      $NEXT=nextempty;
      $DRIVE[$NEXT] = $DATA_CLIENT;
      $DRIVE_START[$NEXT] = $DATA_START;
      $DRIVE_END[$NEXT] = $DATA_END;
      $CURR_LINE=<FD>;
      ($DATA_START, $DATA_END, $DATA_CLIENT, $DATA_POLICY) = split(',',$CURR_LINE);
    } else
    {
      print_row();
      $TIME = round_to_incr($TIME + $TINC);
    }
    $LAST_END_TIME = round_to_incr(get_last_end_time());
  } until ( eof(FD) && $TIME == $LAST_END_TIME && $CURR_LINE eq "" );
  # I stop processing once (a) I've reached the end of the file, and
  # (b) the time has reached the last end time.
  print_row();
  close(FD);
}
main;
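
The header comment on get_last_end_time() notes that it does not account for end times that fall on the next day. The sketch below is one hypothetical way to handle that and is not part of the original script. It assumes times are hhmm integers as in the CSV, that -1 marks a free drive (as in the release code above), and that no job runs 24 hours or longer. Instead of comparing raw hhmm values, it measures each end time as minutes remaining after the current time, wrapping past midnight. The names latest_end_time() and hhmm_to_minutes() are illustrative only.

```perl
use strict;
use warnings;

# Convert an hhmm integer (e.g. 130 for 01:30) to minutes after midnight.
sub hhmm_to_minutes {
    my ($hhmm) = @_;
    return int($hhmm / 100) * 60 + $hhmm % 100;
}

# Hypothetical next-day-aware variant of get_last_end_time().
# $now is the current time (hhmm); $ends is a ref to the drive end times
# (hhmm, -1 for a free drive). Each end time is measured as minutes
# remaining after $now, wrapping past midnight, so 0030 counts as later
# than 2345 when $now is 2330. Assumes no job runs 24 hours or longer.
sub latest_end_time {
    my ($now, $ends) = @_;
    my $now_min = hhmm_to_minutes($now);
    my ($best, $best_remaining) = (-1, -1);
    for my $end (@$ends) {
        next if $end < 0;    # skip free drives
        # Perl's % with a positive right operand never returns a negative value.
        my $remaining = (hhmm_to_minutes($end) - $now_min) % (24 * 60);
        if ($remaining > $best_remaining) {
            ($best, $best_remaining) = ($end, $remaining);
        }
    }
    return $best;    # -1 if no drive is claimed
}

print latest_end_time(2330, [2345, 30, -1]), "\n";    # prints 30
print latest_end_time(100,  [230, 430, -1]), "\n";    # prints 430
```

Returning the end time in the same hhmm form as the original subroutine means a variant like this could be passed through round_to_incr() in the same way, though that is untested here.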


So, let's see how this program stacks up on our sample data:


~ $ cat NetBackup_Export.csv
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
~ $ ./generate_drive_usage2.pl
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
~ $


Well, that looks pretty good so far. Let's double the data (i.e., add the same data for the next day):


~ $ cat NetBackup_Export.csv
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
0100, 0230, hagar, hagar-windows-full
0130, 0200, honi, honi-data
0130, 0430, snert, snert-database-only
0200, 0300, helga, helga-system-files
0300, 0600, hamlet, hamlet-windows-full
0500, 0600, kvack, kvack-policy
0630, 0700, hernia, all-linux
~ $ ./generate_drive_usage2.pl
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
700,,,,,,
730,,,,,,
800,,,,,,
830,,,,,,
900,,,,,,
930,,,,,,
1000,,,,,,
1030,,,,,,
1100,,,,,,
1130,,,,,,
1200,,,,,,
1230,,,,,,
1300,,,,,,
1330,,,,,,
1400,,,,,,
1430,,,,,,
1500,,,,,,
1530,,,,,,
1600,,,,,,
1630,,,,,,
1700,,,,,,
1730,,,,,,
1800,,,,,,
1830,,,,,,
1900,,,,,,
1930,,,,,,
2000,,,,,,
2030,,,,,,
2100,,,,,,
2130,,,,,,
2200,,,,,,
2230,,,,,,
2300,,,,,,
2330,,,,,,
0,,,,,,
30,,,,,,
100, hagar,,,,,
130, hagar, honi, snert,,,
200, hagar, helga, snert,,,
230,, helga, snert,,,
300, hamlet,, snert,,,
330, hamlet,, snert,,,
400, hamlet,, snert,,,
430, hamlet,,,,,
500, hamlet, kvack,,,,
530, hamlet, kvack,,,,
600,,,,,,
630, hernia,,,,,
~ $
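
As a cross-check on tables like the ones above, the peak number of drives in use on a given day can be computed directly from the start and end times. The following is a minimal sketch, not part of the original script, using a standard sweep over start and end events. It assumes all jobs fall within a single day, and peak_usage() is a hypothetical helper name. At equal times, end events are processed before start events, mirroring how the script releases a drive before claiming one at the same timestamp.

```perl
use strict;
use warnings;

# Return the peak number of simultaneously claimed drives, and the first
# time (hhmm) it occurs, for a list of [start, end] pairs on a single day.
sub peak_usage {
    my @jobs = @_;
    my @events;
    for my $job (@jobs) {
        push @events, [ $job->[0], +1 ], [ $job->[1], -1 ];
    }
    # Sort by time; at equal times, -1 (end) sorts before +1 (start).
    @events = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @events;
    my ($in_use, $peak, $peak_time) = (0, 0, undef);
    for my $event (@events) {
        $in_use += $event->[1];
        if ($in_use > $peak) {
            ($peak, $peak_time) = ($in_use, $event->[0]);
        }
    }
    return ($peak, $peak_time);
}

# Start/end pairs for one day of the sample NetBackup_Export.csv above.
my ($peak, $when) = peak_usage(
    [100, 230], [130, 200], [130, 430], [200, 300],
    [300, 600], [500, 600], [630, 700],
);
print "peak $peak drives at $when\n";    # prints "peak 3 drives at 130"
```

This agrees with the 130 row of the generated table, where hagar, honi, and snert each hold a drive.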


SUMMARY


To conclude, given a set of data that includes the start time, end time, and client for each backup, we can plot our drive usage per client (and, with some modification, probably even per policy). This data should be easy to get from reporting software such as NetBackup 7 OpsCenter, or from a report exported from the NetBackup Administration Console. Because these tools output times in a different format by default (12-hour hh:mm rather than the 24-hour hhmm used here), you will need some additional scripting to convert them, either externally (in another program) or internally (added to this program).
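
For the internal option, a conversion helper might look like the sketch below. It assumes the exported times look like "1:30 AM" or "11:05 PM" (optionally with seconds), which may not match every OpsCenter or Administration Console export exactly; to_hhmm() is a hypothetical name, not something from NetBackup.

```perl
use strict;
use warnings;

# Convert a 12-hour time such as "1:30 AM" or "11:05 pm" into the 24-hour
# hhmm form this script reads (e.g. "0130", "2305"). Seconds, if present,
# are dropped. Returns undef if the string does not look like a 12-hour time.
sub to_hhmm {
    my ($text) = @_;
    my ($h, $m, $ampm) =
        $text =~ /^\s*(\d{1,2}):(\d{2})(?::\d{2})?\s*([AaPp][Mm])\s*$/
        or return;
    return if $h < 1 || $h > 12 || $m > 59;
    $h = 0 if $h == 12;               # 12:xx AM is 00:xx; 12:xx PM is fixed below
    $h += 12 if lc($ampm) eq 'pm';    # shift PM hours into 12..23
    return sprintf('%02d%02d', $h, $m);
}

print join(', ', map { to_hhmm($_) } '1:00 AM', '2:30 PM'), "\n";   # prints "0100, 1430"
```

Applied to the start and end fields before the split in main(), this would let the existing numeric comparisons work unchanged, assuming the rest of the CSV layout stays the same.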