Another Backup Myth: Consecutive Failures

Updated: 21 May 2009
Joe Pfeiffer

My post on backup reporting myths sparked some interest from a colleague (Hal Uygur), so here is his favorite myth: consecutive failures. A common question from backup administrators is “Can you tell me which servers failed x consecutive times?” Many backup reporting vendors claim that they can produce such a report. Can they? Well, it depends on what is meant by “consecutive”. Sarcasm aside, this topic requires looking under the covers to check that the advertised functionality meets the requirement. Consecutive failures can basically be broken down into “consecutive job failures” and “consecutive window failures”.

Let’s start with the easier one: “consecutive job failures”. This is simply a straight-line calculation: look at the jobs for a specific timeframe in order and determine whether x of them failed consecutively. For example, if the last 7 jobs for Server A ended S F S S F F S (S = success, F = failure) and we want to know whether 2 consecutive failures took place, the answer is yes, as shown by jobs five and six.
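The straight-line calculation can be sketched in a few lines of Python. This is only an illustration, not any vendor's implementation; the function names and the use of "S"/"F" strings for job status are assumptions made for the example.

```python
def max_consecutive_failures(statuses):
    """Return the length of the longest run of 'F' in an ordered list of job statuses."""
    longest = current = 0
    for status in statuses:
        if status == "F":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # any success breaks the run
    return longest


def has_consecutive_failures(statuses, x):
    """True if at least x failures occurred back to back."""
    return max_consecutive_failures(statuses) >= x


# The last 7 jobs for Server A from the example above, oldest first.
jobs_server_a = "S F S S F F S".split()

print(has_consecutive_failures(jobs_server_a, 2))  # True (jobs five and six)
print(has_consecutive_failures(jobs_server_a, 3))  # False
```

Note that this count knows nothing about backup windows or reruns; it treats every job as equal.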

Counting consecutive window failures means counting windows. Well, we’re still going to count jobs, but we need to do so in the context of a window. For example, let’s say Server A requires a daily incremental backup Mon – Thu. Furthermore, the backup window opens at 6:00 PM and closes at 8:00 AM the following day. Thus we have 4 windows, each running 6:00 PM – 8:00 AM, Mon – Thu. In the Monday window, let’s say the scheduled run failed and a later rerun of the job also failed, so for Mon we have F F. On Tuesday the scheduled job fails and the rerun is successful, so F S. On Wednesday the scheduled run and the rerun both fail, and the same holds true for Thursday, so F F for both days. In summary, Monday is a failure, Tuesday a success, and Wednesday and Thursday are failures. When asked to show a report for Server A with 2 consecutive window failures, the answer is not the two jobs that failed on Monday, nor the second job on Monday followed by the first job on Tuesday. It is the last job that failed on Wednesday followed by (followed as in consecutive :-)) the last job that failed on Thursday.

Thus, “consecutive window failures” requires window awareness: measuring timeframes not by calendar days but by windows that cross midnight, differentiating the different window durations for incremental and full backups and, last but not least, surgically picking off the status of the last job within each window. There are additional “awareness” scenarios (like servers with multi-policy backups) that make picking off the last job even more complex.
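To make the difference concrete, here is a minimal Python sketch of the window-aware count for the Server A example. It assumes a single fixed 6:00 PM to 8:00 AM window, and the dates, timestamps, and function names are hypothetical, chosen only to reproduce the Mon – Thu scenario. It does not handle different window durations for full and incremental backups, multi-policy servers, or windows in which no job ran at all.

```python
from datetime import datetime, time, timedelta


def window_date(job_time, opens=time(18, 0), closes=time(8, 0)):
    """Return the date on which the job's overnight window opened, or None if outside a window."""
    t = job_time.time()
    if t >= opens:
        return job_time.date()                      # evening part of the window
    if t < closes:
        return job_time.date() - timedelta(days=1)  # after midnight: belongs to yesterday's window
    return None


def window_statuses(jobs):
    """Map each window to the status of the LAST job that ran in it."""
    last = {}
    for job_time, status in sorted(jobs):
        w = window_date(job_time)
        if w is not None:
            last[w] = status  # later jobs overwrite earlier ones
    return last


def consecutive_window_failures(jobs, x):
    """Return runs (lists of window dates) of at least x failed windows in a row."""
    statuses = window_statuses(jobs)
    runs, current = [], []
    for w in sorted(statuses):
        if statuses[w] == "F":
            current.append(w)
        else:
            if len(current) >= x:
                runs.append(current)
            current = []
    if len(current) >= x:
        runs.append(current)
    return runs


# Hypothetical week: Monday is 18 May 2009. Each window has a scheduled run and a rerun.
jobs = [
    (datetime(2009, 5, 18, 19, 0), "F"), (datetime(2009, 5, 18, 23, 0), "F"),  # Mon: F F
    (datetime(2009, 5, 19, 19, 0), "F"), (datetime(2009, 5, 20, 2, 0), "S"),   # Tue: F S (rerun after midnight)
    (datetime(2009, 5, 20, 19, 0), "F"), (datetime(2009, 5, 20, 23, 0), "F"),  # Wed: F F
    (datetime(2009, 5, 21, 19, 0), "F"), (datetime(2009, 5, 22, 1, 0), "F"),   # Thu: F F
]

for run in consecutive_window_failures(jobs, 2):
    print([d.strftime("%a") for d in run])  # ['Wed', 'Thu']
```

A plain job-by-job count over the same data would also flag Monday's two failed jobs, which is exactly the false positive described above.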

So the next time you read a vendor’s claim about reporting on consecutive failures, keep this in mind. As more enterprises and service providers are bound by service level agreements that require demonstrating that no server goes more than x consecutive days without a successful backup, the ability to produce the consecutive day/window report becomes a must-have.