Another Backup Myth: Consecutive Failures
My post on backup reporting myths sparked some interest from a colleague (Hal Uygur), so here is his favorite myth: consecutive failures. A common question from backup administrators is “Can you tell me which servers failed x consecutive times?” Many backup reporting vendors claim that they can produce such a report. Can they? Well, it depends on what is meant by “consecutive”. Sarcasm aside, this topic requires looking under the covers to check that the advertised functionality actually meets the requirement. Consecutive failures break down into two kinds: “consecutive job failures” and “consecutive window failures”.
Let’s start with the easier one, “consecutive job failures”. This is a straightforward linear scan: look at a server’s jobs over a specific timeframe, in the order they ran, and determine whether x of them failed back to back. For example, if the last 7 jobs for Server A ended S F S S F F S (S = success, F = failure) and we want to know whether 2 consecutive failures took place, the answer is yes, as shown by jobs five and six.
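The scan described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's implementation: the function name `find_failure_runs` and the single-character status encoding are assumptions made for the example.

```python
def find_failure_runs(statuses, x):
    """Return (first_job, last_job) positions, 1-based, for every
    unbroken run of at least x failed jobs.

    statuses: a sequence of "S"/"F" values in the order the jobs ran
    (hypothetical encoding chosen for this sketch).
    """
    runs = []
    run_start = None  # position where the current failure run began
    for position, status in enumerate(statuses, start=1):
        if status == "F":
            if run_start is None:
                run_start = position
        else:
            # A success ends the current run; keep it if it was long enough.
            if run_start is not None and position - run_start >= x:
                runs.append((run_start, position - 1))
            run_start = None
    # Handle a run that is still open at the end of the list.
    if run_start is not None and len(statuses) - run_start + 1 >= x:
        runs.append((run_start, len(statuses)))
    return runs


print(find_failure_runs("SFSSFFS", 2))  # [(5, 6)]
```

For the Server A sequence this reports jobs five and six. Note that the function knows nothing about backup windows; it treats every job, reruns included, as an equal entry in one long list.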
“Consecutive window failures” means counting windows. Well, we’re still going to count jobs, but we need to do so in the context of a window. For example, let’s say Server A requires a daily incremental backup Mon – Thu. Furthermore, the backup window starts at 6:00 PM and closes at 8:00 AM the following day. Thus we have 4 windows, each running 6:00 PM – 8:00 AM, Mon – Thu. In the Monday window, let’s say the scheduled run failed and a later rerun of the job also failed. So for Monday we have F F. On Tuesday the scheduled job fails and the rerun is successful, so F S. On Wednesday, the scheduled run and the rerun both fail, and the same holds true for Thursday. So F F for both days. Because the outcome of a window is the status of the last job that ran in it, Monday is a failure, Tuesday is a success, and Wednesday and Thursday are failures. When asked to show a report for Server A with 2 consecutive window failures, the answer is not the two jobs that failed on Monday, nor the second job on Monday followed by the first job on Tuesday. It is the last job that failed on Wednesday followed by (“followed” as in consecutive) the last job that failed on Thursday.
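To make the Server A example concrete, here is a minimal Python sketch of a window-aware count. Everything in it is an assumption made for illustration: the function names, the use of each job's start time to place it in a window, the single fixed 6:00 PM – 8:00 AM window, and the choice to skip nights with no jobs rather than count them as failures. The dates are arbitrary; 2024-01-01 happens to be a Monday.

```python
from datetime import datetime, time, timedelta

WINDOW_OPEN = time(18, 0)   # window opens at 6:00 PM
WINDOW_CLOSE = time(8, 0)   # and closes at 8:00 AM the following day


def window_date(started_at):
    """Return the date on which the job's window opened,
    or None if the job started outside any window."""
    t = started_at.time()
    if t >= WINDOW_OPEN:
        return started_at.date()
    if t < WINDOW_CLOSE:
        # After midnight, so the window opened the previous evening.
        return started_at.date() - timedelta(days=1)
    return None


def window_statuses(jobs):
    """Map each window to the status of the LAST job that ran in it."""
    last = {}
    for started_at, status in sorted(jobs):
        day = window_date(started_at)
        if day is not None:
            last[day] = status  # later jobs overwrite earlier ones
    return last


def consecutive_window_failures(jobs, x):
    """Return runs (lists of window dates) of at least x failed windows.
    Assumption: nights with no jobs at all are skipped, not failed."""
    runs, current = [], []
    for day, status in sorted(window_statuses(jobs).items()):
        if status == "F":
            current.append(day)
        else:
            if len(current) >= x:
                runs.append(current)
            current = []
    if len(current) >= x:
        runs.append(current)
    return runs


# Server A: scheduled run at 7:00 PM, rerun at 2:00 AM the next morning.
jobs = [
    (datetime(2024, 1, 1, 19), "F"), (datetime(2024, 1, 2, 2), "F"),  # Mon
    (datetime(2024, 1, 2, 19), "F"), (datetime(2024, 1, 3, 2), "S"),  # Tue
    (datetime(2024, 1, 3, 19), "F"), (datetime(2024, 1, 4, 2), "F"),  # Wed
    (datetime(2024, 1, 4, 19), "F"), (datetime(2024, 1, 5, 2), "F"),  # Thu
]
runs = consecutive_window_failures(jobs, 2)
print([d.isoformat() for d in runs[0]])  # ['2024-01-03', '2024-01-04']
```

The sketch reports only the Wednesday and Thursday windows. A plain job-level scan over the same eight jobs would also flag Monday's F F and the Monday-rerun-to-Tuesday pair, which is exactly the mismatch described above.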
Thus, “consecutive window failures” requires window awareness: measuring timeframes not by calendar days but by windows that cross midnight, differentiating the window durations used for incremental and full backups and, last but not least, surgically picking off the status of the last job within each window. Additional “awareness” scenarios (like servers with multi-policy backups) make picking off the last job even more complex.
So the next time you read a vendor’s claim about reporting on consecutive failures, keep this in mind. As more enterprises and service providers are bound by service level agreements that require demonstrating that no server goes more than x consecutive days without a successful backup, the ability to produce the consecutive day/window report becomes a must-have.