Recovery Solution: Diagnosing client backup failures.
Maybe I am just unable to find the appropriate resources, but it is extremely frustrating trying to diagnose why a client does not complete a snapshot. In particular, there is no way that I know of to find out why a system retried a snapshot multiple times. Since our shop does not do full system recoveries, only partial snapshots, there's really no good reporting to find out whether all the clients backed up successfully. We need to make sure that each system backs up within a 24-hour window, and if one does not, that is a big issue we have to handle.
I guess I'm just used to the NetBackup or Backup Exec interface, which shows that a job failed and a new one started, and gives a detailed reason for the failure. Does anyone know a good way to figure out which systems failed, why, and what happened on each retry?
Thanks.
Comments
Check the events
You can check the Events tab in the Resource Manager for the machine, under Recovery Solution > AeX RSA Events. Sort by the Date column, then look for the snapshot failed error. You could also create a report to review these errors once you've identified the "ID" value that lists the failed file(s).
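As a rough illustration of the kind of report that could answer the "did everything back up within 24 hours" question, here is a minimal Python sketch. It assumes the events have been exported to CSV with the columns `machine`, `event_time`, and `status`. Those column names, the `completed`/`failed` values, and the sample rows are all hypothetical and are not the real AeX RSA Events schema, so adjust them to whatever your export actually contains.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical export format and made-up sample rows; the real
# AeX RSA Events columns and status values may differ.
SAMPLE_EXPORT = """machine,event_time,status
PC-001,2010-03-01 23:10:00,failed
PC-001,2010-03-02 01:05:00,completed
PC-002,2010-03-01 22:40:00,failed
PC-002,2010-03-02 00:15:00,failed
PC-003,2010-02-27 23:00:00,completed
"""

def machines_missing_backup(rows, now, window=timedelta(hours=24)):
    """Return sorted machine names with no 'completed' event inside the window."""
    seen = set()
    ok = set()
    cutoff = now - window
    for row in rows:
        machine = row["machine"]
        seen.add(machine)
        when = datetime.strptime(row["event_time"], "%Y-%m-%d %H:%M:%S")
        # Only a completed snapshot inside the window counts as a good backup.
        if row["status"] == "completed" and cutoff <= when <= now:
            ok.add(machine)
    return sorted(seen - ok)

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
now = datetime(2010, 3, 2, 8, 0, 0)
print(machines_missing_backup(rows, now))
```

Note that a machine only shows up here if it has at least one event in the export, so a client that never reported anything would need to be checked against a separate machine list.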
Thanks,
Kyle
Symantec Trusted Advisor
For Forum threads, please click "Mark as Solution" if answered.
For all content, please give a thumbs up if you agree with or support the post.
Similar Issues
I have been experiencing very similar frustrations. We have about 500 desktops that are backed up nightly. There are about 30 machines that consistently fail. The errors are ambiguous, and there seems to be little to no documentation on troubleshooting. In some cases reinstalling the RS agent remedies the problem; however, this is not a good fix because it requires a reboot.
Checking the event logs does not shed light on the cause of the failed backup and has been a waste of time in my opinion.
It is tricky to do
As you've seen, it is very tricky to troubleshoot. The difficult part is that the errors are not always logged to the NS. Most of the events are logged in the client's Application event log, the RS server's event log, and the Inv_AEX_RSA_Events table on the NS. On the RS you should see a specific error code, such as 9 or 21. The failure can be due to a database problem, missing hotfixes, etc.
You could use the "Idea" feature here on Connect to suggest better error logging. RS 7 is nearly released, so it wouldn't be added there, but I have heard that the future of RS is a bit unclear. The "next evolution" of RS could better implement this kind of informational data. Perhaps it could even be added to an RS 7.1 or 7.0 SP1/2?
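Once the events from those three places have been collected in one spot, lining up the retries per machine is mostly a grouping exercise. The Python sketch below shows one way it might look. The row layout `(machine, timestamp, error_code)` is an assumption for illustration, with `None` standing in for a successful attempt; the sample rows are made up, and the codes 9 and 21 are used only as placeholders with no particular meaning attached.

```python
from collections import defaultdict

# Hypothetical rows: (machine, timestamp, error_code).
# error_code None means the attempt succeeded; the data is made up.
events = [
    ("PC-002", "2010-03-01 22:40:00", 9),
    ("PC-001", "2010-03-01 23:10:00", 21),
    ("PC-002", "2010-03-02 00:15:00", 9),
    ("PC-001", "2010-03-02 01:05:00", None),
]

def retry_history(events):
    """Map each machine to its attempts in time order, as (timestamp, error_code)."""
    history = defaultdict(list)
    # Timestamps in YYYY-MM-DD HH:MM:SS form sort chronologically as strings.
    for machine, timestamp, error_code in sorted(events, key=lambda e: e[1]):
        history[machine].append((timestamp, error_code))
    return dict(history)

def summarize(history):
    """Build one summary line per machine: attempt count, final result, error codes."""
    lines = []
    for machine in sorted(history):
        attempts = history[machine]
        failed = [code for _, code in attempts if code is not None]
        final = "ok" if attempts[-1][1] is None else "FAILED"
        lines.append(
            f"{machine}: {len(attempts)} attempt(s), final={final}, error codes={failed}"
        )
    return lines

for line in summarize(retry_history(events)):
    print(line)
```

This makes it easy to tell a machine that failed once and then recovered on retry apart from one that keeps failing with the same code.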
Thanks,
Kyle
Idea feature?
I'm not sure where this is, but I feel like logging, error reporting, and error descriptions are EXTREMELY important in any program, let alone a backup solution. How am I supposed to know why it failed? Why isn't it in the interface? Why don't the ones that fail with retries show up in the failure reports at random?
It's just all very peculiar.
-Austin Lazanowski Backups cost way too much until you needed them.
Create > Idea
Austin,
To create a new Idea, click the "Create" menu at the top-right of the screen, then click "Idea". Also, you can see this blog post for more information.
Thanks,
Kyle
Also, I am marking this
Also, I am marking this thread to Escalate to support to see if there are any methods that I'm just not aware of for evaluating RS snapshot failures.
Thanks,
Kyle