NBU 5220 Performance
We recently deployed a 5220 appliance into our environment. It was meant to be the savior in our battle against a backup window we could no longer meet. When we finally got it online and into our NBU environment, the initial performance was great. The area where we expected to benefit most was VMware backups: with the data stores mounted directly to the appliance, it has direct access to the snapshots for a fast and efficient backup. Before this, we were performing client-side backups, so the impact on the hosts every night was significant as we tried to back up 600+ VMs. The plan was to move all Dev and Test off-host backups to the middle of the day, since their performance impact was minimal, which would leave a larger window to complete everything else. When we started the noon-time backups, deduplication rates were high and so were the speeds.
However, this performance gain was short-lived. As we began increasing the load, we suddenly saw performance drop to a point of concern. Backups are no longer speedy: most jobs run at 3,500 KB to 10,000 KB per second. A few might pop to 24,000 KB per second, but in a sample of 15 jobs as I write this, only one is showing 24,000.
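To put those rates in perspective, here is a rough back-of-the-envelope sketch. It assumes the job rates are in KB of 1024 bytes, and the 100 GB VM size is purely a hypothetical figure for illustration, not something measured in our environment.

```python
# Rough arithmetic only. Assumes 1 KB = 1024 bytes and 1 GB = 1024 * 1024 KB.

def kb_per_s_to_mb_per_s(kb_per_s: float) -> float:
    """Convert a job rate in KB/s to MB/s."""
    return kb_per_s / 1024


def hours_to_move(size_gb: float, kb_per_s: float) -> float:
    """Hours needed to move size_gb of data at a sustained kb_per_s."""
    size_kb = size_gb * 1024 * 1024
    return size_kb / kb_per_s / 3600


# Rates quoted above (KB/s); the VM size is a hypothetical example.
observed_rates_kb = [3500, 10000, 24000]
hypothetical_vm_gb = 100

for rate in observed_rates_kb:
    print(
        f"{rate:>6} KB/s = {kb_per_s_to_mb_per_s(rate):5.1f} MB/s, "
        f"{hours_to_move(hypothetical_vm_gb, rate):4.1f} h per "
        f"{hypothetical_vm_gb} GB"
    )
```

This ignores deduplication and concurrency, so it only shows how slow a single stream at these rates is, not the aggregate throughput of the appliance.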
Now, I do have a few ideas.
1. We have a 72TB appliance and therefore 2 disk trays. During backups, one disk tray is going crazy: all the lights are flashing and you can really see that it is working. The second tray, however, is doing almost nothing; you might see a blink here or there, but it is nothing compared to the first tray. Is this to be expected? When we looked at the disk configuration, it shows concat. Is this normal?
2. We are sending too much data at once and simply burying the appliance. Realistically, what sort of performance should I expect from the appliance?
3. Related to number 2: since we only have the one appliance right now while we wait to get the remote appliance in place, we are duplicating off to tape. This runs at the same time as the backups, so while the appliance is writing a lot of data, it is also reading data back out to tape.
4. We are overloading the data store, so read speed from source to destination is bad. We have relatively few hosts, so if we limit the jobs per host we limit the number of machines backing up at once (obviously), which means backups take far too long. So we removed the limit per host and just set a limit per data store. As we are new to all of this, I am not sure which setting has what impact, but again, I am trying to list any and all ideas from the start.
5. The appliance does not support multi-pathing, so we only have a single path to the disk.
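On idea 1, a toy model may help explain why a concat layout could leave the second tray idle. This is only a simplified sketch of concatenation versus striping in general. It is not how the appliance's volume manager necessarily allocates space, and the tray size, stripe unit, and block counts are made-up numbers.

```python
from collections import Counter

# Toy model: a logical volume built from two disk trays.
# All sizes are hypothetical and only illustrate the general idea.


def concat_tray(block: int, blocks_per_tray: int) -> int:
    """Concatenation: fill tray 0 completely before touching tray 1."""
    return block // blocks_per_tray


def stripe_tray(block: int, stripe_unit: int, trays: int) -> int:
    """Striping: alternate between trays every stripe_unit blocks."""
    return (block // stripe_unit) % trays


def write_distribution(n_blocks: int, mapper) -> Counter:
    """Count how many of the first n_blocks logical blocks land on each tray."""
    return Counter(mapper(b) for b in range(n_blocks))


BLOCKS_PER_TRAY = 1000  # hypothetical tray capacity in blocks
WRITTEN = 600           # volume is 30% full, written sequentially from block 0

concat = write_distribution(WRITTEN, lambda b: concat_tray(b, BLOCKS_PER_TRAY))
striped = write_distribution(WRITTEN, lambda b: stripe_tray(b, 8, 2))

print("concat :", dict(concat))
print("striped:", dict(striped))
```

In this model, a concatenated volume that is less than half full sends every write to the first tray, while a striped volume spreads the same writes across both. If the appliance behaves anything like this, the idle second tray would be a consequence of the concat layout rather than a fault.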
Beyond that I am not sure, but this does not help when showcasing the appliances to management at the moment. However, given the initial performance, I am confident we can get back there.