We tackle that in two ways. First, the back end is virtual, so rebuilding is fairly simple - we can just roll back to the previous backup and republish whatever is missing.
The other way is to have a second back-end server, configured just like the primary, but with timeouts and escalations turned off and Run Web Services turned off as well. If the primary fails, all you have to do is enable those two options and you're back in business.
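To make the standby idea a bit more concrete, here is a rough Python sketch of the logic. It is only a model of the two settings mentioned above: BackendSettings, make_standby and promote are made-up names for illustration, not anything from the Workflow product. In practice you would flip these options in the server's own configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BackendSettings:
    """Hypothetical model of the two options discussed above (not a real Workflow API)."""
    name: str
    timeouts_and_escalations: bool
    run_web_services: bool


def make_standby(primary: BackendSettings, name: str) -> BackendSettings:
    # Same configuration as the primary, but with both options switched off
    # so the standby does no processing while the primary is healthy.
    return replace(primary, name=name,
                   timeouts_and_escalations=False,
                   run_web_services=False)


def promote(standby: BackendSettings) -> BackendSettings:
    # Failover step: enable the two options so the standby takes over.
    return replace(standby,
                   timeouts_and_escalations=True,
                   run_web_services=True)


primary = BackendSettings("backend-1", True, True)
standby = make_standby(primary, "backend-2")
active = promote(standby)  # run this only after the primary has failed
print(active)
```

The point is just that the standby differs from the primary in exactly those two switches, so promoting it is a two-setting change rather than a rebuild.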
As for users, I'm not sure what you mean by this. The Process Manager database holds all the information regarding users, and it's a separate database on a separate server. All the users reside in that DB, so there are no duplicates since there is only one DB. We are also using a DB for the message data, so there's no duplication there either.
Like I said, this works for Workflow in general. You may have to make some adjustments for SD, but I have to imagine it's possible, especially if you have a few hundred workers and several thousand users submitting tickets - you would need a load balancer somewhere!
rob