Service Group Restarts after Concurrency Violations
Recently, during a planned network outage, many of our Veritas Cluster Service Groups restarted themselves after concurrency violations. We have a ticket (05162728) that references this. We knew that we would temporarily lose network connectivity, including LLT, causing passive nodes to try to bring resources online. We also knew that concurrency violations would occur. But because of SCSI locks and other checks and balances, we knew that Service Groups would fail to come online on the passive nodes. However, to our chagrin and that of the technical engineer assigned to this case, after LLT was re-established the active nodes proceeded to offline and restart Service Groups. This was not expected. It was also not consistent across the dozen or so clusters we have. In the future, we know that we can freeze nodes and Service Groups to proactively try to prevent this; however, that would not help with a sudden loss of LLT.
We would like an option to control what happens to Service Groups and resources after a concurrency violation is detected. Depending on the situation, we may or may not want Service Groups to restart.