
Service Group Restarts after Concurrency Violations

Created: 04 Oct 2013 • Updated: 16 May 2014 | 2 comments
thibbert1
Status: In Review

Recently, during a planned network outage, many of our Veritas Cluster Server Service Groups restarted themselves after concurrency violations. We have a ticket (05162728) that references this. We knew that we would temporarily lose network connectivity, including LLT, causing passive nodes to try to bring resources online. We also knew that the concurrency violations would occur. But due to SCSI locks and other checks and balances, we knew that Service Groups would fail to come online on the passive nodes. However, and also to the chagrin of the technical engineer assigned to this case, after LLT was re-established, the active nodes proceeded to offline and restart Service Groups. This was not expected. Also, across the dozen or so clusters we have, this behavior was not consistent. In the future, we know that we can freeze nodes and SGs to proactively try to prevent this; however, that would not help with a sudden loss of LLT.

We would like the option to control what happens to Service Groups and resources after a concurrency violation is detected. Depending on the situation, we may or may not want Service Groups to restart.

 

2 Comments

g_lee

thibbert1,

It might be worthwhile checking the following group attributes / how they're set in your cluster, as the existing attributes may already be able to control this.

e.g. AutoStartList takes effect when a new cluster forms (if the LLT links are restored and nodes start rejoining, does this then count as a "new" cluster?).

The AutoRestart default is 1, so by default VCS will try to restart groups that contain persistent resources. As you mentioned this was a network outage/work, the network resources would be persistent, so this would apply to the group(s).

--------------------
AutoRestart
Restarts a service group after a faulted persistent resource becomes online.
The attribute can take the following values:
• 0: AutoRestart is disabled.
• 1: AutoRestart is enabled.
• 2: When a faulted persistent resource recovers from a fault, the VCS engine clears the faults on all non-persistent faulted resources on the system. It then restarts the service group.

See "About service group dependencies" on page 489.

Note: This attribute applies only to service groups containing persistent resources.
• Type and dimension: integer-scalar
• Default: 1 (enabled)

AutoStartList
List of systems on which, under specific conditions, the service group will be started with VCS (usually at system boot). For example, if a system is a member of a failover service group's AutoStartList attribute, and if the service group is not already running on another system in the cluster, the group is brought online when the system is started.

VCS uses the AutoStartPolicy attribute to determine the system on which to bring the service group online.

Note: For the service group to start, AutoStart must be enabled and Frozen must be 0. Also, beginning with 1.3.0, you must define the SystemList attribute prior to setting this attribute.
• Type and dimension: string-keylist
• Default: {} (none)
--------------------
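
As a rough illustration of where these attributes live, a service group definition in main.cf might look something like the sketch below. The group and system names (app_sg, node1, node2) are hypothetical, and AutoRestart = 0 is shown only as an example of overriding the default of 1; whether that is appropriate for your clusters is something to confirm against the guide and with support.

```
// Hypothetical failover group; names are placeholders, not from your cluster.
group app_sg (
    SystemList = { node1 = 0, node2 = 1 }   // must be defined before AutoStartList
    AutoStartList = { node1 }               // systems where the group may autostart
    AutoRestart = 0                         // 0 = disabled, 1 = enabled (default), 2 = see description above
    )
```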

See the VCS Administrator's (or User's) Guide for your platform/version for other group attributes that may be relevant: https://sort.symantec.com/documents (select your platform/version and look under Product Guides)

regards,

Grace

If this post has helped you, please vote or mark as solution

AHerr

Hi thibbert1,

 

Did Grace answer your question?  Please let us know.

Thanks,
Anthony
