VCS error

Created: 07 Jan 2013 • Updated: 22 Jan 2013 | 7 comments
This issue has been solved. See solution.

Hi all,

I observed VCS throwing a service group concurrency violation error. When I checked the logs, I found the messages below. I understand why a concurrency violation error occurs, but I am not sure how VCS behaves in this situation. I have pasted the logs with my questions.

Please share your thoughts.

Environment: 2 nodes in the cluster, 001 and 002.

Service group app is online on node 001.

2013/01/07 23:20:11 VCS INFO V-16-1-10299 Resource comm (Owner: Unspecified, Group: app) is online on 002 (Not initiated by VCS) ---> here I see somebody tried to bring this resource online on node 002 outside of VCS

2013/01/07 23:20:11 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group ukdlxapp ---> so this is the concurrency violation error, because the resource is online on 2 nodes

What action does VCS take in this type of scenario?

2013/01/07 23:20:11 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group app on all nodes ---> what does this message mean?

2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource mntapp (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource mntredprairie (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource mnthome (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource ukdlxapp_ip (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource volapp01 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource volrp01 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource volhome01 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS WARNING V-16-6-15034 (002) violation:Offlining group app on system 002
2013/01/07 23:20:11 VCS NOTICE V-16-1-10167 Initiating manual offline of group app on system 002 ---> what does "manual offline" mean here?

2013/01/07 23:20:11 VCS NOTICE V-16-1-10300 Initiating Offline of Resource wmu_dlx_ (Owner: Unspecified, Group: app) on System 002
2013/01/07 23:20:11 VCS INFO V-16-6-15002 (002) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/violation 002 ukdlxapp   successfully
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource ukdlx_samba_server (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlx_samba_netbios (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlx_samba_share_tes (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlx_samba_share_rrx (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlx_samba_share_dcs (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlx_samba_share1 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource tes_dlx_ (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlxapp_telnet (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource wmu_dlx_app (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource rrx_dlx_app (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource cups_printing (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource dcs_dlx_app (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource tes_dlx_app (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource dcs_dlx_ (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource rrx_dlx_ (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource dgapp01 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:13 VCS INFO V-16-10031-504 (002) Application:wmu_dlx_comm:offline:Executed /usr/local/bin/stop-dlx-comms.sh as user root
2013/01/07 23:20:13 VCS INFO V-16-2-13716 (002) Resource(wmu_dlx_comm): Output of the completed operation (offline)
==============================================
-bash: stopcomms: command not found
stop-dlx-comms.sh: The DLX comms for user wmudba does not appear to have stopped successfully
==============================================

2013/01/07 23:20:15 VCS ERROR V-16-2-13064 (002) Agent is calling clean for resource(wmu_dlx_comm) because the resource is up even after offline completed.
2013/01/07 23:20:16 VCS INFO V-16-2-13068 (002) Resource(wmu_dlx_comm) - clean completed successfully.
2013/01/07 23:20:17 VCS ERROR V-16-2-13077 (002) Agent is unable to offline resource(wmu_dlx_comm). Administrative intervention may be required. ---> the resource is now offline on 002, so why does it say administrative intervention is required?
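
The engine log lines pasted above share a fixed layout: date, time, the literal VCS, a severity, a message ID, then free text. A minimal Python sketch for picking out the two messages that mark the start of the incident (V-16-1-10299, resource online but not initiated by VCS, and V-16-1-10214, the concurrency violation) might look like this. It assumes every engine log line follows that layout; the regular expression and the names Entry, parse_line and find_violation_entries are illustrative and are not part of VCS.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the pasted log lines:
#   YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <V-n-n-n> <free text>
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"VCS (?P<severity>[A-Z]+) (?P<msg_id>V-\d+-\d+-\d+) (?P<text>.*)$"
)

# Message IDs taken from the log in this thread.
VIOLATION_IDS = ("V-16-1-10299", "V-16-1-10214")


class Entry(NamedTuple):
    date: str
    time: str
    severity: str
    msg_id: str
    text: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None for lines that do not fit the layout
    (for example, script output embedded in the log)."""
    match = LOG_LINE.match(line.strip())
    return Entry(**match.groupdict()) if match else None


def find_violation_entries(lines: Iterable[str]) -> List[Entry]:
    """Return the entries whose message ID is one of VIOLATION_IDS."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.msg_id in VIOLATION_IDS]
```

Feeding it the lines of the engine log would return only the entries for those two message IDs, which gives the time of the violation and the node on which the resource was started outside VCS.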


mikebounds's picture

The group Restart attribute is for internal use only, but unlike some other internal attributes, its use is not described in the VCS admin guide. VCS is just resetting some internal flag or counter, so I wouldn't worry about this.

When the offline runs, it looks like it does not exit with exit code 0, so you don't see the line "Offline completed successfully", and clean is run. The clean does complete successfully, but that only means it exits with exit code 0. After this, VCS runs the agent's monitor entry point to check that the resource is offline, and the monitor entry point reports that it is online. It could be that the monitor entry point is wrong and the resource is actually offline, or it could be that the resource took a second to go offline after the clean exited and so was still up when the monitor ran.
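
The sequence described above (offline, then clean, then a monitor check) can be modelled roughly as follows. This is a simplified sketch of what the log messages suggest, not the agent framework's actual implementation: the function offline_resource and its three callables are hypothetical, and the check between offline and clean is inferred from message V-16-2-13064 ("the resource is up even after offline completed").

```python
from typing import Callable, List


def offline_resource(
    offline: Callable[[], None],
    clean: Callable[[], None],
    monitor_says_online: Callable[[], bool],
) -> List[str]:
    """Return the events a simplified agent would report while offlining a resource."""
    events: List[str] = []

    offline()
    if not monitor_says_online():
        events.append("resource is offline")
        return events

    # Offline ran, but the resource still looks up: fall back to clean.
    events.append("calling clean: resource is up even after offline completed")
    clean()
    events.append("clean completed successfully")

    # Clean exiting with 0 is not proof of anything; the monitor decides.
    if monitor_says_online():
        events.append("unable to offline: administrative intervention may be required")
    else:
        events.append("resource is offline")
    return events
```

In this model, a monitor that keeps reporting the resource as online (because it is wrong, or because the process had not finished stopping when it ran) produces the "unable to offline" outcome even though clean succeeded, which is the situation in the log.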

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

shiv124's picture

Hi Mike,

Thanks for the reply.

My initial question was: if a failover service group is online on one node and somebody knowingly or unknowingly brings the same service group, or a resource in that service group, online on the second node, what action does VCS take?

thanks
siva

mikebounds's picture

VCS takes the resource offline on the system where it was brought online outside of VCS control.

Mike


shiv124's picture

Hi Mike,

So VCS must bring the resource offline on the second node. When it was trying to offline this resource, I see a violation message and "Initiating manual offline", which I marked in bold in the messages below. Any idea about this?

2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource volapp01 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource volrp01 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource volhome01 (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:11 VCS WARNING V-16-6-15034 (002) violation:Offlining group app on system 002

User root fired command: hagrp -offline app from localhost ---> I see this message, but nobody fired this command

2013/01/07 23:20:11 VCS NOTICE V-16-1-10167 Initiating manual offline of group app on system 002
2013/01/07 23:20:11 VCS NOTICE V-16-1-10300 Initiating Offline of Resource wmu_dlx_ (Owner: Unspecified, Group: app) on System 002
2013/01/07 23:20:11 VCS INFO V-16-6-15002 (002) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/violation 002 ukdlxapp successfully
2013/01/07 23:20:11 VCS INFO V-16-1-10306 Resource ukdlx_samba_server (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlx_samba_netbios (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)
2013/01/07 23:20:12 VCS INFO V-16-1-10306 Resource ukdlx_samba_share_tes (Owner: Unspecified, Group: app) is offline on 002 (Previous State = OFFLINE)

mikebounds's picture

A failover group is only allowed to run on one system so when VCS detects a resource is running on a second system it reports a concurrency violation and runs the violation trigger - see extract from VCS Admin guide:

violation event trigger
Description
 
This trigger is invoked only on the system that caused the concurrency
violation. Specifically, it takes the service group offline on the system where
the trigger was invoked. Note that this trigger applies to failover groups only.
The default trigger takes the service group offline on the system that caused
the concurrency violation.

So it is the violation trigger that runs the "hagrp -offline" - see extract from the trigger (in /opt/VRTSvcs/bin/triggers):
 
VCSAG_LOG_MSG ("W", "Offlining group $grp_name on system $system_name", 15034, "$grp_name", "$system_name");
# ABOVE IS THE WARNING MESSAGE YOU SEE AND THEN THE GROUP IS OFFLINED AS BELOW

if ( $grp_name eq "ClusterService" ) {
`$vcs_home/bin/hagrp -offline -force $grp_name -sys $system_name`;
} else {
`$vcs_home/bin/hagrp -offline $grp_name -sys $system_name`;
}
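
To make the branch in the extract above easier to follow, here is a minimal Python sketch of the same decision: which hagrp command line is built for a given group and system. It only assembles the argument list and does not run anything. The function name violation_offline_command is hypothetical, and the value passed for vcs_home is an assumption based on the trigger path shown in the log.

```python
from typing import List


def violation_offline_command(vcs_home: str, grp_name: str, system_name: str) -> List[str]:
    """Build the hagrp argument list that mirrors the trigger extract.

    Mirrors the Perl branch: ClusterService is offlined with -force,
    every other failover group with a plain -offline.
    """
    cmd = [f"{vcs_home}/bin/hagrp", "-offline"]
    if grp_name == "ClusterService":
        cmd.append("-force")
    cmd += [grp_name, "-sys", system_name]
    return cmd
```

For the group and node in this thread, calling it with "/opt/VRTSvcs" (assumed), "app" and "002" yields a plain offline command without -force, which matches the "hagrp -offline" entry logged for user root even though nobody typed it.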

Mike


SOLUTION
shiv124's picture

Hi Mike,

That's perfect :). Thanks for the replies. That cleared all my queries.

Prasanna Kulkarni's picture

Hi Siva,

You may also want to look at a new VCS feature that prevents the concurrency violation from happening at all. It is called Proactive Prevention of Concurrency Violation (or ProPCV). You can read about it at https://sort.symantec.com/public/documents/sfha/6....

The ProPCV feature of VCS prevents any accidental startups of processes outside VCS control on failover nodes. Here is a snippet from the above link:

<snip>

If ProPCV is set to 1, you cannot bring online processes that are listed in the MonitorProcesses attribute or the StartProgram attribute of the application resource on any other node in the cluster. If you try to start a process that is listed in the MonitorProcesses attribute or StartProgram attribute on any other node, that process is killed before it starts. Therefore, the service group does not get into concurrency violation.

</snip>

Hope this helps.

--
regards,
Prasanna