
Enabling resnotoff trigger

Created: 05 Jul 2014 • Updated: 15 Jul 2014 | 9 comments
Francesco Ciacci's picture
This issue has been solved. See solution.

Hello to everyone.

I have a resource that sometimes fails, and then the clean of the resource fails too, so the resource remains online. I want to enable the resnotoff trigger in order to know when the resource clean fails.

I have configured the trigger at resource level, but when the issue occurs the script is not invoked.

I have also copied and configured the example script in %vcshome%\bin\triggers.

Do I have to enable something else?

 

Thank you

francesco


mikebounds's picture

The Windows VCS admin guide says:

To configure this trigger, you must define the following:
Resource Name: Define resources for which to invoke this trigger by
entering their names in the following line in the script:
@resources = ("resource1", "resource2");

 

Have you done this? The Windows VCS admin guide also says:

When invoked, the trigger script waits for a predefined interval and
checks the state of the resource. If the resource is not offline, the
trigger issues a system shutdown command, followed by the command
hastop -local -evacuate.

 

Is shutting down the system when the resource cannot offline what you want to happen?

What is the resource type that you are experiencing this issue with?

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

Francesco Ciacci's picture

Thank you Mike,

Yes, I have configured the @resources section of the script with the name of my resource:

 

@resources = ( "MSMQ" ) ;

 

The problem is that the script is not invoked. If I kill the MSMQ process, I can see in the log that VCS tries to fail the resource, but the clean fails and the resource stays online. So I expect the trigger script to be invoked, but that doesn't happen.

 

Francesco Ciacci's picture

Thank you Mike,

I've configured the script with:

@resources = ( "MSMQ" ) ;

So when I kill mqsvc.exe, VCS logs that the resource went offline unexpectedly and tries to clean it. But the clean fails because it cannot stop the MQAC driver. So I want the resnotoff trigger to run "net stop mqac" instead of a shutdown.

But after the clean fails, resnotoff is not invoked. I don't understand where the error is.

 

mikebounds's picture

Try uncommenting the line:

VCSAG_LOG_MSG("I", "Invoked with arg0=$ARGV[0], arg1=$ARGV[1]", msgid, $ARGV[0], $ARGV[1]);

I tried this without making any other changes (I did not change the @resources line), and in the log I see:

2014/07/07 17:07:48 VCS INFO V-16-1-50135 User admin fired command: hares -offline nw_mount_z  w23v51a  from 192.168.56.1
2014/07/07 17:07:48 VCS NOTICE V-16-1-10300 Initiating Offline of Resource nw_mount_z (Owner: unknown, Group: testsg) on System w23v51a
2014/07/07 17:07:50 VCS ERROR V-16-2-13064 (w23v51a) Agent is calling clean for resource(nw_mount_z) because the resource is up even after offline completed.
2014/07/07 17:07:50 VCS INFO V-16-2-13068 (w23v51a) Resource(nw_mount_z) - clean completed successfully.
2014/07/07 17:07:50 VCS ERROR V-16-2-13077 (w23v51a) Agent is unable to offline resource(nw_mount_z). Administrative intervention may be required.
2014/07/07 17:07:51 VCS INFO V-16-6-0 (w23v51a) resnotoff:Invoked with arg0=w23v51a, arg1=nw_mount_z
2014/07/07 17:07:51 VCS INFO V-16-6-15042 (w23v51a) resnotoff:Resource did not match, nw_mount_z for this trigger.
2014/07/07 17:07:52 VCS INFO V-16-6-15002 (w23v51a) hatrigger:hatrigger executed C:\Program Files\Veritas\Cluster Server\bin\triggers\resnotoff.pl successfully

 

If there is a syntax error in the resnotoff.pl trigger script then you should see:

2014/07/07 17:16:14 VCS INFO V-16-6-15003 (w23v51a) hatrigger:Failed to execute C:\Program Files\Veritas\Cluster Server\bin\triggers\resnotoff.pl
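The two hatrigger messages above (V-16-6-15002 on success, V-16-6-15003 on failure) can be picked out of a saved copy of the log with a few lines of code. The following is only a sketch in Python, run outside VCS; the function name trigger_outcomes and the patterns are assumptions derived from the log lines quoted in this thread, not a documented log format.

```python
import re

# Patterns modelled on the hatrigger lines quoted in this thread (assumption:
# other VCS versions may word these messages differently).
_EXECUTED = re.compile(r"hatrigger:hatrigger executed (?P<script>.+) successfully")
_FAILED = re.compile(r"hatrigger:Failed to execute (?P<script>.+)")


def trigger_outcomes(lines):
    """Return (script_path, outcome) pairs for every hatrigger line found."""
    outcomes = []
    for line in lines:
        line = line.strip()
        match = _EXECUTED.search(line)
        if match:
            outcomes.append((match.group("script"), "executed"))
            continue
        match = _FAILED.search(line)
        if match:
            outcomes.append((match.group("script"), "failed"))
    return outcomes


# The two sample lines are copied from the log excerpts in this thread.
sample = [
    r"2014/07/07 17:07:52 VCS INFO V-16-6-15002 (w23v51a) hatrigger:hatrigger executed C:\Program Files\Veritas\Cluster Server\bin\triggers\resnotoff.pl successfully",
    r"2014/07/07 17:16:14 VCS INFO V-16-6-15003 (w23v51a) hatrigger:Failed to execute C:\Program Files\Veritas\Cluster Server\bin\triggers\resnotoff.pl",
]

for script, outcome in trigger_outcomes(sample):
    print(outcome, script)
```

If neither message appears for resnotoff.pl at all, the trigger was never launched, which is a different problem from a script that launches and fails.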

Just to confirm: I assume you have copied the resnotoff.pl script from the "Sample_Triggers" directory to the "Triggers" directory?

Mike

 


Francesco Ciacci's picture

Thank you Mike,

I have uncommented the line you indicated, and resnotoff.pl is copied under %VCSHOME%\bin\triggers.

This is my log:

2014/07/14 23:36:42 VCS ERROR V-16-2-13067 (VCS2) Agent is calling clean for resource(MSMQ) because the resource became OFFLINE unexpectedly, on its own.
2014/07/14 23:37:07 VCS ERROR V-16-2-13069 (VCS2) Resource(MSMQ) - clean failed.

And resnotoff is not invoked. I think the problem is the failed clean: I think that if the clean fails, the script is not invoked.

I have to find the reason why the resource cannot be cleaned after a failure.

Thank you again

Francesco

 

 

mikebounds's picture

I tested putting exit 1 in a clean script and I get the following:

2014/07/14 23:14:18 VCS INFO V-16-1-50135 User admin fired command: hares -offline nw_mount_z  w23v51a  from 192.168.56.1
2014/07/14 23:14:18 VCS NOTICE V-16-1-10300 Initiating Offline of Resource nw_mount_z (Owner: unknown, Group: testsg) on System w23v51a
2014/07/14 23:14:22 VCS ERROR V-16-2-13064 (w23v51a) Agent is calling clean for resource(nw_mount_z) because the resource is up even after offline completed.
2014/07/14 23:14:22 VCS ERROR V-16-2-13069 (w23v51a) Resource(nw_mount_z) - clean failed.
2014/07/14 23:15:28 VCS ERROR V-16-2-13077 (w23v51a) Agent is unable to offline resource(nw_mount_z). Administrative intervention may be required.
2014/07/14 23:15:29 VCS INFO V-16-6-0 (w23v51a) resnotoff:Invoked with arg0=w23v51a, arg1=nw_mount_z
2014/07/14 23:15:29 VCS INFO V-16-6-15042 (w23v51a) resnotoff:Resource did not match, nw_mount_z for this trigger.
2014/07/14 23:15:30 VCS INFO V-16-6-15002 (w23v51a) hatrigger:hatrigger executed C:\Program Files\Veritas\Cluster Server\bin\triggers\resnotoff.pl successfully

So here the clean fails, the same as for you, but then I get "Agent is unable to offline resource(nw_mount_z). Administrative intervention may be required" and then resnotoff is called. What do you see in the log after "clean failed"? Can you post it?

Note that if the monitor reports the resource as offline after the clean fails, then the resnotoff trigger will not be called, because as far as VCS is concerned the resource is offline.

What language is the agent written in: C, Perl or shell?

Mike


Francesco Ciacci's picture

Hi Mike

I have changed the clean retry limit from 0 to 1 for the MSMQ resource. This is what I get:

2014/07/15 07:34:29 VCS ERROR V-16-2-13067 (VCS2) Agent is calling clean for resource(MSMQ) because the resource became OFFLINE unexpectedly, on its own.
2014/07/15 07:34:54 VCS ERROR V-16-2-13069 (VCS2) Resource(MSMQ) - clean failed.
2014/07/15 07:35:54 VCS ERROR V-16-1-50148 ADMIN_WAIT flag set for resource MSMQ on system VCS2 with the reason 4

The resource is still online with an exclamation mark, and the script is not invoked.
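The two logs in this thread diverge after "clean failed": in one the agent reports it is unable to offline the resource and resnotoff is invoked, in the other the resource is flagged ADMIN_WAIT. A small Python sketch, run outside VCS on a saved copy of the log, can report which of the two followed a clean failure for a given resource. The function name and the regular expressions are assumptions based only on the log lines quoted here.

```python
import re

# Patterns modelled on the messages quoted in this thread (assumption: the
# wording may differ between VCS versions).
CLEAN_FAILED = re.compile(
    r"V-16-2-13069 \((?P<system>[^)]+)\) Resource\((?P<res>[^)]+)\) - clean failed"
)
RESNOTOFF = re.compile(
    r"resnotoff:Invoked with arg0=(?P<system>[^,]+), arg1=(?P<res>\S+)"
)
ADMIN_WAIT = re.compile(
    r"ADMIN_WAIT flag set for resource (?P<res>\S+) "
    r"on system (?P<system>\S+) with the reason (?P<reason>\d+)"
)


def after_clean_failure(lines, resource):
    """Report what followed the first 'clean failed' message for `resource`."""
    seen_failure = False
    for line in lines:
        if not seen_failure:
            m = CLEAN_FAILED.search(line)
            seen_failure = bool(m and m.group("res") == resource)
            continue
        m = RESNOTOFF.search(line)
        if m and m.group("res") == resource:
            return "resnotoff invoked"
        m = ADMIN_WAIT.search(line)
        if m and m.group("res") == resource:
            return "ADMIN_WAIT (reason %s)" % m.group("reason")
    if seen_failure:
        return "nothing logged after clean failure"
    return "no clean failure found"


# Log lines copied from the two excerpts in this thread.
test_cluster_log = [
    "2014/07/14 23:14:22 VCS ERROR V-16-2-13069 (w23v51a) Resource(nw_mount_z) - clean failed.",
    "2014/07/14 23:15:28 VCS ERROR V-16-2-13077 (w23v51a) Agent is unable to offline resource(nw_mount_z). Administrative intervention may be required.",
    "2014/07/14 23:15:29 VCS INFO V-16-6-0 (w23v51a) resnotoff:Invoked with arg0=w23v51a, arg1=nw_mount_z",
]
msmq_log = [
    "2014/07/15 07:34:54 VCS ERROR V-16-2-13069 (VCS2) Resource(MSMQ) - clean failed.",
    "2014/07/15 07:35:54 VCS ERROR V-16-1-50148 ADMIN_WAIT flag set for resource MSMQ on system VCS2 with the reason 4",
]

print(after_clean_failure(test_cluster_log, "nw_mount_z"))
print(after_clean_failure(msmq_log, "MSMQ"))
```

An ADMIN_WAIT result points to the resadminwait trigger rather than resnotoff as the hook that fires for that resource.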

The agent is the bundled agent for MSMQ. I think it is written in C; it is a .dll file.

Thank you again

Francesco.

 

mikebounds's picture

Have you set the ManageFaults attribute on the service group to "NONE"? I have only ever seen a resource go into ADMIN_WAIT when ManageFaults is set to "NONE" (the default is ALL). To check, run:

"hagrp -display -attribute ManageFaults"

In the VCS Admin guide, the only references I can find to ADMIN_WAIT for a resource (ADMIN_WAIT is also a state for systems) relate to the ManageFaults attribute being set to "NONE". Also see http://www.symantec.com/business/support/index?page=content&id=TECH52895

There is also a resadminwait trigger, but you ought to find out why resources are going into ADMIN_WAIT rather than putting "clear-up" code in the resadminwait trigger, which is only a workaround.

As MSMQ is an agent provided with Symantec software, you should log a call with Symantec if the clean is failing, and Symantec should change the code, rather than you having to implement a workaround using triggers, unless you have some custom config of MQ.

Mike

 


SOLUTION
Francesco Ciacci's picture

Mike!

Solved!

I have enabled the resadminwait trigger!

Now I can insert a reboot or other instructions in the script, and everything works as expected!
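For readers who want to do something narrower than a reboot, the decision a resadminwait handler has to make can be sketched as follows. This is Python used purely to illustrate the logic; the real trigger is a script in the triggers directory. Several things here are assumptions and should be checked against the admin guide for your VCS version: that the trigger receives system, resource and reason in that order, and the meanings of reason codes 0 to 3 (only reason 4 appears in this thread, next to the "became OFFLINE unexpectedly" message). The names plan_recovery and RECOVERY_COMMANDS are hypothetical; "net stop mqac" is the command mentioned earlier in the thread.

```python
# Reason codes as commonly described for the resadminwait trigger.
# ASSUMPTION: verify against the admin guide for your VCS version; only
# reason 4 is actually seen in this thread's log.
ADMIN_WAIT_REASONS = {
    0: "offline function did not complete in time",
    1: "offline function was ineffective",
    2: "online function did not complete in time",
    3: "online function was ineffective",
    4: "resource went offline unexpectedly",
}

# Hypothetical per-resource recovery commands. "net stop mqac" is the command
# the original poster wanted to run for the MSMQ resource.
RECOVERY_COMMANDS = {
    "MSMQ": ["net", "stop", "mqac"],
}


def plan_recovery(system, resource, reason):
    """Return the command to run for this ADMIN_WAIT event, or None.

    Only reacts to reason 4 (unexpected offline), the case seen in the
    thread; every other reason is left for manual intervention.
    """
    if reason != 4:
        return None
    return RECOVERY_COMMANDS.get(resource)


# Values taken from the ADMIN_WAIT log line in this thread.
command = plan_recovery("VCS2", "MSMQ", 4)
print(ADMIN_WAIT_REASONS[4], "->", command)
```

Returning the command instead of running it keeps the sketch harmless; a real handler would execute it and log the result, and should still leave the root cause of the failing clean to be investigated.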

 

Thank you!

 

Francesco