
resource dependency

Created: 19 Sep 2013 • Updated: 19 Sep 2013 | 4 comments

Hi, I would like to check the following. I have a service group with two resources: resource a and resource b. Resource a is the parent of resource b. Normally, when resource b has problems and goes offline, resource a goes offline as well, and the group then starts on the other node in VCS. The question is: when resource b goes offline and resource a fails to go offline, what should I do to take resource a offline, so that the group starts on the other node? Thanks a lot, marius

kjbss's picture

Short answer:  As the VCS Administrator, you should manually handle this corner case, on a case by case basis.

...but I have a feeling you are looking for a bit more than that...

So, the problem as you state it is that "resource a fails to go offline"; if a resource fails to go offline via the OFFLINE entry point, then the CLEAN entry point will be called.

In the VCS context, the CLEAN entry point is supposed to be intelligent and comprehensive enough that it will run all necessary commands to stop the resource and perform some basic level of housekeeping on that resource's application/service environment, "no matter what external conditions exist".

Generally, the above tenet is adhered to by the bundled agents (Symantec-supplied agents).

However, if you have found a situation where a VCS-supplied agent does not offline after CLEAN is run, you should probably get Symantec Tech Support involved.

You can consider implementing one of the following VCS Triggers to help in this situation:

nofailover, resadminwait, resfault, resnotoff

On the other hand, if you are writing your own agent, or using the Application agent and writing your own start/stop/monitor and clean scripts, then:

First consider that "if there are not sufficient CLI commands that can be run to comprehensively clean the resource environment in any situation, then that application/service is not compatible with VCS clustering".

I.e., you should first make every effort to find and fix the reason the resource "cannot go offline" through actions taken by the CLEAN entry point.

Only if that fails should you look into further options using an appropriate combination of the above-mentioned VCS Triggers.

***However, caution should be taken to make sure that you do not put code into a trigger to resolve your problem which really should have been implemented within the resource's CLEAN entry point.
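To illustrate what "comprehensive" can mean for a home-grown clean script, here is a minimal Python sketch of one common pattern: ask the processes to stop with SIGTERM, wait a grace period, then escalate to SIGKILL. The function names, the grace period, and the 0/1 return convention are assumptions made for this example and are not part of VCS; how the PIDs are found and which exit code your agent expects from its clean program must come from your own setup and the agent documentation.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def send_signal(pids, sig):
    """Send a signal to every PID, ignoring processes that are already gone."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass


def wait_until_gone(pids, timeout, poll_interval=0.2):
    """Poll until every PID has exited or the timeout expires; return survivors."""
    deadline = time.monotonic() + timeout
    remaining = [p for p in pids if pid_alive(p)]
    while remaining and time.monotonic() < deadline:
        time.sleep(poll_interval)
        remaining = [p for p in remaining if pid_alive(p)]
    return remaining


def force_stop(pids, grace_seconds=5.0):
    """Try SIGTERM first, then SIGKILL.

    Returns 0 if all processes are gone, 1 otherwise
    (this return convention is an assumption of the sketch).
    """
    send_signal(pids, signal.SIGTERM)
    survivors = wait_until_gone(pids, grace_seconds)
    if survivors:
        send_signal(survivors, signal.SIGKILL)
        survivors = wait_until_gone(survivors, grace_seconds)
    return 0 if not survivors else 1
```

A wrapper script would collect the application's PIDs (for example from a PID file), call force_stop(pids), and exit with the result. Anything else the application leaves behind, such as lock files or shared memory, would also need to be cleaned up there.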

 

 

tanislavm's picture

Hi Kjbss,

Thanks a lot for your comments. I would also like your comments on the scenario below.

In a two-node VCS cluster we have a group with an application resource, a mount point resource, and a disk group resource. The application resource is the parent of the mount point resource, which in turn is the parent of the disk group resource.
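The dependency chain just described can be modeled in a few lines of Python, which also shows the top-down order in which the parents of a faulted resource would be taken offline. This is only an illustrative sketch: the resource names app, mnt, and dg are hypothetical, and the functions are not VCS code. In main.cf the same links would be written as "app requires mnt" and "mnt requires dg".

```python
# Hypothetical resource names. Each parent maps to the children it requires.
requires = {
    "app": ["mnt"],
    "mnt": ["dg"],
    "dg": [],
}


def parents_of(resource, requires):
    """Return the resources that directly depend on `resource`."""
    return [parent for parent, children in requires.items() if resource in children]


def ancestors(resource, requires):
    """Return every resource that depends on `resource`, directly or indirectly."""
    found = set()
    stack = [resource]
    while stack:
        current = stack.pop()
        for parent in parents_of(current, requires):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def offline_order_after_fault(faulted, requires):
    """List the resources above `faulted`, parents before their children."""
    pending = ancestors(faulted, requires)
    order = []
    while pending:
        # A resource may go offline once none of its parents is still pending.
        ready = sorted(
            r for r in pending
            if not any(p in pending for p in parents_of(r, requires))
        )
        if not ready:
            raise ValueError("dependency cycle detected")
        order.extend(ready)
        pending -= set(ready)
    return order


print(offline_order_after_fault("dg", requires))  # ['app', 'mnt']
```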

Normally, when there is a problem with the disk group resource and it goes offline, VCS takes all the other resources offline.
In the scenario where the disk group resource goes offline and VCS is not able to take the application resource and the mount point resource offline, what should I do?
Does the application hang in this case?

- Should I use the application stop script and cleanup script, or kill -9, to kill the application processes?

- Force-unmount the mount point?

- Should I also stop the agents of those resources manually, or are they stopped automatically when the resources are offline?

If the above succeeds (the application resource and the mount point resource are offline), will the group start automatically on the other node, or should I start it manually?

tnx a lot,
marius

 

sajith_cr's picture

Hi Marius,

Could you provide the following so that we can understand the issue better?

1. Snippet of the service group configuration from main.cf containing the Application, Mount, and DiskGroup resources, along with their dependencies

2. What is the reason for the DiskGroup failure? Is it a storage path failure?

3. Do you have fencing configured?

4. Version of VCS

5. Log snippet from the engine_A.log file, starting from the moment the DiskGroup resource failure is detected, including messages from the Application and Mount resources on why the offline/clean entry points are failing.
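For item 5, a small Python helper along the following lines could cut the log down to the relevant part. It is a sketch under stated assumptions: it matches resource names as plain substrings, the function name and the resource names in the usage note are hypothetical, and it assumes nothing about the log's line format.

```python
from itertools import dropwhile


def snippet_from_first_mention(log_lines, start_resource, resources):
    """Return lines mentioning any resource, starting at the first mention
    of `start_resource` (plain substring matching, so short names may
    also match unrelated text)."""
    started = dropwhile(lambda line: start_resource not in line, log_lines)
    return [
        line.rstrip("\n")
        for line in started
        if any(name in line for name in resources)
    ]
```

On a typical UNIX install the engine log is /var/VRTSvcs/log/engine_A.log, so usage might look like opening that file and calling snippet_from_first_mention(log, "dg", ["app", "mnt", "dg"]) with your own resource names in place of these placeholders.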

 

To answer your questions,

- Should I use the application stop script and cleanup script, or kill -9, to kill the application processes?

We need to see why the stop/clean scripts of the application resource are failing to stop the processes. Ideally, such manual intervention is not required.

- Force-unmount the mount point?

This operation is performed by the offline/clean entry points of the Mount agent. We need to see why these are not succeeding.

- Should I also stop the agents of those resources manually, or are they stopped automatically when the resources are offline?

The agents need to be running on all nodes of the cluster for monitoring purposes.

 

 

Regards,

Sajith


tanislavm's picture

Hi Kjbss,

Thanks a lot for your comments. I asked these questions to clarify things; I do not need a solution.

You have answered many of my questions, and one is still left:
"If the above succeeds (the application resource and the mount point resource are offline), will the group start automatically on the other node, or should I start it manually?"

I think the answer is yes, because the rest of the nodes are signaled that those resources are offline on that node, so the group is brought online on the next node in SystemList. Right?
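The "next node in SystemList" idea can be sketched as follows. This assumes a priority-based failover policy in which the system with the lowest priority number is preferred; the function and node names are hypothetical, and the sketch does not model other conditions, such as whether the group is allowed to fail over automatically at all.

```python
def pick_failover_target(system_list, available, exclude):
    """Pick the preferred system for the group.

    system_list: dict of system name -> priority (lower is preferred)
    available:   set of systems currently running in the cluster
    exclude:     set of systems where the group has faulted
    Returns the chosen system name, or None if there is no candidate.
    """
    candidates = [
        (priority, name)
        for name, priority in system_list.items()
        if name in available and name not in exclude
    ]
    if not candidates:
        return None
    return min(candidates)[1]


system_list = {"node1": 0, "node2": 1}
print(pick_failover_target(system_list, {"node1", "node2"}, {"node1"}))  # node2
```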

Please comment on the next thing.

When a resource is faulty (from the OS point of view) and the corresponding agent has tried unsuccessfully to bring it offline, the agent uses the clean entry point; if the resource is now brought offline, the agent tells had that this resource is offline. Next, this had tells the had processes on the rest of the VCS nodes about it. Right?

tnx a lot,
marius