Active VCS node fails when rebooting passive node

Created: 26 Nov 2012 • Updated: 29 Nov 2012 | 8 comments
This issue has been solved. See solution.

Dear all,

I've been asked to manage a few asymmetric failover Windows 2003 systems with Veritas Cluster Server installed.

In one of these clusters I'm experiencing a failure (service restart and sometimes service freeze) on the active node when the passive one is rebooting.

Searching the forums, I found that similar problems were resolved by the latest Service Pack, but my system already looks up to date:

vxassist version = 5.1.20000.87

had.exe version: = 5.1.20024.495

haclus -value EngineVersion = 5.1.00.0

hasys -values NODENAME EngineVersion = 5.1.00.0

I'm a little confused that the commands above all return different versions. Is that correct? Do I need to install SP2?

A colleague suggested disabling detail monitoring in the cluster config and setting the clustered service to manual startup in Windows services.msc.

Any suggestions are welcome.

Thanks in advance and Kind Regards,

/fabrizio

Marianne:

We need more info, please.

Overview of current cluster config will be helpful:

<Install Drive>:\Program Files\VERITAS\cluster server\conf\config\main.cf
 
Record of VCS activity  -  Engine_A log on active node: 
<Install Drive>:\Program Files\VERITAS\cluster server\log\engine_A.txt
Please post both of the above as file attachments.
 
Please also check System Event viewer logs for errors. Windows 2003 Storport drivers are notorious for causing I/O errors, system freeze, etc.
Check Microsoft KB for latest Storport hotfixes.
 

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows

fabrizio_tivano:

Hi Marianne, 

thanks for your reply.

storport.sys : 5.2.3790.4121 (srv03_sp2_qfe.070720-0003)

The attached .zip file contains both main.cf and 201210engine_A.txt.

How I found the problem:

During night-time maintenance on 2012/10/04, while the services were active on node-2, I performed a clean shutdown of node-1, and all the services active on node-2 stopped working.

The only way to resolve the problem was to force a reboot of node-2.

The day after (2012/10/05), around 12:00, I was able to reproduce the failure: when I rebooted the passive node-1, all the services running on the active node-2 stopped. VCS then tried to restart them on the same node-2, but one service failed to start and all the services stopped again.

To fix the problem I cleared the fault in the VCS console and manually brought the services online on node-2 from the VCS CLI.

Thanks in Advance and Best Regards,

/fabrizio

Attachment: 201210_engine_A.zip (6.33 KB)

fabrizio_tivano:

NEWS:

In main.cf I found:

====

requires group NIC online global firm 

====

VCS nodes that do not seem to be affected by this issue have:

====

requires group NIC online local firm

====

Could this be the problem?

mikebounds:

Yes, this is a problem:

requires group NIC online global firm

means that the service group containing this line in main.cf requires the NIC group to be online on any system in the cluster, whereas the NIC actually needs to be online on the same (local) system. You shouldn't use service group dependencies for this, so you shouldn't use

requires group NIC online local firm

either; you should use proxies instead. Supposing the NIC resource in the NIC group is called public_nic, then in your application service group you should remove the "requires group NIC .." line at the bottom and add a Proxy resource that the IP resource in this group (let's call it app1_ip) depends on, like:

Proxy app1_pub_nic_proxy (
  TargetResName = public_nic)

app1_ip requires app1_pub_nic_proxy

and if you have a second service group, then you do the same, so:

Proxy app2_pub_nic_proxy (
  TargetResName = public_nic)

app2_ip requires app2_pub_nic_proxy

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

fabrizio_tivano:

Thanks for your answer, Mike!

This is a production environment, so I'll keep you informed as soon as I'm able to make these changes and test them! ;)

/fabrizio

fabrizio_tivano:

Thanks again Mike,

What is the difference between service group dependencies and proxies?

/fabrizio

mikebounds:

A proxy is used to avoid having 2 resources controlling the same object. If you have 2 application service groups using the same NIC and you put a NIC resource in each service group, then VCS monitors the same object twice, which is inefficient. Instead, you create one NIC resource and a Proxy to that resource that just copies the state from the NIC resource. You could put a NIC resource in 1 application service group and the proxy in the other, but the usual way to do this is to put the NIC in its own "Parallel" group and have both application service groups use Proxies.
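
As a rough sketch only, the parallel NIC group plus one application group using a Proxy might look something like this in main.cf. The group names (nic_sg, app1_sg), node names, MAC addresses and IP address are all made up for illustration, and the attribute names assume the Windows NIC and IP agents, so check them against your own main.cf before copying anything:

```
// Hypothetical names and values throughout; adapt to your own cluster.
group nic_sg (
    SystemList = { node1 = 0, node2 = 1 }
    Parallel = 1
    AutoStartList = { node1, node2 }
    )

    // The one real NIC resource, monitored once per node.
    NIC public_nic (
        MACAddress @node1 = "00-00-00-00-00-01"
        MACAddress @node2 = "00-00-00-00-00-02"
        )

    // NIC is a persistent resource, so a Phantom resource is commonly
    // added to let VCS report a state for this parallel group.
    Phantom nic_phantom (
        )

group app1_sg (
    SystemList = { node1 = 0, node2 = 1 }
    AutoStartList = { node1 }
    )

    // Mirrors the state of public_nic on the local node.
    Proxy app1_pub_nic_proxy (
        TargetResName = public_nic
        )

    IP app1_ip (
        Address = "10.0.0.10"
        SubNetMask = "255.255.255.0"
        MACAddress @node1 = "00-00-00-00-00-01"
        MACAddress @node2 = "00-00-00-00-00-02"
        )

    // Resource-level dependency inside the group; no "requires group" line.
    app1_ip requires app1_pub_nic_proxy
```

A second application group would carry its own Proxy (for example app2_pub_nic_proxy) pointing at the same public_nic resource.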

A service group dependency is used to link 2 service groups, so for example you might configure an application service group which requires a database service group to be online first (on any system). Service group dependencies make the config more complex, so they should be avoided unless they are necessary. If, for instance, the application and database were required to be on the same system, then you should put all application and database resources in a single service group.
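
For comparison, a minimal sketch of that application/database case in main.cf could look like the following. The group names app_sg and db_sg and the node names are hypothetical, and the resources inside each group are left out:

```
// Hypothetical groups: db_sg is the child, app_sg is the parent.
group db_sg (
    SystemList = { node1 = 0, node2 = 1 }
    )

    // database resources go here

group app_sg (
    SystemList = { node1 = 0, node2 = 1 }
    )

    // application resources go here

    // app_sg can come online only once db_sg is online
    // on some system in the cluster (not necessarily the same one).
    requires group db_sg online global firm
```

Changing "global" to "local" would instead require db_sg to be online on the same system as app_sg, which is the case where a single combined service group is usually simpler.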

So service group dependencies and proxies are really not similar.

Mike

SOLUTION