IO fencing in VCS

Created: 10 Feb 2013 • Updated: 10 May 2013 | 6 comments
Leed.Engineer's picture
This issue has been solved. See solution.

Hello folks,

Is there any way to disable I/O fencing in a VCS 5.1 cluster without bringing the cluster down? I have a cluster of 8 nodes. When I tried to add another node, problems occurred and the fencing keys got corrupted. I therefore need to disable I/O fencing; it is not needed because I do not have CVM or CFS.

Thanks

arangari's picture

Question: What operations did you perform to add another node? Why/how did the keys get corrupted? Did you capture the evidence and report the issue to Symantec Support?

If you do not have CVM/CFS, and you have not set Cluster.UseFence=SCSI3, then most likely you can unconfigure the fencing.

Question: Do you want to avoid bringing down the applications monitored by VCS, or avoid bringing down VCS itself?

I presume you want to keep the applications running. In that case, you may want to persistently freeze all the service groups, stop VCS on all nodes (hastop -all -force), manually change Cluster.UseFence=SCSI3 to Cluster.UseFence=NONE in main.cf, and then start the cluster, making sure that the node on which main.cf was modified is started first.
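
The main.cf edit in the step above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name set_usefence_none is made up here (it is not a VCS tool), and the attribute is assumed to sit on a single line of the form "UseFence = SCSI3". It should only be run once VCS is stopped, and on the node that will be started first.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): rewrite "UseFence = SCSI3" to
# "UseFence = NONE" in the main.cf passed as $1, keeping a .bak copy.
set_usefence_none() {
    cf="$1"
    [ -f "$cf" ] || { echo "no such file: $cf" >&2; return 1; }
    # Keep the original so the change can be rolled back.
    cp -p "$cf" "$cf.bak" || return 1
    # Allow optional whitespace around '='; only the value is changed.
    sed 's/\(UseFence[[:space:]]*=[[:space:]]*\)SCSI3/\1NONE/' "$cf.bak" > "$cf"
}
```

On a typical installation the file would be /etc/VRTSvcs/conf/config/main.cf (assumed default location), so the call would be set_usefence_none /etc/VRTSvcs/conf/config/main.cf, and the result can be syntax-checked with hacf -verify on the configuration directory before starting VCS.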

 

Thanks and Warm Regards,

Amit Rangari

If this post helped you resolve the issue, please mark it as the solution.

Leed.Engineer's picture

 

Hello arangari,

Thanks for your reply.

Question: What operations did you perform to add another node? Why/how did the keys get corrupted? Did you capture the evidence and report the issue to Symantec Support?

First I used the installvcs -addnode script, of course after installing the binaries on the new node. The script did not add the node and behaved very strangely. I found that /etc/llthosts already contained the new node with a duplicate node ID (7). I therefore had to fix it manually on all nodes and start LLT and GAB manually, and everything went fine with 8 nodes.

After that we had to add another node (9 nodes). After installing the binaries, I used the command hasys -addnode <node_name>. Then I started the cluster on it, but it did not start. After some investigation, I found that it was trying to configure fencing, failing and retrying for some time, and then shutting down the cluster on that node. Then I ran vxfenadm -s all -f <path to coord disk>. It failed with an error message saying that it cannot read the keys.

Question: Do you want to avoid bringing down the applications monitored by VCS, or avoid bringing down VCS itself?

I do not want the business to be affected in any way. I am afraid that if I bring the cluster down using hastop -all -force while the cluster is misbehaving like this, the system could panic, and it is production.

 

I also commented out the UseFence = SCSI3 line in main.cf on all nodes:

//  UseFence = SCSI3

Then I commented out vxfen_mode in /etc/vxfenmode on all nodes, stopped the vxfen service on all nodes, and moved the /etc/vxfendg file aside. However, when I run hastart on the new node, it still sees that fencing is ON and tries to configure it.

 

Thanks

arangari's picture

'installvcs -addnode' should work normally. If it did not, please open a support case, and also mention that you saw the fencing key corruption.

I understand you don't want the business to be affected; however, a momentary loss of HA to get out of this situation should be okay. If so, as suggested, please persistently freeze all service groups (run 'hagrp -freeze <group> -persistent' for each group), then issue 'hastop -all -force'.
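
For a cluster with many service groups, the freeze-then-stop sequence might be prepared along these lines. This is a sketch, not a tested procedure: print_freeze_plan is a made-up name, the input is assumed to be "group system" pairs as printed by hagrp -list, and the function only prints the commands so they can be reviewed before anything is run. The haconf lines are there because a persistent freeze needs the configuration to be writable.

```shell
#!/bin/sh
# Sketch: read "group system" pairs (the layout assumed for `hagrp -list`
# output) on stdin and print the commands that would freeze every group
# persistently and then stop VCS. Nothing is executed here.
print_freeze_plan() {
    echo "haconf -makerw"
    # A group appears once per system, so reduce to unique group names.
    awk 'NF { print $1 }' | sort -u | while read -r grp; do
        echo "hagrp -freeze $grp -persistent"
    done
    echo "haconf -dump -makero"
    echo "hastop -all -force"
}
```

On a running node this would be used as hagrp -list | print_freeze_plan, and the printed commands run by hand once they look right.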

 

Also, commenting out a line in main.cf on a running cluster does not change the in-memory configuration. Please confirm that the UseFence value is indeed NONE by running the 'haclus -value UseFence' command.

 

The reason is that when you run 'hastart' on the new node, it gets a snapshot of the VCS configuration from a currently running node (REMOTE_BUILD), and hence it has the same values as the running node. This also proves my point above: changing main.cf on a running cluster has no impact on the in-memory configuration.
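
To see the mismatch between the file and the running cluster, something like the following could help. It is a minimal sketch with a made-up helper name (usefence_on_disk) that assumes the attribute is written on one line as "UseFence = VALUE"; it prints what main.cf really sets and ignores lines commented out with //.

```shell
#!/bin/sh
# Hypothetical helper: print the UseFence value that the main.cf given as $1
# actually sets. Lines commented out with // do not match, so nothing is
# printed when the attribute is absent or commented out.
usefence_on_disk() {
    sed -n 's|^[[:space:]]*UseFence[[:space:]]*=[[:space:]]*\([A-Za-z0-9]*\).*|\1|p' "$1"
}
```

If this prints nothing for main.cf while 'haclus -value UseFence' on a running node still reports SCSI3, the cluster is still using the old in-memory value, which is what a newly started node will copy.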

Thanks and Warm Regards,

Amit Rangari


SOLUTION
Leed.Engineer's picture

Thanks a lot, Amit.

I will try to ask for downtime; otherwise I will use the -force option and check this issue. I will also open a support case to find out why this behaviour happened, and I will update you when we do this.

Thanks and Regards

Waleed Badr

arangari's picture

Please confirm whether the above steps helped.

Also, please provide the support case details.

Thanks and Warm Regards,

Amit Rangari
