
VCS 5.1 - Failing to start a new node after adding it to the cluster

Created: 23 Dec 2013 • Updated: 06 Jan 2014 | 4 comments
This issue has been solved. See solution.

Hi,

I have just added a node to a cluster and during the configuration everything went fine. Here is a summary of what I did:

root@dp-e9 # cat /etc/llthosts
0 DP-node5
1 DP-node6
2 DP-node4
3 DP-node8
4 dp-node9

root@dp-e9 # cat /etc/gabtab
/sbin/gabconfig -c -n5

root@dp-e9 # cat /etc/llttab
set-node dp-node9
set-cluster 10000
link igb1 /dev/igb1 - ether - -
link igb2 /dev/igb2 - ether - -

 

root@dp-node9 # dladm show-dev
igb0            link: up        speed: 1000  Mbps       duplex: full
igb1            link: up        speed: 1000  Mbps       duplex: full
igb2            link: up        speed: 1000  Mbps       duplex: full
igb3            link: unknown   speed: 0     Mbps       duplex: unknown
usbecm0         link: up        speed: 10    Mbps       duplex: full
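
As a quick sanity check on the new node (this is just the standard LLT verification, not part of the add procedure itself):

root@dp-node9 # lltstat -l             (should list the two configured links, igb1 and igb2)
root@dp-node9 # lltstat -nvv | more    (all five nodes should be visible with both links UP)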

 

In the other node I have:

root@DP-e6 # /sbin/gabconfig -a
GAB Port Memberships
================================
Port a gen   406917 membership 01234
Port b gen   406915 membership 0123
Port b gen   406915    visible ;   4
Port h gen   406914 membership 0123
Port h gen   406914    visible ;   4

=============================

After this I followed the manual to add a node:

1 Enter the command:
# haconf -makerw
2 Add the new system to the cluster:
# hasys -add dp-node9
3 Enter the following command:
# haconf -dump
4 Copy the main.cf file from an existing node to your new node:
# rcp /etc/VRTSvcs/conf/config/main.cf dp-node9:/etc/VRTSvcs/conf/config/
5 Start VCS on the new node:
# hastart
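
To verify the add took effect, these standard checks from an existing node should show the new system (commands only, output omitted here):

# hasys -list     (dp-node9 should appear in the system list)
# hasys -state    (dp-node9 should eventually reach RUNNING)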

 

==============================

But I noticed that the state of the new system is FAULTED now.

I have attached the engine_A log.

And the main.cf looks fine.

I have read that a mismatched PSTAMP can be a problem... is that it? Please see below:

dp-node9 # pkginfo  -l VRTSvcs
   PKGINST:  VRTSvcs
      NAME:  Veritas Cluster Server by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.1
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Veritas Cluster Server by Symantec
    PSTAMP:  Veritas-5.1-10/06/09-14:37:00
  INSTDATE:  Dec 23 2013 13:26
    STATUS:  completely installed
     FILES:      279 installed pathnames
                  25 shared pathnames
                   4 linked files
                  59 directories
                 101 executables
              233702 blocks used (approx)
 

DP-node6 # pkginfo  -l VRTSvcs
   PKGINST:  VRTSvcs
      NAME:  Veritas Cluster Server by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.1
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Veritas Cluster Server by Symantec
    PSTAMP:  5.1.001.000-5.1RP1-2010-02-24_23:49:00
  INSTDATE:  Jul 05 2011 12:51
    STATUS:  completely installed
     FILES:      280 installed pathnames
                  27 shared pathnames
                   4 linked files
                  59 directories
                 102 executables
              233783 blocks used (approx)
 

===========================

Let me know if any additional info is needed, and also how to clear this faulted state.

DP-node6 # hasys -state
#System    Attribute          Value
DP-node4   SysState           RUNNING
DP-node5   SysState           RUNNING
DP-node6   SysState           RUNNING
DP-node8   SysState           RUNNING
dp-node9   SysState           FAULTED
 

From the new node I can't do anything:

dp-node9 # hares -clear dp-node9
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
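
That error simply means the HAD engine is not running on dp-node9, so no ha* command will work locally until it starts. Illustrative checks (output omitted):

dp-node9 # ps -ef | egrep 'had|hashadow'    (the engine and its restart daemon should show up here)
dp-node9 # hastatus -sum                    (fails with the same V-16-1-10600 while HAD is down)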
 

Thanks,

Joao

4 Comments

Gaurav Sangamnerkar

Hello,

From the gabconfig output pasted above, it looks like you have not configured fencing on the last node:

root@DP-e6 # /sbin/gabconfig -a
GAB Port Memberships
================================
Port a gen   406917 membership 01234
Port b gen   406915 membership 0123
Port b gen   406915    visible ;   4   <<<<<<<<<<<<<<<<< port b has not joined the membership
Port h gen   406914 membership 0123
Port h gen   406914    visible ;   4

=============================

 

Unless you get port b to join the membership, HAD won't start & port h won't join the membership either.

1. Make sure the /etc/vxfendg file is populated on the new node the same as on the other nodes.

2. Make sure that /etc/vxfenmode & /etc/vxfentab are identical on the new node and the other cluster nodes.

3. Make sure that the new node can see the fencing DG & its disks correctly.

4. Start fencing on the new node using /etc/init.d/vxfen start

5. Once fencing is started, check the registration keys on the coordinator disks using the command below:

# /sbin/vxfenadm -s all -f /etc/vxfentab    (output should be identical on all nodes)

Once the above is done, run hastart on the new node & the cluster should start; it will pick up the main.cf from the other nodes during the remote build process.
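
As an extra check once vxfen is up (optional; the exact output format varies a little between releases):

# vxfenadm -d     (displays the I/O fencing mode & cluster state; the mode should match /etc/vxfenmode on every node)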

 

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

joaotelles

Hi,

 

On the other node it seems that fencing is not enabled...

DP-node6 # cat /etc/vxfenmode
#
# vxfen_mode determines in what mode VCS I/O Fencing should work.
#
# available options:
# scsi3      - use scsi3 persistent reservation disks
# customized - use script based customized fencing
# sybase     - use scsi3 disks in kernel but coordinate membership with Sybase ASE
# disabled   - run the driver but don't do any actual fencing
#
vxfen_mode=disabled

root@DP-node6 # cat /etc/vxfentab
cat: cannot open /etc/vxfentab
 

Is it ok?

mikebounds

The vxfen files should be the same on all nodes, so copy them from another working node and then start vxfen using "/etc/init.d/vxfen start".
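
For example, from the new node, something like this (using rcp as in your earlier step; /etc/vxfendg only exists when a coordinator disk group is configured, and /etc/vxfentab is generated when vxfen starts, which would explain why it is missing on DP-node6 in disabled mode):

dp-node9 # rcp DP-node6:/etc/vxfenmode /etc/vxfenmode
dp-node9 # /etc/init.d/vxfen start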

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

Gaurav Sangamnerkar

There are 3 files for the fencing configuration: /etc/vxfendg, /etc/vxfenmode & /etc/vxfentab. These files should be the same on all the nodes even if you are running fencing in disabled mode.

I/O Fencing helps prevent a split-brain scenario, which can cause data corruption; you may want to consider running I/O Fencing in enabled mode to protect against split brain.

For now, on this new node

# /etc/init.d/vxfen stop  (to ensure fencing is stopped)

# modinfo |grep -i vxfen  (find module ID for vxfen)

# modunload -i <mod_id_vxfen>   (unload vxfen module)

# copy the 3 files as instructed above

# /etc/init.d/vxfen start            (start fencing)

# gabconfig -a      (you should now see fencing joined the membership)

# hastart              (on new node, this would get port h also to join membership)
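
If everything joins cleanly, gabconfig -a on any node should then show all three ports with full membership, something like this (generation numbers will differ):

GAB Port Memberships
===============================================================
Port a gen   xxxxxx membership 01234
Port b gen   xxxxxx membership 01234
Port h gen   xxxxxx membership 01234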

 

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

SOLUTION