
V-35-410: Cluster server not running on local node on solaris

Created: 26 Nov 2013 • Updated: 09 Dec 2013 | 11 comments
This issue has been solved. See solution.

Hello,

We had a hardware failure, and after restarting the server we could not reach our mount points. We tried starting the cluster with hastart, but nothing came up, and we keep getting the error in the title above.

Kindly assist in resolving this issue.

 

 

11 Comments

stinsong wrote:

Hi,

Is this a CFS server? And what exactly was the hardware failure? A local hard disk failure could lead to data loss that impacts the VCS configuration.

Please also paste the contents of the files below:

/etc/VRTSvcs/conf/sysname

/etc/llttab

And the output of the commands below:

lltstat -nvv active

gabconfig -a

IT-SYSMIKE wrote:

Hello stinsong,

Thanks for your response.

The hardware failure caused a shared mount point to become unavailable. Yes, it is a CFS server. Currently we can see the filesystem on one node, but it is not coming up on the other node; when we try to start it, we get a new error:

VCS ERROR V-16-1-10600 Cannot connect to VCS engine.

 
Gaurav Sangamnerkar wrote:

Hello,

"Cannot connect to VCS engine" means your "had" process has not started or is not running.

For VCS to run, you need to ensure that components like LLT, GAB & Fencing (if configured) are running. Please paste the output of

# lltconfig

# lltstat -vvn | head -10

# gabconfig -a

# modinfo | egrep 'gab|llt|vxfen'

# had -version

# uname -a

 

When you say that nothing was started: assuming it's a Unix system, are your rc scripts all OK? i.e.

/etc/rc2.d/S70llt

/etc/rc2.d/S92gab

/etc/rc3.d/S99vcs

If services are configured under SMF, are the SMF services in the online state?
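A quick way to check all three rc scripts at once (a minimal sketch using the paths listed above; on SMF-managed Solaris systems these legacy scripts may legitimately be absent):

```shell
# Check whether the legacy VCS rc start scripts exist and are executable.
# On systems where llt/gab/vcs run under SMF, these may be absent by design.
for f in /etc/rc2.d/S70llt /etc/rc2.d/S92gab /etc/rc3.d/S99vcs; do
    if [ -x "$f" ]; then
        echo "$f: present"
    else
        echo "$f: missing (check SMF instead)"
    fi
done
```

If all three are missing, that by itself is not a fault; it just means the SMF services are the place to look next.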

 

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

IT-SYSMIKE wrote:

Hello,

These are the results:

root@ap1.gf.net # lltconfig
LLT is running
root@ap1.gf.net # lltstat -vvn | head -10
LLT node information:
    Node                 State    Link  Status  Address
   * 0 ap1          OPEN    
                                  igb2   UP      00:21:28:BB:40:3C
                                  igb3   UP      00:21:28:BB:40:3D
     1 ap2          OPEN    
                                  igb2   UP      00:21:28:BB:0F:04
                                  igb3   UP      00:21:28:BB:0F:05
     2                   CONNWAIT
                                  igb2   DOWN    
 
root@ap1.gf.net # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   68f501 membership 01
Port d gen   68f506 membership 01
 
root@ap1.gf.net # df -ah
Filesystem             size   used  avail capacity  Mounted on
rpool/ROOT/s10s_u9wos_14a
                       274G    26G   238G    10%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                    53G   504K    53G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
/platform/sun4v/lib/libc_psr/libc_psr_hwcap2.so.1
                       264G    26G   238G    10%    /platform/sun4v/lib/libc_psr.so.1
/platform/sun4v/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
                       264G    26G   238G    10%    /platform/sun4v/lib/sparcv9/libc_psr.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                    53G    72K    53G     1%    /tmp
swap                    53G    72K    53G     1%    /var/run
swap                    53G     0K    53G     0%    /dev/vx/dmp
swap                    53G     0K    53G     0%    /dev/vx/rdmp
applprod1              150G    40G   109G    27%    /applprod1
applprod2               98G    39G    59G    40%    /applprod2
rpool/export           274G    23K   238G     1%    /export
rpool/export/home      274G   3.6G   238G     2%    /export/home
rpool                  274G    97K   238G     1%    /rpool
-hosts                   0K     0K     0K     0%    /net
auto_home                0K     0K     0K     0%    /home
ap1.gf.net:vold(pid2375)
                         0K     0K     0K     0%    /vol
/dev/odm                 0K     0K     0K     0%    /dev/odm
root@ap1.gf.net # cfsmount all
  Error: V-35-410: Cluster Server not running on local node: to
 
root@ap1.gf.net # modinfo | egrep 'gab|llt|vxfen'
234 7aaea000  2cf88 331   1  llt (LLT 5.1SP1)
235 7ab0e000  5a338 332   1  gab (GAB device 5.1SP1)
236 7ab4c000  6a0c8 333   1  vxfen (VRTS Fence 5.1SP1)
 
root@ap1.gf.net # had -version
Engine Version    5.1
Join Version      5.1.10.0
Build Date        Fri Oct 01 07:30:00 2010
PSTAMP            5.1.100.000-5.1SP1-2010-09-30_23.30.00
 
root@ap1.gf.net # uname -a
SunOS ap1.gf.net 5.10 Generic_147440-19 sun4v sparc sun4v
 
The scripts below do not exist on our server:

/etc/rc2.d/S70llt

/etc/rc2.d/S92gab

/etc/rc3.d/S99vcs
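A side note on the gabconfig -a paste above: each GAB port letter is a kernel client (per standard VCS documentation, port a is the GAB control port, port b is VxFEN fencing, port d is ODM, and port h is the VCS engine, had). The output shows only ports a and d, so fencing and had are indeed not registered. A minimal sketch that checks a saved capture for the expected ports (the sample file simply mirrors the paste above):

```shell
# Illustrative only: parse a saved `gabconfig -a` capture and flag
# which GAB ports are registered. Port a = GAB, b = vxfen, h = had.
cat > /tmp/gab.out <<'EOF'
GAB Port Memberships
===============================================================
Port a gen   68f501 membership 01
Port d gen   68f506 membership 01
EOF

for p in a b h; do
    if grep -q "^Port $p " /tmp/gab.out; then
        echo "port $p: registered"
    else
        echo "port $p: missing"
    fi
done
```

Here ports b and h come back missing, which lines up with the fencing errors that show up in the engine log later in the thread.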

 

 

 

kjbss wrote:

Well, 'had' is not running for some reason, but almost everything else seems to be...

You may not see those 'rc'-scripts because llt, gab, and vcs may be under Solaris' SMF control on your system.  Check your SMF configuration and see when 'had' (vcs) should have been started.

What run-level is your system in?:

# who -r 

You may be in a run-level in which SMF is not configured to run VCS ('had'), in which case you would get an error like 'Cluster Server not running on local node'.

Either manually start VCS (via 'hastart') or transition your host to the appropriate run-level. 

-HTH

 

IT-SYSMIKE wrote:

Hello,

It is on run-level 3.

 

root@ap1.gf.net # who -r 
   .       run-level 3  Nov 27 15:52     3      0  3

 

kjbss wrote:

Have you tried to 'hastart' it yet?

Make sure to report back to us any error messages that go into the VCS engine log (/opt/VRTSvcs/log/engine_A.log) and the Solaris messages log (/var/adm/messages) after you run 'hastart'...

Do you have a valid VCS license? If it has expired, then you will get a message in those logs.

Run 'vxlicrep -s' and provide the output, as well as the relevant output from the various message files mentioned above...

-kjb

 

IT-SYSMIKE wrote:

Yes we do have a valid license.

After running hastart again, below are the outputs of the logs and the command.

root@ap1.gf.net # tail -f messages
Nov 28 11:11:40 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status
Nov 28 11:11:40 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10625 Local cluster configuration valid
Nov 28 11:11:40 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11034 Registering for cluster membership
Nov 28 11:11:40 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-11035 Waiting for cluster membership
Nov 28 11:11:45 ap1.gf.net gab: [ID 316943 kern.notice] GAB INFO V-15-1-20036 Port h gen   589a10 membership 01
Nov 28 11:11:45 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS INFO V-16-1-10077 Received new cluster membership
Nov 28 11:11:45 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System ap1 (Node '0') is in Regular Membership - Membership: 0x3
Nov 28 11:11:45 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10086 System  (Node '1') is in Regular Membership - Membership: 0x3
Nov 28 11:11:45 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS NOTICE V-16-1-10075 Building from remote system
Nov 28 11:11:46 ap1.gf.net Had[9709]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
 
 
root@ap1.gf.net # tail -f /var/VRTSvcs/log/engine_A.log
2013/11/28 11:11:46 VCS NOTICE V-16-1-10181 Group VCShmg AutoRestart set to 1
2013/11/28 11:11:46 VCS INFO V-16-1-10466 End of snapshot received from node: 1.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2013/11/28 11:11:46 VCS NOTICE V-16-1-52006 UseFence=SCSI3. Fencing is enabled
2013/11/28 11:11:46 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:12:01 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:12:16 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:12:31 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:12:46 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:13:01 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:13:16 VCS CRITICAL V-16-1-10031 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing
 
 
root@ap1.gf.net # vxlicrep -s
 
Symantec License Manager vxlicrep utility version 3.02.51.010
Copyright (C) 1996-2010 Symantec Corporation. All rights reserved.
 
Creating a report on all VERITAS products installed on this system
 
   License Key                         = AJZU-IDP3-WNDY-HV23-4JVC-OWP8-CPO4-O4O6-P
   Product Name                        = VERITAS Storage Foundation for Cluster File System
   License Type                        = PERMANENT
 
 
   License Key                         = 
   Product Name                        = VERITAS File System
   License Type                        = PERMANENT
 
 
   License Key                         = 
   Product Name                        = VERITAS Database Edition for Oracle
   License Type                        = PERMANENT
 
 
   License Key                         = 
   Product Name                        = VERITAS Volume Manager
   License Type                        = PERMANENT
 
 
   License Key                         = 
   Product Name                        = VERITAS SANPoint Control
   License Type                        = PERMANENT
 
 
   License Key                         = AJZH-N3I9-JT7C-GT8O-PPPP-PPPP-PPPC-PAT8-P
   Product Name                        = VERITAS Cluster Server
   License Type                        = PERMANENT
 
 
   License Key                         = AJZ9-ER3F-C4NZ-34DV-JPPP-PO4O-6PPP-PP63-P
   Product Name                        = VERITAS File System
   License Type                        = PERMANENT
 
 
   License Key                         = AJZH-YYIC-RTFO-UZCR-P3O4-EPPP-PPPC-P838-P
   Product Name                        = VERITAS Volume Manager
   License Type                        = PERMANENT
 
Gaurav Sangamnerkar wrote:

Also, make sure that "had -version" reports the same version on both nodes; above you have only pasted output from one node, so we can't confirm.

Also, as suggested above, try hastart and let us know the output from engine_A.log.

 

G

 

Gaurav Sangamnerkar wrote:

Hello

It looks like you have configured your cluster to use I/O fencing; however, fencing is not configured correctly.

Refer to the logs below:

2013/11/28 11:11:46 VCS NOTICE V-16-1-52006 UseFence=SCSI3. Fencing is enabled
2013/11/28 11:11:46 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:12:01 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...

2013/11/28 11:12:16 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying..

 

See if your main.cf contains the line below:

UseFence = SCSI3

# cat /etc/VRTSvcs/conf/config/main.cf |grep -i usefence

 

If the above line exists in main.cf, it means the cluster is intended to use fencing, which is not configured correctly.

I/O fencing provides data protection in cluster split-brain situations.

 

Refer to the VCS admin guide and see the article on how to configure I/O fencing. If you do not intend to use I/O fencing (which is not recommended), you can remove the entry from main.cf after stopping the cluster, then start the cluster again.
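To illustrate the check and the (not recommended) removal, here is a minimal sketch against a sample file. The cluster name and sample contents below are made up; the real file is /etc/VRTSvcs/conf/config/main.cf, and it should only be edited after the cluster is stopped with hastop.

```shell
# Illustrative only: /tmp/main.cf.sample stands in for the real
# /etc/VRTSvcs/conf/config/main.cf (cluster name here is invented).
cat > /tmp/main.cf.sample <<'EOF'
include "types.cf"
cluster apcluster (
        UseFence = SCSI3
        )
EOF

# Is fencing enabled in the config?
grep -i usefence /tmp/main.cf.sample

# Write a copy with the UseFence line stripped, for review:
grep -vi usefence /tmp/main.cf.sample > /tmp/main.cf.nofence
```

After changing the real main.cf, `hacf -verify /etc/VRTSvcs/conf/config` checks the syntax before you run hastart on each node.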

 

Link to documentation

 

https://sort.symantec.com/documents

I/O fencing link for VCS 5.1 on Solaris:

https://sort.symantec.com/public/documents/sf/5.1/...

 

G

 

SOLUTION
Venkata Reddy Chappavarapu wrote:

Please check your SMF services for any issues.

#svcs -a|egrep 'vxfen|vcs|llt|gab'

online         Oct_17   svc:/system/vxfen:default
online         Oct_17   svc:/system/llt:default
online         Oct_17   svc:/system/gab:default
online         Oct_17   svc:/system/vcs:default
 
If any service is not online, please check the reason why it is not online:
#svcs -xv vxfen
 
And you may check the SMF logs for more details.
 
#more /var/svc/log/system-vxfen\:default.log 
 
If fencing is not coming up, you may configure fencing with the vxfenconfig command.
 
#vxfenadm -d
#vxfenconfig 
 
Thanks,
Venkat

Venkata Reddy Chappavarapu,

Sr. Manager,

Information Availability Group (VCS),

Symantec Corporation
