
Solaris10 SPARC 5.0MP3 RP5 LLT GAB VXFEN not starting at boot

Created: 21 Apr 2014 | 18 comments

Hi.

Solaris10 SPARC 5.0MP3 RP5.

LLT, GAB, VxFEN and VCS are not starting at boot on Solaris 10.

During boot I see the messages below on the console.

-------------------------------------------------------------------------------------------

VxVM sysboot INFO V-5-2-3390 Starting restore daemon...
LLT INFO V-14-1-10009 LLT Protocol available
GAB INFO V-15-1-20021 GAB available

----------------------------------------------------------------------------------------------

But the services are not up.

# /etc/init.d/llt status
LLT: is loaded but not configured.

# /etc/init.d/gab status
GAB: module not configured

But if I issue the commands explicitly, they start.

# /etc/init.d/llt start
Starting LLT...
Starting LLT done.

# /etc/init.d/llt status
LLT: is loaded and configured.

# /etc/init.d/gab start
Starting GAB...
Starting GAB done.

# /etc/init.d/gab status
GAB: module is configured

Now I am facing a problem with vxfen.

In the log I see the message "VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying..."

I tried several times to start vxfen, but no luck.

But if I run "vxfenconfig -c", it comes up.

# /sbin/vxfenconfig -c
VXFEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.

And now the cluster comes up with "hastart".

I have to do this on every node each time a node or the cluster reboots.

Could someone please suggest what the issue could be and why the cluster services are not starting at boot?

 

Thanks & Regards,

Shashi Kanth.

Gaurav Sangamnerkar's picture

Hi,
Are you using fencing, or just starting fencing in disabled mode?
If the main.cf file contains "UseFence = SCSI3", then you have configured the cluster to use fencing. In that case you should have coordinator disks and a coordinator diskgroup set up, and the diskgroup name should be recorded in the /etc/vxfendg file.
You should also have the /etc/vxfenmode file populated, indicating the fencing mode (either raw or dmp).
If you are not using fencing, update the /etc/vxfenmode file to set the mode to "disabled".
It is highly recommended to use fencing in a RAC setup, or even with plain VCS. I believe you have an issue with the vxfenmode file.
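
As a rough illustration of the checks described above, here is a minimal sketch that reports whether the fencing configuration files are present and non-empty. The function name check_fencing_files and the ETC_DIR variable are made up for this example (ETC_DIR exists only so the check can be tried against a scratch directory); the file names vxfenmode and vxfendg are the ones mentioned above. Keep in mind that /etc/vxfendg is only expected when fencing is actually in use.

```shell
# Directory holding the fencing files; override ETC_DIR to test elsewhere.
ETC_DIR="${ETC_DIR:-/etc}"

# Hypothetical helper: prints one line per file and returns 1 if any
# of the files is missing or empty, 0 otherwise.
check_fencing_files() {
    status=0
    for name in vxfenmode vxfendg; do
        if [ -s "$ETC_DIR/$name" ]; then
            echo "present: $ETC_DIR/$name"
        else
            echo "missing or empty: $ETC_DIR/$name"
            status=1
        fi
    done
    return $status
}
```

Running check_fencing_files on each node gives a quick first pass before looking at the contents of the files themselves.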

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

rsharma1's picture

Could you try increasing SLEEP_INTERVAL in /sbin/vxfen-startup as mentioned in this technote:

http://www.symantec.com/business/support/index?pag...

stinsong's picture

Hi Shashi,

For the fencing issue, I think it is not configured properly in main.cf, vxfendg, vxfentab and vxfenmode.

For the LLT / GAB issue, I think it may be related to the Solaris service configuration. You can check a service's status with "svcs -l <service>".

When LLT is enabled, the status should look like this:

# svcs -l llt
fmri         svc:/system/llt:default
name         Veritas Low Latency Transport (LLT) Init service
enabled      true        <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< enabled status should be "true"
state        online        <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< current status
next_state   none
state_time   Mon Dec 23 22:19:46 2013
logfile      /var/svc/log/system-llt:default.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/network/initial (online)
dependency   optional_all/none svc:/network/routing/ndp:default (disabled)
 

 

stinsong's picture

To enable/disable Solaris services:

svcadm enable llt

svcadm disable llt

This enables or disables automatic start/stop of the service at OS boot.

shashi's picture

 

There is no issue with the fencing in the cluster.

# /sbin/vxfenadm -d

I/O Fencing Cluster Information:
================================

 Fencing Protocol Version: 201
 Fencing Mode: SCSI3
 Fencing SCSI3 Disk Policy: dmp
 Cluster Members:  

          0 (hyi01sehost85.ind.hp.com)
        * 1 (hyi01sehost83.ind.hp.com)

 RFSM State Information:
        node   0 in state  8 (running)
        node   1 in state  8 (running)

 

# cat /etc/VRTSvcs/conf/config/main.cf
include "types.cf"

cluster SF50MP3Sol10 (
        UserNames = { admin = gmnFmhMjnInnLvnHmk }
        Administrators = { admin }
        UseFence = SCSI3
        )

system "hyi01sehost83.ind.hp.com" (
        )

system "hyi01sehost85.ind.hp.com" (
        )

 

Now I changed the SLEEP_INTERVAL parameter from 5 to 25 in the /sbin/vxfen-startup file.

# cat /sbin/vxfen-startup | grep SLEEP_INTERVAL
                SLEEP_INTERVAL=25

 

Then I rebooted both machines, but I still see the same issue.

# /etc/init.d/llt status
LLT: is loaded but not configured.

# /etc/init.d/gab status
GAB: module not configured

Now if I start all the services manually, they come up.

shashi's picture

 

One point to note: the SMF commands do not work for VCS here.

# svcadm enable llt
svcadm: Pattern 'llt' doesn't match any instances

# svcs -l llt
svcs: Pattern 'llt' doesn't match any instances

 

 

shashi's picture

Fencing is properly configured in the cluster.

I have increased the SLEEP_INTERVAL parameter in /sbin/vxfen-startup from 5 to 25, and I have also added "sleep 180" to the /etc/init.d/vcs file as per the note http://www.symantec.com/business/support/index?page=content&id=TECH186884, but no luck.

On one cluster node I see messages like the ones below.

VxFEN driver not configured. Retrying...
2014/04/22 13:12:22 VCS CRITICAL V-16-1-10031 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing
Retry limit of 12 exhausted trying for vxvm-recover to come up. Giving up.

 

I think the issue could be with VxVM, which is not starting properly at boot. I don't see any VxVM services running after boot.

On the console, during boot, I see messages like the ones below.

VxVM sysboot INFO V-5-2-3409 starting in boot mode...

 

VxVM sysboot INFO V-5-2-3390 Starting restore daemon...

 

But after boot, I don't see any VxVM-related services running.

Gaurav Sangamnerkar's picture

Are you saying that vxconfigd is not running? Are you able to execute any vx commands once the server comes up? Do you see vxconfigd starting after some time?

Even though the coordinator diskgroup is not imported, the fencing module still verifies the coordinator disks listed in /etc/vxfentab.

Can you paste the output of the commands below from both nodes?

# vxdisk -o alldgs list | grep -i fen   (if you have given any other name for fencing dg, show us the disks)

# cat /etc/vxfentab

# cat /etc/vxfenmode

# cat /etc/vxfendg

Also, I would suggest removing server names when you paste output (for your own security).

 

G

 

starflyfly's picture

Hi, 

 

Check whether the following file exists:

ls -l /etc/vx/reconfig.d/state.d/install-db

If it does, remove the file, restart the server and test again.

 

 

 

If the answer has helped you, please mark as Solution.

Gaurav Sangamnerkar's picture

As VxVM is starting in boot mode, I believe install-db is not there.

 

G

 

shashi's picture

 

Now I brought the cluster up by starting all the services manually.

# hastatus -sum

-- SYSTEM STATE
-- System               State                Frozen              

A  hyi01sehost83.ind.hp.com RUNNING              0                    
A  hyi01sehost85.ind.hp.com RUNNING              0      

 

# /sbin/vxfenadm -d

I/O Fencing Cluster Information:
================================

 Fencing Protocol Version: 201
 Fencing Mode: SCSI3
 Fencing SCSI3 Disk Policy: dmp
 Cluster Members:  

        * 0 (hyi01sehost85.ind.hp.com)
          1 (hyi01sehost83.ind.hp.com)

 RFSM State Information:
        node   0 in state  8 (running)
        node   1 in state  8 (running)

 

Now I rebooted all the nodes.

# reboot

And now I see the trouble again.

# /etc/init.d/llt status
LLT: is loaded but not configured.

# /etc/init.d/gab status
GAB: module not configured

# /sbin/vxfenadm -d
VXFEN vxfenadm ERROR V-11-2-1115 Local node is not a member of cluster!

# ps -ef | grep vx
    root    54     1   0 14:53:11 ?           0:01 vxconfigd -x syslog -m boot
    root  1977  1834   0 15:02:41 pts/1       0:00 grep vx
    root   674     1   0 14:53:23 ?           0:01 /sbin/vxesd

And now if I start all the services manually, they come up after several tries.

shashi's picture

Retry limit of 12 exhausted trying for vxvm-recover to come up. Giving up.

 

mikebounds's picture

It looks like you have an issue with LLT not starting. LLT starts first, so even if there is a problem with fencing, LLT should still start.

As you are using an old version of VCS, it looks like it uses the old /etc/init.d scripts rather than SMF services (svcs).

If LLT starts manually, then the LLT config files must be OK, so the issue is probably that /etc/init.d/llt is not being called by the boot process, or it is being called too soon (I am not sure how Solaris integrates SMF with legacy /etc/init.d scripts to make sure they start in the right order).

I would make a backup copy of /etc/init.d/llt and then edit it to add something like:

echo "LLT call script called at: "`date` >> /var/tmp/llt.log

and edit the line that says "lltconfig -c" to:

lltconfig -c >> /var/tmp/llt.log 2>&1

and then check /var/tmp/llt.log after booting.

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

robertino's picture

Not sure if you have resolved this issue, but I have seen this when the private links use bonded interfaces.

If you are using bonded interfaces, the problem is that the bonded interfaces are not available when LLT starts; as a result GAB won't start, and neither will VXFEN.

You can check /var/log/boot.log and you will see a message to the effect that the bonded interface is not available.

If

 

robertino's picture

Oops, sorry, I didn't read the post properly. Bonded interfaces are only available in Linux. Please ignore my post :-)

robertino's picture

Please ignore my post. If I had read the post clearly I would have seen you're running Solaris. Bonded interfaces are in Linux.

sajith_cr's picture

VCS 5.0MP3RP5 does not support SMF for service management. It uses rc scripts.

Could you check whether all the rc scripts are in place for the VRTSllt, VRTSgab and VRTSvcs packages?

The following commands will tell you whether any packaged rc scripts are missing.

 

# pkgchk VRTSllt

# pkgchk VRTSgab

# pkgchk VRTSvxfen

# pkgchk VRTSvcs

 

Ideally, /etc/rc2.d/S92gab must be a link to /etc/init.d/gab, /etc/rc2.d/S70llt a link to /etc/init.d/llt, and /etc/rc2.d/S97vxfen a link to /etc/init.d/vxfen.

Ensure that these rc scripts are in place.

If all the above rc scripts are in place and the services still do not start, then it is possible that the system is not going through run level 2 during boot. If that is the case, create rc script links for the run level you are booting into.
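
The link check described above could be scripted roughly as follows. This is only a sketch: the function name check_rc_links and the RC_DIR variable are invented for the example (RC_DIR lets you point the check at another run-level directory or a scratch directory), and the three script names are the ones listed above. It uses "test -f" because that follows symbolic links, also matches hard links, and is understood by the older Bourne shell.

```shell
# Run-level directory to inspect; override RC_DIR for another run level.
RC_DIR="${RC_DIR:-/etc/rc2.d}"

# Hypothetical helper: reports each expected start script and returns 1
# if any of them is missing, 0 otherwise.
check_rc_links() {
    missing=0
    for name in S70llt S92gab S97vxfen; do
        if [ -f "$RC_DIR/$name" ]; then
            echo "OK       $RC_DIR/$name"
        else
            echo "MISSING  $RC_DIR/$name"
            missing=1
        fi
    done
    return $missing
}
```

Calling check_rc_links after sourcing the snippet prints one line per script; it only confirms that the entries exist, not that they point at the right /etc/init.d file, so "ls -l" on any suspicious entry is still worthwhile.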

Regards,

Sajith

If this post has helped you, please vote or mark as solution.

starflyfly's picture

Hi, check these 3 files in /etc/default:

 

bash-3.2# more llt
#
# This file is sourced :
#       from /etc/init.d/llt            for Solaris < 2.10
#       from /lib/svc/method/llt        for Solaris 2.10 
#
# Set the two environment variables below as follows:
#
#       1 = start or stop llt
#       0 = do not start or stop llt
#

LLT_START=1               <<<<<<
LLT_STOP=1
bash-3.2# more gab
#
# This file is sourced :
#       from /etc/init.d/gab            for Solaris < 2.10
#       from /lib/svc/method/gab        for Solaris 2.10 
#
# Set the two environment variables below as follows:
#
#       1 = start or stop gab
#       0 = do not start or stop gab
#

GAB_START=1  <<<<<<<<<<
GAB_STOP=1
bash-3.2# more vcs
# $Id: vcsconf_sun,v 1.5 2011/09/27 09:59:28 asontakk Exp $ #
# $Copyright: Copyright (c) 2012 Symantec Corporation.
# All rights reserved.
#
# THIS SOFTWARE CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF
# SYMANTEC CORPORATION.  USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED
# WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SYMANTEC CORPORATION.
#
# The Licensed Software and Documentation are deemed to be commercial
# computer software as defined in FAR 12.212 and subject to restricted
# rights as defined in FAR Section 52.227-19 "Commercial Computer
# Software - Restricted Rights" and DFARS 227.7202, "Rights in
# Commercial Computer Software or Commercial Computer Software
# Documentation", as applicable, and any successor regulations. Any use,
# modification, reproduction release, performance, display or disclosure
# of the Licensed Software and Documentation by the U.S. Government
# shall be solely in accordance with the terms of this Agreement.  $ #
#
# This file is sourced :
#      from /etc/init.d/vcs            for Solaris < 2.10
#      from /lib/svc/method/vcs        for Solaris 2.10
#
# option to vcs (i.e hastart)
# if ONENODE is set to _yes_, vcs will be started using -onenode option to
# form a single node cluster.
# possible values of ONENODE : yes/no (case sensitive)
ONENODE=no

# Set the two environment variables below as follows:
#
#       1 = start or stop VCS
#       0 = do not start or stop VCS
#

VCS_START=1  <<<<<<<<<<<<<
VCS_STOP=1
bash-3.2# pwd
/etc/default
bash-3.2# 
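
To pull out just the start flags from the three files shown above, something like the following sketch may help. The function name show_start_flags and the DEFAULT_DIR variable are made up for this example. Whether these /etc/default files exist at all on 5.0MP3 is an assumption; other replies in this thread say that release relies on rc scripts, in which case the function simply reports the files as not found.

```shell
# Directory holding the llt, gab and vcs defaults files.
DEFAULT_DIR="${DEFAULT_DIR:-/etc/default}"

# Hypothetical helper: prints the uncommented *_START= line from each
# file, or a note when the file or the setting is absent.
show_start_flags() {
    for name in llt gab vcs; do
        file="$DEFAULT_DIR/$name"
        if [ -f "$file" ]; then
            grep '^[A-Z]*_START=' "$file" || echo "$file: no _START setting"
        else
            echo "$file: not found"
        fi
    done
}
```

A value of 1 means the corresponding script is allowed to start the component at boot, as the comments in the files above explain.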
