
VCS HA over Solaris 10

Created: 14 Dec 2011 | 7 comments
Caushiph Unvar

Hi All,

 

I have a Veritas HA cluster running on the Solaris 10 x86 platform. The day before yesterday the cluster stopped working and the file systems disappeared. The admin who was in the office at the time rebooted both servers. Since then the disks are visible in format, but in vxdisk list they are in the error state. I tried running vxdctl enable, but they still do not come out of the error state. I know this happens with a newly assigned LUN, where all you have to do is format and label it, and after that vxdctl enable brings it out of the error state.

Following is the output from the cluster:

(root_sahomxapp01u)@/var/VRTSvcs/log # vxdisk -e list

DEVICE          TYPE          DISK  GROUP  STATUS          OS_NATIVE_NAME                           ATTR

c1t0d0s2        auto:none     -     -      online invalid  c1t0d0s2                                 -

emc_clariion0_0 auto:cdsdisk  -     -      error           c0t60060160CA401700584727D8D2ECE011d0s2  -

emc_clariion0_1 auto:cdsdisk  -     -      error           c0t60060160CA401700D431FBE3D2ECE011d0s2  -

 

vxdisk -o alldgs list

DEVICE       TYPE            DISK         GROUP        STATUS

c1t0d0s2     auto:none       -            -            online invalid

emc_clariion0_0 auto:cdsdisk    -            -            error

emc_clariion0_1 auto:cdsdisk    -            -            error
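
When this check has to be repeated on both nodes, the listing above can be filtered by script. The following is only a minimal sketch: the function name `disks_in_error` is hypothetical, the sample text is copied from the output above, and it assumes the five-column layout shown there (DEVICE, TYPE, DISK, GROUP, STATUS).

```python
# Sample copied from the "vxdisk -o alldgs list" output in this post.
SAMPLE = """\
DEVICE       TYPE            DISK         GROUP        STATUS
c1t0d0s2     auto:none       -            -            online invalid
emc_clariion0_0 auto:cdsdisk    -            -            error
emc_clariion0_1 auto:cdsdisk    -            -            error
"""


def disks_in_error(listing):
    """Return the device names whose STATUS column starts with 'error'.

    Hypothetical helper: assumes whitespace-separated columns in the
    order DEVICE TYPE DISK GROUP STATUS, as in the sample above.
    """
    failed = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 5 or fields[0] == "DEVICE":
            continue
        # STATUS can be more than one word (e.g. "online invalid").
        status = " ".join(fields[4:])
        if status.startswith("error"):
            failed.append(fields[0])
    return failed


if __name__ == "__main__":
    for device in disks_in_error(SAMPLE):
        print(device)
```

On a live system the text would come from running the command itself (for example through `subprocess.run`) rather than from a pasted sample.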

 

(root_sahomxapp01u)@/var/VRTSvcs/log # hastatus -summ

 

-- SYSTEM STATE

-- System               State                Frozen

 

A  sahomxapp01u         RUNNING              0

A  sahomxapp02u         RUNNING              0

 

-- GROUP STATE

-- Group           System               Probed     AutoDisabled    State

 

B  App_Group       sahomxapp01u         Y          N               OFFLINE|FAULTED

B  App_Group       sahomxapp02u         Y          N               OFFLINE|STARTING|FAULTED

B  ClusterService  sahomxapp01u         Y          N               ONLINE

B  ClusterService  sahomxapp02u         Y          N               OFFLINE

 

-- RESOURCES FAILED

-- Group           Type                 Resource             System

 

D  App_Group       DiskGroup            App_dg               sahomxapp01u

D  App_Group       DiskGroup            App_dg               sahomxapp02u

 

-- RESOURCES ONLINING

-- Group           Type            Resource             System       IState

 

F  App_Group       DiskGroup       App_dg               sahomxapp02u        W_ONLINE
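
The group state section of the summary above can be reduced to just the faulted group/system pairs. This is a rough sketch, not a VCS tool: `faulted_groups` is a hypothetical name, and it assumes group rows start with `B` and have six whitespace-separated fields, as they do in the output shown here.

```python
# Group state rows copied from the "hastatus -summ" output in this post.
SAMPLE = """\
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  App_Group       sahomxapp01u         Y          N               OFFLINE|FAULTED
B  App_Group       sahomxapp02u         Y          N               OFFLINE|STARTING|FAULTED
B  ClusterService  sahomxapp01u         Y          N               ONLINE
B  ClusterService  sahomxapp02u         Y          N               OFFLINE
"""


def faulted_groups(summary):
    """Return (group, system) pairs whose State contains FAULTED.

    Hypothetical helper: assumes rows of the form
    "B <group> <system> <probed> <autodisabled> <state>".
    """
    result = []
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] != "B":
            continue  # header, blank line, or a row from another section
        group, system, state = fields[1], fields[2], fields[5]
        # State is a "|"-separated list such as OFFLINE|STARTING|FAULTED.
        if "FAULTED" in state.split("|"):
            result.append((group, system))
    return result


if __name__ == "__main__":
    for group, system in faulted_groups(SAMPLE):
        print(group, system)
```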

 

 

 

 

The disks are visible in format and I can see their partition tables.

AVAILABLE DISK SELECTIONS:

       0. c0t60060160CA401700D431FBE3D2ECE011d0 <DGC-RAID5-0226 cyl 4349 alt 2 hd 16 sec 3012>

          /scsi_vhci/disk@g60060160ca401700d431fbe3d2ece011

       1. c0t60060160CA401700584727D8D2ECE011d0 <DGC-RAID5-0226 cyl 4349 alt 2 hd 16 sec 3012>

          /scsi_vhci/disk@g60060160ca401700584727d8d2ece011

       2. c1t0d0 <DEFAULT cyl 17747 alt 2 hd 255 sec 63>

          /pci@0,0/pci8086,3a40@1c/pci1014,3b2@0/sd@0,0

 

 

luxadm -e port shows proper connectivity.

cfgadm -al shows the controllers properly configured.

 

No errors were found in /var/adm/messages or while rebooting the servers. Can anyone please suggest where I should look?

7 Comments

Marianne

All cluster and device level error messages are normally logged in /var/adm/messages, but syslog needs to be running. Please check/verify.

Also check whether the messages file was rotated in the meantime (the older entries would then be in messages.0).

Please extract and post all the info for 2 days ago (when the error occurred) from engine_A.log (/var/VRTSvcs/log/).

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

TonyGriffiths

Hi

Is the issue observed on all nodes in the cluster or just one?

Can you confirm that the disk has a valid partition table?

cheers

tony

Caushiph Unvar

Hi All,

 

Thank you for the replies. Logs are attached for your review.

@Tony: yes, it is observed on both nodes. I'll share the partition table shortly.

 

regards,

 

 

Attachment                 Size
Engine_Alog_server1.txt    944.63 KB
Engine_Alog_Server2.txt    895.17 KB
varadmlog_server1.txt      574.69 KB
varadmlog_server2.txt      518.84 KB
Marianne

You need to find out what led up to all the filesystem errors on system sahomxapp02u - your system log starts at 09:31 with 'message no 11' of a filesystem error.

Dec 12 09:31:07 sahomxapp02u vxfs: [ID 702911 kern.warning] WARNING: msgcnt 11 mesg 008: V-2-8: vx_direrr: vx_dirscan_2 - /var/mqm file system dir inode 2 dev/block 0/16967 dirent inode 0 error 6
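
To trace back to the first error in a large messages file, warnings like the one above can be picked apart with a regular expression. This is only a sketch built around that single sample line: `parse_vxfs_warning` and the field names are hypothetical, and other vxfs messages may be laid out differently.

```python
import re

# The warning quoted above, copied verbatim.
LINE = (
    "Dec 12 09:31:07 sahomxapp02u vxfs: [ID 702911 kern.warning] WARNING: "
    "msgcnt 11 mesg 008: V-2-8: vx_direrr: vx_dirscan_2 - /var/mqm file system "
    "dir inode 2 dev/block 0/16967 dirent inode 0 error 6"
)

# Pattern derived from the sample line only (an assumption, not a vxfs spec).
PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+vxfs:.*?"
    r"msgcnt\s+(?P<msgcnt>\d+)\s+mesg\s+(?P<mesg>\d+):\s+"
    r"(?P<code>V-\d+-\d+):\s+(?P<routine>\w+):.*?-\s+"
    r"(?P<mount>/\S*)\s+file system.*\berror\s+(?P<error>\d+)\s*$"
)


def parse_vxfs_warning(line):
    """Return a dict of fields from a vxfs warning line, or None if no match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the counters to integers so they can be sorted or compared.
    fields["msgcnt"] = int(fields["msgcnt"])
    fields["error"] = int(fields["error"])
    return fields


if __name__ == "__main__":
    print(parse_vxfs_warning(LINE))
```

Sorting the parsed entries by `msgcnt` would show whether messages 1 to 10 are present in an older, rotated log file.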

'Something' happened to the disks that seems to have caused corruption.

VCS did not stop working - as you can see in the engine log, VCS was desperately trying to import the disk group, but could not find any valid diskgroup.

The good news is that VxVM is making regular backups of diskgroup configuration to /etc/vx/cbr/bk.

See https://sort.symantec.com/public/documents/sf/5.0/...

The vxconfigrestore utility is used to restore a disk group's configuration information if this has been lost or become corrupted. The disk group whose configuration is to be restored is specified either by name or by ID.

Any disks whose private region headers have become corrupted are reinstalled when the disk group configuration is restored.

 

If you don't feel comfortable attempting vxconfigrestore on your own, please log a support call. A support engineer will guide you through the steps.


Caushiph Unvar

Thank you, Marianne, for your consideration.

Now, after reading your reply, I'll first review the old /var/adm/messages files and try to find the original reason for this corruption. Meanwhile, I'll do the following:

 

vxconfigrestore -n mydg

and then verify the configuration with vxprint -hrt. If it looks okay, then

vxconfigrestore -p mydg

vxprint -hrt [verify again :) ]

vxconfigrestore -c mydg
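
The sequence above can be written down as an ordered plan so that nothing is run out of order. This is a sketch under assumptions, not a tested procedure: `restore_plan` is a hypothetical helper that only builds the command lines and does not execute them, `mydg` is the placeholder name used above, and the `-d` option (abandon the restore at the precommit stage) does not appear in this thread and should be checked against the vxconfigrestore man page before use.

```python
import shlex


def restore_plan(diskgroup, abandon=False):
    """Return the staged restore commands, in order, as argv lists.

    Hypothetical helper: nothing is executed here. The first four steps
    mirror the plan posted above; the last step either commits (-c) or,
    as an assumption taken from the man page, abandons (-d).
    """
    steps = [
        ["vxconfigrestore", "-n", diskgroup],  # first precommit pass
        ["vxprint", "-hrt"],                   # verify the configuration
        ["vxconfigrestore", "-p", diskgroup],  # second precommit pass
        ["vxprint", "-hrt"],                   # verify again
    ]
    final_flag = "-d" if abandon else "-c"
    steps.append(["vxconfigrestore", final_flag, diskgroup])
    return steps


if __name__ == "__main__":
    for argv in restore_plan("mydg"):
        print(shlex.join(argv))
```

Printing the plan first and running each step by hand leaves room to stop after either vxprint check if the configuration does not look right.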

 

Please comment.

 

Regards,

Marianne

I will feel a lot more comfortable if the config restore is done with the assistance of a Symantec Support engineer who will first examine the contents of the diskgroup backups.

vxconfigrestore will restore the private region of the disks, but if corruption occurred at filesystem level, you will need to restore from backup...


Caushiph Unvar

Yes, Marianne! The issue was escalated to support, and they found file system level corruption, so they suggested restoring from backup. Thanks a lot for your guidance.

 

-Regards