
VCS is trying to bring an SG online on node B even though it is online on node A

Created: 05 Jul 2013 • Updated: 31 Jul 2013 | 8 comments
This issue has been solved. See solution.

I have a two-node VCS cluster on Solaris. Two SGs are configured with a dependency:

nssitdb01-zone_sg is parallel, with AutoStart enabled on both nodes. The failover service group dbhost-app_sg depends on nssitdb01-zone_sg and also has AutoStart enabled on both nodes.

After a clean reboot (init 6) of node B (sirius), VCS started nssitdb01-zone_sg and is then trying to bring dbhost-app_sg online too, even though it is already online on node A (arcturus)!
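The live state can be checked like this (standard VCS commands, just a sketch; output omitted):

# hastatus -sum               (state of every group on every system)
# hagrp -state dbhost-app_sg  (should show ONLINE only on arcturus)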

Here is some info from engine_A.log:

2013/07/05 10:16:13 VCS NOTICE V-16-1-10438 Group nssitdb01-zone_sg has been probed on system sirius
2013/07/05 10:16:13 VCS NOTICE V-16-1-10442 Initiating auto-start online of group nssitdb01-zone_sg on system sirius
2013/07/05 10:16:33 VCS NOTICE V-16-1-10447 Group nssitdb01-zone_sg is online on system sirius
2013/07/05 10:16:33 VCS WARNING V-16-1-50045 Initiating online of parent group dbhost-app_sg, PM will select the best node

2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-10162 Group dbhost-app_sg has not been fully probed on system sirius
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
2013/07/05 10:16:53 VCS NOTICE V-16-1-10438 Group dbhost-app_sg has been probed on system sirius
2013/07/05 10:16:53 VCS INFO V-16-1-50007 Initiating auto-start online of group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group dbhost-app_sg on all nodes
2013/07/05 10:20:16 VCS ERROR V-16-1-10205 Group dbhost-app_sg is faulted on system sirius
2013/07/05 10:20:16 VCS NOTICE V-16-1-10446 Group dbhost-app_sg is offline on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus

Only the zpools and the IP resource prevent dbhost-app_sg from actually going online on node B:

..

2013/07/05 10:17:44 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-admin:online:zpool import limsdb-admin failed. Try again using the force import -f option
2013/07/05 10:17:45 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-archivedata:online:zpool import limsdb-archivedata failed. Try again using the force import -f option
2013/07/05 10:17:48 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-datafiles:online:zpool import limsdb-datafiles failed. Try again using the force import -f option
2013/07/05 10:17:54 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-indexfiles:online:zpool import limsdb-indexfiles failed. Try again using the force import -f option
..

2013/07/05 10:16:53 VCS ERROR V-16-10001-5013 (sirius) IPMultiNICB:dbhost_ipmultinicb_VLAN10:online:This IP address is configured elsewhere. Will not online
2013/07/05 10:17:53 VCS ERROR V-16-10001-5013 (sirius) IPMultiNICB:dbhost_ipmultinicb_VLAN10:online:This IP address is configured elsewhere. Will not online

..
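The zpool refusal can be reproduced by hand on node B (a sketch, using one of the pool names from the log above):

# zpool import               (lists importable pools and warns when a pool was last accessed by another host)
# zpool import limsdb-admin  (fails the same way while the pool is still imported on node A)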

main.cf:

-------------------------------------------------------

group dbhost-app_sg (
    SystemList = { sirius = 1, arcturus = 0 }
    ContainerInfo @sirius = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    ContainerInfo @arcturus = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    AutoStartList = { arcturus, sirius }
    Administrators = { z_nssitdb01-zone_arcturus, z_nssitdb01-zone_sirius }
    )
...

requires group nssitdb01-zone_sg online local firm
--------------------------------------------------------

group nssitdb01-zone_sg (
    SystemList = { arcturus = 0, sirius = 1 }
    ContainerInfo @arcturus = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    ContainerInfo @sirius = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    Parallel = 1
    AutoStartList = { arcturus, sirius }
    Administrators = { z_nssitdb01-zone_sirius, z_nssitdb01-zone_arcturus }
    )

FileNone nssitdb01-zone-root_FileNone (
    PathName = "/export/home/nssitdb01-zone/root/.vcs-FileNone-agent"
    )

Zone nssitdb01-zone (
    Critical = 0
    DetachZonePath = 0
    )

nssitdb01-zone requires nssitdb01-zone-root_FileNone

---------------------------------------------------------------------------------
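To double-check what the engine sees, the dependency and auto-start settings can also be queried at runtime (a sketch, standard VCS CLI):

# hagrp -dep dbhost-app_sg                                (should list nssitdb01-zone_sg, online local firm)
# hagrp -display dbhost-app_sg -attribute AutoStartList
# hagrp -display dbhost-app_sg -attribute Parallel        (0 = failover group)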


Comments (8)

arangari:

What version of VCS are you running? Also, can you provide the logs where the failover group is declared ONLINE?

Thanks and Warm Regards,

Amit Rangari

If this post helped you resolve the issue, please mark it as the solution.

mcapler:

It is a "relatively" complex VCS configuration. I've attached some screenshots, in case they help :)

I've trimmed engine_A.log to contain only the data from the last reboot of both nodes (not at once) on 2013/07/03 until today.

IPs are changed from aaa.bbb.*.* to XXX.YYY.*.*

Version of VCS:

SPARC64, Solaris 10 8/11 (Update 10)

# pkginfo -l VRTSvcs                    
   PKGINST:  VRTSvcs
      NAME:  Veritas Cluster Server by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.1
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Veritas Cluster Server by Symantec
    PSTAMP:  5.1.103.000-5.1SP1RP3-2012-09-13_16.00.00
  INSTDATE:  Oct 02 2012 16:17
    STATUS:  completely installed
     FILES:      284 installed pathnames
                  26 shared pathnames
                  61 directories
                 105 executables
              237190 blocks used (approx)

Attachments: SG-View.png, nssitdb01-zone_sg.png, dbhost-app_sg.png, engine_A.log-2013-07.gz (35.48 KB)
arangari:

I agree with you that the online of the failover group should not be evaluated. You may want to try this with the latest version of VCS if you have it.

I think this issue may already be fixed in later versions; you may want to check for it on https://sort.symantec.com/documents

Thanks and Warm Regards,

Amit Rangari

If this post helped you resolve the issue, please mark it as the solution.

SOLUTION
mcapler:

I do not agree with the status of this as Solved! It is not solved at all. "Go to https://sort.symantec.com/documents" or "go to google.com" is not a solution.

Thank you.

Daniel Matheus:

Hi mcapler,

I agree, Google is not a solution.

The message:

2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus

is completely normal during failover, as VCS checks all cluster nodes as possible failover targets.

Regarding the zpool import error.

Can you please check whether the zpool mountpoint automount is set to legacy?

Please see the bundled agents guide for details (page 72)

http://sfdoccentral.symantec.com/sf/5.1SP1/solaris...

Can you import the zpool manually on the command line?

Are there any zpool related errors logged in the system log?

If you try to import manually, do you get any error?
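For example (a sketch, substituting one of the pool names from your log):

# zfs get mountpoint limsdb-admin  (should report "legacy", per the guide referenced above)
# zpool import                     (lists importable pools and why an import is refused)
# zpool import limsdb-admin        (manual import; note the exact error)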

Regarding the IP resource error.

This is quite straightforward: you are trying to online an IP address that is already in use on another node.

You need to make sure to use unique IP addresses.
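For example, to find where the address is currently plumbed (standard Solaris commands, run on each node; substitute the real address):

# ifconfig -a | grep 'XXX.YYY'  (shows which interface holds the address)
# arp XXX.YYY.n.n               (shows which MAC currently answers for it)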

Cheers,
Daniel

If this post has helped you, please vote or mark as solution

mcapler:

Hello Daniel,

Yes, all mount points are set to legacy. The zpools are not a problem but a benefit: the Zpool and IPMultiNICB resources prevent the SG from going online twice.

nssitdb01-zone_sg is a parallel SG, online on both nodes.
dbhost-app_sg is a failover SG that depends on nssitdb01-zone_sg.

AutoStartList for dbhost-app_sg is NodeA, NodeB; dbhost-app_sg is online on NodeA.

NodeB is rebooted and comes up:

2013/07/05 10:16:33 VCS NOTICE V-16-1-10447 Group nssitdb01-zone_sg is online on system NodeB
2013/07/05 10:16:33 VCS WARNING V-16-1-50045 Initiating online of parent group dbhost-app_sg, PM will select the best node
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating NodeB as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-10162 Group dbhost-app_sg has not been fully probed on system NodeB
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating NodeA as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system NodeA
2013/07/05 10:16:53 VCS NOTICE V-16-1-10438 Group dbhost-app_sg has been probed on system NodeB
2013/07/05 10:16:53 VCS INFO V-16-1-50007 Initiating auto-start online of group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating NodeA as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group dbhost-app_sg on all nodes
.....
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource listener_32 (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
...
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource zpool_dbhost-appdata (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource zpool_giedb-admin (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource zpool_giedb-archivedata (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
....

2013/07/05 10:17:13 VCS WARNING V-16-10001-20002 (NodeB) Zpool:zpool_giedb-admin:online:zpool import giedb-admin failed. Try again using the force import -f option
2013/07/05 10:17:14 VCS INFO V-16-1-10299 Resource nic_bge10002 (Owner: Unspecified, Group: dnshost-app_sg) is online on sirius (Not initiated by VCS)
2013/07/05 10:17:15 VCS WARNING V-16-10001-20002 (NodeB) Zpool:zpool_giedb-archivedata:online:zpool import giedb-archivedata failed. Try again using the force import -f option
2013/07/05 10:17:15 VCS INFO V-16-1-10299 Resource nic_bge21002 (Owner: Unspecified, Group: dnshost-app_sg) is online on sirius (Not initiated by VCS)
...
2013/07/05 10:17:23 VCS WARNING V-16-10001-20002 (NodeB) Zpool:zpool_dbhost-appdata:online:zpool import dbhost-appdata failed. Try again using the force import -f option
2013/07/05 10:17:27 VCS INFO V-16-2-13716 (NodeB) Resource(zpool_giedb-archivedata): Output of the completed operation (online)
==============================================
cannot import 'giedb-archivedata': pool may be in use from other system, it was last accessed by NodeA (hostid: 0x809947b2) on Fri Jul 5 09:12:36 2013
use '-f' to import anyway
==============================================

Yes, it is all online on NodeA!
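The hostid in that import message can be checked directly on each node:

# hostid  (on NodeA this should print 809947b2, matching the refusal above)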

Have a nice day.

mcapler

mikebounds:

My understanding of the issue: the failover service group dbhost-app_sg, which is online local firm dependent on the parallel group nssitdb01-zone_sg, is online on arcturus; when VCS starts on sirius, dbhost-app_sg starts on sirius as well, but it shouldn't, as dbhost-app_sg is a failover group and is already online on arcturus. I think I have seen something similar before, but I had PreOnline scripts, so a second node would try to online a group while that group was already onlining but still inside its PreOnline script.

Could you provide more logs? There seem to be entries missing: I don't see the entry "Initiating online of group nssitdb01-zone_sg", and the log says "Group dbhost-app_sg is online or faulted on system arcturus". If the group were faulted on arcturus, then the online of dbhost-app_sg on sirius would be fine, so do the logs show that dbhost-app_sg is actually online on arcturus?
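If a PreOnline script were involved, it would show up in the group's attributes, e.g.:

# hagrp -display dbhost-app_sg -attribute PreOnline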

Another oddity is that I see these entries:

2013/07/05 10:20:16 VCS NOTICE V-16-1-10446 Group dbhost-app_sg is offline on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system sirius

So first VCS says group dbhost-app_sg is "offline", and then in the same second it says it is "online or faulted"; but for the group to transition from offline to online or faulted, a resource must have gone online, and this is not reported in the logs.

If your issue is a bug, then it MAY be fixed in 6.0, but before upgrading to 6.0 you would want to know that this incident is fixed, and I couldn't find anything in the VCS 6.0 release notes to say this issue had been identified as an incident.

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

mcapler:

The logs are attached below.

There are no messages like "Initiating online of group nssitdb01-zone_sg", only "auto-start online":

..

2013/07/05 10:16:13 VCS NOTICE V-16-1-10442 Initiating auto-start online of group nssitdb01-zone_sg on system sirius

...

mcapler.

PS. I see now that this is not a question for the community but one for Support. I thought it would be better to ask here, though, because getting through First-Level Support is quite stressful.

BTW, I have a couple of cases open at the moment, and I tell you, I'm tired. I'm tired of answering beginner questions, of people trying to walk you through the basics. Of course that can work, but not for my setup: ZFS with 36 zpools over iSCSI from two S7320s, VCS, parallel and failover Solaris zones, clustered NetBackup, 5 Oracle DBs, one Samba file server, BIND DNS, and DHCP on two cluster nodes.

Okay, forget it. Close the thread, but do not mark it as resolved.

In three weeks, when I am back from holiday and have gone through the First/Second-Level Support hell over this case, I will post the "Solution". :)