
VCS ERROR V-16-2-13067 (nodexxxx) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.

Created: 13 Jul 2010 • Updated: 20 Aug 2010 | 9 comments
This issue has been solved. See solution.
2010/07/12 12:58:15 VCS ERROR V-16-2-13067 (nodexxxx) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.
2010/07/12 12:58:17 VCS INFO V-16-2-13068 (nodexxxx) Resource(Lvb-prd-DB-IP-Res) - clean completed successfully.
2010/07/12 12:58:17 VCS INFO V-16-1-10307 Resource Lvb-prd-DB-IP-Res (Owner: unknown, Group: LVB-PRD-DB-SG) is offline on nodexxxx (Not initiated by VCS)
2010/07/12 12:58:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Lvb-dbapp-Application-Res (Owner: unknown, Group: LVB-PRD-DB-SG) on System nodexxxx
2010/07/12 12:58:17 VCS INFO V-16-6-15004 (nodexxxx) hatrigger:Failed to send trigger for resfault; script doesn't exist.
After this, my service group faulted and failed over to the secondary node.
We have faced the same issue twice in the last 30 days.
We are running Solaris 10 with the 2007-10 patch set on Fujitsu Prime Power 650 hardware.
$ pkginfo -l VRTSvcs
   PKGINST:  VRTSvcs
      NAME:  Veritas Cluster Server by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.0
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Veritas Cluster Server by Symantec
    PSTAMP:  Veritas-5.0-01/11/07-17:01:00
  INSTDATE:  Nov 13 2007 13:54
    STATUS:  completely installed
     FILES:      160 installed pathnames
                  22 shared pathnames
                   2 linked files
                  45 directories
                  83 executables
              141345 blocks used (approx)
Can anyone tell me what could be causing this issue? Any suggestions for avoiding it?
I am facing this failover issue in my 2-node cluster.


g_lee's picture

It appears the IP resource went offline unexpectedly/outside VCS. Have you checked the system logs for any messages regarding network issues on the node during that time?

Additionally, you're running 5.0 GA (unpatched) - while it may not be related to this particular issue, it would be a good idea to look at patching to avoid running into known issues that have already been fixed.
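
For the system-log check suggested above, a minimal Python sketch that keeps only the /var/adm/messages lines stamped within a short window around the fault time. The function name `lines_near`, the 60-second window and the `year` argument are assumptions made for illustration (syslog stamps carry no year); the sample lines are taken from this thread.

```python
from datetime import datetime, timedelta

# Syslog stamps look like "Jul 12 12:58:15"; the year is supplied separately.
SYSLOG_FORMAT = "%Y %b %d %H:%M:%S"


def lines_near(lines, fault_time, window_seconds=60, year=2010):
    """Return the syslog lines stamped within window_seconds of fault_time."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in lines:
        stamp = line[:15]  # the first 15 characters hold the timestamp
        try:
            when = datetime.strptime(f"{year} {stamp}", SYSLOG_FORMAT)
        except ValueError:
            continue  # continuation line or unexpected format: skip it
        if abs(when - fault_time) <= window:
            matches.append(line.rstrip("\n"))
    return matches


# Sample input: two lines quoted in this thread plus one made-up earlier line.
sample = [
    "Jul 12 12:58:12 nodexxxxx inetd[2468]: [ID 317013 daemon.notice] "
    "auto_remote_PH3[19360] from 169.77.35.236 52655",
    "Jul 12 12:58:17 nodexxxxx AgentFramework[8271]: [ID 702911 daemon.notice] "
    "VCS ERROR V-16-1-13068 Thread(3) Resource(Lvb-prd-DB-IP-Res) - clean completed successfully.",
    "Jul 12 09:00:00 nodexxxxx example[1]: hypothetical unrelated line",
]
fault = datetime(2010, 7, 12, 12, 58, 15)
for hit in lines_near(sample, fault):
    print(hit)
```

On the real node the list would come from reading /var/adm/messages (and any rotated copies) instead of the in-memory sample.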

If this post has helped you, please vote or mark as solution

SOLUTION
Gaurav Sangamnerkar's picture

Fully agree with Grace. Check the system logs; the IP resource went offline outside VCS.

You need to find out why that happened.

Can you paste the resource definition from main.cf?

Gaurav

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

Ritesh1711's picture

Hello Lee,

Thanks for the reply.

I have checked the system log, but could not find any relevant entry. Below is the extract from /var/adm/messages.

Jul 12 12:58:12 nodexxxxx inetd[2468]: [ID 317013 daemon.notice] auto_remote_PH3[19360] from 169.77.35.236 52655
Jul 12 12:58:15 nodexxxxx AgentFramework[8271]: [ID 702911 daemon.notice] VCS ERROR V-16-1-13067 Thread(3) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.
Jul 12 12:58:15 nodexxxxx Had[7682]: [ID 702911 daemon.notice] VCS ERROR V-16-1-13067 (nodexxxxx) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.
Jul 12 12:58:15 nodexxxxx inetd[2468]: [ID 317013 daemon.notice] auto_remote_PH3[19438] from 169.77.35.236 52664
Jul 12 12:58:17 nodexxxxx AgentFramework[8271]: [ID 702911 daemon.notice] VCS ERROR V-16-1-13068 Thread(3) Resource(Lvb-prd-DB-IP-Res) - clean completed successfully.

Thanks,
Ritesh

Ritesh1711's picture
cluster LVB-HK-PROD-Cluster (
        UserNames = { admin = JOOiOMnVOddqOWkJLjLL,
                 operator = dqqYrkPprLqlQoddqYrm }
        Administrators = { admin }
        Operators = { operator }
        UseFence = SCSI3
        )
system nodexxxx3 (
        )
system nodexxxx4 (
        )
group LVB-PRD-DB-SG (
        SystemList = { nodexxxx3 = 0, nodexxxx4 = 1 }
        AutoStartList = { nodexxxx3 }
        )
        Application Lvb-dbapp-Application-Res (
                Critical = 0
                User = root
                StartProgram = "/etc/init.d/lvbdbappstart.ksh"
                StopProgram = "/etc/init.d/lvbdbappstop.ksh"
                MonitorProgram = "/etc/init.d/chkihs"
                OfflineTimeout = 1200
                OnlineTimeout = 1200
                )
        Application Tivoli-Application-Res (
                Critical = 0
                User = root
                StartProgram = "/opt/share/Tivoli/scripts/Tivoli_HA_Start.sh start"
                StopProgram = "/opt/share/Tivoli/scripts/Tivoli_HA_Stop.sh stop"
                PidFiles = { "/opt/share/Tivoli/lcf/dat/2/lcfd.pid" }
                )
        DiskGroup tslvbappdg-DiskGroup-Res (
                DiskGroup = tslvbappdg
                )
        DiskGroup tslvbdbpdg-DiskGroup-Res (
                DiskGroup = tslvbdbpdg
                )
        IP Lvb-prd-DB-IP-Res (
                Device = fjgi0
                Address = "169.19.201.11"
                NetMask = "255.255.255.0"
                )
        Mount application1-Mount-Res (
                Critical = 0
                MountPoint = "/application1"
                BlockDevice = "/dev/vx/dsk/tslvbappdg/application1"
                FSType = vxfs
                FsckOpt = "-y"
                SecondLevelMonitor = 1
                )
        Mount mqm-Mount-Res (
                Critical = 0
                MountPoint = "/var/mqm"
                BlockDevice = "/dev/vx/dsk/tslvbappdg/mqm"
                FSType = vxfs
                FsckOpt = "-y"
                SecondLevelMonitor = 1
                )
        Mount mqm_log-Mount-Res (
                MountPoint = "/var/mqm/log"
                BlockDevice = "/dev/vx/dsk/tslvbappdg/mqm_log"
                FSType = vxfs
                FsckOpt = "-y"
                SecondLevelMonitor = 1
                )
        Mount opt_share-Mount-Res (
                MountPoint = "/opt/share"
                BlockDevice = "/dev/vx/dsk/tslvbappdg/opt_share"
                FSType = vxfs
                FsckOpt = "-y"
                SecondLevelMonitor = 1
                )
        Mount u01-Mount-Res (
                Critical = 0
                MountPoint = "/u01"
                BlockDevice = "/dev/vx/dsk/tslvbdbpdg/u01"
                FSType = vxfs
                FsckOpt = "-y"
                SecondLevelMonitor = 1
                )
        Mount wmq_security-Mount-Res (
                MountPoint = "/var/wmq_security"
                BlockDevice = "/dev/vx/dsk/tslvbappdg/wmq_security"
                FSType = vxfs
                FsckOpt = "-y"
                SecondLevelMonitor = 1
                )
        NIC Lvb-prd-Nic-Res (
                Device = fjgi0
                )
        NotifierMngr LVB-SNMP-NMGR (
                SnmpConsoles = { "10.152.20.144" = Warning }
                )
        Volume application1-Volume-Res (
                Volume = application1
                DiskGroup = tslvbappdg
                )
        Volume mqm-Volume-Res (
                Volume = mqm
                DiskGroup = tslvbappdg
                )
        Volume mqm_log-Volume-Res (
                Volume = mqm_log
                DiskGroup = tslvbappdg
                )
        Volume opt_share-Volume-Res (
                Volume = opt_share
                DiskGroup = tslvbappdg
                )
        Volume u01-Volume-Res (
                Volume = u01
                DiskGroup = tslvbdbpdg
                )
        Volume wmq_security-Volume-Res (
                Volume = wmq_security
                DiskGroup = tslvbappdg
                )
        LVB-SNMP-NMGR requires Lvb-prd-Nic-Res
        Lvb-dbapp-Application-Res requires Tivoli-Application-Res
        Lvb-dbapp-Application-Res requires application1-Mount-Res
        Lvb-dbapp-Application-Res requires mqm_log-Mount-Res
        Lvb-dbapp-Application-Res requires u01-Mount-Res
        Lvb-dbapp-Application-Res requires wmq_security-Mount-Res
        Lvb-prd-DB-IP-Res requires Lvb-prd-Nic-Res
        Tivoli-Application-Res requires Lvb-prd-DB-IP-Res
        Tivoli-Application-Res requires opt_share-Mount-Res
        application1-Mount-Res requires application1-Volume-Res
        application1-Volume-Res requires tslvbappdg-DiskGroup-Res
        mqm-Mount-Res requires mqm-Volume-Res
        mqm-Volume-Res requires tslvbappdg-DiskGroup-Res
        mqm_log-Mount-Res requires mqm-Mount-Res
        mqm_log-Mount-Res requires mqm_log-Volume-Res
        mqm_log-Volume-Res requires tslvbappdg-DiskGroup-Res
        opt_share-Mount-Res requires opt_share-Volume-Res
        opt_share-Volume-Res requires tslvbappdg-DiskGroup-Res
        u01-Mount-Res requires u01-Volume-Res
        u01-Volume-Res requires tslvbdbpdg-DiskGroup-Res
        wmq_security-Mount-Res requires wmq_security-Volume-Res
        wmq_security-Volume-Res requires tslvbappdg-DiskGroup-Res
Gaurav Sangamnerkar's picture

Hi Ritesh,

I don't see anything wrong with the resource definition.

Was there any manual activity on the server at that time? Another thing I can think of: was the server extremely busy when this happened? (The chances of this are slim, since we would expect to see an effect on other resources as well.)

If you have shell logs, check whether any manual up/down of the IP happened.
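
A rough Python sketch of that shell-log check, assuming the history is available as plain text, one command per line. It flags `ifconfig` commands that use the Solaris `down`, `unplumb` or `removeif` subcommands against the device from the IP resource (fjgi0, including logical interfaces such as fjgi0:1). The function name and the sample history are hypothetical.

```python
import re

# ifconfig <interface> ... followed by a subcommand that takes an address
# or the whole interface offline.
DISRUPTIVE = re.compile(r"\bifconfig\s+(\S+)\s+.*\b(down|unplumb|removeif)\b")


def find_interface_changes(history_lines, device="fjgi0"):
    """Return (line_number, command) pairs that touch the given device."""
    hits = []
    for number, line in enumerate(history_lines, start=1):
        match = DISRUPTIVE.search(line)
        # "fjgi0:1" is a logical interface on the physical device "fjgi0".
        if match and match.group(1).split(":")[0] == device:
            hits.append((number, line.strip()))
    return hits


# Hypothetical shell history, for illustration only.
history = [
    "ls -l /var/VRTSvcs/log",
    "ifconfig -a",
    "ifconfig fjgi0:1 down",
    "ifconfig hme0 unplumb",
    "/usr/sbin/ifconfig fjgi0 removeif 169.19.201.11",
]
for number, command in find_interface_changes(history):
    print(number, command)
```

Plain shell history usually carries no timestamps, so a hit only shows that such a command was run at some point, not that it coincides with the fault.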

Gaurav


Ritesh1711's picture

Hello,

I have checked all the logs on the server, but could not find any relevant information.

I found the Symantec issue below.

http://seer.entsupport.symantec.com/docs/325290.htm

Could this be the cause? If I want to upgrade, what is the latest version to upgrade to?

I currently have VCS 5.0 installed.

Regards,
Ritesh

g_lee's picture

Ritesh,

The problem/fix described in TN 325290 is for Application resources; in your case it was the IP resource that faulted, so the hotfix will not help you.

That said, as you appear to be running SF/VCS 5.0 GA (i.e., unpatched), it would be a good idea to patch to ensure you have fixes for any known issues.

The latest version is 5.0MP3 RP4.

You first need to install 5.0MP3; see the following link:
https://vos.symantec.com/patch/detail/1326

Then install the rolling patch (RP4):
For SPARC: sfha-sol_sparc-5.0MP3RP4
https://vos.symantec.com/patch/detail/3781

More information about the 5.0MP3RP4 rolling patch here:
http://www.symantec.com/connect/blogs/new-rolling-...


rregunta's picture

Hello Ritesh,

I agree with Lee; you should upgrade the node to the latest patches. Also, do you collect network performance stats on the host? Did you observe any issue there? And did you find any similarity between the two incidents, such as date or time?
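
To compare the incidents by date and time, one option is to pull every "became OFFLINE unexpectedly" event (V-16-2-13067) out of the VCS engine log and list when it happened. The sketch below is only an illustration: the function name is made up, and the pattern is based on the engine-log line format quoted in the original post.

```python
import re
from datetime import datetime

# Matches engine-log lines such as:
# 2010/07/12 12:58:15 VCS ERROR V-16-2-13067 (node) Agent is calling clean for resource(Name) ...
UNEXPECTED_OFFLINE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS ERROR V-16-2-13067 "
    r"\((\S+)\) Agent is calling clean for resource\(([^)]+)\)"
)


def unexpected_offline_events(lines):
    """Return one dict per unexpected-offline event found in the log lines."""
    events = []
    for line in lines:
        match = UNEXPECTED_OFFLINE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            events.append(
                {"when": when, "node": match.group(2), "resource": match.group(3)}
            )
    return events


# Two engine-log lines quoted in the original post.
sample = [
    "2010/07/12 12:58:15 VCS ERROR V-16-2-13067 (nodexxxx) Agent is calling clean "
    "for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE "
    "unexpectedly, on its own.",
    "2010/07/12 12:58:17 VCS INFO V-16-2-13068 (nodexxxx) "
    "Resource(Lvb-prd-DB-IP-Res) - clean completed successfully.",
]
for event in unexpected_offline_events(sample):
    # Weekday and time of day make recurring patterns (cron jobs, backups) easier to spot.
    print(event["when"].strftime("%A %H:%M:%S"), event["node"], event["resource"])
```

Feeding it the engine log covering both failovers would show whether they share a weekday or time of day, which could point to a scheduled job.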

Regards
Rajesh

 

Regards

Rajesh Regunta

---------------------------------------------------------------------------------------------------------------------

PS: Please mark this note as solution, if this helps.

Anoop_Kumar's picture

Hello Ritesh,

As per the logs, the IP resource went offline outside VCS.

If you don't find anything for fjgi0 in the system logs, manual intervention could be the reason. In my experience on Solaris, a manual NIC down or unplumb is not logged in /var/adm/messages.

You can check for any changes to the network device on the system. The other likely causes are covered in the answers above :)

Regards,
~Anoop
