VCS ERROR V-16-2-13067 (nodexxxx) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.
Created: 13 Jul 2010 | Updated: 20 Aug 2010 | 9 comments
This issue has been solved. See solution.
2010/07/12 12:58:15 VCS ERROR V-16-2-13067 (nodexxxx) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.
2010/07/12 12:58:17 VCS INFO V-16-2-13068 (nodexxxx) Resource(Lvb-prd-DB-IP-Res) - clean completed successfully.
2010/07/12 12:58:17 VCS INFO V-16-1-10307 Resource Lvb-prd-DB-IP-Res (Owner: unknown, Group: LVB-PRD-DB-SG) is offline on nodexxxx (Not initiated by VCS)
2010/07/12 12:58:17 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Lvb-dbapp-Application-Res (Owner: unknown, Group: LVB-PRD-DB-SG) on System nodexxxx
2010/07/12 12:58:17 VCS INFO V-16-6-15004 (nodexxxx) hatrigger:Failed to send trigger for resfault; script doesn't exist.
After this, my service group faulted and failed over to the secondary node.
We have faced the same issue twice in the last 30 days.
We run Solaris 10 with the 2007-10 patch set on Fujitsu Prime Power 650 hardware.
$ pkginfo -l VRTSvcs
PKGINST: VRTSvcs
NAME: Veritas Cluster Server by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.0
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Veritas Cluster Server by Symantec
PSTAMP: Veritas-5.0-01/11/07-17:01:00
INSTDATE: Nov 13 2007 13:54
STATUS: completely installed
FILES: 160 installed pathnames
22 shared pathnames
2 linked files
45 directories
83 executables
141345 blocks used (approx)
Can anyone tell me what the cause of this issue could be? Any suggestions to avoid it?
I am facing a failover issue in my 2-node cluster.
It appears the IP resource went offline unexpectedly/outside VCS. Have you checked the system logs for any messages regarding network issues on the node during that time?
Additionally, you're running 5.0 GA (unpatched) - while it may not be related to this particular issue, it would be a good idea to look at patching to avoid running into known issues that have already been fixed.
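A rough sketch of that log check: the function below prints the syslog lines for one day that fall inside a time window, so the minutes leading up to 12:58:15 can be read in one place. `log_window` and the `AWK` variable are names made up for this example, not VCS or Solaris tools. On Solaris 10 you would probably want `AWK=nawk`, since the legacy /usr/bin/awk does not accept `-v` as far as I know.

```shell
#!/bin/sh
# Print syslog lines for one day whose HH:MM:SS stamp lies in [t0, t1].
# Usage: log_window FILE MONTH DAY T0 T1
# Hypothetical helper; assumes the usual "Mon DD HH:MM:SS host ..." layout.
log_window() {
    ${AWK:-awk} -v mon="$2" -v day="$3" -v t0="$4" -v t1="$5" '
        # HH:MM:SS strings compare correctly as plain text
        $1 == mon && $2 + 0 == day + 0 && $3 >= t0 && $3 <= t1
    ' "$1"
}

# Example: the five minutes before the fault reported at 12:58:15
# log_window /var/adm/messages Jul 12 12:53:00 12:58:20
```

Running the same window against the VCS engine log and the messages file from the other node may show whether anything network-related happened at the same moment.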
If this post has helped you, please vote or mark as solution
Fully agree with Grace. Check the system logs; the IP resource went offline outside of VCS.
You need to find out why that happened.
Can you paste the resource definition from main.cf?
Gaurav
PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
Hello Lee,
Thanks for the reply.
I have checked the system log but could not find any relevant entry. Below is an excerpt from /var/adm/messages.
Jul 12 12:58:12 nodexxxxx inetd[2468]: [ID 317013 daemon.notice] auto_remote_PH3[19360] from 169.77.35.236 52655
Jul 12 12:58:15 nodexxxxx AgentFramework[8271]: [ID 702911 daemon.notice] VCS ERROR V-16-1-13067 Thread(3) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.
Jul 12 12:58:15 nodexxxxx Had[7682]: [ID 702911 daemon.notice] VCS ERROR V-16-1-13067 (nodexxxxx) Agent is calling clean for resource(Lvb-prd-DB-IP-Res) because the resource became OFFLINE unexpectedly, on its own.
Jul 12 12:58:15 nodexxxxx inetd[2468]: [ID 317013 daemon.notice] auto_remote_PH3[19438] from 169.77.35.236 52664
Jul 12 12:58:17 nodexxxxx AgentFramework[8271]: [ID 702911 daemon.notice] VCS ERROR V-16-1-13068 Thread(3) Resource(Lvb-prd-DB-IP-Res) - clean completed successfully.
Thanks,
Ritesh
UserNames = { admin = JOOiOMnVOddqOWkJLjLL,
operator = dqqYrkPprLqlQoddqYrm }
Administrators = { admin }
Operators = { operator }
UseFence = SCSI3
)
system nodexxxx3 (
)
)
SystemList = { nodexxxx3 = 0, nodexxxx4 = 1 }
AutoStartList = { nodexxxx3 }
)
Critical = 0
User = root
StartProgram = "/etc/init.d/lvbdbappstart.ksh"
StopProgram = "/etc/init.d/lvbdbappstop.ksh"
MonitorProgram = "/etc/init.d/chkihs"
OfflineTimeout = 1200
OnlineTimeout = 1200
)
Critical = 0
User = root
StartProgram = "/opt/share/Tivoli/scripts/Tivoli_HA_Start.sh start"
StopProgram = "/opt/share/Tivoli/scripts/Tivoli_HA_Stop.sh stop"
PidFiles = { "/opt/share/Tivoli/lcf/dat/2/lcfd.pid" }
)
DiskGroup = tslvbappdg
)
DiskGroup = tslvbdbpdg
)
Device = fjgi0
Address = "169.19.201.11"
NetMask = "255.255.255.0"
)
Critical = 0
MountPoint = "/application1"
BlockDevice = "/dev/vx/dsk/tslvbappdg/application1"
FSType = vxfs
FsckOpt = "-y"
SecondLevelMonitor = 1
)
Critical = 0
MountPoint = "/var/mqm"
BlockDevice = "/dev/vx/dsk/tslvbappdg/mqm"
FSType = vxfs
FsckOpt = "-y"
SecondLevelMonitor = 1
)
MountPoint = "/var/mqm/log"
BlockDevice = "/dev/vx/dsk/tslvbappdg/mqm_log"
FSType = vxfs
FsckOpt = "-y"
SecondLevelMonitor = 1
)
MountPoint = "/opt/share"
BlockDevice = "/dev/vx/dsk/tslvbappdg/opt_share"
FSType = vxfs
FsckOpt = "-y"
SecondLevelMonitor = 1
)
Critical = 0
MountPoint = "/u01"
BlockDevice = "/dev/vx/dsk/tslvbdbpdg/u01"
FSType = vxfs
FsckOpt = "-y"
SecondLevelMonitor = 1
)
MountPoint = "/var/wmq_security"
BlockDevice = "/dev/vx/dsk/tslvbappdg/wmq_security"
FSType = vxfs
FsckOpt = "-y"
SecondLevelMonitor = 1
)
Device = fjgi0
)
SnmpConsoles = { "10.152.20.144" = Warning }
)
Volume = application1
DiskGroup = tslvbappdg
)
Volume = mqm
DiskGroup = tslvbappdg
)
Volume = mqm_log
DiskGroup = tslvbappdg
)
Volume = opt_share
DiskGroup = tslvbappdg
)
Volume = u01
DiskGroup = tslvbdbpdg
)
Volume = wmq_security
DiskGroup = tslvbappdg
)
Lvb-dbapp-Application-Res requires Tivoli-Application-Res
Lvb-dbapp-Application-Res requires application1-Mount-Res
Lvb-dbapp-Application-Res requires mqm_log-Mount-Res
Lvb-dbapp-Application-Res requires u01-Mount-Res
Lvb-dbapp-Application-Res requires wmq_security-Mount-Res
Lvb-prd-DB-IP-Res requires Lvb-prd-Nic-Res
Tivoli-Application-Res requires Lvb-prd-DB-IP-Res
Tivoli-Application-Res requires opt_share-Mount-Res
application1-Mount-Res requires application1-Volume-Res
application1-Volume-Res requires tslvbappdg-DiskGroup-Res
mqm-Mount-Res requires mqm-Volume-Res
mqm-Volume-Res requires tslvbappdg-DiskGroup-Res
mqm_log-Mount-Res requires mqm-Mount-Res
mqm_log-Mount-Res requires mqm_log-Volume-Res
mqm_log-Volume-Res requires tslvbappdg-DiskGroup-Res
opt_share-Mount-Res requires opt_share-Volume-Res
opt_share-Volume-Res requires tslvbappdg-DiskGroup-Res
u01-Mount-Res requires u01-Volume-Res
u01-Volume-Res requires tslvbdbpdg-DiskGroup-Res
wmq_security-Mount-Res requires wmq_security-Volume-Res
wmq_security-Volume-Res requires tslvbappdg-DiskGroup-Res
Hi Ritesh,
I don't see anything wrong with the resource definition.
Was there any manual activity on the server at that time? Another thing I can think of: was the server extremely busy when this happened? (The chances of this are small, since we would expect to see an effect on other resources as well.)
If you have shell logs, try to see whether any manual up/down of the IP happened.
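One way to search shell history for that, as a sketch only: the function lists history lines where `ifconfig` is run against a device with `down`, `unplumb` or `removeif`. `find_if_changes` is a hypothetical helper name, and which history files exist (and where) depends on the shells in use on the node, so the paths in the example are assumptions. It relies on a POSIX `grep -E`; on Solaris 10 that would likely be /usr/xpg4/bin/grep.

```shell
#!/bin/sh
# List history lines that run ifconfig on DEVICE (or a logical interface
# such as DEVICE:1) with an action that disables or removes an address.
# Usage: find_if_changes DEVICE FILE...
find_if_changes() {
    dev=$1
    shift
    grep -n -E "ifconfig[[:space:]]+${dev}(:[0-9]+)?[[:space:]].*(down|unplumb|removeif)" "$@"
}

# Example (history file locations are assumptions):
# find_if_changes fjgi0 /.sh_history /.bash_history
```

Shell history usually carries no timestamps, so a hit only shows that such a command was run at some point, not that it was run at 12:58.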
Gaurav
Hello,
I have checked all the logs on the server but could not find any relevant information.
I found the Symantec issue below.
http://seer.entsupport.symantec.com/docs/325290.htm
Could this be the cause? If I want to upgrade, what is the latest version to upgrade to?
Currently I have VCS 5.0 installed.
Regards,
Ritesh
Ritesh,
The problem/fix described in TN 325290 is for Application resources; in your case it was the IP resource that faulted, so the hotfix will not help you.
That said, as you appear to be running SF/VCS 5.0 GA (i.e. unpatched), it would be a good idea to patch to ensure you have fixes for any known issues.
The latest version is 5.0MP3 RP4.
You first need to install 5.0MP3; see the following link:
https://vos.symantec.com/patch/detail/1326
then install the rolling patch (RP4):
for sparc: sfha-sol_sparc-5.0MP3RP4
https://vos.symantec.com/patch/detail/3781
More information about the 5.0MP3RP4 rolling patch here:
http://www.symantec.com/connect/blogs/new-rolling-...
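To confirm what is installed before and after patching, something like the sketch below can pull the relevant fields out of the `pkginfo -l` output already shown in this thread. `pkg_level` is a made-up helper name, and how the VERSION and PSTAMP strings look after MP3 or RP4 is not shown here, so compare them against the patch README rather than against this example.

```shell
#!/bin/sh
# Print only the VERSION and PSTAMP lines from `pkginfo -l` output on stdin.
# Hypothetical helper; field names are taken from the output posted above.
pkg_level() {
    awk '$1 == "VERSION:" || $1 == "PSTAMP:" { print $1, $2 }'
}

# Example, run on each node of the cluster:
# pkginfo -l VRTSvcs | pkg_level
```

Running it on both nodes is a quick way to check that they are at the same level.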
Hello Ritesh,
I agree with Lee; you should upgrade the node to the latest patches. Also, do you collect network performance stats on the host? Did you observe any issue there? And did you find any similarity between the two incidents, such as date or time?
Regards
Rajesh
Regards
Rajesh Regunta
---------------------------------------------------------------------------------------------------------------------
PS: Please mark this note as solution, if this helps.
Hello Ritesh,
As per the logs, the IP resource went offline outside VCS.
If you don't find anything for fjgi0 in the system logs, then manual intervention could be the reason. In my experience on Solaris, manually downing or unplumbing a NIC does not log anything in /var/adm/messages.
You can check for any changes to the network device on the system. For the other obvious causes, you have the answers above :)
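Since a manual change may leave no trace in the logs, one option is to record the interface state yourself so that the next occurrence is caught. A minimal sketch, assuming the `inet ADDRESS` layout that `ifconfig -a` prints: `addr_present` is a hypothetical helper, and on Solaris 10 `AWK=nawk` is probably needed because the legacy awk lacks `-v`.

```shell
#!/bin/sh
# Exit 0 if `ifconfig -a` output on stdin contains "inet ADDR", else exit 1.
# Usage: ifconfig -a | addr_present ADDR
addr_present() {
    ${AWK:-awk} -v addr="$1" '
        $1 == "inet" && $2 == addr { found = 1 }
        END { exit !found }
    '
}

# Example, e.g. from cron every minute on the active node:
# ifconfig -a | addr_present 169.19.201.11 ||
#     logger -p daemon.warning "169.19.201.11 not configured on any interface"
```

This only narrows down when the address disappears; it does not say who or what removed it.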
Regards,
~Anoop