Unexpected offline of a resource is not logged as FAULTED in the engine log

Created: 07 Jun 2009 • Updated: 21 May 2010 | 5 comments

 
Here is a simple test.
I have a FileOnOff resource. I delete the file, and the resource becomes faulted.
In the engine log there is no mention of the resource being faulted, yet the clean action is taken.

This happens whether or not the resource is critical, and regardless of the resource type.
It happens in 4.0, 4.1 and 5.0 as well.

But when the DBG_TRACE tag is added, the log shows RESOURCE FAULTED.

Steps to reproduce the issue:
1. Create an SG with one FileOnOff resource.
2. Online the SG.
3. rm the configured file.
4. Check the engine log.

Add the tag: halog -addtags DBG_TRACE

Now perform the same operation and you can see the difference.
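
The steps above can be sketched as a small script. This is only an illustration under assumptions: the group, resource, system and path names are the ones from the test setup posted later in this thread, the run and repro helpers are made up for this example, and the sleep is a crude stand-in for waiting until the resource is really online. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of the reproduction steps. Names below come from the test setup in
# this thread; adjust them for your cluster.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute.
GROUP=testgrp
RES=fileon
SYS=sundar
FILE=/tmp/file1
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    run haconf -makerw                           # open the configuration for writing
    run hagrp -add "$GROUP"                      # 1. create the service group
    run hagrp -modify "$GROUP" SystemList "$SYS" 0
    run hares -add "$RES" FileOnOff "$GROUP"     #    with one FileOnOff resource
    run hares -modify "$RES" PathName "$FILE"
    run hares -modify "$RES" Critical 0
    run hares -modify "$RES" Enabled 1
    run haconf -dump -makero                     # save and close the configuration
    run hagrp -online "$GROUP" -sys "$SYS"       # 2. bring the group online
    run sleep 10                                 #    crude wait for the online to finish
    run rm -f "$FILE"                            # 3. remove the file outside of VCS
}

repro
```

After the next monitor cycle, check the engine log (step 4). To compare, run halog -addtags DBG_TRACE, clear the fault, and repeat steps 2 and 3.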

I noticed that the faulted message appears in the engine log only when the monitor times out.

From the user's guide it is not that clear whether it will log the FAULTED message or not:

VCS considers a resource faulted in the following situations:
■ When the resource state changes unexpectedly. For example, an online
resource going offline. <<<< 

■ When a required state change does not occur. For example, a resource failing
to go online or offline when commanded to do so.
In many situations, VCS agents take predefined actions to correct the issue
before reporting resource failure to the engine. For example, the agent may try
to bring a resource online several times before declaring a fault.
When a resource faults, VCS takes automated actions to “clean up” the faulted
resource. The Clean function makes sure the resource is completely shut down
before bringing it online on another node. This prevents concurrency violations.
When a resource faults, VCS takes all resources dependent on the faulted
resource offline. The fault is thus propagated in the service group.




Gaurav Sangamnerkar's picture

Hello,

Just to clarify, the VCS agent will take all of its corrective actions before declaring the resource faulted.

So if the file is deleted, the agent will detect that something happened outside of VCS (here you should see messages in the log like "resource became offline unexpectedly, on its own", followed by "resource is offline (Not initiated by VCS)"). I wouldn't expect VCS to declare the fault until the next step mentioned below is completed.

As soon as VCS detects it, it should first complete 4 monitor cycles (by default) and then call the clean action. If 2 attempts of clean also fail, the resource should be declared faulted.

Are you saying that even after clean is called, the agent is not faulting the resource?

Gaurav

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

Sundar Rajan's picture

Thanks, Gaurav, for your reply.
I could not copy the entire log.
This is what happens:

1. The agent detects that the resource went offline unexpectedly.
2. Then it calls the clean action.

When we do a hares -display, it shows the resource state as "FAULTED"
and the SG state goes to "ONLINE|PARTIAL".

What is missing is the message "RESOURCE FAULTED": it is not logged
in the engine log. It appears only when "halog -addtags DBG_TRACE" is set.

Everything works as designed, except that the logging does not happen at the default log level
when the resource goes from ONLINE to UNEXPECTED OFFLINE.

Sundar Rajan's picture

[root@localhost ~]# halog -info
Log on sundar:
path = /var/VRTSvcs/log/engine_A.log
maxsize = 33554432 bytes
tags =
flushtags =

[root@localhost ~]# cat /etc/VRTSvcs/conf/config/main.cf
include "vcsApacheTypes.cf"
include "types.cf"

cluster sun (
UserNames = { admin = gNOgNInKOjOOmWOiNL }
Administrators = { admin }
CounterInterval = 5
)

system sundar (
)

group testgrp (
SystemList = { sundar = 0 }
AutoStartList = { sundar }
)

FileOnOff fileon (
Critical = 0
PathName = "/tmp/file1"
)

// resource dependency tree
//
// group testgrp
// {
// FileOnOff fileon
// }

2009/06/12 18:22:22 VCS NOTICE V-16-1-10446 Group testgrp is offline on system sundar
2009/06/12 18:22:22 VCS NOTICE V-16-1-10301 Initiating Online of Resource fileon (Owner: unknown, Group: testgrp) on System sundar
2009/06/12 18:22:22 VCS INFO V-16-1-10298 Resource fileon (Owner: unknown, Group: testgrp) is online on sundar (VCS initiated)
2009/06/12 18:22:22 VCS NOTICE V-16-1-10447 Group testgrp is online on system sundar
2009/06/12 18:22:24 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for postoffline; script doesn't exist
#########START unexpected offline #####################################

2009/06/12 18:24:23 VCS ERROR V-16-2-13067 (sundar) Agent is calling clean for resource(fileon) because the resource became OFFLINE unexpectedly, on its own.
2009/06/12 18:24:23 VCS INFO V-16-2-13068 (sundar) Resource(fileon) - clean completed successfully.
2009/06/12 18:24:23 VCS INFO V-16-1-10307 Resource fileon (Owner: unknown, Group: testgrp) is offline on sundar (Not initiated by VCS)
2009/06/12 18:24:23 VCS ERROR V-16-1-10212 TargetCount dropped below zero; setting to zero
2009/06/12 18:24:23 VCS NOTICE V-16-1-10446 Group testgrp is offline on system sundar
2009/06/12 18:24:23 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for resfault; script doesn't exist
2009/06/12 18:24:23 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for postoffline; script doesn't exist

[root@localhost tmp]# hares -display fileon
#Resource Attribute System Value
fileon Group global testgrp
fileon Type global FileOnOff
fileon AutoStart global 1
fileon Critical global 0
fileon Enabled global 1
fileon LastOnline global sundar
fileon MonitorOnly global 0
fileon ResourceOwner global unknown
fileon TriggerEvent global 0
fileon ArgListValues sundar /tmp/file1
fileon ConfidenceLevel sundar 0
fileon Flags sundar
fileon IState sundar not waiting
fileon Probed sundar 1
fileon Start sundar 1
fileon State sundar FAULTED  <<<<<<<<<<<<<<<<<<<<<<<<<<<
fileon ComputeStats global 0
fileon PathName global /tmp/file1
fileon ResourceInfo global State Stale Msg TS
fileon MonitorTimeStats sundar Avg 0 TS
[root@localhost tmp]#
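
To check only the State attribute from a script instead of reading the whole display, a small filter like the one below might help. It is a sketch: resource_state is a made-up helper name, and it assumes the four-column layout shown above with a single-word state value.

```shell
# resource_state RESOURCE
# Reads `hares -display` output on stdin and prints the value of the State
# attribute for RESOURCE (one line per system). Assumes the column layout
# "Resource Attribute System Value" and a state without embedded spaces.
resource_state() {
    awk -v res="$1" '$1 == res && $2 == "State" { print $4 }'
}
```

Usage: hares -display fileon | resource_state fileon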

After adding the tags:
[root@localhost tmp]# hares -clear fileon
[root@localhost tmp]# halog -addtags DBG_TRACE
[root@localhost tmp]# hagrp -online testgrp -any
VCS NOTICE V-16-1-50735 Attempting to online group on system sundar
[root@localhost tmp]#

############## After adding the tag DBG_TRACE #################################

2009/06/12 18:30:21 VCS ERROR V-16-2-13067 (sundar) Agent is calling clean for resource(fileon) because the resource became OFFLINE unexpectedly, on its own.
2009/06/12 18:30:21 VCS INFO V-16-2-13068 (sundar) Resource(fileon) - clean completed successfully.
2009/06/12 18:30:21 VCS INFO V-16-1-10307 Resource fileon (Owner: unknown, Group: testgrp) is offline on sundar (Not initiated by VCS)
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 *** RESOURCE FAULTED (unexpected offline): fileon (node: sundar)
Resource.C:perform_is_offline[7383]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Trigger sent on node sundar; '"/opt/VRTSvcs/bin/hatrigger" -resfault 0 sundar fileon ONLINE'
System.C:invoke_trigger[6631]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Received message 6=Resource has faulted in state 11
Note.C:fill_notifier_trap[1163]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 fileon::state transition from ONLINE to FAULTED

Resource.C:set_local_state[5666]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Decrementing ActiveCount (prevval=1) by 1 for resource fileon on node sundar
Resource.C:set_local_state[5724]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Modifying CurrentCount (prevval=1) by -1 for testgrp
Group.C:update_notify[11244]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Modifying TargetCount (prevval=1) by -1 for testgrp
Group.C:update_notify[11244]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Trigger sent on node sundar; '"/opt/VRTSvcs/bin/hatrigger" -postoffline 0 sundar testgrp'
System.C:invoke_trigger[6631]
2009/06/12 18:30:21 VCS DBG_TRACE V-16-50-0 Received message 10=Service group is offline in state 11
Note.C:fill_notifier_trap[1163]
2009/06/12 18:30:21 VCS NOTICE V-16-1-10446 Group testgrp is offline on system sundar
2009/06/12 18:30:21 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for resfault; script doesn't exist
2009/06/12 18:30:21 VCS INFO V-16-6-15004 (sundar) hatrigger:Failed to send trigger for postoffline; script doesn't exist

As you can see from the above, this looks like a bug in how the resource state is reported in engine_A.log.
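
The difference between the two excerpts can also be checked mechanically. The sketch below is only an illustration: unlogged_faults is a made-up helper, and it assumes the message ID V-16-2-13067 and the "RESOURCE FAULTED (...): name (node: ...)" line format exactly as they appear in the excerpts above, which may differ in other versions.

```shell
# unlogged_faults LOGFILE
# For every "became OFFLINE unexpectedly" clean message (V-16-2-13067), print
# the resource name unless a "RESOURCE FAULTED" line names the same resource.
unlogged_faults() {
    awk '
        /V-16-2-13067/ {
            # The resource name sits inside "resource(NAME)".
            if (match($0, /resource\([^)]*\)/)) {
                name = substr($0, RSTART + 9, RLENGTH - 10)
                pending[name] = 1
            }
        }
        /RESOURCE FAULTED/ {
            # Format assumed: "... RESOURCE FAULTED (reason): NAME (node: SYS)"
            split($0, parts, ": ")
            split(parts[2], words, " ")
            delete pending[words[1]]
        }
        END { for (name in pending) print name }
    ' "$1"
}
```

Usage: unlogged_faults /var/VRTSvcs/log/engine_A.log
With the default-level excerpt it prints fileon; with the DBG_TRACE excerpt it prints nothing.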

-Sundar

Gaurav Sangamnerkar's picture

Hi Sundar,

Well, it looks to be a bug. It can be reported to Technical Support by raising a case.

I don't see any obvious reason for it being reported as unexpected offline. Just a very rough guess: can you try creating the file in some other directory (not /tmp)? It shouldn't make a difference; however, the /tmp directory in Solaris has the sticky bit set, in case that is affecting the agent somewhere.

Gaurav

 

Sundar Rajan's picture

Thanks, Gaurav. The type of agent you use does not matter; the behaviour is the same.

Thanks for your time.