Best practice for tracking down why the VCS Notifier agent resource keeps faulting

Article:TECH150538  |  Created: 2011-01-27  |  Updated: 2012-07-21  |  Article URL http://www.symantec.com/docs/TECH150538
Article Type
Technical Solution



Issue



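[ ISSUE ]
The Notifier resource (type NotifierMngr) in the ClusterService service group faults on both nodes. After each online, the agent reports the resource OFFLINE unexpectedly and exhausts its restart attempts (3 of 3). The group then fails over to the other node, where the same sequence repeats.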


Error



[ ERROR MESSAGES ]
2011/01/10 11:34:01 VCS NOTICE V-16-1-10301 Initiating Online of Resource Notifier (Owner: unknown, Group: ClusterService) on System symc-linux1

2011/01/10 11:34:01 VCS INFO V-16-1-10298 Resource Notifier (Owner: unknown, Group: ClusterService) is online on symc-linux1 (VCS initiated)
2011/01/10 11:34:02 VCS INFO V-16-1-10304 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux2 (First probe)
2011/01/10 11:35:02 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:35:02 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:35:02 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2011/01/10 11:35:02 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:36:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:36:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:36:03 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 2 of 3) the resource.
2011/01/10 11:36:03 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:37:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:37:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:37:03 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 3 of 3) the resource.
2011/01/10 11:37:03 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:38:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:38:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:38:03 VCS INFO V-16-1-10307 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux1 (Not initiated by VCS)
2011/01/10 11:38:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource Notifier (Owner: unknown, Group: ClusterService) on System symc-linux2
2011/01/10 11:38:03 VCS INFO V-16-1-10298 Resource Notifier (Owner: unknown, Group: ClusterService) is online on symc-linux2 (VCS initiated)
2011/01/10 11:39:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:39:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:39:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2011/01/10 11:39:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:40:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:40:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:40:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 2 of 3) the resource.
2011/01/10 11:40:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:41:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:41:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:41:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 3 of 3) the resource.
2011/01/10 11:41:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:42:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:42:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:42:03 VCS INFO V-16-1-10307 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux2 (Not initiated by VCS)
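
With three restart attempts per node, the engine log grows quickly. The following Python sketch condenses lines like the ones above into a per-node summary of restart attempts and of whether the resource was finally reported offline. It assumes the message formats shown in this excerpt (V-16-2-13073 and V-16-1-10307); it is not a Veritas-provided tool, and the function name summarize is hypothetical.

```python
import re
from collections import defaultdict

# V-16-2-13073: "(symc-linux1) Resource(Notifier) became OFFLINE unexpectedly ...
#               (attempt number 1 of 3) ..."
RESTART_RE = re.compile(
    r"V-16-2-13073 \((?P<system>[^)]+)\) Resource\((?P<res>[^)]+)\) "
    r"became OFFLINE.*attempt number (?P<n>\d+) of (?P<limit>\d+)"
)
# V-16-1-10307: "Resource Notifier (...) is offline on symc-linux1 (Not initiated by VCS)"
OFFLINE_RE = re.compile(
    r"V-16-1-10307 Resource (?P<res>\S+) .* is offline on (?P<system>\S+) "
    r"\(Not initiated by VCS\)"
)


def summarize(lines):
    """Map (resource, system) to restart attempts seen and the final offline report."""
    summary = defaultdict(
        lambda: {"restarts": 0, "limit": None, "reported_offline": False}
    )
    for line in lines:
        m = RESTART_RE.search(line)
        if m:
            entry = summary[(m["res"], m["system"])]
            # Keep the highest attempt number seen for this resource/system.
            entry["restarts"] = max(entry["restarts"], int(m["n"]))
            entry["limit"] = int(m["limit"])
            continue
        m = OFFLINE_RE.search(line)
        if m:
            summary[(m["res"], m["system"])]["reported_offline"] = True
    return dict(summary)
```

For example, pass an open file handle for the engine log (normally /var/VRTSvcs/log/engine_A.log) to summarize() and print the result.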

 

[ SUMMARY STATUS OF VCS (hastatus -summary) ]

-- SYSTEM STATE
-- System               State                Frozen             

A  symc-linux1         RUNNING              0                   
A  symc-linux2         RUNNING              0            
       
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State         
B  ClusterService  symc-linux1         Y          N               OFFLINE|FAULTED      <<<<<<
B  ClusterService  symc-linux2         Y          N               OFFLINE|FAULTED      <<<<<<

B  asic-queueing   symc-linux1         Y          N               ONLINE        
B  asic-queueing   symc-linux2         Y          N               OFFLINE       
B  fop-servlet     symc-linux1         Y          N               ONLINE        
B  fop-servlet     symc-linux2         Y          N               OFFLINE       
B  network         symc-linux1         Y          N               ONLINE        
B  network         symc-linux2         Y          N               ONLINE        
B  nfs-share       symc-linux1         Y          N               OFFLINE       
B  nfs-share       symc-linux2         Y          N               ONLINE        
B  webservices     symc-linux1         Y          N               ONLINE        
B  webservices     symc-linux2         Y          N               OFFLINE       
 
-- RESOURCES FAILED
-- Group           Type                 Resource             System             
C  ClusterService  NotifierMngr         Notifier             symc-linux1       
C  ClusterService  NotifierMngr         Notifier             symc-linux2     
  
-- RESOURCES NOT PROBED
-- Group           Type                 Resource             System             
D  ClusterService  NIC                  csgnic               symc-linux1       
D  ClusterService  NIC                  csgnic               symc-linux2 
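
To spot faulted groups and failed resources in a long status listing, a short sketch like the following can filter the B (group state) and C (resources failed) lines. It assumes the whitespace-separated column layout shown above, with the group state in the sixth field of B lines, and treats the <<<<<< markers as editorial annotations. The function name is hypothetical.

```python
def faulted_from_summary(text):
    """Return (faulted groups, failed resources) from hastatus -summary text."""
    groups, resources = [], []
    for line in text.splitlines():
        fields = line.split()
        # "B" lines: group, system, probed, autodisabled, state
        if len(fields) >= 6 and fields[0] == "B" and "FAULTED" in fields[5]:
            groups.append((fields[1], fields[2], fields[5]))
        # "C" lines: group, type, resource, system
        elif len(fields) >= 5 and fields[0] == "C":
            resources.append(tuple(fields[1:5]))
    return groups, resources
```

The input can be the saved output of hastatus -summary from either node.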

Environment



[ CONFIGURATION ]
- Two nodes in VCS configuration

[ VERSION OF OS/PACKAGE ]
1. OS (uname -a):
Linux symc-linux1 2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
Linux symc-linux2 2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

2. SFHA 5.0 MP4 (Storage Foundation High Availability)

Cause



[ CONFIGURATION AND LOGS ]

1) /var/VRTSvcs/log/notifier_A.log
-------------------------------------------------------------------------
2010/12/15 16:24:41 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:26:14 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:27:18 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:28:32 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:29:56 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:31:30 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:36:54 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:32:31 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:33:02 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:33:43 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:34:34 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:35:35 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:36:46 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:40:50 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:41:21 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:42:02 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:42:53 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:43:55 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:45:06 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 16:08:30 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
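
The notifier log shows the same warning repeating over two days. The sketch below counts the V-16-1-10630 warnings per day and decodes the reported number, under the assumption (not confirmed by this article) that _write_errno holds a standard operating-system errno value. On Linux, errno 6 is ENXIO ("No such device or address"). The function names are hypothetical.

```python
import errno
import os
import re
from collections import Counter

WARN_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}) \S+ VCS WARNING V-16-1-10630 "
    r"IpmHandle::send _write_errno is (\d+)"
)


def count_ipm_warnings(lines):
    """Count V-16-1-10630 warnings per day and collect the errno values seen."""
    per_day, codes = Counter(), set()
    for line in lines:
        m = WARN_RE.match(line)
        if m:
            per_day[m.group(1)] += 1
            codes.add(int(m.group(2)))
    return per_day, codes


def describe_errno(code):
    """Symbolic name and message for an OS errno (assumes a standard errno)."""
    return errno.errorcode.get(code, "unknown"), os.strerror(code)
```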
 

2) /var/VRTSvcs/log/Notifier_A.log
-------------------------------------------------------------------------
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) name(Notifier) op(1607)
        VCSAgTimer.C:check_timers[297]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Resetting periodic timer for resource Notifier op 1607 to expire at 1485   <<<<<< Set the timer
        VCSAgTimer.C:_reset_periodic_timer[999]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Adding timer for Notifier with tmo 1485   <<<<<<
        VCSAgTimer.C:_add[723]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Timer id is 28
        VCSAgTimer.C:_add[739]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Appending command minor code 1607 for resource Notifier
        VCSAgRes.C:append_cmd[340]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Scheduled resource Notifier
        VCSAgSched.C:put_req[173]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Picked Res(Notifier) from Scheduler
        VCSAgSched.C:_dequeue[64]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Resource (Notifier) received cmd minor code (1607)
        VCSAgRes.C:process_cmd[4727]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Resource Notifier transitioning from Online to Monitoring
        VCSAgRes.C:internal_state[4083]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) The values of ArgList attributes are given below
        VCSAgRes.C:call_entry_point[986]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[0] is (14141)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[1] is (30)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[2] is (14144)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[3] is (162)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[4] is (public)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[5] is (2)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[6] is (172.16.141.15)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[7] is (Warning)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[8] is (mailgatensw.ffx.jfh.com.au)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[9] is (0)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[10] is (10)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[11] is ()
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[12] is ()
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[13] is (2)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[14] is (admin@symc.com)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[15] is (Warning)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) No OS encoded ArgList attributes
        VCSAgRes.C:call_entry_point[1028]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Adding timer for Notifier with tmo 1485
        VCSAgTimer.C:_add[723]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Timer id is 32
        VCSAgTimer.C:_add[739]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Calling monitor for resource Notifier
        VCSAgType.C:call_monitor[1268]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) agent ep version is 1
        VCSAgType.C:_is_script_ep[4948]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Resource(Notifier) - monitor entry point exited with a confidence value 0.   <<<<<<< The monitor returned normally and reported confidence 0 (resource not online).
        VCSAgType.C:call_monitor[1368]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Notifier reported state (Offline) & conf_level (0)   <<<<<<< The agent therefore marks the resource OFFLINE.
        VCSAgRes.C:call_entry_point[1324]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1608)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Removing thread_id 4151311248
        VCSAgThreadTbl.C:remove[221]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1605)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1621)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Res(Notifier) - ToleranceCount (1) ToleranceLimit(0)
        VCSAgRes.C:tolerance_limit_reached[5262]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) ToleranceLimit reached
        VCSAgRes.C:tolerance_limit_reached[5268]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1607)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS ERROR V-16-2-13067 Thread(4151311248) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
..
<snip>
..
 
[ Comment ] According to the debug logs, the monitor entry point reported the Notifier resource as OFFLINE, but the logs give no reason why it went offline.
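
The ArgList values printed in the debug log can be matched to attribute names using the NotifierMngr ArgList order from types.cf (listed by the egrep of types.cf in item 4). The sketch below assumes the usual VCS agent convention that a vector or association attribute is passed as its element count followed by its elements. Under that assumption the 16 values above decode cleanly, and arg[9] = 0 corresponds to SmtpServerVrfyOff. The function names are hypothetical.

```python
import re

# ArgList order from types.cf; "assoc" marks association attributes, which
# are assumed to be passed as an element count followed by key/value elements.
ARGLIST = [
    ("EngineListeningPort", "scalar"), ("MessagesQueue", "scalar"),
    ("NotifierListeningPort", "scalar"), ("SnmpdTrapPort", "scalar"),
    ("SnmpCommunity", "scalar"), ("SnmpConsoles", "assoc"),
    ("SmtpServer", "scalar"), ("SmtpServerVrfyOff", "scalar"),
    ("SmtpServerTimeout", "scalar"), ("SmtpReturnPath", "scalar"),
    ("SmtpFromPath", "scalar"), ("SmtpRecipients", "assoc"),
]

ARG_RE = re.compile(r"arg\[(\d+)\] is \((.*)\)")


def extract_args(lines):
    """Collect 'arg[N] is (value)' entries from the agent debug log, in N order."""
    args = {}
    for line in lines:
        m = ARG_RE.search(line)
        if m:
            args[int(m.group(1))] = m.group(2)
    return [args[i] for i in sorted(args)]


def decode(values):
    """Map positional ArgList values to attribute names."""
    result, i = {}, 0
    for name, kind in ARGLIST:
        if kind == "scalar":
            result[name] = values[i]
            i += 1
        else:
            count = int(values[i])
            items = values[i + 1 : i + 1 + count]
            result[name] = dict(zip(items[0::2], items[1::2]))
            i += 1 + count
    if i != len(values):
        raise ValueError("ArgList values do not match the expected layout")
    return result
```

Note that extract_args() expects each arg[N] entry on a single line, so log lines that were wrapped (as arg[14] can be) need to be joined first.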
 
 
3) Reviewing the configuration of Notifier.
 
## main.cf
group ClusterService (
        SystemList = { symc-linux2 = 0, symc-linux1 = 1 }
        AutoStartList = { symc-linux1 }
        )
 
        NIC csgnic (
                Enabled = 0
                Device @symc-linux2 = bond0
                Device @symc-linux1 = bond0
                )
 
        NotifierMngr Notifier (
                SnmpConsoles = { "192.168.1.123" = Warning }
                SmtpServer = "mailgate.test.symantec.com"
                SmtpRecipients = { "admin@symc.com" = Warning }
                )
 
        Notifier requires csgnic
 
 
4) /etc/VRTSvcs/conf/config/main.cmd shows that the configuration was changed in the past: the SmtpServerVrfyOff attribute was added to the NotifierMngr type and set to 0 on the Notifier resource.
 
$ egrep -i SmtpServerVrfyOff main.cmd
hatype -modify NotifierMngr ArgList EngineListeningPort MessagesQueue NotifierListeningPort SnmpdTrapPort SnmpCommunity SnmpConsoles SmtpServer SmtpServerVrfyOff SmtpServerTimeout SmtpReturnPath SmtpFromPath SmtpRecipients
haattr -add NotifierMngr SmtpServerVrfyOff -boolean 0
hares -modify Notifier SmtpServerVrfyOff 0
 
[ Comment ] Check the current definition of this attribute in types.cf:

$ egrep -i SmtpServerVrfyOff types.cf
        static str ArgList[] = { EngineListeningPort, MessagesQueue, NotifierListeningPort, SnmpdTrapPort, SnmpCommunity, SnmpConsoles, SmtpServer, SmtpServerVrfyOff, SmtpServerTimeout, SmtpReturnPath, SmtpFromPath, SmtpRecipients }
        boolean SmtpServerVrfyOff = 0
 
[ Comment ] The VCS Administrator's Guide describes SmtpServerVrfyOff as follows:
Set this value to 1 if your mail server does not support the SMTP VRFY command.
If the value is set to 1, the notifier does not send an SMTP VRFY request to the mail server specified in the SmtpServer attribute while sending emails.
 
Type and dimension: boolean-scalar Default: 0
 
Therefore, with "SmtpServerVrfyOff = 0" (the default, and the value configured here), the notifier sends an SMTP VRFY request to the mail server specified in the SmtpServer attribute while sending emails.
The next step is to verify whether the SMTP server accepts the VCS notifier and supports the SMTP VRFY command.
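
One way to check the SMTP server from a cluster node is to open an SMTP session and issue VRFY for the configured recipient. The Python sketch below uses the standard smtplib module; the reply-code interpretation follows RFC 5321 (250/251: verified, 252: cannot verify but will accept mail, 502: command not implemented). The function names and the 10-second timeout are illustrative assumptions.

```python
import smtplib


def classify_vrfy(code):
    """Rough meaning of an SMTP reply code to VRFY (per RFC 5321)."""
    if code in (250, 251):
        return "verified"
    if code == 252:
        return "vrfy-allowed-but-unverified"
    if code in (500, 502, 504):
        return "vrfy-not-supported"
    if code in (550, 551, 553):
        return "address-not-accepted"
    return "other"


def probe_vrfy(host, address, port=25, timeout=10):
    """Open an SMTP session to host, issue VRFY for address, and classify the reply."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo_or_helo_if_needed()
        code, message = smtp.verify(address)
    return code, message, classify_vrfy(code)
```

Run it on each node with the values from main.cf, for example probe_vrfy("mailgate.test.symantec.com", "admin@symc.com"). A 500/502/504 reply would suggest that the server does not support VRFY, which is the situation SmtpServerVrfyOff = 1 is meant for.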

Solution



 

[ WHAT NEED TO DO ]
 
1) Review the related technote on checking whether an SMTP server is suitable for use with Notifier.
 
 
2) Test the notifier manually from the command line, for example:
/opt/VRTSvcs/bin/notifier -s m=north -s m=south,p=2000,l=Error,c=your_company -t m=north,e="abc@your_company.com",l=SevereError
 
In this example, the notifier:
- Sends SNMP traps of all severity levels to north at the default SNMP port, with the default community value public.
- Sends Error and SevereError traps to south at port 2000, with the community value your_company.
- Sends SevereError email messages to the recipient abc@your_company.com, using north as the SMTP server at the default port.
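
When testing manually, it can help to build the notifier command line from the same values as the Notifier resource. The sketch below rebuilds the example command as an argument list; no shell quoting is needed when the list is passed to subprocess directly. The helper is hypothetical and does not escape commas inside values.

```python
def build_notifier_argv(snmp=(), smtp=(), binary="/opt/VRTSvcs/bin/notifier"):
    """Build a notifier argument list from option dicts.

    Each dict becomes one "-s" (SNMP) or "-t" (SMTP) option whose key=value
    pairs are joined by commas, e.g. {"m": "north"} -> "m=north".
    """
    argv = [binary]
    for opts in snmp:
        argv += ["-s", ",".join(f"{k}={v}" for k, v in opts.items())]
    for opts in smtp:
        argv += ["-t", ",".join(f"{k}={v}" for k, v in opts.items())]
    return argv
```

To mirror the Notifier resource in main.cf, pass smtp=[{"m": "mailgate.test.symantec.com", "e": "admin@symc.com", "l": "Warning"}] and run the resulting list with subprocess.run().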
 
 
3) If necessary, capture strace output of the notifier process.
- Trace the notifier process(es) with strace (the Linux counterpart of truss) while the resource is failing, so that the trace shows whether the notifier tried to open the SMTP connection:
#strace -f -v -p PID -o notifier_strace__`hostname`_`date '+%d.%m.%y'`.out -s 512
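
The strace output can be large. The sketch below extracts IPv4 connect() calls to the SMTP port (25 by default) together with their return value and error name, which is the evidence this step looks for. It assumes strace's usual sockaddr formatting (sin_port=htons(...), sin_addr=inet_addr("...")); formatting differs between strace versions, so the pattern may need adjusting. The function name is hypothetical.

```python
import re

# Typical strace rendering of an IPv4 connect(), optionally prefixed by a PID
# when -f is used, e.g.:
# 1234 connect(5, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("192.0.2.10")}, 16) = -1 ECONNREFUSED (Connection refused)
CONNECT_RE = re.compile(
    r"connect\((\d+), \{sa_family=AF_INET, sin_port=htons\((\d+)\), "
    r'sin_addr=inet_addr\("([\d.]+)"\)\}, \d+\) = (-?\d+)(?: (E[A-Z0-9]+))?'
)


def smtp_connects(lines, port=25):
    """Return (address, return value, errno name or None) per connect() to port."""
    results = []
    for line in lines:
        m = CONNECT_RE.search(line)
        if m and int(m.group(2)) == port:
            results.append((m.group(3), int(m.group(4)), m.group(5)))
    return results
```

An empty result suggests the notifier never attempted the SMTP connection during the traced period; a negative return value with an error name such as ECONNREFUSED points at the network path or the mail server.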
 
 
4) As a last workaround, set SmtpServerVrfyOff to 1 so that the notifier stops sending the SMTP VRFY request, and check whether this makes a difference.
#haconf -makerw
#hares -modify Notifier SmtpServerVrfyOff 1
#haconf -dump -makero



