
Application agent falsely detects NetWorker process as offline even when the process is running properly

Created: 15 Jan 2013 • Updated: 15 Jan 2013 | 3 comments
omiot

Hi,

I have a problem with NetWorker in a VCS cluster. VCS kills the process and restarts it on the second node. I've turned on debug logging for the Application agent, and the log shows that the monitor returns state: Offline.

 

2013/01/15 14:33:18 VCS DBG_2 V-16-50-0 Application:nw_server:monitor:Command prepared for getting pid is </bin/ps --cols=100000 --User=root -o pid,args | /bin/egrep '/usr/sbin/nsrd -k clusterFQDN\.domain\.com' | /bin/egrep -v /bin/grep | /usr/bin/tr -s " " " " | /bin/sed -e 's/^ //' | /bin/cut -f1 -d" ">.
        Application.C:processExists[583]
2013/01/15 14:33:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:Process:/usr/sbin/nsrd -k clusterFQDN.domain.com; return state: Offline.
        Application.C:application_monitor[300]
2013/01/15 14:38:09 VCS DBG_1 V-16-50-0 Application:nw_server:monitor:UseSUDash:<0>.
        Application.C:application_monitor[163]
2013/01/15 14:38:09 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:User Shell is other than csh, returning 0
        Application.C:getuserinfo[1198]
2013/01/15 14:38:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:MonitorProgram returned state:110.
        Application.C:monitorState[920]
2013/01/15 14:38:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:return state:STATE_TRUE
        Application.C:monitorState[974]
2013/01/15 14:38:19 VCS DBG_1 V-16-50-0 Application:nw_server:monitor:Total number of Pid Files specified:0.
        Application.C:application_monitor[231]
2013/01/15 14:38:19 VCS DBG_1 V-16-50-0 Application:nw_server:monitor:Total number of Processes specified:<1>.
        Application.C:application_monitor[272]
2013/01/15 14:38:19 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:Process:</usr/sbin/nsrd -k clusterFQDN.domain.com>; User:<root>.
        Application.C:processExists[479]
2013/01/15 14:38:19 VCS DBG_2 V-16-50-0 Application:nw_server:monitor:Command prepared for getting pid is </bin/ps --cols=100000 --User=root -o pid,args | /bin/egrep '/usr/sbin/nsrd -k clusterFQDN\.domain\.com' | /bin/egrep -v /bin/grep | /usr/bin/tr -s " " " " | /bin/sed -e 's/^ //' | /bin/cut -f1 -d" ">.
        Application.C:processExists[583]
2013/01/15 14:38:20 VCS DBG_4 V-16-50-0 Application:nw_server:monitor:Process:/usr/sbin/nsrd -k clusterFQDN.domain.com; return state: Offline.
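
The pid lookup shown in the debug log can be rerun by hand to see what the monitor actually matches. The sketch below wraps the same filter stages in a hypothetical helper, find_pids (not part of VCS), which reads "ps -o pid,args" style output on stdin. It uses grep -E in place of /bin/egrep, on the assumption that the two behave the same here. An empty result would presumably be what the agent reports as Offline.

```shell
#!/bin/sh
# Sketch only: find_pids is a hypothetical helper, not a VCS command.
# stdin: "pid args" rows, as printed by ps -o pid,args
# $1:    the egrep pattern taken from the agent's debug log
# Prints the pid of every row whose command line matches the pattern.
find_pids() {
    pattern=$1
    grep -E -e "$pattern" \
        | grep -E -v /bin/grep \
        | tr -s " " " " \
        | sed -e 's/^ //' \
        | cut -f1 -d" "
}

# Live check on the node, using the ps options from the log:
#   /bin/ps --cols=100000 --User=root -o pid,args \
#       | find_pids '/usr/sbin/nsrd -k clusterFQDN\.domain\.com'
```

If the live check prints nothing while nsrd is running, the command line that ps reports differs from the pattern, which points at the configured process string rather than at NetWorker itself.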
 
I'm using Storage Foundation for HA ver 5.1 SP1 RP3 on RHEL 5.5.
 
Regards
Pawel


Marianne

Please post main.cf section for this service group.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

omiot

Hi,

Thanks for your reply. I've attached the relevant part of my main.cf.

 

Regards.

Pawel

Attachment: main.zip (1.18 KB)
Marianne

Please double-check your documentation for the MonitorProcesses attribute:

MonitorProcesses = { "/usr/sbin/nsrd -k clusterFQDN" }

Should clusterFQDN possibly be the virtual hostname?

What does 'ps -ef | grep nsrd' show?
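
One way to make that comparison less error-prone is to tag each nsrd command line by whether it contains the configured string. This is only a sketch: report_nsrd is a hypothetical helper, not a VCS command, and it does a plain substring comparison rather than reproducing the agent's own matching.

```shell
#!/bin/sh
# Sketch only: report_nsrd is a hypothetical helper, not a VCS command.
# stdin: one command line per row, as printed by ps -o args
# $1:    the string configured in MonitorProcesses
# Prints every row that mentions nsrd, tagged by whether it contains $1.
report_nsrd() {
    configured=$1
    grep -F -e nsrd | while IFS= read -r line; do
        case $line in
            *"$configured"*) printf 'MATCH    %s\n' "$line" ;;
            *)               printf 'NO MATCH %s\n' "$line" ;;
        esac
    done
}

# Live check on the node (the grep itself may show up as a NO MATCH row):
#   ps --cols=100000 --User=root -o args \
#       | report_nsrd '/usr/sbin/nsrd -k clusterFQDN'
```

A NO MATCH row for the running nsrd shows the exact arguments it was started with, which can then be compared character by character with the MonitorProcesses entry in main.cf.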
