5.1 Oracle Agent reports the Oracle resource offline: V-16-1-10307 "Not initiated by VCS"

Article:TECH148645  |  Created: 2011-01-20  |  Updated: 2011-03-07  |  Article URL http://www.symantec.com/docs/TECH148645
Article Type
Technical Solution

Issue



When the number of Oracle processes increases, the Oracle Agent may report the Oracle resource offline even though the Oracle processes are still running.


Error



2011/01/10 04:04:42 VCS INFO V-16-1-10307 Resource oracle_res_name (Owner: unknown, Group: oracle) is offline on oracle_server (Not initiated by VCS)


Environment



Storage Foundation for Oracle 5.1
Storage Foundation for Oracle RAC 5.1


Cause



The 5.1 Oracle Agent uses `ps` to build the list of Oracle processes that it monitors:

# ps --cols=10000 -eo pid,args | grep -e "_<SID>\b" | grep -v grep | tr -s " " " " | sed -e 's/^ //'

The list of processes is saved to /var/VRTSvcs/log/tmp/Oracle-0. The Oracle Agent requires the following processes to be in the list:

ora_pmon
ora_smon
ora_lgwr
ora_dbw0
ora_lmon

If the above processes are not included in the list, the Oracle Agent determines that the Oracle resource is offline.
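
The check described above can be approximated by hand. The following is a minimal sketch, not the agent's actual code: `missing_required_procs` is a hypothetical helper that prints each required process name that does not appear in a saved process list for a given SID. How the agent itself treats a name such as ora_lmon when it is absent is not covered by this sketch.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): print each required process name
# that is absent from a saved process list.
# Usage: missing_required_procs <list_file> <SID>
missing_required_procs() {
    list_file=$1
    sid=$2
    for name in ora_pmon ora_smon ora_lgwr ora_dbw0 ora_lmon; do
        # A line in the list looks like "<pid> <name>_<SID>".
        if ! grep -q -E "^[0-9]+ ${name}_${sid}( |\$)" "$list_file"; then
            echo "$name"
        fi
    done
}
```

For example, `missing_required_procs /var/VRTSvcs/log/tmp/Oracle-0 podp15u` lists the required names that are missing from the saved file; no output means all five were found.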


Example:
# cat /var/VRTSvcs/log/tmp/Oracle-0
10301 ora_j000_podp15u
24926 ora_pmon_podp15u
24928 ora_psp0_podp15u
24930 ora_mman_podp15u
24932 ora_dbw0_podp15u
24934 ora_lgwr_podp15u
24936 ora_ckpt_podp15u
24938 ora_smon_podp15u
24940 ora_reco_podp15u
24942 ora_cjq0_podp15u
24944 ora_mmon_podp15u
24946 ora_mmnl_podp15u
24948 ora_d000_podp15u
24950 ora_s000_podp15u
25425 ora_qmnc_podp15u
25673 ora_q000_podp15u
25842 ora_q001_podp15u

Note: The above file has a byte count of 390 and shows that all the required processes are present.

Command to display the byte count:

# ps --cols=10000 -eo pid,args |grep -e "_podp15u\b" | grep -v grep | tr -s " " " " | sed -e 's/^ //' |wc -c
390

If the number of processes created by Oracle increases, the byte count of the process list increases as well:

# ps --cols=4096 -eo pid,args |grep -e "_podp15u\b" | grep -v grep | tr -s " " " " | sed -e 's/^ //' |wc -c
21175


The Oracle Agent has a maximum buffer size of 4096 bytes, whereas the output of the ps command run by the agent in this case is larger than 4096 bytes. As a result, the ora_pmon_SID process for the Oracle resource may not be captured in the buffer once the process list grows beyond the 4096-byte limit.
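
The effect of the buffer limit can be reproduced outside the agent. The sketch below assumes the agent keeps only the first 4096 bytes of the ps output; that is an assumption based on the truncated debug output shown in this article, not a statement about the agent's source. `pmon_within_limit` is a hypothetical helper, and `head -c` is assumed to be available (it is on Linux with GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if ora_pmon_<SID> appears within the
# first <limit> bytes (default 4096) of the process list read from stdin.
# Usage: <ps pipeline> | pmon_within_limit <SID> [limit]
pmon_within_limit() {
    sid=$1
    limit=${2:-4096}
    # Keep only the first <limit> bytes, then look for the pmon entry.
    head -c "$limit" | grep -q "ora_pmon_${sid}"
}
```

Piping the same ps pipeline the agent uses into `pmon_within_limit podp15u` returns a non-zero status when the ora_pmon entry falls beyond the first 4096 bytes.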


This can be seen by enabling Oracle Agent debug logging:
# haconf -makerw
# hatype -modify Oracle LogDbg DBG_1 DBG_2 DBG_3 DBG_4 DBG_5


Reviewing /var/VRTSvcs/log/Oracle_A.log shows why the monitor failed.

Note: The "...." lines mark output that was omitted to save space.

2011/01/05 15:35:33 VCS DBG_3 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:Proc list ora_pmon ora_smon ora_lgwr ora_dbw0 ora_lmon
        Oracle.linux.C:proc_processing[167]
2011/01/05 15:35:34 VCS DBG_3 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:The output returned after executing command is (
17164 ora_p788_podp15u
17166 ora_p789_podp15u
17168 ora_p790_podp15u
17170 ora_p791_podp15u
17172 ora_p792_podp15u
17174 ora_p793_podp15u
............
.......
17596 ora_p960_podp15u
17598 ora_p961_podp15u
17600 ora_p9           <<< We can see the output is getting truncated because the max buffer 4096 has been reached.

        Oracle.linux.C:proc_processing[193]
2011/01/05 15:35:34 VCS DBG_3 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:Comparing process name (1716):( ora_p788_podp15u)
        Oracle.linux.C:proc_processing[235]
2011/01/05 15:35:34 VCS DBG_3 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:Comparing process name (1716):( ora_p789_podp15u)
        Oracle.linux.C:proc_processing[235]
2011/01/05 15:35:34 VCS DBG_3 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:Comparing process name (1716):( ora_p790_podp15u)
        Oracle.linux.C:proc_processing[235]
............
......
 
2011/01/05 15:35:34 VCS DBG_3 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:Comparing process name (1760):( ora_p964_podp15u)
        Oracle.linux.C:proc_processing[235]
2011/01/05 15:35:34 VCS DBG_3 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:Comparing process name (1760):( ora_p965_podp15u)
        Oracle.linux.C:proc_processing[235]
2011/01/05 15:35:34 VCS DBG_1 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:state is Offline
        Oracle.linux.C:proc_processing[308]
2011/01/05 15:35:34 VCS DBG_1 V-16-50-0 Oracle:CDORA-podp15u_ora:monitor:state is Offline with value as (0)


In the above output, ora_pmon, one of the primary Oracle processes that the Oracle Agent monitors, does not appear among the processes listed after the line "The output returned after executing command is", so it was not captured in the buffer. The monitor therefore determined that the Oracle resource was offline. Note that running the ps command manually shows that the processes are still running.


Note: To disable Oracle LogDbg:
# hatype -modify Oracle LogDbg -delete DBG_1 DBG_2 DBG_3 DBG_4 DBG_5
# haconf -dump -makero


Solution



The solution is to upgrade to 5.1SP1. In Storage Foundation 5.1SP1, the agent uses egrep to select only the specific Oracle processes it requires, so the max buffer will not be exceeded:

# ps --cols=10000 -eo pid,args |egrep "ora_pmon|ora_smon|ora_lgwr|ora_dbw0|ora_lmon" | grep -v grep | tr -s " " " " | sed -e 's/^ //'
24926 ora_pmon_podp15u
24932 ora_dbw0_podp15u
24934 ora_lgwr_podp15u
24938 ora_smon_podp15u

Alternatively, if an upgrade is not an option, ensure that the process list output never exceeds the max buffer size of 4096 bytes. This can be verified by running the `ps` command and piping its output to `wc -c` to check the byte count.
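
The byte count check can be wrapped in a small function for periodic use, for example from cron. This is a minimal sketch; `check_proc_list_size` is a hypothetical helper and not part of VCS. It reads the process list on stdin, prints the byte count, and returns a non-zero status when the count exceeds the limit (4096 by default).

```shell
#!/bin/sh
# Hypothetical helper: print the byte count of the process list read from
# stdin and fail when it exceeds <limit> (default 4096).
# Usage: <ps pipeline> | check_proc_list_size [limit]
check_proc_list_size() {
    limit=${1:-4096}
    # Some wc implementations pad the count with spaces; strip them.
    bytes=$(wc -c | tr -d ' ')
    echo "$bytes"
    [ "$bytes" -le "$limit" ]
}
```

Usage, with the same pipeline the 5.1 agent runs:

# ps --cols=10000 -eo pid,args | grep -e "_podp15u\b" | grep -v grep | tr -s " " " " | sed -e 's/^ //' | check_proc_list_size || echo "process list exceeds 4096 bytes"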
 


Supplemental Materials

Source: ETrack
Value: 2100371
Description: Oracle instance not seen as online by VCS on Linux


