How to determine whether an SFRAC node panicked due to a CRS timeout

Article:TECH136134  |  Created: 2010-01-17  |  Updated: 2010-01-17  |  Article URL http://www.symantec.com/docs/TECH136134
Article Type
Technical Solution

Problem

How to determine whether an SFRAC node panicked due to a CRS timeout

Solution

Obtain the crash dump from the customer system and verify the panic string and the panicking thread:

SolarisCAT(vmcore.7/10U)> panic
panic on cpu 1
panic string:   forced crash dump initiated at user request
==== panic user (LWP_SYS) thread: 0x300056dc340  PID: 16038  on CPU: 1 ==== --------<<< Note the PID
cmd: /sbin/uadmin 5 1    --------<<< Note cmd
t_procp: 0x30003dc5120
 p_as: 0x300059873f8  size: 2621440  rss: 1474560
 hat: 0x30008299880  cnum: 0x0  cpusran: 1
 zone: global
t_stk: 0x2a100bdbae0  sp: 0x2a100bdb0b1  t_stkbase: 0x2a100bd6000
t_pri: 59(TS)  pctcpu: 0.037107
t_lwp: 0x60012438098  machpcb: 0x2a100bdbae0
 mstate: LMS_SYSTEM  ms_prev: LMS_USER
 ms_state_start: 0.0000116 seconds earlier
 ms_start: 0.2235608 seconds earlier
psrset: 0  last CPU: 1
idle: 0 ticks (0 seconds)
start: Wed Jun 16 07:06:51 2010
age: 0 seconds (0 seconds)
syscall: #55 uadmin(, 0xffbffce8) (sysent: genunix:uadmin+0x0)
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_PANIC - thread initiated a system panic
       T_DFLTSTK - stack is default size
tpflg:  TP_TWAIT - wait to be freed by lwp_wait
       TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
       TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SMSACCT - process is keeping micro-state accounting
       SMSFORK - child inherits micro-state accounting

pc:      0x106b2f4      unix:panic+0x1c:   call unix:vpanic

unix:panic+0x1c(0x1269e48, 0x1, 0x1815000, 0x1815000, 0x2b, 0x0)
genunix:kadmin+0x4ac(, 0x1, 0x0, 0x60010803d98)
genunix:uadmin+0x11c(, 0x1)
unix:syscall_trap32+0xcc()
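If the SolarisCAT panic output has been saved to a text file, the PID and command can be pulled out mechanically before running proc tree. The following is a minimal POSIX sh sketch; the function names panic_pid and panic_cmd and the file name panic.txt are hypothetical, and it assumes the thread header and cmd: line look like the output above.

```sh
# Hypothetical helpers: read saved SolarisCAT "panic" output on stdin.

# Print the PID from the "==== panic ... PID: NNNN ..." thread header line.
panic_pid() {
    sed -n 's/.*==== panic .* PID: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the command line recorded on the "cmd:" line of the panic thread.
panic_cmd() {
    sed -n 's/^cmd: *//p' | head -n 1
}
```

For example, pid=$(panic_pid < panic.txt) yields the value to pass to proc tree, and a panic_cmd result starting with /sbin/uadmin points to a user-initiated panic.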
-- switch to user thread's user stack --

Print the process tree of the panic PID:

SolarisCAT(vmcore.7/10U)> proc tree 16038
4059  /bin/sh /etc/init.d/init.cssd fatal
 6855  /bin/sh /etc/init.d/init.cssd daemon ---------------<<< This shows that the Oracle CRS daemon (init.cssd) issued the uadmin command that caused the system panic
   16038 /sbin/uadmin 5 1
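The parent/child check above can be sketched as a small filter over saved proc tree output. This assumes, hypothetically, that each tree level is indented with spaces as shown above; cssd_initiated is an illustrative name. It prints yes when an ancestor of the /sbin/uadmin line mentions init.cssd, and no otherwise.

```sh
# Hypothetical helper: reads saved SolarisCAT "proc tree" output on stdin.
cssd_initiated() {
    awk '
    {
        line = $0
        sub(/^ +/, "", line)
        depth = length($0) - length(line)      # leading spaces = tree depth
        # Forget entries deeper than this line; they belong to an earlier branch.
        for (k = depth + 1; k <= maxd; k++) delete stack[k]
        stack[depth] = $0
        maxd = depth
        # For the uadmin process, look for init.cssd among its ancestors.
        if ($0 ~ /\/sbin\/uadmin/)
            for (d = depth - 1; d >= 0; d--)
                if ((d in stack) && stack[d] ~ /init\.cssd/) found = 1
    }
    END { print (found ? "yes" : "no") }'
}
```

Tracking one entry per indentation level keeps only the current branch of the tree, so an init.cssd process in a sibling branch is not mistaken for a parent.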

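Many conditions can cause this type of panic by delaying CSS network heartbeats or voting disk I/O beyond the configured CSS timeouts, for example:
-The system is too busy
-Slow SAN response
-The file system is not responding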

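Verify whether the customer has configured the OCR and voting disk on a CFS (Cluster File System) mount. In the mount -v output, the cluster mount option identifies a CFS mount.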

# export PATH=$PATH:/apps/crshome/bin
# ocrcheck
Status of Oracle Cluster Registry is as follows :
        Version                  :          2
        Total space (kbytes)     :     262144
        Used space (kbytes)      :       3264
        Available space (kbytes) :     258880
        ID                       : 1962738043
        Device/File Name         : /ocrvote/ocrdisk
                                   Device/File integrity check succeeded

                                   Device/File not configured

        Cluster registry integrity check succeeded

# crsctl query css votedisk
0.     0    /ocrvote/votedisk

located 1 votedisk(s).
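To check the OCR and voting disk locations programmatically, the paths can first be extracted from the two outputs above. A minimal sketch, assuming the output formats shown here (other CRS releases may print them differently); ocr_paths and vote_paths are hypothetical helper names.

```sh
# Hypothetical helper: print each "Device/File Name" value from ocrcheck output.
ocr_paths() {
    sed -n 's|^[[:space:]]*Device/File Name[[:space:]]*:[[:space:]]*||p'
}

# Hypothetical helper: print the path (last field) of each numbered line
# such as "0.     0    /ocrvote/votedisk" from "crsctl query css votedisk".
vote_paths() {
    awk '$1 ~ /^[0-9]+\.$/ { print $NF }'
}
```

For example, ocrcheck | ocr_paths and crsctl query css votedisk | vote_paths would list the paths to verify against the mount table.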

# mount -v | grep /ocrvote
/dev/vx/dsk/ocrvotedg/ocrvotevol on /ocrvote type vxfs read/write/setuid/devices/mincache=direct/delaylog/largefiles/qio/cluster/ioerror=mdisable/crw/mntlock=VCS/dev=4f0dea8 on Wed Jun 16 14:19:25 2010
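Given a path such as /ocrvote/ocrdisk, the mount that holds it can be checked for the cluster option. This sketch assumes the Solaris mount -v field layout shown above (device on mountpoint type fstype options on date) and picks the longest mount point that is a directory prefix of the path; on_cfs is a hypothetical name.

```sh
# Hypothetical helper: reads "mount -v" output on stdin and prints "yes" if
# the mount holding path $1 is VxFS with the "cluster" option, else "no".
on_cfs() {
    awk -v p="$1" '
    $2 == "on" && $4 == "type" {
        mp = $3
        # Candidate if the mount point equals, or is a directory prefix of, p.
        if (p == mp || mp == "/" || index(p, mp "/") == 1) {
            if (length(mp) > best) { best = length(mp); fs = $5; opts = $6 }
        }
    }
    END {
        n = split(opts, o, "/")
        for (i = 1; i <= n; i++) if (o[i] == "cluster") cfs = 1
        if (fs == "vxfs" && cfs) { print "yes"; exit 0 }
        print "no"; exit 1
    }'
}
```

For example, mount -v | on_cfs /ocrvote/ocrdisk. Choosing the longest matching mount point avoids reporting the root file system for a path that actually lives on a separate CFS mount.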


Check the current CSS timeout values:

# /apps/crshome/bin/crsctl get css disktimeout
# /apps/crshome/bin/crsctl get css misscount
# /apps/crshome/bin/crsctl get css reboottime
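The three queries can be run in one loop. This is a sketch only: show_css_timeouts and the CRS_HOME variable are assumptions, with CRS_HOME defaulting to /apps/crshome as used in this article.

```sh
# Hypothetical helper: print each CSS timeout as reported by "crsctl get css".
show_css_timeouts() {
    _crsctl="${CRS_HOME:-/apps/crshome}/bin/crsctl"
    for _param in disktimeout misscount reboottime; do
        printf '%s: ' "$_param"             # label; crsctl output follows
        "$_crsctl" get css "$_param"
    done
}
```

For example, CRS_HOME=/apps/crshome show_css_timeouts prints one labeled line per parameter.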


Based on the outcome of the crash dump analysis, advise the customer to increase the misscount and disktimeout values on all RAC nodes to prevent similar panics, for example:

# /apps/crshome/bin/crsctl set css misscount 300
# /apps/crshome/bin/crsctl set css disktimeout 300
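A hedged sketch of applying both settings with a dry run by default, so the exact commands can be reviewed before they are executed. set_css_timeouts, DRY_RUN and CRS_HOME are hypothetical names; the helper acts only on the node where it runs and does not check the value against any limits your CRS release may impose.

```sh
# Hypothetical helper: set misscount and disktimeout to $1 seconds.
# DRY_RUN=1 (the default) only prints the crsctl commands; DRY_RUN=0 runs them.
set_css_timeouts() {
    _value=$1
    case $_value in
        ''|*[!0-9]*) echo "usage: set_css_timeouts SECONDS" >&2; return 2 ;;
    esac
    _crsctl="${CRS_HOME:-/apps/crshome}/bin/crsctl"
    for _param in misscount disktimeout; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$_crsctl set css $_param $_value"
        else
            "$_crsctl" set css "$_param" "$_value" || return 1
        fi
    done
}
```

For example, set_css_timeouts 300 prints the two commands shown above; DRY_RUN=0 set_css_timeouts 300 runs them.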


Legacy ID

355668

