How to determine whether an SFRAC node panicked due to a CRS timeout
Article: TECH136134 | Created: 2010-01-17 | Updated: 2010-01-17 | Article URL: http://www.symantec.com/docs/TECH136134
Problem
How to determine whether an SFRAC node panicked due to a CRS timeout
Solution
Obtain the crash dump from the customer system and verify the panic string and thread
SolarisCAT(vmcore.7/10U)> panic
panic on cpu 1
panic string: forced crash dump initiated at user request
==== panic user (LWP_SYS) thread: 0x300056dc340 PID: 16038 on CPU: 1 ==== --------<<< Note PID id
cmd: /sbin/uadmin 5 1 --------<<< Note cmd
t_procp: 0x30003dc5120
p_as: 0x300059873f8 size: 2621440 rss: 1474560
hat: 0x30008299880 cnum: 0x0 cpusran: 1
zone: global
t_stk: 0x2a100bdbae0 sp: 0x2a100bdb0b1 t_stkbase: 0x2a100bd6000
t_pri: 59(TS) pctcpu: 0.037107
t_lwp: 0x60012438098 machpcb: 0x2a100bdbae0
mstate: LMS_SYSTEM ms_prev: LMS_USER
ms_state_start: 0.0000116 seconds earlier
ms_start: 0.2235608 seconds earlier
psrset: 0 last CPU: 1
idle: 0 ticks (0 seconds)
start: Wed Jun 16 07:06:51 2010
age: 0 seconds (0 seconds)
syscall: #55 uadmin(, 0xffbffce8) (sysent: genunix:uadmin+0x0)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
T_DFLTSTK - stack is default size
tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting
pc: 0x106b2f4 unix:panic+0x1c: call unix:vpanic
unix:panic+0x1c(0x1269e48, 0x1, 0x1815000, 0x1815000, 0x2b, 0x0)
genunix:kadmin+0x4ac(, 0x1, 0x0, 0x60010803d98)
genunix:uadmin+0x11c(, 0x1)
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --
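If the panic output has been saved to a text file, the PID and command of the panicking thread can be extracted mechanically. The following is a minimal sketch, assuming the output format shown above; the function names `panic_pid` and `panic_cmd` are hypothetical helpers and not part of SolarisCAT.

```shell
#!/bin/sh
# Hypothetical helpers: both read saved SolarisCAT "panic" output on stdin.

# Print the PID from the "==== panic ... thread: ... PID: N on CPU: ..." line.
panic_pid() {
    sed -n 's/.*panic .*thread: .* PID: \([0-9][0-9]*\) on CPU.*/\1/p'
}

# Print the command from the "cmd:" line, dropping any trailing
# "----<<<" annotation such as the ones added in this article.
panic_cmd() {
    sed -n 's/^ *cmd: *//p' | sed 's/ *-\{2,\}<<<.*$//'
}
```

For example, `panic_pid < panic.txt` (where `panic.txt` is an assumed file name) prints the PID to pass to `proc tree`.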
Print the process tree of the panic PID
SolarisCAT(vmcore.7/10U)>proc tree 16038
4059 /bin/sh /etc/init.d/init.cssd fatal
6855 /bin/sh /etc/init.d/init.cssd daemon ---------------<<< This shows that the Oracle CRS daemon issued the uadmin command, which resulted in the system panic
16038 /sbin/uadmin 5 1
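The check on the process tree can be sketched as a small filter. This is only a heuristic under stated assumptions: it assumes the saved `proc tree` output lists ancestors before the panicking process, as in the sample above, and the function name `crs_issued_uadmin` is hypothetical. On Solaris, `nawk` or `/usr/xpg4/bin/awk` may be needed in place of the default `awk`.

```shell
#!/bin/sh
# Hypothetical helper: reads saved "proc tree <pid>" output on stdin.
# Exits 0 if an init.cssd script is listed above a uadmin process.
crs_issued_uadmin() {
    awk '
        /init\.cssd/ { seen_cssd = 1 }               # a CRS init script ancestor was listed
        /\/uadmin( |$)/ && seen_cssd { found = 1 }   # uadmin appears after that ancestor
        END { exit(found ? 0 : 1) }
    '
}
```

A call such as `crs_issued_uadmin < proctree.txt && echo "uadmin was issued under init.cssd"` (file name assumed) gives a quick yes or no before reading the tree by hand.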
Several conditions can cause this type of panic:
-System is too busy
-Slow SAN response
-File system is not responding
Verify whether the customer has configured OCR and VOTEDISK on a CFS file system
# export PATH=$PATH:/apps/crshome/bin
# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 3264
Available space (kbytes) : 258880
ID : 1962738043
Device/File Name : /ocrvote/ocrdisk
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
# crsctl query css votedisk
0. 0 /ocrvote/votedisk
located 1 votedisk(s).
# mount -v |grep /ocrvote
/dev/vx/dsk/ocrvotedg/ocrvotevol on /ocrvote type vxfs read/write/setuid/devices/mincache=direct/delaylog/largefiles/qio/cluster/ioerror=mdisable/crw/mntlock=VCS/dev=4f0dea8 on Wed Jun 16 14:19:25 2010
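The `cluster` mount option in the output above is what identifies the file system as CFS. A minimal sketch of that check follows, assuming the Solaris `mount -v` line layout shown above; the function name `is_cfs_mount` is hypothetical. On Solaris, `nawk` or `/usr/xpg4/bin/awk` may be needed because the script uses `-v`.

```shell
#!/bin/sh
# Hypothetical helper: reads "mount -v" output on stdin.
# Exits 0 if the mount point given as $1 is a VxFS file system
# mounted with the "cluster" option.
is_cfs_mount() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 == "type" && $5 == "vxfs" {
            n = split($6, opts, "/")        # options are "/"-separated in field 6
            for (i = 1; i <= n; i++)
                if (opts[i] == "cluster") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Usage: `mount -v | is_cfs_mount /ocrvote && echo "/ocrvote is cluster-mounted VxFS"`.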
Check the current timeout values for CRS
# /apps/crshome/bin/crsctl get css disktimeout
# /apps/crshome/bin/crsctl get css misscount
# /apps/crshome/bin/crsctl get css reboottime
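The three queries above can be wrapped in a loop so that each value is printed next to its parameter name. This is a sketch only: the function name `show_css_timeouts` and the `CRSCTL` variable are hypothetical, and the default path is the CRS home used in this article, which may differ on the customer system.

```shell
#!/bin/sh
# Hypothetical wrapper around the "crsctl get css" commands shown above.
# CRSCTL defaults to the path used in this article; adjust for the actual CRS home.
CRSCTL=${CRSCTL:-/apps/crshome/bin/crsctl}

show_css_timeouts() {
    for param in disktimeout misscount reboottime; do
        printf '%s: ' "$param"           # label the value with its parameter name
        "$CRSCTL" get css "$param"
    done
}
```

Running `show_css_timeouts` on each RAC node makes it easy to compare the values across nodes.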
Based on the outcome of the crash dump analysis, advise the customer to increase the above timeout values on all RAC nodes to prevent similar panics
# /apps/crshome/bin/crsctl set css misscount 300
# /apps/crshome/bin/crsctl set css disktimeout 300
Legacy ID
355668