AIX system panic at llt_msgalloc() because of incident 1597480

Article:TECH74998  |  Created: 2009-01-29  |  Updated: 2009-01-21  |  Article URL http://www.symantec.com/docs/TECH74998
Article Type
Technical Solution


Environment

Issue



AIX system panic at llt_msgalloc() because of incident 1597480

Error



CRASH INFORMATION:
CPU 0 CSA F000000030617600 at time of crash, error code for LEDs: 30000000
pvthread+00B600 STACK:
[F1000000A02CBDC4].llt_msgalloc+00005C ()
[F1000000A033DBB4]gab_mem_allocmsg+000038 (??)
[F1000000A03282A8]gab_allocmsg+000014 (??)
[04ABBAD4].vx_msgalloc+000118 ()
...

Solution



The following Etrack incident can cause an AIX system panic in the kernel function llt_msgalloc():
Etrack 1597480 - Call tstop before tstart in LMX code
The incident affects SFCFS 5.0MP3RP2 and is fixed in VxFS patch 5.0MP3RP2HF1.

To get the complete symbolic panic stack, run kdb with the LLT driver specified. Without the LLT driver, frames inside LLT are shown only as raw addresses instead of function names.

ERROR CODE/ MESSAGE:
CRASH INFORMATION:
CPU 0 CSA F000000030617600 at time of crash, error code for LEDs: 30000000
pvthread+00B600 STACK:
[F1000000A02CBDC4]F1000000A02CBDC4 () <======= what is this function call
[F1000000A033DBB4]gab_mem_allocmsg+000038 (??)
[F1000000A03282A8]gab_allocmsg+000014 (??)
[04ABBAD4].vx_msgalloc+000118 ()
[04B27D98].vx_cistat_msg+0000CC ()
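
As a rough illustration of what "shown only as numbers" means in the stack above, the following Python sketch flags frames whose symbol field merely repeats the frame address. The regular expression and the helper name unresolved_frames are assumptions made for this example, based only on the stack layout shown in this article; they are not part of kdb.

```python
import re

# Assumed frame layout, taken from the stack in this article:
#   [ADDRESS]symbol+OFFSET (args)   or   [ADDRESS].symbol+OFFSET (args)
FRAME_RE = re.compile(
    r"^\[([0-9A-Fa-f]+)\]\.?([A-Za-z0-9_]+)(?:\+([0-9A-Fa-f]+))?\s*\("
)

SAMPLE_STACK = """\
[F1000000A02CBDC4]F1000000A02CBDC4 ()
[F1000000A033DBB4]gab_mem_allocmsg+000038 (??)
[F1000000A03282A8]gab_allocmsg+000014 (??)
[04ABBAD4].vx_msgalloc+000118 ()
[04B27D98].vx_cistat_msg+0000CC ()
"""

def unresolved_frames(stack_text):
    """Return addresses of frames where kdb printed no symbol name."""
    unresolved = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        # A frame is unresolved when the "symbol" is just the address again.
        if match and match.group(2).upper() == match.group(1).upper():
            unresolved.append(match.group(1))
    return unresolved

print(unresolved_frames(SAMPLE_STACK))
```

Any address reported this way is a candidate for the lke lookup described in the diagnostic steps.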

DIAGNOSTIC STEPS:

AIX core dump analysis
============================================
CRASH INFORMATION:
CPU 0 CSA F000000030617600 at time of crash, error code for LEDs: 30000000
pvthread+00B600 STACK:
[F1000000A02CBDC4]F1000000A02CBDC4 () <======= what is this function call that might cause the issue
[F1000000A033DBB4]gab_mem_allocmsg+000038 (??)
[F1000000A03282A8]gab_allocmsg+000014 (??)
[04ABBAD4].vx_msgalloc+000118 () <============= read from bottom up
============================================
(0)> lke 04ABBAD4
ADDRESS FILE FILESIZE FLAGS MODULE NAME
1 F10001A015816C00 04A53000 0024A000 00080252 vxfs.ext64/usr/lib/drivers/vxfs.ext_61 <==== this module is from vxfs
============================================
(0)> lke F1000000A02CBDC4
ADDRESS FILE FILESIZE FLAGS MODULE NAME
1 F10001A015816700 F1000000A02BA000 0002B000 00090252 Driver64.o/usr/lib/drivers/pse/llt <==== this module is from llt
- The function "F1000000A02CBDC4 ()" belongs to the llt driver.
- Ask the customer to send the llt driver from the /usr/lib/drivers/pse directory.
============================================
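
The two lke lookups above amount to a range check: an address belongs to a module if it lies between the module's load address and load address plus size. The Python sketch below reproduces that arithmetic with the values from the listings. It assumes that the FILE column is the module load address and FILESIZE is its size in bytes; the function name locate_in_module is hypothetical.

```python
def locate_in_module(addr_hex, load_hex, size_hex):
    """Return the offset of addr inside the module, or None if it is outside."""
    addr = int(addr_hex, 16)
    load = int(load_hex, 16)   # assumed: FILE column from lke
    size = int(size_hex, 16)   # assumed: FILESIZE column from lke
    if load <= addr < load + size:
        return addr - load
    return None

# Values copied from the two lke listings in this article.
llt_offset = locate_in_module("F1000000A02CBDC4", "F1000000A02BA000", "0002B000")
vxfs_offset = locate_in_module("04ABBAD4", "04A53000", "0024A000")

print(hex(llt_offset), hex(vxfs_offset))  # prints: 0x11dc4 0x68ad4
```

Both addresses fall inside their respective modules, which is consistent with the llt and vxfs module names that lke reports.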

Using the llt driver provided by the customer
==========================
CRASH INFORMATION:
CPU 0 CSA F000000030617600 at time of crash, error code for LEDs: 30000000
pvthread+00B600 STACK:
[F1000000A02CBDC4].llt_msgalloc+00005C () <========= Now we know F1000000A02CBDC4 is in function llt_msgalloc
[F1000000A033DBB4]gab_mem_allocmsg+000038 (??)
[F1000000A03282A8]gab_allocmsg+000014 (??)
[04ABBAD4].vx_msgalloc+000118 ()
[04B27D98].vx_cistat_msg+0000CC ()
[04A6D8D0].vx_validate_cistat+000024 ()
[04A6D7A0].vx_get_icnmmap+000014 ()
[04AA5904].vx_aio_delayiodone+000080 ()
[04AA5AF4].vx_aioq_drain+0000DC ()
[04ADF000].vx_workitem_process+000050 ()
[04AE8C98].vx_worklist_process+0001D4 ()
[04AE8EE8].vx_worklist_thread+000090 ()
[04A99728].vx_thread_base+00004C ()
[001E697C]threadentry+00005C (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF66D0
(0)>

==========================
Issue/incident:
1597480 - Call tstop before tstart in LMX code
The fix is in AIX VxFS 5.0MP3RP2HF1.
Related cases/ etracks:
Etrack 1792221 (notes that the fix was incorporated in RP2HF1)
1792221 - Call tstop before tstart in LMX code

SOLUTION:
Incident number 1597480 is fixed by applying HF1 on top of AIX VxFS 5.0MP3RP2.

Alternatively, there is a workaround:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A temporary workaround is to turn off the storage keys feature on the customer systems that are panicking. After turning off storage keys, the kernel image must be rebuilt and the systems rebooted. There are two ways to do this:
(1) smitty (recommended way):
=============================
smitty -> Problem Determination -> Storage Protection Keys -> Change/Show Kernel
Storage Protection Keys State -> Change Next Boot Kernel Storage Protection Keys
State -> Next Boot Kernel Storage Protection Keys State
Press <Tab> until the setting reads 'disabled'. Also set "Run bosboot
automatically" to 'yes'. Press Enter, then exit smitty after the changes are
applied and bosboot has completed.
Reboot the system.
(2) skeyctl command:
====================
Run the skeyctl command on one of the systems. It should display output similar to the
following:
[/]# skeyctl
Storage Key attributes for current boot session:
Number of hardware keys = 8
Number of user keys = 2
Kernel keys = enabled
Exclusive kernel key value = disabled
If 'Kernel keys' reads 'enabled', the storage keys feature is enabled on
the system. It can be disabled, and the settings for the next boot viewed, with the
following commands:
[/]# skeyctl -k off
[/]# skeyctl -v boot
Storage Key attributes for next boot session:
Number of hardware keys = default
Number of user keys = default
Kernel keys = disabled
Exclusive kernel key value = disabled
Rebuild the kernel image using bosboot.
Reboot the system.
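
When several cluster nodes need to be checked, the skeyctl output can be inspected programmatically. The Python sketch below parses text in the "name = value" layout shown above and reports whether kernel keys are enabled. It is only an illustration: the helper names are hypothetical, the sample text is copied from this article, and the sketch does not run skeyctl itself.

```python
CURRENT_BOOT = """\
Storage Key attributes for current boot session:
Number of hardware keys = 8
Number of user keys = 2
Kernel keys = enabled
Exclusive kernel key value = disabled
"""

NEXT_BOOT = """\
Storage Key attributes for next boot session:
Number of hardware keys = default
Number of user keys = default
Kernel keys = disabled
Exclusive kernel key value = disabled
"""

def parse_skeyctl(output):
    """Turn 'name = value' lines into a dict; other lines are ignored."""
    attrs = {}
    for line in output.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            attrs[name.strip()] = value.strip()
    return attrs

def kernel_keys_enabled(output):
    """True if the 'Kernel keys' attribute reads 'enabled'."""
    return parse_skeyctl(output).get("Kernel keys") == "enabled"

print(kernel_keys_enabled(CURRENT_BOOT), kernel_keys_enabled(NEXT_BOOT))
```

With the sample text above, the current boot session reports kernel keys as enabled and the next boot session as disabled, which is the state expected after running skeyctl -k off.
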
IBM added the storage keys feature to catch illegal memory accesses across different kernel modules. Its advantage is that it pinpoints the exact memory reference causing a problem, which speeds up debugging when a failure such as a system crash occurs. In other words, the feature is an additional aid for problem determination, which is why it is located under the "Problem Determination" menu in smit/smitty.
Turning off this feature has no side effects and causes no performance degradation.



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Supplemental Materials

Source: ETrack
Value: 1676936
Description: (Parent Etrack) AIX-5.1A23 CFS-Stress test panic with "Data Storage Interrupt - PROC" on P6 cluster.

Source: ETrack
Value: 1597477
Description: (Parent Etrack) Call tstop before tstart in LMX code

Legacy ID



333467

