VERITAS File System cookie-based Directory Name Lookup Cache corruption due to overflow of the 1-byte CPU ID field in the cookie

Article:TECH38135  |  Created: 2007-01-25  |  Updated: 2007-01-25  |  Article URL http://www.symantec.com/docs/TECH38135
Article Type
Technical Solution

Issue



VERITAS File System cookie-based Directory Name Lookup Cache corruption due to overflow of the 1-byte CPU ID field in the cookie

Solution



Description


Starting with version 3.5, VERITAS File System implements its own cookie-based Directory Name Lookup Cache (DNLC), which is independent of the native Solaris DNLC. The cookie is 12 bytes (96 bits) long, and one of those bytes stores the CPU ID. A one-byte CPU ID field can represent CPU IDs only in the range 0 to 255.

As CPU technology has advanced, CPU IDs can now exceed 255. In particular, each UltraSPARC IV CPU package has two cores, and each core is a CPU in its own right. On some high-end system models, the CPU numbering scheme assigns CPU IDs 0-255 to the first cores and 512-767 (512+0 through 512+255) to the second cores. When VERITAS File System stores the CPU ID of a second core in the DNLC cookie, the high-order bits are discarded and only the lower 8 bits are kept. As a result, two cookies generated by two different CPUs can have the same value in the one-byte CPU ID field. This can potentially lead to separate DNLC entries holding identical cookies.
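The truncation can be illustrated with a short sketch. The function name truncate_cpu_id is hypothetical and is not taken from the VERITAS File System source; it only models storing a CPU ID in a one-byte field by keeping the lower 8 bits.

```python
def truncate_cpu_id(cpu_id: int) -> int:
    """Model a one-byte CPU ID field: keep only the lower 8 bits.

    Hypothetical helper for illustration; not the actual VxFS code.
    """
    return cpu_id & 0xFF


# First-core IDs (0-255) and the matching second-core IDs (512-767).
for first_core in (0, 16, 255):
    second_core = first_core + 512
    print(first_core, second_core,
          truncate_cpu_id(first_core), truncate_cpu_id(second_core))
```

Each pair prints the same truncated value for both cores, for example 16 for CPU IDs 16 and 528.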

Note that duplicate cookies do not necessarily occur even if two CPUs have the same lower 8 bits in their CPU IDs, because the CPU ID field is only one byte of the 12-byte cookie. Under normal circumstances, the cookies remain unique as long as the other 11 bytes differ.
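As a rough model of this point, the sketch below builds a 12-byte cookie from 11 bytes of other data plus the one-byte CPU ID. The field order and the name make_cookie are assumptions made for illustration only; the article does not describe the real cookie layout.

```python
def make_cookie(other_bytes: bytes, cpu_id: int) -> bytes:
    """Build a 12-byte cookie: 11 bytes of other data + 1 byte of CPU ID.

    The layout is hypothetical; only the field sizes come from the article.
    """
    if len(other_bytes) != 11:
        raise ValueError("expected exactly 11 bytes of non-CPU data")
    return other_bytes + bytes([cpu_id & 0xFF])


same_a = make_cookie(b"A" * 11, 16)    # CPU 16
same_b = make_cookie(b"A" * 11, 528)   # CPU 528, identical other 11 bytes
other = make_cookie(b"B" * 11, 528)    # CPU 528, different other 11 bytes

print(same_a == same_b)  # duplicate: all 12 bytes match
print(same_a == other)   # unique: the other 11 bytes differ
```

Only the first comparison yields a duplicate, which is why a CPU ID collision alone is not sufficient to trigger the problem.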

The potential symptoms of the problem are:

- System hang and panic due to a race condition in handling the inodes affected by the duplicate DNLC cookies
- The "ls" command shows two path names with the same inode number, but their link count is smaller than the number of path names with that inode number. Normally, if two path names in the same file system have the same inode number, each shows a link count of 2. If the system encounters the problem, the two path names each show a link count of 1 instead of 2.


Conditions for this issue to occur


This problem is known to occur only if all the following conditions coexist:

- VERITAS File System 3.5 or later is being used on a 32-bit or 64-bit Solaris (7.0, 8.0, 9.0, 10.0) system
- There are two CPUs whose CPU IDs have the same lower 8 bits
- Two DNLC cookies are generated by two such CPUs, and the other 11 bytes of the two cookies are also the same

Note: The occurrence of the last condition is extremely rare.

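The CPU IDs can be checked with the command "psrinfo". In the following example, the first column is the CPU ID. CPU IDs 16 and 528 (512+16) have the same lower 8 bits, as do 17 and 529, 18 and 530, and 19 and 531: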

# psrinfo
16      on-line   since 04/14/2005 18:37:21
17      on-line   since 04/14/2005 18:37:22
18      on-line   since 04/14/2005 18:37:22
19      on-line   since 04/14/2005 18:37:22
528     on-line   since 04/14/2005 18:37:22
529     on-line   since 04/14/2005 18:37:22
530     on-line   since 04/14/2005 18:37:22
531     on-line   since 04/14/2005 18:37:22
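The check can also be scripted. The following is a minimal sketch, assuming the "psrinfo" output has the CPU ID as the first whitespace-separated field on each line, as in the example above. The function name colliding_cpu_ids is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def colliding_cpu_ids(psrinfo_output: str) -> Dict[int, List[int]]:
    """Group CPU IDs by their lower 8 bits and return groups with more than one ID."""
    groups = defaultdict(list)
    for line in psrinfo_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a numeric CPU ID.
        if fields and fields[0].isdigit():
            cpu_id = int(fields[0])
            groups[cpu_id & 0xFF].append(cpu_id)
    return {low: sorted(ids) for low, ids in groups.items() if len(ids) > 1}


SAMPLE = """\
16      on-line   since 04/14/2005 18:37:21
17      on-line   since 04/14/2005 18:37:22
18      on-line   since 04/14/2005 18:37:22
19      on-line   since 04/14/2005 18:37:22
528     on-line   since 04/14/2005 18:37:22
529     on-line   since 04/14/2005 18:37:22
530     on-line   since 04/14/2005 18:37:22
531     on-line   since 04/14/2005 18:37:22
"""

print(colliding_cpu_ids(SAMPLE))
# {16: [16, 528], 17: [17, 529], 18: [18, 530], 19: [19, 531]}
```

An empty result means no two CPUs share the same lower 8 bits, so the second condition is not met.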



Recommended Courses of Action


The issue is fixed in the following patches, which increase the size of the DNLC cookie to 16 bytes and allocate two bytes to the CPU ID field. A two-byte CPU ID field can accommodate CPU IDs from 0 to 65535.

VERITAS File System 3.5 MP4
VERITAS File System 4.0 MP2

Refer to the Related Documents section of this TechNote to download and install the required patches.

Note: Because the patches change the DNLC cookie structure, in a VERITAS Cluster File System or VERITAS Real Application Cluster environment the Cluster File System must be shut down on all nodes before the patch is applied; otherwise, kernel data corruption will occur. A rolling upgrade is not possible with these patches.

If the above patches cannot be applied readily and the system continues to experience the symptoms, consider taking offline one of the two CPUs whose CPU IDs have the same lower 8 bits.
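As a sketch of how such CPUs might be selected, the example below keeps the lowest CPU ID for each lower-8-bit value and lists the remaining ones as candidates. The function name offline_candidates and the choice of which CPU to keep are assumptions, not guidance from the patch documentation. The script only prints "psradm -f" commands (the Solaris command for taking a processor offline) for review; it does not run them.

```python
from typing import Iterable, List


def offline_candidates(cpu_ids: Iterable[int]) -> List[int]:
    """Return CPU IDs whose lower 8 bits duplicate those of a lower-numbered CPU."""
    seen_low_bytes = set()
    candidates = []
    for cpu_id in sorted(cpu_ids):
        low = cpu_id & 0xFF
        if low in seen_low_bytes:
            candidates.append(cpu_id)   # collides with an earlier CPU ID
        else:
            seen_low_bytes.add(low)     # first CPU seen with this low byte
    return candidates


cpu_ids = [16, 17, 18, 19, 528, 529, 530, 531]
for cpu_id in offline_candidates(cpu_ids):
    # Printed for review only; nothing is taken offline by this script.
    print(f"psradm -f {cpu_id}")
```

On the example system this lists CPUs 528 through 531. Taking them all offline would halve the available CPUs, so this is a workaround to weigh against applying the patch.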

Note: Continuous occurrence of the symptoms is extremely rare and usually a system reboot will fix the problem.


Supplemental Materials

Source: iTools
Value: 152278
Description: 96-bit cookie based DNLC can be duplicated


Legacy ID



276135

