Fail over may not work correctly for CLARiiON series arrays when there is a failure in the LCC hardware or the cables that connect the back end of the storage processor to the drives

Article:TECH70045  |  Created: 2010-01-28  |  Updated: 2010-01-28  |  Article URL http://www.symantec.com/docs/TECH70045
Article Type
Technical Solution

Product(s)

Issue



Fail over may not work correctly for CLARiiON series arrays when there is a failure in the LCC hardware or the cables that connect the back end of the storage processor to the drives

Solution



Back end failures in CLARiiON arrays can cause multiple fail over symptoms, depending on the release, fail over mode, and platform. Laboratory and customer experience indicates that these issues are currently confined to Unix operating systems (Solaris and Linux platforms only). The most common symptom is a fail over followed by an attempt to fail back at 5 minute intervals as DMP attempts to restore the primary path.


Cause      

If a CLARiiON array suffers a hardware failure on the back end loop behind the storage processor (SP), such as an LCC or cable failure, that disables the loop, and the environment is running DMP version 4.1 through 5.0, the initial trespass may succeed. However, five minutes later DMP tries to auto-restore the original path, because in these failure situations the SP itself is still responsive. This cycle continues until the bad component is replaced.
A failure of the SP itself, or of any component of the path between the SP and the host, does not cause this undesirable behavior. To the user, it appears that the initial failure trespassed properly, but five minutes later the host loses access to the array if the original hardware failure has not yet been repaired.
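
The cycle described above can be illustrated with a small simulation. This is only a toy model written for this note, not DMP code: the names ArrayState and simulate are hypothetical, the restore check is assumed to look only at whether the SP responds, and the five minute interval is taken from the description above. It models the repeating fail over and fail back pattern, not the loss of host access.

```python
from dataclasses import dataclass

RESTORE_INTERVAL_MIN = 5  # restore interval as described in this article


@dataclass
class ArrayState:
    sp_responsive: bool    # does the primary SP still answer path probes?
    backend_loop_ok: bool  # can the primary SP reach the drives?


def simulate(state, minutes, repaired_at=None):
    """Return (minute, event) tuples for a toy fail over timeline."""
    events = []
    active = "primary"
    for t in range(minutes + 1):
        if repaired_at is not None and t == repaired_at:
            state.backend_loop_ok = True
            events.append((t, "back end repaired"))
        # Assumed restore logic: every interval, fail back if the SP responds.
        # The back end loop is NOT checked, which is the root of the problem.
        if (active == "secondary" and t > 0
                and t % RESTORE_INTERVAL_MIN == 0 and state.sp_responsive):
            active = "primary"
            events.append((t, "fail back attempt"))
        # I/O through the primary SP fails while its back end loop is down.
        if active == "primary" and not state.backend_loop_ok:
            active = "secondary"
            events.append((t, "fail over"))
    return events


if __name__ == "__main__":
    # Back end failure with a responsive SP, repaired at minute 11.
    for minute, event in simulate(ArrayState(True, False), 15, repaired_at=11):
        print(minute, event)
```

With a responsive SP and a broken back end loop, the model fails over at minute 0 and then repeats a fail back attempt and a fresh fail over at minutes 5 and 10, until the repair lets the fail back at minute 15 hold. If sp_responsive is set to False instead, the model fails over once and never attempts to fail back, which matches the statement that a failure of the SP itself does not cause this behavior.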

Note: Arrays running Release 26 or later of FLARE are not susceptible to this issue, because the lower redirector routes I/O through the mid-plane to the other SP in the event of a back end hardware failure, avoiding fail over entirely.

Etrack 1094018 is the master incident for this issue.


Fix
 
DMP has been modified to accommodate this behavior in the releases listed below, and the change is carried forward into later releases. The same change resolves some issues found in clusters that use CLARiiON Snap LUNs.

Solaris
4.1 MP2 RP4
5.0 MP3 (SPARC)
5.0 MP3 (Opteron)

Linux
4.1 MP4 RP3
5.0 MP3

AIX
5.0 MP3

HP-UX
5.0 MP2 (11i v2)
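
As a rough aid, the list above can be encoded as a lookup table and compared against an installed release string. The sketch below is an assumption-laden illustration, not a Symantec tool: the names FIXED_IN, parse_release, and has_fix are hypothetical, the release string format ("5.0 MP3", "4.1 MP2 RP4") is assumed, and it assumes that any maintenance pack or rolling patch at or above the listed level within the same base version carries the fix. Base versions not in the list return None rather than a guess.

```python
import re

# Minimum fixed levels copied from the list above.
# Key: (platform, base version). Value: (MP level, RP level).
FIXED_IN = {
    ("Solaris", "4.1"): (2, 4),  # 4.1 MP2 RP4
    ("Solaris", "5.0"): (3, 0),  # 5.0 MP3 (SPARC and Opteron)
    ("Linux", "4.1"): (4, 3),    # 4.1 MP4 RP3
    ("Linux", "5.0"): (3, 0),    # 5.0 MP3
    ("AIX", "5.0"): (3, 0),      # 5.0 MP3
    ("HP-UX", "5.0"): (2, 0),    # 5.0 MP2 (11i v2)
}

# Assumed release string shape: "<major>.<minor> [MP<n>] [RP<n>]"
_RELEASE = re.compile(r"(\d+\.\d+)(?:\s+MP(\d+))?(?:\s+RP(\d+))?")


def parse_release(text):
    """Split a release string into (base version, (MP level, RP level))."""
    match = _RELEASE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    base, mp, rp = match.groups()
    return base, (int(mp or 0), int(rp or 0))


def has_fix(platform, release):
    """True or False for listed base versions, None when not listed."""
    base, level = parse_release(release)
    minimum = FIXED_IN.get((platform, base))
    if minimum is None:
        return None  # this article does not say; check the release notes
    return level >= minimum


if __name__ == "__main__":
    print(has_fix("Solaris", "4.1 MP2 RP3"))  # below the listed level
    print(has_fix("Linux", "5.0 MP3"))        # at the listed level
```

The release notes for the installed product remain the authoritative source; this comparison only restates the list above.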



Legacy ID



323791



