To avoid cluster-wide panics and/or database failures, Storage Foundation for Oracle RAC (SFRAC/SFCFS for RAC) installations using Cluster Volume Manager (CVM) shared disk groups must have a dgfailpolicy of "leave"
Article: TECH144672 | Created: 2010-11-20 | Updated: 2013-12-29 | Article URL: http://www.symantec.com/docs/TECH144672
When using Storage Foundation with Cluster Volume Manager (SFRAC/SFCFS) and shared disk groups of version 120 or higher, each disk group has an attribute called dgfailpolicy. This attribute determines how a node reacts if it loses access to the disks in the corresponding disk group. If shared disk groups are left at the default dgfailpolicy of dgdisable and the Cluster Volume Manager (CVM) master loses connectivity to storage, a cluster-wide panic can ensue and/or the database can halt cluster-wide. To avoid this behaviour, set dgfailpolicy to leave for all shared disk groups.
Example (partial) output of vxdg list <diskgroup> for a shared disk group with disk group version 150:
alignment: 8192 (bytes)
cluster-actv-modes: host2=sw host1=sw host3=sw host43=sw host5=sw
dg-fail-policy: dgdisable    <<< currently set to the default, i.e. dgdisable
copies: nconfig=default nlog=default
config: seqno=0.1422 permlen=0 free=0 templen=0 loglen=0
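To audit this attribute across several disk groups, one option is to read the dg-fail-policy field out of vxdg list. The following is a minimal sketch, not a Symantec-supplied tool: the function names dgfailpolicy_of and check_dgs are hypothetical, and it assumes vxdg is on the PATH and prints the field as a "dg-fail-policy: <value>" line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Storage Foundation).
# Assumption: `vxdg list <diskgroup>` prints a line "dg-fail-policy: <value>".

# Read `vxdg list <diskgroup>` output on stdin and print the policy value.
# Exits non-zero if no dg-fail-policy line is present (for example, on a
# disk group older than version 120).
dgfailpolicy_of() {
  awk '$1 == "dg-fail-policy:" { print $2; found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Warn about each named disk group whose policy is not "leave".
# Returns 1 if any disk group needs attention, 0 otherwise.
check_dgs() {
  rc=0
  for dg in "$@"; do
    policy=$(vxdg list "$dg" | dgfailpolicy_of) || policy=unknown
    if [ "$policy" != "leave" ]; then
      echo "WARNING: $dg has dgfailpolicy=$policy (expected leave)"
      rc=1
    fi
  done
  return $rc
}
```

For example, `check_dgs oradg votedg` (placeholder disk group names) prints a warning for each disk group not set to leave and returns non-zero if it found any, which makes it usable from a monitoring script.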
Products: Storage Foundation for Oracle RAC (SFRAC) or Storage Foundation for Cluster File System (SFCFS)
Platforms: All supported UNIX versions
In a CVM RAC environment where a shared disk group uses a dgfailpolicy of dgdisable, if the master loses connectivity to all disks in the disk group, the master disables the disk group (dgdisable). Because all CVM nodes must have the same view of the configuration as the master, the disk group is then also disabled on every slave node.
Once a disk group is dgdisabled, any new open of a volume in that disk group fails. Opens are attempted, for example:
- When a volume containing a file system is mounted
- When an I/O is attempted against a raw volume device
The consequences can be severe. For example, if Oracle RAC voting disks reside on raw volumes, then as soon as the corresponding disk group is dgdisabled cluster-wide, no node can perform I/O to the voting disks, so no node can heartbeat. As a result, Oracle Cluster Ready Services (CRS) panics all nodes, causing a cluster-wide loss of service.
To avoid this issue, set all shared disk groups of version 120 and higher to use a dgfailpolicy of leave. With this policy, if the master loses connectivity to disks in the disk group, it panics and leaves the cluster rather than disabling the disk group cluster-wide. One of the surviving slave nodes then takes over the master role and, provided the new master has no storage connectivity problems, the surviving members of the cluster continue to function normally.
vxdg -g <diskgroup> set dgfailpolicy=leave
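To apply the command above to every shared disk group at once, a loop such as the following is one possible sketch. It assumes the summary form of vxdg list (columns NAME, STATE, ID, with "shared" appearing in the STATE column of shared disk groups) and should be run on a node allowed to change shared disk group configuration, typically the CVM master (vxdctl -c mode shows a node's role). The function names are hypothetical. Because dgfailpolicy exists only for disk group version 120 and higher, the set may fail on older disk groups; the loop reports such failures and carries on.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Storage Foundation).
# Assumption: `vxdg list` prints a header line, then "NAME STATE ID" rows,
# where STATE is a comma-separated list such as "enabled,shared,cds".

# Read `vxdg list` output on stdin and print the names of shared disk groups.
shared_dgs() {
  awk 'NR > 1 && $2 ~ /(^|,)shared(,|$)/ { print $1 }'
}

# Set dgfailpolicy=leave on every imported shared disk group.
set_leave_on_shared() {
  for dg in $(vxdg list | shared_dgs); do
    echo "Setting dgfailpolicy=leave on $dg"
    # Report, but do not stop on, failures (e.g. disk group version < 120).
    vxdg -g "$dg" set dgfailpolicy=leave || echo "$dg: vxdg set failed" >&2
  done
}
```

Afterwards, vxdg list <diskgroup> should show dg-fail-policy: leave for each shared disk group that was updated.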
This policy persists across reboots.
In SFCFS 6.0 and later, the dgfailpolicy attribute is obsolete. From the SFCFS 6.0 release notes: