Cannot find disks on slave node after either rebooting or device rescanning.

Article:TECH158069  |  Created: 2011-04-14  |  Updated: 2011-04-14  |  Article URL http://www.symantec.com/docs/TECH158069
Article Type
Technical Solution


Issue



After the slave node was rebooted, or after an operation such as a device rescan, the slave node failed to join the cluster with a "Cannot find disk" error.

As a workaround, whenever the problem occurs the node recovers if the customer runs hastop -local and hastart, and then vxdctl enable.
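The workaround can be sketched as a small shell function. This is a minimal sketch, not a supported procedure: the function names run and recover_slave_node and the DRY_RUN variable are hypothetical, and the command order simply follows the order in which the commands are listed above. Only hastop -local, hastart and vxdctl enable come from this article.

```shell
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Recovery sequence from this article, in the order the commands are listed.
# Run as root on the affected slave node. Stops at the first failing command.
recover_slave_node() {
    run hastop -local || return 1   # stop VCS on this node only
    run hastart       || return 1   # start VCS on this node again
    run vxdctl enable               # have vxconfigd rescan for disks
}
```

Setting DRY_RUN=1 before calling recover_slave_node prints the three commands without executing them, which is a safe way to review the sequence first.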


Error



From the engine_A.log:

2011/03/31 19:01:19 VCS ERROR V-16-10001-1005 (xxxx)
CVMCluster:???:monitor:node - state: out of cluster reason: Cannot find disk on
slave node: retry to add a node failed
 

From the messages:

Mar 31 19:01:11 xxxx vxio: [ID 567674 kern.notice] NOTICE: VxVM vxio V-5-3-0
joinsio_done: Overlapping reconfiguration, failing the join for node 1. The
join will be retried.
Mar 31 19:01:11 xxxx vxio: [ID 317193 kern.notice] NOTICE: VxVM vxio V-5-3-0
abort_joinp: aborting joinp for node 1 with err 17
Mar 31 19:01:11 xxxx vxvm:vxconfigd: [ID 702911 daemon.notice] V-5-1-12144
CVM_VOLD_JOINOVER command received with error
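
To check whether a node is hitting the same signatures, the two logs can be searched for the strings quoted above. The following is a sketch under assumptions: the function names are hypothetical, and the default paths /var/VRTSvcs/log/engine_A.log and /var/adm/messages are the usual locations on Solaris but may differ on a given system.

```shell
# Print how many lines of a log file match a pattern (basic regular expression).
# Prints 0 if the file is unreadable or nothing matches.
count_matches() {
    if [ -r "$2" ]; then
        grep -c "$1" "$2" || true
    else
        echo 0
    fi
}

# Summarize the two signatures quoted in this article.
# Default paths are assumptions: the usual VCS engine log and Solaris syslog file.
report_join_failures() {
    engine_log=${1:-/var/VRTSvcs/log/engine_A.log}
    messages=${2:-/var/adm/messages}
    printf 'cannot_find_disk='
    count_matches 'Cannot find disk' "$engine_log"
    printf 'abort_joinp='
    count_matches 'abort_joinp' "$messages"
}
```

Non-zero counts for both strings around the same timestamp suggest the join failure described in this article, although the strings alone do not identify the root cause.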
 


Environment



2-node SFRAC 5.0 MP3RP4 cluster
Solaris 10 SPARC
Hitachi HDS9970V


Cause



We noticed that all shared disks on the slave node were in the error state. Before a node joins the cluster, the shared disks must be accessible from that node.

c1t50060E80035BDF10d4s2 auto            -            -            error

Even prtvtoc could not read the OS raw disk.
prtvtoc: /dev/rdsk/c1t50060E80035BDF10d4s2: Unable to read Disk geometry errno = 0x16

This looks like an array-side problem. It is abnormal that the OS raw device cannot be read; there may be a delay in device recognition.
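
The two checks above (disks in the error state, then raw-device readability) can be sketched as a pipeline. The function names are hypothetical. The sketch assumes that the STATUS column is the fifth field of the vxdisk list output, as in the line quoted above, and that each device name maps directly to /dev/rdsk/<name>.

```shell
# Read `vxdisk list` output on stdin and print the DEVICE column of every
# disk whose STATUS column (fifth field) is exactly "error".
list_error_disks() {
    awk '$5 == "error" { print $1 }'
}

# For each device name on stdin, try to read its label with prtvtoc.
# Assumes the device name maps directly to /dev/rdsk/<name>, as in this article.
check_raw_devices() {
    while read dev; do
        if prtvtoc "/dev/rdsk/$dev" >/dev/null 2>&1; then
            echo "$dev: readable"
        else
            echo "$dev: NOT readable"
        fi
    done
}
```

On the slave node this would be used as: vxdisk list | list_error_disks | check_raw_devices. A disk reported as not readable here cannot be read by the OS itself, so the problem lies below VxVM.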


Solution



When we double-checked whether the required conditions were met, we found an unsupported configuration on the array side with respect to the requirements described in our HCL. The array's system mode option was set to “186” only, whereas the HCL recommends “186, 254”. The customer had a Hitachi engineer change the setting to “186, 254”. We then repeated the same test; the issue did not recur and everything worked as expected. We therefore concluded that the problem was not caused by our product.
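
The comparison between the configured and the recommended system modes can be illustrated with a small helper. This is only a sketch: missing_modes is a hypothetical function, and reading the configured modes from the array is not covered here because it depends on the array vendor's tools. The values 186 and 254 are the ones given in this article.

```shell
# Hypothetical helper: print each required mode that is missing from the
# configured list. Both arguments are space-separated lists of mode numbers.
#   $1 = modes configured on the array, $2 = modes required by the HCL
missing_modes() {
    for required in $2; do
        found=0
        for configured in $1; do
            if [ "$configured" = "$required" ]; then
                found=1
            fi
        done
        if [ "$found" = "0" ]; then
            echo "$required"
        fi
    done
}
```

For the configuration in this case, missing_modes "186" "186 254" prints 254, the mode that had to be added on the array.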



