Rebuilding MSCS cluster node

Created: 28 Feb 2010 • Updated: 21 May 2010 | 4 comments
This issue has been solved. See solution.

I found a KB document detailing the steps needed to rebuild a cluster node when using SFW HA for Windows, but I need to rebuild a Win2003 cluster node that is running SFW 5.0 with MSCS. If the cluster weren't using SFW, I would just evict the node from the cluster, rebuild the OS on it, and add it back to the cluster. What is the correct procedure for SFW with MSCS? The cluster quorum is a Volume Manager disk resource, so I can't add the node back to the cluster before installing SFW with the MSCS option, even though joining the cluster first and then installing SFW is the usual order when creating a new cluster from scratch. Is the rebuild procedure:

1. Evict the node
2. Install Win2003 on node
3. Install SFW with MSCS option
4. Add node to cluster

Wally_Heim

Hi Nick,

The MSCS option will not install until the node is part of an MSCS cluster. Once you evict the node, you need to follow the same steps that you do when adding a node to the cluster.

You can add the node to the cluster even with the quorum being dynamic. The quorum will be running on a node that is still in the cluster with SFW on it.

Here are the basic steps that you need to do:

1. Evict the node
2. Install Win2003 on node
3. Add node to cluster
4. Install SFW with MSCS option
5. Reinstall any clustered applications as needed.

Thanks,
Wally

SOLUTION
Nick Payne

I followed your procedure above and had no problems during the rebuild itself. However, after step 4 and a reboot, when I start VEA on the rebuilt node, all the SAN disks are visible but none of the disk groups that exist in VEA on the node that remained in the cluster are visible. Trying to fail the cluster over to the rebuilt node just results in the cluster going offline and then coming back online on the same node. If I move the quorum back to a basic disk, then I can move the cluster group to the rebuilt node, but none of the dynamic disk groups will import on the rebuilt node.

So it looks as though there are more configuration steps needed when recovering a node on a Windows cluster that is using SFW.

Nick

Wally_Heim

Hi Nick,

The newly rebuilt node does not know anything about the disk groups in its isis database. This is because SCSI reservations are preventing the rebuilt node from reading the disks during boot time.

To resolve this, you will need to offline the service group on the active node and then perform a rescan on the rebuilt node. You should then be able to see the disk groups and online the service groups on the rebuilt node.
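
As a rough sketch of that sequence (not an official procedure), the Python snippet below only builds the command lines and prints them; it does not run anything. It assumes the MSCS `cluster.exe` command-line tool and the SFW `vxassist rescan` command are available on the nodes, and it treats the "service group" as an MSCS cluster group. The group name "SQL Group" and node name "NODE2" are made-up placeholders.

```python
def rescan_plan(group, rebuilt_node):
    """Return (where_to_run, command) pairs for the offline/rescan/online
    sequence. Nothing is executed; the names passed in are placeholders."""
    return [
        # Take the group offline on the node that currently owns it,
        # so the disks are no longer reserved by that node.
        ("active node", f'cluster group "{group}" /offline'),
        # Rescan on the rebuilt node so SFW can read the disk groups.
        (rebuilt_node, "vxassist rescan"),
        # Move the group to the rebuilt node and bring it online there.
        (rebuilt_node, f'cluster group "{group}" /moveto:{rebuilt_node}'),
        (rebuilt_node, f'cluster group "{group}" /online'),
    ]


if __name__ == "__main__":
    for where, command in rescan_plan("SQL Group", "NODE2"):
        print(f"[{where}] {command}")
```

Review the printed commands against your own group and node names before running any of them by hand.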

Once this is done the first time you won't have any more problems.

The only other issue that you might have is that the drive letters do not get assigned correctly during the first import of the disk groups. You may need to import the disk groups and then manually assign the drive letters to the volumes. Again, this is a one-time step to set up the mount manager information correctly.

Thanks,
Wally

Nick Payne

Actually, I found that the problem is that when the newly built node without SFW installed is added to the cluster, MSCS doesn't add the node as a possible owner of any of the Volume Manager Disk Group resources. It does add it as an owner of all the other resources.

After installing SFW and rebooting the node, you have to use Cluster Administrator to add the node as a possible owner of each Volume Manager Disk Group resource. After this, failover to the newly built node works correctly.
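
For anyone who prefers the command line to Cluster Administrator, here is a minimal sketch of the same step, assuming the `cluster.exe` tool that ships with MSCS and its `/listowners` and `/addowner` resource options. The Python below only generates the command lines for review; the resource names and the node name are hypothetical placeholders, so substitute the names of your own Volume Manager Disk Group resources.

```python
def add_owner_commands(resources, node):
    """Build cluster.exe command lines that add `node` as a possible owner
    of each named resource, then list the owners to confirm the change.
    Nothing is executed; the strings are returned for review."""
    commands = []
    for name in resources:
        commands.append(f'cluster resource "{name}" /addowner:{node}')
        commands.append(f'cluster resource "{name}" /listowners')
    return commands


if __name__ == "__main__":
    # Hypothetical resource and node names, for illustration only.
    disk_group_resources = ["DG1 Disk Group", "Quorum Disk Group"]
    for command in add_owner_commands(disk_group_resources, "NODE2"):
        print(command)
```

Generating the commands rather than running them keeps the sketch safe to try on any machine; paste the output into a command prompt on a cluster node once the names look right.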

There really needs to be a KB article on this.

Nick

ps. This was with SFW 5.1 SP1.