
How to stop Windows mounting read-only VVR volumes

Created: 20 Sep 2011 • Updated: 28 Sep 2011 | 4 comments
mikebounds
This issue has been solved. See solution.

If you kill a server and VVR does a takeover on another node, then when the failed node comes back, Windows assigns drive letters to the now read-only volumes as soon as the diskgroup is imported.  This means that in VCS, after the VVR service group is onlined (and hence the diskgroup imported), the offline monitor for the MountV resources runs a few minutes later, detects these volumes as online, and you get a partially online application service group.  You can't then fail back to that node until you manually offline the application service group.  Is there a formal way to stop this, or are there technotes explaining this issue?

 

This is what I believe is happening:

When you online a MountV resource in VCS and a drive letter is assigned to a volume, the assignment is written to the registry under \HKLM\SYSTEM\MountedDevices, so you will see an entry like:

\DosDevices\E:

These registry keys exist for all drives, including C:, so this covers partitions as well as volumes, and the keys are present on XP clients too.

When you offline the MountV resource, this registry entry is removed.

This makes sense: on a server without VCS, if you reboot, then when the server returns each volume or partition receives the same drive letter it had before.
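That persistence behaviour can be sketched as a toy model — this is purely illustrative, with a Python dict standing in for the HKLM\SYSTEM\MountedDevices registry key, and all function names are hypothetical:

```python
# Toy model of Mount Manager drive-letter persistence.
# A dict stands in for the HKLM\SYSTEM\MountedDevices registry key.

mounted_devices = {}  # e.g. {r"\DosDevices\E:": "volume-guid-1"}

def online_mountv(letter, volume_id):
    """VCS onlines a MountV resource: the letter is written to the registry."""
    mounted_devices[rf"\DosDevices\{letter}:"] = volume_id

def offline_mountv(letter):
    """A clean VCS offline removes the registry entry."""
    mounted_devices.pop(rf"\DosDevices\{letter}:", None)

def letters_assigned_on_import(volume_ids):
    """On diskgroup import, Mount Manager re-assigns any letter whose
    registry entry still exists (e.g. because the server was killed)."""
    return {key for key, vol in mounted_devices.items() if vol in volume_ids}

# Clean offline: the entry is removed, so nothing is re-mounted on import.
online_mountv("E", "vol1")
offline_mountv("E")
assert letters_assigned_on_import({"vol1"}) == set()

# Killed server: the entry survives, so the read-only volume gets E: back.
online_mountv("E", "vol1")
assert letters_assigned_on_import({"vol1"}) == {r"\DosDevices\E:"}
```

The second case is exactly the VVR scenario: the killed Primary never ran the offline, so the entry survives and the letter comes back on import.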

With shared-diskgroup failover, when a faulted node returns, the diskgroup is not imported (it is now imported on the failover node), so this does not cause a problem.  It is only an issue with VVR, where the diskgroup is imported on the Secondary (or acting Secondary) as well as the Primary.

I wrote a quick postonline trigger for the VVR group to offline the MountV resources when VVR is a Secondary, but I figured there must be an official workaround or a technote describing this issue.

Note, I see this more as an SFW problem than a VVR issue, as it does not affect VVR on UNIX, which is why I have started this thread under SFW rather than VVR.

Thanks

Mike

Comments

Marianne wrote:

I'm wondering if the same advice for VMware backups will work for the VVR volumes...

http://www.techrepublic.com/blog/datacenter/disabl...

diskpart
automount disable
automount scrub
exit

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

mikebounds wrote:

Thanks for the response Marianne.  I don't think this will help, as it sounds like a setting that assigns drive letters to NEW drives.  This is not a NEW drive; it is a drive letter that was there the last time the server was running.  As far as I can tell, this is expected behaviour in Windows: if you assign a drive letter in VEA, Windows Disk Management, or VCS, an entry is added to the registry.  Because the entry is in the registry, if a server is killed, then when it returns the registry entry still exists and the drive letter is assigned.  This is totally unlike UNIX, where mounting a filesystem is NOT persistent, and only becomes persistent if you perform the separate step of adding the filesystem to fstab.

This does not cause an issue for shared-diskgroup failover: even though the drive-letter assignment exists in the registry on the returning failed node, the volume cannot be mounted because the diskgroup is not imported.  But on a VVR Secondary the diskgroup is imported, so the drive letter is assigned to the read-only volume when a previously failed Primary returns.

This causes Concurrency errors in VCS which are not resolved automatically in GCO (I think they would be resolved by VCS if this was an RDC), so I just wanted to know whether there is an official workaround, or whether there is something clever in VCS that sorts this out and is for some reason not working.

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

Wally_Heim wrote:

Hi Mike,

You are correct: automount only applies to volumes that are new to the server.  In your case, the drive is being returned to the last known mount location of the volume.  It is standard behaviour for Mount Manager (an OS component) to return the volume to the drive letter or mount point it had when the diskgroup was last deported.

In the situation that you describe, you will need to take corrective action to resync VVR back to the original Primary site.  During a takeover operation, VVR (by default) switches the original Primary to a Primary acting as Secondary, and replication does not resume until administrative intervention is performed.  If manual intervention is already needed to correct VVR, is it that much trouble to manually offline the MountV resource at the original Primary site?  (I'm just trying to see how much of a pain point this is.  I'm not trying to dictate procedures or process to you or your customer; I'm just trying to get a better understanding of exactly what the main point of concern is.)
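The role transitions around a takeover can be sketched as a tiny state machine — illustrative only, with simplified names; the real transitions are driven by VVR commands (takeover and failback sync), not by this code:

```python
# Illustrative model of VVR roles around a takeover. Site and key names
# are hypothetical; the point is that replication stops after a takeover
# and only resumes after a manual failback sync.

def takeover(roles):
    """Secondary takes over: the old Primary becomes a 'Primary acting
    as Secondary', and replication stops pending admin intervention."""
    roles = dict(roles)
    roles["siteB"] = "primary"            # former Secondary is now Primary
    roles["siteA"] = "acting-secondary"   # former Primary, after it returns
    roles["replicating"] = False
    return roles

def failback_sync(roles):
    """Manual failback sync: resynchronise the original Primary from the
    new Primary, after which replication resumes."""
    roles = dict(roles)
    roles["replicating"] = True
    return roles

state = {"siteA": "primary", "siteB": "secondary", "replicating": True}
state = takeover(state)
assert state["siteA"] == "acting-secondary" and not state["replicating"]
state = failback_sync(state)
assert state["replicating"]
```

The window between the two calls is exactly when the returning node's read-only volumes can pick up their old drive letters.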

I don't think the violation.pl trigger is fired for GCO site concurrency violations.  I can check whether there is a way to get this trigger fired in this situation.  If it is not a simple configuration change then I might need you to open a support case to have this looked into further.  I'll let you know what I find out shortly.

Thanks,

Wally

SOLUTION
mikebounds wrote:

Thanks Wally, you have outlined exactly what I thought (but I wasn't sure as I work predominantly on UNIX):

  1. The behaviour is normal.  I thought it was an O/S operation, so good to know it is more specifically Mount Manager.
     
  2. I told the customer that as they had to do fbsync manually, they just needed to amend their procedures to offline the MountV resources.  However, their procedure is in an install guide written by a consultant, which I don't think they knew was there, and as they have only invoked DR once in 2 years, there is a good chance that next time they invoke DR they won't know where this procedure is.  Also, you can get the RVGPrimary agent to do fbsync automatically by setting the AutoResync option, although I always advise customers NOT to set this.  So I wrote a postonline trigger so that the MountV resources are offlined automatically, and left the customer with the choice of whether to implement the postonline trigger (it was successfully tested in a test cluster) or offline manually.
     
  3. The violation trigger is only called within a cluster (so it would be called for an RDC) but not for GCO; you still get a Concurrency error in the engine log.

I really just wanted confirmation that this is expected behaviour - it would be good if a technote could be written explaining the issue and its resolution (manually offline as part of the fbsync recovery procedure) so that I could point the customer to it and reassure them this is not a bug.  But I will now tell the customer that Symantec have confirmed this is not a bug.

I don't think Symantec should amend VCS to call the violation trigger for GCO; I think it is right that it is only called for violations within a cluster.  I think a postonline trigger is the best automatic solution, if one is deemed necessary.  My postonline trigger had the following logic (which I will probably post in Downloads later this week):

  1. Only run for SGs containing a VvrRvg resource
  2. Only run if that VvrRvg is a secondary or acting secondary
  3. Probe and Offline MountV resources in the parent SG.

So basically, when the replication group onlines, the diskgroup has just been imported, but the MountV resources won't have been probed yet.  The postonline trigger probes the MountV resources, waits a few seconds, and then offlines any MountV resources that report online.
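The three steps above can be sketched in Python — a minimal model only, with hypothetical group/resource names and callbacks standing in for the hares probe/offline commands a real trigger would run:

```python
# Sketch of the postonline trigger logic described above. The cluster
# dict, group names, and callbacks are all hypothetical; a real VCS
# trigger would shell out to hares to probe and offline resources.
import time

def postonline(group, cluster, offline_fn, probe_fn):
    # 1. Only run for service groups containing a VvrRvg resource.
    rvgs = [r for r in cluster[group] if r["type"] == "VvrRvg"]
    if not rvgs:
        return []
    # 2. Only run if the VvrRvg is a Secondary or acting Secondary.
    if rvgs[0]["role"] not in ("secondary", "acting-secondary"):
        return []
    # 3. Probe the MountV resources in the parent group, wait, then
    #    offline any that report ONLINE (read-only mounts from the import).
    mounts = [r for r in cluster[rvgs[0]["parent_group"]]
              if r["type"] == "MountV"]
    for m in mounts:
        probe_fn(m["name"])
    time.sleep(0)  # a real trigger waits a few seconds for probes to finish
    offlined = []
    for m in mounts:
        if m["state"] == "ONLINE":
            offline_fn(m["name"])
            offlined.append(m["name"])
    return offlined

# Demo: an acting-Secondary RVG whose parent group has one mounted volume.
probed, offlined = [], []
cluster = {
    "vvr_sg": [{"type": "VvrRvg", "name": "rvg1",
                "role": "acting-secondary", "parent_group": "app_sg"}],
    "app_sg": [{"type": "MountV", "name": "mnt_e", "state": "ONLINE"},
               {"type": "MountV", "name": "mnt_f", "state": "OFFLINE"}],
}
result = postonline("vvr_sg", cluster, offlined.append, probed.append)
assert result == ["mnt_e"]
```

Only the MountV that came up online from the import gets offlined; on a true Primary the trigger exits at step 2 and touches nothing.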

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below