
SG goes into Partial state if Native LVMVG is imported and activated outside VCS control

Created: 21 Aug 2014 • Updated: 21 Aug 2014 | 3 comments
This issue has been solved. See solution.

Environment:

  • Cluster: Veritas Cluster Server 6.1.0.000
  • OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
  • Volume Manager: Red Hat LVM
  • Multipathing: Veritas Dynamic Multi Pathing
     

Introduction:

I have a two-node Veritas Cluster on Red Hat. My volumes are under LVM (customer standard).
Since Symantec does not support Device Mapper multipathing for the combination LVM+VCS on Red Hat,
I am using Veritas DMP for multipathing. Now that I finally have it working properly, one issue remains.
The cluster has one failover Service Group on the first node and a second failover Service Group on the second node.
 

Description:

When I reboot both nodes and the cluster starts, my Service Groups are in Partial state,
even though the AutoStart attribute is set to false for both SGs. Every resource in my SGs is offline, except for my LVM Volumes.
VCS is not trying to start any resource, which is correct since AutoStart is disabled.
But the cluster sees all LVM Volumes as online, though not the LVM Volume Groups.
The funny thing is that all LVM volumes (for both SGs) are online on the first node,
even when LastOnline was the second node, and even when the priority for the second SG is set to the second node.
It appears that LVM activates all volumes at boot time, just before the cluster kicks in.
I proved this by stopping the cluster on all nodes, deactivating all LVM volumes manually via the vgchange command,
and starting the cluster again (so no host reboot). That gives the expected behavior: no resources online and the SGs offline.
This operational issue is known to Symantec and described in various recent release notes.
The workaround is rather vague, and I have already tried several options.
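The manual check described above can be sketched as follows. hastop and hastart are the standard VCS commands; the volume group names in the loop are examples, adjust them to the VGs actually used by your Service Groups:

```shell
# Stop VCS cluster-wide (run on any one node; -all propagates to all nodes)
hastop -all

# Manually deactivate the Service Group volume groups
# (VG names are illustrative examples, not from the release notes)
for vg in v2 v3 v4 v5 v6 v8 v11; do
    vgchange -a n "$vg"
done

# Restart VCS without rebooting the host
hastart
```

Because no reboot happens, the boot-time LVM auto-activation never runs, so the SGs come up fully offline.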
 

Resources:

Apparently this is known to Symantec.
See page 38 in "Veritas Cluster Server 6.0.1 Release Notes - Linux"
And see page 30 in "Veritas Cluster Server 6.0.4 Release Notes - Linux"
And see page 47 in "Veritas Cluster Server 6.1 Release Notes - Linux"

SG goes into Partial state if Native LVMVG is imported and activated outside VCS control
If you import and activate LVM volume group before starting VCS, the
LVMVolumeGroup remains offline though the LVMLogicalVolume resource comes
online. This causes the service group to be in a partial state.
Workaround: You must bring the VCS LVMVolumeGroup resource offline manually,
or deactivate it and export the volume group before starting VCS.
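In concrete commands, the two workaround variants quoted above would look roughly like this. The resource, group, node, and VG names are made up for illustration; only the command syntax (hares, vgchange, vgexport) is standard:

```shell
# Variant 1: bring the stale VCS LVMVolumeGroup resource offline manually
# (resource and system names are hypothetical examples)
hares -offline lvmvg_res1 -sys node1

# Variant 2: before starting VCS, deactivate and export the VG natively
vgchange -a n v2
vgexport v2
hastart
```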

 

Question:

Does Symantec have more information or a more detailed procedure for this workaround?
 

Thanks in advance,
Sander Fiers

Comments (3)

Comment from sanderfiers:

These are the options that I already tried, without success:

  • Adjusting the startup scripts
    I tried editing the /etc/rc.d/rc.sysinit file to deactivate the Service Group LVM volumes at boot and to export all LVM volume groups with inactive volumes.
if [ -x /sbin/lvm ]; then
#       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a ay --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a y v1 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v11 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v2 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v3 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v4 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v5 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v6 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v8 --sysinit
        action $"Setting up Logical Volume Management:" /sbin/lvm vgexport -a --sysinit
fi
  • Adjusting the LVM configuration
    I also tried editing the /etc/lvm/lvm.conf file so that only the LVM volumes used for the OS are auto-activated, leaving the Service Group volumes inactive:
    # If auto_activation_volume_list is defined, each LV that is to be
    # activated is checked against the list while using the autoactivation
    # option (--activate ay/-a ay), and if it matches, it is activated.
    #   "vgname" and "vgname/lvname" are matched exactly.
    #   "@tag" matches any tag set in the LV or VG.
    #   "@*" matches if any tag defined on the host is also set in the LV or VG
    #
    # auto_activation_volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
    auto_activation_volume_list = [ "v1", "v1/home", "v1/root", "v1/swap", "v1/tmp", "v1/usr", "v1/var" ]

 

Comment from sanderfiers:

I found out that auto_activation_volume_list in /etc/lvm/lvm.conf depends on the lvmetad service.
Reference: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/metadatadaemon.html

  • So first I enabled lvmetad on both hosts in the /etc/lvm/lvm.conf file and made the service persistent.
[root@bl20-13~]# vi /etc/lvm/lvm.conf
   # If lvmetad has been running while use_lvmetad was 0, it MUST be stopped
   # before changing use_lvmetad to 1 and started again afterwards.
      use_lvmetad = 0
[root@bl20-13 ~]# grep "use_lvmetad = " /etc/lvm/lvm.conf
    use_lvmetad = 1
[root@bl20-13 ~]# service lvm2-lvmetad start
Starting LVM metadata daemon:                              [  OK  ]
[root@bl20-13 ~]# chkconfig lvm2-lvmetad on
[root@bl20-13 ~]# service lvm2-lvmetad status
lvmetad (pid  7619) is running...

 

  • Then I made sure that all LVM volumes (especially the root disks in volume group v1) are activated at boot, on both hosts of course, via the /etc/rc.d/rc.sysinit file.
[root@bl19-13 ~]# vi /etc/rc.d/rc.sysinit
   if [ -x /sbin/lvm ]; then
           action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a ay --sysinit
           action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a y v1 --sysinit
   #       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v11 --sysinit
   #       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v2 --sysinit
   #       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v3 --sysinit
   #       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v4 --sysinit
   #       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v5 --sysinit
   #       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v6 --sysinit
   #       action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a n v8 --sysinit
   fi

 

  • Afterwards I allow only the LVM volumes for my root disks to be activated automatically, via the /etc/lvm/lvm.conf file on both hosts.
[root@bl19-13 ~]# vi /etc/lvm/lvm.conf
    # If auto_activation_volume_list is defined, each LV that is to be
    # activated is checked against the list while using the autoactivation
    # option (--activate ay/-a ay), and if it matches, it is activated.
    #   "vgname" and "vgname/lvname" are matched exactly.
    #   "@tag" matches any tag set in the LV or VG.
    #   "@*" matches if any tag defined on the host is also set in the LV or VG
    #
    # auto_activation_volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
    auto_activation_volume_list = [ "v1", "v1/home", "v1/root", "v1/swap", "v1/tmp", "v1/usr", "v1/var" ]

 

  • I made sure the cluster was stopped properly on both hosts and rebooted them both.
     
  • After my two hosts are up and running again, I get strange warnings about my devices.
    No physical volumes are found on the hosts; only the first node still sees its root volumes.
    Funnily enough, the second node booted correctly but doesn't have any LVM physical volumes at all?!
[root@bl19-13 ~]# pvs
  No device found for PV pRCzeA-SvFh-B0hz-XdcH-75Nh-HqH5-3BhusO.
  No device found for PV NaRuH8-7t2J-5vtd-NJ40-11MC-uaEB-OAcfXH.
  No device found for PV qwqLhF-5zJo-4lhA-bu9W-BqBp-SOt2-fbLc7n.
  No device found for PV HIGyN7-yT68-Mls6-UhJk-1JAp-cpyJ-tDL6F4.
  No device found for PV O8tgMH-l1jR-e6ao-Md0U-N4zI-0ucm-oUEHcc.
  No device found for PV 50BqNZ-SRTP-DNZ3-Gfkk-DxJt-PsGM-dSVBoF.
  No device found for PV 3ivHzC-dHO2-OvSN-qP9h-GPbU-uoa8-r3LEPi.
  PV                   VG   Fmt  Attr PSize   PFree
  /dev/vx/dmp/disk_0s2 v1   lvm2 a--  135.93g 66.93g

[root@bl20-13 ~]# pvs
  No device found for PV pRCzeA-SvFh-B0hz-XdcH-75Nh-HqH5-3BhusO.
  No device found for PV NaRuH8-7t2J-5vtd-NJ40-11MC-uaEB-OAcfXH.
  No device found for PV C7cIfI-hmNt-L6uF-h7rr-1knf-jA92-fojE5j.
  No device found for PV qwqLhF-5zJo-4lhA-bu9W-BqBp-SOt2-fbLc7n.
  No device found for PV O8tgMH-l1jR-e6ao-Md0U-N4zI-0ucm-oUEHcc.
  No device found for PV HIGyN7-yT68-Mls6-UhJk-1JAp-cpyJ-tDL6F4.
  No device found for PV 50BqNZ-SRTP-DNZ3-Gfkk-DxJt-PsGM-dSVBoF.
  No device found for PV 3ivHzC-dHO2-OvSN-qP9h-GPbU-uoa8-r3LEPi.

 

  • Of course I get the same result for the LVM volume groups and logical volumes:
[root@bl19-13 ~]# vgs
  No device found for PV NaRuH8-7t2J-5vtd-NJ40-11MC-uaEB-OAcfXH.
  No device found for PV 50BqNZ-SRTP-DNZ3-Gfkk-DxJt-PsGM-dSVBoF.
  No device found for PV 3ivHzC-dHO2-OvSN-qP9h-GPbU-uoa8-r3LEPi.
  No device found for PV qwqLhF-5zJo-4lhA-bu9W-BqBp-SOt2-fbLc7n.
  No device found for PV pRCzeA-SvFh-B0hz-XdcH-75Nh-HqH5-3BhusO.
  No device found for PV HIGyN7-yT68-Mls6-UhJk-1JAp-cpyJ-tDL6F4.
  No device found for PV O8tgMH-l1jR-e6ao-Md0U-N4zI-0ucm-oUEHcc.
  VG   #PV #LV #SN Attr   VSize   VFree
  v1     1   6   0 wz--n- 135.93g 66.93g

[root@bl20-13 ~]# vgs
  No device found for PV NaRuH8-7t2J-5vtd-NJ40-11MC-uaEB-OAcfXH.
  No device found for PV 50BqNZ-SRTP-DNZ3-Gfkk-DxJt-PsGM-dSVBoF.
  No device found for PV C7cIfI-hmNt-L6uF-h7rr-1knf-jA92-fojE5j.
  No device found for PV 3ivHzC-dHO2-OvSN-qP9h-GPbU-uoa8-r3LEPi.
  No device found for PV qwqLhF-5zJo-4lhA-bu9W-BqBp-SOt2-fbLc7n.
  No device found for PV pRCzeA-SvFh-B0hz-XdcH-75Nh-HqH5-3BhusO.
  No device found for PV O8tgMH-l1jR-e6ao-Md0U-N4zI-0ucm-oUEHcc.
  No device found for PV HIGyN7-yT68-Mls6-UhJk-1JAp-cpyJ-tDL6F4.
  No volume groups found


[root@bl19-13 ~]# lvs
  No device found for PV NaRuH8-7t2J-5vtd-NJ40-11MC-uaEB-OAcfXH.
  No device found for PV 50BqNZ-SRTP-DNZ3-Gfkk-DxJt-PsGM-dSVBoF.
  No device found for PV 3ivHzC-dHO2-OvSN-qP9h-GPbU-uoa8-r3LEPi.
  No device found for PV qwqLhF-5zJo-4lhA-bu9W-BqBp-SOt2-fbLc7n.
  No device found for PV pRCzeA-SvFh-B0hz-XdcH-75Nh-HqH5-3BhusO.
  No device found for PV HIGyN7-yT68-Mls6-UhJk-1JAp-cpyJ-tDL6F4.
  No device found for PV O8tgMH-l1jR-e6ao-Md0U-N4zI-0ucm-oUEHcc.
  LV   VG   Attr      LSize  Pool Origin Data%  Move Log Cpy%Sync Convert
  home v1   -wi-ao---  2.00g
  root v1   -wi-ao---  5.00g
  swap v1   -wi-ao--- 48.00g
  tmp  v1   -wi-ao---  2.00g
  usr  v1   -wi-ao---  8.00g
  var  v1   -wi-ao---  4.00g

[root@bl20-13 ~]# lvs
  No device found for PV NaRuH8-7t2J-5vtd-NJ40-11MC-uaEB-OAcfXH.
  No device found for PV 50BqNZ-SRTP-DNZ3-Gfkk-DxJt-PsGM-dSVBoF.
  No device found for PV C7cIfI-hmNt-L6uF-h7rr-1knf-jA92-fojE5j.
  No device found for PV 3ivHzC-dHO2-OvSN-qP9h-GPbU-uoa8-r3LEPi.
  No device found for PV qwqLhF-5zJo-4lhA-bu9W-BqBp-SOt2-fbLc7n.
  No device found for PV pRCzeA-SvFh-B0hz-XdcH-75Nh-HqH5-3BhusO.
  No device found for PV O8tgMH-l1jR-e6ao-Md0U-N4zI-0ucm-oUEHcc.
  No device found for PV HIGyN7-yT68-Mls6-UhJk-1JAp-cpyJ-tDL6F4.
  No volume groups found

Cluster results:

  • My two Service Groups are offline as expected since AutoStart is set to false on both.
  • I can online my 2nd Service Group on the second node, and the LVM commands can see the physical volumes, volume groups, and logical volumes used by the 2nd Service Group. Strangely enough, my LVM volumes for the root disks are now listed as well.
  • I cannot fail over my 2nd Service Group to the first node: the cluster fails to online the LVM Volume Group resources on the 1st node.
  • I cannot online my 1st Service Group on the first node: the cluster fails to online the LVM Volume Group resources on the 1st node.
  • I cannot online my 1st Service Group on the second node: the cluster fails to online the LVM Volume Group resources on the 2nd node.
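One thing worth checking here (my own assumption, not from the release notes): with use_lvmetad = 1, LVM only knows about block devices that pvscan/udev have pushed into the lvmetad cache. The DMP device nodes under /dev/vx/dmp appear late in boot, so the cache may simply be missing them, which would explain the "No device found for PV" messages. Repopulating the cache manually may make the PVs visible again:

```shell
# Rescan all block devices and push the results into the lvmetad cache
pvscan --cache

# Then check whether the PVs on the DMP devices are visible again
pvs
```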

Any ideas?

 

Comment from sanderfiers:

I found the cause of my issue. The Veritas Volume Manager boot script deactivates all LVM volumes and exports all LVM volume groups, only to import and activate them again afterwards.

I commented out the export, import, and re-activation commands in that second section.

 

Original snippet of the file:

[root@bl19-13]# vi /etc/rc3.d/S14vxvm-boot
                vgchange -a n $vg
                vgexport $vg
                vgimport $vg
                vgchange -a y $vg

New snippet of the file:

[root@bl19-13]# vi /etc/rc3.d/S14vxvm-boot
                vgchange -a n $vg
                #vgexport $vg
                #vgimport $vg
                #vgchange -a y $vg
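After editing the boot script on both nodes, a quick sanity check (using the standard LVM and VCS status commands) is to reboot and confirm that nothing except the root volume group is active and that both Service Groups report fully OFFLINE rather than PARTIAL:

```shell
# Only the root VG (v1 in this cluster) should have active LVs after boot;
# the 'a' bit in lv_attr should be unset for all Service Group volumes
lvs -o vg_name,lv_name,lv_attr

# Both Service Groups should report OFFLINE on every node, not PARTIAL
hagrp -state
```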

 

SOLUTION