CVMVolDg resource fails to return ONLINE status, despite the diskgroup being imported and properly activated, and all volumes listed in the CVMVolume attribute being ENABLED/ACTIVE and able to service I/O

Article:TECH69443  |  Created: 2009-01-06  |  Updated: 2009-01-06  |  Article URL http://www.symantec.com/docs/TECH69443
Article Type
Technical Solution

Issue



CVMVolDg resource fails to return ONLINE status, despite the diskgroup being imported and properly activated, and all volumes listed in the CVMVolume attribute being ENABLED/ACTIVE and able to service I/O

Solution



If a CVMVolDg resource's monitor invocation times out and the agent subsequently calls the clean entry point, you may need to recreate a "stat" file and manually restart vxnotify to return the resource to an ONLINE state.
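
Before applying either repair, you can check whether the two lock files the agent maintains are present. This is a minimal sketch; "MyCVMVolDgResourceName" and "MyDiskgroupName" are placeholders to replace with your values:

root# ls -l /var/VRTSvcs/lock/MyCVMVolDgResourceName_MyDiskgroupName_stat
root# ls -l /var/VRTSvcs/lock/MyCVMVolDgResourceName_MyDiskgroupName_pid

If the "_stat" file is missing, recreate it as shown in the first section below. If the "_pid" file is missing, or references a process that is no longer running, restart vxnotify as shown in the second section.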

"stat" file:

From the CVMVolDg agent's /opt/VRTSvcs/bin/CVMVolDg/clean script:

   16  cvmvoldg_resname=$1
   17  shift 2
   18  . /opt/VRTSvcs/bin/CVMVolDg/cvmvoldg.lib
   19
   20  cvmvoldg_dgname=$1
   21  VCS_VOL_STAT="/var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_stat"
   22  CVMVOLDG_STAT=voldg_stat
   23
   24  $RM $VCS_VOL_STAT    <---
   25
   26  exit 0

Notice that line 24 removes the stat file.  This file is [re]created by the /opt/VRTSvcs/bin/CVMVolDg/online script:

   96  # setup vxstat
   97  cvmvoldg_setup_stat

From the /opt/VRTSvcs/bin/CVMVolDg/cvmvoldg.lib file:

   98  VCS_VOL_STAT="/var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_stat"
...
 1428  cvmvoldg_setup_stat() {
 1429          _funcname=cvmvoldg_setup_stat
 1430          echo_trace_enter
 1431
 1432          $ECHO 1 > $VCS_VOL_STAT   <---
 1433          $WHO -b > $WHO_STAT
 1434          echo_trace_exit 0
 1435          return
 1436  }

While you could take the service group completely offline and then bring it back online, so that the online script recreates the file, you may instead recreate the file manually, as Support sometimes requests:

## for each of the following commands, please replace "MyCVMVolDgResourceName", "MyDiskgroupName", "MyFirstSystemName", etc. with your values:

root# echo 1 > /var/VRTSvcs/lock/MyCVMVolDgResourceName_MyDiskgroupName_stat  
root# cat /var/VRTSvcs/lock/MyCVMVolDgResourceName_MyDiskgroupName_stat
1
root# hares -clear MyCVMVolDgResourceName -sys MyFirstSystemName
root# hares -clear MyCVMVolDgResourceName -sys MySecondSystemName
root# hares -probe MyCVMVolDgResourceName -sys MyFirstSystemName
root# hares -probe MyCVMVolDgResourceName -sys MySecondSystemName
root# hares -display MyCVMVolDgResourceName -attribute State
#Resource              Attribute System             Value
MyCVMVolDgResourceName State     MyFirstSystemName  ONLINE
MyCVMVolDgResourceName State     MySecondSystemName ONLINE
root#



vxnotify process:


From the CVMVolDg agent's /opt/VRTSvcs/bin/CVMVolDg/cvmvoldg.lib script:

  103  VXNOTIFY_PID="/var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_pid"
...
 1083  # cvmvoldg_setup_vxnotify : set up vxnotify to run in the background and
 1084  # redirect all output from vxnotify into a file. Then save the sum of the
 1085  # file as a cookie that can be read by later invocations of the monitor
 1086  # script.
 1087  # There is no file locking between vxnotify and taking the sum, since vxnotify
 1088  # does not understand the file locking semantics. Thus, to overcome any problems
 1089  # we do the sum twice. If anything changed, we restart vxnotify. This is still not
 1090  # foolproof. So we do that in a loop.
 1091  # This takes one (1) argument: the name of the variable
 1092  # where the result is returned. It will be non-zero if any error occurred.
 1093  cvmvoldg_setup_notify() {
...
 1105    # if the VXNOTIFY_PID file exists then kill the old process using signal
 1106    # 9. We don't want a team of vxnotifies here. Of course, if we do this,
 1107    # it is an event that begs for a log msg.
 1108    if [ -r $VXNOTIFY_PID ] ; then
 1109      _cosn_old_pid=`$CAT $VXNOTIFY_PID`
 1110
 1111      # before killing the process, make sure that it exists
 1112      # and is vxnotify
 1113      $KILL -0 $_cosn_old_pid 2>/dev/null ; _cosn_stat=$?
 1114      if [ $_cosn_stat -eq 0 ] ; then
 1115        $PS -fp $_cosn_old_pid | $GREP vxnotify >/dev/null 2>&1 ; _cosn_stat=$?
 1116        if [ $_cosn_stat -eq 0 ] ; then
 1117          _cosn_kill=1
 1118        fi
 1119      fi
 1120      if [ $_cosn_kill -eq 1 ] ; then
 1121        $KILL -9 $_cosn_old_pid 2>/dev/null ; _cosn_stat=$?
 1122        if [ $_cosn_stat -ne 1 ] ; then
 1123        VCSAG_LOG_MSG "W" "setup_vxnotify: old vxnotify of pid $_cosn_old_pid will be killed. my pid is $$" 1074 $_cosn_old_pid $$
 1124        fi
 1125      fi
 1126    fi
 1127
 1128    # clean up any previous vxnotify files.
 1129
 1130    $RM -f $VCS_VXNOTIFY_FILE
 1131    $RM -f $VXNOTIFY_PID
...
 1138    _cosn_pid=`$VXNOTIFY -g $cvmvoldg_dgname >$VCS_VXNOTIFY_FILE & echo $!`
 1139
 1140    # give vxnotify enough time
 1141    $SLEEP 2
 1142
 1143    # Sometimes vxnotify returns with an error which is not handled above.
 1144    # Retry once more if the pid does not exist.
 1145    $KILL -0 $_cosn_pid 2>/dev/null ; _cosn_stat=$?
 1146    if [ $_cosn_stat -ne 0 ] ; then
 1147      $RM -f $VCS_VXNOTIFY_FILE
 1148      _cosn_pid=`$VXNOTIFY -g $cvmvoldg_dgname >$VCS_VXNOTIFY_FILE & echo $!`
 1149      echo_debug "Restarting vxnotify. New PID is $_cosn_pid."
 1150      $SLEEP 2
 1151    fi
...
 1198    $RM -f $VXNOTIFY_PID
 1199    $ECHO $_cosn_pid > $VXNOTIFY_PID


The function above kills the per-diskgroup vxnotify process (a separate process runs for each imported diskgroup that is managed by VCS) and attempts to restart it, storing the new process ID in a file named "/var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_pid".
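
You can perform the same liveness check by hand that the agent performs, mirroring its kill -0 and ps -fp logic shown above. A minimal sketch, using the placeholder names from earlier in this article:

root# pid=`cat /var/VRTSvcs/lock/MyCVMVolDgResourceName_MyDiskgroupName_pid`
root# kill -0 $pid 2>/dev/null && ps -fp $pid | grep vxnotify

If the second command prints nothing, the recorded PID is stale or the process is not vxnotify, and vxnotify must be restarted as shown below.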

This function is called by both the online and the monitor scripts. Focusing on the latter:

   74  # if something happened to notify, we will have to restart it. However, that
   75  # means that we will also have to do all verification that was done in the
   76  # online script again.
   77
   78  # we check for all initial conditions, like dgs being imported etc, and then
   79  # start vxnotify. There is a hole in there that the conditions can change.
   80  # Fix that.
   81
   82  cvmvoldg_check_notify_health _cvm_res
   83  if [ $_cvm_res -ne 0 ] ; then
...
  101          cvmvoldg_setup_notify _cvm_res
  102          if [ $_cvm_res -ne 0 ] ; then
  103                  VCSAG_LOG_MSG "E" "Can not restart vxnotify. Failed" 1038
  104                  exit $CVMVOLDG_MONITOR_FAILURE
  105          fi

Solution:

## for each of the following commands, please replace "MyCVMVolDgResourceName", "MyDiskgroupName", "MyFirstSystemName", etc. with your values:

If the per-diskgroup vxnotify process is not running, you can start it from the command line (background it so that your shell prompt returns):

root# ps -elf |grep "vxnotify -g"
root#
root# /usr/sbin/vxnotify -g MyDiskgroupName &
root#

You can then echo its process ID into the /var/VRTSvcs/lock/${cvmvoldg_resname}_${cvmvoldg_dgname}_pid file:

root# ps -elf |grep "vxnotify -g"
0 S     root 16255     1   0  40 20        ?    297        ?   Feb 20 ?           0:00 /usr/sbin/vxnotify -g MyDiskgroupName
root# echo 16255 > /var/VRTSvcs/lock/MyCVMVolDgResourceName_MyDiskgroupName_pid
root# cat /var/VRTSvcs/lock/MyCVMVolDgResourceName_MyDiskgroupName_pid
16255
root# hares -clear MyCVMVolDgResourceName -sys MyFirstSystemName
root# hares -clear MyCVMVolDgResourceName -sys MySecondSystemName
root# hares -probe MyCVMVolDgResourceName -sys MyFirstSystemName
root# hares -probe MyCVMVolDgResourceName -sys MySecondSystemName
root# hares -display MyCVMVolDgResourceName -attribute State
#Resource              Attribute System             Value
MyCVMVolDgResourceName State     MyFirstSystemName  ONLINE
MyCVMVolDgResourceName State     MySecondSystemName ONLINE
root#
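
For convenience, the two manual repairs can be combined into a single script. The following is an unofficial sketch, not part of the shipped agent; all names are placeholders, and you should verify each step on your own cluster before use:

#!/bin/sh
# Unofficial sketch combining the two manual repairs described above.
# Replace these placeholder values with your own:
RES=MyCVMVolDgResourceName
DG=MyDiskgroupName
SYSTEMS="MyFirstSystemName MySecondSystemName"
LOCKDIR=/var/VRTSvcs/lock

# 1. Recreate the stat file that the clean script removed.
echo 1 > $LOCKDIR/${RES}_${DG}_stat

# 2. If the per-diskgroup vxnotify is not running, restart it (as in the
#    manual steps above) and record its PID where the monitor script expects it.
#    The PID is field 4 of "ps -elf" output; the [v] trick excludes grep itself.
pid=`ps -elf | grep "[v]xnotify -g $DG" | awk '{print $4}'`
if [ -z "$pid" ] ; then
    /usr/sbin/vxnotify -g $DG &
    pid=$!
fi
echo $pid > $LOCKDIR/${RES}_${DG}_pid

# 3. Clear any FAULTED state and re-probe the resource on each system.
for sys in $SYSTEMS ; do
    hares -clear $RES -sys $sys
    hares -probe $RES -sys $sys
done
hares -display $RES -attribute State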



Summary:

During a severe system performance problem (the possible causes of which are outside the scope of this Technote), the CVMVolDg monitor routine may be timed out by the CVMVolDgAgent daemon, which may result in the clean entry point being called for that resource.  Depending upon timing, it is possible for the contents of the aforementioned $VCS_VOL_STAT and $VXNOTIFY_PID files to become inaccurate.

Until those two discrepancies are resolved, the CVMVolDg resource will not return to ONLINE, despite the fact that the shared diskgroup is imported and properly activated, and all volumes listed in the resource's CVMVolume attribute are confirmed to be ENABLED/ACTIVE and fully capable of servicing I/O.

This Technote documents an alternative method for recreating those files with the correct content, without having to offline and re-online the service group to re-run the online entry point.

Please note there may be additional circumstances, not outlined in this Technote, that can still prevent a previously faulted CVMVolDg resource from returning to the ONLINE state.  This Technote describes only two of them in particular, as they are commonly observed by our Support personnel.

If the above troubleshooting does not resolve your issue, please open a Support case.  You may also review the shell scripts mentioned above to determine which condition is preventing the resource from being declared ONLINE; once the monitor script's conditions are met such that it exits with status 110, VCS will report the resource as ONLINE.
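
If you want to see exactly which check the monitor script is failing, you can run the entry point by hand under shell tracing and inspect its exit status. This is a hedged sketch: the argument order shown (resource name first, followed by the ArgList values beginning with the diskgroup name) is inferred from the clean script excerpt above and may differ between versions, and any remaining ArgList values your version requires must also be supplied:

root# sh -x /opt/VRTSvcs/bin/CVMVolDg/monitor MyCVMVolDgResourceName MyDiskgroupName
root# echo $?
110

For VCS script entry points, an exit status of 100 means OFFLINE, and 101 through 110 mean ONLINE (110 indicating the highest confidence level).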



Legacy ID



322871

