
4-node CVM: one node can't join since a minor number is in use. Do I need to deport the DG on all nodes to reminor?

Created: 16 Apr 2013 • Updated: 18 Apr 2013 | 14 comments
starflyfly
This issue has been solved. See solution.
Error log from nodeb (the node that cannot join):
 
Apr 16 19:06:51 nodeb kernel: GAB INFO V-15-1-20036 Port v gen   418c65 membership 0123
Apr 16 19:06:51 nodeb vxvm:vxconfigd: V-5-1-7900 CVM_VOLD_CONFIG command received
Apr 16 19:06:51 nodeb vxvm:vxconfigd: V-5-1-7899 CVM_VOLD_CHANGE command received
Apr 16 19:06:56 nodeb kernel: GAB INFO V-15-1-20036 Port w gen   418c5d membership 0123
Apr 16 19:06:56 nodeb vxvm:vxconfigd: V-5-1-8066 minor number 17000 disk group dgname in use
Apr 16 19:06:56 nodeb vxvm:vxconfigd: V-5-1-11092 cleanup_client: (Cannot assign minor number) 231
Apr 16 19:06:56 nodeb vxvm:vxconfigd: V-5-1-11467 kernel_fail_join() : Reconfiguration interrupted: Reason is retry to add a node failed (13, 0)
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-0-164 Failed to join cluster gis-cfs, aborting
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-1250 joinsio_done: Node aborting, join for node 0 being failed
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-672 abort_joinp: aborting joinp for node 0 with err 11
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-1250 joinsio_done: Node aborting, join for node 1 being failed
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-672 abort_joinp: aborting joinp for node 1 with err 11
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-1250 joinsio_done: Node aborting, join for node 2 being failed
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-672 abort_joinp: aborting joinp for node 2 with err 11
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-1250 joinsio_done: Node aborting, join for node 3 being failed
Apr 16 19:06:56 nodeb kernel: VxVM vxio V-5-3-672 abort_joinp: aborting joinp for node 3 with err 11
Apr 16 19:06:56 nodeb kernel: GAB INFO V-15-1-20032 Port v closed
Apr 16 19:06:56 nodeb vxvm:vxconfigd: V-5-1-7901 CVM_VOLD_STOP command received
 
 
 
On the master (noded):
 
root@mtvsparccore # more vxdctl*
::::::::::::::
vxdctl_c_mode
::::::::::::::
mode: enabled: cluster active - MASTER
master: noded
 
 
root@mtvsparccore # grep -i minor vxprint*
vxprint_ht:DG NAME         NCONFIG      NLOG     MINORS   GROUP-ID
vxprint_m_dgname:       base_minor=17000
vxprint_m_dgname:       minor=-1
vxprint_m_dgname:       minor=-1
vxprint_m_dgname:       minor=-1
vxprint_m_dgname:       minor=17000
vxprint_m_dgname:       forceminor=off
vxprint_mpvshr_dgname:  minor=17000
vxprint_mpvshr_dgname:  forceminor=off
vxprint_mpvshr_dgname:  minor=-1
vxprint_mpvshr_dgname:  minor=-1
vxprint_mpvshr_dgname:  minor=-1

Comments (14)

starflyfly:
Environment:
 
 # more etc/*release*
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 2
 # egrep "vxvm|vxfs|vcs"  rpm_aq
VRTSvcsdr-5.1.133.000-SP1RP3_SLES10
VRTSvxfs-5.1.133.100-SP1RP3P1_SLES10
VRTSvcs-5.1.133.000-SP1RP3_SLES10
VRTSvcsea-5.1.133.000-SP1RP3_SLES10
VRTSvcsag-5.1.133.000-SP1RP3_SLES10
VRTSvxvm-5.1.133.100-SP1RP3P1_SLES10


Marianne:

Why not find out what is using 17000 on node 4 and reminor that?


starflyfly:

Thanks, Marianne.

Is there a detailed way to find out what is using 17000?

I checked /dev/vx/dsk, but it gave no further clues:

nodeb:/opt/VRTSspt/VRTSexplorer # ls -lR /dev/vx/dsk
/dev/vx/dsk:
total 0

On the master:

noded:/opt/VRTSspt/VRTSexplorer # ls -lR /dev/vx/dsk

/dev/vx/dsk:
total 0
drwxr-xr-x 2 root root 60 Apr 16 18:38 dgname

/dev/vx/dsk/gisdg3:
total 0
brw------- 1 root root 199, 17000 Apr 16 18:38 gisvol3


mikebounds:

You can look in your vxprint_m_dgname files to see what the name of the diskgroup is, or run "vxprint -g diskgroup_name -m | grep base" for each diskgroup.
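
For example, a minimal sketch (assuming standard "vxdg list" output with a one-line header) that prints the base minor of every disk group imported on a node:

# Print each imported diskgroup's base minor; run this on every node.
for dg in $(vxdg list | awk 'NR > 1 {print $1}'); do
    printf '%s: ' "$dg"
    vxprint -g "$dg" -m | grep base_minor
done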

Just to explain what is happening:

Your shared diskgroup uses minor number 17000. That minor number is free on your 3 other nodes, but it is in use by a local diskgroup on the problem node. If all nodes have a local diskgroup, I would advise giving them all the same minor number, as this prevents the issue from recurring when you create new shared diskgroups. For example, suppose the local diskgroups on the other nodes use minor number 18000. If you reminor the local diskgroup on the problem node from 17000 to, say, 19000, then a new diskgroup that you later make shared could be created with base minor 18000 or 19000 depending on which node you create it on, which would cause a conflict again. But if you reminor it to 18000, then 18000 cannot be used on any node for a new diskgroup, so you won't get conflicts.

If the 3 nodes use different minor numbers for their local diskgroups, then I would make these the same at some point (in your next planned outage).

To reminor the local diskgroup on the problem node, use:

vxdg -g local_dg_name reminor new_minor_number

You will need to deport and re-import the diskgroup for the change to take effect.

You could alternatively reminor the shared diskgroup, but that would require deporting and re-importing it, causing an outage on the other 3 nodes, so I would reminor the local diskgroup instead.
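
Putting that together, here is a minimal sketch of the whole operation, assuming a private (local) diskgroup called localdg on the problem node and a free base minor of 18000 (both placeholders), with its volumes stopped and any file systems on them unmounted first:

vxdg -g localdg reminor 18000    # assign the new base minor to the imported group
vxdg deport localdg              # deport so the change takes effect ...
vxdg import localdg              # ... then re-import the group
ls -l /dev/vx/dsk/localdg        # verify the device nodes now carry the new minors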

Mike


hildaqu:

1. First, get and check the default minor number settings for the shared DG on all nodes:

run "/sbin/vxdefault list" on all nodes.

Note the values of "sharedminorstart" and "autoreminor" (a one-loop check across all nodes is sketched after step 2).

2. Test the vxdg reminor command on a shared DG in a lab first, to check whether it has any impact on a running application.
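
As a hedged sketch of the check in step 1, assuming password-less ssh between the nodes (node names taken from this thread):

for n in nodea nodeb nodec noded; do
    echo "== $n =="
    ssh "$n" /sbin/vxdefault list | egrep 'sharedminorstart|autoreminor'
done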

SOLUTION
starflyfly:

Thanks for the replies, Mike & Hilda.

In my case, the customer has no local DG on nodeb. But following Hilda's suggestion, I found that the autoreminor option is not present on nodeb:

nodeb:/opt/VRTSspt/VRTSexplorer # /sbin/vxdefault list
KEYWORD CURRENT-VALUE DEFAULT-VALUE 
autostartvolumes on on 
fssmartmovethreshold 100 100 
reclaim_on_delete_start_time 22:10 22:10 
reclaim_on_delete_wait_period 1 1 
same_key_for_alldgs off off 
sharedminorstart 33000 33000 
usefssmartmove all all 
usesmartmovewithvvr on on 
 
 
noded:/opt/VRTSspt/VRTSexplorer # /sbin/vxdefault list
KEYWORD CURRENT-VALUE DEFAULT-VALUE 
autoreminor on on     <<<<<
autostartvolumes on on 
fssmartmovethreshold 100 100 
reclaim_on_delete_start_time 22:10 22:10 
reclaim_on_delete_wait_period 1 1 
same_key_for_alldgs off off 
sharedminorstart 33000 33000 
usefssmartmove all all 
usesmartmovewithvvr on on 
 
nodec:/opt/VRTSspt/VRTSexplorer # /sbin/vxdefault list
KEYWORD CURRENT-VALUE DEFAULT-VALUE 
autoreminor on on 
autostartvolumes on on 
fssmartmovethreshold 100 100 
reclaim_on_delete_start_time 22:10 22:10 
reclaim_on_delete_wait_period 1 1 
same_key_for_alldgs off off 
sharedminorstart 33000 33000 
usefssmartmove all all 
usesmartmovewithvvr on on 
 
nodea:/tmp # /sbin/vxdefault list
KEYWORD CURRENT-VALUE DEFAULT-VALUE 
autoreminor on on 
autostartvolumes on on 
fssmartmovethreshold 100 100 
reclaim_on_delete_start_time 22:10 22:10 
reclaim_on_delete_wait_period 1 1 
same_key_for_alldgs off off 
sharedminorstart 33000 33000 
usefssmartmove all all 
usesmartmovewithvvr on on 
 
 
 
 


starflyfly:

And I also found that, while the application is running, the DG cannot be reminored:

Essentially, even with autoreminor set to on, there are still cases where auto-reminor cannot be done. These are mostly node-join cases where the DGs are already imported and there are minor conflicts: there may be file systems mounted on top, and the devices underneath them cannot be reminored.
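
So before attempting a manual reminor, a reasonable pre-check (a hedged sketch; "dgname" is a placeholder) is to confirm that nothing is mounted on the group's volumes and to review the volume states:

mount -t vxfs            # any VxFS file systems on this DG's volumes must be unmounted first
vxprint -g dgname -vt    # review volume states before stopping volumes and reminoring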


mikebounds:

Can you provide the output of "vxdisk -o alldgs list" on the problem node (nodeb) and on one of the good nodes?

Mike


starflyfly:

Hi, 

I found that the customer's nodeb does not have patch 5.1SP1RP3P1 installed, while it is installed on the other nodes.

That explains why autoreminor is not shown on nodeb.

5.1SP1RP3P1 introduces the new "autoreminor" option, which defaults to on.

https://sort.symantec.com/patch/detail/6986

* 2847333 (Tracking ID: 2834046)

SYMPTOM:
VxVM dynamically reminors all the volumes during DG import if the DG base minor
numbers are not in the correct pool. This behaviour causes NFS clients to have to
re-mount all NFS file systems in an environment where CVM is used on the NFS
server side.

DESCRIPTION:
Starting from 5.1, the minor number space is divided into two pools, one for
private disk groups and another for shared disk groups. During DG import, the DG
base minor numbers are adjusted automatically if they are not in the correct pool,
as are the volumes in the disk groups. This behaviour reduces many minor-conflict
cases during DG import, but in an NFS environment it makes all file handles on the
client side stale, and customers had to unmount file systems and restart
applications.

RESOLUTION:
A new tunable, "autoreminor", is introduced. The default value is "on". Most
customers don't care about auto-reminoring and can simply leave it as it is. For an
environment where auto-reminoring is not desirable, customers can turn it off.
Another major change is that during DG import, VxVM will not change minor numbers
as long as there are no minor conflicts; this includes cases where minor numbers
are in the wrong pool.
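
For reference, once the patch is installed the tunable can be checked and, if desired, turned off with vxdefault (a hedged sketch; check the vxdefault usage on your release for the exact set syntax):

/sbin/vxdefault list | grep autoreminor    # show the current and default values
/sbin/vxdefault set autoreminor off        # disable automatic reminoring during DG import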


starflyfly:

Hi Mike,

Good node (master):

DEVICE       TYPE            DISK         GROUP        STATUS
sda          auto:none       -            -            online invalid
3pardata0_335 auto:cdsdisk    3pardata0_335  dgname       online thinrclm shared
3pardata0_336 auto:cdsdisk    3pardata0_336  dgname      online thinrclm shared
 
 
Error node (nodeb):
 
DEVICE       TYPE            DISK         GROUP        STATUS
sda          auto:none       -            -            online invalid
3pardata0_335 auto:cdsdisk    -            (dgname)     online thinrclm shared
3pardata0_336 auto:cdsdisk    -            (dgname)     online thinrclm shared


mikebounds:

Installing this patch looks like it will help, and regardless, you should definitely bring all nodes to the same patch level.

Mike


starflyfly:

Thanks, Mike. I have suggested that the customer upgrade the patch first.


starflyfly:

Thanks all. The problem was solved after the customer installed the 5.1SP1RP3P1 patch on nodeb.

Thanks for the timely help.
