
CVM Issue

Created: 02 Apr 2010 • Updated: 25 Aug 2010 | 6 comments
sn27:
This issue has been solved. See solution.

Hi all,

There are 4 nodes in a VCS Oracle RAC cluster; 3 of them were in maintenance mode and 1 node was alive. My question is: why didn't the CVM master fail over to the alive node when the 3 nodes went into maintenance mode?

In theory, if the master goes down, one of the slave nodes should take over as the CVM master.

Thanks!

6 Comments

Gaurav Sangamnerkar:

Hello,

Can you elaborate on the issue?

a) Clarify what you mean by maintenance mode: what was the server state and run level?
b) How did the servers go down? What happened to the 3 servers?
c) Was there any issue with the network heartbeats?
d) Was CVM membership fine before these servers rebooted?

What are the OS and VCS versions?
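As a sketch only (the wrapper function name is mine, and the commands assume a node with VCS/VxVM installed; none of this is from the thread itself), the checks above could be gathered on each node with something like:

```shell
# Hypothetical helper: run the usual first-pass VCS/CVM health checks.
# Each tool is skipped if it is not installed on this node.
check_cvm_health() {
  who -r                                              # current run level
  command -v lltstat   >/dev/null && lltstat -nvv     # LLT network heartbeats
  command -v gabconfig >/dev/null && gabconfig -a     # GAB port memberships
  command -v vxdctl    >/dev/null && vxdctl -c mode   # CVM role on this node
}
```

Run it on every node and compare: all nodes should agree on GAB membership and on which node is the CVM master.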

Gaurav

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

sn27:

Here are the required details. Server1/3/4 were in maintenance mode; only Server2 was not.

The server <Server1> was in maintenance mode, but it was still shown as the master node for CVM:

<Server1>:root# vxdctl -c mode
mode: enabled: cluster active - SLAVE
master: <Server1>

Only after rebooting the host did the master role move to <Server2>:

<server1>:root# vxdctl -c mode
mode: enabled: cluster active - SLAVE
master: <Server2>
<Server1>:root#
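For illustration only (a sketch assuming `vxdctl -c mode` output shaped like the above; the function name and the sample node names are mine, not from the thread), the self-contradictory state can be detected by comparing the two fields:

```shell
# Flag the inconsistent state where a node reports itself SLAVE on the
# "mode:" line yet names itself on the "master:" line.
check_vxdctl_consistency() {
  # Reads `vxdctl -c mode` output on stdin; $1 is this node's name.
  awk -v node="$1" '
    /^mode:/   { role = $NF }        # last field: MASTER or SLAVE
    /^master:/ { master = $2 }
    END {
      if (role == "SLAVE" && master == node) {
        print "INCONSISTENT: " node " claims SLAVE yet lists itself as master"
        exit 1
      }
      print "consistent: role=" role ", master=" master
    }'
}

# Sample (healthy) output parsed for demonstration:
printf 'mode: enabled: cluster active - MASTER\nmaster: Server2\n' \
  | check_vxdctl_consistency Server2
# prints: consistent: role=MASTER, master=Server2
```

Fed the output quoted above (role SLAVE, master pointing at the node itself), the function would flag the inconsistency instead.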

The error messages below were found on the console. Please investigate further.

-------------------------------------------------------------------------------------------------------------------------------------------
May 29 00:53:34 <Server1>.com vxfsckd: vxfs vxfsckd: Cannot create pipe: Too many open files
May 29 00:57:32 <Server1>.com last message repeated 23559 times
May 29 00:57:32 <Server1>.com svc.startd[7]: system/console-login:default failed: transitioned to maintenance (see 'svcs -xv' for details)

May 29 00:57:32 <Server1>.com vxfsckd: vxfs vxfsckd: Cannot create pipe: Too many open files
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Could not fork to start sulogin: Resource temporarily unavailable
Directly executing sulogin.
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

Root password for system maintenance (control-d to bypass): May 29 01:00:14 <Server1>.com last message repeated 31502 times

May 29 01:00:14 <Server1>.com vxfsckd: vxfs vxfsckd: Cannot create pipe: Too many open files

Login incorrect

Root password for system maintenance (control-d to bypass): Login incorrect

Root password for system maintenance (control-d to bypass): May 29 01:02:49 <Server1>.com last message repeated 17979 times
May 29 01:02:49 <Server1>.com cpudiagd[5105]:
May 29 01:02:49 <Server1>.com      Could not start CPU test program: /usr/platform/sun4u/sbin/sparcv9+vis2/cputst
May 29 01:02:49 <Server1>.com      System call fork() failed. Reason: Resource temporarily unavailable
May 29 01:02:49 <Server1>.com vxfsckd: vxfs vxfsckd: Cannot create pipe: Too many open files
May 29 01:06:54 <Server1>.com last message repeated 28854 times
May 29 01:06:54 <Server1>.com vxfsckd: vxfs vxfsckd: Cannot create pipe: Too many open files
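The flood of "Cannot create pipe: Too many open files" together with fork() failing with "Resource temporarily unavailable" points at resource exhaustion (file descriptors and/or the process table) rather than at CVM itself. A hedged sketch of a first check (limits and counts only; on Solaris, `pfiles <pid>` lists a process's open files, while the `/proc/<pid>/fd` listing below is the Linux equivalent):

```shell
# Show this shell's per-process fd limit and how many descriptors
# it currently holds; comparing the two for a stuck daemon such as
# vxfsckd would confirm or rule out descriptor exhaustion.
fd_limit=$(ulimit -n)
echo "per-process fd limit: $fd_limit"

if [ -d "/proc/$$/fd" ]; then
  open_now=$(ls "/proc/$$/fd" | wc -l)
  echo "fds open in this shell: $open_now"
fi
```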

sn27:

OS = Solaris 10 update 4.3
VCS = 5.0MPERP2

Gaurav Sangamnerkar:

Hello,

Thanks for the outputs. As you can see above, even though the nodes were in maintenance state, your CVM cluster was still active. The state was certainly inconsistent, however. The output you pasted:

<Server1>:root# vxdctl -c mode
mode: enabled: cluster active - SLAVE
master: <Server1>

The output above says Server1 is both slave and master, which is definitely an inconsistent state.

Did you get a chance to check what the CVM group state in the cluster was when this issue happened?

I presume this needs a real in-depth investigation. However, just to offer a theory: I suspect that before the CVM master failover was attempted, CVM got stuck somewhere and couldn't complete the operation, resulting in the inconsistent state above.

Gaurav

 

sn27:

Yes, I am aware of the output below, and it needs an in-depth investigation.

<Server1>:root# vxdctl -c mode
mode: enabled: cluster active - SLAVE
master: <Server1>

We need to find out why it got stuck. I suspect it may have stalled because of the high system load, but I am not 100% sure. Let me know if you find something else.

Thanks!

Gaurav Sangamnerkar:

The only thing that could reveal why it was stuck would be a crash dump from all the nodes, taken at the same time. Unfortunately, analyzing that won't be possible here, and you will need to open a support case with Symantec.
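For reference (Solaris-specific and only a sketch, not something stated in the thread): dump configuration and a forced live dump are handled by dumpadm(1M) and savecore(1M).

```shell
# Verify where a crash dump would land; on a non-Solaris host this
# simply reports that the tool is absent.
command -v dumpadm >/dev/null && dumpadm || echo "dumpadm not available on this host"

# As root on Solaris 10+, a live crash dump can be forced without
# panicking the box (uncomment to run):
# savecore -L
```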

Gaurav

 

SOLUTION