
system status

Created: 18 May 2013 • Updated: 20 May 2013 | 8 comments
This issue has been solved.

Why is the state of the second system showing UNKNOWN?

[root@system1:log]# /opt/VRTSvcs/bin/hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  system1       RUNNING              0
A  system2       UNKNOWN              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State        

B  ClusterService  system1       Y          N               ONLINE       
B  ClusterService  system2       Y          N               OFFLINE      
B  appsGroup       system1       N          Y               OFFLINE      
B  appsGroup       system2       Y          Y               OFFLINE      
B  vxfen           system1       Y          N               OFFLINE      
B  vxfen           system2       Y          Y               OFFLINE      

-- RESOURCES NOT PROBED
-- Group           Type                 Resource             System             

E  ClusterService  IP                   webip                system2     
E  ClusterService  NIC                  csgnic               system2     
E  appsGroup       DiskGroup            appsDG               system2     
E  appsGroup       IP                   SysIP                system1     
E  appsGroup       IP                   SysIP                system2     
E  appsGroup       Mount                mntGroup             system1     
E  appsGroup       Mount                mntGroup             system2     
E  appsGroup       NIC                  SysNIC               system1     
E  appsGroup       NIC                  SysNIC               system2     
E  appsGroup       Volume               VolGroup             system2     
E  vxfen           CoordPoint           coordpoint           system2     
[root@system1:log]#


mikebounds

This usually means the "had" daemon is not running on system2, so VCS does not know the state of that system, but it knows the system is up because LLT is up.
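If you need to spot this condition in a larger cluster, the SYSTEM STATE section of hastatus -summary can be scanned for systems that are not RUNNING. The following is a minimal Python sketch, assuming the whitespace-separated layout shown in the output above ("A" lines for systems, "E" lines for resources not probed); parse_hastatus_summary and systems_not_running are hypothetical helper names, not part of VCS.

```python
from typing import Dict, List, Tuple

def parse_hastatus_summary(text: str) -> Tuple[Dict[str, str], List[Tuple[str, str, str, str]]]:
    """Return (system -> state) from "A" lines and (group, type, resource, system) from "E" lines."""
    systems: Dict[str, str] = {}
    not_probed: List[Tuple[str, str, str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank separator line
        if fields[0] == "A" and len(fields) >= 3:
            # A  <system>  <state>  <frozen>
            systems[fields[1]] = fields[2]
        elif fields[0] == "E" and len(fields) >= 5:
            # E  <group>  <type>  <resource>  <system>
            not_probed.append((fields[1], fields[2], fields[3], fields[4]))
        # "--" header lines and "B" (group state) lines are ignored
    return systems, not_probed

def systems_not_running(systems: Dict[str, str]) -> List[str]:
    """Names of systems whose reported state is anything other than RUNNING."""
    return sorted(name for name, state in systems.items() if state != "RUNNING")
```

Applied to the output in the original question, systems_not_running returns ['system2'].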

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

SOLUTION
MIG31

Thanks, Mike, for the quick response. When I run gabconfig -a on both systems, this is the output. What do the lines I italicized mean?

[root@system2:]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   db7e05 membership 01
Port b gen   db7e0b membership 01
Port h gen   db7e12 membership ;1
Port h gen   db7e12    visible 0

[root@system1:]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   db7e05 membership 01
Port b gen   db7e0b membership 01
 

MIG31

Hey Mike, thanks. I just found that I had run

# mv /etc/VRTSvcs/conf/config/types.cf /etc/VRTSvcs/conf/config/types.cf.old

and forgot to copy the new file to types.cf. After running the following, it is back in the RUNNING state:

# cp /etc/VRTSvcs/conf/types.cf /etc/VRTSvcs/conf/config/types.cf

mig31

mikebounds

What did you do after copying the types.cf file? Did you run "hastart"?

I can't see how this could have been the issue. When you start VCS (start "had" with hastart) and a node is already running, as there was in your case, main.cf and types.cf (and any other types files) are read from the memory of the running node, and the in-memory copy is then dumped to disk, so it does not matter if the types file is missing.
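The start-up rule described here can be illustrated with a toy model. This is a hypothetical Python simulation, not VCS code: Node and simulate_hastart are illustrative names, and a "bad" types.cf is simplified to "no local types.cf". A node that starts while a peer is running takes a REMOTE build and gets the configuration written back to its disk; only a node starting alone depends on its local files.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    """Toy stand-in for a cluster node (hypothetical, not a VCS object)."""
    name: str
    has_types_cf: bool = True      # is a usable types.cf on local disk?
    running: bool = False
    build: Optional[str] = None    # "LOCAL" or "REMOTE" once started

def simulate_hastart(node: Node, cluster: List[Node]) -> None:
    """Simulate starting had on `node` under the rule described above."""
    if any(peer.running for peer in cluster if peer is not node):
        # A peer is up: take its in-memory configuration...
        node.build = "REMOTE"
        # ...and dump that copy to disk, recreating types.cf locally
        node.has_types_cf = True
    else:
        # First node up: it must build from its own .cf files
        if not node.has_types_cf:
            raise RuntimeError(f"{node.name}: no local types.cf for a LOCAL build")
        node.build = "LOCAL"
    node.running = True
```

Starting system1 first and then system2 (with no local types.cf) gives system1 a LOCAL build and system2 a REMOTE build, and system2 ends up with a types.cf again, which is why a missing file on the second node alone should not keep it out of the cluster.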

Mike

 


MIG31

Hey Mike

I checked whether /etc/VRTSvcs/conf/types.cf was the same as /etc/VRTSvcs/conf/config/types.cf. They were different, so I made a backup of the existing types.cf file and copied the latest file to the config directory. I then ran hastop -all -force on both systems, followed by hastart on both systems. The issue was that on system2 I had only made the backup and forgot to copy the latest file to the config directory, so the system couldn't join the cluster because it did not have a system entry in the config.

mig31

 

arangari

@MIG31 - Mike's observation is right. From the state you first posted, I am not sure that copying the 'right' files back on the second node (system2) should resolve the issue. It got resolved just because you ran 'hastart'.

The hastatus output you showed from system1 has system2 in the UNKNOWN state, while the 'gabconfig -a' output (port h present in system2's view but absent from system1's) indicates that HAD is running on system2 and not on system1. That contradicts the output shown in your first posting.

 

[snip of your posting]

[root@system2:]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   db7e05 membership 01
Port b gen   db7e0b membership 01
Port h gen   db7e12 membership ;1
Port h gen   db7e12    visible 0

 

[root@system1:]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   db7e05 membership 01
Port b gen   db7e0b membership 01

[/end snip]
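To check the same thing on each node without reading the output by eye, the Port lines of gabconfig -a can be parsed. A minimal Python sketch, assuming the line layout shown above (Port, port letter, gen, generation, then membership or visible, then the nodes); parse_gab_ports and port_h_present are hypothetical names. It keeps the node field as a raw string and does not attempt to decode the ";" notation. Port a is GAB itself, port b is I/O fencing and port h is the VCS engine (had), so a missing port h line in a node's view matches the reading above.

```python
from typing import Dict

# Port letters seen in this thread: a = GAB membership, b = I/O fencing,
# h = the VCS engine (had).
def parse_gab_ports(text: str) -> Dict[str, Dict[str, str]]:
    """Map each port letter to {"membership": nodes, "visible": nodes} as raw strings."""
    ports: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Port <letter> gen <generation> <membership|visible> <nodes>
        if len(fields) >= 6 and fields[0] == "Port" and fields[2] == "gen":
            port, kind = fields[1], fields[4]
            # Node field kept verbatim; the ";" notation is not decoded here
            ports.setdefault(port, {})[kind] = " ".join(fields[5:])
    return ports

def port_h_present(text: str) -> bool:
    """True if this node's gabconfig -a output has a port h membership line."""
    return "membership" in parse_gab_ports(text).get("h", {})
```

Fed the two outputs quoted above, port_h_present is True for system2's view and False for system1's.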

In either case, running 'hastart' on both nodes after stopping VCS with 'hastop -all -force' on each node would bring the cluster into a proper state by picking up one of the configurations.

The 'copy' operation you described will not change anything in general, and, IMHO, marking it as the solution gives wrong information to future readers.

Thanks and Warm Regards,

Amit Rangari

If this post helped you resolve the issue, please mark it as the solution.

mikebounds

I agree with Amit: post 1 shows "had" is not running on system2, post 2 shows "had" is not running on system1, and the issue got "resolved just because you did 'hastart'".

As VCS uses an in-memory copy of types.cf, types.cf ONLY gets read when the first node starts. So if the types.cf file is wrong, then either both nodes will start or neither node will: a bad types.cf cannot affect one node and not the other unless you have split-brain (this is when your heartbeats are down, but in your situation GAB is communicating fine).

You can test this quite easily:

Run hastop -local on system1 and you will see:

 

[root@system2:]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   db7e05 membership 01
Port b gen   db7e0b membership 01
Port h gen   db7e12 membership ;1
Port h gen   db7e12    visible 0

[root@system1:]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   db7e05 membership 01
Port b gen   db7e0b membership 01

And "hastatus -sum" on system2 will NOT show system1 in the RUNNING state (I think it will be EXITED rather than UNKNOWN, as "had" was stopped cleanly).

Then move types.cf to types.cf.old on system1 (and main.cf if you want).

Then run "hastart" on system1 and you will see that the types.cf file is re-created, and in engine_A.log you will see that system1 did a "REMOTE" build (built from the memory of the other node), whereas if you look at when the first system started in the cluster, it did a "LOCAL" build (built from the local .cf files).
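To find those build entries without scrolling through the whole log, a small Python filter can pick out lines that mention a given system together with the LOCAL_BUILD or REMOTE_BUILD system states. This is a sketch under an assumption: that the engine log (typically /var/VRTSvcs/log/engine_A.log) records these state names on the same line as the system name. The exact message wording is not assumed, and find_build_lines is a hypothetical helper.

```python
import re
from typing import List, Tuple

# VCS system states that record how a node obtained its configuration
BUILD_STATES = ("LOCAL_BUILD", "REMOTE_BUILD")

def find_build_lines(log_text: str, system: str) -> List[Tuple[str, str]]:
    """Return (state, line) for each log line naming `system` and a build state."""
    # Word boundaries stop "system1" from also matching "system10"
    system_re = re.compile(rf"\b{re.escape(system)}\b")
    hits: List[Tuple[str, str]] = []
    for line in log_text.splitlines():
        if not system_re.search(line):
            continue
        for state in BUILD_STATES:
            if state in line:
                hits.append((state, line.strip()))
    return hits
```

Usage: read the log file into a string and call find_build_lines(text, "system1"); for the node that joined second you would expect REMOTE_BUILD entries.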

Mike


MIG31

Thanks, Mike and Amit. The information was helpful; I do appreciate it.

Thanks,

mig31