
Suggestion for updating an SFHA 5.0 MP3RP3 environment

Created: 28 Jan 2013 • Updated: 07 Mar 2013 | 7 comments
Zahid.Haseeb's picture
This issue has been solved. See solution.

Environment

RHEL = 5.3

SFHA/DR = 5.0 MP3RP3

Primary Site = Two Nodes Cluster

DR Site = One Node Cluster

Query

We are planning to update our existing 5.0 installation to the last available patch level of 5.0. (Eventually we plan to move to the latest version, which may be 6.0, but that requires updating our OS as well. So as a short-term plan we need to bring our SFHA 5.0 MP3RP3 up to the last available patch for SFHA 5.0.)

- Our understanding is that we can apply the rolling patch SFHA 5.0 MP4RP1 directly on top of SFHA 5.0 MP3RP3.

Any quick update will be highly appreciated.


Marianne's picture

Rather upgrade to SF/HA 5.1 SP1 (with subsequent patches).
RHEL 5.3 is still supported, and 5.1 allows for rolling upgrades when you eventually upgrade the OS.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

mikebounds's picture

The 5.0MP4RP1 release notes say:

The Veritas Storage Foundation and High Availability (SFHA) 5.0 Maintenance
Pack (MP) 4 Rolling Patch (RP) 1 release is cumulative with and based on the
Veritas Storage Foundation 5.0 MP3 release.
 
So this suggests it can be applied to MP3.
 
Mike

 

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has helped you, please vote or mark as solution

SOLUTION
Zahid.Haseeb's picture

Thank you both for your kind words.

(Giving a thumbs-up for now.)

@Marianne Do you mean we can install SFHA 5.1 SP1 directly on top of SFHA 5.0 MP3RP3?

Any comment will be appreciated. Mark as Solution if your query is resolved
__________________
Thanks in Advance
Zahid Haseeb

zahidhaseeb.wordpress.com

Zahid.Haseeb's picture

I applied the patch using the procedure below:

- Took the service group offline, which also deported the disk group, so no VxFS file system was mounted.

- Stopped HAD on one cluster node

# hastop -local

- Stopped the services below, as per the Release Notes

# /etc/init.d/vxodm stop     (this daemon is not installed, so there was nothing to stop)

# /etc/init.d/vxgms stop     (this daemon is not installed, so there was nothing to stop)

# /etc/init.d/vxglm stop     (this daemon is not installed, so there was nothing to stop)

# /etc/init.d/vxfen stop

# /etc/init.d/gab stop

# /etc/init.d/llt stop

- Applied the patch as follows

# cd /patches/StorageFoundation/rpms

# rpm -Uvh *.rpm

(Support told us to run rpm -Uvh *.rpm only in the folder "/patches/StorageFoundation/rpms" and said there was no need to run it in "/patches/Veritas_Cluster/rpms", so that is what we did.)

- Verified whether MP4 was installed

# rpm -qa | grep VRTSvcs
VRTSvcssy-5.0.41.000-MP4RP1_RHEL5
VRTSvcsvr-5.0.30.00-MP3_GENERIC
VRTSvcs-5.0.30.20-MP3RP2_RHEL5
VRTSvcsdb-5.0.41.000-MP4RP1_GENERIC
VRTSvcsmg-5.0.30.00-MP3_GENERIC
VRTSvcsmn-5.0.30.00-MP3_GENERIC
VRTSvcsor-5.0.41.000-MP4RP1_RHEL5
VRTSvcsdr-5.0.41.000-MP4RP1_RHEL5
VRTSvcsag-5.0.30.20-MP3RP2_RHEL5
 

- Rebooted the system
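In the rpm listing above, VRTSvcs and VRTSvcsag are still at MP3RP2 while several other packages moved to MP4RP1, which is consistent with the engine log further down reporting pstamp 5.0MP3RP2. Whether those packages were supposed to change is something the MP4RP1 release notes would have to confirm. A minimal sketch for spotting such packages follows; the function name `not_at_level` and the rule of matching the patch tag in the release field are illustrative assumptions, not part of the product.

```shell
#!/bin/sh
# not_at_level [LEVEL]
# Reads `rpm -qa` output on stdin and prints every VRTS package line whose
# release tag does not contain the expected patch level (default: MP4RP1).
# Hypothetical helper: the name and the matching rule are assumptions.
not_at_level() {
    level="${1:-MP4RP1}"
    # Keep VRTS packages only, then drop those already tagged "-LEVEL_".
    # "|| true" keeps the exit status zero when nothing is left to print.
    grep '^VRTS' | grep -v -- "-${level}_" || true
}

# Typical use on a cluster node:
#   rpm -qa | not_at_level MP4RP1
```

An empty result would mean every VRTS package carries the expected tag; any line printed is a package to compare against the patch's package list.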

After the node came up, the HAD service would not start.

# hastart

# hastatus

attempting to connect....

VCS ERROR V-16-1-10600 Cannot connect to VCS engine
attempting to connect....not available; will retry
attempting to connect....retrying

=======================================

I discussed this with Symantec Support and they suggested the following cause:

Check /etc/llttab, /etc/llthosts, /etc/VRTSvcs/conf/sysname and /etc/VRTSvcs/conf/config/main.cf for any inconsistencies in naming conventions across the nodes in the cluster. This incident relates to a fully qualified domain name being used in all files except main.cf.
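The check Support describes can be sketched as a small script. This is only an illustration under stated assumptions: the function name `check_names` is hypothetical, `set-node` in llttab is assumed to hold a plain node name (as it does in this thread), and main.cf is assumed to declare nodes on lines of the form `system NAME (`. Comparison is exact and case-sensitive.

```shell
#!/bin/sh
# check_names LLTTAB LLTHOSTS SYSNAME MAINCF
# Prints one line per mismatch and returns non-zero if any were found.
# Hypothetical helper; the parsing rules are assumptions stated in the lead-in.
check_names() {
    llttab=$1 llthosts=$2 sysname=$3 maincf=$4
    rc=0

    # Local node name as LLT sees it, and as VCS sees it.
    node=$(awk '$1 == "set-node" { print $2; exit }' "$llttab")
    local_name=$(head -n 1 "$sysname")

    if [ "$node" != "$local_name" ]; then
        echo "set-node '$node' in llttab differs from sysname '$local_name'"
        rc=1
    fi

    # Names known to LLT, one per line (second column of llthosts).
    llt_names=$(awk 'NF >= 2 { print $2 }' "$llthosts")

    # Every "system NAME" entry in main.cf must match an llthosts name exactly.
    for sys in $(awk '$1 == "system" { print $2 }' "$maincf" | sort -u); do
        if ! printf '%s\n' "$llt_names" | grep -qxF -- "$sys"; then
            echo "system '$sys' in main.cf is absent from llthosts"
            rc=1
        fi
    done
    return $rc
}

# Typical use on each node:
#   check_names /etc/llttab /etc/llthosts /etc/VRTSvcs/conf/sysname \
#               /etc/VRTSvcs/conf/config/main.cf
```

Because the files are per-node, the check would need to be run on every node in the cluster, not only on the one that was patched.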

I made the cluster node name the same in all the files mentioned above, but only on the cluster node where I applied the rolling patch. (I made the change, but I am still concerned: if the name matters, why did it work with the earlier version?) For reference:

(The previous names were PRI and SEC; the new names, on the node where I applied the patch, are PRIMARY_NODE and SEC_NODE.)

# cat /etc/VRTSvcs/conf/config/main.cf |grep PRIMARY_NODE
system PRIMARY_NODE (
        SystemList = { PRIMARY_NODE = 0, sec = 1, secphoenix = 2 }
        AutoStartList = { PRIMARY_NODE, secphoenix }
        SystemList = { PRIMARY_NODE = 0, secphoenix = 1 }
        AutoStartList = { PRIMARY_NODE, secphoenix }
        SystemList = { PRIMARY_NODE = 0, secphoenix = 1 }
        AutoStartList = { PRIMARY_NODE, secphoenix }

# cat /etc/VRTSvcs/conf/config/main.cf |grep SEC_NODE
system SEC_NODE (
system SEC_NODE (
        SystemList = { PRIMARY_NODE = 0, sec = 1, SEC_NODE = 2 }
        AutoStartList = { PRIMARY_NODE, SEC_NODE }
        SystemList = { PRIMARY_NODE = 0, SEC_NODE = 1 }
        AutoStartList = { PRIMARY_NODE, SEC_NODE }
        SystemList = { PRIMARY_NODE = 0, SEC_NODE = 1 }
        AutoStartList = { PRIMARY_NODE, SEC_NODE }
 

 

# cat /etc/llttab
set-node SEC_NODE
set-cluster 789
link eth1 eth-00:15:17:95:36:35 - ether - -
link eth2 eth-00:24:e8:2e:e1:ec - ether - -
link-lowpri eth7 eth-00:10:18:2e:8b:4d - ether - -

 

# cat /etc/llthosts
0 PRIMARY_NODE
1 SEC_NODE
 

# cat /etc/VRTSvcs/conf/sysname
SEC_NODE
 

But HAD still would not start.

=================

The engine log details below are for reference:

2013/01/29 03:13:15 VCS NOTICE V-16-1-11022 VCS engine (had) started
2013/01/29 03:13:15 VCS NOTICE V-16-1-11050 VCS engine version=5.0
2013/01/29 03:13:15 VCS NOTICE V-16-1-11051 VCS engine join version=5.0.30.2
2013/01/29 03:13:15 VCS NOTICE V-16-1-11052 VCS engine pstamp=Veritas-5.0MP3RP2-01/22/09-15:17:00
2013/01/29 03:13:15 VCS NOTICE V-16-1-10114 Opening GAB library
2013/01/29 03:13:15 VCS INFO V-16-1-10196 Cluster logger started
2013/01/29 03:13:15 VCS NOTICE V-16-1-10619 'HAD' starting on: SEC_NODE
2013/01/29 03:13:15 VCS ERROR V-16-1-10621 Local cluster configuration error
2013/01/29 03:13:15 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2013/01/29 03:13:20 VCS INFO V-16-1-10077 Received new cluster membership
2013/01/29 03:13:20 VCS NOTICE V-16-1-10112 System (SEC_NODE) - Membership: 0x3, DDNA: 0x0
2013/01/29 03:13:20 VCS NOTICE V-16-1-10322 System  (Node '0') changed state from UNKNOWN to INITING
2013/01/29 03:13:20 VCS NOTICE V-16-1-10086 System  (Node '0') is in Regular Membership - Membership: 0x3
2013/01/29 03:13:20 VCS NOTICE V-16-1-10086 System secphoenix (Node '1') is in Regular Membership - Membership: 0x3
2013/01/29 03:13:20 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in STALE_DISCOVER_WAIT state
2013/01/29 03:13:20 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in STALE_DISCOVER_WAIT state
2013/01/29 03:13:20 VCS NOTICE V-16-1-10453 Node: 0 changed name from: '' to: 'pri'
2013/01/29 03:13:20 VCS NOTICE V-16-1-10322 System pri (Node '0') changed state from INITING to RUNNING
2013/01/29 03:13:20 VCS NOTICE V-16-1-10322 System SEC_NODE (Node '1') changed state from STALE_DISCOVER_WAIT to REMOTE_BUILD
2013/01/29 03:13:20 VCS NOTICE V-16-1-10464 Requesting snapshot from node: 0
2013/01/29 03:13:20 VCS NOTICE V-16-1-10465 Getting snapshot.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2013/01/29 03:13:20 VCS ERROR V-16-1-10391 System 'PRI' listed in main.cf but absent in LLT config file
2013/01/29 03:13:20 VCS ERROR V-16-1-10068 System PRI - VCS engine will exit to prevent any inconsistency. Please review names of systems listed in VCS configuration and make sure they are present and node ids are consistent in LLT configuration files on all systems in the cluster
 

 

I do not understand why the old name PRI is still showing even though I changed it to PRIMARY_NODE, as shown above.

=================================


As usual, any help will be highly appreciated.


mikebounds's picture

Have a look in /etc/VRTSvcs/conf/sysname

Mike


Marianne's picture

System 'PRI' listed in main.cf but absent in LLT config file.

Have you tried to find references to PRI in main.cf?
You have only shown us the output of grep PRIMARY_NODE.
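A search along these lines could look like the sketch below. The engine log shows the old name both as 'pri' and as 'PRI', so the match is case-insensitive, and it is restricted to whole words so that PRIMARY_NODE is not reported as a hit. The function name `find_old_name` is hypothetical.

```shell
#!/bin/sh
# find_old_name NAME FILE...
# Prints file:line:text for every whole-word, case-insensitive occurrence
# of NAME. Returns non-zero when nothing is found.
# Hypothetical helper, shown only as an illustration.
find_old_name() {
    name=$1
    shift
    # /dev/null is appended so grep always prefixes matches with the file name.
    grep -inw -- "$name" "$@" /dev/null
}

# Typical use:
#   find_old_name PRI /etc/VRTSvcs/conf/config/main.cf /etc/llthosts \
#                 /etc/llttab /etc/VRTSvcs/conf/sysname
```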


mikebounds's picture

Zahid,

Were your node names PRI and SEC_NODE when you had 5.0 MP3RP3 installed, and did you change PRI to PRIMARY_NODE as part of the process of upgrading to 5.0 MP4RP1?

If so, I think your issue is that you only stopped VCS on one node, so main.cf is still in the memory of the PRI node. When you start VCS on SEC_NODE, SEC_NODE will ignore the local main.cf (which I guess you edited to change the name from PRI to PRIMARY_NODE) and will get the configuration from the memory of PRI. The logs confirm this, as you can see from the following lines:

 

2013/01/29 03:13:20 VCS NOTICE V-16-1-10322 System SEC_NODE (Node '1') changed state from STALE_DISCOVER_WAIT to REMOTE_BUILD
2013/01/29 03:13:20 VCS NOTICE V-16-1-10464 Requesting snapshot from node: 0
2013/01/29 03:13:20 VCS NOTICE V-16-1-10465 Getting snapshot.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0

 

So this shows that VCS did a REMOTE_BUILD (getting the in-memory main.cf from another node), not a LOCAL_BUILD (getting the local on-disk main.cf). If you check main.cf on SEC_NODE, you will also see that it has been overwritten by the copy from PRI.
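To check which kind of build a node performed without reading the whole log, the state-change lines can be filtered. The following is a rough sketch that assumes the log line format quoted above; the function name `last_build_type` is hypothetical.

```shell
#!/bin/sh
# last_build_type LOGFILE SYSTEM
# Prints REMOTE_BUILD or LOCAL_BUILD, whichever state SYSTEM entered last
# according to the "changed state from X to Y" lines in LOGFILE.
# Prints nothing if the system never entered either state.
# Hypothetical helper; it relies on the log format shown in this thread.
last_build_type() {
    awk -v sys="$2" '
        index($0, "System " sys " ") > 0 && /changed state from/ {
            state = $NF          # the new state is the last field on the line
            if (state == "REMOTE_BUILD" || state == "LOCAL_BUILD")
                last = state
        }
        END { if (last != "") print last }
    ' "$1"
}

# Typical use (the engine log is usually /var/VRTSvcs/log/engine_A.log):
#   last_build_type /var/VRTSvcs/log/engine_A.log SEC_NODE
```

A result of REMOTE_BUILD means the node took its configuration from a running peer, so local edits to main.cf were not used.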

If you want to change the name, you need to stop VCS on both nodes (you can use hastop -all -force if you want to leave the applications running), and you may also need to restart LLT and GAB on both nodes. Then restart everything.

You do need to change the entry in /etc/VRTSvcs/conf/sysname, which I assume you have done, but you only gave the output from SEC_NODE, not PRIMARY_NODE.
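The sequence described above can be rehearsed as a dry run before touching a live cluster. This is only a sketch: the helper names (`run`, `stop_stack`, `start_stack`) and the `DRY_RUN` variable are hypothetical, the init script paths are the ones used earlier in this thread, and whether vxfen, GAB and LLT can actually be stopped while applications keep running depends on the configuration.

```shell
#!/bin/sh
# Dry-run wrapper: by default commands are printed, not executed.
# Set DRY_RUN=0 to execute them for real.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the stack below HAD, in the same order as the patch procedure.
stop_stack() {
    run /etc/init.d/vxfen stop
    run /etc/init.d/gab stop
    run /etc/init.d/llt stop
}

# Start it again in the reverse order.
start_stack() {
    run /etc/init.d/llt start
    run /etc/init.d/gab start
    run /etc/init.d/vxfen start
}

# Outline of a rename:
#   run hastop -all -force    # once, from any node
#   stop_stack                # on every node
#   (edit llttab, llthosts, sysname and main.cf consistently on every node)
#   start_stack               # on every node
#   run hastart               # on every node
```

Keeping the default as a dry run makes it possible to review the exact commands on each node first, and to do the rename separately from the patch, in line with the advice about not mixing two changes.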

As an aside, it is not a good idea to make two unrelated changes at the same time: if things go wrong, it is not always clear which change caused the issue.

Mike
