
Faulty Memory Replacement on One Node of an Oracle RAC Cluster

Updated: 21 May 2010 | 1 comment
Zubair.mohammed
This issue has been solved.

Hi,

I would appreciate your help reviewing the plan below for replacing a memory module on one node of our RAC cluster. It's a 3-node cluster running 9 service groups (SGs) in ACTIVE/ACTIVE mode.

# /etc/vx/bin/vxclustadm nidmap   [Identify the Master Node]
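If you want to script this step, the master can be picked out of the `nidmap` listing. A minimal sketch, assuming the usual layout where the node name is the first column and the master's State column reads `Joined: Master`; the function name `cvm_master` is hypothetical:

```shell
# cvm_master (hypothetical helper): print the name of the node whose State
# column reads "Joined: Master". Reads `vxclustadm nidmap` output on stdin.
# Assumption: the node name is the first column; check this on your system.
cvm_master() {
  awk '/Joined: *Master/ { print $1 }'
}

# Usage on a cluster node:
#   /etc/vx/bin/vxclustadm nidmap | cvm_master
```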

# haconf -makerw     [On the master node]

# hasys -freeze -persistent <nodename>  [Freeze each system that does not have the faulty memory]
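With three nodes this is only two commands, but the step can also be looped. A minimal sketch, assuming the system names come one per line from `hasys -list` and that the configuration is already writable; `freeze_others` and the `DRY_RUN` switch are hypothetical names, and with `DRY_RUN=1` the function only prints what it would run:

```shell
# freeze_others (hypothetical helper): persistently freeze every system except
# the one named in $1 (the node with the faulty memory).
# Reads system names, one per line, on stdin (assumed to come from `hasys -list`).
# Set DRY_RUN=1 to print the commands instead of running them.
freeze_others() {
  faulty=$1
  while read -r sys; do
    [ -z "$sys" ] && continue          # skip blank lines
    [ "$sys" = "$faulty" ] && continue # leave the faulty node unfrozen
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "hasys -freeze -persistent $sys"
    else
      hasys -freeze -persistent "$sys"
    fi
  done
}

# Usage (after haconf -makerw):
#   hasys -list | DRY_RUN=1 freeze_others <nodename>   # review first
#   hasys -list | freeze_others <nodename>
```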

# Log in to the node     [The one with the faulty memory]

# hagrp -state     [Note the SGs that are ONLINE on the faulty-memory node]
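To collect the group names for the offline step, the `hagrp -state` listing can be filtered by system. A minimal sketch, assuming the four-column layout `Group Attribute System Value` with state values such as `|ONLINE|`; `online_groups` is a hypothetical name:

```shell
# online_groups (hypothetical helper): print each service group whose State
# value contains |ONLINE| on the system named in $1.
# Reads `hagrp -state` output on stdin; the column layout is an assumption.
online_groups() {
  awk -v sys="$1" '$2 == "State" && $3 == sys && index($4, "|ONLINE|") { print $1 }'
}

# Usage on the faulty node:
#   hagrp -state | online_groups <nodename>
```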

# haconf -dump -makero

# hagrp -offline <SG> -sys <nodename>

# hastop -local

# /sbin/gabconfig -a     [Ports h, v and w should no longer be listed]
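The port check can also be made to fail loudly. A minimal sketch, assuming each registered port appears on a line starting `Port <letter>` in `gabconfig -a` output; `gab_ports_down` is a hypothetical name, and it returns non-zero while any of the named ports is still registered:

```shell
# gab_ports_down (hypothetical helper): succeed only if none of the GAB ports
# given as arguments appear in `gabconfig -a` output read on stdin.
# Prints a line for each port that is still registered.
gab_ports_down() {
  up=$(awk '$1 == "Port" { print $2 }' | sort -u)
  rc=0
  for p in "$@"; do
    if printf '%s\n' "$up" | grep -qx "$p"; then
      echo "port $p still registered"
      rc=1
    fi
  done
  return $rc
}

# Usage after hastop -local:
#   /sbin/gabconfig -a | gab_ports_down h v w
```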

# /opt/VRTSvcs/rac//uload_drv

# /sbin/vxfenconfig -U
# /sbin/vcsmmconfig -U

# /sbin/lmxconfig -U

# /sbin/gabconfig -a

# modinfo | egrep "lmx|vxfen|vcsmm"  [Determine the module IDs for VCSMM, I/O fencing, and LMX]

# modunload -i <ID>
# modunload -i <ID>
# modunload -i <ID>
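Reading the IDs by eye is error-prone, so they can be extracted first and checked before each `modunload`. A minimal sketch, assuming the standard Solaris `modinfo` columns (`Id Loadaddr Size Info Rev Module Name`), where the module name is the sixth field; `module_ids` is a hypothetical name:

```shell
# module_ids (hypothetical helper): print the modinfo Id of every module whose
# name exactly matches one of the alternatives in $1, e.g. 'lmx|vxfen|vcsmm'.
# Reads Solaris `modinfo` output on stdin; the header line is skipped because
# its first field is not numeric.
module_ids() {
  awk -v re="^($1)\$" '$1 ~ /^[0-9]+$/ && $6 ~ re { print $1 }'
}

# Usage: list the IDs, then run `modunload -i <ID>` for each one in the
# order your procedure requires.
#   modinfo | module_ids 'lmx|vxfen|vcsmm'
```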

# /sbin/gabconfig -U

# /sbin/lltconfig -U

# modinfo | egrep "gab|llt"
# modunload -i <ID>
# modunload -i <ID>

# shutdown -g0 -y -i0

# su - sms-svc     [On the System Controller]
$ setkeyswitch -d <domain_id> OFF

Hand over the box for memory module replacement.

Once the replacement of the memory module is confirmed as successful:
$ setkeyswitch -d <domain_id> ON

Once the system is running in multiuser mode, verify that the LLT, GAB, LMX, VXFEN and VCSMM drivers have been loaded.
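This check can be scripted as well. A minimal sketch, assuming the same Solaris `modinfo` column layout (module name in the sixth field); `missing_modules` is a hypothetical name, and no output means every listed driver is loaded:

```shell
# missing_modules (hypothetical helper): print each module name given as an
# argument that does not appear in the Module-Name column of Solaris
# `modinfo` output read on stdin.
missing_modules() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " ") }
    $1 ~ /^[0-9]+$/ { loaded[$6] = 1 }
    END { for (i = 1; i <= n; i++) if (!(w[i] in loaded)) print w[i] }'
}

# Usage after the reboot:
#   modinfo | missing_modules llt gab lmx vxfen vcsmm
```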

# haconf -makerw

# hasys -unfreeze -persistent <nodename>     [Each system frozen earlier]

# haconf -dump -makero

# hagrp -online <SG> -sys <nodename>     [The repaired node]

# hastatus -sum

Regards,
Zubair

Comments

Gaurav Sangamnerkar, 22 Nov 2009


Hello Zubair,

The plan is OK; however, there are a couple of things I would like to highlight:

a) On the faulty node you are taking the service groups offline. Don't you want to switch them to the other active nodes to avoid downtime?
b) I don't see any harm in freezing the whole cluster rather than just the nodes without faulty memory. Note that on any system you have not frozen, VCS remains free to take any action. You should still be able to take a service group offline after freezing the system; try it out in the VCS Simulator, it works.

c) While stopping the stack, I don't see a step that stops ODM; you might want to include one.
d) The order in which the modules are unconfigured doesn't seem correct. Fencing should come last, just before GAB: consider unconfiguring VCSMM, ODM and LMX first, and move on to fencing only once all the others are stopped.
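Point d) can be turned into a quick check of a planned stop order. A minimal sketch that encodes only the rule as stated in this reply (VCSMM, ODM and LMX before fencing, fencing before GAB), not a full driver dependency list; `check_stop_order` is a hypothetical name, and the component names are the lowercase module names used in the plan:

```shell
# check_stop_order (hypothetical helper): read component names, one per line,
# in the planned stop order; print each violation of the rule in point d)
# and return non-zero if there is any.
check_stop_order() {
  awk '
    NF { pos[$1] = NR }
    END {
      bad = 0
      n = split("vcsmm odm lmx vxfen gab", need, " ")
      for (i = 1; i <= n; i++)
        if (!(need[i] in pos)) { print need[i] " is missing from the plan"; bad = 1 }
      if (bad) exit 1
      for (i = 1; i <= 3; i++)
        if (pos[need[i]] > pos["vxfen"]) { print need[i] " should be stopped before vxfen"; bad = 1 }
      if (pos["gab"] < pos["vxfen"]) { print "gab should be stopped after vxfen"; bad = 1 }
      exit bad
    }'
}

# Usage:
#   printf 'vcsmm\nodm\nlmx\nvxfen\ngab\nllt\n' | check_stop_order
```

Fed the order from the original plan (vxfen, vcsmm, lmx, gab, llt), it stops at the first problem and reports that odm is missing.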

Hope this helps.

Gaurav
