How to add a master node in clustered NBU 7.5
I have a new two-node master on which I installed NBU 7.5. We use VCS 5.1 for clustering, and the OS is RHEL 6. There are also 4 media servers running the same OS.
cluster name is: nbu
node1: master01
node2: master02
[root@master02]# cat bp.conf
SERVER = nbu.domain.com
SERVER = media01.domain.com
SERVER = media02.domain.com
SERVER = media03.domain.com
SERVER = media04.domain.com
SERVER = master01.domain.com
SERVER = master02.domain.com
CLIENT_NAME = master02.domaincom
CLUSTER_NAME = nbu.domain.com
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = nbu.domain.com
HOST_CACHE_TTL = 3600
VXDBMS_NB_DATA = /opt/VRTSnbu/db/data
KMS_DIR = /opt/VRTSnbu/kms
TELEMETRY_UPLOAD = NO
The problem is: I do not see node 2 of the master listed in the nbemmcmd output.
[root@master02]# ./nbemmcmd -listhosts
NBEMMCMD, Version: 7.5
The following hosts were found:
server nbu.domain.com
cluster nbu.domain.com
master master01.domain.com
Command completed successfully.
The other thing is that I installed the media servers first. I thought adding them via nbemmcmd as media servers would be enough. Or do I need to reinstall them now that the new master server is ready? As of now I do not see any media servers in any server's nbemmcmd output.
[root@master02 bin]# ./hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  master01             RUNNING              0
A  master02             RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  ClusterService  master01             Y          N               OFFLINE
B  ClusterService  master02             Y          N               ONLINE
B  nbu_group       master01             Y          N               ONLINE
B  nbu_group       master02             Y          N               OFFLINE
Moreover, how do I verify that this clustered environment is all set and ready to work? I was thinking of checking failover: I can initiate a backup, then shut down the services on one node, and see if the backups keep running. Is this the only check?
Do you need more information about anything? This system hasn't gone into production yet; it is all a new setup, and we are going to migrate the policies after we are done.
The nbemmcmd -listhosts should look something like this
Yes, I did. I followed the exact procedure in the guide. The only thing missing from the NetBackup guide was the VCS part of the installation; everything else is covered step by step. I did feel the installation on the second node was rather quick.
So what's the suggestion now? How about adding node 2 from nbemmcmd as a master server and then updating it via -updatehost in some way, which I am not fully aware of?
If node 2 of the clustered master nbu.domain.com is to be added via nbemmcmd:
cluster name: nbu.domain.com
node2: master02.domain.com
would the command be:
nbemmcmd -addhost -clustername nbu.domain.com -machinename master02.domain.com -operatingsystem linux
Or are there more parameters to pass?
And since the media servers were installed first, would a bp.conf update plus an nbemmcmd command like this be OK?
nbemmcmd -addhost -machinename media01.domain.com -machinetype media -masterserver nbu.domain.com -operatingsystem linux
OK, I added the media servers already. Now please tell me about adding the master server's node 2.
OK, I added the master too, in the following way:
and the output came out to be:
Am I all good now?
Just typed a long reply that didn't save. Here is the slightly shorter version.
In a word, no I do not think so.
To add a node to a cluster you should use the script:
/usr/openv/netbackup/bin/cluster/util/cluster_add_node
But this post from Marianne shows it is not that simple:
https://www-secure.symantec.com/connect/forums/add...
I checked with a very senior BL colleague, and they have the same concern.
We could be wrong, and if we are (or rather, if I am), then sorry, it means you lose a little time. If I am right, you risk unseen issues on a live server. I had a case a while ago with a misconfigured cluster: it looked perfect until it was upgraded, then the upgrade failed and caused all sorts of issues. The cause was traced to the fact that it was built incorrectly.
The cluster install should have worked. It didn't, which shows something is wrong, and until that something is found I cannot say yes (or no).
Martin
Sir, thanks for your time, I really appreciate it. How do we know we have a problem with the node not being in the cluster? This was a new install and I think the node is in the cluster. It is just the NetBackup configuration I'm having trouble with. I updated EMM and it came out fine. Do you know how to check whether the node is in the cluster or not? Is there a command to check this? I'm very positive there is no problem with the node's cluster membership that would require following the procedure to add it back again.
No problem, happy to help.
It depends on:
1. If there really is a problem (as I said, I could be wrong on this)
2. If I am right and there is an issue, the problem could be anywhere and so is virtually impossible to find, which is why I can only recommend getting the thing reinstalled and working as per the install guide (which may require a call to support).
In the last case I saw like this, as mentioned, the cluster looked right and seemed to run OK for ages, but when it was upgraded it broke and data was lost.
Edited to add ...
For the kind of possible issue I'm thinking of, you're not going to find it by running commands. The example one I mentioned was found by 'manually' looking through the NBDB db unload, but this was after seeing the symptoms of the failure. Without a known problem, you are looking for something that may not be there, problem is, if you get the problem, it could be too late.
I'm not overly concerned about adding a new node. That can be done; it is just a matter of confirming the exact procedure. I'm more concerned about why it didn't work. Something somewhere is wrong, and it is this 'unknown' that could cause you major issues.
It is late now and I need to get some sleep, so I will have to go. Please take my advice: bin this and reinstall, with support's help if necessary. I see too many cases like this where something is wrong, and what was done with all good intentions turns out to be an incorrect workaround.
My way, you will end up with a correct, supported system.
Any other way, you might not. I cannot recommend you take that risk with a production system. If it was only a pure test system, then sure, do whatever you like; it doesn't matter if a test system breaks, but live production is another matter.
Martin
There are several points to check to see whether the clustered master is configured correctly.
1. Nodes are added in the service group named nbu_group.
2. In bp.conf, CLUSTER_NAME and EMMSERVER are set to the VIP hostname, and the node names are listed as SERVER.
3. The cluster name and node names are listed in the output of nbemmcmd.
4. Nodes are listed in /usr/openv/netbackup/bin/cluster/NBU_RSP
### ADDED ###
5. "tpconfig -emm_dev_list" shows cluster name and node names
# tpconfig -emm_dev_list
:
==============================================================================
NBU Cluster: nbu.domain.com
==============================================================================
Master Server: master01.domain.com
NetBackup Version: 7.5.0.3(750300)
Host OperatingSystem: 16
MachineState: ACTIVE
==============================================================================
Master Server: master02.domain.com
NetBackup Version: 7.5.0.3(750300)
Host OperatingSystem: 16
MachineState: OFFLINE
==============================================================================
EMM Server: nbu.domain.com
Authorized Symantec Consultant(ASC) Data Protection in Tokyo, Japan
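Point 2 of the checklist above can be scripted. The following is only a minimal sketch, not a Symantec tool: `check_bp_conf` is a hypothetical helper name, it reads a bp.conf from standard input, and it assumes the `KEY = value` layout shown in the first post. It prints one line per problem and nothing when the checks pass.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBackup): verify that CLUSTER_NAME and
# EMMSERVER match the virtual name and that every node has a SERVER entry.
# Usage: check_bp_conf VIRTUAL_NAME NODE... < bp.conf
check_bp_conf() {
    vip=$1
    shift
    # Read the whole file once, dropping trailing whitespace on each line.
    conf=$(sed 's/[[:space:]]*$//')
    for key in CLUSTER_NAME EMMSERVER; do
        val=$(printf '%s\n' "$conf" |
            awk -F' *= *' -v k="$key" '$1 == k { print $2; exit }')
        [ "$val" = "$vip" ] || echo "$key is '$val', expected '$vip'"
    done
    for node in "$@"; do
        printf '%s\n' "$conf" |
            awk -F' *= *' -v n="$node" \
                '$1 == "SERVER" && $2 == n { found = 1 } END { exit !found }' ||
            echo "no SERVER entry for $node"
    done
}

# Example (the bp.conf path is the usual UNIX location, adjust if yours differs):
# check_bp_conf nbu.domain.com master01.domain.com master02.domain.com \
#     < /usr/openv/netbackup/bp.conf
```

Run it on each node in turn, since every node keeps its own bp.conf.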
OK, here are the answers to your 4-point checklist.
1. Need to check, will get back in few minutes, not sure how to check this.
2. Yes they are.
3. Yes they are:
Check this emm server output:
4. Yes, it's the same as what you've posted, but not with the FQDN.
5. Yes, same output:
Yasuhisa makes a good post. It checks that the basic config is correct, and I agree 100%, but it does not confirm that all the details are correct in NBDB, for example. There is no command to check this; it takes someone who is very, very knowledgeable about clusters and the NetBackup NBDB.
OK, my final advice on this.
Log a call, explain what has happened, show them this post and ask for BL or Engineering to confirm that this method is safe.
In a nice way, I don't care if I am wrong. I DO care that you have an installation that is confirmed as correct.
Martin
Thanks for your concern, Martin. But all my output confirms what the other guy has posted, except that in the cluster directory the nodes are not listed by FQDN as in his post. I understand. I will see if I can log a case.
All the configuration seems OK, except master02 not being listed in "nbemmcmd -listhosts" in your first post.
I made a mistake while editing the NBU_RSP example. Nodes are listed by VCS node name, so they should be listed by short name, not by FQDN.
Yes, I actually added nbmaster02 manually by issuing the nbemmcmd command and adding it as a media server. I hope this doesn't cause any problems. Thanks a ton for the confirmation.
I am late to this party, but I have learned that if installation is done correctly, there is no need to manually add anything via cmd.
You say that the NBU guide does not cover the VCS part.
That is 100% correct. The VCS manuals cover the VCS part.
So, if the 2 nodes exist as a cluster and VCS commands like 'hastatus -sum' and 'gabconfig -a' show both nodes in the cluster, then NBU can be installed.
One thing that can cause incorrect cluster install/config is when rsh is not configured between cluster nodes. Although VCS can install and config via ssh, NBU cannot. It needs rsh.
There is a TN that explains a workaround: http://www.symantec.com/docs/TECH160242
If your NBU installation log on node 2 does not report successful joining of the cluster and 'hastatus -sum' does not show both nodes in nbu_group, rather start from scratch and know that you have a cluster working 100% correctly from day 1.
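To check quickly whether 'hastatus -sum' shows both nodes in nbu_group, the GROUP STATE rows can be filtered. This is a rough sketch that assumes the row layout seen in the `hastatus -summary` output earlier in this thread (rows starting with `B`, with the group, system and state in columns 2, 3 and 6); `group_states` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "SYSTEM STATE" for every node of a service group,
# reading 'hastatus -sum' output from standard input.
# Usage: hastatus -sum | group_states nbu_group
group_states() {
    awk -v g="$1" '$1 == "B" && $2 == g { print $3, $6 }'
}
```

With the output posted at the top of the thread this would list master01 as ONLINE and master02 as OFFLINE. If only one node is printed, the second node never joined the service group.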
One more thing: even if VCS and NBU were installed 100% correctly, nbemmcmd does not show the correct info right away.
I noticed this the last time I installed clustered master server in our lab.
VCS shows correct output, but initially NBU showed this:
Both nodes in cluster, active on node1. But look at this:
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts
NBEMMCMD, Version:7.1
The following hosts were found:
server nbumas
cluster nbumas
master mvdb-lnx1
Command completed successfully.
Even offline and online of the service group on node 1 did not fix it.
Only after I failed over to node 2 did nbemmcmd show correct info:
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts
NBEMMCMD, Version:7.1
The following hosts were found:
server nbumas
cluster nbumas
master mvdb-lnx1
master mvdb-lnx2
Command completed successfully.
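The before/after comparison above can be automated with a small filter. This is a sketch only, assuming the `master <hostname>` rows shown in the two listings; `missing_masters` is a hypothetical name, and it prints each expected node that `nbemmcmd -listhosts` does not yet report as a master.

```shell
#!/bin/sh
# Hypothetical helper: report expected cluster nodes that are absent from the
# "master" rows of 'nbemmcmd -listhosts' output (read from standard input).
# Usage: nbemmcmd -listhosts | missing_masters NODE...
missing_masters() {
    hosts=$(awk '$1 == "master" { print $2 }')
    for node in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        printf '%s\n' "$hosts" | grep -qxF "$node" || echo "$node"
    done
}
```

For the first listing, `missing_masters mvdb-lnx1 mvdb-lnx2` would print `mvdb-lnx2`; after the failover it would print nothing.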
Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Marianne, you're not late, actually; the issue is still around. But this other guy is spot on with the mount point issue. He asked me to mount and tail the log; let me post that here. You're right, nbemmcmd wasn't showing node 2 in the beginning, but hastatus has shown it since the beginning. I started to worry when I did not see the other master in the nbemmcmd output, so I added it manually.
Now let me post the error that has been identified; that guy is spot on.
[root@master01 bin]# ./hastatus
attempting to connect....
attempting to connect....connected

group           resource             system               message
--------------- -------------------- -------------------- --------------------
                                     master01             RUNNING
                                     master02             RUNNING
ClusterService                       master01             OFFLINE
ClusterService                       master02             ONLINE
-------------------------------------------------------------------------
nbu_group                            master01             ONLINE
nbu_group                            master02             *FAULTED* OFFLINE
                webip                master01             OFFLINE
                webip                master02             ONLINE
                csgnic               master01             ONLINE
-------------------------------------------------------------------------
                csgnic               master02             ONLINE
                nbu_nic              master01             ONLINE
                nbu_nic              master02             ONLINE
                nbu_ip               master01             ONLINE
                nbu_ip               master02             OFFLINE
-------------------------------------------------------------------------
                nbu_mount            master01             ONLINE
                nbu_mount            master02             *FAULTED*
                nbu_server           master01             ONLINE
                nbu_server           master02             OFFLINE

[root@master01 bin]# ./hamsg Mount_A
Wed 23 Jan 2013 12:17:36 AM UTC VCS INFO V-16-10031-20507 Mount:Mount:imf_init:successfully initialized the VxAMF Mount Module
Wed 23 Jan 2013 12:17:36 AM UTC VCS INFO V-16-2-13805 (imf_init) entry point completed with return status (0)
Thu 24 Jan 2013 03:10:05 AM UTC VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group nbu_mount

[root@master01 bin]# tail -20 /var/VRTSvcs/log/engine_A.log
2013/01/24 03:12:08 VCS ERROR V-16-2-13066 (master02) Agent is calling clean for resource(nbu_mount) because the resource is not up even after online completed.
2013/01/24 03:12:09 VCS INFO V-16-2-13068 (master02) Resource(nbu_mount) - clean completed successfully.
2013/01/24 03:12:09 VCS INFO V-16-2-13071 (master02) Resource(nbu_mount): reached OnlineRetryLimit(0).
2013/01/24 03:12:09 VCS ERROR V-16-1-54031 Resource nbu_mount (Owner: Unspecified, Group: nbu_group) is FAULTED on sys master02
2013/01/24 03:12:09 VCS NOTICE V-16-1-10300 Initiating Offline of Resource nbu_ip (Owner: Unspecified, Group: nbu_group) on System master02
2013/01/24 03:12:09 VCS INFO V-16-6-15015 (master02) hatrigger:/opt/VRTSvcs/bin/triggers/resfault is not a trigger scripts directory or can not be executed
2013/01/24 03:12:10 VCS INFO V-16-1-10305 Resource nbu_ip (Owner: Unspecified, Group: nbu_group) is offline on master02 (VCS initiated)
2013/01/24 03:12:10 VCS ERROR V-16-1-10205 Group nbu_group is faulted on system master02
2013/01/24 03:12:10 VCS NOTICE V-16-1-10446 Group nbu_group is offline on system master02
2013/01/24 03:12:10 VCS INFO V-16-1-10493 Evaluating master01 as potential target node for group nbu_group
2013/01/24 03:12:10 VCS INFO V-16-1-10493 Evaluating master02 as potential target node for group nbu_group
2013/01/24 03:12:10 VCS INFO V-16-1-50010 Group nbu_group is online or faulted on system master02
2013/01/24 03:12:10 VCS NOTICE V-16-1-10301 Initiating Online of Resource nbu_ip (Owner: Unspecified, Group: nbu_group) on System master01
2013/01/24 03:12:10 VCS NOTICE V-16-1-10301 Initiating Online of Resource nbu_mount (Owner: Unspecified, Group: nbu_group) on System master01
2013/01/24 03:12:13 VCS INFO V-16-1-10298 Resource nbu_mount (Owner: Unspecified, Group: nbu_group) is online on master01 (VCS initiated)
2013/01/24 03:12:22 VCS INFO V-16-1-10298 Resource nbu_ip (Owner: Unspecified, Group: nbu_group) is online on master01 (VCS initiated)
2013/01/24 03:12:22 VCS NOTICE V-16-1-10301 Initiating Online of Resource nbu_server (Owner: unknown, Group: nbu_group) on System master01
2013/01/24 03:12:42 VCS INFO V-16-1-10298 Resource nbu_server (Owner: unknown, Group: nbu_group) is online on master01 (VCS initiated)
2013/01/24 03:12:42 VCS NOTICE V-16-1-10447 Group nbu_group is online on system master01
2013/01/24 03:12:42 VCS NOTICE V-16-1-10448 Group nbu_group failed over to system master01

Unfortunately we need the messages in engine_A.log from several lines before what you pasted.
BTW, does the /opt/VRTSnbu directory exist on master02? If not, create it and retry. Before retrying, you need to clear the FAULTED flag of the nbu_group service group with "hagrp -clear nbu_group".
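When the engine log is long, it can help to pull out only the ERROR entries for one resource or group before posting. The filter below is a sketch that assumes the `DATE TIME VCS SEVERITY ...` line layout seen in the engine_A.log excerpt in this thread; `vcs_errors` is a hypothetical helper name, not a VCS command.

```shell
#!/bin/sh
# Hypothetical helper: print VCS ERROR lines from an engine log on standard
# input, optionally limited to lines that mention a given name.
# Usage: vcs_errors [NAME] < /var/VRTSvcs/log/engine_A.log
vcs_errors() {
    awk -v r="$1" \
        '$3 == "VCS" && $4 == "ERROR" && (r == "" || index($0, r)) { print }'
}
```

For example, `vcs_errors nbu_mount < /var/VRTSvcs/log/engine_A.log` would keep the "Agent is calling clean" and "is FAULTED" entries and drop the INFO and NOTICE noise. The lines leading up to the first error still need to be read by hand.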
OK, give me 5 mins. I'll post the whole log and clear the fault too.
Here you go with the complete engine log.
Yes, the directory does exist. Here's the output:
Although you have configured the shared disk with VxVM, no DiskGroup or Volume resources exist in nbu_group.
You need to add DiskGroup and Volume resources. Please give me 10 minutes.
Thank you. Right on. And just for your information:
[root@master01 opt]# df -h
Filesystem                                Size  Used Avail Use% Mounted on
/dev/mapper/vg00-rootvol                   49G  7.8G   39G  17% /
/dev/mapper/vg00-tmpvol                   496M  263M  208M  56% /tmp
/dev/mapper/vg00-homevol                  248M   11M  226M   5% /home
/dev/mapper/vg00-varvol                    50G  7.9G   39G  17% /var
/dev/mapper/vg00-crashvol                 2.7G   69M  2.5G   3% /var/crash
/dev/cciss/c0d0p1                         251M   38M  201M  16% /boot
tmpfs                                      16G     0   16G   0% /dev/shm
tmpfs                                     4.0K     0  4.0K   0% /dev/vx
/dev/vx/dsk/netbackup_dg/netbackup-dbvol  500G  430M  469G   1% /opt/VRTSnbu

[root@master02 ~]# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg00-rootvol    54G  7.7G   43G  16% /
/dev/mapper/vg00-varvol     50G  3.6G   44G   8% /var
/dev/mapper/vg00-crashvol  2.7G   69M  2.5G   3% /var/crash
/dev/mapper/vg00-tmpvol    496M   22M  449M   5% /tmp
/dev/mapper/vg00-homevol   248M   11M  226M   5% /home
/dev/cciss/c0d0p1          251M   35M  204M  15% /boot
tmpfs                       16G     0   16G   0% /dev/shm

Add DiskGroup and Volume resource as below. Then retry.
Doing it right now. Will reply in 4-5 mins.
error:
This is a notice. Proceed!
Is this not an error either?
It's my mistake.
Run "hares -add nbu_vol Volume nbu_group" instead.
Just this line, I am assuming. If I need to do the whole thing over, please let me know. For now I am just doing the above part.
I think the whole procedure had this crying baby, nbu_vol.
Did this; now proceeding to the deletion part.
So run these lines.
Now that went smoothly; I followed everything. Also, we don't have to remove anything from fstab, right? I had cleared the mount from hastatus.
In addition, remove the shared disk entry from /etc/vfstab on each node. The shared disk must not be mounted at system startup.
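A quick way to look for such an entry is to filter the file by mount point. The sketch below assumes the standard fstab column order (device, mount point, type, ...) and uses a hypothetical helper name, `fstab_entries`; on Linux the file to feed it is /etc/fstab, and /opt/VRTSnbu is the shared mount point used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print non-comment fstab lines (from standard input)
# whose mount point column equals the given path.
# Usage: fstab_entries /opt/VRTSnbu < /etc/fstab
fstab_entries() {
    awk -v m="$1" '$1 !~ /^#/ && $2 == m { print }'
}
```

No output means nothing mounts the shared disk at boot, which is the desired state when VCS controls the mount.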
OK, I can remove it from the other nodes. But what entry exactly? :(
Sorry, /etc/vfstab does not exist in Linux.
If a line like the one below exists in /etc/fstab, remove it.
No, that entry does not exist. Pasting the contents of that file here below.
Also, please see above: the procedure you posted for adding the resources gave several warnings about nbu_vol.
Sir, I have done everything that has been recommended so far. Is there more to do, or can I test the failover and see what errors it gives me now?
Yes, you can try to switch nbu_group to master02.
If it fails again, please post:
Doing that now. Let's see. Will be back here in 5 mins.
OMG dude, that worked. I am so, so thankful to you. Really. :)
Although I see the tape drive configuration failing, I'll create a new topic for that. Those two totally go out to you. Even in the console I can see master02. Thanks a lot.
Just checking in ... seems I missed all the fun.
So, a small config step was found that was fixable, good. I had a quick look around and found what Marianne had posted: the full details do not appear until the cluster is first failed over (seems I had forgotten that minor point ...).
So, seems all is good - excellent, I am pleased.
M