How to check that VCS is working properly with NBU?
I have a new two-node master server on which I installed NBU 7.5. We use VCS 5.1 for clustering, and the OS is RHEL 6. Four media servers run the same OS.
cluster name is: nbu
node1: master01
node2: master02
[root@master02]# cat bp.conf
SERVER = nbu.domain.com
SERVER = media01.domain.com
SERVER = media02.domain.com
SERVER = media03.domain.com
SERVER = media04.domain.com
SERVER = master01.domain.com
SERVER = master02.domain.com
CLIENT_NAME = master02.domaincom
CLUSTER_NAME = nbu.domain.com
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = nbu.domain.com
HOST_CACHE_TTL = 3600
VXDBMS_NB_DATA = /opt/VRTSnbu/db/data
KMS_DIR = /opt/VRTSnbu/kms
TELEMETRY_UPLOAD = NO
The problem is that I do not see node 2 of the master listed in the nbemmcmd output.
[root@master02]# ./nbemmcmd -listhosts
NBEMMCMD, Version: 7.5
The following hosts were found:
server nbu.domain.com
cluster nbu.domain.com
master master01.domain.com
Command completed successfully.
The other thing I did was install the media servers first. I thought adding them as media servers via nbemmcmd would be enough. Or do I need to reinstall them now that the new master server is ready? As of now I do not see any media servers in nbemmcmd on any server.
[root@master02 bin]# ./hastatus -summary
-- SYSTEM STATE
-- System State Frozen
A master01 RUNNING 0
A master02 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService master01 Y N OFFLINE
B ClusterService master02 Y N ONLINE
B nbu_group master01 Y N ONLINE
B nbu_group master02 Y N OFFLINE
Moreover, how do I verify that this clustered environment is all set and ready to work? I was thinking of checking the failover: I can initiate a backup, shut down the services on one node, and then see whether the backups keep running. Is this the only check?
Do you need more information about anything? This system has not gone into production yet; it is all being set up new, and we are going to migrate the policies after we are done.
Jobs currently running to your media servers should restart once the Master server is back online.
Here is an excerpt from the NBU Cluster guide for 7.5:
"When a failover occurs, the backup jobs that were running are rescheduled with
the normal NetBackup retry logic for a failed backup. The NetBackup services are started on another node and the backup processing resumes."
Please ensure that the following file is identical on both hosts:
/usr/openv/netbackup/bin/cluster/NBU_RSP
A simple /usr/openv/netbackup/bin/bpps -a on the active node will list all the NetBackup processes that are running.
To test the failover, simply run the following command:
#> hagrp -switch nbu_group -to master02
#> hastatus
Once the Service Group "nbu_group" is online on master02 then you can check the status of your pending jobs.
Can you post the output from #> nbemmcmd -listhosts -verbose ?
Keep in mind that the NBU master services are only active on one node at a time, so my suspicion is that you will see master02 in the list once you fail over.
Hope this helps.
Joe D
I did that, and the failover failed. What does this output mean now?
[root@master01 bin]# ./hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A master01 RUNNING 0
A master02 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService master01 Y N OFFLINE
B ClusterService master02 Y N ONLINE
B nbu_group master01 Y N ONLINE
B nbu_group master02 Y N OFFLINE|FAULTED
-- RESOURCES FAILED
-- Group Type Resource System
D nbu_group Mount nbu_mount master02
[root@master01 bin]#
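In a summary like the one above, the lines prefixed with B describe service groups and the lines prefixed with D list failed resources. Here is a minimal sketch, assuming exactly the column layout shown in this thread, that filters such output down to the faulted parts. The function names faulted_groups and failed_resources are made up for illustration and are not part of VCS.

```shell
# Read "hastatus -sum" style text on stdin.
# Print "group system state" for every group line (prefix B)
# whose state column contains FAULTED.
faulted_groups() {
    awk '$1 == "B" && $NF ~ /FAULTED/ { print $2, $3, $NF }'
}

# Read the same text on stdin.
# Print "resource type system" for every failed-resource line (prefix D).
failed_resources() {
    awk '$1 == "D" { print $4, $3, $5 }'
}
```

On a cluster node this would be used as `./hastatus -sum | faulted_groups` and `./hastatus -sum | failed_resources`. For the output above it should point at nbu_group on master02 and at the Mount resource nbu_mount.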
I checked the cluster file on both nodes, and the two copies are identical. Here is the verbose output.
[root@master02 admincmd]# ./nbemmcmd -listhosts -verbose
NBEMMCMD, Version: 7.5
The following hosts were found:
nbu.domain.com
MachineName = "nbu.domain.com"
FQName = "nbu.domain.com"
MachineDescription = ""
MachineNbuType = server (6)
nbu.domain.com
MachineName = "nbu.domain.com"
FQName = "nbu.domain.com"
MachineDescription = ""
MachineNbuType = cluster (5)
NetBackupVersion = 7.5.0.0 (750000)
Active Node Name = "master01.domain.com"
master01.domain.com
ClusterName = "nbu.domain.com"
MachineName = "master01.domain.com"
FQName = "master01.domain.com"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x77
MachineNbuType = master (3)
MachineState = active for tape and disk jobs (14)
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media01.domain.com
ClusterName = ""
MachineName = "media01.domain.com"
FQName = "media01.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x10
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media02.domain.com
ClusterName = ""
MachineName = "media02.domain.com"
FQName = "media02.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x10
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media03.domain.com
ClusterName = ""
MachineName = "media03.domain.com"
FQName = "media03.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x10
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
media04.domain.com
ClusterName = ""
MachineName = "media04.domain.com"
FQName = "media04.domain.com"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0
MachineNbuType = media (1)
MachineState = active for disk jobs (12)
MasterServerName = "nbu.domain.com"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
master02.domain.com
ClusterName = "nbu.domain.com"
MachineName = "master02.domain.com"
FQName = "master02.domain.com"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0
MachineNbuType = master (3)
MachineState = active for disk jobs (12)
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = linux (16)
ScanAbility = 5
Command completed successfully.
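The verbose listing is long, so a one-line-per-host view can make it easier to compare the two master nodes. This is a rough sketch, assuming the "Key = value" layout shown above and assuming that MachineName always comes before MachineNbuType and MachineState within a record, as it does in this output. The function name summarize_hosts is hypothetical.

```shell
# Read "nbemmcmd -listhosts -verbose" style text on stdin and print
# "name | type | state" per host record. A record is taken to start
# at its MachineName line; a missing MachineState is shown as "-".
summarize_hosts() {
    awk -F' = ' '
        function emit() {
            if (name != "") print name " | " type " | " (state == "" ? "-" : state)
            name = ""; type = ""; state = ""
        }
        /^[[:space:]]*MachineName = /    { emit(); name = $2; gsub(/"/, "", name) }
        /^[[:space:]]*MachineNbuType = / { type = $2 }
        /^[[:space:]]*MachineState = /   { state = $2 }
        END { emit() }
    '
}
```

Usage on the active node would be `./nbemmcmd -listhosts -verbose | summarize_hosts`.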
master02 now appears in the output of the nbemmcmd command. If you look at the hastatus output, the nbu_mount resource faulted on master02. Dollars to donuts the mount point directory doesn't exist on master02, so the file system cannot be mounted.
Please run the following commands:
#> hamsg Mount_A
#> tail -20 /var/VRTSvcs/log/engine_A.log
These should tell us/confirm why the Mount failed.
Check the MountPoint attribute:
#> hares -display nbu_mount -attribute MountPoint
Verify that this path exists on both nodes. Create the directory if it doesn't. You will then need to clear the resource fault.
#> hares -clear nbu_mount -sys master02
#> hares -online nbu_mount -sys master02
#> hastatus
If everything looks good then you can bring the rest of the service group online.
#> hagrp -online nbu_group -sys master02
Joe D
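The "verify that this path exists on both nodes" step can be sketched as a small check to run on each node. This is only an illustration: check_mountpoint is a made-up helper, and the path to pass in is whatever the hares -display command above reports for MountPoint, not a value known from this thread.

```shell
# Succeed if the given mount point directory exists on this node.
# Otherwise print a hint and return a non-zero status.
check_mountpoint() {
    dir=$1
    if [ -d "$dir" ]; then
        echo "OK: $dir exists"
    else
        echo "MISSING: $dir (create it with: mkdir -p $dir)"
        return 1
    fi
}
```

For example, `check_mountpoint /opt/VRTSnbu` on master01 and again on master02, where /opt/VRTSnbu is only a guess based on the paths in the bp.conf posted earlier. If the directory is missing on master02, create it there and then continue with the hares -clear and hares -online commands above.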
Which path? /opt/VRTSnbu/ ?
Let me post here what exists on both nodes.
I am waiting for you to look at the tail of the log before I run the last two commands you mentioned (clear the fault and bring the resource online). In the meantime I ran hastatus, and this is the output.
[root@master01 bin]# ./hastatus
attempting to connect....
attempting to connect....connected

group           resource             system               message
--------------- -------------------- -------------------- --------------------
                                     master01             RUNNING
                                     master02             RUNNING
ClusterService                       master01             OFFLINE
ClusterService                       master02             ONLINE
-------------------------------------------------------------------------
nbu_group                            master01             ONLINE
nbu_group                            master02             *FAULTED* OFFLINE
                webip                master01             OFFLINE
                webip                master02             ONLINE
                csgnic               master01             ONLINE
-------------------------------------------------------------------------
                csgnic               master02             ONLINE
                nbu_nic              master01             ONLINE
                nbu_nic              master02             ONLINE
                nbu_ip               master01             ONLINE
                nbu_ip               master02             OFFLINE
-------------------------------------------------------------------------
                nbu_mount            master01             ONLINE
                nbu_mount            master02             *FAULTED*
                nbu_server           master01             ONLINE
                nbu_server           master02             OFFLINE
After running this command I have to press Ctrl+C to get out of it. It is either taking too long for my patience or it has hung.
Posting the complete engine log for you.
Thanks a lot. This post was spot on and fixed my issue. I marked another post of yours as the solution since that one was relevant to the topic. :)
Please tell us more about the mount resource. What type of volume management? VxVM or LVM?
Which filesystem type?
Have you ever tested if volume can be mounted (outside of VCS) at OS level on node 2?
*** EDIT ****
OK - I can see in your NBU post that Yasuhisa managed to help you create all resources from scratch and that the Service Group is now configured correctly.
A correctly completed worksheet and resources verified at OS level are key to a successful NBU clustered install.
It's VxVM, I suppose, and not LVM for sure. No, I haven't tested that, and I don't know how to do it either. :S
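For reference, a manual mount test at OS level on node 2 could look roughly like the sequence printed by the sketch below. It assumes a VxVM disk group with a VxFS file system, which has not been confirmed in this thread. The disk group, volume, and mount point names are placeholders, and print_manual_mount_test is a made-up helper that only prints the commands so they can be reviewed first. The disk group must not be imported on the other node while this is done, so nbu_group should be offline on master01 beforehand (for example with hagrp -offline nbu_group -sys master01).

```shell
# Print (do not run) a command sequence for a manual mount test.
# Arguments: disk group, volume, mount point. All three are
# placeholders to be replaced with the real resource values.
# Assumes VxVM with a VxFS file system.
print_manual_mount_test() {
    dg=$1
    vol=$2
    mnt=$3
    cat <<EOF
vxdg import $dg
vxvol -g $dg startall
mount -t vxfs /dev/vx/dsk/$dg/$vol $mnt
df -h $mnt
umount $mnt
vxvol -g $dg stopall
vxdg deport $dg
EOF
}
```

For example, `print_manual_mount_test nbudg nbuvol /mnt/nbutest` prints the sequence with those hypothetical names filled in. If the import and mount work by hand and unmount cleanly, the same resources should also come online under VCS once the group is brought back.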