Node hangs after heartbeat link failure

Created: 21 Dec 2012 • Updated: 15 Jan 2013 | 3 comments
omiot's picture
This issue has been solved. See solution.

Hi

I have a strange situation in my lab. I'm trying to test a failure scenario for SFHA. I'm running SFHA 5.1 SP1 RP3 on RHEL 5.5. When I disconnect all heartbeat links, one node loses the race and hangs. I've read in the admin guide that the panicked node restarts and tries to rejoin the cluster, but my node just hangs.

Regards.

Pawel

3 Comments

Marianne's picture

You need to double-check your I/O fencing configuration.

Look at the error message:

Could not eject node 0 from disk xxxxxx since keys of node 1 are not registered with it.

omiot's picture

Hi,

I tried to investigate this problem. When I gracefully shut down the cluster and vxfen, all keys are removed.

When I start vxfen and the cluster service, all nodes register their keys.

But when I pull the LLT cable, I get this message: Could not eject node 0 from disk xxxxxx since keys of node 1 are not registered with it.

frans.postma's picture

Veritas only triggers the panic itself; how the system handles the panic is Linux/OS specific. Check the kernel setting:

# sysctl kernel.panic

If the value is 0, the system will NOT reboot on panic. If the value is > 0, the system will wait that many seconds after a panic and then reboot.
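As a rough sketch of the check described above, the snippet below interprets a kernel.panic value and prints what it means for a panicked node. The function name describe_panic_setting is made up for illustration, and the timeout of 10 seconds in the commented commands is an arbitrary example, not a recommended value.

```shell
#!/bin/sh
# Hypothetical helper (not part of SFHA or RHEL): explain a kernel.panic value.
describe_panic_setting() {
    case "$1" in
        0)
            # 0 means the kernel never reboots by itself after a panic
            echo "no automatic reboot: the node stays hung after a panic" ;;
        ''|*[!0-9]*)
            # empty, negative, or non-numeric input is not interpreted here
            echo "unexpected value: $1" ;;
        *)
            # any positive number is the delay in seconds before the reboot
            echo "reboot $1 seconds after a panic" ;;
    esac
}

# Report the current setting when the proc file is readable (Linux only).
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic_setting "$(cat /proc/sys/kernel/panic)"
fi

# To change the setting (as root; 10 is only an example value):
#   sysctl -w kernel.panic=10                      # running kernel only
#   echo "kernel.panic = 10" >> /etc/sysctl.conf   # persist across reboots
```

With a non-zero value in place, a node that loses the fencing race should reboot after the delay instead of staying hung at the panic screen.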

SOLUTION