
vxconfigd errors every few minutes

Created: 05 May 2013 | 5 comments
yairz's picture

Hi all,

 

Would appreciate help with the following error received on only one of two cluster nodes:

 

V-16-6-16100

(hostname) chkvxconfigd:The VxVM process vxconfigd seems to be un-responsive. Stopping vxnotify process, so that resources get unregistered from AMF monitoring

 

Both servers are running RHEL 6.3 and Storage Foundation HA 6.0.3.

I get the above error on one server every few minutes, which causes the monitoring system to send numerous alerts.

I've checked storage connectivity and everything seems to be working properly there, so I don't think it has anything to do with that.

 

Any advice would be appreciated.

Thanks.

Yair


stinsong's picture

Hi yairz,

Could you paste the part of your /etc/vx/dmpevents.log that covers the period when the error was reported?

It is usually an I/O issue that causes vxconfigd to hang. Perhaps the disk group (DG) is not responding, which then leads to vxnotify being stopped.

yairz's picture

Hi,

 

Per your request, I have attached a file containing only the lines from the dmpevents file that were logged during one incident.

I appreciate your help.

Thanks.

Yair

Attachment: dmpevents.docx (28.29 KB)
rsharma1's picture

Hi Yairz,

Could you also share how long the system takes to return the output of 'vxdctl mode'?

And what is the MonitorTimeout value for the resource agent for which you got the AMF error? (MonitorTimeout value of Diskgroup agent perhaps?).
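If it helps, here is a minimal sketch of one way to measure how long 'vxdctl mode' takes and compare it with the agent's MonitorTimeout. The function name `time_command` and the 60-second limit are assumptions chosen for illustration (60 is used only as a stand-in for MonitorTimeout), and nothing here is part of the VCS agent itself.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout_s=60.0):
    """Run cmd and return (elapsed_seconds, timed_out).

    timeout_s is a stand-in for the agent's MonitorTimeout (an assumption);
    the command is killed if it runs longer than that.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return time.monotonic() - start, timed_out


if __name__ == "__main__":
    try:
        elapsed, timed_out = time_command(["vxdctl", "mode"], timeout_s=60.0)
    except FileNotFoundError:
        # vxdctl is only present on hosts with VxVM installed.
        print("vxdctl not found on this host", file=sys.stderr)
    else:
        state = "timed out" if timed_out else "returned"
        print(f"vxdctl mode {state} after {elapsed:.1f}s")
```

Running this a few times around the moment the alert fires should show whether vxconfigd's response time gets close to the timeout.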

yairz's picture

It takes about half a minute or so to return to a normal state, and MonitorTimeout is set to the default value of 60.

 

Thanks,

Yair

stinsong's picture

Hi yair,

After reviewing the dmpevents.log, I can see I/O errors on the EMC LUNs, but also messages reporting that the path test came back OK.

So here is the problem:

Read/write I/O to the EMC LUNs gets an error, which causes DMP to send a SCSI inquiry down the path, and the disk array then reports the path as OK. This could be because the disks behind the LUN cannot process I/O normally while the controller of the EMC disk array is still functioning normally.
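To spot this pattern across a longer log, a rough sketch like the one below could count, per DMP node, how many lines report an I/O error and how many report the path as OK. The marker strings, the `Dmpnode <name>` pattern, and the sample lines are all assumptions made up for illustration; they are not copied from the attached log, so adjust them to the wording in your own dmpevents.log.

```python
import re
from collections import Counter

# Assumed substrings and pattern; change them to match your own log.
ERROR_MARKER = "I/O error"
PATH_OK_MARKER = "DMP_PATH_OKAY"
DMPNODE_RE = re.compile(r"Dmpnode\s+(\S+)")


def tally_events(lines, error_marker=ERROR_MARKER, ok_marker=PATH_OK_MARKER):
    """Return (errors, oks): per-node counts of error lines and path-OK lines."""
    errors, oks = Counter(), Counter()
    for line in lines:
        match = DMPNODE_RE.search(line)
        if not match:
            continue  # line does not name a DMP node
        node = match.group(1)
        if error_marker in line:
            errors[node] += 1
        elif ok_marker in line:
            oks[node] += 1
    return errors, oks


def nodes_with_errors_but_ok_paths(errors, oks):
    """Nodes that logged I/O errors while their path was also reported OK."""
    return sorted(node for node in errors if oks[node] > 0)


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "Sun May  5 10:00:01.000: I/O error occurred on Dmpnode emc0_0001",
    "Sun May  5 10:00:01.000: I/O analysis done as DMP_PATH_OKAY on Path sdc "
    "belonging to Dmpnode emc0_0001",
    "Sun May  5 10:00:09.000: I/O error occurred on Dmpnode emc0_0002",
]

errors, oks = tally_events(SAMPLE)
print(nodes_with_errors_but_ok_paths(errors, oks))
```

Nodes that show up in this list are the ones where the array answers the path test but the I/O itself fails, which points at the LUN or its backing disks instead of the path.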

So please check on the disk array side whether any disk group is hanging or any disk is showing a warning. You could also try rebooting the disk array to see whether that reveals a hidden issue on the array.

Hope this helps.