Agent failed in VCS

Created: 14 Mar 2014 • Updated: 28 Apr 2014 | 4 comments
This issue has been solved. See solution.

The agents are in a failed state. Below are the messages from the engine_A.log file. Please let me know the cause of this issue.

 
VCS WARNING V-16-1-53025 Agent Script has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/Script/ScriptAgent please check file

VCS WARNING V-16-1-53025 Agent NIC has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10008 Agent NIC has faulted 6 times since

VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/NIC/NICAgent please check file
VCS WARNING V-16-10001-4028 (unix) IP:Unix-G1-IP:monitor:Empty NetMask is supplied, default netmask will be used.

VCS WARNING V-16-1-10023 Agent DiskGroup not sending alive messages since 

VCS WARNING V-16-1-53025 Agent DiskGroup has faulted; ipm connection was lost; restarting the agent
VCS ERROR V-16-1-10015 Cannot start /opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent please check file

Gaurav Sangamnerkar

The log above shows that every agent is affected, which points to a common cause rather than a problem with any single agent:

1. Either the HAD process is hung or unresponsive, OR

2. The system itself is unresponsive, so HAD is not getting enough resources to communicate with the agents, and hence all the agents are complaining.

I would suggest running OS utilities such as "sar", "prstat", or "top" to see what is happening with system performance.

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

SOLUTION
stinsong

There is an interprocess communication between the VCS agents and the had daemon. If this communication is disrupted (due to system load), the agent will fault, and had will restart the agent.

Please also refer to http://www.symantec.com/docs/TECH155691

anand_raj

Do you have this issue with some agents or all agents? Does it happen on one node in the cluster or on all nodes? Also, does it happen at certain times of the day, for example when a backup is running? These details would help in troubleshooting this issue.
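To answer the "some agents or all agents" question quickly, the fault messages in the engine log can be tallied per agent. The following is only a rough sketch: it assumes the log lines have the same shape as the V-16-1-53025 messages posted above, and the function name count_agent_faults is made up for illustration.

```shell
#!/bin/sh
# Tally "Agent <type> has faulted" messages (V-16-1-53025) per agent type.
# Assumption: lines look like the ones posted in this thread, e.g.
#   VCS WARNING V-16-1-53025 Agent NIC has faulted; ipm connection was lost; ...
# Any leading timestamp fields are skipped by searching for the word "Agent".
count_agent_faults() {
    # $1: path to the engine log file
    awk '/V-16-1-53025/ {
             for (i = 1; i < NF; i++)
                 if ($i == "Agent") { count[$(i + 1)]++; break }
         }
         END { for (a in count) print a, count[a] }' "$1" | sort
}
```

Running `count_agent_faults /var/VRTSvcs/log/engine_A.log` on each node (the path is the usual default and may differ on your systems) prints one line per agent type with its fault count, which makes it easy to compare nodes.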

mokkan

As Gaurav said, it looks like a performance issue on the system, and none of the agents are communicating with HAD.

Have you tried stopping and starting the agent? If you want, you can freeze the SGs, then manually stop and start the service and see how it works.

haagent -stop AgentName -force -sys NodeName

haagent -start AgentName -sys NodeName

I had a similar issue with the NIC agent, and stopping and starting it worked fine.
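The freeze, stop, start sequence described above can be written down as a dry run before touching the cluster. This is a minimal sketch, not a tested procedure: the function name agent_restart_plan is hypothetical, the agent, node and group names are placeholders, and it only prints the haagent and hagrp commands so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not execute) the commands for restarting one VCS agent on one node.
# Arguments: agent type, node name, then zero or more service groups to freeze.
# All names passed in are placeholders chosen by the caller.
agent_restart_plan() {
    agent=$1
    node=$2
    shift 2
    # Freeze the service groups so VCS takes no action on them meanwhile.
    for sg in "$@"; do echo "hagrp -freeze $sg"; done
    echo "haagent -stop $agent -force -sys $node"
    echo "haagent -start $agent -sys $node"
    # Check the agent state before unfreezing anything.
    echo "haagent -display $agent"
    for sg in "$@"; do echo "hagrp -unfreeze $sg"; done
}
```

For example, `agent_restart_plan NIC node1 sg1` prints five commands, starting with the freeze of sg1 and ending with its unfreeze. Run them by hand one at a time, and only unfreeze once the agent shows as running again.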