
Problem with Application Agent in VCS

Created: 20 Aug 2014 • Updated: 22 Aug 2014 | 8 comments
This issue has been solved (see the post marked SOLUTION).

Hi

I would like to understand why I run into problems when I try to add Ops Center (the Oracle product, not NetBackup OpsCenter) to my cluster.

This is the service group configuration (main.cf):

group opscenter-sg (
        SystemList = { bck01a = 0, bck01b = 1 }
        Enabled @bck01b = 0
        AutoStartList = { bck01b }
        OnlineRetryLimit = 2
        OnlineRetryInterval = 180
        )

        Application opscenter_app (
                StartProgram = "/opt/sun/xvmoc/bin/ecadm start"
                StopProgram = "/opt/sun/xvmoc/bin/ecadm stop -w"
                MonitorProcesses = { OCLISTENER }
                )

        DiskReservation reservation_dev_sdd (
                Enabled = 0
                Disks = { "/dev/opscenter" }
                )

        IP ops-app-ip (
                Device = "bond0.620"
                Address = "10.10.102.104"
                NetMask = "255.255.255.128"
                )

        IP ops-dmz-ip (
                Device = "bond0.622"
                Address = "10.10.102.244"
                NetMask = "255.255.255.192"
                )

        IP ops-ges-ip (
                Device = "bond0.610"
                Address = "10.10.101.104"
                NetMask = "255.255.255.0"
                )

        IP ops-int-ip (
                Device = "bond0.621"
                Address = "10.10.102.184"
                NetMask = "255.255.255.192"
                )

        Mount opscenter-mount (
                MountPoint = "/var/opt/sun"
                BlockDevice = "/dev/opscenter"
                FSType = ext4
                MountOpt = rw
                FsckOpt = "-y"
                )

        Proxy proxy-bck-app-nic (
                TargetResName = bck-app-nic
                )

        Proxy proxy-bck-dmz-nic (
                TargetResName = bck-dmz-nic
                )

        Proxy proxy-bck-ges-nic (
                TargetResName = bck-ges-nic
                )

        Proxy proxy-bck-int-nic (
                TargetResName = bck-int-nic
                )

        ops-app-ip requires proxy-bck-app-nic
        ops-dmz-ip requires proxy-bck-dmz-nic
        ops-ges-ip requires proxy-bck-ges-nic
        ops-int-ip requires proxy-bck-int-nic
        opscenter-mount requires ops-app-ip
        opscenter-mount requires ops-dmz-ip
        opscenter-mount requires ops-ges-ip
        opscenter-mount requires ops-int-ip
        opscenter_app requires opscenter-mount

I get the error below every time. I have tried setting OnlineRetryLimit and OnlineRetryInterval, with no luck: the error is always the same. The only resource that does not come online is opscenter_app; the IP and Mount resources work fine.

 

Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:10:30 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:10:32 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:10:34 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a
Aug 20 12:12:53 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13066 Thread(4146064240) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:53 bck01a Had[5441]: VCS ERROR V-16-2-13066 (bck01a) Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13068 Thread(4146064240) Resource(opscenter_app) - clean completed successfully.
Aug 20 12:12:54 bck01a AgentFramework[35393]: VCS ERROR V-16-2-13071 Thread(4146064240) Resource(opscenter_app): reached OnlineRetryLimit(0).
Aug 20 12:12:56 bck01a Had[5441]: VCS ERROR V-16-1-54031 Resource opscenter_app (Owner: Unspecified, Group: opscenter-sg) is FAULTED on sys bck01a
Aug 20 12:12:58 bck01a Had[5441]: VCS ERROR V-16-1-10205 Group opscenter-sg is faulted on system bck01a

Thanks for your help solving this issue.

 

 


Gaurav Sangamnerkar

Hi,

What exit codes does the monitor script for the opscenter_app resource return? If it returns 0 (success) and 1 (failure), that is incorrect: VCS does not understand those exit codes.

The monitor script should exit with 110 when the application is running (online) and 100 when it is not (offline).
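A minimal sketch of a monitor program that follows this convention. The Application agent can run such a script through its MonitorProgram attribute (the configuration posted above uses MonitorProcesses instead). The health check itself is an assumption for illustration: it treats the cacao pid file that appears in the ps output later in this thread as the liveness indicator, which the thread does not confirm is a reliable signal for Ops Center.

```python
import os

# Monitor exit codes for the VCS Application agent, as given in this thread.
VCS_ONLINE = 110
VCS_OFFLINE = 100

# Assumption: the cacao pid file from the ps output later in this thread is a
# usable liveness indicator for Ops Center. The thread does not confirm this.
DEFAULT_PID_FILE = "/var/run/opt/sun/cacao2/instances/oem-ec/run/cacao_v2.pid"


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def monitor(pid_file: str = DEFAULT_PID_FILE) -> int:
    """Translate a simple health check into the exit codes VCS expects."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return VCS_OFFLINE  # missing or unreadable pid file: report offline
    return VCS_ONLINE if pid_is_running(pid) else VCS_OFFLINE


# Used as a MonitorProgram, the script would end with:
#     raise SystemExit(monitor())
```

The point of the sketch is the last line of `monitor()`: whatever the check is, its result has to leave the script as 110 or 100, not as 0 or 1.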

 

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

SOLUTION
Koven1

Hi Gaurav

I see that the ecadm start/stop script has a lot of exit codes. So you are telling me I should use 100 instead of 1 and 110 instead of 0?

Is there any way to change the expected exit codes in VCS instead? That would be easier for me, because there are a lot of lines to change in the ecadm script.

 

Thank you!

 

Gaurav Sangamnerkar

Hello,
To my knowledge, the exit codes are hard-coded and can't be changed, so I'm afraid you will need to change your script.

As for the mapping: 110 indicates success (where a script would usually exit 0), and 100 indicates failure (where it would usually exit 1).
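If editing every exit statement in ecadm is impractical, one alternative is a small wrapper used as the MonitorProgram: it runs an existing check unchanged and translates its conventional 0/1 result into 110/100. This is a sketch, not something tested in the thread, and it only applies if the resource is switched to MonitorProgram (the posted configuration uses MonitorProcesses). The `ecadm status` command below is a hypothetical placeholder; substitute whatever check actually reports Ops Center health.

```python
import subprocess
from typing import Sequence

# Monitor exit codes for the VCS Application agent, as given in this thread.
VCS_ONLINE = 110
VCS_OFFLINE = 100

# Hypothetical status command: `ecadm status` is an assumption, not taken
# from the thread. Replace it with a check you know works on your system.
STATUS_COMMAND = ["/opt/sun/xvmoc/bin/ecadm", "status"]


def translate_exit_code(conventional_rc: int) -> int:
    """Map the usual Unix convention (0 = success) onto VCS monitor codes."""
    return VCS_ONLINE if conventional_rc == 0 else VCS_OFFLINE


def run_and_translate(command: Sequence[str], timeout: float = 30.0) -> int:
    """Run an existing check unchanged and return a VCS monitor exit code.

    The timeout is an assumed value meant to stay below the agent's own
    monitor timeout; adjust it to your settings.
    """
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return VCS_OFFLINE  # check could not run or hung: report offline
    return translate_exit_code(result.returncode)


# Used as a MonitorProgram, the wrapper would end with:
#     raise SystemExit(run_and_translate(STATUS_COMMAND))
```

The design choice here is to keep the vendor script pristine and put the VCS-specific translation in one short file, so an Ops Center upgrade that replaces ecadm does not silently undo the change.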

G


Koven1

OMG Gaurav, that's a lot of work. Let me try some different configurations in that script, but I'm not sure I can get Ops Center running under the Application agent.

Last question: how can I be 100% sure that the exit codes are my problem? Does Symantec have any troubleshooting steps I could try?

Thank you

Gaurav Sangamnerkar

Hi,

I don't really think it's that much work: all you need to do is find and replace the exit codes. If it's a shell or Perl script, you can grep for the exit statements and use the substitute command in sed to replace the exit codes. Alternatively, you can open the file in a good text editor such as Notepad++ and use its find-and-replace function. Once done, copy the script back to the server, rename the existing script, and use the modified script as the monitor script.
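The find-and-replace step can also be sketched in Python. The sketch writes a modified copy and leaves the original untouched, in line with the advice to keep the existing script under a different name. It only rewrites bare `exit 0` and `exit 1`; treating those as the only statements that need remapping is an assumption, and as the rest of the thread shows, a script like ecadm has many more exit paths that still need to be reviewed by hand.

```python
import re
import shutil
from pathlib import Path

# Matches bare `exit 0` / `exit 1` only; `exit 10`, `exit $rc`, `_exit 0`
# and similar are left alone thanks to the word boundaries.
_EXIT_RE = re.compile(r"\b(exit\s+)([01])\b")
_MAPPING = {"0": "110", "1": "100"}


def _replace(match: "re.Match[str]") -> str:
    """Keep the `exit ` prefix and swap the numeric code."""
    return match.group(1) + _MAPPING[match.group(2)]


def remap_exit_codes(script_text: str) -> str:
    """Return script_text with exit 0 -> exit 110 and exit 1 -> exit 100."""
    return _EXIT_RE.sub(_replace, script_text)


def remap_script_file(src: Path, dst: Path) -> int:
    """Write a remapped copy of src to dst and return the replacement count."""
    new_text, count = _EXIT_RE.subn(_replace, src.read_text())
    dst.write_text(new_text)
    shutil.copymode(src, dst)  # keep the executable bit of the original
    return count
```

Returning the replacement count makes it easy to compare against a `grep -c` of the exit statements before switching the resource to the new file.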

To answer your second point: before VCS can declare any resource online, at least one monitor cycle has to succeed. When you bring the application resource online, the online script executes, and after it completes the monitor script executes; it is the monitor that declares the resource online. In your case, the online script completes, but the monitor fails to declare the resource online because VCS does not recognize its exit codes.
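The sequence described above, and the log excerpt in the question ("Agent is calling clean for resource(opscenter_app) because the resource is not up even after online completed", then "reached OnlineRetryLimit(0)" and FAULTED), can be modeled as a short event list. This is a simplified sketch under stated assumptions, not the agent framework's code: one monitor result is consumed per online attempt, 110 and 100 are interpreted as in this thread, any other code is treated as unknown, and timing details such as how many monitor cycles the agent waits are left out.

```python
from typing import Iterable, List

# Monitor exit codes for the VCS Application agent, as given in this thread.
VCS_ONLINE = 110
VCS_OFFLINE = 100


def monitor_state(exit_code: int) -> str:
    """Interpret a monitor exit code the way this thread describes it."""
    if exit_code == VCS_ONLINE:
        return "ONLINE"
    if exit_code == VCS_OFFLINE:
        return "OFFLINE"
    return "UNKNOWN"  # e.g. 0 or 1: not a code the agent understands


def bring_online(monitor_codes: Iterable[int], online_retry_limit: int = 0) -> List[str]:
    """Simplified online -> monitor -> clean loop ending in ONLINE or FAULTED."""
    events: List[str] = []
    codes = iter(monitor_codes)
    for _ in range(online_retry_limit + 1):
        events.append("online")
        # If the caller supplies too few codes, assume the monitor saw OFFLINE.
        state = monitor_state(next(codes, VCS_OFFLINE))
        events.append(f"monitor:{state}")
        if state == "ONLINE":
            events.append("ONLINE")
            return events
        events.append("clean")  # resource not up after online: agent cleans
    events.append("FAULTED")  # retry limit reached
    return events
```

With `online_retry_limit=0`, as in the log, a monitor that exits 0 produces `online`, `monitor:UNKNOWN`, `clean`, `FAULTED`, which is the same order of events as the excerpt; a monitor that exits 110 produces `online`, `monitor:ONLINE`, `ONLINE`.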

Hope that answers your question.

 

G


Koven1

Hi Gaurav

I'm attaching the script I modified. I replaced the exit codes (0 to 110 and 1 to 100), with no luck. I have a monitor process called "launch", so this is the process VCS should be monitoring, and as you can see below there is a cacao process running, but I don't know how to debug what VCS is doing.

root     13840     1  0 10:14 ?        00:00:00 /opt/sun/cacao2/private/bin/x64/launch -w /var/opt/sun/cacao2/instances/oem-ec -r /var/run/opt/sun/cacao2/instances/oem-ec/run/retries -R /var/run/opt/sun/cacao2/instances/oem-ec/run/cacao_v2.pid -s 1 -U root -G root -L 16384 -A /opt/sun/cacao2/private/bin/proc_analysis -W /var/opt/sun/cacao2/instances/oem-ec -T 300 -P /var/run/opt/sun/cacao2/instances/oem-ec/run/hb.pipe -i /etc/opt/sun/cacao2/instances/oem-ec/security/password -DPATH=/usr/java/x86_64/jdk1.7.0_45/bin:/bin:/usr/bin -DLD_LIBRARY_PATH=/opt/sun/cacao2/share/lib/shared -- /usr/java/x86_64/jdk1.7.0_45/bin/java -Xms200M -Xmx8192M -server -XX:StringTableSize=27001 -XX:PermSize=128m -XX:MaxPermSize=384m -Xss384k -XX:+UseParallelOldGC -XX:SoftRefLRUPolicyMSPerMB=10000 -XX:-UseCompressedOops -Dsun.security.pkcs11.enable-solaris=false -Djava.endorsed.dirs=/opt/sun/cacao2/share/lib/endorsed -Dxvmserver=false -classpath /opt/sun/jdmk/5.1/lib/jdmkrt.jar:/opt/sun/jdmk/5.1/lib/jmxremote_optional.jar:/opt/sun/cacao2/share/lib/cacao_cacao.jar:/opt/sun/cacao2/share/lib/cacao_j5core.jar:/opt/sun/cacao2/private/lib/bcprov-jdk14.jar -Djavax.management.builder.initial=com.sun.jdmk.JdmkMBeanServerBuilder -Dcacao.config.dir=/etc/opt/sun/cacao2/instances/oem-ec com.sun.cacao.container.impl.ContainerPrivate
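One way to check the monitor side by hand, outside VCS, is to run the monitor program directly and look at its exit code, and to confirm that the names being monitored actually appear in the process list. In the posted configuration MonitorProcesses lists OCLISTENER, while the process shown above is launch. The sketch below is a hypothetical debugging helper, not a Symantec tool; its process check is only a loose substring test, and the agent's own process matching may be stricter.

```python
import subprocess
from typing import Dict, Iterable, Sequence

# Exit codes the Application agent expects from a monitor program, per this thread.
MEANINGS = {110: "ONLINE", 100: "OFFLINE"}


def check_monitor_program(command: Sequence[str], timeout: float = 60.0) -> str:
    """Run a monitor program by hand and describe its exit code."""
    result = subprocess.run(list(command), capture_output=True, text=True, timeout=timeout)
    meaning = MEANINGS.get(result.returncode, "not a code VCS understands, per this thread")
    return f"exit {result.returncode}: {meaning}"


def names_seen_in_ps(ps_output: str, names: Iterable[str]) -> Dict[str, bool]:
    """Loose check: does each monitored name appear in any line of ps output?

    A True here does not prove the agent will match the process; it only
    rules out the case where the name is absent altogether.
    """
    lines = ps_output.splitlines()
    return {name: any(name in line for line in lines) for name in names}
```

For example, `names_seen_in_ps(subprocess.run(["ps", "-eo", "args"], capture_output=True, text=True).stdout, ["OCLISTENER", "launch"])` would show whether either name appears on the node at all.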

I really don't know how to cluster this application because nothing is working yet.

Thanks again for your time.

 

Attachment: ecadm.txt (204.89 KB)
Koven1

Hi G,

 

Finally solved! It's a really big job when you are dealing with so many exit codes, but you gave me the key in your first post: my problem was, without doubt, the exit codes, and fixing them was a matter of trial and error.

 

Thank you very much Gaurav

Gaurav Sangamnerkar

Glad to have helped :)

 

G
