Any suggestion if Notifier resource goes offline

Created: 01 Nov 2012 • Updated: 02 Dec 2012 | 28 comments
Zahid.Haseeb
This issue has been solved.

Environment

OS = Solaris 10

HA/VCS = 6.0

Query

Any suggestions for the case where the Notifier service group goes offline while the Application service group is online and working fine? Any tips on how we can find out if that service group faults or goes offline in the future, given that we would not get a notification of this fault?

Marianne

If Notifier is clustered, it can be managed like any other resource.

When it faults, it will fail over to the other node.

You can increase RestartLimit so that it will be restarted rather than failed over.

I have honestly never seen the notifier fail and cannot think of any reason why it would...

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Zahid.Haseeb

Thanks for your kind input. This is the same answer I expected on CONNECT :) Hmm, so is there no other way to approach this?

Any comment will be appreciated. Mark as Solution if your query is resolved
__________________
Thanks in Advance
Zahid Haseeb

zahidhaseeb.wordpress.com

mikebounds

Note that the "had" daemon stores the notifications to be sent to the notifier, so if the notifier is down at the time of the fault, "had" will send it the stored notifications when it comes back up, and the notifier will then pass them on as you have configured it. So, as Marianne says, since Notifier should be clustered, this should be OK.

You don't have to use the notifier: you can use 3rd party software to tail the engine log and send you messages based on it, or you can use trigger scripts, usually preonline and resfault (if you look at the sample preonline trigger script, I think it has an example that sends an email when it is invoked).

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

SOLUTION
arangari

By default, 'HAD' will store up to 30 notifications, and this can be increased. These 30 notifications are maintained as a circular FIFO list. Once the notifier reads an entry from 'HAD', the entry is deleted.

One can use triggers to send traps/emails directly instead of using the notifier; however, this only gives you information about the events for which triggers are invoked. There are generally other events for which the notifier sends useful notifications.

Recommendation: Use the notifier in a clustered configuration. Normally it is part of the 'ClusterService' service group, which gets special preference within VCS in terms of failover, etc. Increase the notification limits if required.

Thanks and Warm Regards,

Amit Rangari

If this post helped you resolving the issue, please mark it as solution. _____________________________________________________________________________

Zahid.Haseeb

Thanks to all of you. Suppose the Notifier resource goes down while the Application service group is working fine. In that situation, the client does not want to have to open the Java Console periodically to verify that the Notifier resource is really online.

I feel we could use SNMP traps in parallel with the Notifier resource.

Marianne

Suppose if Notifier resource goes down....

  1.  Cluster Notifier resource (add to ClusterService sg)
  2.  Increase RestartLimit

So, suppose Notifier goes down:

  1. It will restart on the active node
  2. If restart fails, it will failover
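For point 2 of the first list, a minimal command-line sketch is shown below. It assumes the Notifier resource is named "ntfr" and that a limit of 2 suits your site; both are placeholders, not values from this thread. RestartLimit is a static (type-level) attribute, so the sketch overrides it on the resource before modifying it. Set DRY_RUN=1 to print the commands instead of running them.

```sh
#!/bin/sh
# Sketch: raise RestartLimit on the clustered Notifier resource so the agent
# restarts it in place before VCS fails ClusterService over to another node.
# Assumptions: resource name "ntfr" and limit 2 are placeholders.

NTFR_RES="${NTFR_RES:-ntfr}"
RESTART_LIMIT="${RESTART_LIMIT:-2}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_notifier_restart_limit() {
    run haconf -makerw                                   # open the configuration for writing
    run hares -override "$NTFR_RES" RestartLimit         # static attribute: override per resource
    run hares -modify "$NTFR_RES" RestartLimit "$RESTART_LIMIT"
    run haconf -dump -makero                             # save main.cf and make it read-only again
}

# Apply only when explicitly asked, e.g. "sh set_ntfr_limit.sh apply".
if [ "${1:-}" = apply ]; then
    set_notifier_restart_limit
fi
```

The script name in the usage comment is hypothetical. Running it once with DRY_RUN=1 lets you review the four commands before changing the cluster configuration.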

SNMP is part of the Notifier resource. See the NotifierMngr agent in the Bundled Agents Reference Guide.

You trust VCS with your Critical Application. Why not trust VCS to look after Notifier?

mikebounds

Suppose if Notifier resource goes down and Application Service Group is working fine, in that situation Client dont want to get bother and open the Java Console(after some time interval) to verify that really the Notifier resource is online or not.

I agree with Marianne that "You trust VCS with your Critical Application. Why not trust VCS to look after Notifier?", but one problem I find with the Notifier resource is that it does not tell you when a service group comes back up unless, I think, you select severity "INFO". So if you use WARNING or ERROR, you get a notification telling you the service group is down, but not that it is back up (this was the case in 5.0; I am not sure if this has changed). If the service group doesn't come back up, then you should get an error message saying "service group has nowhere to fail", but you have to wait until all timeouts and restarts are exhausted before you get it. I would rather know as soon as the service is back online than wait 10-15 minutes and think, "I have no more notifications, so the service must be back online by now!"

Also, some customers want to know where the group has failed over to, so a message saying "Service X is back up on system Y" is useful. This is where you can use triggers in conjunction with the notifier to tell you this and help with your particular concern.

To do this, use the PreOnline trigger, which is passed the reason why it is being called. I generally code:

IF reason is FAULT, then send email saying something like "Service group X is back up on system Y"

Sending the message only when the reason is FAULT means you won't get messages when you online the service group manually.
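The "IF reason is FAULT" logic above can be sketched as a shell preonline trigger. This is a hedged sketch, not the VCS sample script. It assumes the trigger receives the system, the service group, the reason (FAULT or MANUAL) and, for FAULT, the system where the group faulted, in that order. It also assumes, based on the shipped sample trigger, that the trigger must online the group itself with "hagrp -online -nopre" once the group's PreOnline attribute is set. MAILTO is a placeholder recipient. Check the argument order against the trigger documentation for your VCS version.

```sh
#!/bin/sh
# preonline trigger sketch (hypothetical). Assumed arguments:
#   $1 = system the group is about to come online on
#   $2 = service group
#   $3 = why it is onlining (FAULT or MANUAL)
#   $4 = system where the group faulted (only passed when $3 is FAULT)

MAILTO="${MAILTO:-root}"   # placeholder recipient

# Only notify when the group is coming up because of a fault elsewhere,
# so manual onlines and switches stay quiet.
should_notify() {
    [ "$1" = "FAULT" ]
}

# $1 = system, $2 = group, $3 = system where it faulted
build_subject() {
    printf '(preonline) Service group %s is coming back up on system %s (faulted on %s)' "$2" "$1" "$3"
}

main() {
    sys=$1; grp=$2; why=$3; faulted_on=${4:-unknown}
    if should_notify "$why"; then
        mailx -s "$(build_subject "$sys" "$grp" "$faulted_on")" "$MAILTO" < /dev/null
    fi
    # With PreOnline set, VCS leaves the online to the trigger;
    # -nopre stops the trigger from being invoked again for this online.
    hagrp -online -nopre "$grp" -sys "$sys"
}

# VCS passes at least three arguments; the guard lets the functions
# be sourced (for example, for testing) without side effects.
if [ $# -ge 3 ]; then
    main "$@"
fi
```

Keeping the decision and the message text in small functions makes it easy to swap mailx for another mail tool without touching the trigger logic.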

You can also use resfault trigger to send an email when a resource faults.
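A resfault trigger that sends such an email might look like the sketch below. It assumes the trigger is passed the system, the faulted resource and the resource's previous state, in that order. Depending on your VCS version, you may also need to enable the trigger for the resource or group. MAILTO is a placeholder recipient.

```sh
#!/bin/sh
# resfault trigger sketch (hypothetical). Assumed arguments:
#   $1 = system where the resource faulted
#   $2 = faulted resource
#   $3 = state of the resource before the fault

MAILTO="${MAILTO:-root}"   # placeholder recipient

# $1 = system, $2 = resource, $3 = previous state
build_subject() {
    printf '(resfault) Resource %s has failed on system %s (previous state %s)' "$2" "$1" "$3"
}

main() {
    mailx -s "$(build_subject "$1" "$2" "$3")" "$MAILTO" < /dev/null
}

# Only run when invoked by VCS with its arguments.
if [ $# -ge 3 ]; then
    main "$@"
fi
```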

So by using these triggers, you will have an independent mechanism to tell you when resources (such as the Notifier resource) fault and come back online.

Zahid.Haseeb

It is not a matter of trust. We do trust VCS; that's why we are selling it and not another solution :) The issue, as I mentioned, is that nobody is watching the ClusterService SG, so if that SG goes offline, faults in the Application SG may go unnoticed, whereas for the Application SG we can feel assured that another SG is already monitoring its behaviour.

Thanks to all for your kind replies.

mikebounds

Triggers will definitely solve your issue of knowing whether the Application service group faults while the Notifier resource is down. Is this what you are looking for, and if not, what are you expecting to happen?

As Amit says, you can't use triggers for all notifications: there are other events such as loss of heartbeats, "resource in admin wait" and concurrency violations. There are other triggers for most of these events, but not all. At least the resfault and preonline triggers will tell you that the notifier is down, so you can rectify this before you hit any issues for which there are no triggers.

arangari

Reading through this, I am lost trying to understand the concern.

Is the concern that the notifier resource faults, or that while it is faulted, faults of other resources are not reported?

The notifier resource is normally configured under the ClusterService SG, and hence gets more preference.

Faults of other resources in the meantime are captured in the circular queue, and whenever the notifier comes back, it reports these faults. This also includes the fault of the notifier resource itself.

I am not able to understand the concern with respect to the Application SG and the notifier resource.

Zahid.Haseeb

Hello Arangari

Let me try to make the issue clearer. We have a ClusterService group configured with a Notifier resource. When any resource in the Application service group faults, we receive emails saying the Application service group is faulted. Up to here, everything is fine.

Now suppose the ClusterService group faults and remains faulted for a long time, OR someone offlines the ClusterService group. If any critical resource in the Application service group faults during this time (while the ClusterService group is faulted), I will not be notified.

So what would you suggest, please?

arangari

Whatever process we develop to get notifications from VCS, when that process is down, notifications will not be received during that period.

If the ClusterService group is faulted and remains faulted for a long time, then yes, there is no way to learn about faults in the Application group through notifications. However, these notifications will be delivered as soon as the ClusterService group is online again (i.e. the notifier is online).

As for taking ClusterService offline, it can't be offlined unless '-force' is used. This means the user is certain about what he is doing.

Zahid.Haseeb

Thanks Arangari for your kind response.

But what about the RESFAULT or RESSTATECHANGE trigger? Would either of these help with my query?

arangari

If fault detection is only needed while the notifier is down, then one can certainly use triggers to send emails and/or SNMP traps directly. One can make sure that if the notifier is configured and up, the trigger takes no action to send the notification.

Please note that you will need to install appropriate tools to send emails/SNMP traps. Also, triggers are invoked on the node where the event occurred, so you may be at risk of losing the event if the node goes down before the trigger is executed.
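One way to make the trigger stay quiet while the notifier is up is sketched below as a resfault trigger. Assumptions not stated in the thread: the notifier resource is named "ntfr", "hares -state ntfr" prints one "Resource Attribute System Value" line per system, and MAILTO is a placeholder recipient. The trigger mails when the notifier is not ONLINE on any system, when it is the notifier itself that faulted, or when hares produces no usable output. On Solaris, you may need nawk if the default awk rejects the script.

```sh
#!/bin/sh
# resfault trigger sketch (hypothetical): mail only when the Notifier cannot.
# Assumed arguments: $1 = system, $2 = faulted resource, $3 = previous state.

NTFR_RES="${NTFR_RES:-ntfr}"   # assumed notifier resource name
MAILTO="${MAILTO:-root}"       # placeholder recipient

# Succeeds if any line of `hares -state` output (read from stdin) shows ONLINE.
notifier_online() {
    awk '$2 == "State" && $4 == "ONLINE" { found = 1 } END { exit !found }'
}

# $1 = faulted resource; notifier state on stdin.
# Succeeds when a backup mail should be sent.
needs_backup_mail() {
    if [ "$1" = "$NTFR_RES" ]; then
        return 0               # the notifier itself faulted: always mail
    fi
    ! notifier_online          # otherwise mail only if it is down everywhere
}

main() {
    # If hares fails (e.g. had is down), stdin is empty and we mail anyway.
    if hares -state "$NTFR_RES" 2>/dev/null | needs_backup_mail "$2"; then
        mailx -s "(resfault) Resource $2 has failed on system $1" "$MAILTO" < /dev/null
    fi
}

if [ $# -ge 3 ]; then
    main "$@"
fi
```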

Marianne

Day 5 of trying to put your mind at ease....

I am sure that if you read through all the advice given thus far and follow it, you will be able to sleep peacefully.

In summary:

1. Configure RestartLimit on the Notifier resource
2. Ensure that the failover node is available
3. Configure trigger scripts
4. Use 3rd party software to tail the engine log and/or messages files

arangari

Very well captured...

mikebounds

Since this thread is still going on and I am not sure what extra info you are looking for, here is a summary of what I have already posted. Please clarify if there is anything else you are looking for:

  1. VCS provides redundancy for Notifier by putting it in the ClusterService group, where it can be restarted locally or failed over to other node(s)
  2. If you feel that this redundancy for Notifier is not enough, then you can use additional notification as a backup: this can be 3rd party software that tails the engine log and sends you messages based on it, or trigger scripts, usually preonline and resfault
  3. Using preonline and resfault will give you basic notification of Application service group faults (regardless of whether the Notifier resource is up or down). If you want more than this, there are other triggers for most events, but not all

Some other stuff to add:

  1. I would be careful about using the resstatechange trigger, as it will be invoked a lot: if you switch a service group, it will be called for EVERY resource as it offlines and then again for every resource as it onlines. This consumes system resources and can get quite bad if you have lots of service groups.
  2. UNIX generally has built-in command line tools to send mail, but I am not sure about Windows.
  3. Amit's comment "triggers are invoked on the node where event has occurred, hence you may be at a risk of losing the event in case node goes down before the trigger is executed" applies to a sequence of events with very small odds: for example, the notifier fails on both nodes, then a resource faults, then the node the resource faulted on goes down within a second or two, before resfault is called or finishes. Even then, the preonline trigger on the other server will let you know that the node went down; it just won't tell you that a resource failed first, which is not that relevant since the node died.

mikebounds

I see Marianne said the same while I was writing my post. Zahid, if triggers or log tailing are not enough as a backup for the notifier failing on all nodes, please let us know what else you would like to happen.

Zahid.Haseeb

Mike, I did not say that triggers and logs are not enough for me. I really feel now that I will be using triggers as a backup for the Notifier resource.

As per your point 1, Mike: I am going to configure the trigger for the Notifier resource only.

Zahid.Haseeb

@ arangari

One can make sure that if notifier is configured and up, then trigger takes no action on sending the notification.

I definitely don't need to get notifications when the Notifier resource is UP and running :)

Please note that you will need to  install appropriate tools to send emails

What types of tools are you talking about?

(In my environment, an SMTP server is already configured and is being used by the Notifier resource.)

hence you may be at a risk of loosing the event in case node goes down before the trigger is executed.

In that case my Notifier resource fails over to another node, so I am not bothered about this. I am doing all this just to cover the case where the Notifier is offline/faulted while the nodes/Application service group are running fine.

@ Marianne

Thanks for your effort regarding my query. I am aware of points 1 and 2, but I am thinking beyond them; that's why I am asking about options other than points 1 and 2, if you understand. Could you kindly name any 3rd party software, as per point 4?

mikebounds

For UNIX, you can use mail or mailx to send email. Example using mailx:

mailx -s "Resource X failed on system Y" zahid@wordpress.com < /dev/null

This sends a message with a subject and no message body.

Marianne

Day 6...

In my 12 years of installing and supporting clusters at various customers in South Africa, I have never seen clustered notifier resource just failing 'out of the blue' or staying down for extended period or somebody offlining resource or Service Group for whatever reason or received a request to externally manage/monitor notifier resource.

I did see the notifier resource fail a couple of years ago while we were busy with initial configuration. This was a bug in that VCS version on HP-UX, where the resource failed when more than one email recipient was added. Because we were busy with configuration, we immediately noticed the problem, opened a Support call, and received a hotfix.

I am sure that arangari and Mike have a lot more experience than I do - maybe they can tell us if they have ever seen any 'issues' with notifier.

IMHO - an anthill, not a mountain....

Zahid.Haseeb

Marianne, I do respect your comments and those of Arangari and Mike. Asking questions does not mean that I am rejecting what others say :)

You are all indeed assets of the CONNECT forum :)

Hywel Mallett

Presumably you have some form of monitoring in place, such as Nagios, for things like disk capacity monitoring.

You could use it to call hares to query the state of the notifier resource, then alert if it's not online.
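A Nagios-style check along those lines could be sketched as follows. This is a hypothetical plugin. It assumes the monitoring agent runs on a cluster node with hares on the PATH, that the notifier resource is named "ntfr" unless one is passed as the first argument, and that "hares -state" prints "Resource Attribute System Value" lines. Exit codes follow the standard Nagios plugin convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.

```sh
#!/bin/sh
# check_vcs_notifier: hypothetical Nagios-style plugin for the notifier resource.

RES="${1:-ntfr}"   # assumed notifier resource name

# Reads `hares -state` output on stdin, prints one status line, returns the code.
evaluate() {
    online_on=$(awk '$2 == "State" && $4 == "ONLINE" { print $3; exit }')
    if [ -n "$online_on" ]; then
        echo "OK - $RES is ONLINE on $online_on"
        return 0
    fi
    echo "CRITICAL - $RES is not ONLINE on any system"
    return 2
}

main() {
    if ! out=$(hares -state "$RES" 2>/dev/null); then
        echo "UNKNOWN - could not run hares -state $RES"
        return 3
    fi
    printf '%s\n' "$out" | evaluate
}

# Run the check only when invoked under the plugin name, so the
# functions can be sourced without side effects.
case "$0" in
    *check_vcs_notifier) main; exit $? ;;
esac
```

Because the check only reads state, it is independent of both the notifier and the triggers. It also catches the case where ClusterService has been offlined on purpose and nobody brought it back.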

Zahid.Haseeb

Thanks all for your kind contributions to my query. My problem is resolved. I used ssmtp instead of mailx.

Below are the links where I shared my insight:

https://www-secure.symantec.com/connect/blogs/howt...

OR

http://zahidhaseeb.wordpress.com/2012/11/26/howto-...

mikebounds

I see your solution was to use trigger scripts in addition to the Notifier resource, but I would use the resfault trigger rather than postoffline because:

  1. You will get the notification earlier, as you don't have to wait for all the other resources to offline
  2. You will get more information in the notification, as the resfault trigger is passed the name of the resource that failed as an argument, which you can add to your mail
  3. If the faulted service group has problems offlining because other resources can't offline, postoffline will never be called

Zahid.Haseeb

Ahh, great.

Hmm, so I can just use the resfault trigger instead of postoffline, is that it?

(Previously I got stuck troubleshooting why the email was not reaching the particular email address. Now I can use whichever trigger is more useful, such as resfault, as you said.)

mikebounds

You can use postoffline, but I would use resfault, and then I don't see the need for postoffline. You can also use preonline and other triggers as well.

Which email tool you use to send your notifications depends on your environment, and this is not VCS-related. I always use halog in my triggers so that messages are sent to the log as well as emailed; that way you know whether a problem lies with the trigger or the email tool. I also prefix all messages with the name of the trigger, which makes them very easy to search for in the engine log. Example:

halog -add E "(resfault) Resource $res has failed on system $sys"
mailx -s "(resfault) Resource $res has failed on system $sys" mike@symantec.com < /dev/null
