Workflow Solution

A flaw in the time-out and escalation service in Workflow projects

  • 1.  A flaw in the time-out and escalation service in Workflow projects

    Posted Aug 17, 2011 07:27 AM

    We have encountered unstable behavior in the service responsible for processing time-outs and escalations.

    Each application's web.config file includes a “minEscalationTime” parameter inside the <Symantec.workflow> section. In Workflow environments where tens of workflow project applications are published (over 50 by our estimate), with hundreds of active instances, the role of the “minEscalationTime” parameter and the proper choice of its value are crucial. To avoid collisions between massive time-out processing runs, we spread the “minEscalationTime” values across the applications, e.g. by using time slots. But this method is not reliable because of the specific way in which time-outs and escalations are processed.
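
    To illustrate the time-slot idea, here is a minimal sketch of how distinct values could be assigned to the applications. It is only an assumption-laden illustration: the function name, the application names, the base value and the step are all hypothetical, the unit of “minEscalationTime” is not stated here, and the sketch does not write anything into web.config.

```python
def staggered_values(app_names, base=300, step=15):
    """Assign each application its own value so processing cycles do not coincide.

    base and step are hypothetical numbers; the unit expected by
    "minEscalationTime" is an assumption and must be checked separately.
    """
    # Sort and de-duplicate so the assignment is stable across runs.
    ordered = sorted(set(app_names))
    # Each application gets the base value plus its own slot offset.
    return {name: base + index * step for index, name in enumerate(ordered)}


if __name__ == "__main__":
    # Hypothetical application names, for illustration only.
    apps = ["HelpDesk", "Onboarding", "PurchaseApproval"]
    for name, value in staggered_values(apps).items():
        print(f"{name}: minEscalationTime = {value}")
```

    Each resulting value would then be entered by hand into the corresponding application's web.config. As described in this post, such spreading only takes effect once the parameter is actually honored.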

    The unstable behavior occurs when the whole platform is restarted (Symantec Workflow Service and IIS). It seems that after a restart, time-outs and escalations are processed for all applications at the same time. Server load rises dramatically and the applications log many errors like the following:

    System.Runtime.Remoting.RemotingException could not process timeouts and escalations
    System.Runtime.Remoting.RemotingException: Port is Busy: All pipe instances are busy.

    We have noticed that the value of the “minEscalationTime” parameter is not used until at least one trigger (message) has been processed. Until then, all applications are serviced in short cycles (from 30 seconds to 1 minute) and compete for access to the Symantec Workflow Service. Depending on the number of applications and of expired triggers waiting to be processed, this chaos can last from tens of minutes to several hours, and during this time many applications are simply unable to process their time triggers.

    Such a flaw in the workflow service is a serious problem for most of our implementations, which serve large numbers of users and applications. It seems odd that the “minEscalationTime” parameter is not used from the start but only after the first trigger has been processed.

    Any idea why it is done this way?
    And how can we avoid this destructive contention after a platform restart?