ServiceDesk 7.1 Deadlocks and SLAs

Apr 25, 2013 01:27 PM

SLA levels and design are an area of ServiceDesk that has improved over successive releases, as development recognized where certain functions could work better. This article describes one of those cases. Please note that this applies only to ServiceDesk 7.1 SP1 and SP2. Version 7.5 handles SLAs much differently and is not subject to this problem.

On a 7.1 SP1 ServiceDesk system, occasional deadlocks were discovered in the Incident Management log files. The log entries contained a long stack trace like the one below:

Exception in Component: Initial Diagnosis
Exception Message: could not save object Type: ServiceDeskIncident
Exception StackTrace: System.Exception: could not save object Type: ServiceDeskIncident ---> System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
   at System.Data.SqlClient.SqlTransaction.ZombieCheck()
   at System.Data.SqlClient.SqlTransaction.Commit()
   at LogicBase.Core.Messaging.Implementations.RelationalMapperExchange.Commit()
   at LogicBase.Core.Messaging.Implementations.RelationalMapperExchange.SaveObjectRunSQL(List`1 statements)
   at LogicBase.Core.Messaging.Implementations.RelationalMapperExchange.SaveObject(Object o, Boolean retryDeadlock)
   --- End of inner exception stack trace ---
   at LogicBase.Core.Messaging.Implementations.RelationalMapperExchange.SaveObject(Object o, Boolean retryDeadlock)
   at LogicBase.Core.Messaging.Implementations.RelationalMapperExchange.Put(Message message)
   at LogicBase.Core.Data.ExternalReference.Commit(IData d)
   at LogicBase.Core.Data.OrchestrationData.Commit()
   at LogicBase.Core.Data.OrchestrationData.GetObjectData(SerializationInfo info, StreamingContext context)
   at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.InitSerialize(Object obj, ISurrogateSelector surrogateSelector, StreamingContext context, SerObjectInfoInit serObjectInfoInit, IFormatterConverter converter, ObjectWriter objectWriter)
   at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Serialize(Object graph, Header[] inHeaders, __BinaryWriter serWriter, Boolean fCheck)
   at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(Stream serializationStream, Object graph, Header[] headers, Boolean fCheck)
   at LogicBase.Core.Messaging.Message.CreatePayloadData()
   at LogicBase.Core.Messaging.Message.GetObjectData(SerializationInfo info, StreamingContext context)
   at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.InitSerialize(Object obj, ISurrogateSelector surrogateSelector, StreamingContext context, SerObjectInfoInit serObjectInfoInit, IFormatterConverter converter, ObjectWriter objectWriter)
   at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Serialize(Object graph, Header[] inHeaders, __BinaryWriter serWriter, Boolean fCheck)
   at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(Stream serializationStream, Object graph, Header[] headers, Boolean fCheck)
   at LogicBase.Core.Messaging.SqlExchange.SerializeMessage(Message message)
   at LogicBase.Core.Messaging.SqlExchange.SaveMessages(SqlTransaction transaction, Message[] messages)
   at LogicBase.Core.Messaging.SqlExchange.DoPut(Message message)
   at LogicBase.Core.Messaging.AbstractExchange.Put(Message message)
   at LogicBase.Core.Messaging.AbstractDeliveryFilter.Put(Message message)
   at LogicBase.Components.Default.ExchangeAdapters.LogicBaseExchangeWorkQueue.Store(String serviceID, WorkQueueItem item)
   at LogicBase.Components.Default.ExchangeAdapters.LogicBaseExchangeWorkQueue.AddWorkQueueItem(String serviceID, WorkQueueItem item)
   at LogicBase.Core.Models.Workflow.WorkflowDelegate.Execute(IData data, IOrchestrationComponent comp, String& outputPath, IExecutionEngine engine, TLExecutionContext context)
   at LogicBase.Core.ExecutionEngine.AbstractExecutionEngine.RunComponent(TLExecutionContext context, IData data, IOrchestrationComponent comp)

Determining the Problem and the Impact

This was causing significant slowdown, as the SQL server was timing out while processing these errors. The log entries pointed to the database, and specifically to the ServiceDeskSLALevelManagement table. Since only 7 SLA levels were configured, it was a surprise to find that each incident had 14 levels stored in the database. That discovery, however, was real progress toward finding the cause of the problem.
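
If you want to confirm on the SQL Server side that deadlocks are occurring, one common approach (a sketch, assuming you have sysadmin rights on the SQL server) is to enable trace flag 1222 so deadlock details are written to the SQL Server error log:

-- Write deadlock graphs to the SQL Server error log (server-wide).
-- Requires sysadmin; turn it off again with DBCC TRACEOFF (1222, -1).
DBCC TRACEON (1222, -1);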

The following SQL query shows how many levels are stored with each incident:

-- For each incident, count how many SLA level rows are stored with it.
-- Uncomment the extra WHERE clauses to narrow the results.
SELECT im.sla_id, p.ReportProcessID, COUNT(*) '# of SLA Levels'
       FROM  ServiceDeskIncidentManagement im
       JOIN  ReportProcess p ON
              p.SessionID = im.Session_ID
       JOIN  ServiceDeskSLALevelManagement SLAM ON
              SLAM.parent_sla_id = im.sla_id
       WHERE p.ServiceID = 'INCIDENT'
             -- AND p.Result LIKE '%Closed%'
             -- AND SLAM.status = 'NotStarted'
       GROUP BY im.sla_id, p.ReportProcessID
       ORDER BY 2
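
To gauge how widespread the problem is, a summary like the one below can help. This is a sketch built on the same tables used above; it groups incidents by how many level rows they carry, so a healthy system shows a single count (for example, 7 levels), while a doubled system also shows 14:

-- Distribution of SLA level counts across all incidents.
SELECT lvl.level_count, COUNT(*) AS '# of Incidents'
FROM (
       SELECT im.sla_id, COUNT(*) AS level_count
       FROM  ServiceDeskIncidentManagement im
       JOIN  ServiceDeskSLALevelManagement SLAM ON
             SLAM.parent_sla_id = im.sla_id
       GROUP BY im.sla_id
) lvl
GROUP BY lvl.level_count
ORDER BY lvl.level_count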

The deadlocks were caused by regularly having to update the database with duplicate SLA level information in the ServiceDeskSLALevelManagement table. For this customer, each incident should have had only 7 levels but had 14 stored in the database, so each time an incident was opened and a “SaveExternalData” component in the process ran, 14 rows were written to the table for that incident. Extrapolate that out to more than 25,000 incidents and the result was an enormous performance hit; having to write that much redundant information adversely affected the SQL server’s capabilities.
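
As a rough sketch of the write amplification involved (assuming the duplicate rows share the same parent_sla_id and level_name, as they did in this case), the following query totals the surplus level rows sitting in the table:

-- Each group with more than one row contributes (count - 1) surplus rows.
SELECT SUM(dup.extra_rows) AS 'Total Duplicate Level Rows'
FROM (
       SELECT COUNT(*) - 1 AS extra_rows
       FROM  dbo.ServiceDeskSLALevelManagement
       GROUP BY parent_sla_id, level_name
       HAVING COUNT(*) > 1
) dup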

Remediation and Prevention

The remediation for this is a two-part process: the first part is stopping the duplicate levels from occurring, and the second is cleaning up the database. It is recommended that you review all procedures; if you are uncomfortable with any step, contact Symantec Support for assistance in remediating this problem.

First – Stopping the Duplicate SLA Level Creation:

The duplicate SLA levels are created during the Initial Routing phase of incident creation. In the SD.IncidentManagement model, in the Initial Diagnosis submodel, the embedded model “Run Rules: Initial Routing” contains a SaveExternalData component. Although the SLA data has already been written once, the process still holds a stale copy in memory, and the SaveExternalData component writes it again, duplicating the levels. The in-process SLA data needs to be refreshed from the database so it is not rewritten. Resolving this is fairly easy and involves adding a single component.

In the SD.IncidentManagement model, in the Initial Diagnosis submodel, in the embedded model “Run Rules: Initial Routing”, locate the SaveExternalData component. Right after that component, add a “GetIncidentbyTrackingID” component and configure it like this:

Inputs:
ExchangeConnectionString: [ProfileProperties].service_desk_settings_service_desk_exchange (use ellipsis to configure from data)
TrackingID: [Incident.TrackingID] (use ellipsis to configure from data)

Outputs:
ResultVariable: Incident (use ellipsis to select from data)

The model should look like the one below. Save and publish when complete.

[Screenshot: “Run Rules: Initial Routing” model with the GetIncidentbyTrackingID component added after SaveExternalData]

Second - Cleaning the Database of Duplicates:

This is a two-part process:

1. Cleaning up the old, unused levels – You can use the SQL below to clean up all old, unused levels from closed incidents. The only thing to remember is to give the @PurgeOlderThan variable a number of days that meets your needs. I ran it on a test copy of the database and found that 270 days was good; you can adjust as necessary. RUN THIS FIRST! (Please note that there is a revised script, SLA PURGE BY DATE, at the bottom of the article that performs the same function but leaves only active SLA levels in place. You can use that one as well.)

DECLARE @sla_id NVARCHAR(256)
DECLARE @PurgeOlderThan INT

-- Temp table to record which processes had SLA data purged.
CREATE TABLE #SLAPurge (
       Process_ID NVARCHAR(256)
)

-- Change this value to specify how many days the incident must be closed
-- before the SLA information will be purged.
SET @PurgeOlderThan = 365

-- Collect the SLA IDs for closed incidents older than the threshold.
DECLARE curSLAPurge CURSOR FOR
       SELECT im.sla_id
       FROM  ServiceDeskIncidentManagement im
       JOIN  ReportProcess p ON
              p.SessionID = im.Session_ID
       WHERE DATEDIFF(D, p.ProcessEnded, GETDATE()) > @PurgeOlderThan
              AND p.ServiceID = 'INCIDENT'
              AND p.Result LIKE '%Closed%'

OPEN curSLAPurge
FETCH NEXT FROM curSLAPurge INTO @sla_id

WHILE (@@FETCH_STATUS = 0)
BEGIN
       -- Remove the SLA levels first, then the parent SLA record.
       DELETE FROM ServiceDeskSLALevelManagement
       WHERE parent_sla_id = @sla_id

       DELETE FROM ServiceDeskSLA
       WHERE ServiceDeskSLA_id = @sla_id

       -- Record the affected process for the summary report below.
       INSERT INTO #SLAPurge (Process_ID)
              SELECT Process_ID FROM ServiceDeskIncidentManagement
              WHERE sla_id = @sla_id

       FETCH NEXT FROM curSLAPurge INTO @sla_id
END

CLOSE curSLAPurge
DEALLOCATE curSLAPurge

SELECT Process_ID 'SLA Purged IDs' FROM #SLAPurge
ORDER BY 1

-- Drop the temp table once you have reviewed the results.
--DROP TABLE #SLAPurge
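
After the script completes, a quick verification pass is worthwhile. This sketch re-uses the same joins as the cursor above and should return no rows once the purge has run (match the day count to the @PurgeOlderThan value you used):

-- Should return no rows after the purge.
SELECT im.sla_id
FROM  ServiceDeskIncidentManagement im
JOIN  ReportProcess p ON
      p.SessionID = im.Session_ID
JOIN  ServiceDeskSLALevelManagement SLAM ON
      SLAM.parent_sla_id = im.sla_id
WHERE DATEDIFF(D, p.ProcessEnded, GETDATE()) > 365
      AND p.ServiceID = 'INCIDENT'
      AND p.Result LIKE '%Closed%'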

2. Now you should only be left with duplicates on open incidents. The second SQL script below will clean up all of the duplicates. It is recommended that you test it well prior to final implementation.

DECLARE @parent_sla_id VARCHAR(50)
DECLARE @level_name NVARCHAR(254)
DECLARE @sla_id VARCHAR(50)

-- Find every (parent_sla_id, level_name) pair that appears more than once.
DECLARE _Cursor CURSOR FAST_FORWARD FOR
       SELECT parent_sla_id, level_name
       FROM dbo.ServiceDeskSLALevelManagement
       GROUP BY parent_sla_id, level_name
       HAVING COUNT(*) > 1

OPEN _Cursor

FETCH NEXT FROM _Cursor INTO @parent_sla_id, @level_name
WHILE @@FETCH_STATUS = 0
BEGIN
       -- Pick one of the duplicate rows and delete it, leaving the other
       -- in place. With doubled levels this removes exactly the surplus;
       -- if levels were ever more than doubled, run the script again.
       SELECT TOP 1 @sla_id = ServiceDeskSLALevelManagement_id
       FROM dbo.ServiceDeskSLALevelManagement
       WHERE parent_sla_id = @parent_sla_id AND level_name = @level_name

       DELETE dbo.ServiceDeskSLALevelManagement
       WHERE ServiceDeskSLALevelManagement_id = @sla_id
       --PRINT @sla_id

       FETCH NEXT FROM _Cursor INTO @parent_sla_id, @level_name
END

CLOSE _Cursor
DEALLOCATE _Cursor
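
When the cleanup finishes, you can confirm that no duplicates remain; the same grouping used to drive the cursor should now return no rows:

-- Should return no rows once the duplicates are gone.
SELECT parent_sla_id, level_name, COUNT(*) AS copies
FROM dbo.ServiceDeskSLALevelManagement
GROUP BY parent_sla_id, level_name
HAVING COUNT(*) > 1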

Of course, before putting any of this into place, take a backup of the system and database. You might be able to remove enough pressure from the system by simply removing all of the extra SLA information from the closed incidents and implementing the model change so the number of levels stays at the correct count (7 in this case). Running the duplicate-removal script against active incidents is therefore optional.
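
If you want a simple way to take that database backup, a sketch like the one below works, assuming the default ProcessManager database name; adjust the database name and backup path for your environment:

-- Full backup before any SLA cleanup; adjust name and path as needed.
BACKUP DATABASE ProcessManager
TO DISK = N'D:\Backups\ProcessManager_PreSLACleanup.bak'
WITH INIT, NAME = N'ProcessManager pre-SLA-cleanup full backup';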

Optional SLA PURGE BY DATE Script:

Here is the revised script. It leaves the active, in-use SLA levels in place for reporting purposes.

DECLARE @sla_id NVARCHAR(256)
DECLARE @PurgeOlderThan INT

-- Temp table to record which processes had SLA data purged.
CREATE TABLE #SLAPurge (
       Process_ID NVARCHAR(256)
)

-- Change this value to specify how many days the incident must be closed
-- before the SLA information will be purged.
SET @PurgeOlderThan = 10

-- Collect the SLA IDs for closed incidents older than the threshold that
-- still carry levels which never started. DISTINCT avoids revisiting the
-- same SLA once per level row.
DECLARE curSLAPurge CURSOR FOR
       SELECT DISTINCT im.sla_id
       FROM  ServiceDeskIncidentManagement im
       JOIN  ReportProcess p ON
              p.SessionID = im.Session_ID
       JOIN  ServiceDeskSLALevelManagement SLAM ON
              SLAM.parent_sla_id = im.sla_id
       WHERE DATEDIFF(D, p.ProcessEnded, GETDATE()) > @PurgeOlderThan
              AND p.ServiceID = 'INCIDENT'
              AND p.Result LIKE '%Closed%'
              AND SLAM.status = 'NotStarted'

OPEN curSLAPurge
FETCH NEXT FROM curSLAPurge INTO @sla_id

WHILE (@@FETCH_STATUS = 0)
BEGIN
       -- Delete only the levels that never started; active levels remain
       -- in place for reporting purposes.
       DELETE FROM ServiceDeskSLALevelManagement
       WHERE parent_sla_id = @sla_id
       AND status = 'NotStarted'

       DELETE FROM ServiceDeskSLA
       WHERE ServiceDeskSLA_id = @sla_id

       -- Record the affected process for the summary report below.
       INSERT INTO #SLAPurge (Process_ID)
              SELECT Process_ID FROM ServiceDeskIncidentManagement
              WHERE sla_id = @sla_id

       FETCH NEXT FROM curSLAPurge INTO @sla_id
END

CLOSE curSLAPurge
DEALLOCATE curSLAPurge

SELECT Process_ID 'SLA Purged IDs' FROM #SLAPurge
ORDER BY 1

DROP TABLE #SLAPurge
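
As a final spot check (using the same 10-day threshold set above), closed incidents past the threshold should now retain only levels whose status is something other than 'NotStarted':

-- No 'NotStarted' rows should appear in this breakdown after the purge.
SELECT SLAM.status, COUNT(*) AS 'Level Rows'
FROM  ServiceDeskIncidentManagement im
JOIN  ReportProcess p ON
      p.SessionID = im.Session_ID
JOIN  ServiceDeskSLALevelManagement SLAM ON
      SLAM.parent_sla_id = im.sla_id
WHERE DATEDIFF(D, p.ProcessEnded, GETDATE()) > 10
      AND p.ServiceID = 'INCIDENT'
      AND p.Result LIKE '%Closed%'
GROUP BY SLAM.status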

Conclusion

Refreshing the incident data resolved the duplication problem, and some of the performance issues improved once this change was made. Removing unnecessary information from the database is always good, but it is especially useful in this case because the ServiceDeskSLALevelManagement table gets a lot of traffic.

Together, these changes should have a significant, positive impact on the performance of your ServiceDesk application.
