TaskInstances table growing rapidly

Article:TECH144662  |  Created: 2010-11-19  |  Updated: 2014-07-07  |  Article URL http://www.symantec.com/docs/TECH144662
Article Type
Technical Solution


Issue



The database is growing quickly, all task tables are growing, and the log viewer shows a cleanup task looping on the following error:

Error deleting: Altiris.TaskManagement.Data.TaskExecutionInstanceNotFoundException: Unable to find task instance 00000000-0000-0000-0000-000000000000 in the database.

The above error appears in the logs several times a second and fills them.  It almost always appears together with two other errors, one of which refers to a deadlock.  However, SQL Server shows no locking at all.
 

 


Error



Process: AtrsHost (2068)
Module: AtrsHost.exe
Source: Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks
Description: AltirisSqlHelper.RepeatForDeadlocks(): Non-deadlock exception: Altiris.TaskManagement.Data.TaskExecutionInstanceNotFoundException: Unable to find task instance 00000000-0000-0000-0000-000000000000 in the database.
   at Altiris.TaskManagement.Data.TaskExecutionInstance.GetTaskInstance(TaskInstanceGuid taskInstanceGuid)
   at Altiris.TaskManagement.Data.TaskExecutionInstance.<>c__DisplayClass2.<DeleteTaskInstance>b__1(DatabaseContext ctx, Object state)
   at Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks(Int32 retries, Int32 sleep, Object state, RepeatForDeadlocksDelegate func)

Process: AtrsHost (2068)
Module: AtrsHost.exe
Source: Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks
Description: AltirisSqlHelper.RepeatForDeadlocks(): Failed all retries

Process: AtrsHost (2068)
Module: AtrsHost.exe
Source: Altiris.TaskManagement.Maintenance.CleanupTaskDataTask.DeleteExcessWorkingData
Description: CleanupTaskDataTask.DeleteExcessWorkingData(): Error deleting: Altiris.TaskManagement.Data.TaskExecutionInstanceNotFoundException: Unable to find task instance 00000000-0000-0000-0000-000000000000 in the database.
   at Altiris.TaskManagement.Data.TaskExecutionInstance.GetTaskInstance(TaskInstanceGuid taskInstanceGuid)
   at Altiris.TaskManagement.Data.TaskExecutionInstance.<>c__DisplayClass2.<DeleteTaskInstance>b__1(DatabaseContext ctx, Object state)
   at Altiris.TaskManagement.Data.AltirisSqlHelper.RepeatForDeadlocks(Int32 retries, Int32 sleep, Object state, RepeatForDeadlocksDelegate func)
   at Altiris.TaskManagement.Data.TaskExecutionInstance.DeleteTaskInstance(TaskInstanceGuid taskInstanceGuid)
   at Altiris.TaskManagement.Maintenance.CleanupTaskDataTask.DeleteExcessWorkingData(WaitHandle eventStop).


Environment



Symantec Management Platform 7.0 and up.


Cause



This issue has two causes:

1.  Clients are trying to upload status for an old task. They spam the server with results from an old or bogus task, which fills the server with "junk" data.

2.  The cleanup task gets the data in the tables out of sync and eventually reaches a point where it simply fails.

Take any high load on the Notification Server (NS), or high disk I/O on the SQL Server (if off-box), into account when running the workarounds.


Solution



Possible workarounds are listed below:
 
1- Stop and change the Cleanup Task
 
Changing the schedule to weekly limits the errors to at most once a week. Each time the task runs, check whether it ran successfully; if it did not, stop it and delete the run instance again. Continue doing this until a fix is available.
 
If the cleanup task is disabled rather than rescheduled, NO data will be purged and the tables will keep growing. Run the cleanup on a weekly basis. If the issue continues, running it manually or daily is recommended.
To help prevent the on-demand cleanup task from running, change the cleanup options and set the maximum number of working database rows to 1 million.
 
2- Manually Delete the Data
 
Note: Truncating these tables will cause all task history to be removed. If there is task history data that is needed, it must be retrieved before truncating the tables.
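
If only recent history needs to be kept, one option is to copy it into a side table before truncating. The following is a minimal sketch and not part of the original procedure: the table name TaskInstanceResults_Keep and the 30-day window are hypothetical, and the EndTime column is the one used by the diagnostic query later in this article.

```sql
-- Sketch only: copy the last 30 days of task results into a side table
-- before truncating. TaskInstanceResults_Keep is a hypothetical name and
-- must not already exist; SELECT ... INTO creates it.
SELECT tir.*
INTO TaskInstanceResults_Keep
FROM TaskInstanceResults AS tir
WHERE tir.EndTime > GETDATE() - 30;   -- the 30-day window is an assumption
```

The same pattern could be repeated for any other task table whose history is needed, provided a suitable date column exists in that table.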
 
Note: You may have to stop the Altiris Object Host and Data Loader services if you have problems truncating the tables.
  • Find the Cleanup Task Data task under Settings > All Settings > Notification Server > Task Settings > Cleanup Task Data. Find all running instances (not the one that says "Pending" under Status) and stop them. Refresh the page about every minute until they all show a big red X.
  • At that point, delete the run instances. By this time, the logs should no longer be filling with errors.
  • Now, modify the schedule (the one that says "Pending") to run weekly instead of daily.
 
Delete data in the database by truncating the following task tables:
 
That is, run the following SQL command for each table: Truncate table (Task Table name)
 
Truncate table TaskInstanceSummaries
Truncate table TaskInstanceResults
Truncate table TaskInstances
Truncate table TaskinstanceParents
Truncate table TaskInstancesStarted
Truncate table TaskInstanceStatus
Truncate table TaskOutputPropertyValue
Truncate table TaskInstanceEvents
Truncate table TaskInstanceChildEvents  -- Note: Depending on your hierarchy, this truncate may not complete successfully; this is normal, just go on to the next table.
Truncate table TaskInstanceExecutionInfo
Truncate table TaskInstanceJobNodes
Truncate table TaskInstanceresultSummaries
Truncate table ServerTaskInstancerequests
Truncate table ClientTaskInstancerequests
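
To see which tables are actually large before truncating, and to confirm the result afterwards, approximate row counts can be read from the SQL Server catalog views. This is a sketch that assumes the tables live in the current database and that the database uses a case-insensitive collation (the names are copied from the list above).

```sql
-- Sketch only: approximate row counts for the task tables listed above.
-- sys.partitions.rows is an approximate count, which is enough to see
-- which tables have grown and whether a truncate took effect.
SELECT t.name AS TableName,
       SUM(p.rows) AS ApproxRows
FROM sys.tables AS t
JOIN sys.partitions AS p
  ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)   -- heap or clustered index only, avoids double counting
  AND t.name IN ('TaskInstanceSummaries', 'TaskInstanceResults', 'TaskInstances',
                 'TaskInstanceParents', 'TaskInstancesStarted', 'TaskInstanceStatus',
                 'TaskOutputPropertyValue', 'TaskInstanceEvents', 'TaskInstanceChildEvents',
                 'TaskInstanceExecutionInfo', 'TaskInstanceJobNodes',
                 'TaskInstanceResultSummaries', 'ServerTaskInstanceRequests',
                 'ClientTaskInstanceRequests')
GROUP BY t.name
ORDER BY ApproxRows DESC;
```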
 
 
3- Run a Cleanup Script on Clients
  • Create a new Run Script task: go to Manage > Jobs and Tasks > System Jobs and Tasks > Software, right-click Patch Management, and select New, then Task. Scroll down and find Run Script, rename it, then copy and paste the Cleanup Script below into the box. Note: There is a REM that needs to be removed on 7.1 or above; it is noted in the script. Then click OK to save the task.
  • Scroll down to New Client Job and click Add Existing. Select the Run Script task you just created and click OK. Uncheck the "Fail Job if any Task fails" box and click Save Changes. Click New Schedule; at this point you can enter a schedule or run it now. Note: Only run this job once, with no repeat. Click Add and enter your target or computers. Click the Schedule button and you are done; the job runs according to the schedule you just set.
Note: The Cleanup Script below displays a pop-up window that the user will see. The pop-up is visible for 1-10 seconds, then the script runs in the background. It runs for up to 2 minutes, and the agent disappears during this process. The process is complete when the agent is visible again.
 
Cleanup Script 
---------------------------------------------------------
REM Get the Altiris Agent install path
FOR /F "tokens=2*" %%A IN ('REG.EXE QUERY "HKLM\Software\Altiris\Altiris Agent" /V "installdir"') DO SET AgentDir=%%B
REM Quote the whole assignment so a temp path containing spaces is handled
set "tempbat=%temp%\AgentClean.bat"

REM Create temporary batch file to execute while the agent restarts
echo "%AgentDir%\aexagentutil" /stop > "%tempbat%"
echo rmdir "%AgentDir%\TaskManagement\cache" /s /q >> "%tempbat%"
echo rmdir "%AgentDir%\TaskManagement\status" /s /q >> "%tempbat%"
echo rmdir "%AgentDir%\TaskManagement\statusXml" /s /q >> "%tempbat%"
REM On 7.1 and above only: remove the leading REM from the next line
REM echo rmdir "%AgentDir%\TaskManagement\lti" /s /q >> "%tempbat%"
REM ping is used here as a delay before the agent is restarted
echo ping localhost -n 30 >> "%tempbat%"
echo "%AgentDir%\aexagentutil" /start >> "%tempbat%"
echo exit >> "%tempbat%"
REM Executes temporary batch file
start "" /MIN "%tempbat%"
---------------------------------------------------------
 
Now find the machines that are spamming the server with task results. To do this, first find out which results are frequently sent to the NS by those client machines. The first SQL query below shows which tasks have had data sent up to the server in the last 24 hours.
  • Stop the services "Altiris Client Task Data Loader" and "Altiris Support Service".
  • Stop all running instances of "Cleanup Task Data Daily" from Altiris Console > Settings > Notification Server > Task Settings > Cleanup Task Data.
  • Run the Cleanup Task Data manually with the attached SQL query.
  • Start the stopped services "Altiris Client Task Data Loader" and "Altiris Support Service".
  • Change the scheduled time of "Cleanup Task Data Daily" to a few minutes later to re-associate the task with the Windows schedule.
After you follow these steps, the error may reappear when you start the services again, because a regular check that the "taskinstances" table does not exceed 250000 rows triggers an "On-Demand Cleanup Task Data". Stop it from the console, then start Cleanup Task Data from the console (right click > Start Now). If the error appears again, stop the running Cleanup Task Data and re-run it once it has returned a status of success or fail. Repeat this until all of the failed Cleanup Task Data tasks disappear, and make sure the task can run without problems at its scheduled time.
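
A quick way to judge whether the on-demand cleanup is likely to trigger again is to compare the current row count with the 250000-row limit mentioned above. This is a minimal sketch, assuming the limit applies to the TaskInstances table as described; the column aliases are illustrative.

```sql
-- Sketch only: compare the TaskInstances row count with the 250000-row
-- limit described in this article.
SELECT COUNT(*) AS TaskInstanceRows,
       CASE WHEN COUNT(*) > 250000
            THEN 'Above 250000 rows: on-demand cleanup is expected'
            ELSE 'At or below 250000 rows'
       END AS CleanupStatus
FROM TaskInstances;
```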
----------------------------------------------------------------------------------
If NULL values appear frequently in the Name column of the following query results, that confirms there is a problem:
-- Tasks that have had results sent to the server in the last 24 hours
select max(i.name) as Name, ti.TaskVersionGuid, count(*) as [Count]
from taskinstances ti
left join itemversions iv
on ti.TaskVersionguid = iv.Versionguid
left join item i
on iv.Itemguid = i.guid
join Taskinstanceresults tir
on tir.TaskInstanceGuid = ti.TaskInstanceGuid
where tir.endtime > getdate() - 1   -- last 24 hours
group by ti.TaskVersionGuid
order by [Count] DESC
-------------------------------------------------------
If a row in the query results has a NULL value in the Name column, that task is likely to be the problematic one. Copy its TaskVersionGuid and put it into the following query to list the machines that are sending results for it:
-------------------------------------------------------
declare @TaskVersionGuid uniqueidentifier
set @TaskVersionGuid='XXXXXX-XXXXXXXX-XXXXXXXX'

select max(vc.name) as Name, ResourceGuid, Count(*) as Count from TaskInstances ti
join vcomputer vc
on ti.ResourceGuid = vc.Guid
where ti.TaskVersionGuid = @TaskVersionGuid
group by ResourceGuid
order by count DESC
--------------------------------------------------------------
Run the Cleanup Script from workaround 3 on all affected machines returned by this query.
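
The two diagnostic queries can also be combined so that the machines are listed for every unnamed task in one pass, without copying each TaskVersionGuid by hand. This is a sketch under the assumption that a NULL item name identifies a problematic task, as described above; it uses only the tables and columns that already appear in the two queries, and the aliases ComputerName and ResultCount are illustrative.

```sql
-- Sketch only: machines that sent results in the last 24 hours for tasks
-- whose name cannot be resolved (NULL name, treated as problematic above).
SELECT ti.TaskVersionGuid,
       ti.ResourceGuid,
       MAX(vc.Name) AS ComputerName,
       COUNT(*)     AS ResultCount
FROM TaskInstances AS ti
LEFT JOIN ItemVersions AS iv
  ON ti.TaskVersionGuid = iv.VersionGuid
LEFT JOIN Item AS i
  ON iv.ItemGuid = i.Guid
JOIN TaskInstanceResults AS tir
  ON tir.TaskInstanceGuid = ti.TaskInstanceGuid
JOIN vComputer AS vc
  ON ti.ResourceGuid = vc.Guid
WHERE tir.EndTime > GETDATE() - 1   -- last 24 hours, as in the first query
  AND i.Name IS NULL                -- task name could not be resolved
GROUP BY ti.TaskVersionGuid, ti.ResourceGuid
ORDER BY ResultCount DESC;
```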

 

 

 


Supplemental Materials

Description

The data that is being deleted is historical logging data only. Logging records data every time a task runs, so the data accumulates rapidly. If you do not purge it, the log files can grow very large. The data being deleted does not contain any critical information. Only in some cases would you need to truncate the tables to wipe them. The tasks themselves are not deleted, only the results of the tasks. The results include the start time, the finish time, and the completion status. If you do truncate the table, only the result status remains. If the table exceeds 250,000 rows, the cleanup task may not operate correctly.


