How to troubleshoot Enterprise Vault Compliance Accelerator (CA) when Random Sampling stops adding new items into review.

Article:TECH59590  |  Created: 2008-01-12  |  Updated: 2009-01-18  |  Article URL http://www.symantec.com/docs/TECH59590
Article Type
Technical Solution

Issue



How to troubleshoot Enterprise Vault Compliance Accelerator (CA) when Random Sampling stops adding new items into review.

Solution



With guaranteed sampling, CA captures all messages for every monitored employee throughout the day. By default, at 1 AM, the Accelerator background service picks a random sample from each employee's messages, based on his or her monitoring percentage, and adds the sample to the review set. The records for all these items are stored in the SQL table tblVaultSamples.

When the process runs after 1 AM to pick the random sample for each employee, CA must first validate that each of the selected messages has been successfully stored. This means confirming that Enterprise Vault (EV) has stored the item and then acquiring the item's unique ID. A key aspect of the guaranteed sampling solution is that statistical analysis and recording are carried out only on successfully resolved transactions, so this task is performed first. All subsequent tasks (e.g. adding items to the review set) are performed on resolved transactions only.
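
The order of operations described above (resolve first, then sample by percentage) can be illustrated with a short sketch. This is not CA's implementation: the record layout, the `pick_daily_sample` name, and the rounding rule are assumptions made purely for illustration.

```python
import math
import random
from collections import defaultdict

def pick_daily_sample(captured, monitoring_pct, rng=None):
    """Return the items to add to review.

    captured: list of dicts with hypothetical keys 'employee' and 'saveset_id'
              (saveset_id is None while the item is still unresolved).
    monitoring_pct: dict mapping employee -> percentage (0-100).
    """
    rng = rng or random.Random()
    # Step 1: only resolved transactions take part in sampling.
    by_employee = defaultdict(list)
    for item in captured:
        if item["saveset_id"]:
            by_employee[item["employee"]].append(item)
    # Step 2: sample each employee's resolved items by percentage.
    sample = []
    for employee, items in by_employee.items():
        pct = monitoring_pct.get(employee, 0)
        count = math.ceil(len(items) * pct / 100)  # assumed rounding rule
        sample.extend(rng.sample(items, count))
    return sample
```

In this sketch, an item without a saveset ID can never reach the review set, which mirrors why unresolved items stop new review items from appearing.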

The checklist below covers the four components involved in the random sampling process:  Journal Connector - Random Sampling - IIS - Storage

Checklist:

JOURNAL CONNECTOR

1. Is the Journal Connector (JC) processing items?

A quick check of the Journal mailbox in Outlook will determine if mail items are being archived or not:

- Are items stacking up in the Inbox without being archived?
- Do all the items go to the 'Failed External Filter' folder?

A restart of the Journal Task will also confirm the JC is functioning properly; the following event will be logged when the task starts:

Event Type: Information
Event Source: Enterprise Vault
Event Category: Journal Task
Event ID: 45329
User: N/A
Computer: SERVER
Description:
External Filter 'KVS.Accelerator.PlugIn.Filter' initialising...


Once a new item is archived, then this event will be seen:

Event Type: Information
Event Source: Accelerator Manager
Event Category: None
Event ID: 268
User: N/A
Computer: SERVER
Description:
APP JC - Enterprise Vault Journaling Connector has started.


This event will appear when there is a problem with the JC:

Event Type: Error
Event Source: Enterprise Vault
Event Category: Journal Task
Event ID: 3263
User: N/A
Computer: SERVER
Description:
An error occurred processing the external filter 'KVS.Accelerator.PlugIn.Filter'.  No more filters will be processed.

If the above event is seen, correct the issue and random sampling should continue as normal.  


RANDOM SAMPLING

2. Did sampling run?

For version 6.x, the event log should show:

Event 42111: Starting daily processing of captured items.
Event 42111: Finished daily processing of captured items. Sampled items are now available for review.

For versions 7.0 and 2007, these events apply to both:

Event 270: Starting daily processing of captured items.
Event 271: Finished daily processing of captured items. Sampled items are now available for review.

*Take note of the date/time of the above events.


In addition to the event log, the SQL database can verify that Sampling has run.

Run the query below against the CA customer database. It should return today's date with a status of "Finished". It will return two rows:

Row 1: Last date Sampling occurred
Row 2: Sampling Status


SELECT     [Value]
FROM         tblConfig
WHERE     ([Key] LIKE 'Sampling%')

Possible Sampling Statuses:

Finished
Either Sampling is configured to run later in the day, or it has already completed... Go to Step 3.

Started
Sampling is currently running.  Verify items are being resolved (Step 3).

Error
Root cause may lie in the application...investigate event log for errors.
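
The statuses listed above can be turned into a small decision helper. The sketch below assumes the two returned rows have already been read into a date and a string; the function name and the use of the date to tell apart the two "Finished" cases are illustrative assumptions, not part of the product.

```python
from datetime import date

def next_step(last_sample, status, today):
    """Map the two tblConfig values to the checklist's suggested action."""
    status = status.strip().lower()
    if status == "error":
        return "Investigate the event log for application errors."
    if status == "started":
        return "Sampling is running; verify items are being resolved (Step 3)."
    if status == "finished":
        if last_sample < today:
            # Assumption: an older date means today's run has not happened yet.
            return ("Sampling has not run yet today; check the configured "
                    "sampling time, then go to Step 3.")
        return "Sampling completed today; go to Step 3."
    return "Unrecognised status; check the event log."
```

For example, `next_step(date(2008, 5, 5), "Finished", date(2008, 5, 6))` points to the configured sampling time, because the last sample date is still yesterday's.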

NOTE: The following query is for CA version 5.1 only.  CA 5.1 did not have a status, therefore only the Last Sample value is relevant:

SELECT     Value
FROM       tblConfig
WHERE     ([Key] = 'Sampling : Last Sample')


--------------------------------Process to manually initiate sampling-----------------------------------

Change Last Sample Date (Figure 1):
A. In SQL, open table tblConfig in the CA customer database.

B. Manually edit the 'Value' for 'Sampling : Last Sample' to reflect yesterday's date.

C. Click the red exclamation point in the toolbar to execute and save the change.

Figure 1:
 

Change the sample time:
D. Open Compliance Accelerator

E. Under Application Administration, click System Configuration

F. In the 'Settings for' drop down box, select 'Random Capture'

G. Modify 'Sampling time (local time)' by clicking Edit on the far right hand side.

H. Enter a time which is 5 minutes ahead of the current time.

I. On far right hand side, click OK.

J. On the bottom, click Apply.

K. Click OK to the resulting message which states to restart the Customer Background Tasks.

L. Restart the Enterprise Vault Accelerator Manager Service.  (This will have same effect as restarting the Customer Background Tasks)

After applying steps A - L, it is recommended to monitor tblConfig to verify the Sampling time increments to today's date.  This is an indication that the Sampling process is properly working.  It is also recommended to monitor the event log for further clues into possible problems that may be occurring (see example events in Step 4).

NOTE: In some instances, sampling may take up to an hour before the process starts and increments the value in tblConfig.
-------------------------------------------------------------------------------------------------------------------------------


3. Were the messages resolved?  Part 1:

Run the following SQL query against the CA Customer database. Provided the messages are being resolved properly, the result should be 0 (zero):

*This query counts the items in tblVaultSamples with a capture date earlier than today (normally yesterday's items)

SELECT COUNT (*) FROM tblVaultSamples
WHERE (CONVERT(varchar(10),  capturedate, 102)) < (CONVERT(varchar(10), GETDATE(), 102))

If the above query returns a value greater than 0, repeat the query after 5 minutes to determine whether the value is decreasing. If the value is decreasing, the messages are being resolved properly. If the value remains the same or is increasing, there may be a problem resolving the captured items. A dtrace** of AcceleratorService will provide additional information and reveal the last action taken by Random Capture, such as:

[TransactionResolver] is 'Looking up 150 unresolved transactions' *

*This indicates there are items within tblVaultSamples waiting to be resolved. Had there been no items, the message would say "0 items to resolve", which can indicate either 1) all items have been resolved or 2) random sampling did not take place (see Step 2).

**To set up DTRACE, see related documents below.
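
The "repeat the query and compare" check in this step can be expressed as a small helper. This is a minimal sketch: it assumes the counts were collected by running the Step 3 query every few minutes, and the function name and labels are hypothetical.

```python
def classify_backlog(counts):
    """Classify successive unresolved-item counts, oldest reading first."""
    if len(counts) < 2:
        raise ValueError("need at least two readings")
    if counts[-1] == 0:
        return "resolved"      # nothing left to resolve
    if counts[-1] < counts[0]:
        return "resolving"     # backlog is shrinking
    return "stalled"           # unchanged or growing: investigate with dtrace
```

For example, readings of 150, 90 and 40 give "resolving", while 150 followed by 150 gives "stalled".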


4. Were the messages resolved?  Part 2:

If random sampling took place (Step 2), but items still remain in tblVaultSamples and do not decrease (Step 3), check whether the TransactionResolver was able to retrieve the saveset IDs for the messages.

When messages have been resolved successfully, tblVaultSamples is populated with the saveset ID for each item. Run the following SQL query against the CA Customer database to determine whether the items captured before today have a value for KVSSaveSetID:

*This query counts the resolved items with a capture date earlier than today (normally yesterday's items).

SELECT COUNT (*)
FROM tblVaultSamples
WHERE KVSSaveSetID <> ''
AND (CONVERT(varchar(10), capturedate, 102)) < (CONVERT(varchar(10), GETDATE(), 102))

A return of 0 (zero) is an indication that TransactionResolver could not resolve the items.  The following events will be seen on the CA server:

Events that appear in version 6.x:

Type: Error
Event: 41998
Source: EV Compliance Accelerator
Category:
User: N/A
Computer: SERVER
Error:
An error occurred while resolving savesets
Description:
System.Runtime.InteropServices.COMException (0xC004194F): Exception from HRESULT: 0xC004194F.

Type: Error
Event: 41987
Source: EV Compliance Accelerator
Category:
User: N/A
Computer: SERVER
Error:
an error occurred when checking for transactions with no SaveSetID
Description:
System.Exception: System.Runtime.InteropServices.COMException (0xC004194F): Exception from HRESULT: 0xC004194F.


Events that appear in versions 7.x / 2007:

Type: Error
Event: 6286
Source: Enterprise Vault
Category: None
User: N/A
Computer: SERVER
Description:
Error obtaining the status of a saveset on "server.domain.com"
<0x80040e14>

Type: Error
Event: 77
Source: Accelerator Service Processor
Category: None
User: N/A
Computer: SERVER
Description:
APP AT - Customer ID: 1 - An error occurred while resolving savesets. System.Runtime.InteropServices.COMException (0x80040E14): Exception from HRESULT: 0x80040E14
  at KVS.EnterpriseVault.Interop.AutoStorageOnlineClass.GetSavesetIDs(String TransactionXML)
  at KVS.Accelerator.Application.TransactionResolver.Lookup(TransactionsDataTable TransTbl, Int64& HighestSeqNum)

Type: Error
Event: 272
Source: Accelerator Service Processor
Category: None
User: N/A
Computer: SERVER
Description:
APP AT - Customer ID: 1 - Guaranteed Sampling: Error during guaranteed sampling - aborting. System.Runtime.InteropServices.COMException (0x80040E14): Exception from HRESULT: 0x80040E14
  at KVS.Accelerator.Application.TransactionResolver.Lookup(TransactionsDataTable TransTbl, Int64& HighestSeqNum)
  at KVS.Accelerator.Application.TransactionResolver.PerformDailySample(Boolean bGuaranteedSampling, Boolean bFirstPass)
  at KVS.Accelerator.Application.TransactionResolver.ResolveTransactionsForGuaranteedSampling(Boolean bFirstPass)
  at KVS.Accelerator.Sampling.GuaranteedSampling.Step_2_ResolveTransactionIDs(Boolean bFirstPass)
  at KVS.Accelerator.Sampling.GuaranteedSampling.DoSampling()


These events may also be followed by:

Type: Error
Event: 42125
Source: EV Compliance Accelerator
Category:
User: N/A
Computer: SERVER
Error:
An Error has occured when retrieving Item
SaveSetID: 101000000000000~200805061831270000~0~B800C5A9954D4C66A5265261D63F4F4
VaultID: 123905F603D29384AA358B7DA1B3FDBB51110000SITE.domain.com
Format: XML
sAttachmentID: 0
Description:
System.Runtime.InteropServices.COMException (0x80004005): Unspecified error

The above event may appear more than once; however, it will usually correspond to only one SaveSetID. This SaveSetID is the first saveset that TransactionResolver attempts to resolve from tblVaultSamples. TransactionResolver will not move beyond the first saveset in the list until it is successfully resolved.
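
Because the blocking SaveSetID is needed again in Step 5, it can help to pull it out of the event text programmatically. The sketch below assumes the event description has been copied into a string in the same "Name: value" layout as the Event 42125 example above; the helper name is hypothetical.

```python
import re

EVENT_42125 = """An Error has occured when retrieving Item
SaveSetID: 101000000000000~200805061831270000~0~B800C5A9954D4C66A5265261D63F4F4
VaultID: 123905F603D29384AA358B7DA1B3FDBB51110000SITE.domain.com
Format: XML
sAttachmentID: 0"""

def extract_ids(event_text):
    """Return (saveset_id, vault_id); a missing field is returned as None."""
    def field(name):
        # Match "Name: value" at the start of a line.
        match = re.search(rf"^{name}:\s*(\S+)", event_text, re.MULTILINE)
        return match.group(1) if match else None
    return field("SaveSetID"), field("VaultID")

saveset_id, vault_id = extract_ids(EVENT_42125)
```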

In some situations, tblVaultSamples may contain items prior to yesterday's date.  If so, the below queries can be used.

*This query will count all resolved items (prior to the items being moved into review, this count should grow when everything is working):

SELECT COUNT (*)
FROM tblVaultSamples
WHERE KVSSaveSetID <> ''

*This query will count all unresolved items (when TransactionResolver is running its processes, this count should decrease):

SELECT COUNT (*)
FROM tblVaultSamples
WHERE KVSSaveSetID = ''

NOTE: During the sampling process, it may be necessary to run these queries more than once to see if the counts are incrementing.  
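
The two counts can also be tried out safely away from production. The sketch below uses an in-memory SQLite table as a stand-in for tblVaultSamples; only the table name and the KVSSaveSetID and CaptureDate column names come from this article, while the column types, the sample rows, and the helper name are assumptions.

```python
import sqlite3

# In-memory stand-in; the real table lives in the CA customer database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblVaultSamples (KVSSaveSetID TEXT NOT NULL DEFAULT '', CaptureDate TEXT)"
)
conn.executemany(
    "INSERT INTO tblVaultSamples (KVSSaveSetID, CaptureDate) VALUES (?, ?)",
    [("SSID-1", "2008-05-05"), ("", "2008-05-05"), ("", "2008-05-06")],
)

def backlog_counts(connection):
    """Return (resolved, unresolved) using an empty-string comparison."""
    resolved = connection.execute(
        "SELECT COUNT(*) FROM tblVaultSamples WHERE KVSSaveSetID <> ''"
    ).fetchone()[0]
    unresolved = connection.execute(
        "SELECT COUNT(*) FROM tblVaultSamples WHERE KVSSaveSetID = ''"
    ).fetchone()[0]
    return resolved, unresolved
```

Note that if the column can hold NULL (the source does not say), such rows would match neither comparison and would be missing from both counts.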


5. Can the item be retrieved using the Browser Search?

If items cannot be resolved as discussed in Steps 3 & 4, the next step is to further isolate the cause.  To do so, rule out Compliance Accelerator by using the Browser Search (aka: Search.asp) to retrieve one of the sampled items:

A. Open the Advanced Browser Search, by default the url will be:  http://vault_site_alias/enterprisevault/search.asp?advanced

B. Uncheck 'Search Attachments'  (Figure 2)

C. For 'Other Attribute', enter a 'Name' of "SSID". This is case sensitive and must be entered in all caps. (Figure 2)

D. For 'Value', enter the saveset ID (Figure 2). The saveset ID can be obtained from the event that occurs on the CA server (see Event 42125 in Step 4 above). For this example, it will be:
101000000000000~200805061831270000~0~B800C5A9954D4C66A5265261D63F4F4

E. Also specify the Vault to search (Figure 2).  In some environments there may be multiple vaults involved and searching "All Vaults" may result in the application timing out.  If it is unknown which vault the saveset corresponds to, utilize the SQL query in Step 7A to determine the vault name.  This will be the vault to search.  

Figure 2:
 

F. Run the search.

The following symptoms may occur:

- The application may seem to hang and never complete the search
- An error saying it cannot connect to the Directory service may appear.  
* If this occurs, restart ALL Enterprise Vault services and verify that the SQL server can be logged into; reboot the SQL server if necessary.

G. If the search returns the item, click the item link to display it. Once it is displayed, click 'View whole item' (Figure 3). Any communication or storage issue will be revealed when Search.asp attempts to recall the item after this button is clicked. When running normally, the item should recall and open in Outlook.

Figure 3:
 


6. Are other symptoms occurring elsewhere in the environment?

These symptoms may apply:

- Exports hang
- Search.asp (Browser search) may hang, or may be unable to retrieve an item when clicking the 'View whole item' button.
- There may also be a total absence of errors, with nothing to indicate a problem other than a lack of new items to review or a hung export.
- Outlook clients are unable to retrieve archived items.


IIS

7. Is IIS processing requests?

As mentioned in steps 3 & 4 above, TransactionResolver must resolve each mail item to verify the item was successfully stored.  To do so, TransactionResolver will communicate with the Enterprise Vault server via IIS.  IIS will in turn open a handle (or COM object) with StorageOnlineOpns which then confirms the item exists in storage.

Because IIS serializes handle requests, if one request hangs, all subsequent requests queue up until the hang is cleared. Without actively running debugging tools, it may be impossible to know whether this is occurring, other than from the symptoms listed in Step 6.
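
The effect of serialized requests can be pictured with a toy model. This is only an illustration of head-of-line blocking in a strictly serial queue; it does not model IIS internals, and the function name is hypothetical.

```python
from collections import deque

def served_before_hang(requests, hung_request):
    """Return (completed, waiting) for a strictly serial queue in which one request hangs."""
    queue = deque(requests)
    completed = []
    while queue:
        current = queue.popleft()
        if current == hung_request:
            # Everything still queued waits behind the hung request.
            return completed, [current, *queue]
        completed.append(current)
    return completed, []
```

With requests "a", "b", "c", "d" and a hang on "b", only "a" completes; "c" and "d" never run even though nothing is wrong with them, which matches the absence of errors described in Step 6.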

In order to eliminate IIS from the list of culprits, a simple IIS restart will alleviate any hangs.  In environments where there are multiple Enterprise Vault servers, it is possible that only one of these servers running IIS is experiencing the hang.  To determine which vault server this is, follow the below process:

A. Run the following SQL query against the Accelerator Customer database to locate the Archive Name of the first record listed in the tblVaultSamples table. This will help isolate which EV server currently has the IIS issue.

SELECT TOP 1 tv.KVSVaultName AS 'Archive Name', tvs.CaptureDate FROM tblVaults tv
INNER JOIN tblVaultSamples  tvs ON tvs.VaultID = tv.VaultID
ORDER BY tvs.CaptureDate ASC

B.  Determine the server that controls the Storage of the Archive in question:

- Open the Vault Administration Console (VAC)
- Expand Archives and select Journal
- Right click on the archive listed from step A and select Properties
- Note the EV server listed for the Storage Service

C. On the impacted EV server restart the IIS services which will clear any handle that may be hung:

- Start > Run
- Type:  IISRESET
- Click OK


STORAGE

8. Is the Storage service running?

To complete the process of resolving items, IIS must be able to communicate with StorageOnlineOpns.  If the Enterprise Vault Storage service is stopped, this would not be possible.

- Verify the Enterprise Vault Storage service is started.  
- It does no harm (and is a valid troubleshooting step) to restart all EV services running on the problematic vault server as determined in Step 7B.

Depending on the environment and scenario, problems with storage can vary. The events below (which appear on the vault server) are an example of what happens when the SQL TEMPDB filegroup fills up. In this particular scenario, this is also what prevented TransactionResolver from resolving the saveset IDs. Take note of the SQL Command listed in the first event: "usps_SavesetIdentifiers". This is the stored procedure responsible for providing TransactionResolver with the saveset ID.

Event Type: Error
Event Source: Enterprise Vault
Event Category: Storage Online
Event ID: 13361
User: N/A
Computer: SERVER
Description:
An error was detected while accessing the Vault Database 'VAULTSTOREDB' (Internal reference: .\ADODataAccess.cpp (CADODataAccess::ExecuteSQLCommand) [lines {1379,1381,1396,1419}] built Oct 11 18:32:39 2007):
Description:  
Could not allocate space for object '(SYSTEM table id: -288041471)' in database 'TEMPDB' because the 'DEFAULT' filegroup is full.

SQL Command:
usps_SavesetIdentifiers

Additional Microsoft supplied information:

Source:       Microsoft OLE DB Provider for SQL Server
Number:       0x80040e14
SQL State:    42000
Native Error: 00001105

Event Type: Error
Event Source: Enterprise Vault
Event Category: Storage Online
Event ID: 6796
User: N/A
Computer: SERVER
Description:
A COM exception has been raised.
<0x80040e14>
Internal reference
.\VaultStoreDB.cpp (CVaultStoreDB::GetSSIDsForTransIDs) [lines {10115,10132,10134,10136,10139}] built Oct 11 19:10:58 2007

An exception is raised when a process encounters an unexpected fault.

EV MEMORY

9. Does the EV Storage server have enough available operating memory?

To complete the process of resolving items, IIS must be able to communicate with StorageOnlineOpns. If the Enterprise Vault Storage server does not have enough memory, communication with the EV server will be sporadic and most RPC requests will fail. This communication failure will prevent items from being resolved.

Determine which processes are using the most memory by opening Task Manager, selecting the Processes tab, and sorting by Memory Usage (descending from highest to lowest) (Figure 4). If a service appears to be demanding more memory than the established memory baseline for the server, and the service does not release this memory, the server will need to be rebooted to recover the memory for application use.

Figure 4:
 
Many programs have inherent memory leaks. To create a memory baseline for the server, power on the server and take a snapshot of the memory usage. Perform minimal operations and take another snapshot of the memory usage. Use these two snapshots as the memory baseline for the server.

EV SERVER TEMP FOLDER

10. Has sufficient space been allocated to the temp folder?

The recommended space for the Vault Service Account temp folder is 40GB.

SQL MEMORY

11. Does the SQL server have enough available operating memory?

The same lack of memory resources can affect the SQL server as well. Use the steps outlined in Step 9 to determine memory usage.

A SQL server with multiple instances can also have memory issues if it is not set up properly. Without minimum and maximum memory settings, competing instances will eventually take the majority of the memory.

Errors that can indicate memory resource issues:

Event Type: Error
Event Source: Accelerator Service Processor
Event Category: None
Event ID: 272
Computer: Server
Description:
APP AT - Customer ID: 2 - Guaranteed Sampling: Error during guaranteed sampling - aborting. System.UnauthorizedAccessException: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))
  at KVS.Accelerator.Application.TransactionResolver.Lookup(TransactionsDataTable TransTbl, Int64& HighestSeqNum, String diagnostic_fileprefix)
  at KVS.Accelerator.Application.TransactionResolver.PerformDailySample(Boolean bGuaranteedSampling, Boolean bFirstPass)
  at KVS.Accelerator.Application.TransactionResolver.ResolveTransactionsForGuaranteedSampling(Boolean bFirstPass)
  at KVS.Accelerator.Sampling.GuaranteedSampling.Step_2_ResolveTransactionIDs(Boolean bFirstPass)
  at KVS.Accelerator.Sampling.GuaranteedSampling.DoSampling().


Event Type: Error
Event Source: Accelerator Service Processor
Event Category: None
Event ID: 130
Computer: Server
Description:
APP AS - Customer ID: 2 - An error has occured when initializing the Customers. System.Data.SqlClient.SqlException: An error has occurred while establishing a connection to the server.  When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
  at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
  at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
  at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
  at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
  at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
  at System.Data.SqlClient.SqlConnection.Open()
  at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
  at System.Data.SqlClient.SqlConnection.Open()
  at KVS.Accelerator.Configuration.Customer.GetCustomerDS(Int32 CustomerID, String VirtualDirectory, String Version, Int32 ServerID, CustomerState State).


Event Type: Error
Event Source: Accelerator Manager
Event Category: None
Event ID: 63
Computer: Server
Description:
The description for Event ID ( 63 ) in Source ( Accelerator Manager  ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: APP ATM - Error adding installing/upgrading a Customer. System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.
  at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
  at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
  at System.Data.SqlClient.SqlConnection.Open()
  at KVS.Accelerator.Configuration.Customer.GetCustomerToInstallUpgradeDS(Int32 ServerID)
  at KVS.Accelerator.Application.InstallQueue.InstallQueueWork(ThreadSafeQueue& theQueue, Object myData).




Recommendations for preventative maintenance:

1. Keep a watchful eye on random sampling to make sure it is running (Step 2 of checklist).  If new review items do not appear, this is a good indicator that something is wrong.

2. Run IIS debugging tools; this should allow for dumps to be captured when and if issues occur.

3. Implement a schedule to restart IIS on a weekly or bi-weekly basis.

4. Services, such as the Enterprise Vault Storage service, can be configured to restart automatically if they stop. This may prevent manual intervention.

5. If running EV and SQL on the same server, implement a schedule to restart the server on a weekly or bi-weekly basis.

6. Overall, when troubleshooting CA issues, start by removing CA from the scope: see if the issue occurs in other EV applications, e.g. Search.asp. This is a good first step to quickly ascertain where the issue truly resides, in CA or in EV itself.




Supplemental Materials

Source: Event ID
Value: 41998
Description: An error occurred while resolving savesets

Source: Event ID
Value: 77
Description: An error occurred while resolving savesets. Exception from HRESULT: 0x80040E14

Source: Event ID
Value: 41987
Description: an error occurred when checking for transactions with no SaveSetID

Source: Event ID
Value: 6286
Description: Error obtaining the status of a saveset

Source: Event ID
Value: 272
Description: Error during guaranteed sampling - aborting. Exception from HRESULT: 0x80040E14

Source: Event ID
Value: 42125
Description: An Error has occured when retrieving Item

Source: Event ID
Value: 6796
Description: CVaultStoreDB::GetSSIDsForTransIDs

Source: Event ID
Value: 13361
Description: An error was detected while accessing the Vault Database. filegroup is full


Legacy ID: 302829

