Performance of indexing may degrade over time when indexing XML attachments with embedded PDF files

Article:TECH215098  |  Created: 2014-02-17  |  Updated: 2014-03-13  |  Article URL http://www.symantec.com/docs/TECH215098
Article Type
Technical Solution


Issue



After operating normally for some period, the performance of the indexing service may degrade to the point where it cannot keep up with ingestion rates or respond to search requests in a timely fashion.

Indexer-service.exe processes may consume too much memory and become unresponsive.


Cause



The wildcard dictionary is a list of unique words in an individual index volume. This list is updated every time an item is ingested. If this dictionary grows too large, the associated index may become unresponsive to search or ingestion requests.

In a scenario where emails have XML attachments containing embedded base64-encoded objects, the dictionary file may become abnormally large. In the observed scenario, the XML attachments contained an embedded PDF file. As each attachment was ingested, every string of base64-encoded text was added to the wildcard dictionary. For example, if a 13 MB PDF is embedded within the XML, the wildcard dictionary grows by 13 MB.
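
To gauge whether a given XML attachment is likely to contribute to this growth, the following minimal Python sketch counts long unbroken runs of base64 characters in the XML text. It is an illustration only and is not part of Enterprise Vault: the function name summarize_base64_tokens and the 60-character threshold are assumptions chosen for the example, and the way the indexer actually splits text into words may differ.

```python
import re


def summarize_base64_tokens(xml_text, min_token_len=60):
    """Count long base64-looking tokens in a piece of XML text.

    Assumption: any unbroken run of at least min_token_len characters from
    the base64 alphabet is treated as part of an embedded object. The
    threshold is a heuristic chosen for this sketch.
    """
    pattern = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_token_len)
    tokens = pattern.findall(xml_text)
    unique = set(tokens)
    return {
        "tokens": len(tokens),                        # long runs found
        "unique_tokens": len(unique),                 # distinct runs
        "unique_bytes": sum(len(t) for t in unique),  # characters in distinct runs
    }
```

Because encoded binary data rarely repeats, nearly every run found this way is distinct, which is consistent with the dictionary growing by roughly the size of the embedded object.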

The size of the wildcard dictionary can be checked by inspecting the index volume folder on disk (e.g. C:\EVIndexes\index1\1BAA95753C944234490D77ABDB567A218_858) and navigating to the live\expansions subfolder. The wildcard dictionary will be the file without a file extension.

This file is normally expected to be smaller than 2 GB.
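
The manual check described above can be sketched as a small Python script that walks an index location, finds each live\expansions subfolder, and reports the size of the file without an extension. This is a hedged illustration, not a supported tool: the function names are hypothetical, the folder layout is taken from the example path in this article, and the 2 GB limit is the expectation stated above.

```python
from pathlib import Path

# Threshold from this article: the file is normally expected to be under 2 GB.
SIZE_LIMIT_BYTES = 2 * 1024**3


def find_wildcard_dictionaries(index_root):
    """Yield (path, size_in_bytes) for each extensionless file located in a
    live/expansions subfolder anywhere beneath index_root."""
    for expansions in Path(index_root).glob("**/live/expansions"):
        if not expansions.is_dir():
            continue
        for entry in expansions.iterdir():
            # The wildcard dictionary is the file without a file extension.
            if entry.is_file() and entry.suffix == "":
                yield entry, entry.stat().st_size


def oversized(index_root, limit=SIZE_LIMIT_BYTES):
    """Return the (path, size) pairs whose size is at or above the limit."""
    return [(p, s) for p, s in find_wildcard_dictionaries(index_root) if s >= limit]
```

For example, calling oversized(r"C:\EVIndexes") on the indexing server would list the index volumes whose wildcard dictionary has reached 2 GB, assuming the indexes are stored under that location.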


Solution



Workaround:

Warning: This workaround excludes the XML file type from conversion, which prevents the contents of XML files from being searchable via Discovery Accelerator or Enterprise Vault Search. It also requires the index to be rebuilt, and the index will not be searchable for the duration of the rebuild operation. The rebuild operation will not provide normal progress updates, but its progress can be seen using the index volumes browser.
 
If XML files must be available for indexing in the future (i.e. content converted but not indexed), set Backup Mode on all Vault Stores for which indexing is provided by the server that contains the rebuilding index. This prevents ingestion of new items during the index rebuild. Once the rebuild is complete, conversion of XML files can be re-enabled for future items and Backup Mode can be cleared.
 
The steps are as follows:
  1. Set the Vault Stores into backup mode if required (see the note above).
  2. Stop the Enterprise Vault services on the affected EV server.
  3. Exclude the XML file type from conversion using the ExcludedFileTypesFromConversion registry setting.
  4. Rename the folder of the affected index. For example, if the index was located at C:\EVIndexes\index1\1BAA95753C944234490D77ABDB567A218_858, then you could rename it to C:\EVIndexes\index1\1BAA95753C944234490D77ABDB567A218_858_old, causing it to go "missing" from EV.
  5. Start the EV services.
  6. Perform a search of the affected archive using Enterprise Vault Search. This will trigger an automatic rebuild of the missing index.
  7. Allow the rebuild operation to complete, which may take a significant amount of time, depending on the size of the index.
  8. Once the rebuild is finished, remove the Vault Stores from backup mode if required.
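
The folder rename in step 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name set_index_folder_aside is hypothetical, the "_old" suffix follows the example in step 4, and the script does not check that the Enterprise Vault services have been stopped (step 2), which remains the administrator's responsibility.

```python
from pathlib import Path


def set_index_folder_aside(index_folder, suffix="_old"):
    """Rename an index volume folder by appending a suffix and return the
    new path. Refuses to run if the source folder is missing or if the
    target name is already taken."""
    source = Path(index_folder)
    if not source.is_dir():
        raise FileNotFoundError(f"Index folder not found: {source}")
    target = source.with_name(source.name + suffix)
    if target.exists():
        raise FileExistsError(f"Target already exists: {target}")
    source.rename(target)
    return target
```

Keeping the original folder under a new name, rather than deleting it, leaves a copy of the old index on disk until the rebuild has completed.
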

Symantec Corporation has acknowledged that the above-mentioned issue is present in the current version(s) of the product(s) mentioned at the end of this article. Symantec Corporation is committed to product quality and satisfied customers.

There are currently no plans to address this issue by way of a hotfix or cumulative hotfix in the current or previous versions of the software. The issue may be resolved in a future major revision of the software, but it is not currently scheduled for any release. If you feel this issue has a direct business impact on your continued use of the product, please contact your Symantec Sales representative or the Symantec Sales group to discuss these concerns. For information on how to contact Symantec Sales, please see http://www.symantec.com

Supplemental Materials

Source: ETrack
Value: 3404875
Description: XML files with embedded PDF files are causing the wildcard dictionary file size to exceed 2 GB





