Indexing performance may degrade over time when indexing XML attachments with embedded PDF files
Article: TECH215098 | Created: 2014-02-17 | Updated: 2014-03-13 | Article URL: http://www.symantec.com/docs/TECH215098
After operating normally for some time, the indexing service may degrade until it can no longer keep up with ingestion rates or respond to search requests in a timely fashion.
Indexer-service.exe processes may consume too much memory and become unresponsive.
The wildcard dictionary is a list of the unique words in an individual index volume. This list is updated every time an item is ingested. If the dictionary grows too large, the associated index may become unresponsive to search or ingestion requests.
When emails have XML attachments containing embedded base64-encoded objects, the dictionary file may become abnormally large. In the scenario observed, the XML attachments contained an embedded PDF file. As each attachment was ingested, every string of base64-encoded text was added to the wildcard dictionary. If a 13 MB PDF was embedded within the XML, the wildcard dictionary grew by 13 MB.
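The effect can be illustrated with a small sketch. It is not the Enterprise Vault indexing engine: the whitespace tokenization, the 76-character line wrapping, and the names base64_tokens and payload are assumptions made only for illustration. It shows why base64 text behaves badly in a dictionary of unique words: nearly every encoded line is a "word" that never repeats, so the unique-word list grows with the size of the embedded object.

```python
import base64
import random


def base64_tokens(payload):
    """Base64-encode payload with 76-character line wrapping, then split on whitespace.

    Splitting on whitespace is an assumed stand-in for how an indexer
    might break text into words; the real tokenizer may differ.
    """
    encoded = base64.encodebytes(payload).decode("ascii")
    return encoded.split()


# Hypothetical stand-in for an embedded binary object (30,000 pseudo-random bytes).
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(30_000))

tokens = base64_tokens(payload)
unique_tokens = set(tokens)

# Characters a unique-word list would have to store for this one object.
unique_chars = sum(len(t) for t in unique_tokens)

print("tokens:", len(tokens))
print("unique tokens:", len(unique_tokens))
print("characters in unique tokens:", unique_chars)
```

In this sketch almost none of the tokens repeat, so the unique-word list stores essentially the whole encoded object rather than a small vocabulary.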
The size of the wildcard dictionary can be checked by inspecting the index volume folder on disk (e.g. C:\EVIndexes\index1\1BAA95753C944234490D77ABDB567A218_858) and navigating to the live\expansions subfolder. The wildcard dictionary will be the file without a file extension.
This file is normally expected to be under 2 GB.
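The check described above could be scripted along the following lines. This is a minimal sketch, assuming the folder layout given in this article (a live\expansions subfolder holding a single file with no extension). The function names and the SIZE_LIMIT_BYTES constant are hypothetical, and the 2 GB value is taken from the guideline above.

```python
from pathlib import Path

# 2 GB guideline from this article, expressed in bytes.
SIZE_LIMIT_BYTES = 2 * 1024**3


def find_wildcard_dictionary(index_volume):
    """Return the extensionless file(s) under <index_volume>/live/expansions."""
    expansions = Path(index_volume) / "live" / "expansions"
    return [p for p in expansions.iterdir() if p.is_file() and p.suffix == ""]


def check_index_volume(index_volume, limit=SIZE_LIMIT_BYTES):
    """Return (path, size_in_bytes, exceeds_limit) for each candidate file."""
    results = []
    for path in find_wildcard_dictionary(index_volume):
        size = path.stat().st_size
        results.append((path, size, size > limit))
    return results
```

For example, calling check_index_volume(r"C:\EVIndexes\index1\1BAA95753C944234490D77ABDB567A218_858") would report the size of the dictionary file for the index volume used as the example in this article. The script only reads file sizes and does not modify the index.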
- Set the Vault Stores into backup mode if required (see the note above).
- Stop the Enterprise Vault services on the affected EV server.
- Exclude the XML file type from conversion using the ExcludedFileTypesFromConversion registry setting.
- Rename the folder of the affected index. For example, if the index was located at C:\EVIndexes\index1\1BAA95753C944234490D77ABDB567A218_858, then you could rename it to C:\EVIndexes\index1\1BAA95753C944234490D77ABDB567A218_858_old, causing it to go "missing" from EV.
- Start the EV services.
- Perform a search of the affected archive using Enterprise Vault Search. This will trigger an automatic rebuild of the missing index.
- Allow the rebuild operation to complete, which may take a significant amount of time, depending on the size of the index.
- Once the rebuild is finished, remove the Vault Stores from backup mode if required.
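The folder rename in the steps above can be sketched as follows. This is an illustrative helper only: rename_index_folder and its suffix parameter are hypothetical names. It assumes the Enterprise Vault services have already been stopped, and it does not stop services, change the registry, or trigger the rebuild.

```python
from pathlib import Path


def rename_index_folder(index_folder, suffix="_old"):
    """Rename an index folder by appending a suffix and return the new path.

    Refuses to run if the folder is missing or the target name is already
    taken, so an earlier "_old" copy is never overwritten.
    """
    source = Path(index_folder)
    if not source.is_dir():
        raise FileNotFoundError(f"Index folder not found: {source}")

    target = source.with_name(source.name + suffix)
    if target.exists():
        raise FileExistsError(f"Target already exists: {target}")

    source.rename(target)
    return target
```

Keeping the renamed folder, rather than deleting it, leaves the original index data available until the rebuild has completed.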
XML files with embedded PDF files are causing the wildcard dictionary file size to exceed 2 GB