Archiving files from two folders at the same level that have the same name, one in Double Byte Character Set (DBCS) and the other in Single Byte Character Set (SBCS), can result in data loss.
Article: TECH126899 | Created: 2010-01-13 | Updated: 2011-07-20 | Article URL: http://www.symantec.com/docs/TECH126899
Fig. 1 - Two folders with the same name, one in DBCS and the other in SBCS
The following versions of Symantec Enterprise Vault are affected:
- Enterprise Vault for File System Archiving 7.x
- Enterprise Vault for File System Archiving 2007.x
- Enterprise Vault for File System Archiving 8.x
- Enterprise Vault for File System Archiving 9.x
Scenario 1 - Same Time Stamp
In this scenario, FSA archives the file in folder A and creates a saveset. FSA then attempts to archive the file in folder B, but misinterprets folder B as folder A. The files in folder A and folder B have the same time stamp, so FSA believes the file is already archived. FSA does not store the data of the file in folder B; instead, it creates the shortcut in folder B from the file that was archived from folder A with the same time stamp. At this point, the contents of the file in folder B are lost.
Scenario 2 - Different Time Stamp
In this scenario, FSA archives the file in folder A and creates a saveset. FSA then attempts to archive the file in folder B, but misinterprets folder B as folder A and treats the file in folder B as another version of the file in folder A. FSA stores the contents of the file in folder B. At this point, the contents of the file in folder B are stored as the latest version and the contents of the file in folder A are stored as the second-latest version. When pruning runs, it keeps the latest version and deletes the older version of the data. At this point, the data of the file in folder A is lost.
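The two outcomes can be illustrated with a small toy model. This is only a sketch of the behavior described above, not Enterprise Vault code: the `ToyArchive` class, its method names, the return strings, and the file name `report.txt` are all hypothetical, and the model assumes that versions are tracked per (folder id, file name) and compared by time stamp.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArchive:
    """Toy model only: versions are keyed by (folder_id, file_name)."""
    versions: dict = field(default_factory=dict)

    def archive(self, folder_id, name, mtime, content):
        history = self.versions.setdefault((folder_id, name), [])
        for stored_mtime, _ in history:
            if stored_mtime == mtime:
                # Same time stamp: the file looks already archived,
                # so the new content is never stored.
                return "shortcut-to-existing"
        # Different time stamp: the file looks like a newer version.
        history.append((mtime, content))
        return "stored-new-version"

    def prune_to_latest(self, folder_id, name):
        # Keep only the version with the newest time stamp.
        history = self.versions[(folder_id, name)]
        self.versions[(folder_id, name)] = [max(history, key=lambda v: v[0])]

    def contents(self, folder_id, name):
        return [content for _, content in self.versions[(folder_id, name)]]


SHARED_ID = 1  # folder A and folder B wrongly resolved to the same id

# Same time stamp: the data from folder B is never stored.
same_ts = ToyArchive()
same_ts.archive(SHARED_ID, "report.txt", 100, "data from folder A")
same_ts_result = same_ts.archive(SHARED_ID, "report.txt", 100, "data from folder B")

# Different time stamp: pruning later removes the data from folder A.
diff_ts = ToyArchive()
diff_ts.archive(SHARED_ID, "report.txt", 100, "data from folder A")
diff_ts_result = diff_ts.archive(SHARED_ID, "report.txt", 200, "data from folder B")
diff_ts.prune_to_latest(SHARED_ID, "report.txt")
```

In this model, `same_ts` ends up holding only the data from folder A, while `diff_ts` ends up holding only the data from folder B after pruning.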
How to Determine if Affected
- Target file server allows DBCS file/folder names (DBCS is primarily seen in East Asia language sets)
- Two folders at the exact same level with the same name, but all or part of one folder name is in DBCS and the other is in SBCS (See 'How to check for duplicate folder/file names')
- Two files with the same name, the same time stamp, and different contents within each of the folders. Most applications change the time stamp when they modify the contents of a file.
File servers installed with an SBCS operating system such as US-English are not affected by this issue.
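As one possible way to look for such folder pairs, the sketch below walks a directory tree and reports sibling folders whose names become identical once character width, kana type, and case are ignored. It is an approximation under stated assumptions, not a Symantec tool: the function names are hypothetical, Unicode NFKC normalization stands in for a width-insensitive comparison, and a fixed code point offset stands in for a kana-insensitive one. The comparison rules of the actual database collation may differ in detail.

```python
import os
import unicodedata
from collections import defaultdict

# Hiragana U+3041..U+3096 maps onto katakana U+30A1..U+30F6 by a fixed offset.
_HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def fold_name(name):
    """Approximate a case-, width- and kana-insensitive comparison key."""
    # NFKC folds full-width Latin letters to half-width (ASCII) forms and
    # half-width katakana to full-width katakana.
    folded = unicodedata.normalize("NFKC", name)
    folded = folded.translate(_HIRAGANA_TO_KATAKANA)
    return folded.casefold()


def find_colliding_names(names):
    """Return groups of distinct names that share the same folded key."""
    groups = defaultdict(list)
    for name in names:
        groups[fold_name(name)].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def scan_tree(root):
    """Yield (directory, colliding_folder_names) for each directory under root."""
    for dirpath, dirnames, _filenames in os.walk(root):
        for group in find_colliding_names(dirnames):
            yield dirpath, group
```

For example, `list(scan_tree(path))` on a hypothetical share path would list each parent directory together with the folder names that collide in it. Any hit still needs to be checked manually against the other conditions in the list above.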
When FSA crawls a target file system, it creates one database record for each folder. Suppose there are two folders at the same level with the same name, one in DBCS and the other in SBCS. Call them folder A and folder B, and assume FSA finds folder A first on the target file server. FSA searches the database for folder A and finds no matching record. This means folder A is a new folder, so FSA creates a database record for it. FSA then finds folder B on the target file server and searches the database for a record matching folder B. This time, the search returns the record for folder A in error. This misleads FSA into treating folder B as the same folder as folder A, and FSA creates a folder point entry in folder B with the same ID as folder A. From then on, FSA treats folder B as the same folder as folder A.
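The lookup behavior described above can be sketched as follows. This is a simplified illustration, not the actual FSA implementation: the `crawl` function and the in-memory dictionary standing in for the database are hypothetical, and NFKC normalization is used as a stand-in for a width-insensitive database comparison.

```python
import unicodedata


def width_insensitive_key(name):
    # NFKC folds full-width (DBCS-style) Latin letters to their half-width
    # forms; it stands in here for a width-insensitive database comparison.
    return unicodedata.normalize("NFKC", name)


def crawl(folder_names, key=lambda name: name):
    """Assign a record id to each folder, reusing an id when the lookup matches."""
    records = {}   # lookup key -> record id (stand-in for the database)
    assigned = {}  # folder name as found on disk -> record id
    for name in folder_names:
        lookup = key(name)
        if lookup not in records:
            # No matching record: treat as a new folder and create a record.
            records[lookup] = len(records) + 1
        assigned[name] = records[lookup]
    return assigned


folder_a = "Sales"                           # SBCS (half-width) name
folder_b = "\uff33\uff41\uff4c\uff45\uff53"  # the same name in full-width characters

exact = crawl([folder_a, folder_b])
insensitive = crawl([folder_a, folder_b], key=width_insensitive_key)
```

With the exact comparison, the two folders receive different ids. With the width-insensitive comparison, folder B is matched to the record for folder A and receives the same id, which mirrors the error described above.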
Symantec has acknowledged that the above mentioned issue is present in the current version(s) of the product(s) mentioned at the end of this article. Symantec is actively investigating this and working towards a formal fix for this issue. Please subscribe to this article by clicking on the "Subscribe via email" link on this page to be notified when a formal fix is released.
This problem manifests only on systems that encounter folder and file names containing variable-width characters or Japanese Kana characters, for example, Japanese language installations. SQL Server installations that are configured with the correct collation will not encounter this problem. SQL collations are typically set as part of the SQL Server installation process. The correct collation has the following properties:
- Windows Collation
- Case-insensitive (_CI)
- Accent-sensitive (_AS)
- Kana-sensitive (_KS)
- Width-sensitive (_WS)
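As a rough aid, a collation name can be checked against the properties listed above by inspecting its suffixes. The sketch below is an assumption-laden illustration: `check_collation` is a hypothetical helper, it only inspects the name string (which could be obtained, for example, with `SELECT SERVERPROPERTY('Collation')`), and it relies on the SQL Server naming convention in which SQL collations begin with `SQL_`. The collation names used in the usage note are illustrative only; this article does not name a specific collation.

```python
def check_collation(name):
    """Return a list of problems found in a SQL Server collation name."""
    tokens = name.split("_")
    problems = []
    if tokens[0].upper() == "SQL":
        # Names beginning with SQL_ denote SQL collations, not Windows collations.
        problems.append("SQL collation, not a Windows collation")
    required = (
        ("CI", "case-insensitive"),
        ("AS", "accent-sensitive"),
        ("KS", "kana-sensitive"),
        ("WS", "width-sensitive"),
    )
    for flag, meaning in required:
        if flag not in tokens[1:]:
            problems.append(f"missing _{flag} ({meaning})")
    return problems
```

For instance, `check_collation("Japanese_CI_AS_KS_WS")` returns an empty list, while `check_collation("Latin1_General_CI_AS")` reports the missing `_KS` and `_WS` suffixes. A clean result from this check does not replace the guidance from Enterprise Vault Support for an existing installation.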
Please contact Enterprise Vault Support for assistance with an existing installation.
How to Subscribe to Email Notification:
Directly to this Article:
Subscribe to this article by clicking on the "Subscribe via email" link on this page to receive notification when this article is updated with Release Information.
If you have not received this TechNote from the Symantec Email Notification Service as a Software Alert, you may subscribe via email and/or RSS using the links provided at the following page:
Symantec Strongly Recommends the Following Best Practices:
1. Always perform a FULL backup prior to and after any changes to your environment.
2. Always make sure that the environment is running the latest version and patch level.
3. Subscribe to technical articles for updates.
Files in a DBCS-named folder and in the equivalent SBCS-named folder are not treated as distinct.
EV Core Collation Check and docs for ET 1841563