Archiving files from two folders with the same name, one in Double Byte Character Set (DBCS) and the other in Single Byte Character Set (SBCS), can result in data loss.

Article:TECH126899  |  Created: 2010-01-13  |  Updated: 2011-07-20  |  Article URL http://www.symantec.com/docs/TECH126899
Article Type
Technical Solution

Issue



Introduction
 
File System Archiving (FSA) data can be lost when files with the same name are archived from identically named folders at the same level in the folder hierarchy, where one folder name is in a single byte character set and the other is in a double byte character set. The double byte folder name may be entirely double byte or a combination of single and double byte characters.

Fig. 1 - Two folders with the same name, one in DBCS and the other in SBCS
 
 

Environment



What is Affected
The following versions of Symantec Enterprise Vault are affected:
 
  • Enterprise Vault for File System Archiving 7.x
  • Enterprise Vault for File System Archiving 2007.x
  • Enterprise Vault for File System Archiving 8.x
  • Enterprise Vault for File System Archiving 9.x
 The following two scenarios will cause data loss:

 Scenario 1 - Same Time Stamp
 
1. FSA is configured on a file server that allows DBCS file/folder names (example: Japanese version of Windows)
 
2. Two folders are created at the same level with the same name, one in DBCS and the other in SBCS.  In this scenario they will be referred to as folder A and folder B.
 
3. Files with the same filename and same timestamp but different content are stored in each folder.
 
4. The File System Archiving task is initiated by schedule or by a Run Now.
 
In this scenario, FSA archives the file in folder A and creates a saveset. FSA then attempts to archive the file in folder B but misinterprets folder B as folder A. Because the files in folder A and folder B have the same name and time stamp, FSA believes the file is already archived. It does not store the data of the file in folder B; instead, it creates a shortcut in folder B that points to the file archived from folder A. At this point, the contents of the file in folder B are lost.
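The sequence in Scenario 1 can be sketched in a few lines of Python. This is only an illustrative model, not Enterprise Vault code: the function names, the sample folder and file names, and the use of Unicode NFKC normalization as a stand-in for a width-insensitive comparison are all assumptions made for the example.

```python
import unicodedata


def fold_width(name):
    """Approximate a width-insensitive comparison (assumption: NFKC folding)."""
    return unicodedata.normalize("NFKC", name)


def archive_pass(files):
    """Model one archiving run over (folder, filename, timestamp, content) tuples.

    A file whose folded folder name, filename and timestamp match an existing
    saveset is treated as already archived: only a shortcut is created.
    """
    savesets = {}   # folded key -> content actually stored
    shortcuts = {}  # (real folder, filename) -> content the shortcut resolves to
    for folder, filename, timestamp, content in files:
        key = (fold_width(folder), filename, timestamp)
        if key not in savesets:
            savesets[key] = content
        shortcuts[(folder, filename)] = savesets[key]
    return savesets, shortcuts


FOLDER_A = "ABC"                 # SBCS (half-width) name
FOLDER_B = "\uff21\uff22\uff23"  # same letters in full-width (DBCS) form

files = [
    (FOLDER_A, "report.txt", "2010-01-13T09:00:00", "content from A"),
    (FOLDER_B, "report.txt", "2010-01-13T09:00:00", "content from B"),
]
savesets, shortcuts = archive_pass(files)
```

In this model only one saveset exists after the run, and the shortcut in folder B resolves to the content archived from folder A, which mirrors the loss described above.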
  
Scenario 2 - Pruning Enabled
 
1. FSA is configured on a file server that allows DBCS file/folder names (example: Japanese version of Windows)
 
2. Pruning is configured. Set FSA task's [Prune to] to 1 [versions of the document], and enable [Scheduled Pruning].
 
3. Two folders are created at the same level with the same name, one in DBCS and the other in SBCS.  In this scenario they will be referred to as folder A and folder B.
 
4. Files with the same filename but different contents are stored in each folder.
 
5. The File System Archiving task is initiated by schedule or by a Run Now.
 
6. Pruning runs according to its schedule.

In this scenario, FSA archives the file in folder A and creates a saveset. FSA then attempts to archive the file in folder B but misinterprets folder B as folder A, so it treats the file in folder B as another version of the file in folder A. FSA stores the contents of the file in folder B. At this point, the contents of the file in folder B are stored as the latest version and the contents of the file in folder A are stored as the second latest version. When pruning runs, it keeps the latest version and deletes the older version. At this point, the data of the file in folder A is lost.
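Scenario 2 can be modelled in the same spirit. The sketch below is a simplified illustration under stated assumptions, not the product's pruning logic: the version history is a plain Python list, the helper names are hypothetical, and NFKC normalization stands in for the width-insensitive folder match.

```python
import unicodedata


def fold_width(name):
    """Approximate a width-insensitive comparison (assumption: NFKC folding)."""
    return unicodedata.normalize("NFKC", name)


def archive(versions, folder, filename, content):
    """Append content as a new version under the folded folder name."""
    versions.setdefault((fold_width(folder), filename), []).append(content)


def prune(versions, keep=1):
    """Keep only the newest `keep` versions of each archived file."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    for history in versions.values():
        del history[:-keep]


versions = {}
archive(versions, "ABC", "report.txt", "content from A")                 # folder A
archive(versions, "\uff21\uff22\uff23", "report.txt", "content from B")  # folder B
prune(versions, keep=1)
```

After pruning with one version retained, the single remaining entry holds only the content from folder B; the content from folder A has been discarded as an "older version".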

How to Determine if Affected
 
There is a potential for data loss to occur if all the following conditions are met:
 
  • Target file server allows DBCS file/folder names (DBCS is primarily seen in East Asia language sets)
  • Two folders exist at exactly the same level with the same name, with all or part of one folder name in DBCS and the other in SBCS (See 'How to check for duplicate folder/file names')
AND 
  • Each of the folders contains a file with the same name and the same time stamp but different contents. Most applications change the time stamp when they modify the contents of a file.

File servers installed with an SBCS operating system such as US-English are not affected by this issue.

 
How to check for duplicate folder/file names
 
Download the tool to find affected files and folders where one folder name is in DBCS and the other in SBCS.
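As a rough illustration of what such a check looks for, the Python sketch below groups sibling folder names that become identical once character width and case are folded. It is not the Symantec tool and is not guaranteed to match its results: the function names are hypothetical, and NFKC normalization plus case folding is only an approximation of a width-insensitive, case-insensitive comparison. It does not detect names that differ only in Kana type (Hiragana versus Katakana).

```python
import os
import unicodedata
from collections import defaultdict


def fold(name):
    """Fold character width (via NFKC) and case; an approximation only."""
    return unicodedata.normalize("NFKC", name).casefold()


def colliding_names(names):
    """Return groups of distinct names that are equal after folding."""
    groups = defaultdict(list)
    for name in names:
        groups[fold(name)].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def scan_tree(root):
    """Yield (parent_path, colliding_group) for sibling folders under root."""
    for parent, dirnames, _filenames in os.walk(root):
        for group in colliding_names(dirnames):
            yield parent, group
```

For example, calling `colliding_names` on the names "ABC", its full-width equivalent, and "other" reports the first two as one colliding group; `scan_tree` applies the same check to every folder level beneath a root path.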
 

Cause



When FSA crawls a target file system, it creates one database record for each folder. Consider two folders at the same level with the same name, one in DBCS and the other in SBCS. Call them folder A and folder B, and assume FSA finds folder A on the target file server first. FSA searches the database for folder A and finds no matching record. This means folder A is a new folder, so FSA creates a database record for it. FSA then finds folder B on the target file server and searches the database to confirm that there is no record matching folder B. This time, FSA finds the record for folder A in error. This misleads FSA into treating folder B as the same folder as folder A, and it creates a folder point entry in folder B with the same ID as folder A. From then on, FSA treats folder B as the same folder as folder A.
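The lookup behaviour described above can be reproduced in miniature with Python's built-in sqlite3 module. This is a sketch under several assumptions: SQLite stands in for SQL Server, the table and column names are invented for the example and are not the Enterprise Vault schema, and the custom WIDTH_INSENSITIVE collation (built on NFKC normalization) only approximates a width-insensitive SQL Server collation.

```python
import sqlite3
import unicodedata


def width_insensitive(a, b):
    """Collation callback: compare two strings after folding character width."""
    fa = unicodedata.normalize("NFKC", a)
    fb = unicodedata.normalize("NFKC", b)
    return (fa > fb) - (fa < fb)


def folder_id_for(conn, parent_id, name):
    """Return the ID of a matching folder record, creating one if none exists."""
    row = conn.execute(
        "SELECT id FROM folders WHERE parent_id = ? AND name = ?",
        (parent_id, name),
    ).fetchone()
    if row is not None:
        return row[0]  # an existing record is considered the same folder
    cur = conn.execute(
        "INSERT INTO folders (parent_id, name) VALUES (?, ?)",
        (parent_id, name),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.create_collation("WIDTH_INSENSITIVE", width_insensitive)
conn.execute(
    "CREATE TABLE folders ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "name TEXT COLLATE WIDTH_INSENSITIVE)"
)

id_a = folder_id_for(conn, 0, "ABC")                 # folder A (SBCS)
id_b = folder_id_for(conn, 0, "\uff21\uff22\uff23")  # folder B (full-width)
record_count = conn.execute("SELECT COUNT(*) FROM folders").fetchone()[0]
```

Because the column compares names without regard to width, the search for folder B returns the record for folder A, so both folders end up with the same ID and only one record is ever created.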

 


Solution



Formal Resolution
Symantec has acknowledged that the above mentioned issue is present in the current version(s) of the product(s) mentioned at the end of this article.  Symantec is actively investigating this and working towards a formal fix for this issue.  Please subscribe to this article  by clicking on the "Subscribe via email" link on this page to be notified when a formal fix is released. 

Workaround

New Installation
This problem only manifests on systems that encounter folder and file names containing characters of varying width or Japanese Kana characters, for example, Japanese language installations. SQL Server installations that are configured with the correct collation will not encounter this problem. SQL collations are typically set as part of the SQL Server installation process.
 
The following Microsoft article has more information on the collation options at install time:
 
 
To check your current collation sorting options, either look at the properties of one of the Enterprise Vault databases within SQL Server Management Studio or execute the following query:
 
SELECT DATABASEPROPERTYEX('database_name', 'Collation') SQLCollation;
 
Where 'database_name' is the name of one of the Enterprise Vault databases.
 
To prevent the symptoms of this problem, specify the following collation options as part of the SQL Server installation:
 
  • Windows Collation
    • Case-INsensitive (_CI)
    • Accent-sensitive (_AS)
    • Kana-sensitive (_KS)
    • Width-sensitive (_WS)
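 In other words, a collation of the form <SQL_collation_name>_CI_ that is accent-, kana- or width-insensitive (_AI, _KI or _WI) MAY exhibit this problem. Note that in SQL Server collation names, kana- and width-insensitivity are indicated by the absence of _KS and _WS rather than by an explicit suffix.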
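The collation name returned by the query above can be checked against the listed options with a short script. The Python sketch below is a hypothetical helper, not a Symantec utility. It assumes the usual SQL Server naming convention, in which options appear as underscore-separated suffixes and kana or width sensitivity is present only when _KS or _WS appears in the name. It does not verify whether the collation is a Windows collation or a SQL collation.

```python
def collation_flags(name):
    """Derive sensitivity options from a collation name's suffix tokens."""
    tokens = name.split("_")
    return {
        "case_insensitive": "CI" in tokens,
        "accent_sensitive": "AS" in tokens,
        "kana_sensitive": "KS" in tokens,   # absent means kana-insensitive
        "width_sensitive": "WS" in tokens,  # absent means width-insensitive
    }


def is_recommended(name):
    """True when all four options recommended in this article are present."""
    return all(collation_flags(name).values())
```

For instance, `is_recommended("Japanese_CI_AS_KS_WS")` is true, while `is_recommended("Japanese_CI_AS")` is false because the name carries neither _KS nor _WS.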
 
If the collation output shows that the collation is case-sensitive, or that it is accent-, kana- or width-insensitive, then to remove all possibility of encountering this error you will need to change the server collation and make use of the SQL collation scripts documented here:
 

Existing Installation

Please contact Enterprise Vault Support for assistance with an existing installation.

How to Subscribe to Email Notification:
Directly to this Article:
Subscribe to this article by clicking on the "Subscribe via email" link on this page to receive notification when this article is updated with Release Information.

Software Alerts:
If you have not received this TechNote from the Symantec Email Notification Service as a Software Alert, you may subscribe via email and/or RSS using the links provided at the following page:

http://www.symantec.com/business/support/index?page=content&key=50990&channel=ALERTS


Symantec Strongly Recommends the Following Best Practices:
1. Always perform a FULL backup prior to and after any changes to your environment.
2. Always make sure that the environment is running the latest version and patch level.
3. Subscribe to technical articles for updates. 

 


Supplemental Materials

Source: ETrack
Value: 1841563
Description: Files in DBCS folder name and in equivalent SBCS folder name are not distinct.

Source: ETrack
Value: 2111407
Description: EV Core Collation Check and docs for ET 1841563



Legacy ID: 340300



