A rare potential for data loss has been discovered with PureDisk Remote Office Edition 6.5 in cases where the content router is critically low on memory and is restarted during queue processing, or if a transaction log is not written correctly due to a disk write error.

Article: TECH62172  |  Created: 2008-01-31  |  Updated: 2009-01-31  |  Article URL: http://www.symantec.com/docs/TECH62172
Article Type
Technical Solution


Issue



A rare potential for data loss has been discovered with PureDisk Remote Office Edition 6.5 in cases where the content router is critically low on memory and is restarted during queue processing, or if a transaction log is not written correctly due to a disk write error.

Solution



Introduction:
A rare potential for data loss has been discovered with PureDisk Remote Office Edition 6.5 in cases where the content router is critically low on memory and is restarted during queue processing, or if a transaction log is not written correctly due to a disk write error. There are two scenarios in which data loss may occur.

Scenario 1:
Data loss may occur if the content router is stopped during queue processing and, when it is restarted, its cache (the storage index) does not fit in memory.

Scenario 2:
If a transaction log has been aborted due to a disk write error and the content router runs out of memory immediately thereafter, data loss may occur. The disk write error can occur on any type of storage subsystem, such as SAN, iSCSI, or network appliances.

If either of these rare scenarios occurs, queued data may not be written to the database as expected. As a result, subsequent maintenance operations may groom (delete) valid data that is not referenced in the database.
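The failure mode can be illustrated with a toy model in which grooming keeps only the data that the database references. This is a hypothetical illustration and not PureDisk's actual data structures or grooming algorithm. The names `groom`, `process_queue`, and `commit_succeeds` are invented for this sketch. In the model, when queued references are never committed, the groom pass treats valid backup data as unreferenced and removes it.

```python
# Toy model (hypothetical; not PureDisk's implementation) of how queued
# database references that are never committed lead to valid data being groomed.


def groom(stored_objects, db_references):
    """Keep only stored objects that the database references;
    everything else is treated as garbage and removed."""
    return {obj for obj in stored_objects if obj in db_references}


def process_queue(db_references, queue, commit_succeeds):
    """Apply queued references to the database. If the commit fails
    (e.g. the transaction log aborts or memory runs out), the queued
    references are lost and the database is left unchanged."""
    if commit_succeeds:
        return db_references | set(queue)
    return set(db_references)


# Backup data already written to storage; the references for "c" and "d"
# are still waiting in the queue.
stored = {"a", "b", "c", "d"}
db = {"a", "b"}
queue = ["c", "d"]

healthy = groom(stored, process_queue(db, queue, commit_succeeds=True))
failed = groom(stored, process_queue(db, queue, commit_succeeds=False))

print(sorted(healthy))  # ['a', 'b', 'c', 'd']
print(sorted(failed))   # ['a', 'b']  (valid data "c" and "d" was groomed)
```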

Please note: At the time of this writing, there have been no reports of these data loss issues occurring. This TechNote is an advisory intended to help ensure proper server maintenance and configuration so that the PureDisk database is not compromised.


What is Affected:
All environments running PureDisk Remote Office Edition 6.5

How to Determine if Affected:
Data loss can occur only if ALL conditions of either scenario are met:

Scenario 1: Incorrect loading of a storage index can result in data loss.
Data loss can occur if the following sequence of events takes place:
1. The content router is critically low on memory (this can be due to the amount of physical RAM being less than the recommended amount, or to other non-PureDisk processes consuming memory).
2. During queue processing, the content router is restarted. This may be the result of a user-initiated stop and start of PureDisk or the PureDisk Content Router, or of an involuntary restart such as a power outage.
3. On restart, the storage index is reloaded but does not fit into memory because server memory is insufficient.
4. Queued data is not written to the database as expected, resulting in an inconsistent database.  Subsequent maintenance operations groom the backup data associated with these missing database entries.

Scenario 2: The transaction log is aborted due to a disk write error and the content router runs out of memory.
Data loss can occur if the following sequence of events takes place:
1. The content router is critically low on memory (this can be due to the amount of physical RAM being less than the recommended amount, or to other non-PureDisk processes consuming memory).
2. A disk write error occurs, causing the transaction log to abort.
3. Immediately following this, the content router runs out of memory.
4. Queued data is not written to the database as expected, resulting in an inconsistent database.  Subsequent maintenance operations groom the backup data associated with these missing database entries.



Error Code(s) / Message(s):

The following are errors associated with the conditions listed above:

If an out-of-memory event occurs, one of the following messages will be present in the spoold log files located at /Storage/log/spoold:
INFO.[number].: Storage Cache Manager: cache transition from reloading to incomplete during cache update
INFO.[number].: Storage Cache Manager: cache transition from complete to incomplete during cache update
INFO.[number].: Storage Cache Manager: cache transition from reloading to incomplete during global cache update
INFO.[number].: Storage Cache Manager: cache transition from complete to incomplete during global cache update

If a disk write error occurs, the following messages will be present in the spoold log files located at /Storage/log/spoold:
INFO.[number].: _spoolerClassCommit: _spoolerClassAbort
INFO.[number].: _spoolerClassStore: dataStoreManager->rollback

Please note: The presence of these messages does not by itself confirm data loss. The messages indicate possible data loss only when they occur within the context of the scenarios described in this TechNote.
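For Scenario 2, the relevant context is a disk-write-error message followed immediately by an out-of-memory cache transition. The sketch below flags that ordering in a sequence of log lines. It relies on three assumptions:

* Line order in a single spoold log is a usable proxy for event order.
* The `window` parameter (how many lines still count as "immediately") is a hypothetical threshold, not a documented value.
* Scenario 1 cannot be detected this way, because it also requires a content router restart that these messages do not record.

```python
import re

# Disk-write-error messages listed in this TechNote.
DISK_WRITE_MARKERS = (
    "_spoolerClassCommit: _spoolerClassAbort",
    "_spoolerClassStore: dataStoreManager->rollback",
)
# Matches all four out-of-memory cache transition messages listed in this TechNote.
OOM_MARKER = re.compile(
    r"Storage Cache Manager: cache transition from (reloading|complete) "
    r"to incomplete during (global )?cache update"
)


def scenario2_candidates(lines, window=5):
    """Return (abort_index, oom_index) pairs in which an out-of-memory cache
    transition appears within `window` lines after the most recent
    disk-write-error message. `window` is a hypothetical threshold."""
    pairs = []
    last_abort = None
    for i, line in enumerate(lines):
        if any(marker in line for marker in DISK_WRITE_MARKERS):
            last_abort = i
        elif OOM_MARKER.search(line) and last_abort is not None and i - last_abort <= window:
            pairs.append((last_abort, i))
    return pairs
```

A returned pair only marks a sequence that deserves a closer look with Technical Support. An empty list does not prove that the database is consistent.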

Formal Resolution:
This issue is formally resolved in the following patch:
-  PureDisk Remote Office Edition 6.5.0.1 (currently available; refer to the Related Documents section below).
PureDisk users must be running 6.5 to apply this patch.

Note: The resolution mentioned above will prevent this issue from affecting future backups, even in cases where server requirements or recommendations are not met. However, affected data from backups performed before this resolution was applied will not be recoverable. If you believe that data may be at risk due to the conditions described in this alert, please call Technical Support for further instructions.


This Software Alert has been delivered because product quality and responsiveness to customers are consistent Symantec Corporation hallmarks. Any issue that could potentially affect the integrity of data in your environment, no matter how rare, is viewed as extremely serious.  

Prevention/Workaround:
Meeting the following server requirements and recommendations will help ensure that these resource-related issues do not occur:
-  Adhere to the minimum memory requirements:
   - 10 GB of error-correcting code (ECC) random-access memory (RAM) for an 8-TB all-in-one PureDisk node.
   - 4 GB of ECC RAM for a 4-TB all-in-one PureDisk node.
   - 4 GB of ECC RAM for a 4-TB content router node.
   - 8 GB of ECC RAM for an 8-TB content router node.
-  Do not run memory-consuming processes other than PureDisk processes on the PureDisk node.
-  Ensure that the Content Router consistently has sufficient disk space.
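The memory requirements above can be expressed as a lookup table and compared with a node's installed memory. This is a minimal sketch under the following assumptions:

* The node runs Linux and exposes `/proc/meminfo`.
* The labels "all-in-one" and "content-router" and the function names are hypothetical.

`MemTotal` reports slightly less than the installed physical RAM, so a node with exactly the minimum may fail this check. The check does not verify that the memory is ECC.

```python
# Minimum ECC RAM in GB, keyed by (node type, storage capacity in TB),
# taken from the requirements listed above. Node-type labels are hypothetical.
MIN_RAM_GB = {
    ("all-in-one", 4): 4,
    ("all-in-one", 8): 10,
    ("content-router", 4): 4,
    ("content-router", 8): 8,
}


def required_ram_gb(node_type, capacity_tb):
    """Return the documented minimum RAM, or raise ValueError if none is listed."""
    try:
        return MIN_RAM_GB[(node_type, capacity_tb)]
    except KeyError:
        raise ValueError(
            f"no documented requirement for a {capacity_tb}-TB {node_type} node"
        ) from None


def total_ram_gb(meminfo_path="/proc/meminfo"):
    """Read MemTotal (reported in kB) from a Linux /proc/meminfo and return GB."""
    with open(meminfo_path) as fh:
        for line in fh:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def meets_requirement(installed_gb, node_type, capacity_tb):
    """True if the installed memory meets the documented minimum."""
    return installed_gb >= required_ram_gb(node_type, capacity_tb)
```

For example, `meets_requirement(total_ram_gb(), "all-in-one", 8)` would check an 8-TB all-in-one node against the 10 GB minimum.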

Note: Please refer to the PureDisk 6.5 Getting Started guide, found below in the Related Documents section, for further details on system requirements.



Best Practices:
Symantec strongly recommends the following best practices:
1. Always perform a full backup before and after any changes to your environment.
2. Always make sure that your environment is running the latest version and patch level.


How to Subscribe to Software Alerts:
If you did not receive this TechNote from the Symantec Email Notification Service as a Software Alert, please subscribe at the following link:
http://maillist.entsupport.symantec.com/subscribe.asp

Supplemental Materials

Source: ETrack
Value: 1279913
Description: ETrack (NetBackup) 1279913: CR: incorrect loading of storage index can result in data loss

Source: ETrack
Value: 1297620
Description: ETrack (NetBackup) 1297620: Potential data loss: CR open tlog index can become incomplete


Legacy ID
306906