A potential for data loss has been discovered when a NetBackup PureDisk content router meets or exceeds its "Low Space Threshold" and remains above this threshold. In some cases, the backup job may complete successfully (exit status 0), even though data was not completely written due to the content router being in this state.

Article:TECH64775  |  Created: 2008-01-04  |  Updated: 2009-01-20  |  Article URL http://www.symantec.com/docs/TECH64775
Article Type
Technical Solution


Solution



Introduction:
A potential for data loss has been discovered in cases where a PureDisk content router meets or exceeds its "LowSpaceThreshold" and remains above this threshold.  The threshold is set to 85% disk utilization by default.

The "LowSpaceThreshold" is configured in the contentrouter.cfg configuration file on the PureDisk Content Router.  When the utilized disk space on the content router meets or exceeds this level, the content router will no longer accept backups and will send abort messages to clients to prevent further backup data from being sent.  
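As an illustration of the threshold comparison described above, the following minimal Python sketch checks disk utilization against the 85% default. It is not part of PureDisk: the function names are hypothetical, the /Storage path is an assumption taken from the log location mentioned in this article, and the content router's own space accounting may differ from what the operating system reports.

```python
import shutil

# Default stated in this article; the real value lives in contentrouter.cfg.
DEFAULT_LOW_SPACE_THRESHOLD = 85.0  # percent of disk utilization


def utilization_percent(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def meets_or_exceeds_threshold(total_bytes, used_bytes,
                               threshold=DEFAULT_LOW_SPACE_THRESHOLD):
    """True when utilization is at or above the threshold ("meets or exceeds")."""
    return utilization_percent(total_bytes, used_bytes) >= threshold


def check_path(path="/Storage", threshold=DEFAULT_LOW_SPACE_THRESHOLD):
    """Check a mounted file system; the default path is an assumption."""
    usage = shutil.disk_usage(path)
    return meets_or_exceeds_threshold(usage.total, usage.used, threshold)
```

A check of this kind could be run on a schedule so that space is added before the threshold is reached, rather than after backups begin to abort.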

What is Affected:
The following versions of PureDisk are affected, on all supported platforms:
- NetBackup PureDisk Remote Office Edition 6.1, 6.2, 6.2.x (Scenario 1 only)
- NetBackup PureDisk Remote Office Edition 6.5, 6.5.0.x


How to Determine if Affected:
Data loss has been known to occur if ALL conditions of either scenario are met.

Scenario 1:  A full Content Router can result in data loss for files smaller than or equal to the segment size. (The default segment size is 128 KB for files and directories.)
-  The PureDisk content router is running one of the versions mentioned above in a supported configuration.
-  The Content Router meets or exceeds the LowSpaceThreshold.
-  The Client attempts to write a small segment of data to the content router (for example, a file that is smaller than the segment size).
-  These smaller segments are sent to the content router, but not committed properly to the database. Subsequent database maintenance operations will groom this data.

Scenario 2:  The client's pdbackup process does not deal properly with the abort message from the Content Router. (Affects 6.5 and 6.5.0.x only.)
-  The PureDisk Client is running one of the versions mentioned above in a supported configuration.
-  The Content Router that the client is writing to meets or exceeds the LowSpaceThreshold.
-  The Client sends data to the content router. The Content Router has met or exceeded its space threshold and sends an "abort" message to the client.
-  The Client's pdbackup process misinterprets the abort message.
-  As a result, there are records of data on the content router that do not actually exist.

If Scenario 2 occurs, all file and folder information sent to the content router is potentially affected.  If the LowSpaceThreshold is likely to be met or exceeded on the content router, it is strongly advised to address this immediately, either by applying 6.5.1 when available or by ensuring that sufficient disk space is available to keep utilization below the threshold.
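The version list and scenarios above can be summarized in a short lookup. This is only a sketch in Python with hypothetical function names. It matches exactly the versions listed in this article, so an empty result means "not listed here", not "confirmed safe".

```python
def parse_version(version):
    """Turn a dotted version string such as "6.5.0.2" into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def affected_scenarios(version):
    """Return the scenario numbers this article lists for a PureDisk version.

    6.1, 6.2, 6.2.x  -> Scenario 1 only
    6.5, 6.5.0.x     -> Scenarios 1 and 2
    anything else    -> empty set (not listed in this article)
    """
    v = parse_version(version)
    if v == (6, 1) or v[:2] == (6, 2):
        return frozenset({1})
    if v == (6, 5) or v[:3] == (6, 5, 0):
        return frozenset({1, 2})
    return frozenset()
```

For example, affected_scenarios("6.2.1") yields Scenario 1 only, while affected_scenarios("6.5.1") yields an empty set because 6.5.1 is the release that contains the fix.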

Symptoms Occurring on Backup:
If the issue described in Scenario 2 occurs, the following message will appear in the /Storage/log/spoold log on the content router:
data store failed: could not spool object

If the LowSpaceThreshold is exceeded on a Content Router, the following message will appear in the /Storage/log/spoold log on the content router:
Could not write data to data store, error: spool directory out of space
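To look for the two messages above in bulk, a simple substring scan is usually sufficient. The following Python sketch is an illustration only. The message texts are taken from this article, while the function names and dictionary keys are hypothetical. The exact name and layout of the log file under /Storage/log/spoold is not specified here, so the path must be supplied by the administrator.

```python
# Message texts as quoted in this article; the dictionary keys are made up.
SPOOLD_MESSAGES = {
    "scenario_2": "data store failed: could not spool object",
    "low_space": "Could not write data to data store, "
                 "error: spool directory out of space",
}


def count_symptoms(lines):
    """Count how many lines contain each known symptom message."""
    counts = {name: 0 for name in SPOOLD_MESSAGES}
    for line in lines:
        for name, text in SPOOLD_MESSAGES.items():
            if text in line:
                counts[name] += 1
    return counts


def scan_log_file(path):
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    with open(path, "r", errors="replace") as handle:
        return count_symptoms(handle)
```

A non-zero count for either message indicates that backups written around that time should be reviewed.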

Symptoms Occurring on Restore:
Files affected by this issue will appear in the user interface for restore, but an error will occur when an attempt is made to restore them.
In the rare case that this issue occurs, a restore from an affected backup will proceed up to the point where the issue occurred and then fail. In the details of the restore, "no such object" or missing segment messages such as the following will be seen:
Failed to restore /tmp/root/tree_10000/_54/12 (at line 904 in input) (no such object)

80508d2efecaf7398ff50241a9b11b1f: get request failed for segment <segment id> (0 out of 2 segments processed (unknown error)
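When the restore details are long, the affected paths can be extracted with a pattern match. The Python sketch below is an illustration with a hypothetical function name. Its pattern is modeled only on the single "Failed to restore" sample line shown above, so other message variants, including the segment message, are ignored.

```python
import re

# Modeled on the sample line in this article; other formats are not matched.
FAILED_RESTORE = re.compile(
    r"^Failed to restore (?P<path>.+) "
    r"\(at line (?P<line>\d+) in input\) "
    r"\((?P<reason>[^()]+)\)$"
)


def failed_restores(lines):
    """Return (path, input_line_number, reason) for each matching line."""
    results = []
    for raw in lines:
        match = FAILED_RESTORE.match(raw.strip())
        if match:
            results.append(
                (match.group("path"),
                 int(match.group("line")),
                 match.group("reason"))
            )
    return results
```

The resulting list of paths identifies which files need to be backed up again once space is available.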


Formal Resolution:
To resolve this issue, apply NetBackup PureDisk Remote Office Edition 6.5.1 as soon as it becomes available.  This release is currently scheduled for Q1 of calendar year 2009.

Please note that the formal resolution will prevent this issue from occurring in future backups, but cannot recover missing data from affected backups. Note also that this release will not prevent disk or network errors from occurring, but will take the additional action of failing the backup job as soon as the above conditions are present.  It is strongly recommended to apply the PureDisk 6.5.1 patch as soon as possible to ensure the issues described in this document are not encountered.


Workaround:
A direct workaround for this issue is not currently available.  However, it is highly recommended to consult the Server Requirement and Capacity Planning sections of the PureDisk Best Practices guide to help prevent the Content Routers from exceeding the configured space usage:
 http://seer.entsupport.symantec.com/docs/303544.htm

If it is believed that the PureDisk configuration may be affected by this issue, the 6.5.1 update should be applied when available.  If 6.5.1 is not yet available or cannot be applied, it is recommended to contact Symantec Enterprise Technical Support, referencing the number of this article (TechNote 313465).


Symantec Strongly Recommends the Following Best Practices:
1. Always perform a full backup prior to and after any changes to your environment.
2. Always make sure that your environment is running the latest version and patch level.
3. Ensure that the content router is allotted sufficient disk space and check system logs regularly to ensure that no disk or system level issues are present.


How to Subscribe to Software Alerts:
If you have not received this TechNote from the Symantec Email Notification Service as a Software Alert, please subscribe at the following link:
 http://maillist.entsupport.symantec.com/subscribe.asp

Supplemental Materials

Source: ETrack
Value: 1442824
Description: ETrack (PureDisk) 1442824: pdbackup does not deal properly with abort message from the CR.

Source: ETrack
Value: 1442788
Description: ETrack (PureDisk) 1442788: Full CR can result in data loss for files with a size less than or equal to the segment size.

Legacy ID
313465

