HotFix: NB_7.0.1_ET2233961 version 8 is a NetBackup 7.0.1 Hotfix for NetBackup Deduplication servers

Article:TECH153190  |  Created: 2011-02-11  |  Updated: 2011-07-01  |  Article URL http://www.symantec.com/docs/TECH153190
Article Type
Technical Solution



Issue



This Technical Article contains information on ET2233961 version 8, which is attached to this document.

New Fixes in EEB8
---------
Fixes listed below are new to 2233961 version 8 and were not included in 2233961 version 7.

* ET2334662 - Dataloss - fingerprint never made it to the tlog, resulting in a backup that is 'status 0' but can not be recovered from.

If a hardware read error occurs while processing internal data files, data loss may occur.
The internal data processing known as queue processing requires reading files from the MSDP storage location.
If hardware errors occur during this processing, the resulting state of the deduplicated storage will be incorrect and lead to data loss.

* ET2354784 - Requesting code change to save <storagepath>\processed tlog contents for recovery when CRQP failed.

When unknown events lead to a detectable failure within queue processing, save the files being processed for later analysis.
This allows Symantec engineering to determine root cause on hardware related failures like ET2334662.

* ET2358251 - MSDP server queue processing appears to have hung.

An internal data processing mechanism known as queue processing could appear to be hung on Windows.
Because of a difference in buffering mechanisms, queue processing on Windows read from the disk
unnecessarily. The resulting slowdown made queue processing appear hung.

* ET2355000 - Client side-dedup backups fail with status 14 " error 2060057: OpenStorage Proxy Plugin Error"

A failure within the client side tar stream handler could cause client side-dedup backups to fail.

* ET2383814 - NBU Image Cleanup fails with status code 174 when the DO FP of the fragment's .info file is marked corrupt by recoverCR execution

After recovery and repair of a storage corruption, if one of the files marked as corrupted is a fragment's .info file, image cleanup
fails on that image.

* ET2377446 - Incremental Windows-FlashBackup backup of VMWare VM failed intermittently with "error 2060022: software error" in fbu_fill_bitmap.

If the VMware files contain a file hole, the backup may intermittently fail.

* ET2400187 - NBU 7.0.1 msdp optdup hangs 99% till 12 hours pass, then fails status 84.

A hard-coded timeout was being applied to optimized duplication jobs.

* ET2392849 - opt-dup bandwidth is not getting honored after opt-dup is cancelled it continues to run while another opt-dup kicks off.

If an optimized duplication job fails unexpectedly, the internal jobs and processes created on the MSDP storage continue to run.
With this fix, the MSDP storage monitors its internal jobs and detects any job that does not have a corresponding NBU job. To activate this behaviour,
create an entry in the storage pool's database storage:
In the directory: /[Storage database]/databases/spa/database/scheduler
Create the new file, with name '4'
It should contain the following contents:
RepOrphan|*/10 * * * *|300|600|
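
The file creation described above can be scripted. The following is a minimal sketch only: the function name is hypothetical, the trailing newline is an assumption, and the directory passed in must be the real scheduler directory of your storage database. The file name '4' and its contents are taken from this article.

```python
import os

# File contents exactly as given in this article.
REP_ORPHAN_ENTRY = "RepOrphan|*/10 * * * *|300|600|"


def write_scheduler_entry(scheduler_dir, name="4", entry=REP_ORPHAN_ENTRY):
    """Create the scheduler file and return its path.

    Mode "x" refuses to overwrite an existing file, so an entry that is
    already present is never clobbered (FileExistsError is raised instead).
    """
    path = os.path.join(scheduler_dir, name)
    # Assumption: a single line terminated by a newline is acceptable.
    with open(path, "x", newline="\n") as handle:
        handle.write(entry + "\n")
    return path
```

Call write_scheduler_entry() with the scheduler directory shown above, after replacing [Storage database] with the actual location on your system.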

* ET2394719 - Windows-FlashBackup backup of VMWare VM failed intermittently with "fbu_scan_buf ERROR bytesNeeded already set" message.

A failure may occur when the tar headers of a backup that contains more than one volume have very specific contents and sizes.

 

 


Environment




The EEB installation instructions are specific to your platform. 

eebinstaller.2233961.8.AMD64.exe       Windows x64 Server or Client Installation
eebinstaller.2233961.8.x86.exe         Windows x86 Server or Client Installation
eebinstaller.2233961.8.solaris         Solaris Sparc Installation
eebinstaller.2233961.8.solaris10       Solaris 10 Sparc Client Installation
eebinstaller.2233961.8.linuxR_x86      x64 RedHat Enterprise Linux servers with a 2.6 Kernel
eebinstaller.2233961.8.linuxR_x86_2.6  RedHat Enterprise Linux clients with a 2.6 Kernel
eebinstaller.2233961.8.linuxS_x86      x64 SuSE Linux Enterprise servers with a 2.6 Kernel
eebinstaller.2233961.8.linuxS_x86_2.6  x64 SuSE Linux Enterprise clients with a 2.6 Kernel


Solution




=============
PREREQUISITES
=============

This EEB can be installed on NetBackup 7.0.1 software.

Make sure to install this EEB on all NetBackup 7.0.1 systems at the same time.

Before you install this EEB, put the deduplication storage server and the
PureDisk disk pool into an administrative down state. After the installation
is complete, you can bring the disk pools back up.

For information about the NetBackup 7.0.1 release, see the following:

http://www.symantec.com/docs/TECH135812

NOTE: If client side deduplication is used, install this EEB as follows:
* On the client
* On the media server deduplication pool (MSDP) server
* On all servers that are deduplication storage pools or can write to a
deduplication storage pool

Version 8 includes fixes that are not included in the NetBackup 7.1.0.1 release. Version 7 and prior versions were included in NetBackup 7.1.0.1. 
Customers who have applied this EEB are encouraged to upgrade to 7.1.0.1 and install the EEB from ET2412409.


============
ENHANCEMENTS
============

The following is an overview of the improvements that are included in this
NetBackup EEB. See the PRODUCT FIXES section of this README file for more
information.

* Performance improvements

- Replication

Optimized deduplication is significantly improved in environments that
include either PDDO storage pools or deduplication storage servers as back
end storage. This enhancement removed overhead from the PureDisk storage
pool authority for PDDO back ends. For deduplication storage server back
ends, the client code and server code were reworked to improve
supportability and reliability.

- Concurrent backup and data removal in PDDO environments.

This EEB enables backup jobs and data removal jobs to run simultaneously.
The concurrent processing can occur when NetBackup 7.0.1 + this EEB is
installed on the NetBackup media server and one of the following PureDisk
levels is installed on the PDDO storage unit:

* PureDisk 6.6.1.2

OR

* PureDisk 6.6.0.3 + EEB 31

In past releases, data removal jobs could not run concurrently with backup
jobs. In some environments, data removal jobs did not run because backup
jobs ran 24 hours per day.

- Backups

Client-side caching has been improved to increase the cache hit rate for
backups. This can increase backup performance, especially in environments
that include clients attached to high-latency networks.

* Other fixes

- Content router queue processing / Tlog processing reliability fixes

Content router queue processing now stops accepting new data after five
failed queue processing attempts. This stops the queue from accumulating
too many entries and becoming unable to manage its backlog. Additional
fixes are included to make queue processing more reliable.


=============
PRODUCT FIXES
=============

General product fixes
---------------------
* ET2270913 - backups fail to duplicate with status 85 "media read error". Could not receive result for SO 8537: unrecoverable crypto error

While segmenting a backup stream, NetBackup sometimes creates segments that are smaller than the internal header metadata.
The backup jobs succeed and the backups are intact. However, when users try to restore these files AND encryption was enabled in the pd.conf file, restore jobs and duplication jobs fail. The lack of an internal header and the small data segment size cause the errors.
With this EEB, the data can be restored or duplicated.

* ET2254913 - bpdm core in strcmp when OST plugin libstspipd.so does compare_hosts with a NULL IP address

This EEB fixes a problem caused by unresolved host names. If multiple nodes wrote optimized duplicated backups to the same
destination, NetBackup might have written core dump files in bpdm if the hostnames of the other hosts did not resolve.
Core dump files were also written if a source node that had been used for optimized duplication was retired; in this case, the host name of the retired node no longer resolved. To work around this problem, you could make sure that all host names resolved.

* ET2281196 - Queue processing stops working, tlogs created w/mismatched filename versus internal tlogid (off by one)

Queue processing stopped because of hardware problems or accidental changes to internal counter files. It was possible to repair
the inconsistency, but the system ran the risk of running too long with inconsistent counter values.
This fix causes tlog processing to stop, shuts down spoold, and alerts the user to the problem. Earlier revisions of
EEB 2281196 prevented spoold from starting on Windows platforms, and this situation has been resolved in this EEB.

* ET2315547 - Missing GetRemoteSPAType fails opt-dupe from MSDP to PDDO if remote dataselection id is not 2

During an optimized duplication job, NetBackup incorrectly assumed that the remote data selection ID was 2. This assumption caused duplication
jobs to end without finding the duplicated image, and the jobs failed. With this fix, optimized duplication jobs now
correctly determine the remote data selection ID and correctly locate the duplicated images.

* ET2322081 - nbrmms crashes

This problem occurred when NetBackup could not connect to a deduplication storage server or a PureDisk storage server.
In these environments, nbrmms crashed when NetBackup attempted to communicate with the storage server. It was possible to work around this problem if you removed the host name configuration file in the ost-plugins directory.
The nbrmms crashes could also occur when the storage server was up, but no file descriptors were available to nbrmms; this situation was more likely to occur on UNIX and Linux platforms. There is no workaround for this problem other than applying this EEB.

* ET2161303 - cleanup pdvfs temp files when trying to register

Fixes a problem that occurred with an incorrect or incomplete deduplication
server configuration. The deduplication servers were prone to filling up the
/var/tmp/ file system with directories that started with "pdvfs"
(e.g.: pdvfsJAAT3aGwu) containing Agent.cert and Agent.key.
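
To check whether a server has accumulated such directories, they can be listed with a short script. This is a read-only sketch under the assumption that the leftovers match the "pdvfs" prefix described above; the function name is hypothetical and nothing is deleted.

```python
import glob
import os


def find_pdvfs_temp_dirs(tmp_root="/var/tmp"):
    """Return the directories under tmp_root whose names start with 'pdvfs'."""
    pattern = os.path.join(tmp_root, "pdvfs*")
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))
```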

* ET2149179 - set metabase case insensitive in mount.c

Fixes the problem that occurred when the metabase encountered both an
upper-case backup ID (ex: NBUSUPW29_1271661651) and a lower-case backup ID
(ex: nbusupw29_1271661651). The bpverify command for either backup ID failed
with a status 191 error. The same error could occur at the end of a
duplication job.
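
The idea of a case-insensitive lookup can be illustrated as follows. This is only a sketch of the general technique, not the code in mount.c; the class and its methods are hypothetical.

```python
class CaseInsensitiveIndex:
    """Maps backup IDs to records, ignoring the case of the ID."""

    def __init__(self):
        self._entries = {}

    def add(self, backup_id, record):
        # Normalise the key so that both spellings share one entry.
        self._entries[backup_id.lower()] = record

    def get(self, backup_id):
        return self._entries.get(backup_id.lower())


index = CaseInsensitiveIndex()
index.add("NBUSUPW29_1271661651", "image record")
# Both spellings from the example above resolve to the same record.
same = index.get("nbusupw29_1271661651") == index.get("NBUSUPW29_1271661651")
```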

* ET2217383 - Error in plugin log: pd_get_event PDVFS_IOCTL_GET_EVENT failed
(22 Invalid argument)

The PDDO plug-in called GET_EVENT when it connected to a PureDisk PDDO server
and generated the following non-fatal errors:
12/07/10 18:38:25 [6396] [6396] [DEBUG] PDVFS: [6] pdvfs_lib_log: Start GetEvent
12/07/10 18:38:25 [6396] [6396] [DEBUG] PDVFS: [6] pdvfs_lib_log: WSRequestExt:submitting
12/07/10 18:38:25 [6396] [6396] [DEBUG] PDVFS: [6] pdvfs_lib_log: Connecting to webservice <address> using SSL.
12/07/10 18:38:25 [6396] [6396] [DEBUG] PDVFS: [6] pdvfs_lib_log: First CURL call (res: 0, default r_timeout: 120)
12/07/10 18:38:25 [6396] [6396] [ERROR] PDVFS: [1] pdvfs_lib_log: Webservice return format was invalid.

* ET2252878 - CRC mismatch in sorted intermediate file

Fixed problems related to offset overflows. Variable type off_t is only
32 bits in Windows, but it is 64 bits in Linux/UNIX. If a sorted
intermediate file or sorted tlog file is larger than 4GB under Windows,
offset overflow can happen, incurring read and write errors and even
potential data loss. This fix changes off_t to off64_t and chooses the
corresponding 64-bit fseek() and ftell() for Windows.

Log snippet:
January 04 05:16:35 ERR [000000000721F990]: 25042: CRC mismatch for spool entry header at offset 8
January 04 05:16:35 ERR [000000000721F990]: 25042: Could not read spool entry: Illegal byte sequence
January 04 05:16:35 ERR [000000000721F990]: 25042: Could not read object.
January 04 05:16:35 ERR [000000000721F990]: 25000: Could not merge-sort tlog files 1138691-1177613:no error
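
The effect of a 32-bit offset on a file larger than 4GB can be shown with plain integer arithmetic. The sketch below simulates the wraparound in Python; the helper name is hypothetical and the code does not reproduce the product's file handling.

```python
def wrap_signed(value, bits):
    """Return the value a signed integer of the given width would hold."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


FOUR_GIB = 4 * 1024 ** 3
offset = FOUR_GIB + 4096  # a position just past the 4GB mark

# A 32-bit offset wraps around to a position near the start of the file.
wrapped_32 = wrap_signed(offset, 32)
# A 64-bit offset holds the real position.
wrapped_64 = wrap_signed(offset, 64)
```

With a 32-bit type, the position wraps to 4096, so reads and writes land near the beginning of the file. With a 64-bit type, the position is preserved.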

 

PDDO duplication fixes
----------------------

* ET2175705 Network Address Translation breaks optimized duplications

Problems occurred when a source deduplication storage server could not connect
to the given IP address of the destination deduplication storage server.
NetBackup generated the following log messages when this problem occurred:
Log snippet:
11:58:22.320 [1180.3748] <16> <host name>: PDVFS: [1] PdvfsReplicate: CAreplicateFiles pdde://<address> failed: 15 (connection timed out)
11:58:22.320 [1180.3748] <2> <host name>: PDVFS: [4] PdvfsReplicate: exit res=-1, errno=Unknown error
11:58:22.320 [1180.3748] <16> <host name>: impl_copy_extent PdvfsReplicate failed:Unknown error
11:58:22.320 [1180.3748] <2> <host name>: impl_copy_extent exit 2060014 operation aborted
11:58:22.320 [1180.3748] <2> <host name>: pi_copy_extent_v9 exit (2060014:operation aborted)
11:58:22.320 [1180.3748] <2> set_job_details: Tfile (42): LOG 1277809102 32 bpdm 1180 sts_copy_extent failed: error 2060014 operation aborted

The fix for this problem uses the server host name rather than the IP address
in image.c and mount.c (ET2078748).

* ET2073172: Replication library retries always with an empty file list

Fixed a problem that occurred when replication failed inside its libraries.
The first retry used an empty file list, which created a storage leak and
data loss.

PDDO backups and restores
-------------------------

* ET1884722 - Backups to Media Server Dedup failing with status 84 when
attempting to write first fragment header for VMWare backup

The fix for this ET addresses the following problems encountered during
Windows flashbackups:
o Backing up large data objects
o Backing up embedded files
o Backups for total partition sizes that are a multiple of 40GB
The preceding conditions caused NetBackup to fail and generate a status 84 error.

* ET1969210 - Synthetic failing with "end point terminated with an error(610)"

The fix for this ET addresses the problem encountered when optimized
synthetic full backup jobs failed and NetBackup generated the "end point
terminated with an error(610)" message. This EEB includes a fix for this
problem that creates a map entry in the cache and does not free the map entry
until after use of the cache is no longer needed. This change enables
optimized synthetic full backup jobs to succeed.

* ET2158839 - Fixes for Optimized Synthetics error 610 and VMDK dedupe ratio

When the map entry cache changes for the Exchange Granular Restore improvements
were ported, the optimized synthetics path was missed. This EEB extends the map
entry cache code to handle optimized synthetics correctly.

* ET2069007 - Hyper-V mapped VM backup fails if configured PDDE with Client
direct option
ET2056505 - [SCALE] VMWare Mapped VM backup of linux VM fails if configured
PDDE with Client Direct option
ET2052392 - Mapped Linux backup of Hyper-V is failing with the client Direct
option

Failures were detected in VM/Hyper-V backups with the Client Direct option
enabled. Porting vmdk improvement work from PureDisk 6.6.1 fixed all
three reported problems.

* ET2163275 - Fix to load PDDO fingerprint cache from all fragments rather than
just first one

The cache hit rate remained at 0%, which slowed backups because the client had to
contact the media server to check whether each fingerprint existed
(> 350,000 segments) over a high-latency (>200ms), low-bandwidth (2Mbps) link.
The client appeared to use only the first fragment as its fingerprint cache. The
fix uses all fragments from the last full backup as the cache. The backups now
run with >85% cache hits and >90% deduplication rates.

* ET2128134 - MATCH_PDRO=0 write fix. Deduplication rate logging per file
enhancement.

When MATCH_PDRO=0 is configured in the pd.conf file, inconsistent
deduplication occurs. The correction in this EEB resolves several issues
with regard to this setting and corrects the behavior to be consistent with
previous versions of PureDisk.

* ET2116365 - Improvements to deduplication ratios for large file (PST) backups

Fixed an inconsistent deduplication rate problem that was caused by
misalignment on file boundaries.

* ET2219988 - not to fetch DO if FP is MD5_CRC32_EMPTY_STRING in
pdvfs_cr_cache_do_fp()

Fixed a PDDO job failure problem caused when one of the data objects was the
fingerprint f1450306517624a57eafbbf81266a67a for an empty file. NetBackup
generated the following log messages when this problem occurred:

Log snippet:
15:39:23.950[5644.2968][INFO][dummy][70750:bptm:10236:<servername>]PDVFS: [3] pdvfs_sess_cb: Established session with Content Router at <ip_addr> (Version 6.6.0.35883, using protocol version 6.6)
15:39:24.153[5644.2968][ERROR][dummy][70750:bptm:10236:<servername>]PDVFS: [1] pdvfs_cr_cache_do_fp: CRFileStatEx failed: n
15:39:24.153[5644.2968][ERROR][dummy][70750:bptm:10236:<servername>]setup_pdvfs_image_cache error (-1 9 Bad file descriptor) PDVFS_IOCTL_ADD_DO_FP_CACHE /srvwin0383#1/2/ntsrvcsp01/CSP-ntsrvcsp01/ntsrvcsp01_1290455361_C1_F27.img
15:39:24.153[5644.2968][DEBUG][dummy][70750:bptm:10236:<servername>]setup_pdvfs_image_cache cache enabled for ntsrvcsp01_1290455361_C1_F
15:39:24.153[5644.2968][DEBUG][dummy][70750:bptm:10236:<servername>]impl_image_handle exit (2060022:software error)
15:39:24.153[5644.2968][DEBUG][dummy][70750:bptm:10236:<servername>]impl_create_image exit (2060022:software error)
15:39:24.153[5644.2968][DEBUG][dummy][70750:bptm:10236:<servername>]pi_create_image_v9 exit (2060022:software error)
15:39:24.153[5644.2968][ERROR][dummy][70750:bptm:10236:<servername>]cp_create_image_v7 10/12/10 15:39:24: fail to create image <image>_1292013534_C1_F1 of lsu PureDiskVolume with plugin 1, return value: 2060022

* ET2196681 Requesting EEB for "PDDO configurable web service call retries"

Fixed a problem that caused optimized duplication jobs to fail and generate a
status 84 message. The underlying issue was a web service call that failed
during the duplication and caused the job to fail in NetBackup. Retrying the
web service call would have prevented the failure. NetBackup
generated the following log messages when this problem occurred:

Log snippet:
11/12/10 09:09:45 [3200] [4] PDVFS: [1] pdvfs_lib_log: Webservice operation failure (opcode 7, couldn't connect to host)
11/12/10 09:09:45 [3200] [4] PDVFS: [1] pdvfs_get_job_state: _getURL failed: 53
11/12/10 09:09:45 [3200] [4] check_pdvfs_job PDVFS_IOCTL_GET_JOB_STATE failed fd=3 (10061 Unknown error)
11/12/10 09:09:45 [3200] [4] impl_copy_extent check_pdvfs_job failed: error occurred on network socket
11/12/10 09:09:45 [3200] [3] impl_copy_extent exit 2060019 error occurred on network socket
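
A configurable retry around a call that can fail transiently might look like the sketch below. The function name, the retry count, the delay, and the exception type are all assumptions chosen for illustration; they are not the settings or the code used by the PDDO plug-in.

```python
import time


def call_with_retries(operation, retries=3, delay_seconds=0.0,
                      retry_on=(ConnectionError,)):
    """Call operation(), retrying up to `retries` times on the given errors."""
    attempts = retries + 1  # the first call plus the configured retries
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: let the caller see the failure
            time.sleep(delay_seconds)
```

A call that fails twice and then succeeds returns normally when retries is 2 or more. Without retries, the first failure would end the job.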

* ET2218137: MSDP backups failing with status 84s - pdvfs_sync_po_list: Error
pdvfs_send_po_list failed 4

Fixed problems related to backup job failures for which NetBackup generated
messages similar to the following:

Log snippet of example backup failure:
01:10:16.510 [6572.6012] <2> bp_sts_close_image: bytesNeeded=0 bytesStillNeeded=0
01:17:10.596 [6572.6012] <16> nbm04net: PDVFS: [1] pdvfs_send_po_list:MBPOAddList failed: 4
01:17:10.596 [6572.6012] <16> nbm04net: PDVFS: [1] pdvfs_sync_po_list: Error pdvfs_send_po_list failed 4
01:17:10.596 [6572.6012] <16> nbm04net: PDVFS: [1] pdvfs_set_mb_import: PO sync failed: No such file or directory
01:17:10.596 [6572.6012] <16> nbm04net: sync_pdvfs sync using /<host>/.sync failed, expected COMPLETED, found (FAILED)

* ET2218085: Error 83s on NBU backup jobs while PureDisk is re-routing

At the beginning of a NetBackup backup job, the list of fingerprints from
the previous backup is retrieved and used as a local fingerprint cache. If
that list is scheduled to be rerouted but has not yet been rerouted, it is
not found and the NetBackup backup job fails.
With this fix, NetBackup makes a second attempt to get the list from the _old_
content router.
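
The second attempt can be pictured as a simple fallback. In this sketch the two lookup callables are hypothetical stand-ins for queries to the new and the old content router, and the order of the attempts is an assumption based on the description above.

```python
def fetch_fingerprint_list(backup_id, new_router, old_router):
    """Return the fingerprint list from the first router that has it.

    new_router and old_router are callables that return a list, or None
    when the list is not found on that content router.
    """
    for lookup in (new_router, old_router):
        fingerprints = lookup(backup_id)
        if fingerprints is not None:
            return fingerprints
    return None
```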


* ET2227064 : Restore is incomplete when we restore the synthetic backup image.
EXIT STATUS 92: media manager detected image that was not in tar format.

Fixed a problem that occurred when NetBackup attempted to restore a
multifragment optimized synthetic image after applying an EEB.

Content router queue processing
-------------------------------

* ET2215448 - Tlog file need error checking, and backup copy for recovery
in case of tlog corruption.

Fixed this problem by implementing the following changes:
1. Read the tlog file again right after it was closed.
2. Created a copy of the tlog file in the processed directory.
3. Deleted the copy after queue processing completed successfully.

With this EEB, tlog entries are stored twice: one copy under "<Storage>/queue/"
and one under "<Storage>/processed/". This practice increases redundancy and
reliability. If a tlog error occurs, the copy in "<Storage>/processed/" can be
used to recover.
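
The three changes above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the SHA-256 comparison is an assumed way of checking the file after it is read again; the article does not say how the product verifies it.

```python
import hashlib
import os
import shutil


def file_digest(path):
    """Return the SHA-256 digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def protect_closed_tlog(tlog_path, expected_digest, processed_dir):
    """Verify a freshly closed tlog and keep a copy of it for recovery."""
    # 1. Read the tlog file again right after it was closed.
    if file_digest(tlog_path) != expected_digest:
        raise IOError("tlog differs from what was written: " + tlog_path)
    # 2. Create a copy of the tlog file in the processed directory.
    os.makedirs(processed_dir, exist_ok=True)
    return shutil.copy2(tlog_path, processed_dir)


def release_copy(copy_path):
    """3. Delete the copy after queue processing completed successfully."""
    os.remove(copy_path)
```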

* ET2222209: tlog processing can retry for days to accumulate a huge tlog
backlog

An error in the content router queue processing could cause queue processing
to run for several days and could cause the queue to accumulate a large
number of unprocessed tlogs. When the real issue surfaced, the large number
of tlogs was very hard to manage. Content router queue processing now stops
after five failed attempts.
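
The stop-after-five behaviour can be sketched as a small failure counter. The class is hypothetical, and resetting the counter after a successful run is an assumption; the article states only that processing stops after five failed attempts.

```python
class QueueProcessor:
    """Stops for good once max_failures attempts in a row have failed."""

    def __init__(self, process_once, max_failures=5):
        self._process_once = process_once
        self._max_failures = max_failures
        self.failures = 0
        self.stopped = False

    def run_once(self):
        """Run one attempt; return True on success, False otherwise."""
        if self.stopped:
            return False  # no further attempts, so the backlog stays bounded
        try:
            self._process_once()
        except Exception:
            self.failures += 1
            if self.failures >= self._max_failures:
                self.stopped = True
            return False
        self.failures = 0  # assumption: a success resets the count
        return True
```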

* ET2222453 Tlog delay file is not sync to disk, fsync or _commit was not
called.

Fixed a problem caused when the tlog delayed file was not synchronized to
disk. This problem could cause the delay file to become corrupted in the case
of a power outage.
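
The general pattern for forcing a file to disk is to flush the application buffer and then ask the operating system to commit the data. The sketch below shows this in Python, whose os.fsync() calls fsync() on UNIX and _commit() on Windows; the function name is hypothetical and this is not the product's code.

```python
import os


def write_durably(path, data):
    """Write bytes to path and force them to disk before returning."""
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to commit to the device
```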

* ET2248351 CRC mismatch in sorted tlog file

Fixed a problem caused when a variable named offset in the tlog sorting code
overflowed and the sorted tlog file was written from the beginning again.
The variable is of type off_t, which is 32 bits on Windows, so it overflowed
when the tlog file size was larger than 4GB.

* ET2248352 TOTALLY shut down spoold after CRQP failed for five times

Fixed a problem that occurred after content router queue processing had been
shut down. For this problem, there were still several new tlog files
created. The original code only stopped new backup jobs while other
services, such as restore and image expiration, continued. To minimize data
loss, this fix stops all services immediately so customers can call Symantec
technical support.

* ET2158198 spoold will not start stating cannot move compactd.log (port from
7.0 ET2073963)

Fixed a problem in which spoold was unable to start due to the following error:

June 11 16:56:29 INFO [00000000012F9180]: _storeInit: container headers information is read from data store header file
June 11 16:56:29 ERR [00000000012F9180]: 25001: _storeRollback: could not move F:\NBU_DD_Store\data/journal/compactd.log to F:\NBU_DD_Store\data/journal/compactd.spare (The file exists. )
June 11 16:56:29 ERR [00000000012F9180]: 25001: _storeDCIDChangeRecover: failed to rollback cross-container compaction (object already exists)
June 11 16:56:31 ERR [00000000012F9180]: 25001: _storeRollback: could not move F:\NBU_DD_Store\data/journal/compactd.log to F:\NBU_DD_Store\data/journal/compactd.spare (The file exists. )

* ET2125388: Content Router queue processing 'hangs'

Added the following fixes to reduce the number of hung content router queue
processing jobs:
- Progress while sorting tlog files
- More regular logging while processing the sorted.tlog
- Regular logging when spending a long time in AddRef/DelRef loops
- Logging one line when CREATE INDEX on objects2 starts
- Additional and regular logging while processing the delayed file

==============
KNOWN PROBLEMS
==============

None.


========================
VULNERABILITIES RESOLVED
========================

None.

=========================
INSTALLATION INSTRUCTIONS
=========================

Use the Symantec EEB installer to install this EEB. The instructions for the
installer are at the following URL:

http://www.symantec.com/docs/TECH64620

The following additional instructions apply to this EEB:

* NOTE: Make sure to install this EEB on all NetBackup 7.0.1 systems
at the same time.

* Stop all NetBackup processes.

* On all clients, ensure that nbostproxy is stopped. Review the
systems process list to ensure this.

The EEB installation instructions are specific to your platform. Please choose
the appropriate platform after download:

eebinstaller.2233961.8.AMD64.exe       Windows x64 Server or Client Installation
eebinstaller.2233961.8.x86.exe         Windows x86 Server or Client Installation
eebinstaller.2233961.8.solaris         Solaris Sparc Installation
eebinstaller.2233961.8.solaris10       Solaris 10 Sparc Client Installation
eebinstaller.2233961.8.linuxR_x86      x64 RedHat Enterprise Linux servers with a 2.6 Kernel
eebinstaller.2233961.8.linuxR_x86_2.6  RedHat Enterprise Linux clients with a 2.6 Kernel
eebinstaller.2233961.8.linuxS_x86      x64 SuSE Linux Enterprise servers with a 2.6 Kernel
eebinstaller.2233961.8.linuxS_x86_2.6  x64 SuSE Linux Enterprise clients with a 2.6 Kernel


Attachments

HotFix: NB_7.0.1_ET2233961_8 is a NetBackup 7.0.1 Hotfix for NetBackup Deduplication servers
NB_7.0.1_ET2233961_8.zip (53.3 MBytes)
