PureDisk server upgraded to 6.6.1.2 EEB04 and now pdcr is crashing

Article:TECH158545  |  Created: 2011-04-21  |  Updated: 2012-07-28  |  Article URL http://www.symantec.com/docs/TECH158545
Article Type
Technical Solution



Issue



The Content Router service, spoold (pdcr), keeps crashing and then has to be started by hand. NetBackup SLP duplications to this PureDisk SPA report status 213 errors whenever pdcr dies on the PureDisk server.


Error



Storaged log errors:

April 18 12:00:04 WARNING [1079023936]: -1: /Storage/queue/sorted-3274781-3330979.tlog.tmp is not a valid file name (3), skipping it.

April 18 12:03:47 ERR [1079023936]: 25015: Could not establish a connection to xx.xxx.xxx.xxx:10082: connect failed (Connection timed out)

April 18 12:03:47 WARNING [1079023936]: 25015: Could not enqueue a remote operation to add the reference from DO 022f3d3bd5e8ec6d68d07a76b9b52527 to SO be80c0be9dcb4f39f446685dafd4ceb4: connection timed out.

April 18 12:03:47 ERR [1079023936]: 25015: Could not schedule all SODORefAdds for DO with fingerprint 022f3d3bd5e8ec6d68d07a76b9b52527(connection timed out)

April 18 12:03:47 ERR [1079023936]: 25015: Could not process spool entry 8538: connection timed out

April 18 12:04:11 WARNING [1079023936]: 25000: Transaction log /Storage/queue/sorted-3804728-3804743.tlog failed: Could not process tlog entries: connection timed out

April 18 12:04:21 ERR [1079023936]: -1: The queue processing delay of 5.00 days is at or above the warning level of 3 days. This means that the queue of the CR has grown too large and the CR may become full. Please contact support.

April 18 12:09:32 WARNING [1079023936]: -1: /Storage/queue/sorted-3274781-3330979.tlog.tmp is not a valid file name (3), skipping it.

syslog errors:

Apr 18 12:05:01 eawokspa01 /usr/sbin/cron[30629]: (root) CMD (/opt/pdag/bin/php /opt/pdspa/cli/CheckConnectivity.php > /dev/null 2>&1)

Apr 18 12:05:01 eawokspa01 /usr/sbin/cron[30631]: (root) CMD (/opt/pdag/bin/php /opt/pdspa/workflowengine/updateDSInPolicies.php >>/Storage/log/cron.updateDSInPolicies.log 2>&1)

Apr 18 12:05:09 eawokspa01 slapd[15175]: connection_read(16): no connection!

Apr 18 12:05:59 eawokspa01 vxatd: 52522,195,3,15226,1082132800,(null),Auditing,0 1|root|eawokspa01.corp.pbwan.net|10.143.101.133|1|0|AT_AUTH|::ffff:10.143.101.133|root|10|5|ldap|PRPL=root|Domain=PureDisk_Internal|Broker=eawokspa01.corp.pbwan.net:2821| Account root failed to logon.
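
To gauge how often the connection failures above occur, the storaged log can be summarized per remote endpoint. The following is a minimal sketch, not a Symantec-provided tool: `count_cr_timeouts` is a hypothetical helper name, and the log file path must be supplied by the caller because this article does not state where the storaged log lives.

```shell
# Hypothetical helper: count "connect failed" lines per remote HOST:PORT
# in a storaged log. The log path is an assumption supplied by the caller.
count_cr_timeouts() {
    logfile="$1"
    # Match lines of the form:
    #   ... Could not establish a connection to HOST:PORT: connect failed (...)
    grep 'Could not establish a connection to' "$logfile" \
        | sed 's/.*connection to \([^ ]*\): connect failed.*/\1/' \
        | sort | uniq -c
}

# Example (the path is a placeholder, not a documented location):
#   count_cr_timeouts /path/to/storaged.log
```

A count that keeps growing for the same address and port suggests that one node is persistently unreachable rather than briefly overloaded.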


Environment



NetBackup 7.01 master server (rocksolid, 2233961, is not seen on it; other EEBs are installed instead).

There are 4 NBU 7.01 media servers.

PureDisk SPA eagdlspa01 (services=spa,nbu,mbs,mbe,cr)

Second CR called eawokpdr01 (services=cr,nbu)

PureDisk 6.6.1.2 with NB_PDE_6.6.1.2_EEB04-rollup1

This STU is being used as the duplication destination in the STU called SLPST_Woking_PD_Pool.


Cause



The problem started after the upgrade to PureDisk 6.6.1.2 with NB_PDE_6.6.1.2_EEB04-rollup1. The iptables firewall rules appear to have taken on incorrect values, blocking communication between the SPA and the second node (services=cr,nbu).
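
One way to confirm that the content router port is blocked is to attempt a TCP connection from the SPA to port 10082, the port named in the storaged log. This is a sketch under stated assumptions: `check_cr_port` is a hypothetical helper, and it relies on bash (for `/dev/tcp`) and the coreutils `timeout` command, which may not be present on every PureDisk node.

```shell
# Hypothetical helper: succeed only if a TCP connection to HOST:PORT opens
# within 5 seconds. Assumes bash (for /dev/tcp) and coreutils "timeout".
check_cr_port() {
    host="$1"
    port="$2"
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (CR_ADDRESS is a placeholder for the masked address in the log):
#   check_cr_port CR_ADDRESS 10082 && echo "reachable" || echo "blocked or down"
```

An attempt that hangs until the timeout, rather than being refused immediately, is consistent with the "Connection timed out" entries in the log and usually points to packets being dropped by a firewall.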


Solution



What are the steps to turn iptables back on while making sure the proper rules are in place?

The firewall rules may not have loaded correctly. Check the following file on the SPA and on the other nodes, and compare the copies for differences.

# more /etc/puredisk/custom_iptables_rules
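
Comparing the copies by eye is error-prone, so the check can be sketched with `diff`. `compare_rules` is a hypothetical helper, and the example assumes root ssh access from the SPA to the second node eawokpdr01, which this article does not confirm.

```shell
# Hypothetical helper: report whether two copies of the rules file match.
# Returns 0 when identical; returns 1 when they differ or cannot be read
# (any diff output is printed before the verdict).
compare_rules() {
    if diff "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Example, run on the SPA (ssh access as root is an assumption):
#   ssh root@eawokpdr01 cat /etc/puredisk/custom_iptables_rules > /tmp/cr_rules
#   compare_rules /etc/puredisk/custom_iptables_rules /tmp/cr_rules
```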

If the file is the same on all nodes, restart iptables on the SPA and check whether the correct rules are now shown.

If the file on the SPA differs from the other nodes, that may be why the SPA loaded different rules.
Running the following script on the SPA should also set the firewall rules to the correct values.
# /etc/init.d/pdiptables restore

Then start iptables again and confirm that the correct rules are in use.
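
To see whether the loaded rules mention the content router port at all, the iptables listing can be filtered. This is only a rough sketch: `rules_mention_port` is a hypothetical helper that reads a rule listing on standard input, and finding a match shows that a rule for the port exists, not that it accepts traffic from the right source.

```shell
# Hypothetical helper: print rule lines from stdin that reference the given
# destination port, in either "iptables -L -n" (dpt:PORT) or
# "iptables-save" (--dport PORT) format. Returns non-zero if none match.
rules_mention_port() {
    port="$1"
    grep -E -- "(dpt:|--dport )${port}([^0-9]|\$)"
}

# Example, run as root on the SPA:
#   iptables -L -n | rules_mention_port 10082
```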

Verify that pdcr no longer crashes.


