PureDisk server upgraded to 220.127.116.11 EEB04 and now pdcr is crashing
|Article:TECH158545|||||Created: 2011-04-21|||||Updated: 2012-07-28|||||Article URL http://www.symantec.com/docs/TECH158545|
The Content Router service, spoold (pdcr), keeps crashing and then has to be started by hand. NetBackup SLP duplications to this PureDisk SPA report 213 errors when pdcr dies on the PureDisk server.
Storaged log errors:
April 18 12:00:04 WARNING : -1: /Storage/queue/sorted-3274781-3330979.tlog.tmp is not a valid file name (3), skipping it.
April 18 12:03:47 ERR : 25015: Could not establish a connection to xx.xxx.xxx.xxx:10082: connect failed (Connection timed out)
April 18 12:03:47 WARNING : 25015: Could not enqueue a remote operation to add the reference from DO 022f3d3bd5e8ec6d68d07a76b9b52527 to SO be80c0be9dcb4f39f446685dafd4ceb4: connection timed out.
April 18 12:03:47 ERR : 25015: Could not schedule all SODORefAdds for DO with fingerprint 022f3d3bd5e8ec6d68d07a76b9b52527(connection timed out)
April 18 12:03:47 ERR : 25015: Could not process spool entry 8538: connection timed out
April 18 12:04:11 WARNING : 25000: Transaction log /Storage/queue/sorted-3804728-3804743.tlog failed: Could not process tlog entries: connection timed out
April 18 12:04:21 ERR : -1: The queue processing delay of 5.00 days is at or above the warning level of 3 days. This means that the queue of the CR has grown too large and the CR may become full. Please contact support.
April 18 12:09:32 WARNING : -1: /Storage/queue/sorted-3274781-3330979.tlog.tmp is not a valid file name (3), skipping it.
syslog errors:
Apr 18 12:05:01 eawokspa01 /usr/sbin/cron: (root) CMD (/opt/pdag/bin/php /opt/pdspa/cli/CheckConnectivity.php > /dev/null 2>&1)
Apr 18 12:05:01 eawokspa01 /usr/sbin/cron: (root) CMD (/opt/pdag/bin/php /opt/pdspa/workflowengine/updateDSInPolicies.php >>/Storage/log/cron.updateDSInPolicies.log 2>&1)
Apr 18 12:05:09 eawokspa01 slapd: connection_read(16): no connection!
Apr 18 12:05:59 eawokspa01 vxatd: 52522,195,3,15226,1082132800,(null),Auditing,0 1|root|eawokspa01.corp.pbwan.net|10.143.101.133|1|0|AT_AUTH|::ffff:10.143.101.133|root|10|5|ldap|PRPL=root|Domain=PureDisk_Internal|Broker=eawokspa01.corp.pbwan.net:2821| Account root failed to logon.
NetBackup 7.01 master server (rocksolid, 2233961, is not listed here; other EEBs are installed instead).
There are 4 NBU 7.01 media servers.
PureDisk SPA eagdlspa01 (services=spa,nbu,mbs,mbe,cr)
Second CR called eawokpdr01 (services=cr,nbu)
PureDisk 18.104.22.168 with NB_PDE_22.214.171.124_EEB04-rollup1
This STU is being used as the duplication destination in the STU called SLPST_Woking_PD_Pool.
The problem started after upgrading to PureDisk 126.96.36.199 with NB_PDE_188.8.131.52_EEB04-rollup1. The iptables rules appear to have taken on incorrect values, blocking communication between the SPA and the secondary NBU/CR node.
What are the steps to turn iptables back on while making sure the proper rules are in place?
We suspect something may have happened with the loading of the firewall rules. You can check this file on the SPA and the other nodes to see if there are any differences.
# more /etc/puredisk/custom_iptables_rules
If the contents are the same on all nodes, restart iptables on the SPA and check whether the correct rules are shown.
If that file is somehow different on the SPA, then that may be why it had different rules.
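The file comparison described above can be sketched as a small shell helper. This is only an illustration: it assumes copies of /etc/puredisk/custom_iptables_rules have already been collected from each node into local files, and the function name compare_rules and the local file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare copies of /etc/puredisk/custom_iptables_rules from several nodes.
# Assumption: each node's file was already copied to a local file (for example with scp).

# compare_rules REFERENCE FILE...
# Prints "same: FILE" or "different: FILE" for each file compared byte-for-byte
# against the reference copy (typically the one taken from the SPA).
compare_rules() {
    ref="$1"
    shift
    for f in "$@"; do
        if cmp -s "$ref" "$f"; then
            echo "same: $f"
        else
            echo "different: $f"
        fi
    done
}
```

For example, after copying the file from the SPA to ./rules.spa and from the second CR node to ./rules.eawokpdr01, running `compare_rules ./rules.spa ./rules.eawokpdr01` reports whether the two copies match. For any file reported as different, `diff` on the same two files shows the lines that differ.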
Also, running this script on the SPA should set the firewall rules to the correct values.
# /etc/init.d/pdiptables restore
Then start iptables again to see if the correct values are being used.
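One way to check the loaded rules is to look for an ACCEPT rule covering the port that timed out in the storaged log (10082 in the errors above). The helper below is a sketch under assumptions: it expects the plain `iptables -L -n` listing format on standard input, it only matches single-port `dpt:` rules (not port ranges or multiport rules), and the function name has_accept_for_port is hypothetical. Which chain holds the PureDisk rules is not stated in this article, so INPUT in the usage example is also an assumption.

```shell
#!/bin/sh
# Sketch: inspect `iptables -L -n` output for an ACCEPT rule on a given TCP port.
# Assumption: rules appear one per line in the usual listing format, e.g.
#   ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:10082

# has_accept_for_port PORT
# Reads an iptables listing on stdin; returns 0 if a line starting with ACCEPT
# contains "dpt:PORT" as a whole token, and non-zero otherwise.
has_accept_for_port() {
    grep -Eq "^ACCEPT[[:space:]].*dpt:$1([[:space:]]|\$)"
}
```

On the SPA this could be used as `iptables -L INPUT -n | has_accept_for_port 10082 && echo "port 10082 is accepted"`. A missing match does not prove the port is blocked, since the rule may live in another chain or be written as a range, so treat the result as a hint and review the full listing.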
Verify pdcr no longer crashes.