Video Screencast Help

NBU backup failed! log:BPCRConnection::handleMsg] (0x2aaaab07a340) had an error, terminating bpcd connection(../BPCRConnFactory.cpp:651)

Created: 08 Oct 2013 • Updated: 09 Oct 2013 | 3 comments

NBU NDMP backup for symantec FileStor:

NDMP Topology Simple NDMP Configuration.
NBU V7.0:SuSE10 SP2 X64 
 
Just happen once,this issue never occur again since.Can you help me figure out what's the root cause?
 
Manual start backup job,keep mounting tape for hours,NDMP server on NAS got none message.
 
Total NBU log about 80MB,I found these may be the key logs:
 
09/27/13 17:39:33.889 [BPCRConnection::handleMsg] (0x2aaaab07a340) had an error, terminating bpcd connection(../BPCRConnFactory.cpp:651)
09/27/13 17:39:33.889 [BRMComm::mpxGroupUnjoinable] (0x2aaaab133d00) mpx group is already unjoinable, mpxid={4A87D4CC-2758-11E3-A717-52876F0F081B}(../BRMComm.cpp:624)
09/27/13 17:39:33.889 [BPCRConnFactory::returnConnection] connection returned was found on the active list  BPCRConnectionRequest(pool=0, connectionName=SuSE10NBU::BRM, requester=0x2aaaab133d00, connection= BPCRConnection(state=3, hadError=1, activeFDcnt=2, connectName=SuSE10NBU::BRM, hostName=SuSE10NBU, userName=))(../BPCRConnFactory.cpp:1019)
09/27/13 17:39:33.889 [BPCRConnection::unlinkFromRequester] marking connection not reusable because parser is not reusable  BPCRConnection(state=3, hadError=1, activeFDcnt=2, connectName=SuSE10NBU::BRM, hostName=SuSE10NBU, userName=)(../BPCRConnFactory.cpp:305)
09/27/13 17:39:33.889 [BPCRConnectionRequest::unlinkFromRequester()] connection not reusable, will close  BPCRConnection(state=4, hadError=1, activeFDcnt=2, connectName=SuSE10NBU::BRM, hostName=SuSE10NBU, userName=)(../BPCRConnFactory.cpp:1571)
09/27/13 17:39:33.889 [BPCRConnFactory::returnConnection] (0xdb2bf0) connection could not be reused(../BPCRConnFactory.cpp:1042)
09/27/13 17:39:33.889 [ACEMsgTran::close_u] input_sock_->get_handle() = 37(ACEMsgTran.cpp:224)
09/27/13 17:39:33.889 [ACEMsgTran::close_u] output_sock_->get_handle() = 43(ACEMsgTran.cpp:241)
09/27/13 17:39:33.927 [DefaultQueue::handle_input] sending RB RBrelease :  id={D8F50244-2757-11E3-85E9-5037CF409CAA} id={D8F502C6-2757-11E3-B8DE-7847C91B5DD0} id={D8F4FB32-2757-11E3-B6F1-E7991C6BB9D9}(../CallbackQueue.cpp:1392)
09/27/13 17:39:33.928 [BpjobdQueue::queueRequest] queueing BPJobdExpireJob jobid=388, secondary jobid count=0 -- retry count=0(../CallbackQueue.cpp:1269)
09/27/13 17:39:33.928 [JobManager_i::doForgottenJobCleanup] (0xdb2e80) job has been forgotten, perform cleanup, jobid=388(../JobManager_i.cpp:2177)
09/27/13 17:39:33.928 [BpjobdQueue::handle_input] sending BPJobdExpireJob jobid=388, secondary jobid count=0(../CallbackQueue.cpp:1392)
09/27/13 17:39:33.929 [JobManager_i::deleteFrozenImage] (0xdb2e80) frozen image delete failed, no snapid for jobid=388(../JobManager_i.cpp:2112)
09/27/13 17:39:33.929 [BpjobdQueue::handle_input] (0xd947d0) CallbackQueue now empty(../CallbackQueue.cpp:1476)
09/27/13 17:39:33.929 [PemTask::taskComplete] (ID:0x2aaaab013128) Active subtask count=0(../PemTask.cpp:527)
09/27/13 17:39:33.933 [Info] V-116-14 CLIENT 100.202.36.230  POLICY N8000-test  SCHED Full  EXIT STATUS 150 (termination requested by administrator)
09/27/13 17:39:33.944 V-118-93 [ResBroker_i::release(id)] releasing allocation ID {D8F50244-2757-11E3-85E9-5037CF409CAA}, status 0
09/27/13 17:39:33.967 V-134-6 [MainShutdown] NDMP Agent program exiting, status = 150, pid = 8527
 
 

 

Operating Systems:

Comments 3 CommentsJump to latest comment

wenhai@huawei.com's picture

more logs:

09/27/13 17:39:33.967 V-134-6 [MainShutdown] NDMP Agent program exiting, status = 150, pid = 8527
09/27/13 17:39:33.971 V-118-93 [ResBroker_i::release(id)] releasing allocation ID {D8F502C6-2757-11E3-B8DE-7847C91B5DD0}, status 0
09/27/13 17:39:33.990 V-118-93 [ResBroker_i::release(id)] releasing allocation ID {D8F4FB32-2757-11E3-B6F1-E7991C6BB9D9}, status 0
09/27/13 17:39:33.991 V-118-104 [ResBroker_i::doEvaluation] Evaluation cycle started
09/27/13 17:39:33.991 [ResBroker_i::doEvaluation] CONF: releaseTimeSeconds = 180, respectRequestPriority = false, doIntermittentUnloads = true, maxHighPriorityQueueSize = 100, breakEvalOnDemand = true
09/27/13 17:39:33.991 [DefaultQueue::handle_input] sending RB RBmediaDone : allocationId={4A7D29C8-2758-11E3-898F-1DC30D0A0032}, mediaId=A20006(../CallbackQueue.cpp:1392)
09/27/13 17:39:33.992 V-118-111 [ResBroker_i::mediaDone] received media done for allocation ID {4A7D29C8-2758-11E3-898F-1DC30D0A0032}, media ID A20006, media key 4,000,076, unload delay 0
09/27/13 17:39:33.994 V-118-102 [ResBroker_i::releaseOne] releasing allocation ID {D8F50244-2757-11E3-85E9-5037CF409CAA}
09/27/13 17:39:34.031 V-134-6 [MainShutdown] NDMP Agent program exiting, status = 150, pid = 8687
............
09/27/13 17:39:34.031 V-134-6 [MainShutdown] NDMP Agent program exiting, status = 150, pid = 8687
09/27/13 17:39:34.039 [DefaultQueue::handle_input] sending RB RBsetMpxGroupUnjoinable : id={4A87D4CC-2758-11E3-A717-52876F0F081B} -- retry count=-1(../CallbackQueue.cpp:1392)
09/27/13 17:39:34.040 V-118-114 [ResBroker_i::setMpxGroupUnjoinable] mark MPX group {4A87D4CC-2758-11E3-A717-52876F0F081B} unjoinable
09/27/13 17:39:34.041 [DefaultQueue::handle_input] sending JL updateJobStatus : jobid=390, status=150(../CallbackQueue.cpp:1392)
09/27/13 17:39:34.044 V-118-102 [ResBroker_i::releaseOne] releasing allocation ID {D8F502C6-2757-11E3-B8DE-7847C91B5DD0}
09/27/13 17:39:34.062 [Error] V-116-85 backup of client 100.202.36.230 exited with status 150 (termination requested by administrator)
09/27/13 17:39:34.084 V-118-102 [ResBroker_i::releaseOne] releasing allocation ID {D8F4FB32-2757-11E3-B6F1-E7991C6BB9D9}
09/27/13 17:39:34.084 V-118-252 [CorbaCall_terminateMPXGroup::execute] sending terminate MPX group ({D8F4F7FE-2757-11E3-AD0B-A8ED6E4E9C12}) to nbjm
09/27/13 17:39:34.085 V-117-65 [ResCallback_i::terminateMPXGroup] terminating mpx group (mpxid = {D8F4F7FE-2757-11E3-AD0B-A8ED6E4E9C12})
09/27/13 17:39:34.085 [BRMComm::terminate] (0x2aaaab147440) BRMComm is already terminating, mpxid={D8F4F7FE-2757-11E3-AD0B-A8ED6E4E9C12}(../BRMComm.cpp:1831)
09/27/13 17:39:34.085 [BRMComm::terminateMpxGroup] (0x2aaaab147440) BRMComm is closing, mpxid={D8F4F7FE-2757-11E3-AD0B-A8ED6E4E9C12}(../BRMComm.cpp:1818)
............
 
Deb Wilmot's picture

All of the information you have is after the fact - the job had already terminated.

The first line of your first set of logs show's that the error already occurred:

09/27/13 17:39:33.889 [BPCRConnection::handleMsg] (0x2aaaab07a340) had an error, terminating bpcd connection

So the connection had an error... of some sort.

It appears that this is the vxlogview output compiled from all the logs which is not useful as there is no pid, thread, etc in the output.  Next time run this with a -d all  - it's messy - but at least the threads could be folled...

The communication path during bpcrconnection is:

nbjm --> bpcompatd  --> [network] --> remote entity --> vnetd --> bpcd --> [fork appropriate command].

 

Something in the above path failed.  Could have been a network hiccup, could have been something else. 

Do you have the bpcd log from whateer entity you were starting the backup on?

Everything else in the log snippets you sent is 'normal' behavior when a backup fails...