
SAN Backup failed - backup active with error robot operation failed

Created: 22 Jan 2013 • Updated: 25 Mar 2013 | 43 comments
This issue has been solved. See solution.

Hi fellow friends;

I have an issue with SAN backup.

Components:

  1. Master Server: Windows 2008R2 Server
  2. Media Server: IBM AIX 7.1
  3. Tape Library: Quantum i80, 4 tape drives (LTO5)
  4. SAN Switch (Brocade)
  5. Veritas NetBackup v7.1

Issue:

When I run the backup policy for the media server (IBM AIX 7.1), the media keeps mounting indefinitely and the job then fails with "robot operation failed" (status from Activity Monitor). This policy is configured to use a storage unit defined to back up through the SAN.

If I run a backup for another client using a different storage unit, the backup runs OK and completes normally with status=0.

In summary: backup over the SAN fails, while backup over the LAN is OK.

I would appreciate any advice, comments, or recommendations. Thank you in advance.

 

 

 


Nagalla's picture

Hi,

I just want to make sure: are you trying to use an FT media server or a SAN media server?

 

Imra_backup's picture

Hi Nagalla,

Thanks for the quick reply.

I'm not sure how to answer this (sorry), but I guess I have a media server. How do I verify whether it is an FT or a SAN media server?

 

Marianne's picture

Are the tape drives shared? Do you have the SSO license added on the master and media server?

Did you use the Device Config wizard on the master to configure devices for the master and media server?
Please post the output of the following from the media server:
/usr/openv/volmgr/bin/scan
/usr/openv/volmgr/bin/tpconfig -l
/usr/openv/volmgr/bin/vmoprcmd -d

Ensure logging on media server is enabled as follows:
bptm log folder exists in /usr/openv/netbackup/logs
VERBOSE entry in /usr/openv/volmgr/vm.conf (followed by NBU restart)
Ensure syslog is enabled at OS level.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Imra_backup's picture

Hi Marianne,

The tape drives are shared. The SSO license is installed on both master and media server.

Let me check on the device config. Below is the media server output as requested.

scan

root@ibmdb1:>./scan
************************************************************
*********************** SDT_TAPE    ************************
*********************** SDT_CHANGER ************************
************************************************************

------------------------------------------------------------
Device Name  : "/dev/rmt0.1"
Passthru Name: "/dev/rmt0.1"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry    : "IBM     DDS Gen7        VS11"
Vendor ID  : "IBM     "
Product ID : "DDS Gen7        "
Product Rev: "VS11"
Serial Number: "20150146"
WWN          : ""
WWN Id Type  : 0
Device Identifier: ""
Device Type    : SDT_TAPE
NetBackup Drive Type: Not Found(5)
Removable      : Yes
Device Supports: SCSI-3
Flags : 0x0
Reason: 0x0

 

tpconfig -l

root@ibmdb1:>./tpconfig -l
Device Robot Drive       Robot                    Drive                 Device
Type     Num Index  Type DrNum Status  Comment    Name                  Path
robot      0    -    TLD    -       -  -          -                     backupserver
  drive    -    1 hcart2    4      UP  -          HP.ULTRIUM5-SCSI.002  /dev/rmt1.1
  drive    -    2 hcart2    3      UP  -          HP.ULTRIUM5-SCSI.003  /dev/rmt2.1
  drive    -    3 hcart2    2      UP  -          HP.ULTRIUM5-SCSI.000  /dev/rmt3.1
  drive    -    4 hcart2    1      UP  -          HP.ULTRIUM5-SCSI.001  /dev/rmt4.1
drive      -    0    pcd    -  DISABL  -          IBM.DDSGEN7.000       /dev/rmt0.1

 

vmoprcmd

root@ibmdb1:>./vmoprcmd -d

                                PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  1 hcart2   TLD                -                     No       -         0
  2 hcart2   TLD                -                     No       -         0
  3 hcart2   TLD                -                     No       -         0
  4 hcart2   TLD                -                     No       -         0

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment
  1 HP.ULTRIUM5-SCSI.002  Yes      -
  2 HP.ULTRIUM5-SCSI.003  Yes      -
  3 HP.ULTRIUM5-SCSI.000  Yes      -
  4 HP.ULTRIUM5-SCSI.001  Yes      -
root@ibmdb1:>

 

OK, VERBOSE has already been added to vm.conf.

Over to you, Marianne. Thanks.

 

Marianne's picture

Seems 'scan' can only see the internal tape drive?

 

Device Name  : "/dev/rmt0.1"
Passthru Name: "/dev/rmt0.1"
...
Inquiry    : "IBM     DDS Gen7        VS11"

drive      -    0    pcd    -  DISABL  -          IBM.DDSGEN7.000       /dev/rmt0.1

Please show us output of:

lsdev -C -c tape


Imra_backup's picture

Hi Marianne,

Yes indeed, the OS (media server, IBM AIX) only detects the internal tape drive, which is not used for backup.

lsdev -Cc tape output:

root@ibmdb1:>lsdev -Cc tape
rmt0 Available 00-08-00 SAS 4mm Tape Drive
rmt1 Available 04-01-02 Other FC SCSI Tape Drive
rmt2 Available 04-01-02 Other FC SCSI Tape Drive
rmt3 Available 04-01-02 Other FC SCSI Tape Drive
rmt4 Available 04-01-02 Other FC SCSI Tape Drive
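
The mismatch between the two outputs posted so far can be made explicit with a small comparison. The following is only a sketch in Python: it assumes the `lsdev -Cc tape` and `scan` text has been copied off the media server, and the function names are hypothetical helpers, not part of NetBackup or AIX.

```python
import re

# Output of `lsdev -Cc tape` as posted in this thread.
LSDEV_OUTPUT = """\
rmt0 Available 00-08-00 SAS 4mm Tape Drive
rmt1 Available 04-01-02 Other FC SCSI Tape Drive
rmt2 Available 04-01-02 Other FC SCSI Tape Drive
rmt3 Available 04-01-02 Other FC SCSI Tape Drive
rmt4 Available 04-01-02 Other FC SCSI Tape Drive
"""

# The only "Device Name" line that `scan` reported on the media server.
SCAN_OUTPUT = 'Device Name  : "/dev/rmt0.1"'


def os_tape_devices(lsdev_text):
    """Return the names of tape devices the OS lists as Available."""
    devices = set()
    for line in lsdev_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Available":
            devices.add(fields[0])
    return devices


def scan_tape_devices(scan_text):
    """Return base device names (rmt0 rather than /dev/rmt0.1) seen by scan."""
    pattern = r'Device Name\s*:\s*"/dev/(rmt\d+)(?:\.\d+)?"'
    return set(re.findall(pattern, scan_text))


missing = sorted(os_tape_devices(LSDEV_OUTPUT) - scan_tape_devices(SCAN_OUTPUT))
print("Listed by the OS but not returned by scan:", missing)
```

With the outputs above, the four FC drives (rmt1 to rmt4) are listed by the OS but are not returned by scan, which is the discrepancy the rest of the thread tries to explain.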

Over to you Marianne

Giri_S's picture

If run backup for other client & using different storage unit...

A different STU under the same media server?

Can you try to move media from a slot to a drive via robtest? If not, what is the error?

Can you post the /usr/openv/volmgr/debug/tpreq log output?

Thanks,
Giri.

Netbackup Admin (Unix)

imra2013's picture

Hi Giri,

Thanks for the reply. What I mean is that I have defined 2 STUs.

STU1 = for other clients (normal LAN backup, meaning the client must go through the Master Server for backup)

STU2 = for the media server (SAN backup, direct backup to the Quantum i80 tape library)

As for robtest, just to let you know, the robtest command can only be run from the Master Server (which also acts as a Media Server).

I can't run robtest from my Media Server (the IBM AIX server).

There is no /tpreq folder in /volmgr/debug on the Master, but I can run the robtest utility, and just now I was able to move a tape from slot to drive with no error.

Hope this helps. Let me know if you need additional info. Thanks.

 

 

Marianne's picture

I am curious to know how devices were configured if 'scan' does not pick up the drives.

We need to be able to somehow double-check that /dev/rmt1.1 is indeed position 4 in the robot, /dev/rmt4.1 is indeed position 1 in the robot, etc.

Normally this can be easily confirmed with output from scan or 'tpautoconf -t' combined with 'scan -changer' output from the robot control host.

 

PS: Why exactly do you need 2 user id's on Connect? 
Once again you start a discussion as one user and then respond with your other id?


imra2013's picture

Hi Marianne,

Sorry, I should have logged out of imra2013 and logged in as Imra_backup. Can I delete one of my user IDs? (Either one will do.)

I have also been puzzled about how to explain this (scan does not pick up the drives) since I was assigned to this project. It is OK on the Master Server but not on the Media Server; maybe something went wrong and affected the settings on the Media Server, because previously we could back up via SAN on the Media Server. Now we can't even run robtest on the Media Server (IBM AIX).

I hope this provides some background information on the issue.

 

 

 

imra2013's picture

Hi Marianne,

I would like to use the imra2013 ID in Connect from now on.

Somehow I can't use the Imra_backup ID.

Thanks for your reminder.

Sorry for the inconvenience caused.

 

imra2013's picture

Hi Marianne,

Sorry, below is the scan -changer output (from the Master Server):

C:\Program Files\Veritas\Volmgr\bin>scan -changer
************************************************************
*********************** SDT_CHANGER ************************
************************************************************
------------------------------------------------------------
Device Name  : ""
Passthru Name: ""
Volume Header: ""
Port: 3; Bus: 0; Target: 6; LUN: 1
Inquiry    : "QUANTUM Scalar i40-i80  140G"
Vendor ID  : "QUANTUM "
Product ID : "Scalar i40-i80  "
Product Rev: "140G"
Serial Number: "QUANTUMD1H0131319_LLA"
WWN          : ""
WWN Id Type  : 0
Device Identifier: "QUANTUM D1H0131319_LLA          "
Device Type    : SDT_CHANGER
NetBackup Robot Type: 8
Removable      : Yes
Device Supports: SCSI-3
Number of Drives : 4
Number of Slots  : 49
Number of Media Access Ports: 5
Drive 1 Serial Number      : "C38C1BE000"
Drive 2 Serial Number      : "C38C1BE004"
Drive 3 Serial Number      : "C38C1BE008"
Drive 4 Serial Number      : "C38C1BE00C"
Flags : 0x0
Reason: 0x0

 

and the tpautoconf -t output:

C:\Program Files\Veritas\Volmgr\bin>tpautoconf -t
TPAC60 HP      Ultrium 5-SCSI  Y5AZ C38C1BE004 3 0 4 0 Tape0 -
TPAC60 HP      Ultrium 5-SCSI  Y5AZ C38C1BE000 3 0 6 0 Tape2 -
TPAC60 HP      Ultrium 5-SCSI  Y5AZ C38C1BE00C 3 0 7 0 Tape3 -

Hope this helps.

 

 

 

Marianne's picture

Maybe I was not clear enough in my previous post:

 

We need to be able to somehow double-check that /dev/rmt1.1 is indeed position 4 in the robot, /dev/rmt4.1 is indeed position 1 in the robot, etc.

Normally this can be easily confirmed with output from scan or 'tpautoconf -t' combined with 'scan -changer' output from the robot control host.

In the last statement I meant 'tpautoconf -t' on the media server.
This will give us output similar to tpautoconf on the master, and we will be able to see serial numbers for device names (rmt1.1, rmt2.1, etc).

We can then compare this with drive position and serial no's in scan output:
Drive 1 Serial Number      : "C38C1BE000"
Drive 2 Serial Number      : "C38C1BE004"
Drive 3 Serial Number      : "C38C1BE008"
Drive 4 Serial Number      : "C38C1BE00C"

which will finally enable us to compare with 'tpconfig -l' output to ensure there are no device mapping mismatches.
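
The cross-check described here can be sketched in a few lines of Python. This is an illustration only: it uses the `scan -changer` and `tpautoconf -t` output posted from the master (the media server has not returned the LTO5 drives yet), and it assumes, based purely on those samples, that each `tpautoconf -t` line holds the serial number followed by four numeric fields and then the device name. The function names are hypothetical.

```python
import re

# Drive serial numbers from `scan -changer` on the robot control host.
SCAN_CHANGER = """\
Drive 1 Serial Number      : "C38C1BE000"
Drive 2 Serial Number      : "C38C1BE004"
Drive 3 Serial Number      : "C38C1BE008"
Drive 4 Serial Number      : "C38C1BE00C"
"""

# `tpautoconf -t` output as posted from the master.
TPAUTOCONF = """\
TPAC60 HP      Ultrium 5-SCSI  Y5AZ C38C1BE004 3 0 4 0 Tape0 -
TPAC60 HP      Ultrium 5-SCSI  Y5AZ C38C1BE000 3 0 6 0 Tape2 -
TPAC60 HP      Ultrium 5-SCSI  Y5AZ C38C1BE00C 3 0 7 0 Tape3 -
"""


def drive_positions(scan_changer_text):
    """Map drive serial number -> robot drive number."""
    pattern = re.compile(r'Drive (\d+) Serial Number\s*:\s*"([^"]+)"')
    return {serial: int(num) for num, serial in pattern.findall(scan_changer_text)}


def device_serials(tpautoconf_text, known_serials):
    """Map device name -> serial number.

    Assumed layout (inferred from the samples): serial, then four numeric
    fields, then the device name.
    """
    result = {}
    for line in tpautoconf_text.splitlines():
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token in known_serials and i + 5 < len(tokens):
                result[tokens[i + 5]] = token
    return result


positions = drive_positions(SCAN_CHANGER)
devices = device_serials(TPAUTOCONF, positions)
for device, serial in sorted(devices.items()):
    print(f"{device}: serial {serial} -> robot drive {positions[serial]}")

unmatched = sorted(set(positions) - set(devices.values()))
print("Robot drive serials with no matching device:", unmatched)
```

The resulting device-to-drive-number pairs are what should agree with the DrNum column in 'tpconfig -l' on the same host.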


imra2013's picture

Hi Marianne,

Below is the tpautoconf output from my Media Server.

root@ibmdb1:>./tpautoconf -t
TPAC60 IBM     DDS Gen7        VS11 20150146 -1 -1 -1 -1 /dev/rmt0.1 - -

It looks like the Media Server does not detect the LTO5 tape drives.

 

imra2013's picture

Hi Marianne,

Just to add, I have been reading the SAN Client & Fibre Transport Guide, and I think that all this while the existing Media Server was not backing up through the SAN. My understanding is that we need to configure the machine as a SAN client; it would then be able to back up over the fibre connection, which is supposed to be faster than the LAN connection. No wonder a 900 GB backup takes more than 9 hours, whereas our LTO tape drives are configured point-to-point at 8GB/s, because all this while the backup only shows 'Transport Type' as LAN.
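
To put the "900 GB in more than 9 hours" figure in perspective, the effective throughput can be worked out as below. This is a rough sketch: it assumes decimal units (1 GB = 10^9 bytes) and exactly 9 hours, so the result is an upper bound. The 140 MB/s figure is the nominal native (uncompressed) LTO-5 drive rate, which is not stated in this thread, and the link speed quoted above is left out because its units are ambiguous.

```python
# Effective throughput of a 900 GB backup that takes 9 hours.
size_bytes = 900 * 10**9          # 900 GB, decimal units (assumption)
duration_s = 9 * 3600             # 9 hours; the post says "more than 9 hours"

throughput_mb_s = size_bytes / duration_s / 10**6

# Nominal LTO-5 native transfer rate (assumption, not from the thread).
LTO5_NATIVE_MB_S = 140
fraction_of_drive = throughput_mb_s / LTO5_NATIVE_MB_S

print(f"Effective throughput: {throughput_mb_s:.1f} MB/s")
print(f"Share of nominal LTO-5 native rate: {fraction_of_drive:.0%}")
```

Under these assumptions the job moves data at no more than about 28 MB/s, roughly a fifth of what a single LTO-5 drive can nominally accept.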

Please correct me if I'm wrong.

Thanks in advance

 

Marianne's picture

You clearly have media server software installed with devices configured for AIX server.

There is a big difference between SAN Media server and SAN Client.

see: http://www.symantec.com/docs/TECH135896 and http://www.symantec.com/docs/TECH53815

Your current issue is with device access on AIX media server. 
You need to get that fixed.


imra2013's picture

Hi Marianne,

Agree.

My current setup is a SAN Media Server (since media server software is installed on the AIX server).

I have read both articles; it looks like SAN Client requires extra settings to configure. Maybe that's why our previous vendor set up our AIX media server as a SAN Media Server. Anyway, you are right, I need to solve device access on the AIX Media Server.

Do you think I should ask AIX support how to access the devices, or Quantum support to provide drivers etc.? Please advise.

Thanks in advance.

Marianne's picture

You need to get scan or tpautoconf to see your tape drives. The fact that drives were configured successfully previously says to me that it was fine at one stage.

Try to remove and recreate devices at OS level. Work with your AIX sysadmin.

See the first part of this post: https://www-secure.symantec.com/connect/forums/aix-media-servers-dummies#comment-2568341

o rmdev -dl rmtx
o rmdev -dl smcx
o cfgmgr -v
o lsdev -Cc tape
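
For the device names seen in this thread, the sequence above expands as sketched below. This Python snippet only prints the commands for review with the AIX sysadmin; it does not run anything. The rmt names come from the earlier lsdev output (the internal rmt0 is deliberately left alone), the changer list is left empty because no smc device names were posted for this media server, and the helper name is hypothetical.

```python
def rediscovery_commands(tape_devices, changer_devices):
    """Build the remove-and-rediscover command sequence as plain strings."""
    commands = [f"rmdev -dl {name}" for name in tape_devices]
    commands += [f"rmdev -dl {name}" for name in changer_devices]
    commands.append("cfgmgr -v")        # rediscover devices
    commands.append("lsdev -Cc tape")   # confirm what the OS now lists
    return commands


# FC tape devices from the lsdev output in this thread.
fc_tape_devices = ["rmt1", "rmt2", "rmt3", "rmt4"]
# No smc (changer) device names were posted for the media server.
changer_devices = []

plan = rediscovery_commands(fc_tape_devices, changer_devices)
for command in plan:
    print(command)
```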

I have also now re-read this thread from the top.  We don't know the answer to any of these questions:
1. Have backups ever worked on this media server?
2. How was device config done if scan does not pick up the drives?
3. Was NBU restarted after adding VERBOSE to vm.conf? Have you verified that Media Manager processes are now running with -v?
4. Have you verified that syslog is running at OS level? If so, have you checked /var/adm/messages for errors when mount action is requested?
5. Can you see in Event Viewer Application log on robot control host (master) that mount request is received from media server? (VERBOSE in vm.conf also needed on master (followed by restart of NBU Device Manager service) for mount requests to show up).
6. Have you checked the robot to see if the tape is actually mounted?

Please spend some time to check ALL of the above before responding.
I will check for reply in about 12 hours...


imra2013's picture

Hi Marianne,

Will do. I will check ALL of the above as listed.

Thanks for your time.

NOTE: Yes indeed, it was working fine previously.

imra2013's picture

Hi Marianne,

I can only answer partially for now because I'm still waiting for an answer from my AIX colleague on question 4.

1. Have backups ever worked on this media server? YES
2. How was device config done if scan does not pick up the drives? I rescanned just now using device config and it was able to detect all 4 tape drives.
3. Was NBU restarted after adding VERBOSE to vm.conf? Have you verified that Media Manager processes are now running with -v? The VERBOSE line has already been added on both Master & Media Server. I'm not sure how to verify that the MM processes are running with -v.
4. Have you verified that syslog is running at OS level? If so, have you checked /var/adm/messages for errors when mount action is requested? Pending an answer from my AIX team.
5. Can you see in Event Viewer Application log on robot control host (master) that mount request is received from media server? (VERBOSE in vm.conf also needed on master (followed by restart of NBU Device Manager service) for mount requests to show up). I have added VERBOSE to vm.conf on the Master, but how do I check Event Viewer for the robot control host?
6. Have you checked the robot to see if the tape is actually mounted? Yes, it did mount in the tape drive, but unfortunately the write did not start.

I will get back to you on question 4 ASAP and hope you can assist with question 3. Thanks in advance.

Marianne's picture

We still need answers to these questions:

2. How was device config done if scan does not pick up the drives? I rescanned just now using device config and it was able to detect all 4 tape drives.

On master as well as media server?

3. Was NBU restarted after adding VERBOSE to vm.conf? Have you verified that Media Manager processes are now running with -v? The VERBOSE line has already been added on both Master & Media Server. I'm not sure how to verify that the MM processes are running with -v.

bpps -x 

4. Have you verified that syslog is running at OS level? If so, have you checked /var/adm/messages for errors when mount action is requested? Pending answer from my AIX team

We need to see if any errors are logged in /var/adm/messages
We need bptm log on media server as well.
 

5. Can you see in Event Viewer Application log on robot control host (master) that mount request is received from media server? (VERBOSE in vm.conf also needed on master (followed by restart of NBU Device Manager service) for mount requests to show up).
I have added VERBOSE to vm.conf on the Master, but how do I check Event Viewer for the robot control host?

Speak to your Windows sysadmin if you don't know how to open Windows Event Viewer.

6. Have you checked the robot to see if the tape is actually mounted? Yes, it did mount in the tape drive, but unfortunately the write did not start.

Work with AIX sysadmin and use OS commands such as 'mt' or 'tctl' to check if OS can see the tape mount, e.g.

mt -f /dev/rmt1.1 status

Confirm in /var/adm/messages and bptm log that tape mount is reported back to NBU.


imra2013's picture

Hi Marianne,

Thanks for the reply. My replies are inline below.

We still need answers to these questions:

2. How was device config done if scan does not pick up the drives? I rescanned just now using device config and it was able to detect all 4 tape drives.

On master as well as media server? Yes, on both master and media server.

3. Was NBU restarted after adding VERBOSE to vm.conf? Have you verified that Media Manager processes are now running with -v? The VERBOSE line has already been added on both Master & Media Server. I'm not sure how to verify that the MM processes are running with -v.

bpps -x 

Below is the bpps -x output from the Media Server (the AIX server):

MM Processes
------------
    root 11010262        1   0   Jan 30      -  0:17 vmd -v
    root 17498314 14090716   0   Jan 30      -  0:00 avrd -v
    root  4784450 14090716   0   Jan 30      -  0:00 tldd -v
    root 14090716        1   0   Jan 30      -  0:08 /usr/openv/volmgr/bin/ltid

4. Have you verified that syslog is running at OS level? If so, have you checked /var/adm/messages for errors when mount action is requested? Pending answer from my AIX team

We need to see if any errors are logged in /var/adm/messages

I will get back to you once I have an answer from my AIX team.
We need bptm log on media server as well.
The bptm logs from the Media Server are attached.

5. Can you see in Event Viewer Application log on robot control host (master) that mount request is received from media server? (VERBOSE in vm.conf also needed on master (followed by restart of NBU Device Manager service) for mount requests to show up).
I have added VERBOSE to vm.conf on the Master, but how do I check Event Viewer for the robot control host?

Speak to your Windows sysadmin if you don't know how to open Windows Event Viewer.

I noticed that Windows Event Viewer did log communication with the robot control host. Below is the series of log entries:

  • TLD(0) initiating MOVE_MEDIUM from addr 4117 to addr 259
  • TLD(0) closing/unlocking robotic path
  • inquiry() function processing library QUANTUM  Scalar i40-i80   140G:
  • TLD(0) [5504] opening robotic path {3,0,4,1} (bus -1, target -1, lun -1)
  • starting tldcd
  • TLD(0) Creating Process for MOUNT: "C:\Program Files\Veritas\Volmgr\bin\tldcd.exe" -v -child -ro 1 -rn 0 -dn 4 -socket 892 -slot 22 -rht 0  -vsn 0012L5 -b 000012L5
  • Processing MOUNT, TLD(0) drive 4, slot 22, barcode 000012L5        , vsn 0012L5
  • tldcd.c.3017, process_request(), received command=1, from peername=ibmdb1, version 50

 

6. Have you checked the robot to see if the tape is actually mounted? Yes, it did mount in the tape drive, but unfortunately the write did not start.

Work with AIX sysadmin and use OS commands such as 'mt' or 'tctl' to check if OS can see the tape mount, e.g.

I will get back to you once I have an answer from my AIX team. While waiting for feedback from the AIX team, the attached screenshot from the Quantum web interface shows the tape already mounted in one of the tape drives inside the Quantum i80 tape library.

mt -f /dev/rmt1.1 status

Confirm in /var/adm/messages and bptm log that tape mount is reported back to NBU.

Hope this helps (while waiting for AIX feedback).

Thank you in advance.

Attachments:
screenshot frm Quantum web interface.png
bptm log.zip (14.87 KB)

Marianne's picture

Before we try and troubleshoot, please have a look at this entry in bptm log:

 check_touch_file: Found /usr/openv/volmgr/database/NO_TAPEALERT
 
Any idea why this file was created? This touch-file is normally only needed for ACSLS robots where drive cleaning is controlled by the API robot.
I cannot see why this file is needed when you have a robot that is controlled by local or remote (master) server.
Please delete this file.

As far as troubleshooting tape mount is concerned:
When you can see that the tape is mounted on the drive, you need to work with your AIX admin to see if the OS can 'see' the tape mount. AIX admin should be able to use 'mt' or 'tctl' command to check. Entries in /var/adm/messages will also help.

About mount attempt seen in bptm log:
Can you please find tape mount request in Windows event viewer that corresponds with the following tape mount request seen in media server's bptm log?

 

10:41:41.530 [7012426] <2> mount_open_media: Waiting for mount of media id 0028L5 (copy 1) on server ibmdb1.
10:41:41.530 [7012426] <2> set_job_details: Tfile (7908): LOG 1360032101 4 bptm 7012426 Waiting for mount of media id 0028L5 (copy 1) on server ibmdb1.
10:41:41.530 [7012426] <4> create_tpreq_file: symlink to path /dev/rmt2.1

10:42:57.688 [7012426] <2> tapelib: wait_for_ltid, Mount, timeout 0
10:43:37.691 [7012426] <2> send_MDS_msg: OP_STATUS 0 3060 ibmdb1 8 1 0 0 0 0 0 0 *NULL* 0

10:43:41.363 [7012426] <2> send_operation_error: Decoded status = 8 from 1
10:43:41.363 [7012426] <2> set_job_details: Tfile (7908): LOG 1360032221 16 bptm 7012426 error requesting media, TpErrno = Robot operation failed
10:43:45.821 [7012426] <16> mount_open_media: error requesting media, TpErrno = Robot operation failed
10:43:45.831 [7012426] <2> drivename_close: Called for file HP.ULTRIUM5-SCSI.003
10:43:45.831 [7012426] <2> set_job_details: Tfile (7908): LOG 1360032225 8 bptm 7012426 media id 0028L5 load operation reported an error
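
When reading an excerpt like this, it can help to pull out only the error-level lines and measure how long the mount waited before failing. The sketch below does that in Python for four of the lines above. It assumes the legacy log layout seen here (time, PID in square brackets, severity in angle brackets, function name, message) and treats severity 16 and above as errors, which matches how the `<16>` entry surfaces as "Error" in Activity Monitor. The helper name is hypothetical.

```python
import re
from datetime import datetime

# Four lines taken from the bptm excerpt above.
BPTM_EXCERPT = """\
10:41:41.530 [7012426] <2> mount_open_media: Waiting for mount of media id 0028L5 (copy 1) on server ibmdb1.
10:42:57.688 [7012426] <2> tapelib: wait_for_ltid, Mount, timeout 0
10:43:45.821 [7012426] <16> mount_open_media: error requesting media, TpErrno = Robot operation failed
10:43:45.831 [7012426] <2> drivename_close: Called for file HP.ULTRIUM5-SCSI.003
"""

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (\w+): (.*)$")


def parse_bptm(log_text):
    """Parse bptm lines into dictionaries; lines that do not match are skipped."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, pid, severity, function, message = match.groups()
        entries.append({
            "time": datetime.strptime(stamp, "%H:%M:%S.%f"),
            "pid": int(pid),
            "severity": int(severity),
            "function": function,
            "message": message,
        })
    return entries


entries = parse_bptm(BPTM_EXCERPT)
errors = [e for e in entries if e["severity"] >= 16]
wait = next(e for e in entries if "Waiting for mount" in e["message"])
elapsed = (errors[0]["time"] - wait["time"]).total_seconds()

for e in errors:
    print(e["time"].strftime("%H:%M:%S"), e["function"], "-", e["message"])
print(f"Seconds between mount request and error: {elapsed:.3f}")
```

The timestamps carry no date, so a comparison like this is only meaningful within a single day's log file.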
 
Also test a manual mount of a specific media ID in a specific drive:
Use robtest on the master (robot control host) to move media-id 0028L5 into drive 3.
Dismount tape again and move back to 'home' slot.
 
Please let us know the result.


imra2013's picture

Hi Marianne,

I shall delete the file /usr/openv/volmgr/database/NO_TAPEALERT.

I will check with the AIX team to verify with the 'mt' or 'tctl' command and also check the entries in /var/adm/messages.

But just for your information, media ID 0028L5 is no longer inside the tape library. If you scroll down the Activity Monitor detailed status below, NBU requested the next resource (another media ID). Later I manually cancelled the job; otherwise it would stay mounting for a long time. Below are the detailed logs:

2/5/2013 10:43:45 AM - Warning bptm(pid=7012426) media id 0028L5 load operation reported an error     
2/5/2013 10:46:17 AM - Info bptm(pid=7012426) Waiting for mount of media id 0012L5 (copy 1) on server ibmdb1.
2/5/2013 10:53:32 AM - Info nbjm(pid=3548) starting backup job (jobid=7908) for client ibmdb1, policy IBMDB1_FS, schedule IBMDB1_FS_Daily  
2/5/2013 10:53:32 AM - Info nbjm(pid=3548) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=7908, request id:{C2962A4D-CE2D-4291-A821-4309869B9493})  
2/5/2013 10:53:32 AM - requesting resource ibmdb1-hcart2-robot-tld-0
2/5/2013 10:53:32 AM - requesting resource backupserver.NBU_CLIENT.MAXJOBS.ibmdb1
2/5/2013 10:53:32 AM - requesting resource backupserver.NBU_POLICY.MAXJOBS.IBMDB1_FS
2/5/2013 10:53:34 AM - granted resource backupserver.NBU_CLIENT.MAXJOBS.ibmdb1
2/5/2013 10:53:34 AM - granted resource backupserver.NBU_POLICY.MAXJOBS.IBMDB1_FS
2/5/2013 10:53:34 AM - granted resource 0028L5
2/5/2013 10:53:34 AM - granted resource HP.ULTRIUM5-SCSI.003
2/5/2013 10:53:34 AM - granted resource ibmdb1-hcart2-robot-tld-0
2/5/2013 10:53:34 AM - estimated 38 Kbytes needed
2/5/2013 10:53:34 AM - Info nbjm(pid=3548) started backup job for client ibmdb1, policy IBMDB1_FS, schedule IBMDB1_FS_Daily on storage unit ibmdb1-hcart2-robot-tld-0
2/5/2013 10:53:36 AM - started process bpbrm (9175382)
2/5/2013 10:53:36 AM - connecting
2/5/2013 10:53:38 AM - connected; connect time: 00:00:02
2/5/2013 10:53:39 AM - mounting 0028L5
2/5/2013 10:55:43 AM - current media 0028L5 complete, requesting next resource Any
2/5/2013 10:58:14 AM - granted resource 0012L5
2/5/2013 10:58:14 AM - granted resource HP.ULTRIUM5-SCSI.002
2/5/2013 10:58:14 AM - granted resource ibmdb1-hcart2-robot-tld-0
2/5/2013 10:58:15 AM - mounting 0012L5
2/5/2013 11:03:26 AM - Error bptm(pid=7012426) error requesting media, TpErrno = Robot operation failed     
2/5/2013 11:04:04 AM - Warning bptm(pid=7012426) media id 0012L5 load operation reported an error     
2/5/2013 11:06:15 AM - Info bptm(pid=7012426) Waiting for mount of media id 0012L5 (copy 1) on server ibmdb1.
2/5/2013 11:10:25 AM - Error bptm(pid=7012426) media manager terminated during mount of media id 0012L5, possible media mount timeout
2/5/2013 11:10:28 AM - Error bptm(pid=7012426) media manager terminated by parent process       
2/5/2013 11:10:32 AM - Info bpbkar(pid=5505182) done. status: 150: termination requested by administrator      
2/5/2013 11:16:16 AM - current media 0012L5 complete, requesting next resource Any
2/5/2013 11:18:13 AM - granted resource 0012L5
2/5/2013 11:18:13 AM - granted resource HP.ULTRIUM5-SCSI.001
2/5/2013 11:18:13 AM - granted resource ibmdb1-hcart2-robot-tld-0
2/5/2013 11:18:13 AM - mounting 0012L5
2/5/2013 11:22:29 AM - end writing
termination requested by administrator(150)

At this stage, can we conclude anything? I am still waiting for feedback from my AIX team.

Many thanks as usual...

 

Marianne's picture

Please run Inventory on master server to ensure NBU config is up-to-date with robot config.

Maybe it is the same problem with 0012L5? Maybe it is not in the robot either?

NBU will not request a mount of a tape that is not in the robot if its config is up-to-date with the robot.

2/5/2013 10:58:15 AM - mounting 0012L5
2/5/2013 11:03:26 AM - Error bptm(pid=7012426) error requesting media, TpErrno = Robot operation failed     
2/5/2013 11:04:04 AM - Warning bptm(pid=7012426) media id 0012L5 load operation reported an error     

Have you tried to manually mount 0012L5 using robtest on the master server?

Please remember to quit out of robtest when you are done with testing, as no mounts from NBU will be possible while robtest is controlling the robot.


imra2013's picture

Hi Marianne,

I have run the robot inventory and it is now up-to-date.

When I run a manual backup, it now looks for a new media ID: 0015L5.

But it is still the same: it just keeps mounting, and the tape is loaded into the tape drive.

Robtest from the master server was also successful; I managed to move media to a tape drive and from the tape drive back to the slot.

 

 

Marianne's picture

Back to earlier suggestions ...

What is logged in Master's Event Viewer Application Log during mount attempt of 0015L5?

What is logged in Media Server's messages file?

Are you working with AIX sysadmin to check tape mount at OS level?

Have you deleted NO_TAPEALERT file yet?


imra2013's picture

Hi Marianne,

What is logged in Master's Event Viewer Application Log during mount attempt of 0015L5?

Below are the Master's Event Viewer entries for the mount attempt of 0015L5:

inquiry() function processing library QUANTUM  Scalar i40-i80   140G:10:25 AM

tldcd.c.3017, process_request(), received command=1, from peername=ibmdb1, version 50 : 10:28 AM

Processing MOUNT, TLD(0) drive 1, slot 25, barcode 000015L5        , vsn 0015L5 10:28 AM

TLD(0) Creating Process for MOUNT: "C:\Program Files\Veritas\Volmgr\bin\tldcd.exe" -v -child -ro 1 -rn 0 -dn 1 -socket 888 -slot 25 -rht 0  -vsn 0015L5 -b 000015L5       10:28 AM

starting tldcd   10:28 AM

TLD(0) [4308] opening robotic path {3,0,4,1} (bus -1, target -1, lun -1) 10:28 AM

TLD(0) initiating MOVE_MEDIUM from addr 4120 to addr 256 10:28 AM

TLD(0) closing/unlocking robotic path 10:28 AM

inquiry() function processing library QUANTUM  Scalar i40-i80   140G:10:28 AM

tldcd.c.3017, process_request(), received command=3, from peername=ibmdb1, version 50 :10:43 AM

Processing UNMOUNT, TLD(0) drive 1, slot 25, barcode 000015L5        , vsn 0015L5 10:43 AM

TLD(0) Creating Process for DISMOUNT: "C:\Program Files\Veritas\Volmgr\bin\tldcd.exe" -v -child -ro 3 -rn 0 -dn 1 -socket 892 -slot 25 -rht 0  -vsn 0015L5 -b 000015L5    10:43 AM

starting tldcd   10:43 AM

TLD(0) [3976] opening robotic path {3,0,4,1} (bus -1, target -1, lun -1)  10:43 AM

inquiry() function processing library QUANTUM  Scalar i40-i80   140G:  10:43 AM

TLD(0) initiating MOVE_MEDIUM from addr 256 to addr 4120  10:43 AM

TLD(0) closing/unlocking robotic path   10:43 AM

 

What is logged in Media Server's messages file?

Below is today's bptm log from the media server. The automatic scheduled backup for the media server (ibmdb1) started at 7 AM and ended with status 96. The process finally stopped at 10:36 AM, and I have pasted the log from around that time. I have also attached today's bptm log (6 Feb 2013).

10:34:48.726 [6553620] <2> vnet_async_connect: ../../libvlibs/vnet_connect.c.1200: 0: connect in progress: 1 0x00000001
10:34:48.727 [6553620] <2> vnet_pbxConnect: pbxConnectEx Succeeded
10:34:48.727 [6553620] <2> do_pbx_service: ../../libvlibs/vnet_connect.c.1784: 0: via PBX: bpdbm CONNECT FROM 10.1.1.219.49017 TO 10.1.1.164.1556 fd = 17
10:34:48.727 [6553620] <2> vnet_async_connect: ../../libvlibs/vnet_connect.c.1367: 0: connect: async CONNECT FROM 10.1.1.219.49017 TO 10.1.1.164.1556 fd = 17
10:34:48.727 [6553620] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1370: 0: found in cache name: BACKUPSERVER
10:34:48.728 [6553620] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1371: 0: found in cache service: NULL
10:34:48.728 [6553620] <2> logconnections: BPDBM CONNECT FROM 10.1.1.219.49017 TO 10.1.1.164.1556 fd = 17
10:34:48.728 [6553620] <2> vnet_check_vxss_client_magic_with_info: ../../libvlibs/vnet_vxss_helper.c.871: 0: Ignoring VxSS authentication: 2 0x00000002
10:34:48.773 [6553620] <2> db_end: Need to collect reply
10:34:48.773 [6553620] <4> report_resource_done: VBRD 1 6553620 0 HP.ULTRIUM5-SCSI.001 0015L5
10:34:48.773 [6553620] <4> create_tpreq_file: symlink to path /dev/rmt4.1
10:35:01.125 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1370: 0: found in cache name: 10.1.1.164
10:35:01.125 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1371: 0: found in cache service: NULL
10:36:04.060 [6553620] <2> tapealert_and_release: report_attr, fl1 0x00000000, fl2 0x00000000
10:36:33.130 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1370: 0: found in cache name: 10.1.1.164
10:36:33.130 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1371: 0: found in cache service: NULL
10:36:42.069 [6553620] <2> drivename_unlock: unlocked
10:36:42.069 [6553620] <2> drivename_close: Called for file HP.ULTRIUM5-SCSI.001
10:36:42.069 [6553620] <2> drivename_remove: Called
10:36:42.073 [6553620] <2> main: Sending [EXIT STATUS 0] to NBJM
10:36:42.073 [6553620] <2> bptm: EXITING with status 0 <----------
10:36:42.374 [12320830] <2> requestFailed: got gotCallback, jmJobStatus = [96], emmStatus = [2005000], mapped failure status to= [96]
10:36:42.374 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1370: 0: found in cache name: 10.1.1.164
10:36:42.374 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1371: 0: found in cache service: NULL
10:36:42.374 [12320830] <2> packageSpanResourceRequestResult: totalNumberOfAllocations == 1
10:36:42.374 [12320830] <2> packageSpanResourceRequestResult: retVal =    96
10:36:42.374 [12320830] <16> RequestSpanResources: MultiResReq.cpp:2683 resource request failed [96]
10:36:42.374 [12320830] <2> RequestSpanResources: retVal = 96    emmStatus = 2005000
10:36:42.374 [12320830] <2> RequestSpanResources: returning
10:36:42.374 [12320830] <4> nbjm_media_request: Error from RequestSpanResources, Master BACKUPSERVER, error 96, resourceAllocated 0
10:36:42.374 [12320830] <2> send_MDS_msg: OP_STATUS 0 3063 ibmdb1 8211 5 0 0 0 0 0 0 *NULL* 0
10:36:42.375 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1370: 0: found in cache name: 10.1.1.164
10:36:42.375 [12320830] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1371: 0: found in cache service: NULL
10:36:42.377 [12320830] <16> send_MDS_msg: Error from emmlib_handleMessage, Master BACKUPSERVER, type 12, returned error 2005023
10:36:42.377 [12320830] <2> send_operation_error: Decoded status = 19 from 5
10:36:42.377 [12320830] <2> bptm: EXITING with status 96 <----------
10:36:42.377 [12320830] <2> set_job_details: Tfile (7913): LOG 1360118202 4 bptm 12320830 EXITING with status 96 <----------

10:36:43.376 [12320830] <2> cleanup: Detached from BPBRM shared memory

Are you working with AIX sysadmin to check tape mount at OS level? Yes, but still pending their feedback

Have you deleted NO_TAPEALERT file yet? DONE

I hope my AIX team can revert to me ASAP.

 

Attachment: bptm log_6feb.zip (43.82 KB)
Marianne's picture

Please only get back to us once you can work with AIX team.

We are not going to solve anything without knowing what is happening at OS level when tape is mounted.

Giving us a log for status 96 does not help with troubleshooting the current issue. Status 96 means NetBackup could not allocate new media for the backup, which is a different problem from the failed mount.
Please ensure that you have sufficient media before trying again.
Bits and pieces of logs that do not come from the SAME backup/mount attempt do not help - that is why we are no closer to a solution after 10 days and more than 30 posts....

Get ALL of the following in place before you try again:

  • Confirmation that sufficient media is available.
  • AIX admin ready to 'tail' /var/adm/messages (with confirmation that syslogd is enabled and running)
  • AIX admin ready to check for tape mount at OS level with 'mt' or 'tctl' commands (You will be able to tell in bptm log which OS device name is chosen for tape mount - look for entry like this:
    ... create_tpreq_file: symlink to path /dev/rmt2.1).
  • AIX admin checking 'errpt' for clues.
  • You will check at same time in master server Event Viewer that tape mount request is received and fulfilled. 
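
To make the device lookup in the checklist above less error-prone, here is a minimal sketch that pulls the OS device path out of a bptm log. The function name last_tpreq_device is made up for illustration, and the log file you pass in is whichever bptm log covers the mount attempt being tested; only the "create_tpreq_file: symlink to path" text comes from the log entries quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the OS device path from the most recent
# "create_tpreq_file: symlink to path ..." entry in a bptm log file.
# $1 = path to the bptm log covering the current mount attempt.
last_tpreq_device() {
    # Keep only the token that follows "symlink to path", then take
    # the last match so an older mount attempt is not reported.
    sed -n 's/.*create_tpreq_file: symlink to path \([^[:space:]]*\).*/\1/p' "$1" | tail -n 1
}
```

With the path in hand (for example /dev/rmt4.1 in the log posted earlier), the AIX admin can point 'mt -f' or 'tctl -f' at that device with the 'status' subcommand while the mount request is still pending.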


imra2013's picture

Hi Marianne,

Finally got an answer from our AIX support team (they were on long holiday leave here in Malaysia for Chinese New Year).

I have attached 3 files as requested:

1. VRTSPBX log (for syslog output)
2. mt -f output (the OS did detect the tape drive)
3. errpt log (most of the entries show an adapter error detected)

Hope this can assist.

Many thanks in advance.

Attachments:
50936-103-167838171-130129-0000000001_vrtspbx log.zip (61.13 KB)
mt -f.txt (1.42 KB)
errpt log.txt (3.61 KB)
Marianne's picture

Please log a Support call with Symantec.

It seems you misunderstood my previous post. 
I honestly don't know how else to word it....
The steps need to be performed after a job is kicked off and when tape mount request is seen in Activity Monitor. Only ALL steps (at the right time) listed in my previous post will give us the full picture.
mt -f needs to report mounted tape (not just tape drive).
I was looking for the OS syslog file /var/adm/messages WHILE mount attempt is taking place, not PBX log (what made you think they are the same file?).
I thought all of this was clear in my previous post? 
Clearly not...

AIX sysadmins need to report the adapter error to whoever is responsible for server hardware support.
They should also try to add options/switches to errpt to get more info (e.g. -a or -aD, etc).

Sorry that we keep on misunderstanding one another. 

After 36 posts and clearly no solution in sight, I give up....

Good luck.


imra2013's picture

Hi Everyone,

Managed to solve this issue after the server was forced to shut down (power outage) and the tape drives were rescanned while reconfiguring storage devices. Basically, the OS had a configuration mismatch with the tape drives and managed to re-detect them after the server booted up.

Hi Marianne,

I could not find 'mark solution' link

SOLUTION
Nagalla's picture

You are still switching between 2 login IDs.

You need to log in as Imra_backup to mark the solution.

Marianne's picture

I have cleared the solution - this post is NOT the solution!!!


StefanosM's picture

Two things:
1. Have you checked whether there is a tape in the drive? Check with the robot panel or robtest. robtest must be run from the robot control host.
2. Check the number of cleanings remaining on your cleaning tape. If the number is 0, or if there is no cleaning tape, clean the drive manually with a new cleaning tape.

Marianne's picture

Imra - you will agree with me that the OS restart did the same thing as my suggestion on 30 January?

Try to remove and recreate devices at OS level. Work with your AIX sysadmin.

  o rmdev -dl rmtx
  o rmdev -dl smcx
  o cfgmgr -v
  o lsdev -Cc tape
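
For anyone repeating these steps, the sequence above can be wrapped in a small script with a dry-run guard, so the sysadmin can review exactly what will be removed first. This is only a sketch: the function names and the DRY_RUN variable are made up for illustration, and the device names passed in (rmt4 and smc0 in the usage note) are placeholders that must be replaced with what 'lsdev -Cc tape' reports on the media server.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0
# to actually run them on the AIX media server.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: delete the given tape (rmtX) and changer (smcX)
# device definitions, rediscover devices, then list the tape devices.
recreate_tape_devices() {
    for dev in "$@"; do
        run rmdev -dl "$dev"
    done
    run cfgmgr -v
    run lsdev -Cc tape
}
```

Calling 'recreate_tape_devices rmt4 smc0' with the default DRY_RUN=1 prints the four commands without touching any device.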

 

This was never done at the time, right?