Troubleshooting volume performance in Veritas Storage Foundation

Article:TECH202712  |  Created: 2013-02-12  |  Updated: 2014-06-09  |  Article URL http://www.symantec.com/docs/TECH202712
Article Type
Technical Solution


Issue



Troubleshooting volume performance in Veritas Storage Foundation


Solution





Table of Contents

Introduction
Checking for basic resource shortages
Verify that the correct ASL (Array Support Library) is loaded
Disabled paths and I/O errors
Filesystem Fragmentation
Volume and subdisk bottlenecks
Stripe set performance considerations
RAID Contention
DMP I/O policy
Mount options
Filesystem allocation unit size



 


Introduction



This article discusses basic performance troubleshooting for Veritas Storage Foundation.

Before proceeding, make sure that you know the answers to the following questions:
 

  • What are the specific symptoms that are being observed? If available, use performance monitoring tools to determine which storage component is affected, and how the degraded performance actually differs from normal performance.
  • Is the performance degradation consistent or intermittent?
  • If the problem is intermittent, what patterns can be observed? Does performance become degraded at certain times, such as when other scheduled tasks are taking place?
  • When was the degraded performance first observed?
  • Have there been any recent changes, such as software installations, patches, driver updates, hardware changes or changes to the SAN?


It is common that a performance problem appears to originate from a particular component, but upon further investigation, is found to originate from an entirely different storage layer, or even from a component that is not directly related to storage. A best practice is to check for basic resource shortages, such as CPU, memory and disk space, before delving into more complex possibilities.





Checking for basic resource shortages



Before delving into more specific topics, first verify that no basic resource shortage accounts for the poor performance. In particular, check the following items:

  • Memory
  • CPU
  • Disk Space - VxFS volumes should be less than 90% utilized.
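The 90% guideline above can be checked with a short filter over df output. The following is a minimal sketch, not a Veritas tool: the function name fs_over_threshold is hypothetical, and it assumes that df on the platform accepts the POSIX -P option (which prints one line per file system with the capacity in the fifth column) and that mount points contain no spaces.

```shell
# Print mount points whose utilization is at or above a threshold (default 90).
# Reads POSIX-format "df -kP" output on stdin. The function name is hypothetical.
fs_over_threshold() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header line
            pct = $5                  # Capacity column, for example "93%"
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $6, pct "%"
        }'
}

# Usage (not run here): df -kP | fs_over_threshold 90
```

Any file system that this prints has reached the threshold and is a candidate for cleanup or growth before deeper troubleshooting.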

  


Note: These are not Veritas commands. This information is provided as a convenience and should not be regarded as authoritative. Review the official documentation supplied by the vendors for their respective platforms to confirm the correct usage of these commands.


Table 1 - Common commands and syntax to check resources

Solaris
  Memory:      prstat -s rss (look for the "RSS" column)
  CPU:         prstat (look for the "CPU" column)
  Disk space:  df -k (look for "capacity")

Linux
  Memory:      top (type "M" to sort by memory usage; look for the "MEM" column)
  CPU:         top (type "P" to sort by CPU usage; look for the "CPU" column)
  Disk space:  df -k (look for "Use%")

AIX
  Memory:      nmon (type "t," then "4" to sort by "Size"; look for the "Res Set" column)
               topas -P (use the cursor to highlight one of the "RES" columns)
  CPU:         nmon (type "t," then "3" to sort by "CPU"; look for the "CPU Used" column)
               topas -P (use the cursor to highlight the "CPU" column; by default, this should already be highlighted)
  Disk space:  df -k (look for "%Used")

HP-UX
  Memory:      top (look for the "RES" column)
  CPU:         top (look for the "WCPU" and "CPU" columns)
  Disk space:  df -k (look for "% allocation used")


 




Verify that the correct ASL (Array Support Library) is loaded



A common cause of poor performance is that a generic ASL is loaded instead of a vendor-specific ASL. While troubleshooting performance, verify that the correct ASL is loaded.


More details about determining which ASL is loaded and why a wrong ASL may be loaded can be found in this article:

"Other_disks," "scsi3_jbod" or "jbod" ASLs (Array Support Libraries) are claiming disks as generic devices
http://www.symantec.com/docs/TECH204408


 

 



Disabled paths and I/O errors


 

  • Use vxdmpadm getsubpaths to determine the status of the paths to the disks (Figure 1).
  • Use vxdmpadm -e iostat show to check for I/O errors that are detected for each path (Figure 2).


Veritas will disable a path if serious or sustained I/O errors occur. When all paths to a disk are disabled, the server will be unable to read or write to the volume. If a path has been disabled, review the syslog for I/O errors that are reported by "vxdmp" or "scsi."

Although a path can be re-enabled manually using "vxdmpadm enable," vxdmp should automatically evaluate the status of a disabled path at five-minute intervals using a SCSI inquiry. If the inquiry is successful, the path is automatically re-enabled. If a path remains disabled beyond this interval, I/O errors may still be occurring, which warrants further investigation. Paths are not automatically re-enabled if the diskgroup has been disabled, or if vxesd is stopped. The behavior of vxdmp in response to disabled paths can be modified via the DMP tunables, which can be viewed using "vxdmpadm gettune."
 


Note: Although the syslog may show that vxdmp is the source of an I/O error, vxdmp itself is not usually the origin. Veritas depends on the OS device drivers to communicate with disks. When I/O errors occur, the device drivers report them to Veritas. vxdmp then logs the errors that were passed to it and may disable a path in response.

 


Figure 1 - Using vxdmpadm to determine the status of paths


Syntax:

vxdmpadm getsubpaths


Example, with typical output:

# vxdmpadm getsubpaths

NAME      STATE[A]   PATH-TYPE[M] DMPNODENAME  ENCLR-NAME   CTLR
================================================================
sdk       ENABLED(A)   -          disk_0       ams_wms0     c8
sdr       ENABLED(A)   -          disk_0       ams_wms0     c3
sdb       ENABLED(A)   -          disk_1       ams_wms0     c8
sdc       ENABLED(A)   -          disk_1       ams_wms0     c3
sdo       ENABLED(A)   -          disk_2       ams_wms0     c8
sdt       ENABLED(A)   -          disk_2       ams_wms0     c3
sdd       DISABLED     -          disk_3       ams_wms0     c8
sdf       ENABLED(A)   -          disk_3       ams_wms0     c3
sdh       ENABLED(A)   -          disk_4       ams_wms0     c8
sdn       ENABLED(A)   -          disk_4       ams_wms0     c3
sde       ENABLED(A)   -          disk_5       ams_wms0     c8
sdi       ENABLED(A)   -          disk_5       ams_wms0     c3
sdj       ENABLED(A)   -          disk_6       ams_wms0     c8
sdp       ENABLED(A)   -          disk_6       ams_wms0     c3
sdq       ENABLED(A)   -          disk_7       ams_wms0     c8
sdu       ENABLED(A)   -          disk_7       ams_wms0     c3
sdg       ENABLED(A)   -          disk_8       ams_wms0     c8
sdl       ENABLED(A)   -          disk_8       ams_wms0     c3
sdm       ENABLED(A)   -          disk_9       ams_wms0     c8
sds       ENABLED(A)   -          disk_9       ams_wms0     c3
sda       ENABLED(A)   -          sda          other_disks  c2
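
On a host with many paths, the disabled entries can be filtered out of the listing. This is a minimal sketch that assumes the column layout shown in Figure 1 (path name first, state second, DMP node name fourth); the function name disabled_paths is hypothetical.

```shell
# List paths that DMP reports as DISABLED, together with their DMP node name.
# Reads "vxdmpadm getsubpaths" output on stdin. The function name is hypothetical.
disabled_paths() {
    awk '$2 ~ /^DISABLED/ { print $1, $4 }'
}

# Usage (not run here): vxdmpadm getsubpaths | disabled_paths
```

For the output in Figure 1, this would report path sdd on DMP node disk_3.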

 



Figure 2 - Using vxdmpadm to check for errors down I/O paths


Syntax:
  1. vxdmpadm iostat start
  2. vxdmpadm -ez iostat show interval=<time_in_seconds> count=<desired_number_of_samples>
  3. vxdmpadm iostat stop

Example, with typical output:


Note: In this example, path sdj appears to be experiencing consistent I/O errors. Check the syslog for references to path sdj to see what errors are being reported.

The first set of output is the cumulative total since the statistics were last reset. To reset the statistics manually, use vxdmpadm iostat reset.



# vxdmpadm -ez iostat show interval=5

                       cpu usage = 36678us    per cpu memory = 192512b
                        ERROR I/Os
PATHNAME             READS    WRITES
sdd                      0         5
sdf                      0         5
sdh                      0         7
sdn                      0         5
sde                      0         5
sdi                      0         7
sdj                      0        10
sdp                      0         8

sdj                      0         2

sdj                      0         6

sdj                      0         3


sdj                      0         1
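
To judge whether a path shows errors consistently rather than once, the number of samples in which each path appears can be counted. This is a minimal sketch that assumes the three-column layout shown in Figure 2 and relies on the -z option suppressing paths with zero errors; the function name error_samples_per_path is hypothetical. Keep in mind that the first sample is cumulative, so every path with historical errors is counted at least once.

```shell
# Count how many samples each path appears in, most frequent first.
# Reads "vxdmpadm -ez iostat show ..." output on stdin.
# The function name is hypothetical.
error_samples_per_path() {
    awk 'NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { seen[$1]++ }
         END { for (p in seen) print p, seen[p] }' | sort -k2,2nr -k1,1
}

# Usage (not run here):
#   vxdmpadm -ez iostat show interval=5 count=5 | error_samples_per_path
```

A path whose count is close to the number of samples taken, such as sdj in Figure 2, is the one to look for in the syslog.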
 


 




Filesystem Fragmentation



Filesystem fragmentation causes data blocks to be scattered through a filesystem in a non-contiguous manner. This reduces performance by increasing the amount of time and disk movement that is required to access data blocks. When troubleshooting performance, use /opt/VRTS/bin/fsadm to check for VxFS filesystem fragmentation.


More information about using fsadm to analyze and defragment a filesystem can be found here:

"How to interpret directory and extent fragmentation report from fsadm -E and fsadm -D output"
http://www.symantec.com/docs/TECH162195


 




Volume and subdisk bottlenecks



Use vxprint to display the objects that are contained by the diskgroup (Figure 3).


From the vxprint output in Figure 3, notice that:

  • Disk group "datadg" has three volumes: "engvol," "hrvol" and "locks."
  • Each volume has one subdisk: "datadg01-02," "datadg01-01" and "datadg02-01," respectively.
  • Two of the subdisks, "datadg01-02" and "datadg01-01," reside on the same disk: "datadg01."
  • The third subdisk, "datadg02-01," resides on its own disk: "datadg02."

Note: A subdisk is simply a contiguous "piece" of a volume. A volume that spans two disks is typically broken into two subdisks. A volume that only resides on a single disk might only have one subdisk, but this can vary depending on the volume structure. Subdisks are tagged with an "sd" by vxprint.




Figure 3 - Using vxprint to display a diskgroup


Syntax:

vxprint -ht


Example, with typical output:

# vxprint -ht

dg datadg       default      default  10000    1336408747.34.Server101

dm datadg01     disk_3       auto     65536    2027264  -
dm datadg02     disk_4       auto     65536    2027264  -

v  engvol       -            ENABLED  ACTIVE   819200   SELECT    -        fsgen
pl engvol-01    engvol       ENABLED  ACTIVE   819200   CONCAT    -        RW
sd datadg01-02  engvol-01    datadg01 1024000  819200   0         disk_3   ENA

v  hrvol        -            ENABLED  ACTIVE   1024000  SELECT    -        fsgen
pl hrvol-01     hrvol        ENABLED  ACTIVE   1024000  CONCAT    -        RW
sd datadg01-01  hrvol-01     datadg01 0        1024000  0         disk_3   ENA

v  locks        -            ENABLED  ACTIVE   102400   SELECT    -        fsgen
pl locks-01     locks        ENABLED  ACTIVE   102400   CONCAT    -        RW
sd datadg02-01  locks-01     datadg02 0        102400   0         disk_4   ENA
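
In a larger disk group, finding disks that host more than one subdisk by eye is tedious. This is a minimal sketch that assumes the "sd" record layout shown in Figure 3, where the fourth field is the disk media name; the function name shared_disks is hypothetical.

```shell
# Print each disk that hosts more than one subdisk, with the subdisk count.
# Reads "vxprint -ht" output on stdin. The function name is hypothetical.
shared_disks() {
    awk '$1 == "sd" { count[$4]++ }
         END { for (d in count) if (count[d] > 1) print d, count[d] }' | sort
}

# Usage (not run here): vxprint -g datadg -ht | shared_disks
```

For the output in Figure 3, this would report datadg01 with two subdisks. A shared disk is only a problem if the volumes on it are busy at the same time, which is what vxstat helps to determine.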

 


Use vxstat to gather I/O performance statistics about this disk group (Figure 4):

In particular, look for bottlenecks:

  • Does vxstat show that multiple, busy volumes (or subdisks) reside on the same disk? Moving a busy volume, or subdisk, to its own disk may improve performance.
  • Does vxstat show that the I/O is composed of significantly more read operations than write operations? Mirroring a volume often improves read performance. However, mirroring also usually degrades the write performance slightly due to the increased work required to maintain multiple copies of the data.


For example, the vxstat output in Figure 4 shows that disk "datadg01" has virtually all of the I/O activity, while disk "datadg02" has none. Recall from Figure 3 that both volumes "hrvol" and "engvol" reside on disk "datadg01," while volume "locks" has disk "datadg02" to itself. In this example, performance may be improved by simply moving either "engvol" or "hrvol" to another disk. Also, notice that most of the I/O is composed of write operations. In this case, mirroring either volume for performance reasons is not recommended.


Note: In this article, the term "disk" is used in a generic sense. A "disk" that is presented across a SAN typically refers to a LUN, which is associated with a logical group of multiple, physical disks.

When moving a subdisk for performance reasons, the target LUN should reside on a different set of physical "spindles" (individual, physical disks) than the source LUN. Moving a subdisk to a target LUN that uses the same spindles as the source LUN is unlikely to improve performance because the same physical spindles are still being used by both subdisks. This undermines the purpose of moving the subdisk.

 




Figure 4 - Using vxstat to gather performance statistics about a disk group


Syntax:

vxstat -g <diskgroup> -vpsduh -i <time_interval> -c <number_of_samples_to_gather>


Example, with typical output:


Note: The first sample is the cumulative total since the statistics were last reset. To reset the statistics manually, use vxstat -g <diskgroup> -r.



# vxstat -g datadg -vpsduh -i30 -c3

                      OPERATIONS          BYTES           AVG TIME(ms)
TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE

Tue 09 Apr 2013 10:33:42 AM PDT
dm  datadg01           217     97528      638k     6068m  13.53   3.54
dm  datadg02             0         0         0         0   0.00   0.00
vol engvol              93     47796      268k     2957m  16.54  78.72
pl  engvol-01           93     47796      268k     2957m  16.54  78.72
sd  datadg01-02         93     47796      268k     2957m  16.54  78.72
vol hrvol               93     49580      268k        3g  13.29  17.59
pl  hrvol-01            93     49580      268k        3g  13.29  17.59
sd  datadg01-01         93     49580      268k        3g  13.29  17.59
vol locks                0         0         0         0   0.00   0.00
pl  locks-01             0         0         0         0   0.00   0.00
sd  datadg02-01          0         0         0         0   0.00   0.00

Tue 09 Apr 2013 10:34:12 AM PDT
dm  datadg01             0     11580         0      718m   0.00  51.32
dm  datadg02             0         0         0         0   0.00   0.00
vol engvol               0      1089         0       68m   0.00 418.25
pl  engvol-01            0      1089         0       68m   0.00 418.25
sd  datadg01-02          0      1089         0       68m   0.00 418.25
vol hrvol                0     10491         0      651m   0.00  13.23
pl  hrvol-01             0     10491         0      651m   0.00  13.23
sd  datadg01-01          0     10491         0      651m   0.00  13.23
vol locks                0         0         0         0   0.00   0.00
pl  locks-01             0         0         0         0   0.00   0.00
sd  datadg02-01          0         0         0         0   0.00   0.00

Tue 09 Apr 2013 10:34:42 AM PDT
dm  datadg01             0     10130         0      629m   0.00 367.85
dm  datadg02             0         0         0         0   0.00   0.00
dm  datadg03             0         0         0         0   0.00   0.00
dm  datadg04             0         0         0         0   0.00   0.00
vol engvol               0      4445         0      276m   0.00 819.02
pl  engvol-01            0      4445         0      276m   0.00 819.02
sd  datadg01-02          0      4445         0      276m   0.00 819.02
vol hrvol                0      5685         0      353m   0.00  15.09
pl  hrvol-01             0      5685         0      353m   0.00  15.09
sd  datadg01-01          0      5685         0      353m   0.00  15.09
vol locks                0         0         0         0   0.00   0.00
pl  locks-01             0         0         0         0   0.00   0.00
sd  datadg02-01          0         0         0         0   0.00   0.00
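
Volumes with unusually high write latency can be pulled out of the samples with a short filter. This is a minimal sketch that assumes the eight-column layout shown in Figure 4, where the last field is the average write time in milliseconds. The function name slow_writers and the default limit of 100 ms are assumptions chosen for illustration, not Veritas recommendations.

```shell
# Print volumes whose average write time exceeds a limit in ms (default 100).
# Reads "vxstat -g <diskgroup> -vpsduh ..." output on stdin.
# The function name and the default limit are hypothetical.
slow_writers() {
    awk -v limit="${1:-100}" '
        $1 == "vol" && $8 + 0 > limit { print $2, $8 }'
}

# Usage (not run here): vxstat -g datadg -vpsduh -i30 -c3 | slow_writers 100
```

Applied to Figure 4, this would report "engvol" in the second and third samples, while "hrvol" stays well below the limit even though both share disk "datadg01."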

 





 

Stripe set performance considerations



By striping data across multiple spindles (physical disks), I/O can be processed in parallel, increasing performance. vxtrace can be used to analyze the characteristics of the I/O that is being issued to a volume. This is useful for distinguishing random I/O from sequential I/O, for determining the typical length (in sectors) of each I/O transaction, and for seeing how the I/O is split across multiple columns.


More information about stripe set performance can be found here:

"Stripe set performance considerations in Veritas Storage Foundation"
http://www.symantec.com/docs/TECH204950



 



RAID Contention


 

Many disk arrays have their own built-in RAID capability. A single "disk," or LUN, that is presented from a disk array may actually be a group of several hardware spindles (physical disks) that are a part of a RAID set. This creates the possibility that volume performance may be affected by multiple RAID configurations at the same time: one on the hardware layer (controlled by the disk array) and one on the software layer (controlled by Veritas). When configuring a RAID set, it is important to consider how a RAID layout at one layer will affect the performance of the other layer.

For example, configuring a RAID-5 set within Veritas, using LUNs that are also a part of a RAID-5 set within the disk array, will likely result in contention between the two RAID logics, decreasing performance, creating additional work for the disk spindles and increasing the chance of a hardware failure.

Alternatively, it is common to combine striping without parity (RAID-0) and mirroring (RAID-1) into configurations that improve both performance and data availability.






DMP I/O policy



Review the DMP I/O policy for the disks. In some cases, switching to a different I/O policy may improve performance. For disk arrays that support "active/active" multipathing, "MinimumQ" (also known as "Least Queue Depth") is the default I/O policy, and it often provides the best I/O performance with little configuration required. However, the appropriate policy will depend on the environment and the type of I/O.


More information about changing the DMP I/O policy can be found here:

"How to change the DMP I/O policy and monitor for performance"
http://www.symantec.com/docs/TECH146325







Mount options


The mount options for a volume can have a significant impact on performance. In particular, adjusting the intent log mount option may increase or decrease performance by 15-20 percent. Currently, the default intent log mount option is "delaylog."

Use mount to determine the current mount options for a volume (Figure 5). Mount options can be changed by unmounting and remounting the volume while specifying the desired option. This can be done manually, using mount, or by modifying a system configuration file, such as /etc/fstab (or /etc/vfstab).


Figure 5 - Using mount to display the current mount options


Syntax:

mount | grep -i vxfs


Example, with typical output:

# mount | grep -i vxfs

/dev/vx/dsk/datadg/vol1 on /vol1 type vxfs (rw,delaylog,largefiles,ioerror=mwdisable)
/dev/vx/dsk/datadg/locks on /var/tmp/locks type vxfs (rw,delaylog,largefiles,ioerror=mwdisable)
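
The intent log option can be extracted from this listing for each VxFS mount point. This is a minimal sketch that assumes the Linux-style mount output shown in Figure 5 (device, "on," mount point, "type," file system type, then the options in parentheses) and mount points without spaces; other platforms print a different layout. The function name intent_log_mode is hypothetical.

```shell
# Print each VxFS mount point with its intent log option (log, delaylog or
# tmplog), or "unknown" if none of them is listed.
# Reads Linux-style "mount" output on stdin. The function name is hypothetical.
intent_log_mode() {
    awk '$5 == "vxfs" {
        opts = $6
        gsub(/[()]/, "", opts)            # strip the surrounding parentheses
        n = split(opts, o, ",")
        mode = "unknown"
        for (i = 1; i <= n; i++)
            if (o[i] == "log" || o[i] == "delaylog" || o[i] == "tmplog")
                mode = o[i]
        print $3, mode
    }'
}

# Usage (not run here): mount | intent_log_mode
```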

 




Table 2 - Basic intent log mount options

log
  Description: Writes are not acknowledged until the data has actually been written to the disk.
  Performance considerations:
  • 15-20 percent slower performance when compared to delaylog.
  • Greatest level of data integrity.

delaylog
  Description: Some writes are first written to filesystem cache and then later committed to the disk, after a slight delay.
  Performance considerations:
  • 15-20 percent faster performance when compared to log.
  • If a volume is dismounted ungracefully, the most recent writes may be lost.
  • The default setting in current versions.

tmplog
  Description: Writes are only committed to the disk when the kernel write buffer is full.
  Performance considerations:
  • Even faster performance than delaylog.
  • Much greater risk of losing recent writes. Only recommended for temporary data.

 


A detailed explanation of each of these mount options, along with information on other mount options, can be found here:

"Mounting a VxFS file system" (from the Veritas Storage Foundation 6.0 Administrator's Guide for Solaris)
https://sort.symantec.com/public/documents/sfha/6.0.1/solaris/productguides/html/sf_admin/ch07s03.htm

 




Filesystem allocation unit size



The default file system allocation unit size for VxFS, commonly referred to as the "block size," is 1 KB for file systems that are smaller than 1 TB. For file systems that are 1 TB or larger, the default is 8 KB.

When creating a file system, it is possible to specify a file allocation unit size by using the "-o bsize" argument with /opt/VRTS/bin/mkfs (Figure 6).

As a rough guideline, a file system with a smaller block size tends to be the most efficient for a volume that is primarily composed of small files. For a volume that contains mostly larger files, use a larger block size. Some application vendors provide recommendations for an optimal filesystem block size. It is usually best to follow their guidelines when creating a new filesystem. Ultimately, the best way to determine the optimal block size is to use benchmarking tools to measure the performance of a file system, at different block sizes, before a volume is placed into production.


Note: Changing the file system allocation unit size requires reformatting the volume.


Figure 6 - Specifying a file allocation unit size with /opt/VRTS/bin/mkfs


Syntax:

/opt/VRTS/bin/mkfs -t|F vxfs -o bsize=<desired_block_size> <path_to_volume>


Example, with typical output:

# /opt/VRTS/bin/mkfs -t vxfs -o bsize=4096 /dev/vx/rdsk/datadg/mgmtvol

    version 9 layout
    102400 sectors, 12800 blocks of size 4096, log size 256 blocks
    rcq size 256 blocks
    largefiles supported

 




Use fstyp to determine the filesystem block size (Figure 7).

Figure 7 - Using fstyp to determine the file system block size


Syntax:

fstyp -t|F vxfs -v <path_to_volume>


Example, with typical output:

# fstyp -t vxfs -v /dev/vx/rdsk/datadg/mgmtvol

vxfs
magic a501fcf5  version 9  ctime Wed 10 Apr 2013 11:37:59 AM PDT
logstart 0  logend 0
bsize  4096 size  12800 dsize  12800  ninode 0  nau 0
defiextsize 0  ilbsize 0  immedlen 96  ndaddr 10
aufirst 0  emap 0  imap 0  iextop 0  istart 0
bstart 0  femap 0  fimap 0  fiextop 0  fistart 0  fbstart 0
nindir 2048  aulen 32768  auimlen 0  auemlen 2
auilen 0  aupad 0  aublocks 32768  maxtier 15
inopb 16  inopau 0  ndiripau 0  iaddrlen 2   bshift 12
inoshift 4  bmask fffff000  boffmask fff  checksum f66f5f0c
oltext1 14  oltext2 1030  oltsize 1  checksum2 0
free 11993  ifree 0
efree  1 0 0 1 1 2 2 0 2 2 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
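
Because the block size is buried in a long superblock dump, it can be convenient to extract only that value. This is a minimal sketch that assumes the "bsize" label is followed by its value on the same line, as in Figure 7; the function name vxfs_block_size is hypothetical.

```shell
# Print the value that follows the first "bsize" label in the output.
# Reads "fstyp -t vxfs -v <path_to_volume>" output on stdin.
# The function name is hypothetical.
vxfs_block_size() {
    awk '{
        for (i = 1; i < NF; i++)
            if (!found && $i == "bsize") { print $(i + 1); found = 1 }
    }'
}

# Usage (not run here):
#   fstyp -t vxfs -v /dev/vx/rdsk/datadg/mgmtvol | vxfs_block_size
```

For the output in Figure 7, this would print 4096, matching the value that was passed to mkfs in Figure 6.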

 

 

 



