In storage infrastructures, I/O bottlenecks and the resulting loss of performance typically appear at the interconnection points where multiple sources of I/O traffic meet, such as the HBAs on the host, the storage array controller, and the storage array ports. In VMware vSphere environments, the solution currently available for such I/O bottlenecks is to move the data to another storage array. However, this data movement is time-consuming and is impractical when the I/O traffic landscape changes frequently.
Because VxDMP can establish the relationship between the LUNs and the transport links (HBA to storage array port) used for the I/O traffic across multiple hosts, it gives the vSphere administrator visibility into the usage of a particular shared storage component (HBA, storage array controller, storage array port, etc.). VxDMP therefore enables the virtual infrastructure administrator to keep all the components in the infrastructure evenly loaded, avoiding traffic congestion and large latencies at various levels. If imbalances do occur, it also provides methods for resolving them by adjusting how traffic flows, without requiring any data movement.
The following steps outline how the virtual infrastructure administrator can use the VxDMP UI to identify and eliminate I/O bottlenecks when a particular virtual machine or datastore shows large I/O latencies.
- After identifying the host and datastore on which the virtual machine is currently running, access the VxDMP tab associated with the host in the vSphere client to see the various storage array instances (also called enclosures) connected to the host, along with the I/O load distribution indicating how the arrays are loading the I/O channels (HBAs) available to the host.
This data can also be refreshed automatically at regular intervals to view real-time operational statistics. The same data is gathered by Veritas Operations Manager (VoM) running in the environment to provide long-term I/O traffic trends.
- Select the HBA view to note the I/O load distribution across the HBAs and note any imbalances that need to be corrected to address high latencies seen for the LUNs underneath the given virtual machine or datastore.
- Similarly, check the array-port-based traffic distribution for the storage array that houses the LUNs associated with the virtual machine or datastore for any I/O load imbalances. VxDMP is intelligent enough to send less traffic to ports that are congested and therefore show large latencies.
- Once it’s confirmed that the I/O traffic distribution from the host is balanced, check if the bottleneck is at the storage side by switching to the VxDMP tab within the Data Center in the vSphere client UI and selecting the storage array for the LUNs being analyzed.
This view provides a complete picture of the ESX hosts using the storage array. A bird’s-eye view of the I/O load distribution across the storage array ports is shown as a pie chart for quick analysis. A finer-grained distribution can be obtained from the histogram, which shows the I/O traffic from all the hosts connected to the storage array, filtered by storage processor (or controller) and storage array port. This also provides a mapping of the hosts connected to each storage array controller and array port without needing to use the storage array management interface directly.
For each of the above subsets (hosts filtered by storage processor/ports), a more detailed distribution at the virtual machine and virtual disk level can be obtained in a popup window by clicking on the histogram bar.
All I/O statistics can be exported in CSV format for additional analysis.
This information provides visibility into why a particular storage port, storage controller, or storage array is bottlenecked, and it also lists the VMware entities that are contributing to or being affected by the bottleneck.
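As an illustration of the kind of additional analysis an exported CSV allows, here is a minimal Python sketch that computes each array port's share of the total I/O and flags ports carrying noticeably more than an even split. The column names (`host`, `array_port`, `io_count`), the sample rows, and the 10% tolerance are assumptions made for this example; the actual layout of the VxDMP export may differ and should be checked before adapting it.

```python
import csv
import io
from collections import defaultdict


def port_shares(csv_text, port_col="array_port", io_col="io_count"):
    """Return {port: fraction of total I/O} from CSV text.

    The column names are hypothetical; adjust them to the real export.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[port_col]] += float(row[io_col])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {port: 0.0 for port in totals}
    return {port: count / grand_total for port, count in totals.items()}


def overloaded_ports(shares, tolerance=0.10):
    """List ports whose share exceeds an even split by more than `tolerance`."""
    if not shares:
        return []
    even_share = 1.0 / len(shares)
    return sorted(p for p, s in shares.items() if s > even_share + tolerance)


# Made-up sample data standing in for an exported statistics file.
sample = """host,array_port,io_count
esx01,A0,700
esx02,A0,500
esx01,A1,200
esx02,B0,300
esx03,B1,300
"""

shares = port_shares(sample)
print(overloaded_ports(shares))  # ['A0']
```

In this made-up sample, port A0 carries 60% of the I/O against an even share of 25%, so it is the one flagged for a closer look.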
- Once the problem area has been identified, one of the easiest ways for the vSphere administrator to balance the load is to put the paths of certain LUNs on ‘standby’. Standby paths remain ‘available’ for I/O but are not actively used unless all other available paths have either ‘failed’ due to loss of connectivity or been manually ‘disabled’ by the administrator.
- Another way to balance the I/O load is to check whether the set of LUNs that belong to the given virtual machine or datastore is accessible from another ESX host using a different set of storage array ports or storage controllers, which provides an alternate storage access path for the same virtual machine or storage LUN.
The VxDMP vSphere UI plugin provides this information at the data center level. The storage array view under the VxDMP tab shows all the LUNs exposed by the storage array to VMware ESX hosts in the data center, along with the DMP-discovered attributes of the LUNs. By selecting one or more LUNs that make up the virtual machine or datastore and choosing the ‘show connected hosts’ option from the right-click menu, one can view the hosts sharing that set of LUNs, the VMs (if any) using them on each host, and the exact storage access paths (storage controller and port) being used.
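The ‘standby’ rule described in the steps above can be modelled in a few lines of Python. This is only a sketch of the behaviour as the text describes it, not VxDMP's implementation; the `Path` class, the state strings, and the path names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    state: str  # "active", "standby", "disabled", or "failed"


def usable_paths(paths):
    """Return the paths that should carry I/O under the rule in the text.

    Active paths are preferred. Standby paths are used only when no
    active path remains (all others have failed or been disabled).
    """
    active = [p for p in paths if p.state == "active"]
    if active:
        return active
    return [p for p in paths if p.state == "standby"]


paths = [
    Path("hba1-portA0", "active"),
    Path("hba1-portA1", "standby"),
    Path("hba2-portB0", "active"),
]
print([p.name for p in usable_paths(paths)])  # ['hba1-portA0', 'hba2-portB0']

# Simulate a connectivity loss on one path and a manual disable on the other.
paths[0].state = "failed"
paths[2].state = "disabled"
print([p.name for p in usable_paths(paths)])  # ['hba1-portA1']
```

The point of the model is that moving a path to standby takes it out of the normal rotation, which shifts load off a congested HBA or array port while still keeping that path as a fallback.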
As can be seen, large I/O latencies and performance degradation typically stem from storage connectivity and do not require data movement to mitigate. A readjustment of connections can do wonders.
I hope this gives a good picture of how VxDMP can help identify and solve the frequently encountered I/O bottleneck problem in VMware vSphere environments. I would like to know what you think, and would appreciate details on how VxDMP has helped you gain storage visibility and manage your storage infrastructure better.