Dynamic Multi Path (DMP) - Historical Perspective and Evolution
The Dynamic Multi Path (DMP) capability was first introduced in the Volume Manager 2.5.7 release, primarily to support active/active (A/A) multi-path arrays from Sun Microsystems. In those days, the early arrays had direct SCSI connections, with thick cables running from the host straight to the array. Unlike present-day arrays, there was no concept of fabric switches, and no Fibre Channel (FC) technology on the arrays. The first DMP could perform only basic multi-path operations: load balancing using the balanced-path I/O policy, and path failure detection and restoration using SCSI inquiry commands. Since the number of devices was a mere handful, error processing and restore processing were single-threaded tasks. Further, as the number of supported arrays was small, the entire device discovery and reconfiguration logic was closely tied to the host operating system.
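To give a feel for the balanced-path idea, here is a minimal sketch, not DMP source code: the policy routes each I/O to a path derived from its block offset, so a given region of the disk consistently uses the same path. The function names, path names, and region size are all hypothetical.

```python
PARTITION_SIZE = 2048  # blocks per region; an assumed tuning value


def select_path(paths, block_offset, partition_size=PARTITION_SIZE):
    """Pick a path deterministically from the I/O's starting block,
    so sequential I/O within one region stays on one path."""
    if not paths:
        raise ValueError("no enabled paths")
    region = block_offset // partition_size
    return paths[region % len(paths)]


paths = ["c1t0d0", "c2t0d0"]
# Two I/Os in the same region take the same path...
assert select_path(paths, 0) == select_path(paths, 100)
# ...while the next region maps to the other path.
assert select_path(paths, 2048) != select_path(paths, 0)
```

Keeping a region pinned to one path helps sequential workloads on direct-attached SCSI arrays, which is one reason this policy later proved a poor fit for fabrics.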
The next major enhancements to DMP came in the Volume Manager 3.1.1 release, which added support for coexistence with the Alternate Path (AP) driver from Sun. DMP used private calls to identify Third Party Driver (TPD) meta-devices and the sub-paths controlled by the TPD. The 3.1.1 release also enabled cluster support in DMP for active/passive (A/P) arrays. Incidentally, the first A/P array supported by DMP was the Nike array from Data General (DG), supported on the HP platform. Storage technology was growing at a fast pace at this time: the number of array vendors had increased, and FC had already set in.
The DMP architecture up to 3.1.1 was not very flexible in adding support for new disk arrays, as it was closely tied to the host operating system and required the host OS to be rebooted. This was alleviated in the next Volume Manager release (3.2), which introduced the concept of the Array Support Library (ASL). The ASL framework made it simple for an array vendor to write a small shared library and link it dynamically with VM, without having to reboot the system. By this time, array technology was moving at a fast pace and FC had almost become the de facto industry standard. A variety of array vendors were in the market, with differing architectures and failover mechanisms. The advent of FC led to an explosion in the number of devices that could be connected to a host, and to the concept of the Storage Area Network (SAN) built from FC switches.
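The plug-in idea behind the ASL can be sketched as follows. This is not the real ASL ABI; it is a toy registry in which each vendor "library" supplies a claim routine that the discovery layer tries against a device's SCSI inquiry data, with no reboot needed to add a new one. All names and the vendor string are hypothetical.

```python
_asls = []  # registered (name, claim-function) pairs


def register_asl(name, claim):
    """Called when a vendor library is loaded at runtime (no reboot)."""
    _asls.append((name, claim))


def discover(inquiry_vendor):
    """Return the name of the first ASL that claims the device,
    based on the vendor field of its SCSI inquiry data."""
    for name, claim in _asls:
        if claim(inquiry_vendor):
            return name
    return "OTHER_DISKS"  # assumed fallback category for unclaimed devices


register_asl("libvxnike", lambda v: v.startswith("DGC"))
assert discover("DGC     ") == "libvxnike"
assert discover("ACME") == "OTHER_DISKS"
```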
Some of these technological advancements were not addressed well by the DMP 3.2 architecture. Supporting A/PF (active/passive with explicit failover) arrays was not a straightforward task. Further, the balanced-path I/O policy, designed primarily for direct SCSI-attached arrays, was not efficient at load balancing for arrays connected via FC. And since the number of devices had increased significantly, full device discovery was proving costly. To get around these problems, the next major release of Volume Manager (4.0) introduced the Array Policy Module (APM), more I/O policies, and partial discovery. The APM, the kernel analogue of the user-land ASL, was tailored to handle array-specific tasks such as initiating failover and supporting array-specific technologies such as NDU (Non-Disruptive Upgrade) from EMC. The set of I/O policies was enhanced to include minimum-queue (minimum-Q), round-robin, priority-based, adaptive, and single-active. To limit discovery to a handful of devices, the concept of partial discovery was introduced. Further, event-based discovery via the Event Source Daemon (ESD) was introduced on Solaris, since Solaris could notify registered consumers about disk events.
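Of the new policies, minimum-Q is the easiest to illustrate: send each I/O down the enabled path with the fewest outstanding requests, which naturally adapts to uneven path speeds in a fabric. The sketch below is illustrative only, with hypothetical names.

```python
def select_min_q(pending):
    """Minimum-queue selection: pending maps path name -> number of
    in-flight I/Os; return the least-loaded path."""
    if not pending:
        raise ValueError("no enabled paths")
    return min(pending, key=pending.get)


pending = {"c1t0d0": 5, "c2t0d0": 2, "c3t0d0": 7}
path = select_min_q(pending)
assert path == "c2t0d0"
pending[path] += 1  # account for the newly issued I/O
```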
Despite these advancements, single-threaded error processing and the dependency on SCSI interfaces posed many hurdles when DMP was deployed in a SAN with a large number of devices. The problems became much more pronounced when a portion of the SAN was disrupted, leading to access failures on a large number of devices at once. The next major release, Volume Manager 5.0, saw a significant architectural change in DMP. The entire framework was restructured for a multi-threaded environment, and DMP did away with its dependency on the SCSI driver during error processing, instead using HBA interfaces directly (e.g. the SCSA interface on Solaris). The primary advantage of the HBA interfaces was their asynchronous mechanism, which made error processing truly multi-threaded and non-blocking. Further, DMP used the SNIA HBA API to gather detailed information about the SAN, and used that information during SAN disruptions to divert I/O onto unaffected paths, thereby minimizing application outage. Device discovery using ASLs was sped up, and DMP began pinning down devices via open caching to prevent data corruption on disks that had been removed without its knowledge. Various throttling and error policies were defined and exposed, enabling users to tune DMP for their applications. All DMP tunables could be changed online, without rebooting the host OS.
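The gain from going multi-threaded can be sketched in a few lines: probe all suspect paths concurrently instead of one at a time, so one hung path no longer stalls the whole error-processing pass. The probe() function below is a stand-in for an HBA-level path health check; everything here is illustrative, not DMP code.

```python
from concurrent.futures import ThreadPoolExecutor


def probe(path):
    # Stand-in for an asynchronous HBA-level path test; in this toy,
    # a path whose name ends in "_dead" is considered failed.
    return path, not path.endswith("_dead")


def process_errors(suspect_paths, workers=8):
    """Probe all suspect paths in parallel; return the set that failed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(probe, suspect_paths))
    return {p for p, ok in results.items() if not ok}


failed = process_errors(["c1t0d0", "c2t0d0_dead", "c3t0d0"])
assert failed == {"c2t0d0_dead"}
```

With sequential probing, total error-processing time grows with the number of suspect paths times the SCSI timeout; with concurrent, non-blocking probes it is bounded by roughly one timeout, which is what mattered during large SAN disruptions.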
In summary, DMP has evolved over the years along with storage technology, leveraging that technology at every juncture, to become one of the best multi-pathing solutions, with the largest Hardware Compatibility List (HCL) covering the majority of tier-I and tier-II arrays. In subsequent articles, I shall describe the technological advancements in DMP in each major release.
-- Ameya P. Usgaonkar