What is High Availability? How available is "Highly Available"? What can VERITAS do to help address availability needs?


High availability is easily misunderstood. It does not mean "always available," nor is it the same as "fault tolerance" or "disaster recovery," although some of the best practices of these disciplines overlap. Fault tolerance, disaster recovery, and high availability each have their place, and they generally complement each other well. VERITAS Software has solutions to help address each of these disciplines.

High availability, as a general rule, is a configuration of hardware and software that monitors the services provided by a system and transfers those services to another system when a failure occurs, including a catastrophic, complete system failure. Such a "failover" can take several minutes to complete, depending on the complexity and size of the hardware and applications involved, and the services are unavailable during that time. Depending on the availability requirements, there may be redundant power sources, cabling, and other equipment in addition to the duplicate computing systems needed for failover of services. VERITAS Volume Manager provides the ability to fail over I/O operations between disks (disk mirroring) and/or controllers (multi-pathing). VERITAS File System provides rapid recovery capabilities in the event of a sudden system failure. VERITAS Cluster Server provides the ability to monitor resources and perform the failover operation between systems when a monitored resource becomes unavailable. VERITAS Global Cluster Manager provides the ability to fail over an entire site full of systems, and the services they are running, to another site when those systems and services are being monitored by VERITAS Cluster Server.
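The monitor-and-transfer idea described above can be illustrated with a minimal sketch. It is a toy model only: the Node and Service classes and the check_and_fail_over method are hypothetical names invented for this example and do not correspond to any VERITAS Cluster Server interface. A real cluster would also have to stop and start applications, move storage and network addresses, and guard against two nodes running the same service.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster member; not a VERITAS Cluster Server object."""
    name: str
    healthy: bool = True


@dataclass
class Service:
    """A monitored service that runs on exactly one node at a time."""
    name: str
    active: Node
    standbys: list = field(default_factory=list)

    def check_and_fail_over(self) -> bool:
        """Run one monitoring pass; return True if a failover happened."""
        if self.active.healthy:
            return False  # service is fine where it is
        for candidate in self.standbys:
            if candidate.healthy:
                # Transfer the service: the first healthy standby
                # becomes the active node.
                self.standbys.remove(candidate)
                self.active = candidate
                return True
        raise RuntimeError(f"no healthy standby for {self.name}")


node_a = Node("node-a")
node_b = Node("node-b")
db = Service("database", active=node_a, standbys=[node_b])

node_a.healthy = False  # simulate a complete system failure
failed_over = db.check_and_fail_over()
print(failed_over, db.active.name)
```

In this sketch the transfer is instantaneous; in practice it is the step that can take several minutes, and the service is down until it finishes.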

Fault tolerance, on the other hand, usually means duplicated hardware within a single system, where each component has a twin that takes over if it fails. Often, the backup hardware operates in lock-step with the active twin, which allows the transition upon failure to occur almost instantly. There is generally no software involved, except possibly for notification and "cleanup" functions, nor is there any protection against catastrophic system failures. Fault tolerance is also usually quite expensive and vendor-specific. VERITAS does not provide a hardware fault tolerance solution, although VERITAS Volume Manager, Cluster Server, and Global Cluster Manager are specifically designed to take advantage of redundant hardware.
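The practical difference between a failover that takes minutes and a lock-step transition that is almost instant can be put into numbers. The following is a rough sketch, not a measurement: the outage count (four per year), the 5-minute failover, and the 1-second lock-step transition are assumed figures chosen only for illustration, and the availability function is a hypothetical helper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(outages_per_year: float, minutes_per_outage: float) -> float:
    """Fraction of the year the service is up, given outage count and length."""
    downtime = outages_per_year * minutes_per_outage
    return 1.0 - downtime / MINUTES_PER_YEAR


# Assumed figures for illustration only: four failures a year,
# a 5-minute failover versus a 1-second lock-step transition.
ha = availability(outages_per_year=4, minutes_per_outage=5)
ft = availability(outages_per_year=4, minutes_per_outage=1 / 60)

print(f"failover:  {ha:.4%}")
print(f"lock-step: {ft:.6%}")
```

Under these assumptions the failover configuration loses 20 minutes a year, which is roughly 99.996% availability, while the lock-step hardware loses only a few seconds. Neither figure accounts for a complete system failure, which fault-tolerant hardware alone does not protect against.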

Disaster recovery is more of a discipline than a technology. In the most complete implementation, it involves the duplication of hardware and software at a remote site, with data replication occurring on a regular basis (if not continuously). In the event of the catastrophic failure of the entire primary site (rather than just a particular machine or service), the remote site can take over the responsibility of providing the services with little or no loss of data. There are many levels of disaster recovery, from simple offsite storage of backups to full-fledged, online replication of data to "warm-startable" duplicate hardware at multiple remote sites. The level of commitment to disaster recovery determines the cost, as well as how long services will be unavailable. VERITAS NetBackup provides the ability to restore data when necessary. VERITAS Volume Manager provides the ability to "break off" and "resynchronize" entire virtual disks, thus allowing backups to occur without affecting the operation of applications. Alternatively, VERITAS File System provides functionality, integrated with VERITAS NetBackup, that allows backups to occur without requiring additional disks equal to the size of the file system. VERITAS Volume Replicator, a separately licensable component of VERITAS Volume Manager, provides the ability to have up-to-date copies of data at multiple, widely separated sites. The combination of VERITAS Volume Replicator, VERITAS Cluster Server, and VERITAS Global Cluster Manager provides the ability to rapidly recover from a site-wide disaster.
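Why periodic replication can lose some data while continuous replication loses "little or none" can be shown with a small sketch. The timeline below is invented for illustration, and writes_lost is a hypothetical helper, not part of VERITAS Volume Replicator: it simply lists the writes made after the last copy reached the remote site and before the primary site failed.

```python
def writes_lost(write_times, last_replication, failure_time):
    """Return writes made after the last replication and up to the site failure."""
    return [t for t in write_times if last_replication < t <= failure_time]


# Hypothetical timeline, in minutes since midnight.
writes = [5, 20, 58, 61, 75, 89]

# Periodic replication every 60 minutes: the last copy was taken at minute 60,
# and the primary site fails at minute 90.
periodic = writes_lost(writes, last_replication=60, failure_time=90)

# Continuous replication: the remote copy is current up to the failure.
continuous = writes_lost(writes, last_replication=90, failure_time=90)

print(periodic, continuous)
```

With the assumed timeline, the three writes after minute 60 exist only at the failed site under periodic replication, while nothing is lost when the remote copy is kept current. The shorter the replication interval, the smaller the window of exposed data, usually at a higher cost.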

So, a "total solution" for the highest possible availability of services is likely to include fault-tolerant hardware; high availability software; multiple power sources, battery backups, data lines, and other cabling and equipment; a remote site set up for disaster recovery; and well-trained, vigilant system operators and administrators with precisely documented procedures for every conceivable contingency. Such a solution is usually very expensive, and it is often outsourced to a company that specializes in maintaining such systems. No matter what level of availability is required, VERITAS has an integrated software solution to enable it.

