Hadoop High Availability or Highly Available Hadoop

Created: 04 Jun 2012 | Updated: 05 Jun 2012
Rags Srinivasan

For enterprises, what comes first? 

Adopting a solution that is not highly available and then trying to make it enterprise ready, or starting with high availability as a core feature and then adding the power of analytics?

In my past few articles I wrote about the challenges of adopting Hadoop in the enterprise and what it would take to make it enterprise ready. One of the points I highlighted is NameNode high availability, or the lack of it. In the Hadoop Distributed File System (HDFS), the NameNode is the metadata server that holds the location information for data blocks distributed across DataNodes. If the NameNode fails, the cluster becomes unavailable to analytics applications.
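To make the single point of failure concrete, here is a minimal toy model in Python. It is only a sketch: the class and function names (NameNode, read_block, and so on) are hypothetical and are not Hadoop APIs. The assumption is simply that a client must ask the metadata server where a block lives before it can read it.

```python
class NameNodeDown(Exception):
    """Raised when the metadata server cannot be reached."""


class NameNode:
    """Toy metadata server: maps block IDs to the DataNodes holding them."""

    def __init__(self):
        self.block_locations = {}  # block_id -> list of DataNode names
        self.alive = True

    def register_block(self, block_id, datanodes):
        self.block_locations[block_id] = list(datanodes)

    def locate(self, block_id):
        if not self.alive:
            raise NameNodeDown("NameNode is unavailable")
        return self.block_locations[block_id]


def read_block(namenode, block_id):
    """Return the DataNodes holding a block, or None if it cannot be located."""
    try:
        return namenode.locate(block_id)
    except NameNodeDown:
        # The data still sits on the DataNodes, but no client can find it.
        return None


nn = NameNode()
nn.register_block("blk_1", ["dn1", "dn2", "dn3"])

before_failure = read_block(nn, "blk_1")  # block can be located
nn.alive = False                          # simulate a NameNode failure
after_failure = read_block(nn, "blk_1")   # block can no longer be located
```

The block replicas on dn1, dn2 and dn3 are untouched by the failure. What is lost is the only map that says where they are.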

The Hadoop community has been working on a solution to add high availability to HDFS. The solution entails adding a second NameNode with shared storage and changing DataNodes to send block reports to both NameNodes. While this is an acceptable way to make HDFS highly available, it tries to fix the issue by adding complexity rather than by removing the root cause.

This is what I refer to as Hadoop High Availability.
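The active/standby arrangement described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the HDFS implementation: the names (NameNode, DataNode, HACluster) are hypothetical, failover is reduced to a simple swap, and the shared storage and any fencing between the two NameNodes are left out entirely.

```python
class NameNode:
    """Toy metadata server that builds its block map from block reports."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.block_locations = {}  # block_id -> set of DataNode names

    def receive_block_report(self, datanode, block_ids):
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)


class DataNode:
    def __init__(self, name, block_ids):
        self.name = name
        self.block_ids = list(block_ids)

    def send_block_reports(self, namenodes):
        # Report to every reachable NameNode so the standby's map stays current.
        for namenode in namenodes:
            if namenode.alive:
                namenode.receive_block_report(self.name, self.block_ids)


class HACluster:
    """Hypothetical wrapper that routes lookups to whichever NameNode is active."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def locate(self, block_id):
        if not self.active.alive:
            # Simplified failover: promote the standby.
            self.active, self.standby = self.standby, self.active
        if not self.active.alive:
            raise RuntimeError("both NameNodes are down")
        return sorted(self.active.block_locations.get(block_id, ()))


nn1, nn2 = NameNode("nn1"), NameNode("nn2")
cluster = HACluster(active=nn1, standby=nn2)

for dn in (DataNode("dn1", ["blk_1", "blk_2"]), DataNode("dn2", ["blk_1"])):
    dn.send_block_reports([nn1, nn2])

nn1.alive = False                    # the active NameNode fails
locations = cluster.locate("blk_1")  # lookup is served by the former standby
```

Even in this stripped-down form, the extra moving parts are visible: a second metadata server, duplicate reporting from every DataNode, and failover logic that has to decide who is active.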

A different approach is prevention. That is the approach we took with Symantec Enterprise Solution for Hadoop, which is built on Cluster File System. We start with enterprise-grade infrastructure (hardware and software) that is highly available and provide the ability to run analytics on it.

In this solution we provide a software layer that replaces the Hadoop Distributed File System (HDFS) and, with it, its limitations. In other words, there is no NameNode, hence no single point of failure and none of the code complexity needed to make it highly available. This implementation is protocol compatible with HDFS and seamlessly supports the rest of the Hadoop stack (such as MapReduce). This allows us to deliver an enterprise-ready Big Data solution without trade-offs.

This is what I refer to as Highly Available Hadoop.

Our position is that enterprises need not sacrifice high availability to take advantage of analytics, nor add high availability as an afterthought. Choose Highly Available Hadoop.