Enterprises seek to capitalize on their data growth by mining and analyzing it for business insights. While Hadoop excels at large-volume analytics, it falls short of enterprise requirements for storage and high availability. Symantec's Big Data solution leverages existing infrastructure to make Hadoop enterprise-ready without contributing to data center sprawl.
Big data is the buzzword du jour and for good reason. Mountains of sensor data, social media posts, online purchase transactions, and other forms of unstructured data are adding to already burgeoning corporate data stores.
Beyond volume, however, the big data explosion has two other essential elements: the data is arriving with increasing velocity, typically in real time, and it encompasses an eclectic variety, from audio and video to click streams to log files and beyond. The big data movement shows no signs of slowing down. In fact, market research firm International Data Corp. forecasts that the big data market will skyrocket from $3.2 billion in 2010 to $16.9 billion in 2015.¹
So what's the big deal behind all of this big data? Armed with the right tools, companies can mine the data to unlock invaluable business insights, for instance helping financial institutions detect fraud across high transaction volumes or enabling medical institutions to uncover patterns that lead to cures for disease. Traditional relational database management tools can't handle the volume, velocity, and variety of the data types that feed that kind of analysis. Enter a new breed of data mining tools, which support computationally intensive analytics like clustering and targeting, can parse massive amounts of both structured and unstructured data, and scale accordingly. As a result, enterprises are scrambling to put solutions in place that allow them to profit from big data.
The Not-So-Enterprise-Ready Hadoop
A linchpin in the new big data lineup is Apache Hadoop, a distributed framework for processing large data sets. While Hadoop excels at large-volume analytics, it presents severe limitations for the enterprise, particularly in relation to storage. Because Hadoop clusters are built from numerous commodity hosts, each with its own direct-attached storage, organizations are faced with building out massively parallel systems with potentially thousands of compute nodes, at a time when server consolidation, virtualization, and private clouds have gained traction as ways to shrink the data center footprint. Moreover, to ensure some level of resiliency, Hadoop stores three copies of everything, which translates into three times as much storage, again compounding the problem of data center server and storage sprawl.
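To make the "three times as much storage" point concrete, here is a minimal back-of-the-envelope sketch in Python. The three-copy factor comes from the text above; the data volume (100 TB) and the per-node capacity (12 TB) are hypothetical figures chosen only for illustration, and the function names are not part of any Hadoop or Symantec API.

```python
import math


def raw_storage_tb(logical_tb: float, copies: int = 3) -> float:
    """Raw disk capacity needed to hold `logical_tb` of data when every
    block is stored `copies` times (three copies, per the text above)."""
    if logical_tb < 0 or copies < 1:
        raise ValueError("logical_tb must be >= 0 and copies must be >= 1")
    return logical_tb * copies


def nodes_needed(raw_tb: float, usable_tb_per_node: float) -> int:
    """Number of direct-attached-storage nodes required to hold `raw_tb`.
    `usable_tb_per_node` is a hypothetical per-host capacity."""
    if usable_tb_per_node <= 0:
        raise ValueError("usable_tb_per_node must be positive")
    return math.ceil(raw_tb / usable_tb_per_node)


if __name__ == "__main__":
    logical = 100      # TB of data to analyze (assumed)
    per_node = 12      # TB of usable disk per commodity host (assumed)

    tripled = raw_storage_tb(logical)            # three copies
    single = raw_storage_tb(logical, copies=1)   # one copy, for comparison

    print(f"Three copies: {tripled} TB raw, {nodes_needed(tripled, per_node)} nodes")
    print(f"One copy:     {single} TB raw, {nodes_needed(single, per_node)} nodes")
```

Under these assumed figures, the same 100 TB of data needs 300 TB of raw disk across 25 hosts with three copies, versus 100 TB across 9 hosts with a single copy, which is the sprawl the paragraph describes.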
Hadoop's other Achilles' heel is that it's not highly available. Data is distributed across multiple nodes, but there is only one NameNode, or metadata server, in the cluster. As a result, all applications must pass through this single point to access data, creating both a performance bottleneck and a single point of failure. On top of that, data has to be migrated to the Hadoop cluster for analysis, and the platform's batch-processing model demands significant data movement, adding to its cost and complexity. Finally, Hadoop clusters lack reliable backup solutions, and the three-copy workaround doesn't address the enterprise requirement for archiving or point-in-time recovery.
Cluster File System To The Rescue
Recognizing these limitations, Symantec set out to develop a solution to allow enterprises to take advantage of Hadoop while leveraging their existing infrastructure. The system enables data to remain in source systems for analytics processing, avoiding the expense of migrating data to Hadoop; at the same time, the Symantec solution also mitigates Hadoop's single point of failure and enables use of standard backup and archival tools.
Called the Symantec Enterprise Solution for Hadoop, the offering is built on the Cluster File System, a high-performance file system for fast failover of applications and databases. Running Hadoop on the Cluster File System enables organizations to:
- Run analytics wherever the data sits, eliminating costly data moves, while scaling up to 16 petabytes of data for analysis, including structured, unstructured, and media information.
- Apply standard enterprise backup and storage capabilities like snapshots, deduplication, and compression to make Hadoop storage utilization more efficient, addressing the environment's "triple the storage" limitation.
- Use existing infrastructure, including the majority of storage arrays, so they don't have to purchase any additional hardware to run analytics.
The Symantec Enterprise Solution for Hadoop, slated for availability in the fall, can help enterprises tap into the value of Hadoop-powered analytics without adding to data center inefficiencies or incurring costly infrastructure upgrades.
to get early access to the software and learn how Symantec can help you leverage existing infrastructure to accommodate the brave new world of big data analytics.