
Storage & Clustering Community Blog

Rags Srinivasan | 21 hours 39 min ago | 1 comment

The biggest driver for adoption of Hadoop is its promise of unlocking value from an enterprise’s vast data store. Use cases that show incremental revenue from data analysis are very well publicized. Every organization strives to achieve that and wants to leverage the power of data analytics to drive its revenues. Promises aside, Hadoop storage has severe issues that call into question its place in the enterprise datacenter.

  1. Increased storage and server sprawl – A Hadoop cluster is built with numerous commodity hosts, each with its own direct-attached storage. Just when datacenter architects have spent considerable time and resources consolidating their datacenters and reducing footprint through server consolidation, virtualization and private cloud, Hadoop requires them to build out a massively parallel system with hundreds or even thousands of compute nodes. Managing these numerous nodes and keeping them up to date...

Rags Srinivasan | 22 May 2012 | 0 comments

The siren song of Big Data analysis is,

"Don't filter data before you collect, don't try to decide whether or not certain data is relevant, collect everything. Analysis of such large volumes of data is bound to find something interesting".

Let us look at a nice, simple study reported recently about cyclists wearing helmets. This comes to us from an article in The Wall Street Journal. The main finding is,

"Bike helmets make men ride faster".

The question we need to ask about such a causation claim is how the study was conducted. The study falls in the category of Big Data analysis that we see conducted with large volumes of unrelated data, just because the data is available.

Data was collected daily at seven locations, each equipped with two cameras programmed to detect moving objects...

Rags Srinivasan | 18 May 2012 | 0 comments

Suppose you read the following headline in a major newspaper. What would you think?

Student Test Scores Tied to Number of Bathrooms in their Homes

Let us say this article is also accompanied by a chart showing this relation:

 

Look at those near-perfect correlations. Should we start adding more bathrooms to help our children?

Except there is no such study, though a real one comes very close. The x-axis is actually the income level of the family. While we see a nice positive correlation between income and test scores, Harvard Economics Professor Greg Mankiw warns us about the spurious correlation using...
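The bathroom headline can be reproduced with a small simulation. The sketch below is illustrative only: the data is synthetic, the coefficients are made up, and the function names (`simulate`, `raw_and_partial`) are hypothetical. It assumes income drives both the number of bathrooms and the test scores, while bathrooms have no direct effect on scores. It then compares the raw correlation with the correlation that remains once income is held constant.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Synthetic households: income drives both bathrooms and test scores."""
    rng = random.Random(seed)
    income = [rng.uniform(20, 200) for _ in range(n)]  # made-up units
    # Assumption: bathrooms depend on income plus noise.
    bathrooms = [1 + inc / 50 + rng.gauss(0, 0.5) for inc in income]
    # Assumption: scores depend on income plus noise, never on bathrooms.
    scores = [400 + inc + rng.gauss(0, 40) for inc in income]
    return income, bathrooms, scores


def raw_and_partial(n=5000, seed=0):
    """Return (raw correlation, correlation with income held constant)."""
    income, bathrooms, scores = simulate(n, seed)
    raw = pearson(bathrooms, scores)
    partial = pearson(residuals(bathrooms, income), residuals(scores, income))
    return raw, partial


if __name__ == "__main__":
    raw, partial = raw_and_partial()
    print(f"bathrooms vs. scores, raw correlation:      {raw:.2f}")
    print(f"bathrooms vs. scores, income held constant: {partial:.2f}")
```

Under these assumptions the raw correlation is strong, but it collapses toward zero once the shared driver is removed. A strong correlation alone therefore says nothing about what adding bathrooms would do.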

Rags Srinivasan | 18 May 2012 | 0 comments

IDC analysts Jean S. Bozman and Laura DuBois published their latest analysis from the Symantec Vision conference. In the May 15th IDC LINK (subscription required), they had this to say about Symantec's solution (bold text mine):

Big Data. Symantec is readying a product that leverages its clustering file system (CFS) to manage Hadoop-style workloads for the enterprise, through compatible APIs. The solution, which is designed to enable datacenters to leverage open-source Hadoop for enterprise workloads with high availability, will use customers' existing infrastructure. Although the broad outlines for this offering were discussed at VISION during technical sessions, this product would be...

Rags Srinivasan | 17 May 2012 | 0 comments

Does Big Data (big on volume and variety) mean better insights?
Taking this to the extreme, does Big Data eliminate application of thought?

A recent New York Times article writes,

Big Data, which should probably be called Big Analysis, is about looking at that information in novel ways to find new patterns for prediction.

I agree with their call, but let us not try to change accepted terminology. What the Times article states is that we are able to look at data in new ways with newer tools. The added value comes more from analytics applications that help in answering the question at hand.

At the extreme I refer to above, we see those who favor relying on the volume of data, on the Bigness of Big Data, to tell us what to do. The next logical step for them is to include every possible data source and every bit of data in the analysis...

Rags Srinivasan | 16 May 2012 | 0 comments

Columbia Business School’s Center on Global Brand Leadership and the New York American Marketing Association (NYAMA) recently published their research on the role of data and analytics in marketing. Their report, titled "Marketing ROI in the Era of Big Data", provides key insights into how enterprises are applying Big Data analytics in their marketing decisions.

To me the most important finding of this study is the gap between desire and reality.

While 91% want to be data-driven in their decisions, this is not yet reflected in practice.

Enterprises have been collecting data for a long time. What has changed now is the volume, the different types of data (variety) and how fast it is changing (velocity). In my own conversations I find many enterprises believe more data is not...

Rags Srinivasan | 15 May 2012 | 2 comments

Hadoop is an open source solution from Apache for managing and analyzing Big Data. Its scale-out architecture enables analyzing large volumes of data to find key business insights. Enterprises are turning to Hadoop for its agility and flexibility in data analysis. Hadoop enables enterprises to make sense of varieties of data, from structured to unstructured, and to ask questions that traditional tools could not answer.

However, Hadoop's Distributed File System (HDFS) has a weakness that makes it unattractive to enterprise datacenters. HDFS's metadata server, NameNode, is a single point of failure. When NameNode fails, applications lose access to data stored in many different DataNodes. In a long-running analytical application, a single failure can prevent enterprises from getting timely business insights.
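A toy model makes the failure mode concrete. The sketch below is not the Hadoop API; the class and function names (`ToyNameNode`, `write_file`, `read_file`) are hypothetical and the design is deliberately simplified. It assumes only that every read begins with a metadata lookup, so that when the single metadata server is down, blocks that are still intact on the data nodes become unreachable.

```python
class NameNodeUnavailable(Exception):
    """Raised when the toy metadata server cannot be reached."""


class ToyNameNode:
    """Holds only metadata: which (datanode, block_id) pairs make up a file."""

    def __init__(self):
        self.alive = True
        self.block_map = {}

    def locate(self, path):
        if not self.alive:
            raise NameNodeUnavailable(path)
        return self.block_map[path]


def write_file(namenode, datanodes, path, chunks):
    """Spread chunks round-robin over the datanodes and record the locations."""
    names = sorted(datanodes)
    locations = []
    for i, chunk in enumerate(chunks):
        node = names[i % len(names)]
        block_id = f"{path}#{i}"
        datanodes[node][block_id] = chunk
        locations.append((node, block_id))
    namenode.block_map[path] = locations


def read_file(namenode, datanodes, path):
    """Every read starts with a metadata lookup, so it depends on the namenode."""
    return b"".join(datanodes[node][block_id]
                    for node, block_id in namenode.locate(path))


if __name__ == "__main__":
    namenode = ToyNameNode()
    datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}  # each maps block_id -> bytes
    write_file(namenode, datanodes, "/logs/day1", [b"aa", b"bb", b"cc", b"dd"])
    print(read_file(namenode, datanodes, "/logs/day1"))

    namenode.alive = False  # simulate the metadata server failing
    try:
        read_file(namenode, datanodes, "/logs/day1")
    except NameNodeUnavailable:
        stored = sum(len(blocks) for blocks in datanodes.values())
        print(f"read failed, yet {stored} blocks are still on the datanodes")
```

In this model no data is lost when the metadata server fails; what is lost is the only map to it, which is why the availability of that one component governs the availability of the whole store.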

Symantec's recommendation is to completely eliminate this flaw with a solution we are working on for...

Sharad Srivastava | 15 May 2012 | 0 comments

This blog describes the procedure for configuring the Hadoop Namenode for high availability in a Veritas Cluster Server environment using the Agent Builder tool.

Installing and Configuring Hadoop Namenode for High Availability
Below are the tasks that need to be performed to install the Hadoop Namenode for clustering purposes:

  • Allocating shared disk resources: Symantec recommends installing Namenode metadata on a separate and dedicated shared disk resource.
  • Creating disk group, volume, and file system: Create a disk group, volume, and file system on a shared disk resource that is allocated for Namenode metadata.
  • Obtaining dedicated virtual IP address and DNS name: Obtain unique virtual addresses and DNS names for the Namenode instance. This address and name are required to support the...

jmartin | 09 May 2012 | 0 comments

 

On May 6, 2012, Symantec completed another release of Symantec Operations Readiness Tool (SORT)!

With SORT’s focus on improving the total customer experience for NetBackup and Storage Foundation customers, we’ve added the following features and improvements to the website:

Storage Foundation High Availability Solutions:

  • Ability to automatically create product notifications in a general checklist and in a custom report
  • Direct links from the SORT home page to MySORT that are based on System Administrator use cases
  • Addition of remediation commands in risk assessment checks
  • Support for Storage Foundation 6.0RP1 and the Solaris 11 platform
  • Addition of two new Windows risk assessment checks (mirror volume without DRL, volume logging locations and mirrors)
  • Updated process flow through the GUI data collector
  • Notifications to users...

snayak | 08 May 2012 | 0 comments

Storage Foundation snapshots are used extensively for various routine tasks such as backup, continuous access and reporting. VERITAS File System and Volume Manager implement multiple ways of creating a snapshot of data. This presentation gives an overview of, and use cases for, all the snapshot technologies available in VxFS and VxVM, including Storage Checkpoints, FileSnap and volume-level Space Optimized snapshots. Please refer to the attachment for the presentation.