Big Data or Big Analysis
Does Big Data (big on volume and variety) mean better insights?
Taking this to the extreme, does Big Data eliminate the application of thought?
A recent New York Times article writes,
Big Data, which should probably be called Big Analysis, is about looking at that information in novel ways to find new patterns for prediction.
I agree with their point, but let us not try to change accepted terminology. What the Times article is saying is that we are now able to look at data in new ways with newer tools. The added value comes more from the analytics applications that help answer the question at hand.
At the extreme I refer to above, we see those who favor relying on the volume of data, on the Bigness of Big Data, to tell us what to do. The next logical step for them is to include every possible data source and every bit of data in the analysis mix. The problem with this "more of everything" approach is the risk of spurious correlations and of tiny statistical anomalies getting magnified. Petabytes of data do not eliminate the application of thought. As Hal Varian, Professor at UC Berkeley and Chief Economist at Google, said,
small samples of large data sets can be entirely reliable proxies for the Big Data
That is because decision makers start with a specific problem in hand and a hypothesis about it. Starting with the hypothesis, they next seek data to test it. For example, a retailer may frame a hypothesis: "a customer's shopping basket is an indication of their marital status". If it holds, they can define better targeting and cross-selling to increase spend per customer. To test this hypothesis they can mine their customer data by choosing a sample from the large database they already have.
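To make the retailer example concrete, here is a minimal sketch of testing that hypothesis on a sample rather than on the whole database. Everything in it is an assumption for illustration: the customer data is simulated, the "family pack" basket indicator and the 45% / 25% purchase rates are invented, and a simple two-proportion z-test stands in for whatever analysis a real team would choose.

```python
import math
import random

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for H0: both groups share the same underlying proportion."""
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (hits_a / n_a - hits_b / n_b) / se

def summarize(rows):
    """Count family-pack buyers among married and unmarried customers."""
    n_married = sum(1 for married, _ in rows if married)
    n_single = len(rows) - n_married
    hits_married = sum(1 for married, pack in rows if married and pack)
    hits_single = sum(1 for married, pack in rows if not married and pack)
    return hits_married, n_married, hits_single, n_single

rng = random.Random(42)

# Simulated customer base: (is_married, basket_has_family_pack).
# The purchase rates below are made up purely for this illustration.
customers = []
for _ in range(200_000):
    married = rng.random() < 0.5
    family_pack = rng.random() < (0.45 if married else 0.25)
    customers.append((married, family_pack))

# Test the hypothesis on a 1% sample instead of the full data set.
sample = rng.sample(customers, 2_000)

hm, nm, hs, ns = summarize(sample)
sample_gap = hm / nm - hs / ns
z = two_proportion_z(hm, nm, hs, ns)

fm, fnm, fs, fns = summarize(customers)
full_gap = fm / fnm - fs / fns

print(f"gap in sample: {sample_gap:.3f}")
print(f"gap in full data: {full_gap:.3f}")
print(f"z statistic on sample: {z:.1f}")
```

In this simulated setting the gap measured on the sample should land close to the gap in the full data, and the z statistic should be large enough to reject "no difference". That is the sense in which a small, well-drawn sample can answer a well-posed question.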
The size-driven Big Data approach skips this key initial step of forming a hypothesis. Instead it relies on the size of the data to unearth interesting patterns that could be of value.
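The risk of hunting for patterns without a hypothesis can be sketched in a few lines. The example below is an illustration under stated assumptions, not a model of any real data set: the target and every "data source" are pure random noise, so any correlation it finds is spurious by construction. The function name and the sizes (50 rows, 10 versus 2,000 candidate features) are arbitrary choices of mine.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strongest_chance_correlation(n_features, n_rows, rng):
    """Largest |r| between a random target and n_features unrelated random features."""
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best

rng = random.Random(0)

# Same number of rows, but very different numbers of candidate "data sources".
few = strongest_chance_correlation(10, 50, rng)
many = strongest_chance_correlation(2_000, 50, rng)

print(f"strongest correlation among 10 noise features:   {few:.2f}")
print(f"strongest correlation among 2000 noise features: {many:.2f}")
```

Nothing in this data is related to anything else, yet scanning thousands of unrelated features will typically turn up at least one that looks like a meaningful pattern. Starting from a hypothesis limits how many comparisons you make and so limits how often chance gets mistaken for insight.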
What should enterprises do?
I think the right approach to Big Data analytics is to start with a business challenge you have and a hypothesis about its solution. Test this hypothesis by running data analysis against data you already have. You can always add more data later.
Want to know how you can run Big Data analytics on your existing data? See here for our Hadoop solution.
The Storage and Availability Management Group at Symantec is dedicated to providing solutions that enable efficient storage management and highly available infrastructure. Find news, information and tips that help you to resolve your storage management, high availability and disaster recovery issues across the heterogeneous data center.