
Curb Unstructured Data Growth

Created: 15 Jun 2011 • Updated: 11 Jun 2014

An interesting article written by Jerome Wendt of DCIG about Data Insight's new SharePoint support functionality:

Data Insight Extends Data Ownership Classification Capabilities to Microsoft SharePoint

Microsoft SharePoint is fast replacing network file servers as the preferred tool for information sharing and workplace collaboration within enterprises. But as that occurs, the same set of data management issues that exists on network file servers is re-surfacing in these environments. Now that Symantec has extended Data Insight to reach into SharePoint, enterprises can be confident that they are keeping only the data they need in SharePoint while archiving, deleting or re-assigning the rest.

Sheila Childs, Research Director for Gartner's Storage Strategies and Technologies group, said last year that 70% of data is duplicate and has not been accessed in over 90 days. A subsequent report released by Symantec corroborated Gartner's finding as it uncovered that 75% of data had not been accessed in over 90 days. But what was possibly most enlightening in Symantec's report was that 56% of the total amount of data had not been accessed in over 12 months.

Understanding why data remains on corporate file servers for so long is what eventually led Symantec to release Data Insight for Storage over a year ago. One of the reasons organizations cited for retaining data on network file servers, as opposed to archiving or deleting it, was that they could not confirm who or what owned it. So rather than risk disrupting a production application or a business process that might need a file at some point in the future, they kept it on production storage.

However, network file servers are not the only area where questions over data ownership and relevance have crept in. As organizations seek to capitalize on the value of the data they have and encourage collaboration between their employees, Microsoft SharePoint is becoming the new de facto standard for many of them.

The Association for Information and Image Management (AIIM) estimates that 74 to 98 percent of enterprises are planning to try SharePoint. But a 2010 AIIM survey of 624 AIIM members found:

  • One-third of these organizations have no plans as to where and where not to use SharePoint
  • 26% reported that IT departments are driving SharePoint deployments with no input from information management professionals
  • Only 28% have legal discovery and legal hold policies in place that extend to SharePoint
  • 43% have yet to bring SharePoint-stored content under their existing retention and long-term archiving policies

So the problems that were, and still are, associated with the deployment and management of enterprise network file servers are creeping into enterprise SharePoint deployments. This made it a logical next step for Symantec to extend the capabilities of Data Insight to SharePoint.

One of the first anticipated use cases for Data Insight in SharePoint environments is planning data migrations from either file servers or older versions of SharePoint to SharePoint 2010. As with any data migration, the first step is to identify what data is relevant, whether it resides in folders/shares on file servers or in document libraries/sites in SharePoint, and what data can be archived, removed or kept under retention.

The data migration process also requires coordination on two fronts: on migration policy, since the data owners are the best judges of the business value of the data, and on the actual movement of the data, to mitigate business process disruption.

This is where Data Insight's automated data ownership inference and orphan data identification can save IT weeks or even months of manual effort. (How Data Insight does this for file servers was discussed in a prior blog entry.)

To accomplish this in SharePoint, Data Insight relies upon native SharePoint auditing features to record who is using what data. This information is collected by placing one Data Insight solution on each SharePoint farm, which then gathers the audit data and sends it back to the main Data Insight repository.

Using this data, SharePoint administrators can then ascertain how much data they need to migrate from SharePoint 2007 to SharePoint 2010 and then size the SharePoint 2010 repository accordingly. Administrators can also derive relationships such as data owners and stakeholders based on who has actually used the data.
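
To make the idea of deriving owners from actual usage concrete, here is a minimal sketch in Python. It is not Data Insight's algorithm, which the article does not describe; it simply assumes that audit events are available as (user, path, action) records and treats the most active user on each item as its likely owner. The AuditEvent type, the infer_owners function and the sample paths are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple


class AuditEvent(NamedTuple):
    """One hypothetical audit record: who touched which item, and how."""
    user: str
    path: str
    action: str  # e.g. "view", "update", "delete"


def infer_owners(events: Iterable[AuditEvent]) -> Dict[str, str]:
    """Map each path to the user with the most recorded activity on it.

    Assumption: activity count is a reasonable proxy for ownership.
    A real product may weight actions or time windows differently.
    """
    activity: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        activity[event.path][event.user] += 1
    # most_common(1) yields [(user, count)]; on a tie, the user seen first wins.
    return {path: counts.most_common(1)[0][0] for path, counts in activity.items()}


if __name__ == "__main__":
    sample = [
        AuditEvent("alice", "/sites/finance/budget.xlsx", "update"),
        AuditEvent("alice", "/sites/finance/budget.xlsx", "view"),
        AuditEvent("bob", "/sites/finance/budget.xlsx", "view"),
        AuditEvent("bob", "/sites/hr/policy.docx", "update"),
    ]
    for path, owner in infer_owners(sample).items():
        print(path, "->", owner)
```

Counting every action equally keeps the sketch short; weighting writes above reads, or recent activity above old activity, would be a natural refinement.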

Once the data is in SharePoint, organizations face many of the same challenges seen on file servers. Users leave the company and their data, or an entire site, becomes orphaned within SharePoint. The problem is compounded by the self-service paradigm of SharePoint, where end users have the flexibility to create sites and libraries but IT lacks visibility and control.

Now organizations can use Data Insight to identify orphaned data or sites in SharePoint environments in the same way they do on network file servers, and then confidently re-assign that data to other users, archive it or even delete it.
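
The orphan check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's method: it assumes you already have, for each site, the users seen in its audit trail, plus a set of accounts that are still active (for example, exported from a directory service). The function name find_orphaned_sites and the sample site paths are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def find_orphaned_sites(
    site_users: Dict[str, Iterable[str]],
    active_users: Set[str],
) -> List[str]:
    """Return, in sorted order, the sites with no remaining active user.

    A site with no recorded users at all is also reported as orphaned.
    """
    return sorted(
        site
        for site, users in site_users.items()
        if not any(user in active_users for user in users)
    )


if __name__ == "__main__":
    usage = {
        "/sites/hr": ["alice", "bob"],
        "/sites/old-project": ["carol"],
        "/sites/empty": [],
    }
    still_employed = {"alice"}
    print(find_orphaned_sites(usage, still_employed))
```

The output is only a candidate list; as the article notes, the next step is a decision by the business on whether to re-assign, archive or delete each item.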