Software-defined storage (SDS) is a term that seems to have spread like wildfire in the last few months. But what actually is it, and what are its benefits? In this post, I share my thoughts based on current research and conversations with peers.
SDS very much feels like a marketing term at the moment, used by vendors to capture mindshare and attach themselves to the broader topic of the software-defined datacentre, a term popularized by VMware last year. For further reading on software-defined datacentre implementations, please review the following VMware case study that features Symantec’s recent internal transformation.
SDS, rather than being an actual product, is really an architectural outcome for next-generation storage designs. Founded on storage virtualization, SDS uses abstraction to pool storage from commodity hardware to provide flexible, efficient and cost-effective storage services. You might say, “Well, can’t we do that already with current storage virtualization technologies?” Well, yes, to an extent, but therein lies the difference between what SDS promises and what can be delivered today.
Traditional storage virtualization technologies provide an abstraction layer to simplify storage management through common tool sets, some of which support a wide variety of hardware and operating system vendors to give greater flexibility. These tools facilitate many advanced functions such as snapshots, replication, incremental mirror syncs, de-duplication and compression, to name a few.
However, traditional storage provisioning and management techniques are still time-consuming and complex. One exception to this is thin provisioning. Organizations that have the operational maturity to over-provision confidently have found day-to-day storage management simplified to some extent.
SDS design principles are to support a self-service, policy-driven architecture that delivers, tracks and self-remediates storage services to meet SLAs.
Gartner defines SDS as having a number of key outcomes that differ from what traditional storage virtualization broadly offers today:
- Orchestration: orchestrates storage services independently of where data is placed and how it is stored, by matching SLA requirements with capabilities through an intelligent software layer.
- Automation: improves asset management and staff productivity by using intelligent software to service requests and handle ongoing operational management.
- Commoditization: leverages commodity hardware to reduce dependency on proprietary hardware and puts the intelligence in the software.
- Programmability: provides common sets of APIs to integrate a wide variety of storage, compute, network and application services together.
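To make the orchestration outcome a little more concrete, here is a minimal Python sketch of matching SLA requirements against pool capabilities. Everything in it is hypothetical and for illustration only: the `StoragePool` and `ServiceRequest` names, their fields, and the "cheapest pool that qualifies wins" rule are my assumptions, not any vendor's or Gartner's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StoragePool:
    """Hypothetical description of what a pool of storage can deliver."""
    name: str
    max_latency_ms: float  # worst-case latency the pool can promise
    free_gb: int
    replicated: bool
    cost_per_gb: float


@dataclass(frozen=True)
class ServiceRequest:
    """Hypothetical SLA attached to a self-service storage request."""
    size_gb: int
    max_latency_ms: float
    needs_replication: bool


def place(request: ServiceRequest, pools: List[StoragePool]) -> Optional[StoragePool]:
    """Return the cheapest pool that satisfies the SLA, or None if none does."""
    candidates = [
        p for p in pools
        if p.free_gb >= request.size_gb
        and p.max_latency_ms <= request.max_latency_ms
        and (p.replicated or not request.needs_replication)
    ]
    return min(candidates, key=lambda p: p.cost_per_gb, default=None)


# Illustrative pools; the numbers are made up.
pools = [
    StoragePool("flash", 1.0, 500, True, 2.00),
    StoragePool("sas", 8.0, 5000, True, 0.50),
    StoragePool("sata", 20.0, 20000, False, 0.10),
]

request = ServiceRequest(size_gb=100, max_latency_ms=10.0, needs_replication=True)
choice = place(request, pools)
print(choice.name if choice else "no pool meets the SLA")
```

The requester states what is needed and never names an array; where the data lands is decided by the software layer. A real implementation would also have to track delivery against the SLA and re-place data when a pool stops meeting it, which this sketch does not attempt.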
So where are organizations today on the journey towards an SDS architecture? From my experience, most are in research mode, discussing what it is and how it might benefit them.
What I have seen though, which is arguably an early iteration on the journey towards SDS, is the re-emergence of direct attached storage (DAS) and in turn the increased adoption of flash-based solid state drives (SSD).
Traditionally, DAS has been associated with scale-up architectures that led to the creation of storage islands. With significantly faster interconnects (e.g. 1GbE/10GbE and InfiniBand) now generally available and more affordable, DAS has re-emerged within scale-out, shared-nothing grid architectures. Take, for instance, the advent of Big Data initiatives that are designed to leverage cheap commodity DAS.
SSDs can be implemented in different form factors. The most popular I see are dedicated flash arrays connected via a traditional SAN, and PCIe cards installed directly into the server. The former offers obvious performance advantages by servicing higher random I/O rates on SSDs, which are much faster than traditional spinning hard disk drives (HDDs). The latter is able to move hot data closer to the compute engine within the server for lower latency and improved response times.
However, both these trends need to consider an intelligent software layer within the architecture to help develop adoption and use cases further. Symantec specifically helps address two key areas with its upcoming Cluster File System 6.1 version:
- I/O optimization: ensure the right data is on the right tier of storage based on its importance and use at any point in time.
  - Automatic read and write-back caching to underlying SAN storage maximizes the use of relatively expensive SSDs for hot data whilst flushing cool data down to cheaper storage.
  - Write coalescing helps increase the life span of SSDs.
  - Hot data can be pre-determined by pinning it into cache.
- Data redundancy: hot data stored locally within server SSDs leaves potential for a single point of failure unless methods are put in place to provide redundancy.
  - Cache reflection spreads the data across multiple physical servers to protect it against the loss of a server.
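The caching ideas above (write-back caching, write coalescing and pinning) can be illustrated with a small Python sketch. This is a toy model under my own assumptions, not how Cluster File System implements them: the `WriteBackCache` class and its methods are hypothetical, a plain dict stands in for the SAN, and cache reflection across servers is not modelled.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy LRU write-back cache in front of a slower backing store."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing       # dict standing in for SAN storage
        self.blocks = OrderedDict()  # block id -> data, least recently used first
        self.dirty = set()           # blocks changed in cache but not yet on the SAN
        self.pinned = set()          # blocks that must never be evicted
        self.backend_writes = 0      # how many writes actually reached the SAN

    def pin(self, block):
        """Mark a block as hot so eviction skips it."""
        self.pinned.add(block)

    def write(self, block, data):
        """Write-back: update the cache only; the SAN is written on flush."""
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        self.dirty.add(block)
        self._evict()

    def read(self, block):
        """Serve from cache if possible, else load from the backing store.

        Raises KeyError if the block exists in neither place.
        """
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return self.blocks[block]
        data = self.backing[block]
        self.blocks[block] = data
        self._evict()
        return data

    def _flush(self, block):
        """Push one dirty block down to the backing store."""
        if block in self.dirty:
            self.backing[block] = self.blocks[block]
            self.backend_writes += 1
            self.dirty.discard(block)

    def _evict(self):
        """Drop the coolest unpinned blocks until the cache fits again."""
        while len(self.blocks) > self.capacity:
            victim = next((b for b in self.blocks if b not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; tolerate the overflow in this toy
            self._flush(victim)
            del self.blocks[victim]

    def flush_all(self):
        """Write every dirty block to the backing store."""
        for block in list(self.dirty):
            self._flush(block)
```

Because repeated writes to the same block only touch the cache, several updates collapse into a single write to the backing store when the block is eventually flushed. That is the coalescing effect, and fewer physical writes is what helps flash endurance. Until that flush happens the only copy of the new data lives in one server, which is exactly the single point of failure that the data redundancy point is about.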
The flash/SSD market still has a healthy number of startup organizations that are making strong inroads into the datacentre. However, they are rapidly being consumed by larger traditional storage companies. This consolidation could help accelerate and achieve the vision of SDS. With much wider portfolios of diverse technologies, these larger companies possess the intellectual property, R&D budgets and marketing capabilities to drive the integration needed.
So, what if you are offered some software-defined storage by a vendor today? I’d recommend caution because, as far as I can see, it doesn’t currently exist.