RAID is dead

What will take its place? asks Kumar Abhijeet, Director of Sales, Strategic & International Accounts at Cleversafe.

Friday, 25th October 2013. Posted by Phil Alsop

OVER THE PAST DECADE, the definition of mission-critical data has changed for most enterprises. The explosion of digital content has changed the nature of the data that organizations need to store, manage and analyze. According to IDC, over 90 percent of the world’s data is so-called unstructured data: file formats such as Microsoft Word documents, PowerPoint presentations and PDFs, plus non-textual content like images, graphics, video and audio files. So less than 10 percent of data now represents what until quite recently was considered core data for a business.

And unstructured data will only continue to grow. According to a 2011 IDC report, the total amount of digital information created and replicated broke the zettabyte (1,000,000,000,000 gigabytes) barrier in 2010. The size of the digital universe is more than doubling every two years, and is expected to grow to almost eight zettabytes by 2015.
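The growth figures above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes smooth exponential growth starting from roughly 1 zettabyte in 2010, and the function names are invented for this example.

```python
import math

def projected_size(start_zb: float, years: float, doubling_period_years: float) -> float:
    """Size after `years` of exponential growth with the given doubling period."""
    return start_zb * 2 ** (years / doubling_period_years)

def implied_doubling_period(start_zb: float, end_zb: float, years: float) -> float:
    """Doubling period that takes start_zb to end_zb in `years`."""
    return years / math.log2(end_zb / start_zb)

# Figures quoted from the IDC report: about 1 ZB in 2010, almost 8 ZB by 2015.
size_2015 = projected_size(1.0, 5, 2.0)
period = implied_doubling_period(1.0, 8.0, 5)

print(f"Doubling exactly every 2 years: {size_2015:.2f} ZB in 2015")
print(f"Reaching 8 ZB by 2015 implies doubling every {period:.2f} years")
```

Under these assumptions, doubling exactly every two years would fall short of eight zettabytes by 2015, which is consistent with the report's wording that the digital universe is "more than doubling" in that time.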

Given this data growth, it’s not unrealistic to think that companies will be looking for storage solutions that can grow to accommodate petabytes (1 petabyte = 1 million gigabytes) and even exabytes (1 exabyte = 1 billion gigabytes) of data in this decade. Storing, managing and analyzing unstructured data on such a massive scale creates unique challenges for enterprises. These challenges include:
• Preventing data loss. When storage systems reach hundreds of terabytes in scale, drive failures and errors present a constant challenge. The traditional approach to protecting data against these failures in a RAID environment is replication. However, creating three or more copies of the data eventually becomes difficult to sustain from a cost and administration standpoint.

• Maintaining an “always on” system. End users and customers have grown to expect 24/7 access to information, and downtime is unacceptable. But growing RAID rebuild times leave enterprises at risk of outages and system performance problems.

• Scaling storage capacity continuously while controlling costs. Storage devices continue to grow in terms of capacity while declining in price (more bits per device at a lower cost). However, even as the price of capacity declines, the cost to power, cool, house, connect and manage that capacity continues to pose a challenge to budgets.

RAID is “good enough” for many enterprises now, but it won’t be for long. As more organizations hit the petabyte threshold of storage capacity, they will find that the RAID and replication approach they’ve relied on can no longer deliver the reliability and scalability they need. Object storage solutions that use a technique called information dispersal have emerged as a superior alternative for unstructured data storage.

Object Storage: Scalability and efficiency
Traditional storage systems typically use an underlying file system. File systems allow humans to organize content in an understandable hierarchy where access speed is not critical. This approach is ideal for human users but not for big data applications that must manage billions of objects such as images, documents, emails, videos, etc. When an organization has billions of objects that need to be stored and retrieved, a file system approach does not scale and incurs performance breakdowns and bottlenecks.

Object storage offers an alternative approach that is ideally suited for storing massive amounts of unstructured data. Object storage systems are not organized hierarchically. Instead, an object is identified and located by its unique identifier. This enables the number of objects to grow substantially beyond the limitation of traditional file systems while still maintaining the integrity and consistency of the data. Organization of the information in an object storage system is generally maintained by the application that is responsible for reading and writing information.

Valet parking provides an apt analogy for object-based storage. When you valet park your car, the attendant gives you a claim ticket that allows you to retrieve your car when ready. While the attendant has your car, he might move it around as needed to optimize space in the lot or garage. With object storage, an object ID identifies a particular piece of data, but not its specific location in the system.

Data can be moved around the system as needed, and the object ID is the “claim ticket” needed to retrieve it, wherever it resides. In this way, object-based systems can use storage capacity more efficiently than file systems. But object storage addresses only part of the equation: it needs to be combined with a technique called information dispersal in order to protect against data loss without the need to make three or more copies of the data.
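The claim-ticket idea can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation: the `ObjectStore` class, its methods and the in-memory "nodes" are all hypothetical names chosen for illustration.

```python
import uuid

class ObjectStore:
    """Toy flat object store: callers hold an object ID, never a location."""

    def __init__(self, num_nodes: int = 3):
        # Each "node" is just a dict standing in for a storage device.
        self._nodes = [dict() for _ in range(num_nodes)]
        # Internal map from object ID to the node currently holding the object.
        self._location = {}

    def put(self, data: bytes) -> str:
        """Store data on the least-loaded node and return its object ID."""
        object_id = uuid.uuid4().hex  # the "claim ticket"
        node = min(range(len(self._nodes)), key=lambda i: len(self._nodes[i]))
        self._nodes[node][object_id] = data
        self._location[object_id] = node
        return object_id

    def get(self, object_id: str) -> bytes:
        """Retrieve data by ID, wherever it currently resides."""
        return self._nodes[self._location[object_id]][object_id]

    def move(self, object_id: str, target_node: int) -> None:
        """Relocate an object internally; the caller's ID is unchanged."""
        source = self._location[object_id]
        if source == target_node:
            return
        self._nodes[target_node][object_id] = self._nodes[source].pop(object_id)
        self._location[object_id] = target_node

store = ObjectStore()
ticket = store.put(b"holiday-photo")
store.move(ticket, 2)            # the "attendant" re-parks the car
retrieved = store.get(ticket)    # the same ticket still works
```

Note that there is no directory tree anywhere in the sketch: the application keeps the ticket, and the store is free to rearrange its contents.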

Information Dispersal: High reliability without massive overhead
Information dispersal is the practice of using erasure codes as a means to create redundancy for transferring and storing data. An erasure code transforms a message of k symbols into a longer message of n symbols such that the original message can be recovered from a subset of just k of those n symbols. Simply speaking, erasure codes use advanced math to create “extra data” that allows a user to need only a subset of the data to recreate it.
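To make the k-of-n idea concrete, here is a minimal sketch of one classic construction: polynomial interpolation over a small prime field, in the style of Reed-Solomon codes. It is an illustration under stated assumptions, not the code used by any particular product; the modulus 257, the function names and the sample message are all chosen for this example, and a production system would use an optimized library.

```python
P = 257  # prime modulus; every byte value 0..255 is a valid field element

def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (x, y) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(message, n):
    """Turn k symbols into n slices (x, y). The first k slices carry the
    message itself; the remaining n - k are the computed "extra data"."""
    k = len(message)
    points = list(zip(range(1, k + 1), message))
    return [(x, _interpolate_at(points, x)) for x in range(1, n + 1)]

def decode(slices, k):
    """Recover the k original symbols from any k distinct slices."""
    if len(slices) < k:
        raise ValueError("need at least k slices")
    points = slices[:k]
    return [_interpolate_at(points, x) for x in range(1, k + 1)]

message = list(b"RAID")          # k = 4 symbols
slices = encode(message, n=7)    # n = 7 slices, so any 3 can be lost
survivors = [slices[1], slices[4], slices[5], slices[6]]
recovered = bytes(decode(survivors, k=4))
```

In this example three of the seven slices are discarded, including three of the four that held the original bytes, and the message is still rebuilt from the four that remain.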

Information dispersal eliminates the need for replication and the massive capacity overhead that goes with it. For example, object storage with information dispersal requires only 1.7 PB of raw capacity to store 1 PB of usable data, as opposed to the 5 PB of raw capacity required to store 1 PB of data in a typical RAID environment with two replicated copies and one copy on tape.
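The capacity comparison comes down to simple ratios. The sketch below assumes the expansion factor of a dispersed system is n/k, the slice counts from the erasure code; the specific values k=10 and n=17 are hypothetical, picked only because they yield the 1.7x figure quoted above. It does not attempt to reproduce the 5 PB figure, whose breakdown is not given here.

```python
def raw_capacity_dispersal(usable_pb: float, k: int, n: int) -> float:
    """Raw capacity needed when every k units of data are stored as n slices."""
    return usable_pb * n / k

def raw_capacity_copies(usable_pb: float, copies: int) -> float:
    """Raw capacity needed when the data is simply stored `copies` times."""
    return usable_pb * copies

# Hypothetical parameters: k=10, n=17 gives a 1.7x expansion factor.
dispersed = raw_capacity_dispersal(1.0, k=10, n=17)
three_copies = raw_capacity_copies(1.0, copies=3)
tolerated_losses = 17 - 10  # slices that can be lost without losing data

print(f"Dispersal (10 of 17): {dispersed:.1f} PB raw per 1 PB usable")
print(f"Three full copies:    {three_copies:.1f} PB raw per 1 PB usable")
```

Even against plain triple copying, before any RAID parity inside each copy is counted, the dispersed layout in this sketch needs well under two-thirds of the raw capacity.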

Real-World Examples: Future and PhotoBox
Object storage systems that utilize information dispersal are gaining ground in industries that need to manage large amounts of unstructured data as a core part of their business, such as communications, media, entertainment, and Web 2.0 companies. Future, one of the UK’s largest media companies, uses dispersed object storage to store its archive of magazine assets as well as its growing portfolio of digital content like video, games and mobile apps. Future disperses its data across three separate sites to ensure a high level of resiliency while also using its disk capacity and data center space much more efficiently.

PhotoBox, Europe’s leading online digital photo service, uses dispersed object storage for its 2.5 PB online photo storage system. The company wanted a system that could easily scale to accommodate photos uploaded by its 24 million customers, especially during peak vacation and holiday seasons.

As unstructured data volumes continue to grow, more and more companies will confront the challenges associated with managing very large storage environments. Other early adopters of dispersed object storage will include communications providers, government agencies, and financial services companies—any organization that has the delivery and/or analysis of large amounts of unstructured data at the core of its business.