Which data deduplication approach is built to survive the ‘data explosion’?

The unstoppable trend towards vast enterprise-size databases demands enterprise-class approaches to backup and restore. That means the smartest businesses will adopt the data deduplication approach most capable of handling massive volumes without impeding performance. So which dedupe method is best? Tim Butchart, Senior VP of Sales at data protection specialist Sepaton, argues that time is running out for the once-trusted, de facto standard of inline deduplication, and that the smart money should be invested in solutions whose deduplication technologies are content-aware and can scale both performance and capacity as backup environments grow.


For large modern enterprises, the relentless and dramatic growth in the volume of information that they must now store and protect has gone from a sudden data explosion to a constantly rumbling seismic shift with no likely end in sight.


Many businesses now find themselves with hundreds of terabytes or even petabytes of critical data to protect. And yet while enterprise data volumes continue to grow, backup windows and target restore times inevitably shrink. This is why most businesses will seize any and every efficiency advantage that gives them an edge in crowded marketplaces.


Logical solutions to complex problems
Data deduplication software has a major role to play in optimising storage efficiency and maximising replication transfer rates of these increasingly huge data volumes. Deduplication greatly reduces the capacity required to store large volumes by comparing data in each new backup to data already stored, replacing redundant data with pointers to a single copy.
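To make the pointer-replacement idea concrete, here is a minimal Python sketch, purely illustrative and not any vendor's implementation: fixed-size chunks are fingerprinted with SHA-256, each unique chunk is stored once, and every backup is reduced to a 'recipe' of pointers. The chunk size, in-memory store and function names are all assumptions made for the example.

import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real systems vary

chunk_store: dict[str, bytes] = {}   # fingerprint -> single stored copy of a chunk


def backup(data: bytes) -> list[str]:
    """Deduplicate a backup stream, returning a recipe of chunk fingerprints."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in chunk_store:
            chunk_store[fp] = chunk      # unique data is stored exactly once
        recipe.append(fp)                # duplicates become pointers to that copy
    return recipe


def restore(recipe: list[str]) -> bytes:
    """Rebuild the original data from the stored chunks."""
    return b"".join(chunk_store[fp] for fp in recipe)


# Two mostly identical backups share all but one chunk in the store.
first = b"A" * 8192 + b"B" * 4096
second = b"A" * 8192 + b"C" * 4096
r1, r2 = backup(first), backup(second)
assert restore(r2) == second
print(len(chunk_store), "unique chunks stored for two 12 KiB backups")

Running the sketch shows only three unique chunks retained for the two backups: the second backup costs just one extra chunk plus pointers.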


Modern data deduplication technologies ensure data can be restored faster, helping businesses to continue to grow with the peace of mind that their data is protected. But deduplication software comes in more than one flavour, with inline and concurrent data deduplication being the two most popular for modern business. So what are the key differences, and which offers the greatest benefits to large enterprises?


Traditional inline dedupe still has a role to play
Inline deduplication remains an efficient approach for some data types and for smaller environments whose volumes don't overburden limited system resources. Because it takes place before data is written to disk, its key advantage is that pre-deduplication data never reaches the disk or demands storage.


However, there are also obvious downsides to traditional inline deduplication. Inline systems analyse and index unique identifiers for each segment of data as it is being backed up. These identifiers are compared to those already indexed to find duplicates. Because the identifier assignment, index lookup and pointer replacement steps must all be performed before data is written to disk, inline deduplication can slow backup performance. In addition, the index grows over time, making these steps increasingly CPU-intensive and slowing performance further.
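The write-path cost can be pictured with the hypothetical sketch below: fingerprinting, index lookup and pointer substitution all sit in front of the disk write, and the index grows with every unique segment. The structure and names are invented for illustration and do not describe any particular product.

import hashlib

fingerprint_index: dict[str, int] = {}   # fingerprint -> block address; grows forever
disk: dict[int, bytes] = {}              # stand-in for the backing store
next_address = 0


def inline_write(segment: bytes) -> int:
    """Deduplicate a segment in the write path and return its block address."""
    global next_address
    fp = hashlib.sha256(segment).hexdigest()   # 1. assign a unique identifier
    addr = fingerprint_index.get(fp)           # 2. look it up in the ever-growing index
    if addr is not None:
        return addr                            # 3. duplicate: only a pointer is recorded
    addr = next_address
    next_address += 1
    disk[addr] = segment                       # unique data finally reaches disk
    fingerprint_index[fp] = addr
    return addr


# Every segment of every backup pays the hash-and-lookup cost before any write.
addresses = [inline_write(seg) for seg in (b"alpha" * 100, b"beta" * 100, b"alpha" * 100)]
print(addresses, "->", len(disk), "segments actually written")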


Since systems with traditional inline deduplication can't be scaled, adding performance or capacity requires additional systems, potentially leading to data centre sprawl. Data is then divided across individual systems that cannot be compared to one another to identify duplicates, creating inherent inefficiencies. Such systems must also conserve CPU cycles by finding duplicates in large chunk sizes while ignoring small duplicates. As a result, they do a poor job of deduplicating databases and other data stored in very small chunks. Restore times also suffer, as deduplicated data must be 'rehydrated' before it can be restored.
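The chunk-size trade-off is easy to demonstrate with a small, hypothetical experiment: two backup generations that differ by a single byte are chunked at two assumed sizes, and the amount of data that would have to be kept is compared. With larger chunks, one tiny change forces a whole large chunk to be stored again.

import hashlib
import os


def stored_bytes(data: bytes, chunk_size: int) -> int:
    """Bytes a chunk-level deduplicator would keep for `data` at this chunk size."""
    seen = set()
    for offset in range(0, len(data), chunk_size):
        seen.add(hashlib.sha256(data[offset:offset + chunk_size]).hexdigest())
    return len(seen) * chunk_size


base = os.urandom(256 * 1024)          # first backup generation (256 KiB)
edited = bytearray(base)
edited[100] ^= 0xFF                    # one small change, e.g. a single row update
combined = base + bytes(edited)        # both generations backed up together

for size in (4 * 1024, 64 * 1024):
    kept = stored_bytes(combined, size)
    print(f"{size // 1024:>2} KiB chunks -> {kept // 1024} KiB kept for 512 KiB of backups")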


Content-aware deduplication offers massive gains for enterprise data volumes
For large enterprises managing massive data volumes, a content-aware, byte-differential deduplication approach delivers much faster backup rates than inline systems. Backup systems built on this method do not require an index and can scale backup, deduplication, replication and restore processes across multiple nodes. These systems assess the data as it is backed up and write new data to disk at industry-leading rates. They then examine the remaining, potentially duplicate data at the byte level for maximum capacity optimisation. They back up, deduplicate and replicate the data concurrently for optimal performance.
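A rough way to picture the byte-differential step, assuming backups are landed on disk first and compared against the previous generation afterwards, is sketched below. Python's difflib stands in for the real comparison engine; nothing here reflects Sepaton's actual algorithms.

import difflib


def byte_delta(previous: bytes, current: bytes) -> list:
    """Describe `current` as copies from `previous` plus only the genuinely new bytes."""
    matcher = difflib.SequenceMatcher(None, previous, current, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))            # bytes the system already holds
        else:
            delta.append(("insert", current[j1:j2]))  # new bytes that must be stored
    return delta


def apply_delta(previous: bytes, delta: list) -> bytes:
    """Rebuild the new generation from the previous one and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += previous[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


gen1 = b"customer:1001,balance:250.00;" * 50                      # landed on disk at full speed
gen2 = gen1.replace(b"balance:250.00", b"balance:275.00", 1)      # next night's backup
delta = byte_delta(gen1, gen2)
assert apply_delta(gen1, delta) == gen2
new = sum(len(op[1]) for op in delta if op[0] == "insert")
print(f"{new} new bytes stored out of {len(gen2)}")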


Content-aware, byte-differential technologies also streamline replication by streaming 'delta requests' to target systems. Data is only pulled from the source system when delta rules cannot be applied, effectively minimising the data transferred while using the full bandwidth of the wire and avoiding costly latency gaps.
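A hedged sketch of the delta-request idea follows, using block fingerprints as a simple stand-in for the byte-level delta rules described above; the 'protocol' and all names are invented for illustration. The target rebuilds the new backup from blocks it already holds and pulls raw data from the source only for blocks it cannot reconstruct locally.

import hashlib

BLOCK = 4096   # assumed block size for the illustration


def replicate(source_data: bytes, target_previous: bytes) -> bytes:
    """Rebuild `source_data` on the target while sending as little as possible."""
    # Target side: index the blocks it already holds from the previous generation.
    target_blocks = {}
    for o in range(0, len(target_previous), BLOCK):
        blk = target_previous[o:o + BLOCK]
        target_blocks[hashlib.sha256(blk).hexdigest()] = blk

    rebuilt = bytearray()
    bytes_on_wire = 0
    for o in range(0, len(source_data), BLOCK):
        block = source_data[o:o + BLOCK]
        fp = hashlib.sha256(block).hexdigest()   # the streamed "delta request"
        if fp in target_blocks:
            rebuilt += target_blocks[fp]         # satisfied from the target's own disk
            bytes_on_wire += len(fp)             # only the request crossed the wire
        else:
            rebuilt += block                     # delta rules fail: pull the block itself
            bytes_on_wire += len(fp) + len(block)
    print(f"{bytes_on_wire} bytes sent to replicate {len(source_data)} bytes")
    return bytes(rebuilt)


previous = b"X" * (1024 * 1024)                   # generation the target already holds
current = previous[:512 * 1024] + b"Y" * 4096 + previous[512 * 1024 + 4096:]
assert replicate(current, previous) == current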


Traditional inline deduplication may still be attractive to smaller businesses for which backup ingest rates and scalability are unlikely to be major issues. But content-aware, byte-differential data deduplication has been developed from the ground up with large enterprises in mind, and treats powerful performance, flexible management and seamless scalability as its highest priorities. As data volumes continue to rise sharply, these technologies are a must-have in today's large enterprise data centres.
