Big Data Hoarding – the elephant in the room

Hoarding every bit of data is a serious affliction for 75% of organisations, and a predicted 4,300% growth in data by 2020 is going to place substantial pressure on many IT organisations, says Neil Hewer, Technical Director, Data Continuity Group.


Data has become the number one corporate asset. Successful organisations effectively organise, manage and analyse data to gain deeper insight into segmented business performance and customer behaviour; for digital companies, data is the product itself. These benefits do not come free or without substantial risk: the more data held, the more costly and less controllable its protection and management become.

While we’re all openly trying to analyse our data, there is also an ‘elephant in the room’: data hoarding. Driven by fear, and by data protection policies set without any consideration or understanding of the associated cost, many IT departments in our experience are forced to adopt costly and risky retention policies that lead to ballooning volumes of data at a time when budgets are under extreme scrutiny.

Data hoarding is when a business retains data beyond strict statutory and sensible requirements. Our indications are that three quarters of organisations could be considered data hoarders, lacking a conscious data management policy that references costs and sets strictly justified retention and protection standards. But aside from cost, the real problems of data hoarding are organisational rather than technical: the cause is rarely the IT function, yet IT is the one left to deal with the effects. IT managers need to be empowered to work alongside the business to properly and continually assess the actual needs of the organisation and the costs the business is prepared to bear.

While data hoarding may be a worrying concern today, its impact is forecast to grow exponentially. According to the IT analyst IDC, we will see a 4,300% increase in annual data generation by 2020, driven among other things by the digital communications boom and the popularity of social media. Much of this data is of little value, declines in value over time, and exists in duplicate form, potentially many times over.
Control of data growth is a critical challenge. Organisations are exploiting increasingly important and varied social and digital media platforms as the basis for low-cost communications networks and ever more complex national and global organisational structures. They are also trying to deal with the growing risks of compliance failures and statutory breaches, which demand long-term data retention.

All of this leads to data volumes that cannot be controlled or predicted with any reliable certainty, and any data that is retained places responsibilities on the owning or receiving organisation.

Even when offset against step changes in technology such as data virtualisation, de-duplication and compression, and against the falling cost of servers and storage, the cost of data storage for high-growth businesses is estimated to be rising five- to six-fold. For these organisations we see a 200 percent increase in retained data volumes over one to three years, against base technology costs falling by at best 30%.
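The arithmetic behind that gap is straightforward: storage spend scales with volume multiplied by unit cost. Here is a back-of-envelope sketch using the figures above; treating them as a single compounding period is our assumption, purely for illustration:

```python
# Back-of-envelope sketch of retained-volume growth outpacing falling unit
# costs. The 200% growth and 30% cost-decline figures are the article's
# estimates; treating them as one compounding period is our assumption.

volume_growth = 3.0      # a 200% increase means 3x the original volume
unit_cost_factor = 0.7   # unit cost per gigabyte falls by at best 30%

net_multiplier = volume_growth * unit_cost_factor
print(f"Net spend multiplier per period: {net_multiplier:.1f}x")  # 2.1x

# Compounding over successive periods widens the gap rapidly, passing
# through the five-to-six-fold range cited above:
print(f"Over two periods:   {net_multiplier ** 2:.1f}x")          # 4.4x
print(f"Over three periods: {net_multiplier ** 3:.1f}x")          # 9.3x
```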

Take CRM data as an example of data growth. Back in the 1990s, companies started to take the process of managing information about individual customers more seriously, but they began with a relatively simple set of information: what the customer had bought, who dealt with them, what else they might want to buy, what correspondence and marketing content they had been sent, and so on. Two decades on, both the sophistication of the data we can keep on our customers and the ways we can analyse it have exploded.

Now CRM information can also include digital marketing interactions, social media activity, and unstructured data such as sentiment, images, call recordings, purchase history and marketing campaign history. As customer interactions become increasingly digital and organisations want to track how individual customers behave, each customer becomes a continual source of new information and data.

Businesses aren’t only collecting more data, though. They are also creating new data from the analysis performed to uncover insights and opportunities. Continuing the CRM example, many organisations now engage in data modelling for customer insight, measuring areas such as the impact company actions could have on customer loyalty, lifetime value, cost of acquisition, rate of churn and cross-sell performance. The amount of predictive analysis has increased too, including propensity modelling: predicting the future behaviour of customers.

Customer data is only one area, but it illustrates how organisations are relying on data more and more, and in doing so become awash with it. Ironically, as data becomes ever more critical to the organisation, current budget constraints leave it less and less able to look after that data.

So why does all of this data get hoarded? The principal reasons are, first, the lack of empowerment of IT managers and, second, the lack of awareness amongst business users that their needs must be aligned with the data management policy and with the value and affordability of protecting these assets. The situation is exacerbated by cross-departmental issues: as departments collaborate more, data and documents traverse the company IT infrastructure and multiple versions get saved in different departmental silos. Exactly the same documents get stored many times over, in multiple versions. We estimate that every gigabyte of source data in an organisation with standard retention policies creates three to four times that storage requirement at the back end. Data storage environments are therefore amplified significantly in the absence of effective data management policies.
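To make that amplification concrete, here is a minimal sketch of the estimate. The three-to-four-times multiplier is the figure above; the breakdown into backup generations and replica copies is purely our illustrative assumption:

```python
# Rough back-end footprint estimate. The 3-4x amplification is the
# article's figure; splitting it into backup and replica copies is an
# illustrative assumption, not a description of any specific policy.

def backend_footprint_gb(source_gb: float,
                         backup_copies: int = 2,
                         replica_copies: int = 1) -> float:
    """Source data plus the backup generations and replicas a typical
    retention policy keeps for it."""
    return source_gb * (1 + backup_copies + replica_copies)

print(backend_footprint_gb(1.0))  # 4.0 GB at the back end per 1 GB of source
```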

Data hoarding can also add risk to your back-up plan. Choosing to back up increasingly large volumes of ‘live’ data, rather than archiving what is no longer regularly used, may cause automated back-up windows to overrun or expire before completion. Backup tools such as Symantec’s NetBackup provide de-duplication technologies, which keep one ‘master’ copy of a document and record only block-level changes for any other versions held by different people in different locations. Without de-duplication, every version of a document will be backed up separately.
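To illustrate the principle, here is a toy sketch of block-level de-duplication in general; it is not NetBackup’s actual implementation, and the fixed block size and hashing scheme are our assumptions:

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; real products use smarter, often variable chunking

def dedupe(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into blocks, store each unique block once (keyed by its
    hash), and return the list of block references that reconstruct it."""
    refs = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only unseen blocks consume space
        refs.append(digest)
    return refs

store: dict[str, bytes] = {}
doc_v1 = b"A" * 8192                # two identical blocks
doc_v2 = b"A" * 4096 + b"B" * 4096  # same first block, one changed block
dedupe(doc_v1, store)
dedupe(doc_v2, store)
print(len(store))  # 2 unique blocks stored, instead of 4 without de-dupe
```

Because the second version shares a block with the first, only the changed block consumes new space; this is why de-duplicated backups of many near-identical document versions stay small.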

It can also be difficult to put an effective disaster recovery plan in place, because there is no insight into which data is critical and needs to be accessed or recovered quickly, and which does not. Ask the business, and how many departments’ first response will be that they need ‘everything’ restored immediately? If a company does not know what data it is holding, it is difficult to prioritise the key systems that are actually required in a DR situation.
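One way to force that prioritisation into the open is to record it as an explicit, tiered recovery plan that IT and the business agree together. A minimal sketch follows; the tier names, systems and recovery time objectives are entirely hypothetical examples:

```python
# Hypothetical tiered DR priority map. Systems, tiers and recovery time
# objectives (RTOs) are illustrative placeholders, not recommendations.
DR_PLAN = {
    "tier-1": {"rto_hours": 4,  "systems": ["order-processing", "payments"]},
    "tier-2": {"rto_hours": 24, "systems": ["crm", "email"]},
    "tier-3": {"rto_hours": 72, "systems": ["archive-reporting", "intranet"]},
}

def recovery_order():
    """Yield systems in the order they should be restored after a disaster."""
    for tier in sorted(DR_PLAN, key=lambda t: DR_PLAN[t]["rto_hours"]):
        for system in DR_PLAN[tier]["systems"]:
            yield tier, system

for tier, system in recovery_order():
    print(f"{tier}: restore {system}")
```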

The effects of data hoarding are significant. Any business that understands the value of its data should be ensuring that their data assets are managed in an orderly fashion based on need. The crux of this is getting IT and the departments using and creating data to talk and share information on data access and content so that they can agree an effective data management plan.

As a data management managed services provider, we kick this process off with a data audit and workshop. Typically this shows departments that they only really access and use 30 to 40 percent of their stored data. Unmanaged, the other 60 to 70 percent is simply taking up valuable storage disk space, creating clutter on the network and adding to backup schedules, both in volume and in backup time windows. The workshop process works both ways: it gives IT an insight into what data each department has and how it is used, which allows IT to advise departments on the most appropriate and cost-effective place to keep their data.

Data is growing faster than we can measure, and its value as a corporate asset is growing too. Organisations can’t get away with simply shoving it into an already packed cupboard and hoping for the best. Their data management strategy should involve communication between IT and departmental users, and a tiered and integrated approach to storage, back-up, archival and disaster recovery.
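As a closing practical note, the kind of first-pass audit described above often starts from simple signals such as file access times. A rough sketch, assuming access times are actually tracked (many systems mount with relatime or noatime, which skews the result) and using a hypothetical data path:

```python
import os
import time

ACTIVE_WINDOW_SECS = 90 * 24 * 3600  # "active" = accessed in the last 90 days

def active_fraction(root: str) -> float:
    """Estimate the fraction of stored bytes accessed within the window."""
    now = time.time()
    total = active = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip unreadable or vanished files
            total += st.st_size
            if now - st.st_atime <= ACTIVE_WINDOW_SECS:
                active += st.st_size
    return active / total if total else 0.0

# "/srv/data" is a placeholder; point this at the share being audited.
print(f"Active data: {active_fraction('/srv/data'):.0%}")
```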
