Breaking down the barriers of unstructured data

Businesses today are dealing with growing volumes of data, not all of which resides in a standard database. In fact, 80% of business data is unstructured. It tends to be human-generated and takes the form of emails, Word documents, reports and PDFs, as well as external data such as that created by the Internet of Things and social media. Because this data comes in such a variety of formats and structures, and often holds more vital information than structured data, storing, processing and analysing it can be a daunting task for businesses.

By Rob Perry, vice president of product marketing at ASG Technologies.

Fri 10/26/2018 1:59 PM

Traditionally, businesses have taken a siloed approach to structured and unstructured data, treating them in isolation. However, a business needs an overarching view of both structured and unstructured data in order to gain the most business intelligence. Extracting information from structured data is usually straightforward, but to make informed decisions, businesses must also draw insight from their unstructured data repositories. Without a data management strategy that includes unstructured data, organisations risk missing opportunities, falling behind competitors, running up data centre costs and potentially breaching GDPR.


Unstructured data can provide vital business intelligence that helps a company know its data, and therefore know its business, and drive growth. So, while it might seem difficult to bring structured and unstructured data together, businesses need to learn how to break down the barriers between them. In fact, many organisations have already started this process: in a recent poll commissioned by ASG, 95% of CIOs said they were structuring their organisations to manage all information with a common theme and approach.

The Data Lake

Rather than processing the two types of data separately, many organisations have instead established a data lake, a single repository into which they dump all kinds of information. While this approach is useful, they now face the difficult task of extracting important information from the often murky waters of the lake.

This task matters because businesses increasingly need to understand the information held in both types of data. They must therefore address the disparity between their data management practice, which governs structured data, and their content practice, which deals with material such as reports, claims documents and statements. As a result, businesses are starting to take a more holistic approach, pulling those practices together internally into one organisational structure in the hope of treating all content in a similar manner.

Where to Begin

The introduction of GDPR has prompted organisations to look at the information they hold internally and identify personal data. While businesses understand how to use database applications to do this with structured data, unstructured data hasn’t received the same focus. Yet unstructured data presents many of the same problems and challenges as structured data. Fortunately, this means organisations can approach it in the same way.

As with structured data and applications, content management tools create metadata about the content that allows it to be indexed, federated and searched. In addition, data intelligence tools can be applied to unstructured data to identify personal data, giving a more complete view of the GDPR-protected data under management and thus reducing the risk of non-compliance. This allows businesses to think more broadly about the information they hold internally, both in structured databases and in less structured content such as documents and statements.
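To make the idea of content metadata and personal-data discovery more concrete, here is a minimal Python sketch. It is not how any particular content management or data intelligence product works. It assumes documents are already available as plain text, detects only two illustrative kinds of personal data (email addresses and a simplified UK National Insurance number pattern), and builds a toy inverted index so documents can be searched by word. All names (`PERSONAL_DATA_PATTERNS`, `extract_metadata`, `build_index`) are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Illustrative detection rules only (an assumption, not any vendor's method).
# The NI-number pattern is simplified and does not check valid prefixes.
PERSONAL_DATA_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ni_number": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
}


@dataclass
class DocumentMetadata:
    doc_id: str
    word_count: int
    personal_data: dict[str, list[str]] = field(default_factory=dict)

    @property
    def contains_personal_data(self) -> bool:
        # True if any rule matched at least once in the document
        return any(self.personal_data.values())


def extract_metadata(doc_id: str, text: str) -> DocumentMetadata:
    """Record basic metadata plus any personal data matched by the rules."""
    found = {name: pattern.findall(text) for name, pattern in PERSONAL_DATA_PATTERNS.items()}
    return DocumentMetadata(doc_id=doc_id, word_count=len(text.split()), personal_data=found)


def build_index(documents: dict[str, str]) -> dict[str, list[str]]:
    """Toy inverted index: lower-cased word -> sorted IDs of documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


if __name__ == "__main__":
    docs = {
        "claim-001.txt": "Claimant Jane Doe, jane.doe@example.com, NI number QQ123456C.",
        "report-q3.txt": "Quarterly report: claims volumes rose in Q3.",
    }
    metadata = {doc_id: extract_metadata(doc_id, text) for doc_id, text in docs.items()}
    flagged = [doc_id for doc_id, meta in metadata.items() if meta.contains_personal_data]
    print("Documents containing personal data:", flagged)
    print("Documents mentioning 'claims':", build_index(docs).get("claims", []))
```

Run on the two sample documents, the sketch flags only `claim-001.txt` as holding personal data, while the index lets `report-q3.txt` be found by the word "claims". Production tools would add far richer detection (named-entity recognition, validated identifiers) and a proper search engine. The point is simply that the same "scan, tag, index" pattern used for structured data carries over to documents.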

To begin the process of harnessing unstructured data, businesses must first transform this data into a format that is more manageable and easier to analyse. This could be a significant task, so it is important to take a phased approach, initially focusing on the low-hanging fruit that offers the greatest gain with the least risk.

From here, organisations should look for the right solution to meet their business needs. It is important to choose tools that can handle the broadest possible range of content formats and sources, and that are easily configured to address business needs as they change. Once this is in place, businesses can begin to increase the volume of data fed into the tools, remove duplicate content and standardise content into a common, searchable format. This allows value to be extracted and analysis to be carried out.
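The "remove duplicate content and standardise" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any specific tool: it assumes text has already been extracted from each source (email, PDF, Word document), treats two items as duplicates only when their normalised text is identical, and emits every item in one common record shape. The `RawItem` type, the record fields and the function names are hypothetical.

```python
import hashlib
import json
import unicodedata
from dataclasses import dataclass


@dataclass
class RawItem:
    source_id: str
    source_format: str  # e.g. "email", "pdf", "docx" (text assumed already extracted)
    text: str


def normalise(text: str) -> str:
    """Canonical form used for comparison: Unicode NFKC, case-folded, single-spaced."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def standardise(items: list[RawItem]) -> list[dict]:
    """Drop exact duplicates (after normalisation) and emit one common record shape."""
    seen: set[str] = set()
    records = []
    for item in items:
        canonical = normalise(item.text)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already captured from another source
        seen.add(digest)
        records.append({
            "id": digest[:16],
            "source_id": item.source_id,
            "source_format": item.source_format,
            "text": canonical,
        })
    return records


if __name__ == "__main__":
    items = [
        RawItem("mail-17", "email", "Claim 4521 approved.\n\nRegards, Claims Team"),
        RawItem("scan-03.pdf", "pdf", "Claim 4521   approved. Regards, claims team"),
        RawItem("memo.docx", "docx", "Claim 4522 pending review."),
    ]
    for record in standardise(items):
        print(json.dumps(record))
```

Here the email and the PDF collapse into one record because they differ only in case and whitespace, so the script prints two records rather than three. Only exact duplicates after normalisation are removed; catching near-duplicates (for example, a scanned copy with OCR errors) would need a similarity technique such as MinHash, which this sketch does not attempt.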

Risk Avoidance

The lack of a single view of structured and unstructured data creates several dangers, one of which is exposure to hidden data breaches. Because unstructured data is arguably the source of more personal data and is easier to access, it may be more vulnerable to cyber attacks. Businesses must therefore pay close attention to how they store and process this data, or risk leaving themselves open to data breaches.

Additionally, organisations that fail to secure a broad overview of both types of data will find certain processes much more difficult to perform and will incur higher data centre costs. For instance, a siloed approach makes it difficult to connect structured insurance claims data with the source documents that support it.

This can also impair a company’s ability to make informed business decisions, as it lacks the right insights and could be missing information held in less structured formats, particularly social media and email data. These formats must be viewed holistically alongside other types of transactional information; otherwise it won’t be possible to gain the overview needed to develop insights that drive the business.

As you can see, businesses have a vital need to break down the barriers of unstructured data in order to fully utilise the information at their disposal and gather business insights. Current approaches that do not connect data silos leave organisations at risk of missing important insights that could inform business decisions, and of breaching GDPR. By harnessing unstructured data instead, businesses are better able to stay abreast of industry trends, track competitive intelligence, develop processes to mitigate risks, engage with customers, predict customer behaviour and derive insights.
