It might seem that the major challenge when it comes to Big Data is the sheer volume of information organizations generate today, but that is not what actually makes it difficult for many enterprises to extract value from the content they produce.
Recent research demonstrates that the real hurdle many businesses face with respect to enterprise information is the lack of tools necessary to find what they need, when they need it. Without the right tools, finding the required information amongst the vast reams of content available in the digital age is like picking a specific needle out of a haystack of needles.
That haystack is indeed vast: more than 90% of the world’s data has been generated since 2010, according to Science Daily – so volume is certainly a factor. However, what makes the content inaccessible and sometimes unmanageable is that much of it is fragmented and uncategorized.
This state of affairs is the central problem with Big Content – the unstructured information component of Big Data – and the reason many organizations struggle to achieve a return on investment in information assets such as research, presentations, plans, emails, graphics, technical descriptions, field notes, management comments, social media, customer communications and other documents.
A recent study by MindMetre Research, which surveyed close to 400 information professionals at global organizations, underlines that the problem is not simply the mass of information, but the scattered and raw state of much enterprise information.
The vast majority (71%) of respondents to the MindMetre survey say a major barrier to unlocking the commercial value of Big Data is the fact that this data is dispersed, held in disparate formats, at different sites and by different business units. The next most cited obstacle in the MindMetre survey is that information is not meta-tagged, or tagging is done in an inconsistent and inefficient way, with 56% identifying this as a significant issue. Just a third (34%) see sheer volume in itself as a problem.
The MindMetre survey leaves little doubt that Big Content is a rising force within enterprises: 85% of respondents believe that large enterprises are creating more unstructured data than ever, and 89% assert that gaining greater insight into that information is crucial to their organizations gaining a competitive advantage. In other words, roughly nine out of ten surveyed specialists feel that their organization will slip behind its competitors if it cannot find affordable and effective ways of tapping into the competitive intelligence held in its unstructured content.
Introducing Content Intelligence
These findings bring us to the missing piece of the puzzle when it comes to Big Data and Big Content: Content Intelligence, which is all about making the mass of unstructured information within an organization findable and actionable – and this is no easy task, as the results of the MindMetre research underscore so plainly.
Content Intelligence is essential when information needed by a company, a government organization or an institution is held in different databases in various parts of the enterprise – and, in some cases, within those of partner organizations – and there is no consistent system, or no system at all, for categorizing each document in any meaningful way.
Under these conditions, finding enterprise information quickly, easily and accurately becomes next to impossible. As a result, it is no longer feasible to check whether work has been done before, or whether it could be easily repurposed, built on, added to, or updated.
Making information findable for staff, partners, clients, investors and other stakeholders is crucial for businesses and other organizations that aim to stay ahead of others in their field. Being able to reach the depths of corporate knowledge and experience and make that available to employees and other key players at any given time simply increases work efficiency by allowing people to build on what has already been done.
Corporate knowledge and competitive insights can be mined for value, and existing work processes can be made far more efficient. Organizations can tap into content that adds depth to and informs proposals, new projects, business relationships, collaborations, fresh research, market intelligence, customer analysis, management reporting and regulatory compliance – essentially enabling organizations to avoid repetition of work and streamline the flow of enterprise knowledge.
Organizations can leverage knowledge assets to create new products, develop services, or eliminate wastage. The work necessary to prepare proposals and tenders can be reduced and the preparation time cut – along with costs. Organizations can enhance strategic planning, risk management, customer insight, and their understanding of behavior patterns. Content Intelligence empowers an organization to do all of this.
Boosting existing information architecture
The fundamental challenge when it comes to Content Intelligence for many organizations is that their existing enterprise information management applications – such as Microsoft SharePoint and FAST, Apache Lucene and Solr, Oracle, and Google Search Appliance – don’t have the capabilities required to effectively extract the potential commercial value of their unstructured information.
Indeed, the capability to organize, find and retrieve information has to be embedded into these systems. Searches need to take seconds, not minutes or tens of minutes (if the content is searchable at all), and users of information management systems need the ability to search contextually; that is, the system needs to address the searcher’s intended meaning.
With contextual search capability, an enterprise system will iteratively steer the user more precisely to the strand of meaning they are looking for. For instance, a standard web search by a medical organization might show that ‘aids’ can refer to acquired immune deficiency syndrome, hearing devices, mobility assistance, or PDA devices – all in the healthcare context.
Contextual search presents users with each of these strands of meaning and allows them to quickly focus on the one relevant to their enquiry. This is a central element of Content Intelligence.
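To make the idea of "strands of meaning" concrete, here is a minimal Python sketch that groups search hits for an ambiguous term under the sense they most likely belong to. The sense inventory, the sample documents and the function name `group_by_sense` are all hypothetical, invented for illustration; a real deployment would presumably draw its senses from a managed taxonomy or ontology and use far richer matching than word overlap.

```python
from collections import defaultdict

# Hypothetical sense inventory: each sense of an ambiguous term is
# described by a few context words. This hard-coded dict stands in
# for a managed taxonomy and is an assumption, not a real product API.
SENSES = {
    "aids": {
        "AIDS (syndrome)": {"hiv", "immune", "syndrome", "antiretroviral"},
        "hearing aids": {"hearing", "audiology", "ear"},
        "mobility aids": {"mobility", "wheelchair", "walker", "crutches"},
    }
}


def group_by_sense(term, documents):
    """Group documents that mention `term` under the sense whose
    context words overlap most with the document text."""
    groups = defaultdict(list)
    for title, text in documents.items():
        words = set(text.lower().split())
        if term not in words:
            continue
        # Score each sense by how many of its context words appear.
        scores = {
            sense: len(context & words)
            for sense, context in SENSES[term].items()
        }
        best = max(scores, key=scores.get)
        # No overlap with any sense: leave the document unclassified.
        label = best if scores[best] > 0 else "unclassified"
        groups[label].append(title)
    return dict(groups)


# Invented sample documents, used only to exercise the function.
DOCS = {
    "Clinic memo": "new antiretroviral guidance for hiv and aids patients",
    "Audiology order": "purchase order for hearing aids and ear moulds",
    "Ward inventory": "mobility aids including wheelchair and walker stock",
    "Newsletter": "visual aids for the staff presentation",
}

if __name__ == "__main__":
    for sense, titles in group_by_sense("aids", DOCS).items():
        print(sense, "->", titles)
```

Presenting the groups rather than a flat result list is what lets the user pick the relevant strand and narrow the search from there.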
In too many cases, the search facilities in the organization’s systems are not designed to achieve the level of Content Intelligence they need, as the information platforms have only basic classification and taxonomy management capabilities and often cannot be used to apply metadata automatically across disparate information sources.
The task of manually applying the level of metadata necessary, however, is neither affordable nor realistically achievable, given the labor costs involved. What’s more, the files being meta-tagged would have to be left in their original locations to avoid the expense of reformatting and transmitting huge volumes of disparate information into a single hub – if, in fact, this were even achievable within the legacy systems.
So achieving a level of Content Intelligence that really allows organizations to obtain value from their information entails employing applications and tools that radically improve an enterprise’s ability to find and organize information by imbuing systems with more effective taxonomy management and semantic search capability.
An organization can endow its information management and storage platform with Content Intelligence capability by utilizing semantic software and structures such as an ontology to capture the relationships between documents and drive the consistent application of rich metadata. This enables systems to auto-categorize documents, systematizing information governance and ensuring the correct tagging of enterprise content for quick and precise search.
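As a rough illustration of how a taxonomy with concept relationships can drive consistent tagging, the Python sketch below assigns concept tags to a document and adds each matched concept's broader concepts as well. The taxonomy contents, the `broader` field and the function `auto_tag` are assumptions made up for this example; commercial semantic software would typically use linguistic rules or trained classifiers rather than plain phrase matching.

```python
# Hypothetical mini-taxonomy: each concept has trigger phrases and an
# optional broader concept. The "broader" links are a very small
# stand-in for the relationships an ontology would capture.
TAXONOMY = {
    "Risk Management": {
        "terms": ["risk register", "mitigation"],
        "broader": "Governance",
    },
    "Regulatory Compliance": {
        "terms": ["regulator", "audit"],
        "broader": "Governance",
    },
    "Customer Insight": {
        "terms": ["customer survey", "churn"],
        "broader": None,
    },
    "Governance": {"terms": [], "broader": None},
}


def auto_tag(text):
    """Return a sorted list of concept tags for `text`, including the
    broader concepts of every concept that matched."""
    lowered = text.lower()
    tags = set()
    for concept, entry in TAXONOMY.items():
        # Naive substring matching: "audit" would also match
        # "auditorium". Good enough for a sketch, not for production.
        if any(term in lowered for term in entry["terms"]):
            tags.add(concept)
            # Walk up the hierarchy so related documents share a tag.
            parent = entry["broader"]
            while parent is not None:
                tags.add(parent)
                parent = TAXONOMY[parent]["broader"]
    return sorted(tags)


if __name__ == "__main__":
    print(auto_tag("The audit found gaps in the risk register."))
```

Because every document passes through the same rules, two authors describing the same topic in different business units end up with the same tags, which is the consistency that manual tagging tends to lack.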
These automated meta-tagging/mark-up systems need to be able to perform the task of accurately and consistently categorizing and labelling documents and images while leaving them in their original locations, so that disruption and additional costs are minimized.
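One way to picture tagging documents while leaving them in place is an external metadata index that records each file's path alongside its tags, so nothing is moved, copied or reformatted. The Python sketch below assumes plain-text files and a hypothetical keyword-to-tag rule set (`RULES`); the function names `build_index` and `find` are likewise invented for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical keyword-to-tag rules, standing in for a real classifier.
RULES = {"proposal": "Proposals", "invoice": "Finance"}


def build_index(root):
    """Scan .txt files under `root` and return {relative_path: [tags]}.
    Files are only read; they are never moved or modified."""
    index = {}
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8").lower()
        tags = sorted({tag for keyword, tag in RULES.items() if keyword in text})
        index[path.relative_to(root).as_posix()] = tags
    return index


def find(index, tag):
    """Return the paths of all indexed files carrying `tag`."""
    return [path for path, tags in index.items() if tag in tags]


def demo():
    """Build a throwaway folder tree, index it, and search by tag."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        (base / "sales").mkdir()
        (base / "finance").mkdir()
        (base / "sales" / "bid.txt").write_text(
            "Draft proposal for client", encoding="utf-8"
        )
        (base / "finance" / "q3.txt").write_text(
            "Invoice 17 attached", encoding="utf-8"
        )
        index = build_index(root)
        return index, find(index, "Finance")


if __name__ == "__main__":
    index, hits = demo()
    print(index)
    print(hits)
```

Keeping the metadata separate from the files means the index can span several repositories and be rebuilt when the taxonomy changes, without touching the source systems.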
Unlocking the power of Big Data
In looking at the research and analyzing the challenges related to the huge proliferation of information in the Big Data age, one quickly reaches the conclusion that dealing with the sheer volumes is an issue, but it is not the principal one. The other inescapable conclusion is that organizations that really want to realize the promise of Big Data need to activate Big Content, which itself is only made accessible and findable through Content Intelligence.
Full Content Intelligence requires that the vast volumes of unstructured information generated and held by organizations today be made manageable and findable by providing accurate and consistent auto-classification and visualization capabilities, as well as enterprise-grade taxonomy and ontology management. These elements enable employees and other stakeholders to filter out extraneous documents, ensuring that the precise needle they are looking for in the haystack can be found exactly when it is needed.
Achieving this unprecedented precision and speed of information retrieval enhances productivity and ultimately profitability within virtually any organization. Content Intelligence allows a company, a government organization, or a non-profit institution to fully leverage its information resources. It not only enhances search, but drives business workflow, improves workplace collaboration, and enables an organization to create valuable – and in many instances profitable – information assets where none existed before.