Many big data use cases are built on collecting data into platforms such as Hadoop and other data warehouses for processing. For many, the storage layer is considered the natural foundation for big data. But advances in wire data analytics, a technology that analyses data as it flows across the network between applications and IoT devices, offer big data projects another route to success. From drug companies cracking down on fraudulent prescriptions to online betting firms understanding complex transactions, organisations that look at data while it’s on the wire can start to gain big data insights before the data even hits the storage layer.
Big data is a pretty nebulous term. Depending on who you ask, it can range from understanding buying patterns of online shoppers to predicting energy consumption for utility companies. The underlying requirement is to collect data and analyse it to discover insights. In many cases, the traditional structured database model is not particularly well suited to these tasks, as data is available in a wide variety of often unstructured forms and is, in some cases, difficult to collate.
Traditional approach
For some projects, the sheer volume of data is daunting. For example, trying to spot fraudulent patterns in millions of credit card transactions per second is a big data problem that is hard to manage without powerful compute and storage platforms. To meet these challenges, enterprises have turned to technologies such as Hadoop, a software framework for storage and large-scale processing of data sets, and MapReduce, a programming model for processing large data sets with a parallel, distributed algorithm on a cluster of commodity computers.
Both technologies have been used by big data pioneers like Google for many years and have spread to other big data problems across financial services, healthcare and media. However, there are limitations to the fundamental concept of collecting all the data into a big bucket and then processing it en masse.
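To make the MapReduce programming model mentioned above more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and uses no Hadoop API; the record layout, the function names and the sample transactions are all assumptions made up for illustration. It only shows the three conceptual steps: map each record to a key and value, group the values by key, and reduce each group to a result.

```python
from collections import defaultdict

# Hypothetical sample records (card_id, amount); invented for illustration.
transactions = [
    ("card-A", 20.0),
    ("card-B", 5.0),
    ("card-A", 980.0),
    ("card-C", 12.5),
    ("card-B", 7.5),
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    card_id, amount = record
    return card_id, amount


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = [map_phase(record) for record in records]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


totals = run_job(transactions)
```

In a real cluster the map and reduce steps would run in parallel on many machines, which is where much of the operational complexity comes from.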
Hadoop, although incredibly effective at the task, requires the construction of clusters of computers and large data storage pools that must be fed with a workflow of data. Although they have become more “enterprise friendly” in recent years, both MapReduce and Hadoop are still complex. Also, some data is difficult to collect or pass into a Hadoop / MapReduce workflow.
For example, imagine a scenario where you are a health authority and you want to find out if there is a serious and ongoing outbreak of measles within a particular region. Collating data from GPs, hospitals, NHS trusts or even private healthcare providers is a challenging task. The underlying systems at each of these locations may not provide reporting in similar formats or even in a timely fashion. The data might well be stored in different locations, and correlating it in real time is tricky. Other scenarios where data collection for big data projects becomes challenging include the internet of things, where distributed devices send potentially valuable bits of information to multiple service providers across disparate links using a wide variety of data schemas.
Data from the wire
Instead of collecting all the data in a single bucket before processing, an innovative solution is to examine the data, and gather insights from it, while it is in transit across the network from source to destination. This wire data technology stems from organisations trying to understand application performance by examining data as it flows between systems and users. By understanding the communication stream between, say, a database and a front-end system, wire data analysis can find anomalies and trends to help with performance management and troubleshooting.
The technology looks inside the IP packets and reconstructs the content they carry into conversations. Rather than following just a single application or user, wire data monitoring technology will assimilate millions of packets per second across thousands of individual transactions. This data can then be stored in both structured and unstructured databases or even pushed into a Hadoop cluster for analysis.
Wire data helps in several ways when it comes to big data problems. The first is in helping to optimise the infrastructure that many big data projects rely on. This can extend to network and storage infrastructure: wire data allows organisations to measure performance improvements, tune performance and remove bottlenecks to maximise the utilisation of existing capacity. However, like many technologies that started out life solving a particular problem and then evolved, wire data is now being used to solve challenging real-world problems that would normally be classed as big data challenges.
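The reconstruction step described above can be pictured with a small Python sketch. It is a deliberately simplified illustration, not how any particular product works: the Packet class, its fields and the sample packets are assumptions, and seq is a plain ordering number rather than a real TCP byte offset. Real reassembly also has to cope with retransmissions, overlapping segments and packet loss.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a captured packet."""
    src: str
    sport: int
    dst: str
    dport: int
    seq: int        # simplified ordering number, not a TCP byte offset
    payload: bytes


def flow_key(packet):
    """Direction-independent key so both sides land in one conversation."""
    a = (packet.src, packet.sport)
    b = (packet.dst, packet.dport)
    return (a, b) if a <= b else (b, a)


def reassemble(packets):
    """Group packets per conversation and rebuild each sender's byte stream."""
    flows = defaultdict(lambda: defaultdict(list))
    for packet in packets:
        flows[flow_key(packet)][(packet.src, packet.sport)].append(packet)

    conversations = {}
    for key, directions in flows.items():
        conversations[key] = {
            sender: b"".join(
                p.payload for p in sorted(pkts, key=lambda p: p.seq)
            )
            for sender, pkts in directions.items()
        }
    return conversations


# Invented sample capture: a client query arriving out of order, plus a reply.
packets = [
    Packet("10.0.0.5", 40000, "10.0.0.9", 5432, 2, b"FROM patients"),
    Packet("10.0.0.9", 5432, "10.0.0.5", 40000, 1, b"42 rows"),
    Packet("10.0.0.5", 40000, "10.0.0.9", 5432, 1, b"SELECT * "),
]
conversations = reassemble(packets)
```

Each entry in conversations holds the ordered bytes sent by each side, which is the raw material for any later analysis of the transaction.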
Understanding the conversation
Healthcare is a great example of an area where wire data analysis for big data is starting to deliver innovation. In many countries, healthcare is delivered by multiple organisations including the public sector, private companies, not-for-profits and NGOs. The industry is also becoming increasingly dependent on IT systems, but running wide-scale analysis of even simple things such as admissions, treatment programmes and quality of service is notoriously hard. Given the additional requirement to secure personal medical data from prying eyes, gathering data into a single location is challenging.
Instead, a small set of pioneers in the US are working on using wire data analytics to interrogate the HL7 messaging standard which nearly every healthcare organisation relies on to exchange information between systems and applications. These HL7 messages are used for patient admission information, billing details, laboratory results, prescriptions and other important tasks and form an underlying common standard between disparate systems.
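As a rough illustration of why a common message format helps, the Python sketch below reads the message type out of HL7 version 2 style messages and counts how many of each type were seen. It assumes the usual pipe-delimited v2 layout, in which segments are separated by carriage returns and the MSH-9 field carries the message type. The sample messages, application names and identifiers are invented, and the parser ignores escape sequences, repeated segments and custom delimiters.

```python
from collections import Counter


def parse_segments(message):
    """Split a message into {segment_id: fields}, keeping the first of each segment."""
    segments = {}
    for line in message.strip().split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], fields)
    return segments


def message_type(message):
    """Return MSH-9, for example 'ADT^A01'.

    MSH-1 is the field separator itself, so after splitting on '|'
    the MSH-9 value sits at list index 8.
    """
    return parse_segments(message)["MSH"][8]


# Invented sample messages for illustration only.
captured = [
    "MSH|^~\\&|ADMIT|HOSP_A|LAB|HOSP_A|20240101120000||ADT^A01|MSG0001|P|2.3\r"
    "PID|1||12345",
    "MSH|^~\\&|LAB|HOSP_B|EHR|HOSP_B|20240101121500||ORU^R01|MSG0002|P|2.3\r"
    "PID|1||67890",
    "MSH|^~\\&|ADMIT|HOSP_B|LAB|HOSP_B|20240101123000||ADT^A01|MSG0003|P|2.3\r"
    "PID|1||24680",
]

counts = Counter(message_type(m) for m in captured)
```

Because every sending system uses the same field positions, the same few lines work across messages from different sites, which is the property the wire data approach relies on.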
Innovators like ExtraHop can now peer into and understand these HL7 messages as they flow around healthcare networks, turn this transaction data into real-time analysis and then archive it in other database systems, allowing for much more insightful trend and anomaly analysis.
The wire data systems, initially under trial with single healthcare providers, are easy to deploy because they require no change in workflow and, thanks to their ability to understand HL7, support a huge number of systems. For single healthcare providers, this makes it possible to correlate data across multiple sites, systems and workflows to generate some valuable insights. These can range from understanding where delays are occurring in the treatment cycle, to potentially spotting fraudulent health insurance practices, to helping predict stock levels for certain consumables used in the provision of healthcare.
As more healthcare providers adopt these systems, it is hoped that aggregated data will allow more regional and national insights to be uncovered. For example, it could reveal the impact of health awareness programmes on immunisation rates, or the remission rates for different cancer treatment programmes. The fundamental innovation is that these wire data pilots don’t require the creation of huge data crunching clusters, complex programming or impractical new workflows. Wire data sits in line with existing workflows and simply listens, understands and reconstructs the conversation as it flows through the systems.
For now, wire data is still predominantly establishing itself as the application performance management system of choice. In the not too distant future, however, small data packets running across the wire will be of much greater importance to big data pioneers.