Applied Observability – the ‘must haves’ for your analytics engine

By Steve Wilcockson, Data Science Lead at KX.


Time series data and analytics are increasingly recognised as a critical part of Applied Observability, which involves taking all sorts of system inputs (digital footprints) and recognising patterns in them that can inform decision making across a company.

By joining current observations with past events (for example, current and past trade or risk decisions that exhibit common characteristics, or instances of fraud or cybersecurity incidents), observability enables better-informed decisions, better performance, and improved business-critical metrics such as quality of service and uptime. The common denominator across all of these observations is time.

What is Applied Observability?

Gartner notes this is a trend to look out for in 2023. Applied Observability works by taking the digital footprints from sources like logs, traces, API calls, dwell time, downloads and file transfers and merging them in a “highly orchestrated and integrated approach to enable decision making in a new way across many levels of the organization”.

Like the concepts of “big data” and “data analytics”, Applied Observability is now being adopted by the mainstream, but it has been practiced for years. It was always present in trading, process engineering, “just-in-time” logistics and cybersecurity, in concept if not in name, however perfectly or imperfectly. What has changed is that today’s cloud, analytics and in-the-moment data technologies make it much easier to practice with a more mature understanding, and the crucial ingredient is time series. By collecting and processing large quantities of time-denominated data from a variety of sources and formats, organizations can uncover differentiating and risk-mitigating insights from time-contingent information, whether historical, current, or across both.

Industries like finance make particular use of this, for example keeping abreast of quote acceptance and rejection levels, or tracking trade and order ratios to pick up trends that may reveal potentially manipulative trading activity. In manufacturing, time series data helps identify anomalies and abnormalities and prevent batch loss or machine downtime. In telecommunications, it can help monitor the flow and profile of network data, generate alerts when maximum levels are breached, maintain quality of service and protect networks.
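To make the telecommunications example concrete, here is a minimal Python sketch of threshold alerting on a time-stamped metric. It assumes readings arrive as (timestamp, value) pairs in time order; the names `breach_alerts` and `Alert`, the throughput samples and the 90 Mbit/s limit are all hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Alert:
    timestamp: float
    value: float
    threshold: float


def breach_alerts(readings: Iterable[Tuple[float, float]], max_level: float) -> List[Alert]:
    """Return one alert each time the series crosses above max_level.

    Edge-triggered: once an alert fires, no further alert is raised until the
    value falls back to or below max_level, so a sustained breach alerts once.
    """
    alerts: List[Alert] = []
    in_breach = False
    for ts, value in readings:  # assumed to be in time order
        if value > max_level and not in_breach:
            alerts.append(Alert(ts, value, max_level))
            in_breach = True
        elif value <= max_level:
            in_breach = False
    return alerts


# Hypothetical link throughput samples: (seconds, Mbit/s).
link_mbps = [(0, 40.0), (1, 95.0), (2, 97.0), (3, 60.0), (4, 99.0)]
for alert in breach_alerts(link_mbps, max_level=90.0):
    print(f"t={alert.timestamp}: {alert.value} > {alert.threshold}")
# t=1: 95.0 > 90.0
# t=4: 99.0 > 90.0
```

Because the alert is edge-triggered, a sustained breach raises one alert rather than one per reading, which keeps alert volume manageable; a production system might also add hysteresis or a minimum breach duration.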

While the techniques have been practiced for decades, not all analytics solutions can handle the huge increases in the volume, velocity and variety of data that businesses generate today. For those looking to deploy Applied Observability successfully, here are some key must-haves:

Prepare for time series data

The majority of today’s data is machine generated, recording how something changes over time or allowing analysis at a point in time. Any analytics database should be primed for the specific attributes of this data: it is append-only, it arrives fast, and it is time-stamped. It should be straightforward to ingest and compare broad time-centric data sets (through so-called asof joins, which match each record with the most recent record in another data set at or before its timestamp), to make quick in-line calculations (such as moving averages), and to run more complex ones (training and inference of machine learning models), all while delivering fast, efficient reads and writes and optimizing storage. The result is minimal duplication of data sets, with results computed by single, simple, vectorized, columnar operations rather than process-intensive batched loops repeated time and again. More performance, less cost.
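The two operations named above can be sketched in plain Python to show the idea. This is an illustrative sketch, not how any particular database implements them: `asof_join` pairs each trade with the latest quote for the same symbol at or before the trade's timestamp, using a per-symbol sorted index and binary search instead of rescanning the quotes for every trade, and `moving_average` keeps a running sum so each new point costs one addition and one subtraction. The tuple layouts, symbols and prices are hypothetical. In pandas, `pandas.merge_asof` and `Series.rolling(window).mean()` provide column-at-a-time versions of the same operations.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Quote = Tuple[float, str, float]  # (timestamp, symbol, mid price); hypothetical layout
Trade = Tuple[float, str, float]  # (timestamp, symbol, trade price); hypothetical layout


def asof_join(trades: Sequence[Trade], quotes: Sequence[Quote]) -> List[Tuple[Trade, Optional[float]]]:
    """Pair each trade with the mid of the latest same-symbol quote at or before it."""
    # Build a per-symbol, time-sorted index of quotes in one pass.
    times: Dict[str, List[float]] = defaultdict(list)
    mids: Dict[str, List[float]] = defaultdict(list)
    for ts, sym, mid in sorted(quotes):
        times[sym].append(ts)
        mids[sym].append(mid)
    joined: List[Tuple[Trade, Optional[float]]] = []
    for trade in trades:
        ts, sym, _ = trade
        # bisect_right counts quotes with timestamp <= ts; the last of them is the "as of" match.
        i = bisect_right(times[sym], ts)
        joined.append((trade, mids[sym][i - 1] if i else None))
    return joined


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing mean over `window` points, updating a running sum instead of re-summing."""
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the point that just left the window
        if i >= window - 1:
            out.append(running / window)
    return out


quotes = [(1.0, "ABC", 10.0), (3.0, "ABC", 10.5), (2.0, "XYZ", 20.0)]
trades = [(0.5, "ABC", 9.9), (2.5, "ABC", 10.1), (3.0, "ABC", 10.6), (4.0, "XYZ", 20.2)]
print([mid for _, mid in asof_join(trades, quotes)])  # [None, 10.0, 10.5, 20.0]
print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))  # [2.0, 3.0, 4.0]
```

The first trade gets `None` because no quote for its symbol exists yet; a trade stamped exactly at a quote's time matches that quote.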

Open and connected

Modern enterprises use lots of data, in lots of forms. This means any analytics engine has to interoperate with a large number of messaging systems (e.g., Kafka, MQ) and data formats (e.g., CSV, JSON), whether over sockets and servers through IPC (interprocess communication) or through general API styles such as REST and OpenAPI. There are different kinds of data to consider too, such as reference data: identifiers like sensor or stock IDs.
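As a hedged illustration of the interoperability point, the Python sketch below normalizes a CSV extract and a JSON-lines payload (the kind of message one might consume from a queue) into one common record shape, then enriches each record with reference data keyed by sensor ID. The field names, sensor IDs and the `SENSOR_SITES` lookup table are hypothetical; a real deployment would read from actual connectors rather than in-memory strings.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical reference data: sensor id -> site.
SENSOR_SITES: Dict[str, str] = {"s-01": "plant-a", "s-02": "plant-b"}


def from_csv(text: str) -> List[dict]:
    """Parse CSV with an assumed header 'ts,sensor_id,value' into common records."""
    return [
        {"ts": float(r["ts"]), "sensor_id": r["sensor_id"], "value": float(r["value"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def from_json_lines(text: str) -> List[dict]:
    """Parse one JSON object per line; this source uses different (assumed) field names."""
    out = []
    for line in text.splitlines():
        if line.strip():
            obj = json.loads(line)
            out.append({"ts": float(obj["ts"]), "sensor_id": obj["sensor"], "value": float(obj["reading"])})
    return out


def enrich(records: List[dict]) -> List[dict]:
    """Attach reference data; unknown ids get site None rather than being dropped."""
    return [{**r, "site": SENSOR_SITES.get(r["sensor_id"])} for r in records]


csv_text = "ts,sensor_id,value\n1,s-01,20.5\n3,s-02,19.0\n"
json_text = '{"ts": 2, "sensor": "s-01", "reading": 21.0}\n{"ts": 4, "sensor": "s-09", "reading": 18.2}\n'
combined = sorted(enrich(from_csv(csv_text) + from_json_lines(json_text)), key=lambda r: r["ts"])
for record in combined:
    print(record)
```

Mapping every source onto one schema at the edge means downstream time series queries do not need to know which format or protocol a record arrived in.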

Analytics Anywhere – Blending Historical & Real-Time Data

When real-time data and historical data can be analysed together wherever they reside (for example, where data enters or exits the organization, in a warehouse or lake, streaming in memory or at rest), companies can respond faster and better to events in the moment, with no duplication of data across multiple systems and no unnecessary latency. This aids workflows with cross-organizational requirements such as machine learning. Inference (live, fast, in the moment, requiring adaptability) differs from model training (big data, model selection, calibration, hyper-parameter estimation), for example, and both are ideally hubbed in intelligent, adaptable “feature stores”.
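One way to picture querying historical and real-time data without duplicating either is a lazy, time-ordered merge at query time. In the minimal Python sketch below, the historical tier is any time-sorted iterable and `LiveBuffer` is a hypothetical in-memory buffer of recent streaming events; `heapq.merge` interleaves the two sorted sources without copying one into the other. It assumes both tiers are sorted by timestamp, that they do not overlap (the live buffer holds only events not yet persisted), and that the buffer is not appended to while a query is iterating over it.

```python
import heapq
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (timestamp, payload); hypothetical layout


class LiveBuffer:
    """Hypothetical bounded in-memory buffer of the most recent streaming events."""

    def __init__(self, maxlen: int = 10_000) -> None:
        self.events: Deque[Event] = deque(maxlen=maxlen)

    def append(self, event: Event) -> None:
        self.events.append(event)  # assumes events arrive in timestamp order


def window_query(historical: Iterable[Event], live: LiveBuffer,
                 start: float, end: float) -> Iterator[Event]:
    """Yield events with start <= timestamp < end from both tiers, in time order.

    heapq.merge lazily interleaves the two already-sorted sources, so neither
    tier is copied into the other before the query runs.
    """
    for ts, payload in heapq.merge(historical, live.events):
        if ts >= end:
            break  # both inputs are sorted, so nothing later can match
        if ts >= start:
            yield ts, payload


history = [(1.0, "h1"), (2.0, "h2"), (3.0, "h3")]  # e.g. rows read back from storage
live = LiveBuffer()
live.append((3.5, "l1"))
live.append((4.0, "l2"))
print(list(window_query(history, live, start=2.0, end=4.0)))
# [(2.0, 'h2'), (3.0, 'h3'), (3.5, 'l1')]
```

The caller sees one continuous, time-ordered stream regardless of where each event physically lives.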

Easy adoption of analytics software

Look for the tools your data scientists use. They’re key. I’m from the MATLAB-using generation. The cool kids today are Pythonistas, while R maintains a powerful statistical niche. All support strong time-series workflows: financial analytics (Pandas came from a hedge fund), signal processing, time-stamped IoT workflows, the “real-time” controllers in your vehicles and equipment, and the drug discovery process. Whatever your observability workflow, almost certainly cloud-driven and probably but not necessarily cloud-native, the ability to consume output, analyse, model, and adapt model-based, data-centric workflows gives you the edge. You’re quicker to observe new trends, and better able to augment and improve production models, when those data tools integrate directly with production data analytics.

Proof is in the production pudding (but research ingredients really do matter!)

While time series analytics engines and workflows are relatively common in some industries, such as finance, automotive (Formula 1), some marketing tech (those annoying yet titillating personalized ads you get online!) and cybersecurity (nothing more critical than this right now), many organizations, and some whole industries, remain stuck on SQL, relational databases and batch queries.

Time-series leaders can continue to improve their at-rest and streaming workflows (thank you Kafka for making streams mainstream!). Other industries can discover mega-improvements by using cloud availability to gain performance improvements that can reach 100x, with corresponding cost and carbon savings, from beautiful asof joins, time bucketing and vectorizable analytics. Everyone can improve integration and agility between data science research (the model, train and test phase) and production (the business-impacting implementation). The time domain and time series are the common denominator, with time to value as the measure of business impact.
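Time bucketing groups ticks into fixed-width intervals before aggregating them. The Python sketch below builds open/high/low/close bars from (timestamp, price) ticks; it assumes integer timestamps (such as epoch seconds) that arrive in time order, and the tick values are made up for illustration. In pandas, `Series.resample(...).ohlc()` performs a comparable aggregation on a time-indexed series.

```python
from typing import Dict, List, Tuple


def time_buckets(ticks: List[Tuple[int, float]], width: int) -> Dict[int, Dict[str, float]]:
    """Aggregate (timestamp, price) ticks into fixed-width open/high/low/close bars.

    Each tick is assigned to the bucket starting at ts - ts % width. Ticks are
    assumed to be in time order, so the first and last tick seen in a bucket
    are its open and close.
    """
    bars: Dict[int, Dict[str, float]] = {}
    for ts, price in ticks:
        start = ts - ts % width  # floor the timestamp to its bucket boundary
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"open": price, "high": price, "low": price, "close": price, "count": 1}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["count"] += 1
    return bars


# Hypothetical ticks: (epoch seconds, price), bucketed into 60-second bars.
ticks = [(0, 10.0), (30, 10.4), (59, 10.1), (61, 10.2), (125, 9.9)]
bars = time_buckets(ticks, width=60)
print(sorted(bars))  # [0, 60, 120]
print(bars[0])  # {'open': 10.0, 'high': 10.4, 'low': 10.0, 'close': 10.1, 'count': 3}
```

Each tick is touched once, and the bucket key is a single arithmetic operation on the timestamp, which is what makes this kind of aggregation easy to vectorize in a columnar engine.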

Time series is key to what Gartner and others call Applied Observability, a discipline that is not new but is becoming more important every day, driven by new, exciting, invigorating cloud data pipelines.

To paraphrase one industry leader whose business is observability, all the time: “we see no limits to all conceivable data in the universe, capturing it in real-time to record a 'continuous stream of truth',” and, ChatGPT-like, to “ask any question of any dataset at any time to get an answer instantaneously.” Taking the ChatGPT analogy further, “the only limit is their imagination and the questions they can think of to ask,” such as “Why did something happen? Why did it happen the way it happened? What were the small events that led to the big event? And which factors influenced each event? If you know the answer to the question 'why'…” the world is your oyster.

And thus they reach observability nirvana: “We can know everything about everything all of the time. In the world of machine generated data and AI, as soon as something changes, everything changes. Our customers are the first to acquire, interpret, and act on, new information. We see and interpret the world of machine generated data [through] simultaneously a data microscope and a data telescope.”
