Pathway has launched its data processing engine, which benchmarks show to be up to 90x faster than existing streaming solutions. The platform uniquely unifies batch and streaming workflows to enable real-time machine learning and, critically, the ability for machines to ‘learn to forget’.
Until now, it has been nearly impossible for machines to learn and react to changes in real time the way humans do. Because designing streaming workflows is so complex, intelligent systems are typically trained on static (‘frozen’) data uploads, including large language models such as ChatGPT. This means their intelligence is fixed at a moment in time. Unlike humans, machines are not in a continuous state of learning and therefore cannot iteratively ‘unlearn’ information they were previously taught when it turns out to be false, inaccurate, or outdated.
Pathway overcomes this thanks to its unique ability to mix batch and streaming logic in the same workflow. Systems can be continuously trained on new streaming data, with revisions made to individual data points without requiring a full batch re-upload. This is comparable to updating the value of one cell in an Excel spreadsheet: only the cells that depend on it are recomputed, not the whole document. Inaccurate source information can therefore be seamlessly corrected to improve system outputs.
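The spreadsheet analogy can be made concrete with a small illustrative sketch (this is not Pathway's actual API, just the principle): when one input in a dependency graph changes, only the cells downstream of it are recomputed.

```python
# Illustrative sketch (not Pathway's actual API): spreadsheet-style
# incremental recomputation over a small dependency graph. When an input
# changes, only the cells downstream of it are recomputed.

class Cell:
    def __init__(self, name, compute=None, value=None):
        self.name = name
        self.compute = compute        # function of dependency values; None for inputs
        self.value = value
        self.deps = []                # cells this cell reads
        self.dependents = []          # cells that read this cell

def depends_on(cell, *deps):
    cell.deps = list(deps)
    for d in deps:
        d.dependents.append(cell)

def set_value(cell, value):
    """Update an input cell; recompute only its downstream dependents (BFS)."""
    cell.value = value
    recomputed = []
    frontier = list(cell.dependents)
    while frontier:
        c = frontier.pop(0)
        c.value = c.compute(*(d.value for d in c.deps))
        recomputed.append(c.name)
        frontier.extend(c.dependents)
    return recomputed

# Two input cells and three derived cells.
a = Cell("a", value=1)
b = Cell("b", value=2)
s = Cell("sum", compute=lambda x, y: x + y)
depends_on(s, a, b)
t = Cell("double", compute=lambda x: 2 * x)
depends_on(t, s)
u = Cell("unrelated", compute=lambda x: x + 10)
depends_on(u, b)

set_value(b, 2)                 # initial pass: populates sum, double, unrelated
changed = set_value(a, 5)       # revision: only "sum" and "double" are recomputed
print(changed)                  # ['sum', 'double'] -- "unrelated" is untouched
```

In a real engine the same idea applies to entire tables and joins rather than single cells, which is what makes revising one data point cheap compared with reprocessing the full dataset.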
It has traditionally been extremely hard to design efficient systems that combine both batch and streaming workflows. The situation has become even more complex since a third workflow entered the scene: generative AI, which needs fast and secure learning of context to deliver value.
Most organisations instead design two or more separate systems, which cannot perform incremental updates to revise preliminary results. This has reduced confidence in machine learning systems and stalled the adoption of enterprise AI among organisations that need to make decisions based on accurate real-time data, such as in manufacturing, financial services and logistics. Bringing batch and streaming data together overcomes this challenge and enables true real-time systems for resource management, observability and monitoring, predictive maintenance, anomaly detection, and strategic decision-making.
Pathway enables a paradigm shift towards real-time data
The Pathway data processing engine is enabling organisations to perform real-time data processing at scale. Existing clients include DB Schenker, which has reduced the time-to-market of anomaly-detection analytics projects from three months to one hour, and La Poste, which enabled a fleet CAPEX reduction of 16%.
Unique capabilities of the Pathway data processing engine supporting this shift to real-time include:
Fastest data processing engine on the market – unified batch and streaming. Capable of processing millions of data points per second, it outperforms current reference technologies such as Spark (in both batch and streaming modes), Kafka Streams, and Flink. Benchmarks of WordCount and PageRank against these systems found that Pathway supports more advanced operations and is up to 90x faster, thanks to higher throughput and lower latency. The benchmarks were stress-tested by the developer community and are publicly available so the tests can be replicated; a detailed description is available in the HAL preprint.
Facilitates real-time systems – Pathway allows a seamless transition from existing batch systems to real-time and LLM architectures, with real-time machine learning integrating fully into the enterprise context.
Ease of development – Batch and streaming workflows can be designed with the same code logic in Python, which is then translated into Rust for execution. This democratises the design of streaming workflows, which has typically required a specialist skillset, and brings together what have typically been disparate teams within an organisation. Pathway thereby becomes the lingua franca of all data pipelines – stream, batch and generative AI.
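A standalone sketch can illustrate the "one code path" idea behind this capability: the same pipeline logic, written once in plain Python, is driven either by a finite batch or by a potentially unbounded stream. (Pathway's own API differs; this only mirrors the principle.)

```python
# Illustrative sketch of unified batch/streaming logic: one pipeline
# definition, two execution modes. The function names and record shapes
# here are invented for the example.

def pipeline(records):
    """Single definition of the business logic: filter, then enrich."""
    for r in records:
        if r["value"] >= 0:                         # drop invalid readings
            yield {**r, "value_squared": r["value"] ** 2}

# Batch mode: run over a static dataset in one pass.
batch = [{"value": 2}, {"value": -1}, {"value": 3}]
batch_result = list(pipeline(batch))

# Streaming mode: feed the *same* pipeline from a live-style source.
def sensor_stream():
    for v in [4, -2, 5]:                            # stand-in for an endless feed
        yield {"value": v}

stream_result = list(pipeline(sensor_stream()))
print(batch_result)   # [{'value': 2, 'value_squared': 4}, {'value': 3, 'value_squared': 9}]
```

Because the transformation is defined once, batch teams and streaming teams share a single codebase instead of maintaining two divergent implementations of the same logic.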
Zuzanna Stamirowska, CEO & Co-Founder of Pathway, comments: “Until now, the complexity of building batch and streaming architectures has resulted in a division between the two approaches. This has slowed the adoption of data streaming for AI systems and fixed their intelligence at a moment in time. But there is a critical need for real-time to optimise processing and to enable AI to unlearn for improved, continuous accuracy.
“That’s why our mission has been to enable real-time data processing, while giving developers a simple experience regardless of whether they work with batch, streaming, or LLM systems. Pathway is truly facilitating the convergence of historical and real-time data for the first time.”
The general launch of the Pathway platform follows the company’s $4.5m pre-seed round in December 2022, which was led by CEE VCs Inovo and Market One Capital, with angel investors Lukasz Kaiser, co-author of TensorFlow and informally known as the “T” in ChatGPT, and Roger Crook, the former global CEO of German delivery giant DHL.