A large portion of generative AI (genAI) funding is controlled by Big Tech. These organisations are using their position to prioritise their own AI tools; one example is Amazon quietly blocking bots from Meta, Google and Huawei. Various companies are gatekeeping the data on their platforms as access to data becomes an important part of the battle for dominance in the AI market.
A truly fair and open data ecosystem would benefit everyone by making data available to companies of all sizes and levelling the competitive field. Such a transparent ecosystem would result in a wider variety of AI products and services, not just the ones developed by major corporations.
Public web data collection is a step toward such a system: it levels the playing field by helping to ensure that data is not concentrated in the hands of the few, but can be used by all. This article looks at the increasing demand for public data, the rise of AI agents and why fostering an open AI ecosystem benefits everyone.
What’s Driving the Growing Demand for Public Data?
There are three main factors fuelling the ever-growing demand for data:
1. The need for training data for agentic AI and large language models (LLMs). As LLMs are increasingly built into business strategy, they require huge amounts of training data to be effective. This demand has made web scraping a necessity in the AI revolution.
2. The need for real-time access to real-world data, which is becoming increasingly important in major industries such as e-commerce. Dynamic pricing tools on e-commerce platforms, for example, rely on up-to-date data to track competitor prices, allowing businesses to adjust their own prices and keep them competitive for consumers.
3. The public need for public data, which is also growing. In the January plan to turbocharge AI in the UK, the government set out to create a new National Data Library to safely and securely unlock the value of public data and support AI development. This investment demonstrates the growing value of data collection and safe storage practices.
These examples show the growing demand for public data and why it is crucial that this information does not sit solely in the hands of gatekeeping companies, but is instead accessible to fuel innovation.
The Rise of AI Agents and the Role of Web Scraping
The AI agents market is currently estimated at $5.1bn and is expected to grow by 44%. To be successful, AI agents need real-time, real-world data drawn from a wide range of public data sources. The main method of acquiring this data is web scraping.
According to a recent market report, 65% of enterprises use web scraping for their AI projects. However, access to public data is under threat from companies attempting to close off data sources and monopolise access. While Big Tech will find ways to get the data it needs, the closing off of open access will hurt small companies first. By slowing potential innovation and reducing the range of products on offer, it will ultimately hurt the end user the most.
Everyone Benefits from an Open AI Ecosystem
With a free and open internet, companies of all sizes gain crucial access to the data needed for ethical innovation. This is where web scraping's potential to level the field is of utmost importance.
Web data collection allows AI to be built on a broad spectrum of public data. Larger datasets are one of the factors that help minimise AI bias, as they provide a more balanced input and therefore a more balanced output.
This is an ever-changing landscape, and one that channel partners need to stay on top of. More than ever, customers are prioritising AI-embedded technologies, and if this market is at risk of bias or data shortages, staying ahead of customer concerns will be a high priority.
Data Access is Innovation
For an innovative AI future, we must embrace equal and open access to public data for all. This requires a step away from the data gatekeeping practised by some companies. Ultimately, data should be in the hands of the free market, not the few. Innovation thrives when monopolies are rejected and everyone is granted an equal opportunity to do great things with public resources.