Almost two-thirds (64%) of organisations suffer from ‘data drift’, where the data used by computer vision systems becomes out of date after a few months. That’s according to a new study by Mindtech, the developer of the world’s leading platform for creating synthetic data to train AI, which surveyed 250 data scientists, AI and machine learning engineers, and computer researchers across the UK.
On attitudes towards synthetic data, 85% of organisations already use it to train computer vision systems, and they see quality (65%), simplicity (61%), scalability (58%), faster training times (55%), and cost (52%) as its main strengths.
Among those that don’t currently use synthetic data, roughly one in five (21%) cite a lack of experience as a key barrier, and 26% cite cost. When all respondents were asked whether they trust synthetic data compared with real-world data, 73% said yes.
Respondents also have concerns about real-world data. 89% of AI and computer vision professionals worry that it will be affected by changing privacy laws and regulations, and 39% say that it slows down computer vision training.
Steve Harris, CEO at Mindtech, commented: “Data drift is an ongoing problem for organisations everywhere, which can be a costly issue to solve. Embracing synthetic data can help to overcome these challenges. It is not only faster than real world data for training computer vision systems, but it is also more cost-effective.”
Looking to 2023 and beyond, the outlook for synthetic data adoption is positive. Of those that don’t already use synthetic data, almost a third (29%) expect their organisation to start using it in 2023. In addition, the majority (56%) predict that up to 50% of training data will be synthetic within the next three years, while fewer than one in ten (9%) expect the share to be less than 10%.