A new report by Appen highlights the growing difficulties in sourcing high-quality data for AI, despite a 17% rise in generative AI adoption over the past year.
A new report by AI data provider Appen highlights the increasing challenges companies face in sourcing and managing the high-quality data needed to power artificial intelligence (AI) systems. The 2024 State of AI report, which surveyed over 500 IT decision-makers in the United States, reveals a 17% surge in generative AI adoption over the past year, but also points to significant difficulties in data preparation and quality assurance.
Si Chen, Appen’s Head of Strategy, elaborated on the evolving data needs in an interview with VentureBeat, stating, “As AI models tackle more complex and specialised problems, the data requirements also change. Companies are finding that just having lots of data is no longer enough. To fine-tune a model, data needs to be extremely high-quality, meaning that it is accurate, diverse, properly labelled, and tailored to the specific AI use case.”
One of the report’s key findings is the rise of generative AI, driven partly by advancements in large language models (LLMs) that enable task automation across various sectors. Despite this, the rapid growth of generative AI has created new challenges in data management. The unpredictable and subjective nature of generative AI outputs makes defining and measuring success more complex, which in turn demands highly customised, high-quality data before AI models can be considered enterprise-ready.
Moreover, the report highlights a worrying trend: fewer AI projects are reaching deployment, and those that do tend to show less return on investment (ROI). Since 2021, there has been an 8.1% decrease in the mean percentage of AI projects making it to deployment and a 9.4% drop in those demonstrating meaningful ROI. This decline is attributed to the growing complexity of AI models, particularly generative AI, which combines advanced capabilities with implementation challenges.
Data quality continues to erode, with the report indicating a drop of nearly 9% in data accuracy since 2021. Demands for fresh, relevant data have led 86% of companies to retrain or update their models quarterly. However, this frequent updating poses challenges in ensuring data accuracy and diversity. Nearly 90% of businesses now depend on external data providers to train and evaluate their models effectively.
The report also reveals a significant 10% annual increase in bottlenecks related to sourcing, cleaning, and labelling data. Such bottlenecks are a major roadblock to the successful deployment of AI projects. The U.S. has seen a decline in data accuracy from 63.5% in 2021 to just 54.6% in 2024, underscoring the formidable challenge of maintaining high-quality data as AI models grow more sophisticated.
Human expertise remains critical in AI development, with 80% of respondents emphasising the importance of human-in-the-loop machine learning. Human involvement is crucial for bias mitigation, ethical AI development, and providing domain-specific guidance to refine AI models. This oversight is particularly necessary for generative AI to prevent biased or harmful outputs.
Appen’s report underscores the evolving landscape of AI, emphasising the mounting complexities in data management as artificial intelligence becomes more integrated into enterprise operations. The focus on custom data solutions and strategic data partnerships appears to be an emerging necessity for companies striving to overcome these challenges and harness the full potential of AI technology.
Source: Noah Wire Services