AI Data Rush: How People Are Paid to Train Artificial Intelligence

The demand for data to fuel artificial intelligence is creating a new, and sometimes unsettling, gig economy. From recording neighborhood walks to sharing private phone calls, individuals are increasingly monetizing their personal information to train the next generation of AI models. While the income can be a lifeline for some, the long-term implications – and potential risks – are only beginning to be understood.

A Global Data Marketplace Emerges

Jacobus Louw, a 27-year-old from Cape Town, South Africa, discovered this emerging market firsthand. Last year, he began recording videos of his daily walks, specifically his feet and the surrounding pavement, and uploading them to Kled AI. The app pays contributors for data used to train AI, and Louw quickly earned $14 for a single video – roughly ten times the country’s hourly minimum wage, and enough to cover half a week’s worth of groceries. In South Africa’s economy, even small earnings like these are significant: Louw later used income from Kled AI and other platforms to fund a spa training course to become a masseur.

Louw’s experience isn’t isolated. Thousands of miles away in Ranchi, India, 22-year-old student Sahil Tigga earns over $100 a month by allowing Silencio, an app that crowdsources audio data, access to his phone’s microphone. He captures ambient sounds – restaurant chatter, traffic noise – and even uploads recordings of his own voice. Tigga actively seeks out unique locations, like hotel lobbies, to provide Silencio with valuable data. Similarly, in Chicago, 18-year-old welding apprentice Ramelio Hill made a couple hundred dollars selling his private phone chats to Neon Mobile, a conversational AI training platform, at a rate of $0.50 per minute.

These “gig AI trainers” represent the front lines of a new data gold rush. As Silicon Valley’s appetite for high-quality data outpaces what can be scraped from the open internet, data marketplaces like Kled AI, Silencio, and Neon Mobile have sprung up to bridge the gap. Luel AI, backed by Y Combinator, sources multilingual conversations for around $0.15 a minute, while ElevenLabs allows users to digitally clone their voice for a fee of $0.02 per minute of usage. This burgeoning industry is fueled by the need to feed increasingly sophisticated AI models.

The Data Drought and the Rise of Human-Grade Data

The surge in demand isn’t simply about volume; it’s about quality. AI language models like ChatGPT and Gemini require vast amounts of learning material to improve, but readily available data sources are becoming scarce. About a quarter of the highest-quality data in key datasets such as C4, RefinedWeb, and Dolma is now restricted for generative AI companies. Researchers estimate that AI companies could run out of fresh, high-quality text data as soon as 2026.

While some labs are attempting to utilize synthetic data generated by their own AI, this recursive process can lead to models producing inaccurate and unreliable results. This is where the human element comes in. Experts like Bouke Klein Teeselink, an economics professor at King’s College London, predict substantial growth in the gig AI training category. AI companies are incentivized to pay for data to avoid copyright disputes associated with web scraping and to ensure the quality of their training datasets. Veniamin Veselovsky, an AI researcher, emphasizes that “human data, for now, is the gold standard to sample from outside of the distribution of the model.”

Economic Opportunity and Precarious Work

For many, gig AI training offers a crucial income stream. The opportunity to earn US dollars is particularly attractive in countries with high unemployment and devalued currencies. For some, it’s a pragmatic response to economic hardship, providing a means to cover basic expenses. However, this new form of work comes with significant trade-offs. Mark Graham, author of Feeding the Machine, cautions that this work is “structurally precarious, non-progressive and effectively a dead end.”

AI marketplaces often rely on a “race to the bottom” in wages, and the demand they offer for human data is temporary. Once that demand shifts, workers are left with few protections, few transferable skills, and no safety net. Graham argues that the primary beneficiaries are the platforms themselves, which capture the enduring value created by this data.

Carte Blanche Permissions and Potential Risks

The terms of service for many AI data marketplaces raise serious privacy concerns. Trainers often grant irrevocable, royalty-free licenses allowing companies to sell, use, and create derivative works from their data indefinitely, with little to no further compensation. Enrico Bonadio, a law professor at City St George’s, University of London, notes that these agreements allow platforms to do “almost anything” with the data, leaving contributors with limited recourse.

The risks extend beyond financial exploitation. Data marketplaces often claim to anonymize data, but biometric patterns are difficult to truly anonymize. This raises the potential for misuse, including deepfakes, impersonation, and the inclusion of personal data in facial recognition databases. The lack of transparency in these marketplaces further exacerbates these concerns.

The experience of Adam Coy, a New York actor who sold his likeness to Captions (now Mirage), illustrates these risks. Despite an agreement that prohibited the use of his likeness for certain purposes, Coy discovered his AI replica promoting unproven medical supplements in an Instagram reel. He has since avoided participating in similar gigs.

The Neon Mobile Debacle and Lack of Recourse

Ramelio Hill’s experience with Neon Mobile shows how platforms can operate with little transparency or accountability. After earning $200 selling his phone calls, Hill discovered a security flaw that exposed the phone numbers, call recordings, and transcripts of all users. Neon Mobile went offline shortly after, leaving Hill worried about the potential misuse of his data and without any means of redress. The company did not respond to requests for comment.

Kled AI founder Avi Patel maintains that his company vets businesses before selling datasets to avoid misuse, particularly in areas like pornography or government applications. However, the broader lack of regulation and oversight in this emerging industry leaves individuals vulnerable to exploitation.

What’s Next for Gig AI Training?

The future of gig AI training remains uncertain. As the demand for high-quality data continues to grow, the industry is likely to expand, but increased scrutiny and potential regulation are also on the horizon. The need for greater transparency, stronger data protection measures, and fairer compensation for data contributors is becoming increasingly apparent. For individuals like Jacobus Louw, the opportunity to earn income through these platforms remains valuable, but a growing awareness of the risks is essential. The long-term sustainability of this model will depend on establishing a more equitable and responsible framework for the exchange of personal data in the age of artificial intelligence.