October 9, 2024
Source: The Conversation
Credit: Jérémy Barande (Flickr)
In dusty factories, cramped internet cafes and makeshift home offices around the world, millions of people sit at computers tediously labelling data.
These workers are the lifeblood of the burgeoning artificial intelligence (AI) industry. Without them, products such as ChatGPT simply would not exist. That’s because the data they label helps AI systems “learn”.
But despite the vital contribution this workforce makes to an industry expected to be worth US$407 billion by 2027, the people who comprise it are largely invisible and frequently exploited. Earlier this year, nearly 100 data labellers and AI workers from Kenya who work for companies such as Facebook, Scale AI and OpenAI published an open letter to United States President Joe Biden in which they said:
Our working conditions amount to modern day slavery.
To ensure AI supply chains are ethical, industry and governments must urgently address this problem. But the key question is: how?
What is data labelling?
Data labelling is the process of annotating raw data — such as images, video or text — so that AI systems can recognise patterns and make predictions.
Self-driving cars, for example, rely on labelled video footage to distinguish pedestrians from road signs. Large language models such as ChatGPT rely on labelled text to understand human language.
These labelled datasets are the foundation of AI models. Without them, AI systems would be unable to function effectively.
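To make this concrete, here is a minimal, hypothetical sketch of what labelled data looks like. The examples and labels below are invented for illustration; the key point is that each piece of raw data is paired with a human-assigned category, and that pairing is what supervised AI systems learn from.

```python
# Each labelled example pairs raw data (here, a text description)
# with a category chosen by a human annotator.
labelled_data = [
    ("a person walking across the street", "pedestrian"),
    ("a red octagonal sign reading STOP", "road_sign"),
    ("a child waiting at the crosswalk", "pedestrian"),
]

def majority_label(examples):
    """The simplest possible 'model': predict whichever label
    humans assigned most often in the labelled dataset."""
    counts = {}
    for _, label in examples:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)

print(majority_label(labelled_data))  # -> pedestrian
```

Real systems train far more sophisticated models, but the dependency is the same: without the human-supplied labels, even this trivial predictor would have nothing to learn from.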
Tech giants like Meta, Google, OpenAI and Microsoft outsource much of this work to data labelling factories in countries such as the Philippines, Kenya, India, Pakistan, Venezuela and Colombia.
China is also emerging as a global hub for data labelling.
Outsourcing companies that facilitate this work include Scale AI, iMerit, and Samasource. These are very large companies in their own right. For example, Scale AI, which is headquartered in California, is now worth US$14 billion.
Cutting corners
Major tech firms like Alphabet (the parent company of Google), Amazon, Microsoft, Nvidia and Meta have poured billions into AI infrastructure, from computing power and data storage to emerging computational technologies.
Large-scale AI models can cost tens of millions of dollars to train. Once deployed, maintaining these models requires continuous investment in data labelling, refinement and real-world testing.
But while AI investment is significant, revenues have not always met expectations. Many industries continue to view AI projects as experimental with unclear profitability paths.
In response, many companies are cutting costs which affect those at the very bottom of the AI supply chain who are often highly vulnerable: data labellers.
Low wages, dangerous working conditions
One way companies involved in the AI supply chain try to reduce costs is by employing large numbers of data labellers in countries in the Global South such as the Philippines, Venezuela, Kenya and India. Workers in these countries face stagnating or shrinking wages.
For example, hourly rates for AI data labellers in Venezuela range from 90 cents to US$2. In comparison, in the United States, the rate is between US$10 and US$25 per hour.
In the Philippines, workers labelling data for multi-billion dollar companies such as Scale AI often earn far below the minimum wage.
Some labelling providers even resort to child labour for labelling purposes.
But there are many other labour issues within the AI supply chain.
Many data labellers work in overcrowded and dusty environments which pose a serious risk to their health. They also often work as independent contractors, lacking access to protections such as health care or compensation.
The mental toll of data labelling work is also significant, with repetitive tasks, strict deadlines and rigid quality controls. Data labellers are also sometimes asked to read and label hate speech or other abusive language or material, which has been proven to have negative psychological effects.
Errors can lead to pay cuts or job losses. But labellers often face a lack of transparency about how their work is evaluated. They are frequently denied access to performance data, hindering their ability to improve or contest decisions.
Making AI supply chains ethical
As AI development becomes more complex and companies strive to maximise profits, the need for ethical AI supply chains is urgent.
One way companies can help ensure this is by applying a human rights-centred design, deliberation and oversight approach to the entire AI supply chain. They must also adopt fair wage policies, ensuring data labellers receive living wages that reflect the value of their contributions.
By embedding human rights into the supply chain, AI companies can foster a more ethical, sustainable industry, ensuring that both workers’ rights and corporate responsibility align with long-term success.
Governments should also create new regulation which mandates these practices, encouraging fairness and transparency. This includes transparency in performance evaluation and personal data processing, allowing workers to understand how they are assessed and to contest any inaccuracies.
Clear payment systems and recourse mechanisms will ensure workers are treated fairly. Instead of busting unions, as Scale AI did in Kenya in 2024, companies should also support the formation of digital labour unions or cooperatives. This will give workers a voice to advocate for better working conditions.
As users of AI products, we can all advocate for ethical practices by supporting companies that are transparent about their AI supply chains and committed to fair treatment of workers. Just as we reward green and fair trade producers of physical goods, we can push for change by choosing digital services and apps that adhere to human rights standards, promoting ethical brands through social media, and voting with our dollars to demand accountability from tech giants.
By making informed choices, we all can contribute to more ethical practices across the AI industry.
Ganna Pogrebna is a behavioral data scientist, decision theorist, educator, author, and academic writer. She currently serves as the Executive Director of the Artificial Intelligence and Cyber Futures Institute at Charles Sturt University, the Lead for Behavioural Data Science at the Alan Turing Institute (UK), and an Honorary Professor of Behavioural Business Analytics and Data Science at the University of Sydney. She is known for her work in combining data science methods with those from economics and psychology to model human behaviour under risk and uncertainty.