text post from 1 day ago

a reading list on the human labor behind AI, machine learning, data labeling, and content moderation

bringing a global labor perspective to the “ai is gonna steal our jobs!” discourse that usamerican creative workers don’t really like…

(based on this twitter thread)

Google’s AI Chatbot Is Trained by Humans Who Say They’re Overworked, Underpaid and Frustrated (12 July 2023)

“If you want to ask, what is the secret sauce of Bard and ChatGPT? It’s all of the internet. And it’s all of this labeled data that these labelers create,” said Laura Edelson, a computer scientist at New York University. “It’s worth remembering that these systems are not the work of magicians — they are the work of thousands of people and their low-paid labor.”

The Hidden Workforce That Helped Filter Violence and Abuse Out of ChatGPT (11 July 2023)

ChatGPT is one of the most successful tech products ever launched. And crucial to that success is a group of largely unknown data workers in Kenya. By reviewing disturbing, grotesque content, often for wages of just two to three dollars an hour, they helped make the viral chatbot safe. WSJ’s Karen Hao traveled to Kenya to meet those workers and hear about what the job cost them.

The workers at the frontlines of the AI revolution: The global labor force of outsourced and contract workers are early adopters of generative AI — and the most at risk (11 July 2023)

Since the blockbuster launch of ChatGPT at the end of 2022, future-of-work pontificators, AI ethicists, and Silicon Valley developers have been fiercely debating how generative AI will impact the way we work. Some six months later, one global labor force is at the frontline of the generative AI revolution: offshore outsourced workers.

Inside the AI Factory: the humans that make tech seem human (20 June 2023)

You might miss this if you believe AI is a brilliant, thinking machine. But if you pull back the curtain even a little, it looks more familiar, the latest iteration of a particularly Silicon Valley division of labor, in which the futuristic gleam of new technologies hides a sprawling manufacturing apparatus and the people who make it run.

OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic (18 January 2023)

OpenAI took a leaf out of the playbook of social media companies like Facebook, who had already shown it was possible to build AIs that could detect toxic language like hate speech to help remove it from their platforms. The premise was simple: feed an AI with labeled examples of violence, hate speech, and sexual abuse, and that tool could learn to detect those forms of toxicity in the wild. That detector would be built into ChatGPT to check whether it was echoing the toxicity of its training data, and filter it out before it ever reached the user. It could also help scrub toxic text from the training datasets of future AI models.

To get those labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet. Some of it described situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest. … The data labelers employed by Sama on behalf of OpenAI were paid a take-home wage of between around $1.32 and $2 per hour…

The ‘Invisible’, Often Unhappy Workforce That’s Deciding the Future of AI (9 December 2021)

Among a range of conclusions, the Google study finds that the crowdworkers’ own biases are likely to become embedded into the AI systems whose ground truths will be based on their responses; that widespread unfair work practices (including in the US) on crowdworking platforms are likely to degrade the quality of responses; and that the ‘consensus’ system (effectively a ‘mini-election’ for some piece of ground truth that will influence downstream AI systems) which currently resolves disputes can actually throw away the best and/or most informed responses.

The Exploited Labor Behind Artificial Intelligence: Supporting transnational worker organizing should be at the center of the fight for “ethical AI.” (13 October 2022)

So-called AI systems are fueled by millions of underpaid workers around the world, performing repetitive tasks under precarious labor conditions. And unlike the “AI researchers” paid six-figure salaries in Silicon Valley corporations, these exploited workers are often recruited out of impoverished populations and paid as little as $1.46/hour after tax. Yet despite this, labor exploitation is not central to the discourse surrounding the ethical development and deployment of AI systems.

A factory line of terrors: TikTok’s African content moderators complain they were treated like robots, reviewing videos of suicide and animal cruelty for less than $3 an hour (1 August 2022)

“The devil of this job is that you get sick slowly — without even noticing it,” said Wisam, a former content moderator who now trains others for Majorel. … While TikTok does use artificial intelligence to help review content, the technology is notoriously poor in non-English languages. For this reason, humans are still used to review most of the heinous videos on the platform.

Human Touch: Artificial intelligence may be making some jobs obsolete but it has given a new lease of life to one group of people who play an unglamorous but critical role in the machine learning pipeline: first generation women workers in Indian towns and villages (20 July 2022)

“Any major technology company in the last 10 years has been powered by a throng of people … At some level, there’s denial. Investors like to hear that technology sells itself once you write the code. But that’s not really true.” … “Data work has a racial and class dynamic. It is outsourced to developing countries while model work is done by engineers largely in developed nations … Without their labour, there would be no AI.”

Desperate Venezuelans are making money by training AI for self-driving cars (29 August 2022)

How the AI industry profits from catastrophe: As the demand for data labeling exploded, an economic catastrophe turned Venezuela into ground zero for a new model of labor exploitation (20 April 2022)

Most profit-maximizing algorithms, which underpin e-commerce sites, voice assistants, and self-driving cars, are based on deep learning, an AI technique that relies on scores of labeled examples to expand its capabilities. … The insatiable demand has created a need for a broad base of cheap labor to manually tag videos, sort photos, and transcribe audio. The market value of sourcing and coordinating that “ghost work” … is projected to reach $13.7 billion by 2030.

Over the last five years, crisis-ridden Venezuela has become a primary source of this labor. The country plunged into the worst peacetime economic catastrophe facing a country in nearly 50 years right as demand for data labeling was exploding. Droves of well-educated people who were connected to the internet began joining crowdworking platforms as a means of survival.

Facebook Faces New Lawsuit Alleging Human Trafficking and Union-Busting in Kenya (11 May 2022)

“We can’t have safe social media if the workers who protect us toil in a digital sweatshop… We’re hoping this case will send ripples across the continent—and the world. The Sama Nairobi office is Facebook’s moderation hub for much of East and South Africa. Reforming Facebook’s factory floor here won’t just affect these workers, but should improve the experience of Facebook users in Kenya, South Africa, Ethiopia, and other African countries.”

Inside Facebook’s African Sweatshop (14 February 2022)

Here in Nairobi, Sama employees who speak at least 11 African languages between them toil day and night, working as outsourced Facebook content moderators: the emergency first responders of social media. They perform the brutal task of viewing and removing illegal or banned content from Facebook before it is seen by the average user. …

The testimonies of Sama employees reveal a workplace culture characterized by mental trauma, intimidation, and alleged suppression of the right to unionize. The revelations raise serious questions about whether Facebook… is exploiting the very people upon whom it is depending to ensure its platform is safe

Refugees help power machine learning advances at Microsoft, Facebook, and Amazon: Big tech relies on the victims of economic collapse (22 September 2021)

Microwork comes with no rights, security, or routine and pays a pittance — just enough to keep a person alive yet socially paralyzed. Stuck in camps, slums, or under colonial occupation, workers are compelled to work simply to subsist under conditions of bare life. This unequivocally racialized aspect to the programs follows the logic of the prison-industrial complex, whereby surplus — primarily black — populations [in the United States] are incarcerated and legally compelled as part of their sentence to labor for little to no payment. Similarly exploiting those confined to the economic shadows, microwork programs represent the creep of something like a refugee-industrial complex.

(an excerpt from the book Work Without the Worker: Labour in the Age of Platform Capitalism by Philip Jones)

AI needs to face up to its invisible-worker problem (11 December 2020)

A.I. Is Learning From Humans. Many Humans. (16 August 2019)

A.I. researchers hope they can build systems that can learn from smaller amounts of data. But for the foreseeable future, human labor is essential. “This is an expanding world, hidden beneath the technology,” said Mary Gray, an anthropologist at Microsoft and the co-author of the book “Ghost Work,” which explores the data labeling market. “It is hard to take humans out of the loop.”

[book] Behind the Screen: Content Moderation in the Shadows of Social Media by Sarah T. Roberts (June 2019)

Social media on the internet can be a nightmarish place. A primary shield against hateful language, violent videos, and online cruelty uploaded by users is not an algorithm. It is people. Mostly invisible by design, more than 100,000 commercial content moderators evaluate posts on mainstream social media platforms: enforcing internal policies, training artificial intelligence systems, and actively screening and removing offensive material—sometimes thousands of items per day

[book] Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass by Mary L. Gray and Siddharth Suri (May 2019)

Hidden beneath the surface of the web, lost in our wrong-headed debates about AI, a new menace is looming. … services delivered by companies like Amazon, Google, Microsoft, and Uber can only function smoothly thanks to the judgment and experience of a vast, invisible human labor force. These people doing “ghost work” make the internet seem smart. They perform high-tech piecework: flagging X-rated content, proofreading, designing engine parts, and much more. An estimated 8 percent of Americans have worked at least once in this “ghost economy,” and that number is growing. They usually earn less than legal minimums for traditional work, they have no health benefits, and they can be fired at any time for any reason, or none.

[follow-up articles about the book here and here]

Inmates in Finland are training AI as part of prison labor (28 March 2019)

“Prison labor” is usually associated with physical work, but inmates at two prisons in Finland are doing a new type of labor: classifying data to train artificial intelligence algorithms for a startup. … “The hook is that we have this kind of hype circulating around AI so that we can masquerade really old forms of labor exploitation as ‘reforming prisons,’… They’re connecting social movements, reducing it to hype, and using that to sell AI.”

How Crowdworkers Became the Ghosts in the Digital Machine: Since 2005, Amazon has helped create one of the most exploited workforces no one has ever seen (5 February 2014)

Crowdworking is often hailed by its boosters as ushering in a new age of work. With the zeal of high-tech preachers, they cast it as a space in which individualism, choice and self-determination flourish. … But if you happen to be a low-end worker doing the Internet’s grunt work, a different vision arises. According to critics, Amazon’s Mechanical Turk may have created the most unregulated labor marketplace that has ever existed. Inside the machine, there is an overabundance of labor, extreme competition among workers, monotonous and repetitive work, exceedingly low pay and a great deal of scamming. In this virtual world, the disparities of power in employment relationships are magnified many times over, and the New Deal may as well have never happened.


text post from 1 day ago

perhaps one of my hotter takes as a queer person but i’m never coming out again. you can figure it out or live in pure ignorance but either way it’s not my problem. the worst thing society ever tried to teach us was that coming out is an obligation. it’s not. it’s a privilege for you to know the depths of who i am, my sexuality included.


text post from 4 days ago

I have been doodling a lot on my iPad mini these days, after a couple of years of just not feelin’ anything artwise. But I have shamefully neglected to post them to Tumblr! So have an art dump!

[image: washy watercolor of a rather depressed looking sheep]

It started with a sheep. I was messing around with new watercolor tools and crosshatching tools, thought “that looks kinda like a sheep” and then took out the bits that didn’t look like a sheep.

[image: a chonky black and white creature that looks like a cross between a dog and a penguin]

The noble Aukhound, originally bred to herd migratory seabirds. These majestic, slightly damp creatures are now used extensively in ecological restoration work.

[image: a hooded figure with glowing eyes exclaims delightedly over seeing a frog. the frog is puzzled.]

I do this whenever I see a frog.

[image: crosshatched artwork of a robed figure of indeterminate species with a bird on her arm]

Then I was just in the mood for weird shadowy cloaked figures.

[image: weird cloaked figure with pale eyes smoking a cigarette while a small lizard clings to her hem.]

You know that’s a clove cigarette.

[image: peculiar creature in a robe with a chicken on its head]

Portrait Of A Creature With A Chicken On Its Head

[image: two creatures, one in a tall blue hat, one wearing a conical hat with grapes, sitting on the ground drinking tea.]

Just two weird little creatures having tea together.


text post from 4 days ago

as a child being told "the moon controls the tides" with no additional explanation was like. oh okay. you want me to believe in magic? you're talking about magic right now? okay. fine


text post from 1 week ago

online is real

everything online is real. there have been decades of cultural mythologizing about the internet that place it in contrast to reality--the internet is framed as immaterial, as ephemeral. the internet is 'just data', it's weightless and abstract. things taking place on the internet are 'just online', they're not 'really' happening. this isn't true.

of course, there's the elementary fact that many things that are immaterial with no physical form are still real. monetary value is real, even if you cannot find me an atom of it. the german border is real even where it is not physically demarcated. they are real and physical in that they shape real and physical human interactions. but the internet is in fact far more real than that: every single piece of data on the internet exists physically on a disk somewhere as a pattern of magnetic charges. 'the cloud' sounds like it's a weightless and fluffy thing, but 'the cloud' is this:

[image: a google data center]

this is the cloud--a google data center, to be exact. amazon web services is not an immaterial or abstract thing; it is over 26 million square feet of data centers, of physical computers inside physical buildings where data is physically recorded. access to the internet is provided by cables, including the thousands of miles of undersea infrastructure which make the global internet possible. 4G, 5G, wifi, these are all made possible by physical apparatuses sending out radio waves.

all computer infrastructure everywhere is made possible by cobalt and lithium and gallium and so on--mined out of the ground, by an extractive mining industry which exploits the people and resources of the global south. estimates for how much electricity 'the internet' uses vary wildly, but it's at minimum measured in gigawatts--and so coal and nuclear power plants and wind farms and hydroelectric dams and the coal and uranium and bauxite mining which builds them are inseparable from 'the internet'. google programmers do not live in the astral plane, they work from buildings (and outsource work to india) which need to be cleaned and maintained and worked in and driven back and forth from.

youtube videos are 'online' but of course if you film something happening and upload it that thing still really happened. if a physical action, whether that be a terrorist attack or a protest or a sexual hookup or an act of ethnic violence, is mediated and planned online, that thing really happens in the real world. the idea of 'just online', this platonic realm divorced from a cleanly delineated 'outside' or 'real world' is just wrong. it is wrong in more or less every single way. every program or function your computer has or video or picture or word you see on the computer is the result of people doing actual physical things. you are seeing it by grace of miles of cable and tons and tons of machinery and power plants and the guy who sweeps the floors at the indian IT firm facebook outsources. online is real.

Data has a carbon footprint.


text post from 1 week ago

Tumblr’s Core Product Strategy

Here at Tumblr, we’ve been working hard on reorganizing how we work in a bid to gain more users. A larger user base means a more sustainable company, and means we get to stick around and do this thing with you all a bit longer. What follows is the strategy we're using to accomplish the goal of user growth. The @labs group has published a bit already, but this is bigger. We’re publishing it publicly for the first time, in an effort to work more transparently with all of you in the Tumblr community. This strategy provides guidance amid limited resources, allowing our teams to focus on specific key areas to ensure Tumblr’s future.

The Diagnosis

In order for Tumblr to grow, we need to fix the core experience that makes Tumblr a useful place for users. The underlying problem is that Tumblr is not easy to use. Historically, we have expected users to curate their feeds and lean into curating their experience. But this expectation introduces friction to the user experience and only serves a small portion of our audience. 

Tumblr’s competitive advantage lies in its unique content and vibrant communities. As the forerunner of internet culture, Tumblr encompasses a wide range of interests, such as entertainment, art, gaming, fandom, fashion, and music. People come to Tumblr to immerse themselves in this culture, making it essential for us to ensure a seamless connection between people and content. 

To guarantee Tumblr’s continued success, we’ve got to prioritize fostering that seamless connection between people and content. This involves attracting and retaining new users and creators, nurturing their growth, and encouraging frequent engagement with the platform.

Our Guiding Principles

To enhance Tumblr’s usability, we must address these core guiding principles.

  1. Expand the ways new users can discover and sign up for Tumblr.
  2. Provide high-quality content with every app launch.
  3. Facilitate easier user participation in conversations.
  4. Retain and grow our creator base.
  5. Create patterns that encourage users to keep returning to Tumblr.
  6. Improve the platform’s performance, stability, and quality.

Below is a deep dive into each of these principles.

@staff: get rid of the 100% chronological feed, implement an algorithm, flood my feed with stuff from people I haven't explicitly followed, and I'm out. This is the dumbest idea in Tumblr's history and it will be the end of it. I literally cannot think of a worse shot in Tumblr's own face than this.

Tumblr is the last port of the sane internet. It’s as easy as 1-2-3 to post. You could not make it any easier. Chronological feed is what I want, fuck an algorithm.

If you want to improve Tumblr, give us the ability to make lists. I’ve been here for more than 10 years, and I’ve never used it more than I do now that I’ve left Twitter for good. Yet my dash is a mess of different fandoms and interests I’ve followed over the years. Sometimes, I want to just see my mutuals for a certain fandom and yet tumblr does not give me the ability to do it with one click. I have to maintain spreadsheets of my own to remember who’s who and search each manually and if anyone changes their name I’m lost.

Twitter had lists. They were a feature the user base asked for early on, similar to Livejournal’s friends list but adapted in a public space. You could make a mutual follows list and a list for each fandom or interest, so when you logged on for a particular topic, you could find the right accounts again quickly regardless of whether they used the right tags. If I want writing advice, I could click my Writing list. If I want to see trusted sources of climate crisis news, I could click my Environmental list. If I wanted to chat, I could click my Mutuals list. If I wanted to see squee about an episode of tv that just aired, I could click that fandom’s list. If someone stopped posting about that thing, you could easily take them off or move them to another list.

Here, I can follow a blog I like but may never see it on my dash if they post at a time I’m not online. That’s the only thing that annoys me about Tumblr.

Give us lists. Help us curate our own experience. Fuck the algorithm.

Mild barriers to entry are good. They protect actual communities and allow different internet ecosystems that feel different one from another. Dumbing things down just results in a worse version of Instagram or whatever other popular app, and that’s not competitive because people can just go to the already-successful hellhole in question.

@staff If you won’t listen to us, maybe you’ll care what a Tech person thinks.

This is from Fan is a Tool-Using Animal by Maciej Cegłowski (the Pinboard guy):

“Something counter-intuitive to me was a lot of this fan stuff, they would use tools and web sites and plugins and Greasemonkey, like five things at once, and it’s really hard to get your mind around it. My instinct was couldn’t this be done more easily or more intuitively? But it’s actually the difficulty of the tools and the norms of the community [that] protect it. It takes a while. You have to be committed to start contributing because you have to learn how all this stuff works.

This isn’t just fandom. You know those awful PHP-based message boards like, I like to scuba dive so I go to the scuba board sometimes and I have no idea how to use it but there’s some really informative people on there. I think that these terrible interfaces actually serve a protective function where they keep the community isolated from just drive-by comments. If you ever go to the Guardian or the New York Times, places where you can actually comment very easily without ever having been there before, the comments are just totally trash. And I wonder sometimes if it’s because it’s too easy to do it.”

Number 6 is great. Stability is something all users value.

3 is... well, it depends how you implement it. Making it easy to go see all the contentful responses is great. Making it ultra easy for a new person to just dive in in total ignorance to stink up a conversation with their unconsidered opinions is horseshit.

For me, I primarily access Tumblr through my notifications page, and I most certainly don’t want any of the contentful interactions collapsed. What I already do is to hide all likes and all reblogs without commentary or tags. Those are meaningless cruft, not real interactions.

Tumblr is great not because things can go viral in no-comment reblogs but because, now that the platform is quieter, the real conversation is here and not on garbage like Twitter.

It’s entirely possible that Tumblr’s current style of infinite massive image uploads simply isn’t financially sustainable with the community Tumblr has actually attracted and without the cash cow of people paying for sexually explicit materials, but trying to become some shittier, newer app with worse features and more 13-year-olds isn’t going to fix that.