San Jose invited technology companies to mount cameras on a municipal vehicle to collect footage of city streets, which is used to train the companies' algorithms to detect objects. The LAION-5B dataset, a foundation for many generative AI models, is analyzed for its commercial influence and structural biases. Curation by statistics shapes training sets and the models trained on them, affecting what content they contain. LAION-5B's creators warn against using it in real-world contexts, but the warning has largely been ignored: models such as Midjourney and Stable Diffusion are trained in part on it. The dataset includes nearly 155 million image-caption pairs from Pinterest and is critiqued for being shaped by commercial logics. The investigation into LAION-5B aims to understand how generative AI models work and where their biases come from.
"In my own analysis of LAION’s content--prior to the dataset’s removal--I was troubled by its inclusion of images of historical atrocities, which are abstracted into unrelated categories": https://t.co/J14cWTqYxP #ethics #AI #data #internet #research
🤖🌎 #AI Bias Alert: Learning from Google's Gemini faux pas, here's how to tackle biases head-on for a powerful and prejudice-free technology! Buckle up for a walkthrough 🙌 https://t.co/4MIJ9DotHR
The need for greater diversity in tech is even more acute, due to the rise of #AI, reports @strategist mag, as a #techUK panel offered personal insights on digital ethics and the safety of AI systems, from the perspective of women who work in the industry: https://t.co/CLvjDxPUUJ https://t.co/EAgIyWLTh7
"This is what curation by statistics looks like: tiny tweaks to code can have profound effects on the content of training sets, and on the models that use them to shape their computational worldview" #ethics #AI #data #tech #internet #software #language #aesthetics https://t.co/o6s6MQXVSC
People have been talking about building Open AI (adjective + noun) recently, with @elonmusk emphasizing the topic. Here I propose a possible way to actually build AI in an open and fair way. 𝟭. 𝗗𝗮𝘁𝗮 All AI models start with data. There are different types of data involved… https://t.co/O0vqTYh4Q7
A major Der Spiegel investigation of LAION-5B, a vast dataset of images used to train the majority of today’s generative AI, draws on a new report from Knowing Machines, a joint USC-NYU Law Engelberg Center project https://t.co/kNU7VQnjVy
"Openness in the #AI field matters, not just for model biases, but for the structural biases in the ecosystem. An ongoing problem is that curation by statistics amplifies many of those structural biases": https://t.co/bvM2nthh2s #ethics #tech #data #research
Another "truth about generative #AI: The [concept] of what is... visually appealing can be influenced in outsized ways by the tastes of a very small group of individuals, and the processes that are chosen by dataset creators to curate the datasets" https://t.co/R7KutnH8kY #data
"There are models on top of models, and training sets on top of training sets. Omissions and biases and blind spots from these stacked-up models and training sets shape all of the resulting new models and new training sets." #ethics #AI #data #tech #research #business https://t.co/nERDH7y2I1
"The tiniest of shifts in LAION's thresholds could have excluded or included hundreds of millions of images. What the images contain plays no role at all in deciding what stays and what goes." https://t.co/sQMRvS3p0l #ethics #AI #data https://t.co/nERDH7y2I1
An "important truth about LAION-5B: It contains less about how humans see the world than it does about how #search engines see the world. It is a dataset that is powerfully shaped by commercial logics" (embedded in ALT tags). #ethics #internet #data #AI #business #research #tech https://t.co/o6s6MQXo34
"Some... websites in particular are very well-represented in LAION-5B. There are nearly 155 million image pairs (images + captions) from #Pinterest - about one in every forty pairs." #ethics #AI #tech #internet #business #data https://t.co/o6s6MQXo34
"On their homepage, its creators explicitly warn against its use in real-world contexts... Largely, this warning has been ignored. #Midjourney and #StableDiffusion, two large models for which some of the #data sources are known, are both trained in part on LAION-5B." #ethics #AI https://t.co/ojaHolYhrk
It's humans in the loop all the way down. #ethics #AI #tech #data #research https://t.co/WHpp1z6oTM
👉Today we're launching this investigation into LAION-5B, the blockbuster dataset behind Midjourney and Stable Diffusion. It's a deep dive into how the dataset was made, and where the images come from. The brilliant @christo_buschek & @blprnt follow the models all the way down. https://t.co/jaWMoc29WD
"Investigating training sets is an essential avenue to understanding how generative AI models work; the ways they see and re-create the world." Models All the Way Down — fantastic (and haunting in its intimations) project by my friend @blprnt https://t.co/j6VnD7QezR
The AI field’s goal is nothing less than to transform the world. But what are the foundations upon which this transformation is built? In this investigation, @blprnt and I looked at LAION-5B, the only open-source foundation dataset currently available. https://t.co/JFYQsxCPFX
"The images are fed into computer vision software and used to train the companies’ #algorithms to detect... unwanted objects, according to interviews and documents the Guardian obtained through public records requests." #ethics #AI #privacy #gov #data #business h/t @loisbeckett https://t.co/Gt6NXbkaP9
"Last July, San Jose issued an open invitation to #technology companies to mount cameras on a municipal vehicle that began periodically driving through the city’s district 10 in December, collecting footage of the streets and public spaces": https://t.co/UUqNqx3t97 #ethics #gov
Learn more: Building fairness into AI is crucial – and hard to get right https://t.co/bAGegFFhPs via @ConversationUS #tech #digital #data #privacy