MIT takes down 80 Million Tiny Images data set due to racist and offensive content

Submitted by Style Pass

2020-07-02 11:41:42

Creators of the 80 Million Tiny Images data set from MIT and NYU took the collection offline this week, apologized, and asked other researchers to refrain from using the data set and delete any existing copies. The news was shared Monday in a letter by MIT professors Bill Freeman and Antonio Torralba and NYU professor Rob Fergus published on the MIT CSAIL website.

Introduced in 2006 and containing photos scraped from internet search engines, 80 Million Tiny Images was recently found to contain a range of racist, sexist, and otherwise offensive labels, including nearly 2,000 images labeled with the N-word and labels like “rape suspect” and “child molester.” The data set also contained pornographic content, such as non-consensual photos taken up women’s skirts. The creators of the 79.3 million-image data set said it was too large, and its 32 x 32 images too small, for visual inspection of its complete contents to be practical. According to Google Scholar, 80 Million Tiny Images has been cited more than 1,700 times.

“Biases, offensive and prejudicial images, and derogatory terminology alienates an important part of our community — precisely those that we are making efforts to include,” the professors wrote in a joint letter. “It also contributes to harmful biases in AI systems trained on such data. Additionally, the presence of such prejudicial images hurts efforts to foster a culture of inclusivity in the computer vision community. This is extremely unfortunate and runs counter to the values that we strive to uphold.”
