AI

Project Analyzing Human Language Usage Shuts Down Because


Posted by msmash from the all-good-things-end dept.

The creator of an open source project that scraped the internet to determine the ever-changing popularity of different words in human language usage says that they are sunsetting the project because generative AI spam has poisoned the internet to a level where the project no longer has any utility.

404 Media: Wordfreq is a program that tracked the ever-changing ways people used more than 40 different languages by analyzing millions of sources across Wikipedia, movie and TV subtitles, news articles, books, websites, Twitter, and Reddit. The system could be used to analyze changing language habits as slang and popular culture changed and language evolved, and was a resource for academics who study such things. In a note on the project’s GitHub, creator Robyn Speer wrote that the project “will not be updated anymore.”

“Generative AI has polluted the data,” she wrote. “I don’t think anyone has reliable information about post-2021 language usage by humans.” She said that open web scraping was an important part of the project’s data sources and “now the web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies.” While there has always been spam on the internet and in the datasets that Wordfreq used, “it was manageable and often identifiable. Large language models generate text that masquerades as real language with intention behind it, even though there is none, and their output crops up everywhere,” she wrote.

