programming.dev


Welcome Programmers!

programming.dev is a collection of programming communities and other topics relevant to software engineers, hackers, roboticists, hardware and software enthusiasts, and more.

The site is primarily English, with some communities in other languages. We are federated with many other sites via the ActivityPub protocol; posts from those sites appear in the "All" tab, while the "Local" tab shows only posts made on our site.


🔗 Site with links to all relevant programming.dev sites

🟩 Not a fan of the default UI? We host alternate frontends that you can use to view the same content.

ℹ️ We have a wiki site where communities can host documents.


⚖️ All users are expected to follow our Code of Conduct and the other documents on our legal site.

❤️ The site is run by a team of volunteers. If you're interested in donating to help cover costs such as server hosting, you can do so here.

💬 We have a microblogging site aimed at programmers, available at https://bytes.programming.dev

🛠️ We have a Forgejo instance for hosting Git repositories related to our site and the fediverse. If you have a project that is relevant and follows our Code of Conduct, feel free to host it there; if you have ideas for improving our sites, feel free to open issues in the relevant repositories. Alongside the instance, we also have a site for sharing code snippets that are too small for a repository of their own.

🌲 We have a Discord server and a Matrix space for chatting with other members of the community. These are bridged to each other, so you can interact with people using Matrix from Discord and vice versa.

It is now clear that generative artificial intelligence (AI) such as large language models (LLMs) is here to stay and will substantially change the ecosystem of online text and images. Here we consider what may happen to GPT-{n} once LLMs contribute much of the text found online. We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs). We build theoretical intuition behind the phenomenon and portray its ubiquity among all learned generative models. We demonstrate that it must be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, data collected from genuine human interactions with systems will become increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.
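
The tail-loss effect described above can be reproduced in miniature. Below is a minimal sketch (not the paper's experimental setup) of recursive training in the simplest setting: each generation fits a Gaussian by maximum likelihood to samples drawn from the previous generation's model. The sample size, generation count, and starting distribution are illustrative assumptions.

```python
# Minimal sketch of 'model collapse' for a single Gaussian: each
# generation is fitted to data sampled from the previous generation.
# All parameters below are illustrative assumptions, not values from
# the paper.
import numpy as np

rng = np.random.default_rng(0)

n_samples = 100       # training samples per generation (assumed)
n_generations = 500   # fit-then-resample rounds (assumed)

mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution

for gen in range(1, n_generations + 1):
    # Draw this generation's training set from the current model...
    data = rng.normal(mu, sigma, n_samples)
    # ...then fit the next model to it by maximum likelihood.
    mu, sigma = data.mean(), data.std()
    if gen % 100 == 0:
        print(f"generation {gen:3d}: mu = {mu:+.4f}, sigma = {sigma:.4f}")

# Finite-sample estimation error compounds across generations: sigma
# shrinks toward 0, so the fitted model loses the tails of the original
# distribution, the qualitative effect the abstract calls model collapse.
```

With these settings, sigma typically shrinks by an order of magnitude or more over the run while mu only drifts slightly: in this toy case, loss of variance (the tails), not mean drift, is the dominant failure mode.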
