this post was submitted on 05 Mar 2024

Machine Learning

A community for posting things related to machine learning

top 3 comments
[–] [email protected] 4 points 8 months ago (1 children)
[–] [email protected] 7 points 8 months ago

Would have been less inevitable if they hadn't vomited AI bullshit all over the internet and poisoned their own training data. Either way, generative AI is a pox and I eagerly await its destruction.

[–] vcmj 3 points 8 months ago* (last edited 8 months ago)

Most of the largest datasets are kind of garbage because of this. I've had this idea to run the data through the network every epoch and evict samples that are too similar to the network's output before the next epoch, but I never tried it. Probably someone smarter than me already tried that and it didn't work. I just feel like there's some mathematical way around this that we aren't seeing. Humans are great at filtering out the cruft, so there must be some indicators there.
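
For anyone who wants to poke at the eviction idea, here is a rough sketch of what one pass could look like. Everything in it is an assumption for illustration: `generate` is a hypothetical hook standing in for the network, `toy_model` is a canned fake, the 0.9 threshold and 10-character prefix are arbitrary, and character-level `difflib` similarity is just a convenient stand-in for whatever metric would actually make sense. The training step itself is left out.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (a stand-in metric, not a recommendation)."""
    return SequenceMatcher(None, a, b).ratio()


def evict_similar(
    samples: List[str],
    generate: Callable[[str], str],
    threshold: float = 0.9,
    prefix_len: int = 10,
) -> List[str]:
    """Keep only the samples the model does NOT reproduce closely.

    `generate` is a hypothetical hook: given the first `prefix_len` characters
    of a sample, it returns the text the model produces from that prefix.
    """
    kept = []
    for sample in samples:
        output = generate(sample[:prefix_len])
        # Evict when the model's output is nearly the same as the sample.
        if similarity(sample, output) < threshold:
            kept.append(sample)
    return kept


def toy_model(prefix: str) -> str:
    # Toy stand-in for a network: it ignores the prefix and always emits
    # the same canned sentence.
    return "As an AI language model, I cannot help with that."


data = [
    "As an AI language model, I cannot help with that.",
    "The heron stood motionless in the reeds until dusk.",
]

for epoch in range(3):
    # A real loop would train on `data` here before filtering (omitted).
    data = evict_similar(data, toy_model)

print(data)
```

With the toy model, the sample that matches the canned output gets evicted on the first pass and the other one survives. Whether this helps with a real network is an open question: a high similarity score could just as easily mean the model has memorized a perfectly good sample, so the threshold and metric would need real experimentation.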