Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI
(www.404media.co)
An engineer at AWS can already just copy your code, make minor modifications, and use it. I would think the same legal recourse applies whether the code was output by an LLM or simply copy-pasted. That seems like a tangential issue to whether the LLM was trained on your code or not (not training on your code obviously reduces the probability of the LLM spitting it back out near-verbatim, though). Personally, I don't see anything wrong with anyone using public code to build statistical models. And I think the pay-to-scrape models that Reddit, Xitter, and others are employing will help big tech build the "moat" they're looking for. Big tech is asking for AI regulation for similar reasons.
You are 100% wrong here my man. If an engineer does this, they are creating a derivative work and they have to fulfill the conditions of the code's license. No wonder you don't see anything wrong here, you AI people live in a fantasy world when it comes to how copyright works hahahaha. Please stop talking about shit you know nothing about.
I stated that they can do this, and asked if they could be sued if they used near-verbatim code generated from an LLM, just like they could be sued if they copy-pasted AGPL code.
Edit: Tools like Copilot tell you if your code is similar to publicly available code so you can avoid these issues.
Edit: Just looked up EFF's position and I tend to agree with it:
https://www.eff.org/document/eff-two-pager-ai
What point are you trying to make? That the fact that someone can break the law means we should not have laws? I honestly don't get what you are trying to say.
I'm saying that using code for training is a different issue than copyright infringement. I edited my post above to better lay out my position.
And that's the whole point of my comment, did you even read it? To summarize, there is currently a loophole in the law that allows these bullshit arguments about it being different from straight up copying shit (though this hasn't been litigated yet, so it's not yet clear if these arguments are actually valid). This means that while a person reading my AGPL code and copying it (without following the license) is 100% illegal, doing the same through an LLM may be legal. So open source licenses can be bypassed by first training an LLM on the code and then extracting the code from the LLM. This is terrible for open source, and in general for anyone who wants to make a living from creating copyrighted work. So we should close this loophole, and I'm glad there is a push to close it through better laws. Even if these laws are coming from Disney, Sony, and all those awful companies.
So again, what's the point you are trying to make here? That we shouldn't make these laws stronger to prevent this bullshit? I honestly don't understand what you are trying to argue here, nothing of what you have said has anything to do with this conversation.
That we already have laws that protect against copyright infringement (which seem like they would still apply whether the code was spit out by an LLM or not), and no more should be made. That training on public data is fine.
Any arguments to defend your position? I'm giving you a very clear example of the awful consequences of following that path. And the same applies to any creative work. You are just being dismissive without proposing any real solution. Do better man.
The EFF link I posted above provides evidence. Again, here's a quote from part of it:
As I mentioned before, Copilot, at least, helps people avoid copyright infringement by notifying them if their code is similar to public code. The solution I'm proposing is no new laws, just enforcing the ones we have. Most of the laws being proposed look like attempts at regulatory capture to me.