Japan determines copyright doesn't apply to LLM/ML training data (infosec.town)

submitted 10 months ago by ericjmorey to c/programming

cross-posted from: https://programming.dev/post/8121669

Taggart (@mttaggart) writes:

Japan determines copyright doesn't apply to LLM/ML training data.

On a global scale, Japan’s move adds a twist to the regulation debate. Current discussions have focused on a “rogue nation” scenario where a less developed country might disregard a global framework to gain an advantage. But with Japan, we see a different dynamic. The world’s third-largest economy is saying it won’t hinder AI research and development. Plus, it’s prepared to leverage this new technology to compete directly with the West.

I am going to live in the sea.

www.biia.com/japan-goes-all-in-copyright-doesnt-apply-to-ai-training/

[–] [email protected] 1 points 10 months ago (7 children)

I am so torn on this. On the one hand, I think training these huge models is very similar to human artists consuming things and then consciously or unconsciously using them in their own work. The source is usually no longer distinguishable. So it should be allowed to train them on anything a human could consume.

On the other hand, large AI models are mostly under the control of huge asshole corporations, and I absolutely hate seeing them benefit for free from the rest of us. It'd be nice if regulation like Japan's applied only to freely available models.

[–] atheken 1 points 10 months ago (1 children)

I think the “learning” process could be similar, but the issue is the scale.

No human artist could integrate the amount of material at the speed that these systems can. The systems are also by definition nothing but derivative. I think the process is similar, but there is important nuance that supports a different conclusion.

[–] [email protected] 2 points 10 months ago (1 children)

I don't think the scale matters. Do we treat human artists who only read 10 books or watched 10 movies before creating something differently from the ones who consumed 1000 or 10000? For me the issue of who controls it is much more important. If something like ChatGPT were truly open-source and people could use it any way they want, I would have zero issues with the models being trained on everything that is available. We desperately need less copyright instead of more. Right now I think going after the big AI models with copyright is a double-edged sword. It's good to bring them down, but not at the cost of strengthening copyright.

[–] atheken 1 points 10 months ago

If these systems could only reorganize and regurgitate 1000 creative works, we would not be having this conversation. It’s literally because of the scale that this is even relevant. The scope of consumption by these systems and the relative ease of access to them are what make the infringement/ownership question relevant.

We literally went through this exercise with fair use as it pertains to CD/DVD piracy in the ’90s, and Napster in the early 2000s. Individuals making copies were already robbing creative artists of royalties before those technologies existed, but the scale, ubiquity, and fidelity of those systems enabled large-scale infringement in a way that individual copying and reproduction previously could not.

I’m not saying these are identical examples, but the magnitude is a massive factor in why this issue needs to be regulated/litigated.
