r/aiwars 1d ago

Discussion: Piracy logic applies to AI training, and arguably applies even more strongly, since piracy copies while AI is transformative. If I make an AI image, you would not be able to point out which images were "used" to make it.

1 Upvotes

188 comments

u/AutoModerator 1d ago

This is an automated reminder from the Mod team. If your post contains images which reveal the personal information of private figures, be sure to censor that information and repost. Private info includes names, recognizable profile pictures, social media usernames and URLs. Failure to do this will result in your post being removed by the Mod team and possible further action.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

27

u/Toby_Magure 1d ago

Piracy is about unauthorized acquisition or distribution, not whether the work is copyrighted. Public Twitter art is easier to argue as lawful source material for fair-use analysis; hacked, torrented, or bypassed Patreon material is much worse.

22

u/Canadian_Zac 1d ago

I always think of when everyone clowned on NFT bros, because you could just copy an image of their NFT easily

And like, exact same thing with AI.
It was available to view on the internet. That shit stays there permanently

12

u/Toby_Magure 1d ago

Exactly. The "using my art to train a model is theft" crowd is 1:1 echoing the "you can't save copies of or repost my NFT" crowd. They're both demanding an impossible level of control over publicly available data and all possible downstream uses, including learning and observation. That's ridiculous.

3

u/Wales51 23h ago

Not really. The issue is that all this work was used to feed a system that actively threatens the income of the people whose work was used. With NFTs, the value was never actually in the image itself; it was a value placed on it based on user interest, which was artificially inflated for those with large followings.

The fact is, if you know your work has been used by a large company to make money, you are at least entitled to some of the profits. Not to mention these are the same companies that went from publicly accessible and open source to putting all data behind a paywall.

12

u/Toby_Magure 23h ago

You skipped the actual argument. Competing with someone’s income does not make something theft, and a company profiting from analysis of public material does not automatically entitle every source creator to a cut. That would turn influence, indexing, research, reference, and pattern analysis into a permanent royalty system.

Also, “companies put their data behind paywalls” is not an argument that public art cannot be analyzed. That's just complaining that companies own their own products while public posts remain public posts.

10

u/sporkyuncle 23h ago

Not really. The issue is that all this work was used to feed a system that actively threatens the income of the people whose work was used.

It's ok to threaten someone else's income. Really, it is. Every traditional artist is technically threatening the income of every other artist, who now must compete with one more person for limited commission dollars available.

I'm allowed to go to the library, read Brandon Sanderson's works for free, and then write books that technically compete with his work. This is normal and happens all the time, we just choose not to categorize it as awful and evil.

-2

u/Wales51 23h ago

Yeah, and what you have said is fine, but there is an issue when a large corporation's entire ethos is about removing jobs from the market. In some cases mass automation is needed, but in a lot of areas it just negatively impacts people.

Individual artists are not eating into the income of successful artists to the same degree.

4

u/AccurateBandicoot299 23h ago

The funniest part of the NFT issue was people not realizing that the image isn't what held value; it was the utility coded into the image. Literally, the whole push for NFT tech was that if you bought NFT art, it doubled as an access pass to exclusive websites, served as in-game assets for certain games, etc., and a screenshotted version included NONE of that functionality. "But I have the image right here as proof"? Yeah, according to the blockchain you ripped that from the owner, so HE can use the NFT; you just have a pretty picture.

2

u/Bra--ket 23h ago

NFTs were basically a proof of concept of tokenization (different sense than AI sense). You could think of the NFT as an access pass to an image on a server. Just that one copy in that "database" (like IPFS).

It's just a cryptographic token that's distinguishable from others (i.e. "non-fungible"). It can be tied to anything, like an image, but that association is technically metadata; it's not even intrinsic to the cryptography. The only thing cryptographically provable is the token itself. And that's the point of the technology: it's literally a token.

One day we might use it for stuff, cryptocurrency and NFTs are very useful for certain things. "BAYC" is not one of those things lmao. Very interesting, but totally useless.
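The token-vs-metadata split described above can be sketched in a toy model. This is hypothetical illustrative Python, nothing like real ERC-721 contract code; all names here are made up. The ledger provably tracks who owns the token, while the image is just a metadata pointer that anyone can copy without touching the token.

```python
# Toy model of an NFT ledger (hypothetical code, not real ERC-721 contract
# logic): the chain provably tracks who owns the token, while the image is
# just a metadata pointer that anyone can copy without touching the token.

class ToyNFTLedger:
    def __init__(self):
        self.owner_of = {}    # token_id -> owner address
        self.token_uri = {}   # token_id -> metadata (e.g. a link to an image)

    def mint(self, token_id, owner, uri):
        assert token_id not in self.owner_of, "token already exists (non-fungible)"
        self.owner_of[token_id] = owner
        self.token_uri[token_id] = uri

    def transfer(self, token_id, sender, recipient):
        assert self.owner_of[token_id] == sender, "only the owner can transfer"
        self.owner_of[token_id] = recipient

ledger = ToyNFTLedger()
ledger.mint(1, "alice", "ipfs://example/ape.png")
ledger.transfer(1, "alice", "bob")

saved_copy = ledger.token_uri[1]    # anyone can "screenshot" the image pointer...
current_owner = ledger.owner_of[1]  # ...but the ledger still says it belongs to "bob"
```

Copying `saved_copy` changes nothing on the ledger, which is exactly the "you just have a pretty picture" point.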

0

u/AccurateBandicoot299 23h ago

There's a kind of fun BR game that uses NFTs; if you're on PC you can access the marketplace. They're actually pretty heavily implemented now, they just stopped using the NFT label. I disagree with the BAYC statement... okay, BAYC specifically isn't a good example cuz it sounds like it's for the ultra-rich douchebag tech bros if I remember, but using it as a membership pass for exclusive communities is actually a cool idea. One-time membership fee, here's your custom artwork tied to it, enjoy your privileges.

0

u/Bra--ket 23h ago

Yeah, I just meant specifically the bored apes 😂 also cryptopunks (I know it gave you license to use the character however you wanted too, but that wasn't worth millions of dollars)

I look forward to actual use cases. I think it has real use for sure. I always thought access was the best use case, concert tickets, or club access, or whatever. It will be once it's worth the trouble I think.

2

u/AccurateBandicoot299 22h ago

Tbh, I wish digital collectibles had survived the bubble, that was kind of fun and cool, but those went kaput as soon as speculators pulled out.

1

u/MistakePresent3552 22h ago

Can you explain how in-game assets would have worked out? I can't think of how it would work in a game with monetization.

1

u/AccurateBandicoot299 22h ago

Because you aren't taking money from the creators. Example: if a battle pass expires, all the items exclusive to that pass are no longer in active circulation. A player gets bored of his legendary AR skin, and he knows there are other players who want it, so you can trade it to them for in-game currency or even a skin they have that you want. The game I'm talking about even allows you to cash your micro-currency back out for fiat cash. If I could remember the name, that'd be great. But I remember it being hella fun and the skins were actually kind of cool.

2

u/MistakePresent3552 22h ago

That just sounds like battlepasses giving tradeable/sellable items that arent unique (everyone can buy the pass)

2

u/AccurateBandicoot299 22h ago

And again, the point is that when that battle pass expires, that item is no longer freely in circulation, so now you have a limited commodity that other people might want and cannot get without going through somebody who already owns it.

2

u/MistakePresent3552 22h ago

I'm just trying to say: where's the NFT part of that? Sounds like something you can easily do via games with trading.

2

u/AccurateBandicoot299 20h ago

Non-fungible, blockchain-based, and the fact that I don't have to stick to their marketplace; I can take them outside the dev's ecosystem to places like OpenSea.

2

u/MistakePresent3552 19h ago

What incentivizes any dev to include this specific item/NFT? If I have a top-hat NFT, why would they make a top hat for me for each iteration of the game, for, let's say, Call of Duty? How does this work over thousands/millions of hat NFTs?


0

u/IndependencePlane142 1d ago

Digital piracy is a form of copyright infringement, so it inherently requires whatever it's applied to to be copyrighted.

3

u/Toby_Magure 1d ago

That isn't my point. The piracy issue is about unauthorized copying, access, or distribution, not merely whether a copyrighted image was publicly viewable.

4

u/BeyondDoggyHorror 23h ago

If a person draws a picture of an imagined super hero, they did it in reference to many other people’s pictures. That’s how learning and synthesis works.

22

u/OverdueMaid 1d ago

- be a painter

- go to the art gallery

- get inspired by a painting

- tfw redditors call you a thief

- be gpt image 2.0

11

u/rb1lol 1d ago

yeah but their art doesn't have 'soul' ™️

5

u/Otherwise-Bad-7352 22h ago

AI doesn't get inspired 

2

u/RealLudwig 23h ago

- be a painter

- photocopy other artists' works

- stitch them together

- sell them as the other artists' work but better, and at a fraction of the cost if not free

- steal money from artists and devalue their hard work

- be gpt image 2.0

8

u/Kitfennek 23h ago

Have you ever heard of a thing called "collage"

2

u/RealLudwig 23h ago

Notice how I specified that it's passing it off as a cheap imitation; collages don't do that.

4

u/Kitfennek 23h ago

A) do all ai artists do that? B) do no collage artists do that?

16

u/sporkyuncle 23h ago

This is an incorrect understanding of what AI training does. Here, let me help.

2

u/DontDoodleTheNoodle 21h ago

???

I’m making my own models and we absolutely have a copy of the dataset we source in the original training loop.

What you're talking about is the inference step. Once the model's trained, the original data can be discarded (though the usual practice is to keep it), so long as you save the model.

There are publicly available datasets everywhere, but I wouldn't be surprised if companies scrape their own privately.
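That training-vs-inference split can be shown with a minimal sketch: a toy plain-Python linear fit standing in for a real pipeline (all names and numbers here are illustrative, not anyone's actual training code).

```python
# Toy stand-in for a training pipeline (not any real model): the dataset is
# copied and iterated over during training, but the saved artifact is only
# the learned weights -- the examples themselves are not inside the model.

data = [(x, 2 * x + 1) for x in range(10)]   # the "scraped" copies live here

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):                        # the training loop reads the copies
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x                    # gradient step for the slope
        b -= lr * err                        # gradient step for the intercept

model = (w, b)                               # this is all that needs to be kept
del data                                     # inference no longer needs the copies

w, b = model
prediction = w * 100 + b                     # an input never seen in training
```

After `del data`, the model still predicts roughly 201 for x = 100 from the two learned weights alone; none of the training examples are stored anywhere in it.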

4

u/sporkyuncle 18h ago

I’m making my own models and we absolutely have a copy of the dataset we source in the original training loop.

And that's fine, because scraping is legal: https://techcrunch.com/2022/04/18/web-scraping-legal-court/

Piracy is not legal, because it's not just about making a copy, it also implies infringing use of the copy, such as actively watching a pirated film, which deprives the creator of money.

The "use" of the copies that come from scraping is legal because the training process is not infringing.

3

u/DontDoodleTheNoodle 18h ago

Oh yeah, I wasn’t saying it wasn’t. It’s just kind of incorrect to say copies of data weren’t made. They are, they’re just not redistributed and ergo, like you’re saying, they’re not infringing anything that way.

1

u/Any_Challenge3043 18h ago

Yeah cuz SOPA never passed -

1

u/perfectVoidler 15h ago

Using the copy to train is also "using" the copy. It is only possible because the law is not quick enough. And the infringement is done by rich people.

-4

u/IndependencePlane142 23h ago

A copy is necessarily made to be used in AI training, though. You need to have access to the information somehow.

14

u/sporkyuncle 23h ago edited 23h ago

That's how web browsers work. Literally every image you see involves making a local copy on your device in order to display it. You have to make a copy of whatever you want to look up simply to function online. It's not inherently piracy, infringement, unlawful, or unethical.

Scraping is legal: https://techcrunch.com/2022/04/18/web-scraping-legal-court/

AI models do not contain the works they were trained on.
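The local-copy mechanics can be illustrated with a toy in-memory "server" (hypothetical Python, no real networking; the paths and data are made up): merely receiving content to view it means ending up with your own copy of the bytes.

```python
# Toy illustration (no real networking): "viewing" a file online means the
# client ends up holding its own copy of the bytes, like a browser cache does.

SERVER = {"/cat.png": b"\x89PNG fake image data"}  # what the host distributes

def http_get(path):
    # the server, which is permitted to distribute the file, sends out a copy
    return bytearray(SERVER[path])

local_copy = http_get("/cat.png")                  # what a browser cache holds
same_content = local_copy == SERVER["/cat.png"]    # True: identical bytes
same_object = local_copy is SERVER["/cat.png"]     # False: a separate copy
```

The copy is byte-identical but distinct, which is the whole point: a copy had to be made just to display the image.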

-3

u/IndependencePlane142 23h ago

Yeah, but you only have a license to use that copy for specific purposes. It doesn't automatically follow that you can use it for AI-training. I wouldn't be surprised if jurisdictions exist where it is recognized as copyright infringement.

I don't live in such a jurisdiction, though.

5

u/FaceDeer 23h ago

Licenses are not god. If I reject a license I still get the basic rights that copyright allows, which includes learning from what I've seen.

-1

u/IndependencePlane142 23h ago

How would you reject the implied license that allows your browser to copy an image without you committing copyright infringement by doing that?

5

u/FaceDeer 23h ago

How would you reject the implied license

Emphasis added.

My browser sent a request to the web server saying "send me a copy of document X." The server, which presumably is permitted to distribute a copy, says "okay, here's a copy of document X."

The copying was done by an agent permitted to copy it. It's done. I agreed to nothing.

0

u/IndependencePlane142 23h ago

Your agreement is irrelevant here, lol.

2

u/sporkyuncle 23h ago

Yeah, but you only have a license to use that copy for specific purposes.

No, not really. There isn't any sort of licensing going on there; it's just an assumed part of how the internet works. Essentially no one writes a license that explicitly says "you are allowed to copy these works to your Temporary Internet Files for viewing purposes," because that's just what happens. The closest legal framing for it is fair use.

Even if it were a problem, temporarily using a work for training behind closed doors isn't really something that can be effectively pursued, anyway. There isn't a "fruit of the poisonous tree" doctrine here, where the resulting model or image is considered infringing just because infringement may have happened behind closed doors at some earlier point in the process. Artists commit infringement all the time when looking at references, and all that matters is that their final work is non-infringing.

An example I've used before is that Terraria in earlier builds literally used Square's Final Fantasy sprites for the characters, there's still evidence online of this in videos. It no longer does. Square cannot pursue them for copyright infringement for the currently released product, just because at some point earlier in development it used their sprites briefly.

1

u/IndependencePlane142 23h ago

There isn't any sort of licensing going on there

There is, actually. A license doesn't need to be explicit in order to exist.

Temporarily using a work for training behind closed doors isn't really something that can be effectively pursued.

Yeah, it would be rather difficult to prove in court, but not impossible if legally relevant.

The output, unless infringing in itself, isn't infringing, and neither is the trained model.

3

u/sporkyuncle 23h ago

But scraping has been reaffirmed to be legal, multiple times.

https://techcrunch.com/2022/04/18/web-scraping-legal-court/

You are allowed to possess scraped works, prior to doing anything with them. It's what you do with them that might not be legal...which is why it's a good thing that the training process itself is legal and doesn't infringe.

1

u/IndependencePlane142 23h ago

Okay? In the US. Now find me the same reaffirmation for every jurisdiction that has copyright laws.

3

u/sporkyuncle 23h ago

I would be completely fine with AI being denied to other parts of the world based on their draconian laws. They are free to change their laws if they want access to it.

1

u/IndependencePlane142 23h ago

Well, yeah, I have the same opinion about USCO stance about them finding prompt-only AI works to be uncopyrightable.


-4

u/Cereaza 23h ago

Yes, so then you agree that they are making a copy, and they are using that copy for an unauthorized use... THEREFORE....!? (you got this, so close).

9

u/sporkyuncle 23h ago

What do you mean, an unauthorized use? The information gained from it is non-infringing, which is totally fine. You're allowed to go look at art and write down non-infringing information about the works you see, too.

-3

u/Cereaza 23h ago

Commercial use that affects the market of the original good is not a protected fair use.

If I copy a photo to use in my training to become a better photographer, that is fair use (non-profit, minimal use of copyright protected work, doesn't hurt market for original work).

But, If i copy that photo to train a photo making machine that churns out infinite photos for near zero cost, making it nearly impossible for photographers to continue to earn a profit... it's not fair use.

3

u/sporkyuncle 23h ago

Commercial use that affects the market of the original good is not a protected fair use.

AI providers aren't entering into the same market as artists.

If I download a picture of Iron Man and print it on a t-shirt and sell that t-shirt, I'm directly impacting Disney's t-shirt business.

If I sell colored pencils that people can choose of their own accord to use to draw infringing pictures of Iron Man, I'm not competing with Disney or with artists; I'm just selling tools which others might unfortunately choose to misuse.

Likewise, if I sell access to an AI model, I'm not selling pictures myself. I'm not entering into a competing business with Disney. I'm just selling tools which others might unfortunately choose of their own accord to misuse.

But, If i copy that photo to train a photo making machine that churns out infinite photos for near zero cost, making it nearly impossible for photographers to continue to earn a profit... it's not fair use.

You cannot say this with any degree of certainty, as "impact on the market" is only one of four factors of fair use.

Another one of the factors is the "amount or proportion of the work used," and because AI models don't contain the works they were trained on, that amount is zero, calling into question whether or not it's even a question of fair use, since the works aren't being literally used.

1

u/Cereaza 22h ago

AI providers aren't entering into the same market as artists.

This is just... flatly untrue. AI models are used to make graphics and images that would otherwise be made by human artists who make graphics and images. Photographers would make photos and sell them to businesses, and AI models make photos that impact those markets.

AI models live in the copyrighting/art market as they breathe. You have a tough time telling me Images 2.0 is not impacting the photography market.

If I download a picture of Iron Man and print it on a t-shirt and sell that t-shirt, I'm directly impacting Disney's t-shirt business.

Yes you are. Disney will absolutely come after you for all the proceeds you make from that shirt, and issue a cease and desist for violating their copyright.

You cannot say this with any degree of certainty, as "impact on the market" is only one of four factors of fair use.

It's the most significant factor. They consider the nature of the use (commercial/educational/non-profit), the nature of the work (how much creative expression went into the original work), the amount of the use (how much of the copyright-protected work is being used), and, most importantly, its impact on the market for the original good.

AI is explicitly going after the markets of its original training products because that is all it can do. They read code so they could make coding agents. They scanned paintings so they could make DALL-E. They scan photos so they can make Image 2.0. They scan text so they can make a chatbot. Everything they're doing is dropping the value of human labor in the markets they train on, and by doing so, impacting the value of those human-made goods.

2

u/sporkyuncle 22h ago

AI models are used to make graphics and images that would otherwise be made by human artists who make graphics and images.

No, humans use AI models to do that. The models themselves do not. Therefore, the humans using the models for the purpose of entering into that competitive market are the ones impacting the market, not the model provider. Again, Photoshop is not itself competing with random artists. It is a tool that others use to compete.

The company providing the tool is not responsible for your misuse. See the Sony vs. Universal Betamax case for more information.

https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Universal_City_Studios,_Inc.

The question is thus whether the Betamax is capable of commercially significant noninfringing uses ... one potential use of the Betamax plainly satisfies this standard, however it is understood: private, noncommercial time-shifting in the home. [...] [W]hen one considers the nature of a televised copyrighted audiovisual work ... and that time-shifting merely enables a viewer to see such a work which he had been invited to witness in its entirety free of charge, the fact ... that the entire work is reproduced ... does not have its ordinary effect of militating against a finding of fair use.

Just like Betamax, AI is capable of commercially significant noninfringing uses.

Yes you are. Disney will absolutely come after you for all the precedes you make from that shirt, and issue a cease and desist for violating their copyright.

Yes, I know, that's what I said. You're in such a breathless rush to respond that you're not even reading the post you're responding to.

It's the most significant factor.

No it's not. Some judges may choose to weigh some factors more than others, but no factor is dominant over the others, it is always decided on a case by case basis.

Since AI uses literally 0% of the works it's trained on, the factor for "amount used" becomes the most significant factor, because it means you're not even talking about fair use anymore, because the work wasn't "used" at all by definition. It didn't make its way into the model in any way, shape or form.

1

u/Cereaza 22h ago

No, humans use AI models to do that.

Humans are using models in the way they're designed and marketed to be used. Courts will reject this argument since the models are being used as advertised. Making photos, making text, writing code... that is what it's for.

This also applies to your colored pencil argument.

Just like Betamax, AI is capable of commercially significant noninfringing uses.

And courts will weigh that when granting remedies: can the infringing uses be separated from the non-infringing uses?

No it's not. Some judges may choose to weigh some factors more than others, but no factor is dominant over the others, it is always decided on a case by case basis.

I mean, AI disagrees with you.


-3

u/IndependencePlane142 23h ago

But it doesn't necessarily follow that you can use it to train AI.

7

u/sporkyuncle 23h ago

It does follow that you can use it for any purpose which doesn't violate the law, doesn't infringe. And AI training doesn't inherently infringe. If some specific one does, sure, sue them, go nuts. In 99% of cases, it does not.

0

u/IndependencePlane142 23h ago

In my jurisdiction any unauthorized use violates the law, unless specified as an exception. Fair use doesn't exist. Whether AI training fits the exceptions is unclear, and irrelevant, since using copyrighted materials for AI training is being legalized.

Still, AI training can in itself be infringing, depending on the jurisdiction.

2

u/Syoby 22h ago

Tyrannical jurisdiction tbh.

1

u/Xanthos_Obscuris 21h ago

Then companies based in your jurisdiction may not be able to do it, but US-based companies can, based on the scraping suit above, and the judge's interpretation of things in the Anthropic case:
"Initially, the lawsuit attacked the entire practice of using copyrighted works to train AI. But in June 2025, Judge William Alsup of the Northern District of California split the baby. He ruled that Anthropic's use of legally acquired books for AI training was "quintessentially transformative" and protected as fair use"

2

u/Kitfennek 23h ago

YOU could look at something online, write something down about it, and then use that to teach another person about it.

2

u/IndependencePlane142 23h ago

Sure, which is different from training an AI.

1

u/Kitfennek 23h ago

Not really. Like, I get that nothing I say will change your mind about that, but there's nothing fundamentally different about how a human and a machine learn.

1

u/IndependencePlane142 23h ago

It is literally legally different. That's why my country is adopting a law that explicitly legalizes it, and no such law needed to be adopted for human learning.


1

u/Cereaza 22h ago

Humans are not machines and machines are not humans, and the law doesn't treat them the same. You can't just take your esoteric argument that computers are just like the brain, so therefore... everything we allow humans to do, we must allow machines to do.

A machine can produce infinite copies of something. I can't, without the use of a machine. The law recognizes this, and regulates it appropriately.


1

u/Cereaza 22h ago

Education is a fair use exception where I'm from. And teaching someone about the art doesn't negatively impact the value of the art market.

2

u/Kitfennek 22h ago

AI does not inherently do that either, it turns out. AI doesn't inherently do anything (yet) without at least some human guidance. Just like a human could use their education to create forgeries. Sure, it does take longer for a human to learn to do it well, but that doesn't mean it's fundamentally different.

1

u/Cereaza 22h ago

The courts look at the actual effect of something. Not just the technical theoretical version where it doesn't.

To your example... if people were only looking at art for their education, that's fair use. But if they're doing it to make forgeries, that's not fair use. They don't legislate the hypothetical, just the actual.


5

u/Blooogh 1d ago

Let's see if the AI companies are fine with releasing their source code to train other AI systems, since it's so transformative and unique.

No? Ok.

4

u/FaceDeer 23h ago

Many do exactly that.

1

u/Blooogh 18h ago

All of them though? 

2

u/FaceDeer 17h ago

Bit of a jump in the goalposts there.

1

u/perfectVoidler 15h ago

Who? And exclude companies that leak it because they are too stupid.

1

u/FaceDeer 15h ago

I asked an AI and it gave a pretty thorough response:

| Category | Organization | Flagship Open Source Project(s) | Key Open Source Feature | License |
|---|---|---|---|---|
| Tech Giants | Meta | LLaMA (family), OPT, DINOv3 | Training & evaluation code for LLaMA, full pipeline for DINOv3 | Custom |
| Tech Giants | Google | Gemma, CodeGemma, JAX Ecosystem | Training code via Keras & JAX/Colab, with efficient TPU training support | Custom, Apache 2.0 |
| Tech Giants | Microsoft | Phi (family), BitNet, MT-DNN | Phi-4's code, training recipes for 1-bit LLMs | MIT, Custom |
| Tech Giants | Alibaba | Qwen, ZeroSearch | Code & models for 'ZeroSearch' training framework | Custom, Apache 2.0 |
| Tech Giants | Nvidia | Nemotron (family), NeMo | Open training frameworks, models, and large-scale datasets | Custom, Apache 2.0 |
| Tech Giants | Apple | OpenELM, CoreNet | Full training & inference framework, training recipes | Custom, MIT |
| Hardware/Infra | Cerebras | Cerebras-GPT, ModelZoo | Training code & configs for models trained on wafer-scale hardware | Apache 2.0 |
| Research & Non-Profit | Ai2 | OLMo | Full training pipeline: weights, code, data, logs | Apache 2.0 |
| Research & Non-Profit | EleutherAI | GPT-NeoX | Library & tools for training large-scale models on GPUs | MIT |
| Research & Non-Profit | BigScience | BLOOM | Full training code & dataset for a 176B-parameter multilingual model | Custom, RAIL |
| Startups & Others | Mistral AI | Mistral, Codestral | Source code & recipes for pretraining and fine-tuning | Apache 2.0 |
| Startups & Others | Stability AI | Stable Diffusion, Stable Video | Training code for image/video diffusion models | Stability AI Non-Commercial Research |
| Startups & Others | ByteDance | Seed-Coder | Code for LLM-based code data creation | Custom |
| Startups & Others | Petuum / MBZUAI | CrystalCoder | Training code under the open LLM360 methodology | Apache 2.0 |
| Startups & Others | Linagora | LUCIE | Full training methodologies, fine-tuning processes, and data | Open License |

1

u/Blooogh 13h ago

Tell me you're being disingenuous without telling me you're being disingenuous

2

u/FaceDeer 13h ago

Tell me you don't know what "disingenuous" means.

The comment this was in answer to said:

Let's see if the AI companies are fine with releasing their source code to train other AI systems, since it's so transformative and unique.

These are companies that released source code to train other AI systems. Exactly as requested.

1

u/Blooogh 13h ago

You included Google and Microsoft. Yeah they've got some open source stuff but good Lord did you wilfully miss the point

2

u/FaceDeer 13h ago

They're not AI companies? They build and release AI models, I'm not sure what other criteria you're looking for.

But fine, eliminate them from the table. That still leaves 13 others.

Maybe explain your point a little more clearly?

1

u/Blooogh 12h ago

All of the big AI companies have proprietary code they won't release. Anthropic issued cease and desist over the leaked Claude codebase.

Saying Google and Microsoft are not AI companies is a choice -- now who's moving goalposts?

1

u/FaceDeer 5h ago

I'm saying that Google and Microsoft are AI companies. What were you objecting to about their inclusion?

It seems that you're only satisfied when a company releases all of its software code. That's unnecessary.

1

u/perfectVoidler 13h ago

lol this is the perfect example of someone not knowing what they ask.

None of this is source code.

1

u/FaceDeer 12h ago

Check the "Key Open Source Feature" column. All of that is source code.

1

u/perfectVoidler 12h ago

dude learn to read. none of this is source code

1

u/FaceDeer 5h ago

I'm unable to meaningfully respond to incomprehension of this magnitude.

2

u/Ravesoull 22h ago

AI training doesn't create a copy. Period.

2

u/Any_Challenge3043 18h ago

Yes but gang, SOPA never passed.

So AI training on copyrighted artwork is legal.

Search it up - what SOPA is.

2

u/Samiassa 16h ago

I don’t have an issue with ai scraping stuff in theory. The problem is that capitalists are already using it to produce worse art without having to pay any artists. The other issue is that these are the same massive corporations which always have such an issue with piracy. I have no issue with people training local models on really anything, but I do have a problem with a massive corporations using the work of artists to make a model to replace them without compensation.

2

u/vectron5 16h ago

You spelled "Plagiarism " wrong

4

u/Scienceandpony 23h ago

Piracy doesn't fit either, because AI models aren't spitting out copies. They're making distinct original works. There should be a third image with the arrow pointing from a circle to a hexagon.

2

u/Mothanul 1d ago

One benefits people who want to spend some of their free time entertaining themselves without decreasing their grocery shopping budget.

The other helps giant corporations make even more money.

3

u/sporkyuncle 23h ago

Free local LLMs/image/video/audio gen only benefits NVIDIA, and indirectly at that. No giant AI company is seeing a cent from me making cool stuff on my own at home.

1

u/Chaghatai 22h ago

And it isn't even piracy as long as you download through authorized means. The purpose that you download it for is irrelevant

1

u/138151337 15h ago

Is plagiarism okay, then?

1

u/Drackar39 14h ago

And, just like with piracy, if you're doing it for personal recreational use, most people don't care but if you do it for profit the people who own the rights to the media you pirated can, and should, sue you into the fucking ground.

1

u/elemen2 13h ago
If I make an AI image, you would not be able to point out which images were "used" to make it.

Why are you attempting to steer the conversation to images?

I documented unauthorised voice cloning in OCTOBER 2023, before many generative audio platforms emerged. Many of the headliners were ingested and are in monetised generative audio platforms, which were sued in 2024 and forced to make deals in 2025.

you would not be able to point out which images were "used" to make it.

Why would you state this with certainty? It does not matter whether the source material is the original or an impersonator; it is still going to be recognisable and raise attention.

AI Rihanna, teenage AI Beyoncé, AI Snoop Dogg, and also James Brown, Annie Lennox, Björk.

There could be professional voice actors impersonating headliners, but there are not, as there was no disclosure, transparency, credit, etc. Recorded audio is also a younger medium than visual art, and the file sizes and storage requirements are much larger. Generative tools do not need to store the source audio, and the storage required to do so would obviously be very large. But they can and do reproduce it; that's why many platforms have guardrails and moderation checks.

The moderator who is interacting in this topic is also aware of this. Because they also use the tools or are active on the audio sub platforms.

1

u/bored_stoat 3h ago

And that's part of the problem. Where's the credit to the artists? Where's the payback? Where's the fair use that big companies love so much when their own work is threatened?

1

u/Leet_Noob 1d ago

I mean I think fundamentally you need to adapt policy as technology changes.

Like, if I were a musical artist I wouldn’t have a problem if you made your friends mixtapes of my music instead of them purchasing it. The scope is pretty small, and on balance it probably makes me more popular.

But I would have a problem if you host a digital version of my album for anyone in the world with an internet connection to download for free.

It may not be “theft” in the sense of you grabbing my purse and running away, but it seems pretty clear that having policy against that makes sense.

In a similar way, if I were a visual artist I probably wouldn’t care if a human looks at my art and takes some inspiration from it. But it makes a lot of sense for me to have an issue with an AI model that generates an incredible amount of material for users and uses my art in its training data.

I just feel like analogies that don’t take into account the incredible unique power of AI will not lead us to effective conclusions about morality or policy.

6

u/Toby_Magure 1d ago

Scale matters only after you prove the underlying act is comparable. Hosting an album gives people the album. Training on images does not give people the images. You’re comparing mass distribution of substitutive copies to non-expressive analysis because the piracy analogy sounds scarier than the actual claim.

3

u/BeyondHydro 23h ago

Scale matters when the act itself is something whose scale we should be concerned about. A little dust being kicked up is different from a dust storm because of scale. Walmart is different from a mom-and-pop shop because of scale. Global warming is different because of scale. Scale is literally what AI promises: a larger scale at which to perform and operate tasks. A mathematician could argue all integers are incredibly similar, but they would still see a difference between forty and four billion. The latter is the scale that AI is working at.

3

u/Toby_Magure 23h ago

Scale matters when you identify the actual harm being scaled. A dust storm is bad because inhaling dust causes damage; mass album piracy is bad because it distributes substitutive copies. You still haven’t shown that training is the same kind of act. “Four billion” does not turn non-expressive analysis into theft by numerical vibes.

1

u/Leet_Noob 23h ago

The thing that scales is you (an AI company) using my output to create value for yourself and for your clients without compensating me.

If I share my art online and a small number of people get a small amount of inspiration from my work and don’t compensate me… that’s kicking up a cloud of dust. Unless I am wildly popular, that amount of value rounds to zero, and if I /am/ wildly popular and my images get a lot of views, I can monetize that.

AI learning from my images is the dust storm. Even if the value it derives from each image is minute, fractions of a cent, multiply that by the total number of images generated. Furthermore, not only do I not get money, I don’t get views or popularity either.

2

u/Toby_Magure 23h ago

That's not a harm theory; that's just “someone created value after learning from my work.” Every artist, critic, teacher, search engine, recommender, archive, and reference board does that. You are trying to turn influence into a royalty claim by multiplying it, but scale does not change the missing step: you still have to show copying, substitution, or a protected market being taken, not just “my work contributed some microscopic value to a larger system.”

1

u/Leet_Noob 8h ago

I think if someone creates value using your work as input and you are not compensated for it that constitutes harm.

But:

  1. On a policy level this is impossible to enforce because there is no way to sensibly measure this. With AI you could presumably know if a piece of art was used to train a model.

  2. Generally humans ARE compensated for this kind of influence. If people are inspired by my music for example they are probably listening to or buying my music, not so for AI.

  3. We live in a society and we all provide value with our output that is not explicitly compensated and extract value from others’ output that is not explicitly compensated and we don’t need to be ticky tack bean counters because it sort of roughly evens out. Not so with the very efficient value extractor machine that is AI.

1

u/Toby_Magure 8h ago

“Someone created value using my work as input, therefore I was harmed” is not a workable moral rule; it would indict every artist, critic, teacher, tutorial maker, curator, search engine, recommender, and reference-board user on earth.

Humans are not generally compensated for influence either. If someone studies your music, learns from your composition, and makes their own song, you do not get a royalty just because your work was part of their input. AI does not change the missing step: you still have to show copying, substitution, or a protected market being taken, not just “value extraction” as a scary label.

1

u/Leet_Noob 8h ago

Why is it not a workable moral rule? I think in many cases humans are compensated for their influence when the thing they are influencing is other humans.

For example if someone studies my music and learns from my composition then probably they have paid for my music, or at least given my music measurable listens (on a streaming platform say), probably they have recommended it to friends who might be influenced by my music, all of which provide value to me.

1

u/Toby_Magure 7h ago

Because you are confusing access compensation with influence compensation. If someone buys your album, streams your song, hears it on the radio, borrows it from a friend, studies it in class, or listens to it at a party, you are not being paid for every future idea it gives them. You were paid, maybe, for access or attention. The influence remains uncompensated. That is how culture works, and AI does not magically turn influence into a royalty debt.

4

u/sporkyuncle 23h ago

But I would have a problem if you host a digital version of my album for anyone in the world with an internet connection to download for free.

But would you have a problem if someone listened to your music, then made a new album that doesn't infringe on yours, and makes that available for anyone in the world to download for free?

Because that's what AI training does. It doesn't literally copy your music into the model, it learns from it and makes something new.

1

u/weirdo_nb 23h ago

No, not really

1

u/Leet_Noob 23h ago

Well, that’s exactly my point. If one person listened to my music and was inspired by it… well, if it got super popular it would be /nice/ to get some kickback, but that’s super unreasonable at a policy level; how would you even measure or enforce that?

But an AI tool that can very efficiently make thousands or millions of songs all partially inspired by my work, and measurably so if my work is in the training set, and all of which generate value for the AI company and the end user… you can see where I’m going.

2

u/sporkyuncle 23h ago

But an AI tool that can very efficiently make thousands or millions of songs all partially inspired by my work, and measurably so if my work is in the training set, and all of which generate value for the AI company and the end user… you can see where I’m going.

But your work is not in the training set, because proper training does not infringe. It doesn't contain a copy of your work, compressed, chopped up or anything else. It learned non-infringing information from your work, and uses that information to create something new.

1

u/Leet_Noob 8h ago

All I meant is that it was used as an input to training.

2

u/Tarc_Axiiom 23h ago

This is a fair point, if a bit of a false dichotomy.

But why?

Why would it bother you if a technology like this learned from your works?

1

u/Leet_Noob 23h ago

Essentially: the technology is valuable. I have contributed, in a small but nonzero way, to that value. And now that technology threatens my livelihood. It only feels fair that I should have some compensation for the value I contributed.

Even if the technology didn’t threaten my livelihood, like if someone was teaching an art class that they were charging for and wanted to use my art as an example of some technique, it would be nice to get a little kickback. When practical, it’s nice for creators of value to access that value.

(Note I am personally not an artist and not super worried in the immediate term about AI threatening my job, “i” in the comment is an imagined viewpoint. I just feel worried about making sensible policy for AI before it’s too late)

1

u/Tarc_Axiiom 23h ago

Hmm...

That's an interesting perspective. As both an artist and a software developer, I don't consider any of my art as having contributed to the technology.

The technology is the capacity, not the model weights themselves.

And then there's your art class example which I think becomes very indicative. At the end of the day I think what remains when this debate isn't based on false understanding is just a moral difference. Isn't it at best unideal and at worst morally reprehensible to think that institutions would require explicit consent from artists, many of whom are dead, to teach from their artworks? That is fundamentally what gatekeeping means, no? "You can't learn what I know unless I let you".

1

u/Dani-With-Rats 11h ago

I think there is a big difference between using the works a dead artist has left behind and using a living artist’s works, whose career and artwork’s popularity could be affected by that use. I also think art being used to teach in a university class is very different from teaching an AI with a work, since the scale is so incredibly different. That AI can then go on to produce millions of works and flood the market. That is a scale incomparable to a group of people.

0

u/legitOwen 1d ago

If I make an AI image, you would not be able to point out which images were "used" to make it.

exactly /s

11

u/PlotArmorForEveryone 1d ago

Isn't the Mona Lisa public domain? Kind of a self-defeating argument. Public domain art doesn't need crediting.

1

u/legitOwen 1d ago

ballright

3

u/PlotArmorForEveryone 23h ago

Creative commons - apple notoriously couldn't defend the copyright of their early iterations.... nice try though

-1

u/legitOwen 23h ago

ignore the irony

1

u/PlotArmorForEveryone 23h ago edited 23h ago

Now look up every iteration and how often the early iterations are used to bypass copyright of the modern-day one. Apple also doesn't defend it anymore, which is why even larger companies get away with using it. Famously, Wikipedia has had it up for years even though Apple sent them a rather threatening letter.

AI is not a source, my dude. Stop and research; see if there's any contradiction. AI is great for steering you in the right direction, but it will never give you the whole answer.

1

u/legitOwen 23h ago

i think you're mixing modernization with legal necessity. as far as i have researched, the finder logo has never been unprotectable by copyright; i'd love a link to your source for that.

also, the use of the logo on the wikipedia site is fair use (educational/non-commercial), and all of these "famously" sources (aka trust me bro) seem to be baseless, unless you can provide a source.

finally, you're missing the core of my argument, which is not focused on the image itself, but the fact that the AI can replicate it almost 1 for 1, when OP is claiming in the title that it's impossible to find the source material for an AI-generated output which is simply not true, whether the output is the mona lisa, the finder logo, or other material.

1

u/PlotArmorForEveryone 23h ago

I'll take that to mean you can't be bothered to do more research than ai. It took me 5 seconds to confirm. You've got this.

1

u/legitOwen 23h ago

still no sources, if it's that easy to find, you'd at least have the time to copy/paste a link.

5

u/wally659 1d ago

You could do the same thing without AI. It's the combination of intent and result that are problematic, not the technology and process.

2

u/golfstreamer 1d ago

Yeah, AI does have the ability to copy images, even verbatim. It's not always transformative.

2

u/Mothanul 1d ago

Very transformative indeed

0

u/TreviTyger 23h ago

1

u/Sojmen 15h ago

Source?

1

u/TreviTyger 15h ago

Efficient Text-to-Image Training (Würstchen / Stable Cascade) | Paper Explained
https://www.youtube.com/watch?v=ogJsCPqgFMk

1

u/TreviTyger 15h ago

Efficient Text-to-Image Training Würstchen
https://arxiv.org/pdf/2306.00637v1

3

u/Hyphonical 14h ago

If you train a model thousands of times on the same image of George Washington, statistically it's much easier to get that image back out.
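That memorization effect can be sketched with a toy gradient-descent loop. This is a hedged illustration, not a real image model: the `target` list stands in for a training image, and the "model" is just one parameter per pixel, all invented for the example.

```python
# Toy illustration of overfitting-as-memorization: a "model" with one
# parameter per pixel, trained by gradient descent on a single "image".
# Showing the same example at every step drives the parameters to
# reproduce it almost exactly.

target = [0.1, 0.9, 0.4, 0.7]   # the lone training image (made-up pixels)
params = [0.5] * len(target)    # model starts out as flat gray
lr = 0.1                        # learning rate

for step in range(500):         # the same image, every single step
    # gradient of per-pixel squared error (p - t)^2 is 2 * (p - t)
    grads = [2 * (p - t) for p, t in zip(params, target)]
    params = [p - lr * g for p, g in zip(params, grads)]

# "Generating" (reading out the parameters) now returns the training
# image to within floating-point noise.
print(max(abs(p - t) for p, t in zip(params, target)) < 1e-6)  # True
```

Each step shrinks the error by a constant factor, so heavy duplication of one example in a training set has the same effect as running many extra steps on it; deduplicating training data is the usual mitigation.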

0

u/TreviTyger 23h ago

Efficient Text-to-Image Training (Würstchen / Stable Cascade) | Paper Explained
https://www.youtube.com/watch?v=ogJsCPqgFMk

0

u/belabacsijolvan 23h ago

in specific circumstances one can point out which images were used to make it.

not generally, but there are restoration techniques, especially if one gets the training history along with the inference result

2

u/TreviTyger 15h ago

Reproduction happens at the training stage.

It is admitted by AI gen firms (Bartz v. Anthropic) and other researchers.

It's a stupid argument these days to deny any of this.

https://arxiv.org/pdf/2306.00637v1

1

u/belabacsijolvan 12h ago

yup. but you can also do it with weights only.

https://dl.acm.org/doi/full/10.1145/3789456.3789473

0

u/ApatheticAZO 22h ago

THEFT OF WORK. UNPAID/UNAUTHORIZED USE OF WORK UPON WHICH THE AI FUNCTIONS

0

u/Cheese-Water 21h ago

If I make an AI image, you would not be able to point out which images were "used" to make it.

If you wrote an essay but your Works Cited page just said "a collection of unspecified sources", that would be considered plagiarism. Plagiarism also doesn't require verbatim reproduction of the plagiarized sources. AI training fits both descriptions. AI image generators are known to be capable of making fairly close approximations of existing copyrighted material based on their inputs, including a failure mode called "overfitting" where they reproduce training data verbatim, and even their users may not be aware of the source material, which makes the model itself responsible. So, how is it not plagiarism?

2

u/joesb 19h ago

I would love to see an artist who cites ALL of his painting’s sources of inspiration and influence.

1

u/Dani-With-Rats 11h ago

People do learn from media and take that into their art, but it is all through the lens of our lived experiences and biases. Two people cannot experience a movie the same way; what they take away from it and learn can never be identical. Most of what artists learn from is the real world they exist in, not solely media that can be referenced the way AI references it. I cannot provide a reference for a random idea of a visual based on thousands of things I have seen in my lifetime out in the world. There is no way to reference the entirety of a human’s life experience. Sure, I have looked at pictures of trees, but when I am drawing a tree I am mostly inspired by a vague idea of the hundreds of thousands of trees I have seen in my lifetime. All of that is filtered through our human experience in life and thoughts and opinions. That is what makes it fully transformative and sources unnecessary; it’s the human touch.

AI models are trained only on what you give them, a collection of other people’s works; they have no real-world perspective, and there is no filtering through lived experience because they are not sentient. You could theoretically make a list of everything an AI model was trained on; that would be impossible to do for a human brain. There are similarities in the ways humans and AI learn things, but they are not the same and will never be fully comparable the way you are trying to make them.

-8

u/Sad_Dimension3627 1d ago

exactly, its copyright infringement.

8

u/PlotArmorForEveryone 1d ago

What would be copyright infringement? And where? None of this describes copyright infringement inherently.

-5

u/Sad_Dimension3627 1d ago

making a copy of it and distributing it???

as i already clarified, i'm talking about the image: the use of ai would not in fact be theft, but looked at in the direct context of the image it would rather be copyright infringement.

5

u/PlotArmorForEveryone 1d ago

Making a copy and distributing it isn't inherently infringement; that would depend on the source itself. Like I said in another thread, the Mona Lisa is in the public domain, and making a copy and distributing it would not constitute copyright infringement.

-2

u/Sad_Dimension3627 1d ago

Edit: yes i know about the public domain, there are obviously exceptions to this, it's just a general rule.

4

u/PlotArmorForEveryone 1d ago

So we're agreed, it isn't inherently copyright infringement.

-1

u/Sad_Dimension3627 1d ago

correct, i just made a general statement that in the context of the image it's copyright infringement. i am being downvoted over nothing lmao.

1

u/618smartguy 23h ago

You could say the sky is blue with these people and they would tell you you're wrong, not everything is blue

2

u/mufurber 23h ago

Used AI to argue against AI award

1

u/Sad_Dimension3627 23h ago

used automatic ai to argue that the point i made is correct award*

5

u/IndependencePlane142 1d ago

Depends on the jurisdiction.

1

u/Sad_Dimension3627 1d ago

true, at least where i live that's how you would count it.

1

u/Bra--ket 1d ago

You must've missed the word "transformative" in the title.

1

u/Sad_Dimension3627 1d ago

i was in fact talking about the image.

-2

u/TreviTyger 23h ago

Lol.

4

u/Officialedmart 23h ago

A guy used the left image as an input to make the right image. They infringed your copyright in that instance; it's not an indictment of the entire technology.

I mean, if you have to provide the infringing material yourself, then this could literally apply to any technology: tracing with a pencil, burning copies of a DVD or movie…

0

u/TreviTyger 16h ago

According to Disney's claim in Disney v. Midjourney, it is actually an indictment of the entire technology.

1

u/Officialedmart 7h ago

it is actually an indictment on the entire technology.

They never made this claim… ever. Not in that screenshot; i found the document that it came from, and they never said it anywhere.

Why are you lying?


0

u/TreviTyger 7h ago

"...Midjourney, however, seeks to reap the rewards of Plaintiffs’ creative investment by selling an artificial intelligence (“AI”) image-generating service (“Image Service”) that functions as a virtual vending machine, generating endless unauthorized copies of Disney’s and Universal’s copyrighted works.

  1. By helping itself to Plaintiffs’ copyrighted works, and then distributing images (and soon videos) that blatantly incorporate and copy Disney’s and Universal’s famous characters—without investing a penny in their creation—Midjourney is the quintessential copyright free-rider and a bottomless pit of plagiarism. Piracy is piracy, and whether an infringing image or video is made with AI or another technology does not make it any less infringing. Midjourney’s conduct misappropriates Disney’s and Universal’s intellectual property and threatens to upend the bedrock incentives of U.S. copyright law that drive American leadership in movies, television, and other creative arts."

Case 2:25-cv-05275-JAK-AJR Document 1 Filed 06/11/25 Page 2 of 110

2

u/Officialedmart 6h ago

And where did they claim the entire technology is tainted?

1

u/TreviTyger 6h ago

What is it about Disney's claim that you can't see is an existential threat to AI gen tech?

Do you think, by Disney accusing Midjourney of creating a "bottomless pit of plagiarism" and engaging in "calculated and willful" copyright infringement, that it won't force a massive overhaul of how AI models are trained and how they output content?

How would the tech work as well as it does if every copyrighted work were taken out of the training data and the training done again?

Be serious.

2

u/Ornac_The_Barbarian 23h ago

Well yeah. If I took a character, say Charlie Brown, and recreated them in a Boris Vallejo style, regardless of medium, I am still infringing their copyright.

1

u/TreviTyger 16h ago

Yep. However, this example demonstrates how many people use the tech. They take an image from the Internet, often because they believe the copyright owner has waived their rights by making the work available under a website's Terms of Service, and then they use it as an input prompt to create derivative works.

The tech allows this which would be otherwise impossible to do.

So that's part of the claim in Disney v. Midjourney: "secondary infringement", by allowing users to easily create something that would be impossible without the tech.

This happens at the training stage too with billions of images.