this post was submitted on 25 Jun 2025
85 points (97.8% liked)

[–] gedaliyah@lemmy.world 63 points 1 week ago* (last edited 1 week ago) (1 children)

I'm not pirating. I'm building my model.

[–] QuadratureSurfer@lemmy.world 21 points 1 week ago (1 children)

To anyone reading this comment without reading through the article: this ruling doesn't mean that it's okay to pirate for building a model. Anthropic will still need to go through trial for that:

But he rejected Anthropic's request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.

[–] Artisian@lemmy.world 2 points 1 week ago* (last edited 1 week ago) (1 children)

I also read through the judgment, and I think it's better for Anthropic than you describe. He distinguishes three issues:

A) Use any written material they get their hands on to train the model (and the resulting model doesn't just reproduce the works).

B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).

C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).

A and B were ruled fair use by summary judgment, meaning this judge thinks they are clear-cut in Anthropic's favor. C will go to trial.

[–] xthexder@l.sw0.com 0 points 1 week ago (2 children)

C could still bankrupt the company depending on how trial goes. They pirated a lot of books.

[–] Xerxos@lemmy.ml 3 points 1 week ago* (last edited 1 week ago)

It might not be that bad. Most 'damage' (as publishers see it) comes from distribution, not the download itself. Depending on how they acquired the books, it might not be much of a problem.

[–] Artisian@lemmy.world 1 points 1 week ago (1 children)

As a civil matter, the publishing houses are more likely to get the full money if Anthropic stays in business (and does well). So it might be bad, but I'm really skeptical about bankruptcy (and I'm not hearing anyone seriously floating it?)

[–] xthexder@l.sw0.com 1 points 1 week ago

Depending on the type of bankruptcy, the business can still operate; all their profits would just be going towards paying off their debts.

[–] the_q@lemmy.zip 24 points 1 week ago (2 children)

An 80 year old judge on their best day couldn't be trusted to make an informed decision. This guy was either bought or confused into his decision. Old people gotta go.

[–] FaceDeer@fedia.io 11 points 1 week ago (5 children)

Did you read the actual order? The detailed conclusions begin on page 9. What specific bits did he get wrong?

Funny, there's a lot of people on lemmy itself (especially around dbzer0) who would agree with the judge wholeheartedly.

[–] hendrik@palaver.p3x.de 6 points 1 week ago* (last edited 1 week ago)

Previous discussion from yesterday about the same topic: https://lemmy.world/post/31923154

[–] AbouBenAdhem@lemmy.world 5 points 1 week ago (2 children)

IMO the focus should have always been on the potential for AI to produce copyright-violating output, not on the method of training.

[–] SculptusPoe@lemmy.world 5 points 1 week ago* (last edited 1 week ago) (2 children)

If you try to sell "the new adventures of Doctor Strange, Steven Strange and Magic Man." existing copyright laws are sufficient and will stop it. Really, training should be regulated by the same laws as reading. If they can get the material through legitimate means it should be fine, but pulling data that is not freely accessible should be theft, as it is already.

[–] devfuuu@lemmy.world 3 points 1 week ago (3 children)

That "freely" there really does a lot of hard work.

I have a freely accessible document under a CC license that states it is not to be used commercially. This is commercial use. Your policy would allow that document to be used, though, since it is accessible. This kind of policy discourages me from easily sharing my works, as others profit from my efforts and my works are more likely to be attributed to a corporate beast I want nothing to do with than to me.

I'm all for copyright reform and simpler copyright law, but these companies need to be held to standard copyright rules and not just made up modifications. I'm convinced a perfectly decent LLM could be built without violating copyrights.

I'd also be ok sharing works with a not for profit open source LLM and I think others might as well.

[–] Artisian@lemmy.world 1 points 1 week ago* (last edited 1 week ago) (1 children)

Plaintiffs made that argument and the judge shot it down pretty hard: that kind of competition isn't what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy to create MANY more competitors on the fiction market. Could an author use copyright to challenge that use?

Would love to hear your thoughts on the ruling itself (it's linked by Reuters).

[–] Cort@lemmy.world 1 points 1 week ago (1 children)

Orcs and dwarves (with a v) are creations of Tolkien; if the fantasy stories include them, it's a violation of copyright, the same as including Mickey Mouse.

My argument would have been to ask the AI for the bass line to Queen & David Bowie's Under Pressure, then refer to that as a reproduction of copyrighted material. But then again, AI companies probably have better lawyers than Vanilla Ice.

[–] Artisian@lemmy.world 1 points 3 days ago* (last edited 3 days ago)

The students read Tolkien, then invent their own settings. The judge thinks this is similar to how Claude works. Neither I nor, I suspect, the judge meant that the students were reusing world building wholesale.

[–] Fingolfinz@lemmy.world 1 points 1 week ago

Pirate everything!

[–] MyOpinion@lemmy.today 1 points 1 week ago

I hate AI with a fire that keeps me warm at night. That is all.

[–] BlameTheAntifa@lemmy.world 0 points 1 week ago* (last edited 1 week ago)

Anakin: “Judge backs AI firm over use of copyrighted books”
Padme: “But they’ll be held accountable when they reproduce parts of those works or compete with the work they were trained on, right?”
Anakin: “…”
Padme: “Right?”

[–] Grimy@lemmy.world -1 points 1 week ago (4 children)

80% of the book market is owned by 5 publishing houses.

They want to create a monopoly around AI and kill open source. The copyright industry is not our friend. This is a win, not a loss.

[–] OmegaMouse@pawb.social 19 points 1 week ago (1 children)

What, how is this a win? Three authors lost a lawsuit to an AI firm using their works.

[–] Grimy@lemmy.world 2 points 1 week ago

The lawsuit would not have benefitted their fellow authors but their publishing houses and the big ai companies.

[–] sentient_loom@sh.itjust.works 19 points 1 week ago (1 children)

How exactly does this benefit "us" ?

[–] gaylord_fartmaster@lemmy.world 4 points 1 week ago (1 children)

Because books are used to train both commercial and open source language models?

[–] sentient_loom@sh.itjust.works 0 points 1 week ago (1 children)

used to train both commercial

commercial training is, in this case, stealing people's work for commercial gain

and open source language models

so, uh, let us train open-source models on open-source text. There's so much of it that there's no need to steal.

?

I'm not sure why you added a question mark at the end of your statement.

[–] gaylord_fartmaster@lemmy.world 1 points 1 week ago (1 children)

I'm not sure why you added a question mark at the end of your statement.

I was questioning whether or not you would see that as a benefit. Clearly you don't.

Are you also against libraries letting people borrow books since those are also lost sales for the authors, or are you just a luddite?

[–] sentient_loom@sh.itjust.works 1 points 1 week ago

libraries letting people borrow books

This is so far from analogous that it's almost a non sequitur.

are you just a luddite?

No, and you don't even believe such nonsense. You're grasping, ineffectively.

[–] hendrik@palaver.p3x.de 7 points 1 week ago (2 children)

Keep in mind this isn't about open-weight vs other AI models at all. This is about how training data can be collected and used.

[–] bob_omb_battlefield@sh.itjust.works 12 points 1 week ago (2 children)

If you aren't allowed to freely use data for training without a license, then the fear is that only large companies will own enough works or be able to afford licenses to train models.

[–] Nomad_Scry@lemmy.sdf.org 7 points 1 week ago (3 children)

If they can just steal a creator's work, how do they suppose creators will be able to afford continuing to be creators?

Right. They think we have enough original works that the machines can just make any new creations.

😠

[–] MudMan@fedia.io 6 points 1 week ago (1 children)

It is entirely possible that the entire construct of copyright just isn't fit to regulate this and the "right to train" or to avoid training needs to be formulated separately.

The maximalist, knee-jerk assumption that all AI training is copying is feeding into the interests of, ironically, a bunch of AI companies. That doesn't mean that actual authors and artists don't have an interest in regulating this space.

The big takeaway, in my book, is copyright is finally broken beyond all usability. Let's scrap it and start over with the media landscape we actually have, not the eighteenth century version of it.

[–] hendrik@palaver.p3x.de 5 points 1 week ago* (last edited 1 week ago)

I'm fairly certain this is the correct answer here. Also, there is a separation between the judiciary and the legislature. It's the former that is involved here, but we really need to bother the latter. It's the only way, unless we want to use 18th century tools on the current situation.

[–] bob_omb_battlefield@sh.itjust.works 4 points 1 week ago (2 children)

Yeah, I guess the debate is which is the lesser evil. I didn't make the original comment but I think this is what they were getting at.

[–] Nomad_Scry@lemmy.sdf.org 4 points 1 week ago

Absolutely. The current copyright system is terrible but an AI replacement of creators is worse.

[–] Grimy@lemmy.world 2 points 1 week ago* (last edited 1 week ago)

Yes precisely.

I don't see a situation where the actual content creators get paid.

We either get open source ai, or we get closed ai where the big ai companies and copyright companies make bank.

I think people are having huge knee jerk reactions and end up supporting companies like Disney, Universal Music and Google.

[–] Grimy@lemmy.world 1 points 1 week ago

Companies like record studios, which already own all the copyrights, aren't going to pay creators for something they already own.

All the data has already been signed away. People are really optimistic about an industry that has consistently fucked everyone they interact with for money.

[–] hendrik@palaver.p3x.de 3 points 1 week ago* (last edited 1 week ago)

Yes. But then do something about it. Regulate the market. Or pass laws which address this. I don't really see why we should do something like this, then; it still kind of contributes to the problem, as free rein still advantages big companies.

(And we can write into law whatever we like. It doesn't need to be a stupid and simplistic solution. If you're concerned with big companies, just write that they have to pay a lot and small companies don't. Or force everyone to open their models. Those are all options which can be formulated as a new rule. And those would address the issue at hand.)

[–] Grimy@lemmy.world 2 points 1 week ago (1 children)

Because of the vast amount of data needed, there will be no competitive, viable open source solution if half the data is kept in a walled garden.

This is about open weights vs closed weights.

[–] hendrik@palaver.p3x.de 1 points 1 week ago* (last edited 1 week ago) (1 children)

I agree that we need open source and need to emancipate ourselves. The main issue I see is: The entire approach doesn't work. I'd like to give the internet as an example. It's meant to be very open, connect everyone and enable them to share information freely. It is set up to be a level playing field... Now look what that leads to. Trillion dollar mega-corporations, privacy issues everywhere and big data silos. That's what the approach promotes. I agree with the goal. But in my opinion the approach will turn out to lead to less open source and more control by rich companies. And that's not what we want.

Plus nobody even opens the walled gardens. Last time I looked, Reddit wanted money for data. Other big platforms aren't open either. And there's kind of a small war going on between the scrapers and crawlers and the countermeasures. So it's not as if it's open as of now.

[–] Grimy@lemmy.world 1 points 1 week ago* (last edited 1 week ago)

A lot of our laws are indeed obsolete. I think the best solution would be to force copyleft licenses on anything using publicly created data.

But I'll take the wild west we have now, with no walls, over any kind of copyright dystopia. Reddit did successfully sell its data to Google for 60 million. Right now, you can legally scrape anything you want off Reddit; it is an open garden in every sense of the word (even if they don't like it). It's a lot more legal than using pirated books, but Google still bet 60 million that copyright laws would swing broadly in their favor.

I think it's very foolhardy to even hint at a pro copyright stance right now. There is a very real chance of AI getting monopolized and this is how they will do it.

[–] SonOfAntenora@lemmy.world 5 points 1 week ago* (last edited 1 week ago) (1 children)

Cool then, try to do some torrenting out there and don't hide it. Tell us how it goes.

The rules don't change. This just means AI overlords can do it, not that you can do it too.

[–] OfCourseNot@fedia.io 3 points 1 week ago (1 children)

I've been pirating since Napster and have never hidden shit. It's usually not a crime, except in America it seems, to download content, or even share it freely. What is a crime is to make a business of distributing pirated content.

[–] SonOfAntenora@lemmy.world 2 points 1 week ago (1 children)

I know, but you see what they're doing with AI: a small server used for piracy and sharing is punished, in some cases, worse than a theft. AI businesses are making bank (or are they? There is still no clear path to profitability) on troves of pirated content. This (for small guys like us) is not going to change the situation. For instance, if we used the same dataset to train some AI in a garage, with no business or investor behind it, things would be different. We're at a stage where AI is quite literally too important to fail for somebody out there. I'd argue that AI is, in fact, going to be shielded for this reason regardless of previous legal outcomes.

[–] hendrik@palaver.p3x.de 1 points 1 week ago

Agreed. And even if it were, it's always like this. Anthropic is a big company. They likely have millions available for good lawyers, while the small guy doesn't. So they're more able to just do stuff and do away with some legal restrictions. Or just pay a fine, and that's pocket change for them. So big companies always have more options than the small guy.
