This post was submitted on 12 Jun 2025
283 points (97.6% liked)

Technology

  • Disney and NBCUniversal have teamed up to sue Midjourney.
  • The companies allege that the platform used their copyright-protected material to train its model, and that users can generate content that infringes on Disney's and Universal's copyrighted material.
  • The scathing lawsuit requests that Midjourney be made to pay up for the damage it has caused the two companies.
kibiz0r@midwest.social 26 points 3 weeks ago

I say this as a massive AI critic: Disney does not have a legitimate grievance here.

Gathering AI training data is scraping. Scraping is, and must continue to be, fair use. As Cory Doctorow (a fellow AI critic) says: Scraping against the wishes of the scraped is good, actually.

I want generative AI firms to get taken down. But I want them to be taken down for the right reasons.

Their products are toxic to communication and collaboration.

They are the embodiment of a pathology that sees humanity — what they might call inefficiency, disagreement, incoherence, emotionality, bias, chaos, disobedience — as a problem, and technology as the answer.

Dismantle them on the basis of what their poison does to public discourse, shared knowledge, connection to each other, mental well-being, fair competition, privacy, labor dignity, and personal identity.

Not because they didn’t pay the fucking Mickey Mouse toll.

jerkface@lemmy.ca 10 points 3 weeks ago

Are you saying that the mere action of scraping is fair use, or that absolutely anything you do with the data you scrape is also fair use?

kibiz0r@midwest.social 3 points 3 weeks ago (last edited 3 weeks ago)

I'd say that scraping as a verb implies an element of intent. It's about compiling information about a body of work, not simply making a copy, and therefore if you can accurately call it "scraping" then it's always fair use. (Accuse me of "No True Scotsman" if you would like.)

But since it involves making a copy (even if only a temporary one) of licensed material, there's the potential that you do one thing with that copy that is fair use and another thing with it that isn't.

Take archive.org for example:

It doesn't only contain information about the work, but also a copy (or copies, plural) of the work itself. You could argue (and many have) that archive.org only claims to be about preserving an accurate history of a piece of content, but functionally mostly serves as a way to distribute unlicensed copies of that content.

I don't personally think that's a justified accusation, because I think they do everything in their power to be as fair as possible, and there's a massive public benefit to having a service like this. But it does illustrate how you could easily have a scenario where the stated purpose is fair use but the actual implementation is not, and the infringing material was "scraped" in the first place.

But in the case of gen AI, I think it's pretty clear that the residual data from the source content is much closer to a linguistic analysis than to an internet archive. So it's firmly in the fair use category, in my opinion.

Edit: And to be clear, when I say it's fair use, I only mean in the strict sense of following copyright law. I don't mean that it is (or should be) clear of all other legal considerations.

jerkface@lemmy.ca 3 points 3 weeks ago

I think the distinction between data acquisition and data application is important. Consider the parallel of photography: you are legally and ethically entitled to take a photo of anything that you can see from a public place (i.e., you can "scrape" it). But that doesn't mean you can do anything you want with those photos. Distinguishing the two makes the scraping part a lot less muddy.
