this post was submitted on 17 May 2024
cross-posted from: https://lemmy.world/post/15479755

OpenAI strikes Reddit deal to train its AI on your posts

[–] gila@lemm.ee 4 points 5 months ago (2 children)

Surely the use of user-deleted content as training data carries the same liabilities as reinstating it on the live site? I've checked my old content and it hasn't been reinstated. I'd assume such a dataset would inherently contain personal data protected by the right to erasure under GDPR, otherwise they'd use it for both purposes. If that is correct, regardless of how they filtered it, the data would be risky to use.

Perhaps the cumulative action of disenfranchised users could contribute either to devaluing a dataset based on a future checkpoint, or to a reduction in average post quality that leads to decreased popularity over time (if we assume the content that users deleted en masse was useful, which I think is fair).

[–] Grimy@lemmy.world 4 points 5 months ago* (last edited 5 months ago) (1 children)

I think you need to make a special request to get the level of deletion that comes with GDPR. I'm not certain; I just remember other users specifically talking about how you need to send them an email so that they have to comply.

I also wouldn't be surprised if their dataset is mostly stripped of usernames to get around GDPR, though I'm no expert.

All that to say I'd be very very surprised if they deleted comments in their dataset.

Very valid point about devaluing the user experience though, especially when you take Google searches into account. I'm sure they have already fallen off compared to a year ago, when Reddit would pop up half the time no matter what you searched.

[–] gila@lemm.ee 3 points 5 months ago* (last edited 5 months ago)

Well, that'd be the mechanism of how GDPR protections are actioned, yes; but leaving themselves open to these ramifications broadly would be risky. I don't think it'd satisfy 'compliance' to ignore GDPR except upon request. Perhaps the issues with it are even more significant when using it as training data, given they're investing compute and potentially needing to re-train down the track.

Based on my understanding, de-identifying the dataset wouldn't be sufficient to be in compliance. That's actually how it worked prior to GDPR for the most part, but I know companies largely ended up just re-identifying data by cross-referencing multiple de-identified datasets. That nullification forms part of the basis for GDPR protections being as comprehensive as they are.

There'd almost certainly be people who previously deleted their content and who later seek to verify whether it was used to train any public AI.

Definitely fair to say I'm making some assumptions, but essentially I think at a certain point trying to use user-deleted content as a value add just becomes more risk than it's worth for a public company.

[–] FaceDeer@fedia.io 1 points 5 months ago

> Surely the use of user-deleted content as training data carries the same liabilities as reinstating it on the live site?

Why would that be? It's not the same.

And what liabilities would there be for reinstating it on the live site, for that matter? Have there been any lawsuits?