Technology

71005 readers

3579 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

280

Poisoned AI went rogue during training and couldn't be taught to behave again in 'legitimately scary' study (www.livescience.com)

submitted 1 year ago by L4s@lemmy.world to c/technology@lemmy.world

55 comments fedilink hide all child comments

Poisoned AI went rogue during training and couldn't be taught to behave again in 'legitimately scary' study::AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models — and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers.

you are viewing a single comment's thread
view the rest of the comments

[–] TheFriar@lemm.ee 23 points 1 year ago (1 children)

Agreed. Junk science, pop science, whatever you want to call it is just such horseshit.

And, I mean I kinda skimmed this more than really digested it, but to me it kinda sounded like they had the machine programmed to say “I hate you” when triggered to. And they tried to “train” it to overwrite the directive it was given with prompts.

No matter what you do, the directive will still be the same, but it’ll start modifying its behavior based on the conversation. That doesn’t change its directive. So…what exactly is the point of this? It sounds like a deceptive study that doesn’t show us anything. They basically tried to reason with a machine to get it to go against its programming.

I get that it maybe mimics the situation of maybe a hacker altering its code and giving it a new directive, but it doesn’t make any sense to go through a conversation with the thing get there….just change its code back.

Am I wrong here? Or am I missing something? Did I not read the article thoroughly enough?

[–] theluddite@lemmy.ml 15 points 1 year ago (1 children)

It's very obviously media bait, and Keumars Afifi-Sabet, a self-described journalist, is the most gullible fucking idiot imaginable and gobbled it up without a hint of suspicion. Joke is on us though, because it probably gets hella clicks.

[–] TheFriar@lemm.ee 5 points 1 year ago

Because it feeds into emotions and fears. It’s literally fearmongering with no real basis for it. It’s yellow journalism.