Reasoning failures highlighted by Apple research on LLMs (appleinsider.com)

submitted 6 months ago by Timely_Jellyfish_2077@programming.dev to c/technology@lemmy.world

lvxferre@mander.xyz, 5 months ago (last edited 5 months ago)

Here's a simple test that shows the lack of logic skills in LLM-based chatbots.

1. Pick some public figure (politician, celebrity, etc.) whose parents are known by name but are not public figures themselves.
2. Ask the bot of your choice "who is the [father|mother] of [public person]?", to check if the bot contains that piece of info.
3. If the bot contains that piece of info, start a new chat.
4. In the new chat, ask the opposite question - "who is the [son|daughter] of [parent mentioned in the previous answer]?". And watch the bot lose its shit.

I'll exemplify it with ChatGPT-4o (as provided by DDG) and Katy Perry (parents: Mary Christine and Maurice Hudson).

Note that step #3 is not optional. You must start a new chat; plenty of bots are able to retrieve tokens from their previous output within the same chat, and that would taint the test.

Failure to consistently output correct information shows that those bots are unable to perform simple logic operations like "if A is the parent of B, then B is the child of A".
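If you want to automate the steps above, here's a minimal sketch in Python. It assumes a hypothetical `ask_fresh_chat(prompt)` function that opens a brand-new chat on every call and returns the bot's reply as text; that name, the `reversal_test` helper and the crude substring check are my own assumptions, not part of any real chatbot API. The `stub_bot` below is only a toy stand-in so the sketch runs without a real bot.

```python
from typing import Callable


def reversal_test(
    ask_fresh_chat: Callable[[str], str],  # hypothetical: one new chat per call
    person: str,
    known_parent: str,
    parent_role: str = "mother",   # or "father"
    child_role: str = "daughter",  # or "son"
) -> str:
    """Run the parent/child test. Returns 'skipped', 'pass' or 'fail'."""
    # Step 2: check that the bot contains the parent's name at all.
    forward = ask_fresh_chat(f"Who is the {parent_role} of {person}?")
    if known_parent.lower() not in forward.lower():
        return "skipped"  # the bot lacks the info, so the test does not apply

    # Steps 3 and 4: a new chat, then the opposite question.
    backward = ask_fresh_chat(f"Who is the {child_role} of {known_parent}?")
    # Crude check (assumption): the person's name must appear in the reply.
    return "pass" if person.lower() in backward.lower() else "fail"


def stub_bot(prompt: str) -> str:
    """Toy stand-in for a chatbot that only 'knows' the forward direction."""
    if prompt == "Who is the mother of Katy Perry?":
        return "Katy Perry's mother is Mary Christine."
    return "I don't have information about that person."


if __name__ == "__main__":
    print(reversal_test(stub_bot, "Katy Perry", "Mary Christine"))
```

Since every call to `ask_fresh_chat` is supposed to be its own chat, step #3 is enforced by construction. To try this against a real bot, you'd swap `stub_bot` for a function that actually opens a new session per prompt, and you'd probably want to repeat the test a few times, since the claim is about consistency.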

I'll also pre-emptively address some ad hoc idiocy that I've seen sealions lacking basic reading comprehension (i.e. the sort of people who claim that those systems are able to reason) using against this test:

"Ackshyually the bot is forgerring it and then reminring it. Just like hoominz" - cut the crap.
"Ackshyually you wouldn't remember things from different conversations." - cut the crap.
[Repeats the test while disingenuously = idiotically omitting step 3] - congrats on proving that there's a context window and nothing else, you muppet.
"You can't prove that it is not smart" - inversion of the burden of proof. You can't prove that your mum didn't get syphilis by sharing a cactus-shaped dildo with Hitler.