this post was submitted on 21 Jul 2023
Technology
My understanding is that this claim is basically entirely false. The tests done by these researchers had some glaring errors that, when corrected, show GPT-4 is getting slightly better at math, if anything. See this video, which describes some of the issues: https://youtu.be/YSokS2ivf7U
TL;DR: The researchers gave each version of GPT questions from a different pool, so it's no surprise the newer version's answers looked worse.
You shouldn't need to be a prompt engineer just to get answers to math questions that are not blatantly wrong. I believe the prompts are included in the paper so that you don't have to guess if they were badly formatted.
The problem is that they aren't comparing apples to apples: they asked each version of GPT a different pool of questions. (Edited my post to make this clear.)
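To make the apples-to-apples point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: `old_model`, `new_model`, and both question pools are hypothetical stand-ins, not the paper's actual prompts, data, or models. The two "models" behave identically, yet they look very different when each one is scored on a different pool.

```python
def accuracy(model, questions):
    """Fraction of (question, expected) pairs the model answers correctly."""
    correct = sum(1 for q, expected in questions if model(q) == expected)
    return correct / len(questions)


# Hypothetical toy "models": both use the same bad rule ("every odd number
# is prime"), so neither one is actually better than the other.
def old_model(n):
    return n % 2 == 1


def new_model(n):
    return n % 2 == 1


# Hypothetical question pools of (number, is_prime) pairs.
easy_pool = [(3, True), (5, True), (7, True), (11, True)]
hard_pool = [(9, False), (15, False), (21, False), (13, True)]

# Mismatched comparison: each model is scored on a different pool.
mismatched = (accuracy(old_model, easy_pool), accuracy(new_model, hard_pool))

# Matched comparison: both models are scored on the same shared pool.
shared_pool = easy_pool + hard_pool
matched = (accuracy(old_model, shared_pool), accuracy(new_model, shared_pool))

print("different pools:", mismatched)
print("same pool:      ", matched)
```

With different pools the "new" model appears to have gotten much worse, but on the shared pool the two scores are identical. The gap came entirely from the questions, not from the models.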
Once you ask them the same questions, it becomes clear that ChatGPT isn't getting worse at math, because it has been terrible all along.
I see. Thanks for clarifying.