[–] BombOmOm@lemmy.world 199 points 10 months ago (16 children)

The difficult part of software development has always been the continuing support. Did the chatbot setup a versioning system, a build system, a backup system, a ticketing system, unit tests, and help docs for users. Did it get a conflicting request from two different customers and intelligently resolve them? Was it given a vague problem description that it then had to get on a call with the customer to figure out and hunt down what the customer actually wanted before devising/implementing a solution?

This is the expensive part of software development. Hiring an outsourced, low-tier programmer for almost nothing has always been possible, the low-tier programmer being slightly cheaper doesn't change the game in any meaningful way.

[–] Knusper@feddit.de 12 points 10 months ago (3 children)

Yeah, I'm already quite content, if I know upfront that our customer's goal does not violate the laws of physics.

Obviously, there's also devs who code more run-of-the-mill stuff, like yet another business webpage, but those are still coded anew (and not just copy-pasted), because customers have different and complex requirements. So, even those are still quite a bit more complex than designing just any Gomoku game.

[–] NoRodent@lemmy.world 8 points 10 months ago

I’m already quite content, if I know upfront that our customer’s goal does not violate the laws of physics.

Haha, this is so true and I don't even work in IT. For me there's bonus points if the customer's initial idea is solvable within Euclidean geometry.

load more comments (2 replies)

load more comments (15 replies)

[–] theluddite@lemmy.ml 137 points 10 months ago (8 children)

"I gave an LLM a wildly oversimplified version of a complex human task and it did pretty well"

For how long will we be forced to endure different versions of the same article?

The study said 86.66% of the generated software systems were "executed flawlessly."

Like I said yesterday, in a post celebrating how ChatGPT can do medical questions with less than 80% accuracy, that is trash. A company with absolute shit code still has virtually all of it "execute flawlessly." Whether or not code executes it not the bar by which we judge it.

Even if it were to hit 100%, which it does not, there's so much more to making things than this obviously oversimplified simulation of a tech company. Real engineering involves getting people in a room, managing stakeholders, navigating conflicting desires from different stakeholders, getting to know the human beings who need a problem solved, and so on.

LLMs are not capable of this kind of meaningful collaboration, despite all this hype.

[–] thantik@lemmy.world 32 points 10 months ago (1 children)

AI regularly hallucinates API endpoints that don't exist, functions that aren't part of that language, libraries that don't exist. There's no fucking way it did any of this bullshit. Like, yeah - it can probably do a mean autocomplete, but this is being pushed so hard because they want to drive wages down even harder. They want know-nothing middle-managers to point to this article and say "I can replace you with AI, get to work!"...that's the only purpose of this crap.

[–] Corkyskog@sh.itjust.works 12 points 10 months ago* (last edited 10 months ago)

I think there is less of a conspiracy, and it's just pushing investment. These AI articles sound exactly like when the internet was new and most people only had a cursory experience with it and people were pumping any company if they just said the word internet.

Now that "Blockchain" has been beaten to death, they need a new hype word to drive mindless investment.

[–] PlexSheep@feddit.de 19 points 10 months ago* (last edited 10 months ago) (2 children)

Thank you for writing this so I only have to ~~upvore~~ upvote you.

Edit: What the difference between one key can be

[–] Absolutemehperson@lemmy.world 21 points 10 months ago

I only have to upvore you

holy music stops

[–] nul@programming.dev 16 points 10 months ago (5 children)

I don't know what an upvore is and I don't want to know.

load more comments (5 replies)

[–] superfes@lemmy.world 9 points 10 months ago (2 children)

But they could replace CEOs from what I can tell.

load more comments (2 replies)

load more comments (5 replies)

[+] Melco@lemmy.world 111 points 10 months ago* (last edited 10 months ago) (4 children)

[deleted]

[–] Nougat@kbin.social 58 points 10 months ago (4 children)

I've tried to have ChatGPT help me out with some Powershell, and it consistently wanted me to use cmdlets which do not exist for on premise Exchange. I told it as much, it apologized, and wanted me to use cmdlets that don't exist at all.

Large Language Models are not Artificial Intelligence.

[–] amanneedsamaid@sopuli.xyz 17 points 10 months ago

Its glorified autocorrect trying to figure out how words string together coherently.

[–] dojan@lemmy.world 9 points 10 months ago (1 children)

I had a weird XAML error I didn’t quite get, and the LLM gave me BS solutions before giving me back my original code.

load more comments (1 replies)

[–] lilShalom@lemmy.basedcount.com 17 points 10 months ago

Ive had google bard supply me code to use with a google api url that doesnt exist.

load more comments (2 replies)

[–] flamekhan@lemmy.world 84 points 10 months ago (1 children)

"We asked a Chat Bot to solve a problem that already has a solution and it did ok."

[–] merc@sh.itjust.works 55 points 10 months ago (2 children)

to solve a problem that already has a solution

And whose solution was part of its training set...

[–] variaatio@sopuli.xyz 19 points 10 months ago* (last edited 10 months ago) (1 children)

half the time hallucinating something crazy in the in the mix.

Another funny: Yeah, it's perfect we just need to solve this small problem of it hallucinating.

Ahemm..... solving hallucinating is the "no it actually has to understand what it is doing" part aka the actual intelligence. The actually big and hard problem. The actual understanding of what it is asked to do and what solutions to that ask are sane, rational and workable. Understanding the problem and understanding the answer, excluding wrong answers. Actual analysis, understanding and intelligence.

[–] merc@sh.itjust.works 9 points 10 months ago (3 children)

Not only that, but the same variables that turn on "hallucination" are the ones that make it interesting.

By the very design of generative LLMs, the same knob that makes them unpredictable makes them invent "facts". If they're 100% predictable they're useless because they just regurgitate word for word something that was in the training data. But, as soon as they're not 100% predictable they generate word sequences in a way that humans interpret as lying or hallucinating.

So, you can't have a generative LLM that is both "creative" in that it comes up with a novel set of words, without also having "hallucinations".

load more comments (3 replies)

load more comments (1 replies)

[–] breadsmasher@lemmy.world 68 points 10 months ago (1 children)

It cost less than a dollar to run all those chatbots?

Doubt

load more comments (1 replies)

[–] doublejay1999@lemmy.world 67 points 10 months ago (1 children)

Plot twist - the AI just cut and paste from stack overflow like real devs.

[+] igorlogius@lemmy.world 64 points 10 months ago* (last edited 10 months ago) (4 children)

[deleted]

load more comments (4 replies)

[–] scarabic@lemmy.world 55 points 10 months ago

A test that doesn’t include a real commercial trial or A/B test with real human customers means nothing. Put their game in the App Store and tell us how it performs. We don’t care that it shat out code that compiled successfully. Did it produce something real and usable or just gibberish that passed 86% of its own internal unit tests, which were also gibberish?

[–] Pistcow@lemm.ee 42 points 10 months ago (3 children)

But did it work?

[–] ArbiterXero@lemmy.world 67 points 10 months ago (17 children)

As someone that uses ChatGPT daily for boilerplate code because it’s super helpful…

I call complete bullshite

The program here will be “hello world” or something like that.

[–] LazaroFilm@lemmy.world 27 points 10 months ago* (last edited 10 months ago) (1 children)

Absolutely I can create a code for your app.

void myApp(void) {
  // add the code for your app here
  return true;
}

You may need to change the code above to fit your needs. Make sure you replace the comment with the proper code for your app to work.

[–] whileloop@lemmy.world 19 points 10 months ago (1 children)

Couldn't even write a void method right, return true!

load more comments (1 replies)

[–] ipha@lemm.ee 21 points 10 months ago (1 children)

"hello world" as a service?

load more comments (1 replies)

load more comments (15 replies)

[–] KoboldCoterie@pawb.social 15 points 10 months ago (3 children)

The study said 86.66% of the generated software systems were "executed flawlessly."

But...

Nevertheless, the study isn't perfect: Researchers identified limitations, such as errors and biases in the language models, that could cause issues in the creation of software. Still, the researchers said the findings "may potentially help junior programmers or engineers in the real world" down the line.

[–] scarabic@lemmy.world 24 points 10 months ago (1 children)

So… they failed 13.34% of their own unit tests?

[–] hayes_@sh.itjust.works 11 points 10 months ago (1 children)

That’s a B+! Fire all our engineers immediately.

some tech CEO, somewhere

load more comments (1 replies)

[–] scarabic@lemmy.world 11 points 10 months ago

And how long did it take to compose the “assignments?” Humans can work with less precise instructions than machines, usually, and improvise or solve problems along the way or at least sense when a problem should be flagged for escalation and review.

[+] m_r_butts@kbin.social 20 points 10 months ago (1 children)

[deleted]

[–] BombOmOm@lemmy.world 22 points 10 months ago (1 children)

The new role of a senior dev will be contract work slicing these Gordian knots.

The amount of money wasted building and destroying these knots is immeasurable. Getting things right the first time takes experienced individuals who know the product well and can anticipate future pain points. Nothing is as expensive as cheap code.

[–] kitonthenet@kbin.social 17 points 10 months ago* (last edited 10 months ago) (2 children)

At the designing stage, the CEO asked the CTO to "propose a concrete programming language" that would "satisfy the new user's demand," to which the CTO responded with Python. In turn, the CEO said, "Great!" and explained that the programming language's "simplicity and readability make it a popular choice for beginners and experienced developers alike."

I find it extremely funny that project managers are the ones chatbots have learned to immitate perfectly, they already were doing the robot’s work: saying impressive sounding things that are actually borderline gibberish

load more comments (2 replies)

[–] Knusper@feddit.de 16 points 10 months ago

the CTO responded with Python. In turn, the CEO said, "Great!" and explained that the programming language's "simplicity and readability make it a popular choice for beginners and experienced developers alike."

Yep, that does sound like my CEO.

[–] blazera@kbin.social 10 points 10 months ago (1 children)

Researchers, for example, tasked ChatDev to "design a basic Gomoku game," an abstract strategy board game also known as "Five in a Row."

What tech company is making Connect Four as their business model?

load more comments (1 replies)

[–] gencha@feddit.de 9 points 10 months ago (1 children)

What a load of bullshit. If you have a group of researchers provide "minimal human input" to a bunch of LLMs to produce a laughable program like tic-tac-toe, then please just STFU or at least don't tell us it cost $1. This doesn't even have the efficiency of a Google search. This AI hype needs to die quick

[–] atzanteol@sh.itjust.works 8 points 10 months ago (1 children)

This research seems to be more focused on whether the bots would interoperate in different roles to coordinate on a task than about creating the actual software. The idea is to reduce "halucinations" by providing each bot a more specific task.

The paper goes into more about this:

Similar to hallucinations encountered when using LLMs for natural language querying, directly generating entire software systems using LLMs can result in severe code hallucinations, such as incomplete implementation, missing dependencies, and undiscovered bugs. These hallucinations may stem from the lack of specificity in the task and the absence of cross-examination in decision- making. To address these limitations, as Figure 1 shows, we establish a virtual chat -powered software tech nology company – CHATDEV, which comprises of recruited agents from diverse social identities, such as chief officers, professional programmers, test engineers, and art designers. When presented with a task, the diverse agents at CHATDEV collaborate to develop a required software, including an executable system, environmental guidelines, and user manuals. This paradigm revolves around leveraging large language models as the core thinking component, enabling the agents to simulate the entire software development process, circumventing the need for additional model training and mitigating undesirable code hallucinations to some extent.

load more comments (1 replies)

Technology

Our Rules

Approved Bots

Doubt