No, AI models will not replace programmers any time soon

This month’s “Communications of the ACM” magazine (01/2023) published a rather alarmist article titled ‘The End of Programming’. While it is a well-written article, it bets heavily on the future usefulness of AI models like ChatGPT to generate working code, replacing the need for programmers to write code by hand. ChatGPT is currently getting a lot of attention in the media and online, with people finding that not only can you ask it questions on any topic and get a believable answer, you can also ask it a more practical question like “show me C code to read lines of a file”.

Finding out that ChatGPT can be used to ‘generate’ code is prompting new developers to post questions online like ‘should I start a career in software development when programmers are likely going to be replaced by ChatGPT?’

The tl;dr answer: ChatGPT is not replacing developers any time soon.

While these types of AI model will continue to be developed and improved, it’s worth keeping in mind that a model is only as good as the material it is trained on, which means it is limited by the correctness and usefulness of that material. In other words, these models are subject to the age-old problem of ‘garbage in, garbage out’. What’s not being discussed enough is that current models do not understand the content they generate. They also have no understanding of whether any of the content they generate is correct, either factually correct for text or syntactically correct for code snippets. Unlike these trained ML models, as humans we use our existing knowledge and experience to infer missing details from what we read or hear. We’re also good at using what we already know to be true to assess how correct or realistic new information is. AI models currently do not have this level of understanding, although researchers have been attempting for years to replicate ‘understanding’ and the ability to make decisions based on existing facts (Google ‘expert systems’ for more info).

I’ve seen developers recently attempting to answer questions on Stack Overflow, Reddit and other sites using ChatGPT, with varying success depending on whether the topic was within the scope of the material the model was trained on.

The current problem with text generation from these models is that they lack context. A model can attempt to generate a response based on identifying key words in the input prompt, but that doesn’t always produce the answer a human would give to the same question. Models also don’t understand intent. A question can be asked in a number of similar but different ways, and another human can usually infer the intent or purpose behind the question, but for current general-purpose ML models that’s not possible.

In its current form, ChatGPT is trained on material currently available online: websites with both static articles and reference material, as well as question-and-answer discussion sites. The limitation of this approach is that if I ask a very specific question like ‘show me example code for building a REST API with Spring Boot’, there are plenty of examples online, and assuming the model was trained on at least some of these, the resulting answer could incorporate some of that material. The answer isn’t likely to be better than anything you could have found yourself by Googling the same question. There could be some benefit in a generated answer that is a conglomeration of text from various sources, but that can also mean the combined text ends up being syntactic gibberish (the model doesn’t currently know whether what it’s returning to you is syntactically correct).
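As a simple illustration of that last point, generated Python code could at least be checked for syntactic validity after the fact with the standard library’s ast module. This is a post-hoc check you could run on the output yourself, not something the model does:

```python
import ast

def is_valid_python(snippet: str) -> bool:
    """Return True if the snippet parses as Python syntax."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

# A well-formed snippet parses; stitched-together gibberish does not.
ok = is_valid_python("for line in open('data.txt'):\n    print(line)")
gibberish = is_valid_python("for line in open('data.txt') print line:")
```

Of course this only catches syntax errors, not code that is well-formed but wrong, which is exactly the harder problem.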

It’s clear there is promise in this area for aiding and supporting developers, but as a complete replacement for all custom software development work in its current form, this seems highly unlikely, at least not within the next 10 years, and possibly even longer.

Experimenting with the gpt-2 models and text generation

I’m experimenting with the gpt-2 774M model to generate text based on a prompt. Starting up with:

python3 src/interactive_conditional_samples.py --temperature=0.7 --model_name=774M --nsamples=1 --length=100
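For reference, the --temperature flag controls how sharply the model’s next-token probabilities are peaked before sampling. A minimal sketch of temperature-scaled softmax, my own illustration of the general technique rather than code from the gpt-2 repository:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax: lower values
    sharpen the distribution (more predictable text), higher values
    flatten it (more surprising text)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for a three-token vocabulary
logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.7)
warm = softmax_with_temperature(logits, 0.9)
# The most likely token gets a smaller share at the higher
# temperature, so sampling picks less likely tokens more often.
```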

And then providing a prompt of something relevant like :

“The problem with training Machine Learning models today is that the model is only as good as the data it is trained with.”

and let’s see what we get:

This almost sounds believable as if it was written by hand. It might not make complete sense, but the sentences are well formed, something that was an issue with my previous experimenting with RNN model text generation.

Here’s the text for reference:

“There is a desire to train the model on a large data set.

A very big data set is not always a good enough data set.

A good data set is more than just enough. A good data set is always growing.

To take advantage of this growing data set we need to train the model on the latest batch of data. However, there may be a huge amount of data that does not fit in the pre-defined batch size.”

I’m curious how the temperature setting affects what is generated, so let’s bump it up from 0.7 to 0.9 and try again with the same prompt text.

This sounds like it comes straight out of a research paper:

“This is why there is now a trend toward training a workforce of highly trained humans in a search engine optimization function (SRE) to monitor changes in signal and value in the big data.

Whether it is a search engine algorithm like Chi-squared or a machine learning model like a neural network, there is still a lot of work ahead to make those apps more relevant to the data they collect.

The need for better data

Having data that makes the predictions bigger, on”

Not knowing exactly how this works, it almost reads like it’s regurgitating whole chunks of the data it was trained on, but who knows. Given that it was trained on materials and sites online, I guess the easy way to test for this would be to drop one of these sentences into Google and see if something like a source article comes up. Trying this out:

Huh. So this sentence is not found anywhere in Google’s indexes. That’s quite remarkable if the model actually generates sentences that are this convincing.

What about starting with a line from a recipe (for example from here):

“Heat oil in large pot. Add beef and flour, stirring to coat beef; brown.”

Ok, now this is interesting but slightly weird. It’s clearly recognizable as steps in a recipe, but the order doesn’t make too much sense and some of the steps are repeated variations of themselves:

“Add onions and garlic and cook until onion starts to turn translucent, 3 minutes. Stir in bell pepper, glass tomatoes, and celery; cook 1 minute. Add potatoes, beef broth, and bay leaves; bring to a boil. Reduce heat to low. Reduce heat to medium; cook until beef is no longer pink. Taste and add salt and pepper. Add beef and stir. Add liquid; cover and simmer until meat is tender, 10 minutes.”

What about lines from films?

“I find your lack of faith disturbing”

Now we’re gone off into Crazy Town. This is weird:

“Why do you persist in viewing my appearance with contempt?”

“I want to believe I am worthy of your worship.”

“Therefore, you have no right to demand I lift my gaze.”

“I am unworthy of your reverence; I can only pray that you may address me in a less lewd manner.”

“You call me a beast?”

“You cannot deny the very fact you see me.”

“I see nothing.”

“Why?”

Whoah. Ok, keeping with the Star Wars theme, what does it do if you start it with some Yoda quotes:

“When 900 years old you reach, look as good you will not”

And I got:

“I listen to every guru and I say, good luck with your yoga and counting numbers.”

Does this phrase exist anywhere?

Nope. You have to wonder if there were some Yoda quotes or lines from films that went into the model. It’s interesting that ‘yoga’ is mentioned, and you wonder if this is the model generating a variation of the name ‘Yoda’. And ‘When 900 years old you reach’ = ‘counting numbers’? Very curious.

Ok, one more. How about a line from an HP monitor product description:

“Designed with purpose: multiple customization options allow you to adjust your monitor to best fit the way you work.”

Incredible:

“Customizable paired ergonomics and colors: customize every feature of your new monitor from its basic design and color customization options.

Intuitive control links: there are more than 100

cabling connections provided with each monitor.

Designed for comfort: more than twice as many inches of space between the screen and the desk top than standard monitors.

Dedicated power buttons for powering the monitor on and off.”

Ok, enough playing, I did have a project planned to use with this, so time to get busy 🙂

Issues running gpt-2 with provided Docker image

After following the instructions to build the gpt-2 Docker image here, I started up a bash shell into the container:

docker run -it gpt-2 bash

And then ran:

python3 src/generate_unconditional_samples.py | tee /tmp/samples

This failed with this error:

AttributeError: module 'tensorflow' has no attribute 'sort'

A post here says to upgrade to Tensorflow 1.14.0 in the container, so running:

pip install tensorflow==1.14.0
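Before retrying, you can confirm the container now has the expected TensorFlow by printing the version with python3 -c "import tensorflow as tf; print(tf.__version__)". A small version-comparison helper of my own (not part of the gpt-2 repo), assuming 1.14.0 is the minimum the fix installs:

```python
def version_at_least(version: str, minimum: tuple) -> bool:
    """Compare a dotted version string against a (major, minor, patch) tuple."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= minimum

ok = version_at_least("1.14.0", (1, 14, 0))       # the version the fix installs
too_old = version_at_least("1.12.0", (1, 14, 0))  # would still hit the error
```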

And then retrying, text generated! Now to start playing and see what these provided models will generate!