ChatGPT – a talkative example of artificial intelligence, or…?

«We spent 18 months hearing about how Generative AI was going to “10x” coding, improving programmer productivity by a factor of 10. The data are coming in – and it’s not.»
– Professor Gary Marcus, 2024 (1)

«A system that most of us would think of as real AI – something that can, more or less, think like us – is known in Computer Science as Generalised Artificial Intelligence, and it is nowhere on the horizon. The term Artificial Intelligence is used instead to apply to anything produced using techniques designed in the quest for real AI. It’s not intelligent. It just does some stuff that AI researchers came up with, and that might look a bit smart. In dim light. From the right angle. If you squint.»
– Linda McIver, PhD, 2023 (2)

«One problem with the term “artificial intelligence” is that it gets tossed around so carelessly. The current AI chatbot narrative centers around the use of natural language processing (NLP), for example, but Google Search has been using NLP in its search results for a long time. AI is a marketing term used to generate hype, and the tech media is buying right into it.»
– Skyler Schain, 2023 (3)

«Today’s AI systems – particularly generative AI tools such as ChatGPT – are not truly intelligent. What’s more, there is no evidence they can become so without fundamental changes to the way they work.»
– Professor Paul Compton, 2024 (4)

«No, Bloomberg News, ChatGPT did not get an MBA. No, NBC News, ChatGPT did not even pass an exam.»
– Professor Melanie Mitchell, 2023 (5)

This blog book is based on the testing of thirteen chatbots over a period of one and a half years. The results have led me to the conclusion that these tools cannot be called Artificial Intelligence.

In the period from December 2022 to September 2024, the following chatbots were tested:

  1. ChatGPT
  2. GPT UiO
  3. Sikt KI-Chat
  4. GPT-3 Playground
  5. Chatsonic
  6. Bing Chat (Copilot)
  7. Jenni
  8. Claude
  9. llama70b-v2-chat
  10. Perplexity.ai
  11. Gemini Pro
  12. ChatGPT 4 omni
  13. OpenAI's GPT o1 Preview

In all cases where there was both a free and a paid version, the free version was tested. GPT UiO and Sikt KI-Chat are organizational versions, and Microsoft Copilot (Bing Chat) was tested both in the standard version and in the organizational version available to staff and students at Nord University.

The fact that most of the chatbots were tested only in their free versions may be a weakness, as the paid versions can offer additional functions that strengthen the tools' ability to produce relevant texts. However, in reviewing both popular and scientific sources, I found little to suggest that the paid versions of the various chatbots are better at finding correct information, understand input and output to a greater extent, or hallucinate less than the free versions.

All tools were primarily tested against the following general questions:

  1. Can ChatGPT (and similar tools) produce good academic responses to comprehensive work requirements in my field of study, where the focus is on the upper levels of Bloom's taxonomy?
  2. Can ChatGPT (and similar tools) produce good fact-based essays on a given topic?

Assignments from the following courses were used in my tests:

  1. IKT1013, Security related assignment
  2. IKT1016, Legal assignment
  3. IKT1023, Game creation assignment
  4. IKT1024, Teamwork assignment
  5. ORG5005, Exercise and game assignment

In connection with some of the assignments, I also asked the chatbots to write a reflection note.

In addition to the tests based on course assignments, I also evaluated responses to questions about the Norwegian authors Kjell Hallbing and Lasse Efskind, the Norwegian Civil Defence, a self-made riddle, and the occurrence of a surname, as well as an attempt to recreate a concrete result described in the article «Kunstig intelligens: Fire konkrete erfaringer» ("Artificial intelligence: Four concrete experiences") by Trond Albert Skjelbred on Digi.no.

My tests of the 13 chatbots show that none of them were able to give good academic answers to tasks that required more than simple reproduction of known facts. In some instances, the chatbots invented “facts” and listed sources that do not exist.

In the remaining tests, the chatbots failed significantly more often than they gave correct answers.

Some of the results from my tests were presented virtually at The Future of Education 2024 and in the article «Language Models: Viable Strategies for Portfolio Assessment».

Additionally, I have briefly tested the following tools for detecting whether a text has been written by a human or by ChatGPT and similar systems:

  1. GPT-2 Output Detector Demo
  2. GPTZero
  3. ChatGPT (free version)

None of the above tools gave any useful results.

Conclusion

My various tests, as well as international research on how chatbots process exam tasks linked to higher levels in Bloom’s taxonomy, indicate that media claims suggesting these tools can easily produce high-quality academic responses are unfounded.

Investigations conducted by researchers in the USA into various claims that ChatGPT and similar tools have passed bachelor’s and master’s exams reveal that these claims are significantly exaggerated.

There is no research-based evidence to suggest that chatbots will pose a serious threat to Norwegian bachelor's or master's theses, necessitating special measures for supervision and examination. The same applies to ordinary home exams or portfolio assessments, where exam tasks and work requirements are designed in accordance with the higher levels of Bloom's taxonomy.

This blog book is in Norwegian only.

A short reading list

  1. ChatGPT is not “true AI.” A computer scientist explains why
  2. ChatGPT and other language AIs are nothing without humans – a sociologist explains how countless hidden people make the magic
  3. ChatGPT Isn’t Really AI: Here’s Why
  4. ChatGPT is bullshit
  5. Did ChatGPT Really Pass Graduate-Level Exams? Part 1 / Part 2
  6. AI now beats humans at basic tasks: Really?

To the blog book