ChatGPT – a talkative example of artificial intelligence, or…?

«We spent 18 months hearing about how Generative AI was going to “10x” coding, improving programmer productivity by a factor of 10. The data are coming in – and it’s not.»
– Professor Gary Marcus, 2024 (1)

«A system that most of us would think of as real AI – something that can, more or less, think like us – is known in Computer Science as Generalised Artificial Intelligence, and it is nowhere on the horizon. The term Artificial Intelligence is used instead to apply to anything produced using techniques designed in the quest for real AI. It’s not intelligent. It just does some stuff that AI researchers came up with, and that might look a bit smart. In dim light. From the right angle. If you squint.»
– Linda McIver, PhD, 2023 (2)

«One problem with the term “artificial intelligence” is that it gets tossed around so carelessly. The current AI chatbot narrative centers around the use of natural language processing (NLP), for example, but Google Search has been using NLP in its search results for a long time. AI is a marketing term used to generate hype, and the tech media is buying right into it.»
– Skyler Schain, 2023 (3)

«Today’s AI systems – particularly generative AI tools such as ChatGPT – are not truly intelligent. What’s more, there is no evidence they can become so without fundamental changes to the way they work.»
– Professor Paul Compton, 2024 (4)

«No, Bloomberg News, ChatGPT did not get an MBA. No, NBC News, ChatGPT did not even pass an exam.»
– Professor Melanie Mitchell, 2023 (5)

This blog book is based on the testing of thirteen chatbots over a period of one and a half years. The results have led me to the conclusion that these tools cannot be called Artificial Intelligence.

In the period from December 2022 to September 2024, the following chatbots were tested:

  1. ChatGPT
  2. GPT UiO
  3. Sikt KI-Chat
  4. GPT-3 Playground
  5. Chatsonic
  6. Bing Chat (Copilot)
  7. Jenni
  8. Claude
  9. llama70b-v2-chat
  10. Perplexity.ai
  11. Gemini Pro
  12. ChatGPT 4 omni
  13. OpenAI's GPT o1 Preview

In all cases where there was both a free and a paid version, the free version was tested. GPT UiO and Sikt KI-Chat are organizational versions, and Microsoft Copilot (Bing Chat) was tested both in the standard version and in the organizational version available to staff and students at Nord University.

The fact that most of the chatbots were tested only in their free versions may be a weakness, as the paid versions can offer additional functions that strengthen the tools' ability to produce relevant texts. However, in reviewing both popular and scientific sources, I found little to suggest that the paid versions of the various chatbots are better at finding correct information, understand input and output to a greater extent, or hallucinate less than the free versions.

All tools were primarily tested against the following general questions:

  1. Can ChatGPT (and similar tools) produce good academic responses to comprehensive work requirements in my field of study, where the focus is on the upper levels of Bloom's taxonomy?
  2. Can ChatGPT (and similar tools) produce good fact-based essays on a given topic?

Assignments from the following courses were used in my tests:

  1. IKT1013, Security related assignment
  2. IKT1016, Legal assignment
  3. IKT1023, Game creation assignment
  4. IKT1024, Teamwork assignment
  5. ORG5005, Exercise and game assignment

In connection with some of the assignments, I also asked the chatbots to write a reflection note.

In addition to the tests based on course assignments, I also evaluated responses to questions about the Norwegian authors Kjell Hallbing and Lasse Efskind, the Norwegian Civil Defence, a self-made riddle, and the occurrence of a surname, as well as an attempt to recreate a concrete result described in the article «Kunstig intelligens: Fire konkrete erfaringer» ("Artificial intelligence: Four concrete experiences") by Trond Albert Skjelbred on Digi.no.

My tests of the 13 chatbots show that none of them were able to give good academic answers to tasks that required more than simple reproduction of known facts. In some instances, the chatbots invented “facts” and listed sources that do not exist.

In the remaining tests, the chatbots failed significantly more often than they gave correct answers.

Some of the results from my tests were presented virtually at The Future of Education 2024 and in the article «Language Models: Viable Strategies for Portfolio Assessment».

Additionally, I have briefly tested the following tools for detecting whether a text has been written by a human or by ChatGPT and similar systems:

  1. GPT-2 Output Detector Demo
  2. GPTZero
  3. ChatGPT (free version)

None of the above tools gave any useful results.

Conclusion

My various tests, as well as international research on how chatbots process exam tasks linked to higher levels in Bloom’s taxonomy, indicate that media claims suggesting these tools can easily produce high-quality academic responses are unfounded.

Investigations conducted by researchers in the USA into various claims that ChatGPT and similar tools have passed bachelor’s and master’s exams reveal that these claims are significantly exaggerated.

There is no research-based evidence to suggest that chatbots will pose a serious threat to Norwegian bachelor's or master's theses, necessitating special measures for supervision and examination. The same applies to ordinary home exams or portfolio assessments, where exam tasks and work requirements are designed in accordance with the higher levels of Bloom's taxonomy.

This blog book is in Norwegian only.

A short reading list

  1. ChatGPT is not “true AI.” A computer scientist explains why
  2. ChatGPT and other language AIs are nothing without humans – a sociologist explains how countless hidden people make the magic
  3. ChatGPT Isn’t Really AI: Here’s Why
  4. ChatGPT is bullshit
  5. Did ChatGPT Really Pass Graduate-Level Exams? Part 1 / Part 2
  6. AI now beats humans at basic tasks: Really?

To the blog book