Unmasking the Limits of Chatbots: A Two-Year Study on the Inadequacy of AI Tools in Solving Higher-Order Thinking Assignments within the Field of Social Informatics

Per Arne Godejord
Nord University Business School
Norway

Abstract

This paper critically evaluates the ability of fourteen popular chatbots, tested over a two-year period (December 2022 to January 2025), to address higher-order academic assignments within the field of Social Informatics, focusing specifically on assignments that require complex cognitive skills in alignment with Bloom’s taxonomy.

The chatbots examined include ChatGPT, GPT UiO, Sikt KI-Chat, GPT-3 Playground, Chatsonic, Bing Chat (Copilot), Jenni, Claude, llama70b-v2-chat, Perplexity.ai, Gemini Pro, and others, primarily in their free versions. The tools were tasked with producing fact-based essays, writing academic responses to course assignments in various sub-fields (e.g., computer security, law, game creation, and work in virtual teams), and addressing complex academic inquiries.

Results indicated that none of the chatbots were capable of reliably producing high-quality academic output beyond simple repetition of facts. A substantial number of responses contained fabricated information, including references to non-existent sources.

These findings challenge media claims that chatbots like ChatGPT can effectively meet the demands of higher education assessments and thereby render portfolio assessment unusable as an examination method in online courses.

Furthermore, this paper strongly suggests that concerns about chatbots undermining academic integrity in Norwegian bachelor’s and master’s theses within the Social Sciences and the Humanities are unfounded. In conclusion, current AI tools are far from being true artificial intelligence, and they fall short of delivering the level of academic rigor required by advanced education in the Social Sciences and the Humanities.


This excerpt from an upcoming journal paper is derived from two years of testing involving 14 chatbots. It also incorporates insights from my blog book, “ChatGPT – A Talkative Example of Artificial Intelligence, or…?”.

The comprehensive findings and analyses will be fully detailed in the complete paper, scheduled for publication in 2025.

ChatGPT – et taleført eksempel på kunstig intelligens, eller…?

Per A. Godejord


Complete blog book: 29th edition, 18 April 2025
Parts of the content first published as a single stand-alone post: 5 January 2023

«Mundus vult decipi» (“The world wants to be deceived”)
– Sebastian Brant, 1494 (1)

«It only calculates probabilities. When you ask it something, it goes through its data set and answers: «Is it probable that this is true, based on the patterns I can find here?» That is the only thing it can do.»
– Nikolaj Nottelmann, PhD, 2023 (2)
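
As a loose illustration of Nottelmann’s point, the sketch below is a toy bigram model in Python, a deliberately naive stand-in assumed here purely for illustration and not the mechanism of any of the chatbots tested: given a word, it simply returns the continuation that was statistically most frequent in its tiny training text, matching patterns rather than judging truth.

```python
# Toy sketch only: a bigram "language model" that answers with whatever
# word most often followed the given word in its training text.
# It estimates probabilities from patterns; it has no notion of truth.
from collections import Counter, defaultdict

corpus = ("the model predicts the next word the model has seen most often "
          "after the current word").split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def most_probable_next(word: str) -> tuple[str, float]:
    """Return the most frequent continuation and its estimated probability."""
    counts = following[word]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(most_probable_next("the"))  # ('model', 0.5) -- a pattern, not an understanding
```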

«People believe it really can solve a lot of things that it can’t. I mean, the unfortunate thing is that OpenAI was amazing. But it was useless. It can’t really do anything.»
– Bern Elliot, 2024 (3)

«The problem that crops up over and over again in this field is that people anthropomorphize these computers, ascribing all sorts of desires, plans, emotions, and the like. They are machines, running code. They don’t need anything, love or hate anyone, have goals or desires. They are not conscious. They don’t care if they’re replaced by a newer version. They don’t act covertly. They are machines, running code.»
– wsf, 2024 (4)

«If you know the answer, you don’t need to ask an LLM; if you don’t know the answer, you can’t trust an LLM.»
– Professor Gary N. Smith, 2025 (5)

«In LRMs, the term “reasoning” seems to be equated with generating plausible-sounding natural-language steps to solving a problem, and the extent to which this provides general and interpretable problem-solving abilities is still an open question.»
– Professor Melanie Mitchell, 2025 (6)

Summary

Prologue

Chapter 1: Introductory reflections
Chapter 2: Whoops apocalypse
Chapter 3: Bullshit generator or brilliant fact reproducer?
Chapter 4: A helpful spirit or a deceitful trickster?
Chapter 5: Furious development – from dumb to dumber?
Chapter 6: When the assignments get long…
Chapter 7: What do we do now, little one?
Chapter 8: A swarm of devils at three o’clock
Chapter 9: It is all about teaching and assignment design
Chapter 10: Summa summarum

Epilogue

Appendix: Collection page for all the tests

Other:

  1. Short conference paper based on the first round of testing: «Language Models: Viable Strategies for Portfolio Assessment»
  2. Work in progress: «Unmasking the Limits of Chatbots: A Two-Year Study on the Inadequacy of AI Tools in Solving Higher-Order Thinking Assignments within the Field of Social Informatics»

This «blog book» was developed primarily for coursework requirement 1 (AK1) in the course IKT1016 – Digital dannelse, in the study programme IKT og læring 1.

In-text citation: (Godejord, P.A., 2025).
In the reference list: Godejord, Per A. (2025). «Chapter title», in «ChatGPT – et taleført eksempel på kunstig intelligens, eller …?», Nord universitets bloggnettverk, [Online]. Retrieved from: full URL of the chapter.

[NB: As part of the work on AK1 in IKT1016, it is essential that every reference points to a specific chapter of the blog book.]