Tests of other chatbots

«No matter how much LLMs sound like a human, the truth is that they don’t really understand language, despite being quite good at stringing words together. This proficiency in language can create the illusion of broader intelligence, leading to more elaborate responses. So basically, research shows what we suspected: LLMs are great at bullshitting you into thinking they know the answer. Many people buy this illusion because they either simply want to believe or because they just don’t use critical thinking—something that Microsoft’s researchers discovered in a new study looking at AI’s impact on cognitive functioning.»
– Jesus Diaz, 2025 (1)

«There are few areas where AI has seen more robust deployment than the field of software development. From «vibe» coding to GitHub Copilot to startups building quick-and-dirty applications with support from LLMs, AI is already deeply integrated. However, those claiming we’re mere months away from AI agents replacing most programmers should adjust their expectations because models aren’t good enough at the debugging part, and debugging occupies most of a developer’s time. That’s the suggestion of Microsoft Research, which built a new tool called debug-gym to test and improve how AI models can debug software.»
– Samuel Axon, 2025 (2)

«The first thing I tried was letting ChatGPT analyze our salary survey. The most anonymized version, mind you, because God knows what OpenAI does with my files. And sure enough: it wrote Python code and did the analysis just fine. But two problems quickly became clear:

  1. I still have to double-check everything ChatGPT tells me, the old-fashioned way.
  2. ChatGPT tells me nothing I couldn’t have found out the old-fashioned way, which I have to go through anyway.

This has permeated most of what I’ve tried.»
– Ole Petter Baugerød Stokke, 2024 (3)

«Participants using GPT-4 when trying to solve a simple business problem got the answer wrong 23% more often than the control group that did not have access to an LLM — because GPT-4 not only often got the answer wrong but provided such a persuasive rationale for its solution that users accepted it at face value.»
– Mikhail Burtsev, Martin Reeves, and Adam Job, 2023 (4)

This collection will include tests of various chatbots that will not be part of the comparison baseline initially. The tests described here are brief and intended as a work in progress throughout 2025.

  1. Test of the Chinese DeepSeek R1
  2. Test of Grok
  3. Test of Kompas AI
  4. Test of NorskGPT

Reading list

  1. Kompas AI: Deep Research & Report Generation for Comprehensive Insights
  2. Comparing Leading AI Deep Research Tools: ChatGPT, Google, Perplexity, Kompas AI, and Elicit
  3. Comparing Elicit, ChatGPT Deep Research, and Kompas AI: UX, Capabilities & Use Cases
  4. Grok, Gemini, ChatGPT and DeepSeek: Comparison and Applications in Conversational Artificial Intelligence
  5. A Systematic Review and Comprehensive Analysis of Pioneering AI Chatbot Models from Education to Healthcare: ChatGPT, Bard, Llama, Ernie and Grok
  6. Why Chatbots Are Not the Future

< Back to the overview page for the tests