Tests of other chatbots

«No matter how much LLMs sound like a human, the truth is that they don’t really understand language, despite being quite good at stringing words together. This proficiency in language can create the illusion of broader intelligence, leading to more elaborate responses. So basically, research shows what we suspected: LLMs are great at bullshitting you into thinking they know the answer. Many people buy this illusion because they either simply want to believe or because they just don’t use critical thinking—something that Microsoft’s researchers discovered in a new study looking at AI’s impact on cognitive functioning.»
– Jesus Diaz, 2025 (1)

«There are few areas where AI has seen more robust deployment than the field of software development. From «vibe» coding to GitHub Copilot to startups building quick-and-dirty applications with support from LLMs, AI is already deeply integrated. However, those claiming we’re mere months away from AI agents replacing most programmers should adjust their expectations because models aren’t good enough at the debugging part, and debugging occupies most of a developer’s time. That’s the suggestion of Microsoft Research, which built a new tool called debug-gym to test and improve how AI models can debug software.»
– Samuel Axon, 2025 (2)

«The first thing I tried was letting ChatGPT analyze our salary survey. The most anonymized version, mind you, because God knows what OpenAI does with my files. And sure enough: it wrote Python code and did the analysis just fine. But two problems quickly became clear:

  1. I still have to double-check everything ChatGPT tells me, the old-fashioned way.
  2. ChatGPT tells me nothing I couldn’t have found out the old-fashioned way, which I have to go through anyway.

This has permeated most of what I’ve tried.»
– Ole Petter Baugerød Stokke, 2024 (3)

«Participants using GPT-4 when trying to solve a simple business problem got the answer wrong 23% more often than the control group that did not have access to an LLM — because GPT-4 not only often got the answer wrong but provided such a persuasive rationale for its solution that users accepted it at face value.»
– Mikhail Burtsev, Martin Reeves, and Adam Job, 2023 (4)

This collection will include tests of various chatbots that will not be part of the comparison baseline initially. The tests described here are brief and intended as a work in progress throughout 2025.

  1. Test of the Chinese DeepSeek R1
  2. Test of Grok
  3. Test of Kompas AI
  4. Test of NorskGPT

Reading list

  1. Kompas AI: Deep Research & Report Generation for Comprehensive Insights
  2. Comparing Leading AI Deep Research Tools: ChatGPT, Google, Perplexity, Kompas AI, and Elicit
  3. Comparing Elicit, ChatGPT Deep Research, and Kompas AI: UX, Capabilities & Use Cases
  4. Grok, Gemini, ChatGPT and DeepSeek: Comparison and Applications in Conversational Artificial Intelligence
  5. A Systematic Review and Comprehensive Analysis of Pioneering AI Chatbot Models from Education to Healthcare: ChatGPT, Bard, Llama, Ernie and Grok
  6. Why Chatbots Are Not the Future

< Back to the overview page for the tests