Previously published on Nuclia.com. Nuclia is now Progress Agentic RAG.
The problem with intelligence is that it is very difficult to prove. We, as humans, are trained from childhood to apply the “duck test” heuristic (if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck) to intelligence. In other words, we assume something or someone that looks intelligent is intelligent.
It's not a bad approach. It's actually the only one. As a matter of fact, it's the very principle of the Turing test: if a machine can fool a human into thinking it is a human, then it is intelligent.
Hence, AI is mostly about imitating intelligence.
UX Is the Way to Make the Duck Quack Like a Duck
That’s where UX comes into play. UX influences the perception the user has of any AI application. An API endpoint does not look intelligent to most people, but a chatbot interface might, even though both are essentially the same thing.
So let's put our UX glasses on and check what makes a chatbot look intelligent. Let's forget for a moment about the quality of the answers. As Alan Turing said, "The test results would not depend on the machine's ability to give correct answers to questions, only on how closely its answers resembled those a human would give." And yes, humans do give wrong answers quite often.
The first thing that comes to mind is the pace of the answer: it does not reply in one go. It pauses between parts of the answer, as if it were hesitating, trying to find the best way to say something. Like us!!! We also pause, right? The large language model (LLM) pauses not because it hesitates, but simply because it takes some compute time to generate a complete sentence by predicting the most likely next word, one after the other.
But how different is that from our brains? We feel like these pauses are our way to connect to our inner selves in order to deliver a valid idea, but unconsciously, isn’t our brain simply calculating the most likely word one after the other? Who knows? Either way, what matters is that these pauses make the chatbot look intelligent :).
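This pacing is exactly what streaming interfaces expose on purpose: rather than waiting for the full completion, the UI prints each token as it arrives. Here is a minimal sketch, assuming the OpenAI Python client; the model name and the question are placeholder assumptions:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stream the answer token by token: the user sees the "thinking" pauses
# instead of staring at a spinner and then receiving a wall of text.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": "What is the duck test?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```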
The second thing is the way the chatbot rephrases. Instead of just returning the piece of information the way it was in the original data, it produces a new sentence. Ok, if it rephrases, it means it understands it, right?
Actually, no. Here again, the reason it rephrases is that LLMs can be thought of as lossy compression algorithms: they compress text into a much smaller representation, but with loss, so they are unable to restore the original wording. Our natural tendency toward anthropomorphism makes us think that if it rephrases, it understands.
And there are tons of other details that make the chatbot look intelligent: the way it says “Hi” (did you notice that none of the most used SQL databases ever says “Hi”?), the way it handles emojis, the way it handles the user’s mood, etc.
We can see that the overall user experience with generative AI (GenAI) is pretty good at building the illusion of intelligence, strengthening the trust the user places in the AI.
But that’s a double-edged sword.
Stupid Is as Stupid Does
Keep in mind that intelligence and stupidity are two sides of the same coin.
Stupidity only applies to things that are supposed to be intelligent. If a rock doesn’t understand a question, it’s not stupid, it’s just a rock. But if a cat asks to go outside, you open the door, and then the cat doesn’t go out, you might be more inclined to call the cat stupid (especially if it woke you up at 3:00 a.m. for that), because you have a certain expectation of intelligence from a cat.
It’s all about your expectations. The higher the expectation, the more likely you are to be disappointed.
We just said that the UX around generative AI tends to raise very high expectations in the user (partly because of the user’s anthropomorphism and common misunderstandings about AI, and partly because, well, your UX team is good at its job). So the risk of disappointment is high.
And it will come, inevitably, because AI is not intelligent. It’s just quacking like a duck.
Let’s say you forgot to mention the current date in your system prompt. The user is having a great time chatting with your AI, and then asks "What else happened since yesterday?" Here, it is very likely that the AI will produce a wrong answer, because it does not know the current date. Worse, it is not aware that it doesn’t know, so it will just make something up and look stupid.
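A cheap guard against this particular failure is to inject the current date when the prompt is assembled. A minimal sketch in Python; the exact prompt wording is an assumption to tune for your own application:

```python
from datetime import datetime, timezone

# Resolve "today" at prompt-assembly time, not at deployment time.
today = datetime.now(timezone.utc).strftime("%A, %B %d, %Y")

system_prompt = (
    "You are a helpful assistant. "
    f"The current date is {today} (UTC). "
    "Use it to resolve relative expressions like 'yesterday' or 'last week'."
)
```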
The disappointment will be huge, because the user had a strong feeling they were chatting with something very close to a human, even though they know it is not (which is even worse). All users know that any random stupid piece of software they use on a daily basis is able to filter by date and return results corresponding to yesterday (the most common SQL databases can do that). So what they see is a human-like system that cannot understand a simple question, and at the same time software that cannot perform a simple filtering task.
That’s where the UX challenge is.
How to Manage the User’s Expectations
When building the UX around a generative AI, you need to strike a delicate balance between making your app look trustworthy and not overpromising.
The first thing to do is to make sure the user understands the limitations of the system. You can do that by making the AI say "I don’t know" when it doesn’t know, or by making it say "I’m not sure, but I think…" when it is not sure.
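In practice, this behavior usually has to be requested explicitly in the system prompt. A hedged sketch; the wording below is an assumption and should be validated against your own model and test questions:

```python
# Instructions nudging the model to admit uncertainty instead of guessing.
# Assumption: phrasing like this helps most chat models, but results vary.
HONESTY_INSTRUCTIONS = (
    "If you do not know the answer, say 'I don't know'. "
    "If you are not certain, start with 'I'm not sure, but I think...'. "
    "Never invent facts to fill a gap in your knowledge."
)

system_prompt = "You are a helpful assistant. " + HONESTY_INSTRUCTIONS
```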
Then, you should try to identify wrong answers. This can be done by monitoring the chat answers and using an evaluation model, like REMi. You can also try to detect wrong answers in real time and then offer alternative solutions to the user. For example, you can point to the terms of the question that seem to confuse the LLM, or, in the case of Retrieval-Augmented Generation (RAG), explain that the retrieved context does not seem relevant enough and possibly propose other questions that fit better within the scope of the current knowledge field.
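For the RAG case, one simple real-time check is to look at the retriever's relevance scores before generating an answer. A minimal sketch with a hypothetical `Passage` type and a threshold you would tune against your own retriever (this is not REMi's actual API):

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    score: float  # retriever similarity score, assumed to lie in [0, 1]

# Assumption: the right threshold depends on your retriever and must be tuned.
RELEVANCE_THRESHOLD = 0.35

def context_warning(passages: list[Passage]) -> str | None:
    """Return a message for the UI when the retrieved context looks too weak
    to support a reliable answer; return None when it looks good enough."""
    if not passages or max(p.score for p in passages) < RELEVANCE_THRESHOLD:
        return (
            "I couldn't find content that clearly matches your question. "
            "Try rephrasing it, or ask about topics covered in this knowledge base."
        )
    return None
```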
It's critical to be transparent about the fact that the system is not magic. Rather, it's a clever assistant, but still just an assistant. The user is the one leading the process. Be sure to avoid a situation where the user blindly follows a foolish GPS that would drive them to a dead end.
Conclusion
As you can see, the user experience in a generative AI project will be based on both AI techniques (mostly the prompt, but also evaluation frameworks, etc.) and UI interactions.
It is a challenging task, as it implies a mix of skills and cross-disciplinary collaboration, but it's also a very rewarding one and the key to making your AI project successful.
To see Nuclia in action, check out our demos or get started with a free trial.