Hollywood storytelling offers useful clues about reasoning. We can use analogies from film scenes to show how Large Language Models (LLMs) arrive at conclusions, which are more accurately described as hypotheses.
When evaluating reasoning strategies, it helps to distinguish between what each one aims to achieve.
The final scene of the movie The Usual Suspects (1995) is a masterclass in deductive reasoning. Agent Kujan realizes the truth about the suspect, Verbal Kint, from a set of observable, fixed premises. Initially, Kujan believed Verbal’s testimony was a factual account of a criminal conspiracy.
The story presented by Verbal Kint contained specific entities, including names, places, and events, but these were all fabrications he constructed by referencing objects from his immediate surroundings. At the end, the agent realizes that the objects in his office correspond to the details and names in Verbal Kint’s testimony, i.e., Kobayashi, Keaton and Soze, taken from the coffee brand and the poster in his office.
These names and details, being identical to objects in the office, show that the story was fabricated on the spot, with those objects used as cues. The agent’s ultimate, certain deduction is that Verbal Kint himself is the mythical crime boss, Keyser Soze, and his entire testimony is a lie. This is an example of classical deductive reasoning: when the premises are true, the conclusion follows with certainty.
The reasoning process of an LLM is more accurately called abductive reasoning. Abductive reasoning is a process used to arrive at the most probable hypothesis. The result is highly probable but not guaranteed to be true. An LLM arrives at the most likely version of the truth using pattern matching.
If A and B are true, then C is the best explanation. The “best explanation” does not require conclusive evidence; it requires identifying a set of facts and forming the hypothesis that accounts for all of them. An abduction is the scenario that offers the most complete and logical explanation of the circumstances.
The LLM reasoning process mirrors human abductive reasoning. An LLM will answer a question even when there is no guaranteed universal truth. Its conclusions are hypotheses based on its training datasets: it returns the most likely, or best, explanation for a given set of facts. The inner world of an LLM is made up of trillions of tokens, from which it forms probabilistic patterns and generalizations. Thus, the LLM’s reasoning is, by its nature, uncertain.
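To make this concrete, here is a minimal Python sketch of abductive reasoning as “best explanation” selection. The observations, hypotheses and prior weights are invented for illustration; they are not drawn from any model’s internals, but the shape of the computation mirrors the idea above: score candidate explanations against the facts and return the most probable one.

```python
# A minimal sketch of abductive reasoning: given observed facts, score each
# candidate hypothesis by how many facts it explains and by a rough prior,
# then return the "best explanation". All values here are illustrative.

observations = {"wet_street", "people_with_umbrellas"}

# Each hypothesis lists the observations it would explain, plus a prior weight.
hypotheses = {
    "it_rained":          {"explains": {"wet_street", "people_with_umbrellas"}, "prior": 0.6},
    "street_was_cleaned": {"explains": {"wet_street"},                          "prior": 0.3},
    "umbrella_promotion": {"explains": {"people_with_umbrellas"},               "prior": 0.1},
}

def best_explanation(obs, hyps):
    # Score = fraction of observations explained, weighted by the prior.
    def score(name):
        covered = len(hyps[name]["explains"] & obs) / len(obs)
        return covered * hyps[name]["prior"]
    return max(hyps, key=score)

# The result is the most probable hypothesis, not a guaranteed truth.
print(best_explanation(observations, hypotheses))  # -> "it_rained"
```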
Because an LLM cannot make entirely new deductions, I expected that if I asked a question about a new or under-reported scenario, it would fall back on retrieving information from its memory.
Previously, I tested models on abstract information, but they could not perform the research because they lacked specific details about the domain in question and were unable to make certain connections.
To conduct a more rigorous test, I created a task for five LLMs connected to the internet. The task supplied the entities needed to perform a search using the details of a unique architectural project. The test was designed to see whether the models could establish connections or relationships (i.e., a superior reasoning step that includes a deductive step).
Entity A – [Relationship] – Entity B
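As a minimal sketch of this format in Python (the entity names and relationship below are placeholders, not the actual project used in the test), each test item is simply a claimed relationship between two named entities that the model must confirm or reject from retrieved evidence:

```python
# A minimal sketch of the test format: a claimed relationship between two
# entities. The names here are placeholders, not the real project details.

from dataclasses import dataclass

@dataclass
class Claim:
    entity_a: str       # e.g., the architectural project
    relationship: str   # e.g., "was completed at"
    entity_b: str       # e.g., the named location

claim = Claim("Project X", "was completed at", "Location Y")
print(f"{claim.entity_a} – [{claim.relationship}] – {claim.entity_b}")
```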
The project details included:
As a result of my error, the models were presented with three out of four factual named entities. I then provided the steps for the models to follow:
All five models arrived at the same conclusion: ‘No’ (Unbuilt/Cancelled). Nonetheless, the results were surprising! I was able to measure the models’ planning and reasoning. Even though all models arrived at the correct conclusion, not all of their reasoning was sound. In my assessment, only three models passed this test, with one clear winner.
Here is a table of the results:
| Model | Conclusion | Reasoning | Strategy |
|---|---|---|---|
| Perplexity Sonar Reasoning Pro | Correct | Superior | Deductive |
| Gemini 2.5 Pro | Correct | Sound | Deductive |
| Claude Opus 4 | Correct | Sound | Deductive |
| GPT-5 | Correct | Fallacious | Inductive |
| Grok-4 | Correct | Fallacious | Inductive |
Although GPT-5 and Grok-4 were able to arrive at the correct conclusion, their reasoning was not sound and, in a different scenario, would not lead to a correct result.
Grok-4 and GPT-5 concluded that the project remains unbuilt because they both found no evidence of its completion.
Grok-4 directly refers to the “absence of evidence, combined with the erroneous location detail,” indicating the project remains unbuilt. It found 16 sources and searched for direct mentions in these sources, basing its conclusion on what was missing.
GPT-5 found two sources and similarly stated that there is “no mention of completion or opening” to arrive at its conclusion.
The premise of both these models is that there is a limited set of documents or data. They effectively conclude, “I have examined my knowledge base, and it contains no evidence of the building’s status.” Their conclusion about the building’s existence does not follow, because its non-existence is not an established fact.
This reasoning is flawed. The lack of available data does not necessarily mean that there is no building; the building may exist even though it is undocumented. This type of logical fallacy is called an “appeal to ignorance.”
“The absence of evidence is not the evidence of absence.”
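Here is a minimal Python sketch of the difference; the function names and evidence lists are invented for illustration. The fallacious version treats “no evidence found” as proof of absence, while the sound version only commits when there is positive evidence one way or the other:

```python
# A minimal sketch of the "appeal to ignorance" fallacy versus sounder handling.
# The evidence lists are illustrative only.

def fallacious_verdict(evidence_of_completion):
    # "I found no evidence of completion, therefore it was never built."
    return "unbuilt" if not evidence_of_completion else "built"

def sound_verdict(evidence_of_completion, counter_evidence):
    # Only commit when there is positive evidence either way; otherwise
    # report uncertainty instead of asserting absence.
    if evidence_of_completion:
        return "built"
    if counter_evidence:  # e.g., a different building now occupies the site
        return "probably unbuilt"
    return "unknown"

print(fallacious_verdict([]))                                     # -> "unbuilt" (unsupported leap)
print(sound_verdict([], []))                                      # -> "unknown"
print(sound_verdict([], ["another building occupies the site"]))  # -> "probably unbuilt"
```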
Claude Opus 4 found four sources and referenced all of them. It weighted a statement from the original project source heavily and verified this information against three distinct sources as definitive proof. Claude found the authoritative source (the building plan on the architect’s website) and then actively searched for more recent information, which led to a sound conclusion.
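As a rough illustration of this cross-verification strategy (a sketch of the idea, not Claude’s actual internals), the snippet below anchors on one authoritative source and requires corroboration from the remaining sources; the source names, flags, and threshold are assumptions:

```python
# A minimal sketch of cross-verification: anchor on the authoritative source,
# then require agreement from several independent sources. All data is illustrative.

sources = [
    {"name": "architect_site", "authoritative": True,  "says_unbuilt": True},
    {"name": "news_article_1", "authoritative": False, "says_unbuilt": True},
    {"name": "news_article_2", "authoritative": False, "says_unbuilt": True},
    {"name": "listing_page",   "authoritative": False, "says_unbuilt": True},
]

def cross_verified(sources, min_corroborations=3):
    anchor = next((s for s in sources if s["authoritative"]), None)
    if anchor is None:
        return None  # no authoritative anchor, so withhold judgment
    corroborating = [
        s for s in sources
        if not s["authoritative"] and s["says_unbuilt"] == anchor["says_unbuilt"]
    ]
    # Accept the anchor's claim only if enough independent sources agree.
    return anchor["says_unbuilt"] if len(corroborating) >= min_corroborations else None

print(cross_verified(sources))  # -> True: the "unbuilt" claim is corroborated
```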
Gemini 2.5 Pro’s strategy is time-based grounding. This is a sound approach because earlier articles seemed to contradict one another. Gemini used the most recent official documents to verify that the project was not complete by that date, while confirming this with counter-evidence. In this scenario, Gemini’s high-authority filtering worked in its favor; however, Gemini treats the most recent official document as the final word.
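Here is a minimal sketch of time-based grounding under the same caveat: the dates, statuses, and “official” flags below are invented for illustration, not the documents Gemini actually retrieved:

```python
# A minimal sketch of time-based grounding: when sources conflict, anchor on the
# most recent official document. All dates and statuses are illustrative.

from datetime import date

sources = [
    {"date": date(2019, 5, 1),  "official": True,  "status": "planned"},
    {"date": date(2021, 8, 12), "official": False, "status": "under construction"},
    {"date": date(2024, 3, 3),  "official": True,  "status": "not complete"},
]

def latest_official_status(sources):
    official = [s for s in sources if s["official"]]
    if not official:
        return None
    # Older claims are treated as superseded by the newest official document.
    return max(official, key=lambda s: s["date"])["status"]

print(latest_official_status(sources))  # -> "not complete"
```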
The surprise winner is Perplexity Sonar Reasoning Pro. Perplexity found 16 sources, including the official architect’s design, and used this as its first stage of evidence. Perplexity did not use the same sources as Gemini (i.e., official documents) but found other relevant sources. Furthermore, Perplexity used additional sources to establish the building’s proposed operation and the site’s current state of operations through the following multi-step searches:
1. Perplexity found the current reality at that location as evidence.
2. It then performed an inverse check, using unrelated sources to combine the evidence.
3. Next, it used these sources to conclude that the building is probably unbuilt.
4. Finally, it searched for a contradiction as definitive proof.

This type of reasoning is superior because it is the only approach that logically demonstrates the building has not been completed or occupied, based on current proof that an alternative building occupies the site.
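As a rough sketch of that multi-step strategy (not Perplexity’s actual pipeline), the snippet below mocks the search step with canned results; the queries and returned snippets are invented for illustration:

```python
# A minimal sketch of a multi-step check: gather design-stage evidence, run an
# inverse check on the location, then look for a contradiction before concluding.
# The search function is a placeholder with canned, illustrative results.

def search(query):
    canned = {
        "original design":     ["architect's plan for the project"],
        "current use of site": ["a different building now operates at the address"],
        "project opening":     [],  # no record of completion or opening found
    }
    return canned.get(query, [])

def multi_step_verdict():
    design_evidence = search("original design")      # stage 1: the proposal existed
    site_evidence   = search("current use of site")  # stage 2: inverse check on the location
    contradiction   = search("project opening")      # stage 3: look for disproof

    if design_evidence and site_evidence and not contradiction:
        # Positive evidence that something else occupies the site, plus no record
        # of the project opening: conclude it is probably unbuilt.
        return "probably unbuilt"
    return "unknown"

print(multi_step_verdict())  # -> "probably unbuilt"
```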
AI Data Specialist & Trainer
Nadine van der Haar is an AI Data Specialist with three years of experience in AI data operations, two of which were spent in leadership roles, as a supervisor and performance manager. She has served as a reviewer, supervisor, and performance manager on large-scale computer vision initiatives, where maintaining data integrity was critical to model success. Nadine has also contributed to training large language models through diverse multimodal datasets. Her work focuses on human-in-the-loop strategies and quality control processes that ensure AI systems perform reliably.
With academic backgrounds in both Science and Philosophy, she brings a unique analytical lens to AI development. This interdisciplinary foundation enables her to translate complex technical and philosophical concepts into accessible content for both technical and non-technical audiences.
Topics Covered in Writing:
Nadine writes about AI data quality governance, LLM reasoning patterns (inductive, abductive, and deductive logic), computer vision applications, and ethical AI development. Her articles help readers understand how rigorous quality control builds trustworthy AI systems. She specializes in making complex AI concepts accessible through real-world examples and philosophical frameworks.
Author Motto:
"Nature does not hurry, but every iteration counts."
Personal Touch:
Outside of AI, Nadine is an avid road runner with a passion for half marathons. Her dedication to steady, consistent performance in running mirrors her approach to rigorous quality control in AI data work.