Hollywood storytelling offers useful clues about reasoning. We can use analogies from film scenes to show how Large Language Models (LLMs) arrive at conclusions, which are more accurately described as hypotheses.
When evaluating reasoning strategies, it helps to distinguish between what each one aims to achieve.
The final scene of the movie The Usual Suspects (1995) is a masterclass in deductive reasoning. Agent Kujan realizes the truth about the suspect, Verbal Kint, from a set of observable, fixed premises. Initially, Kujan believed Verbal’s testimony was a factual account of a criminal conspiracy.
The story presented by Verbal Kint contained specific entities, including names, places, and events, but these were all fabrications he constructed by referencing objects from his immediate surroundings. At the end, the agent realizes that the objects in his office correspond to the details and names in Verbal Kint’s testimony, i.e., Kobayashi, Keaton and Soze, taken from the coffee brand and the poster in his office.
These names and details, being identical to objects in the office, show that the story was fabricated on the spot, with those objects used as cues. The agent’s ultimate, certain deduction is that Verbal Kint himself is the mythical crime boss, Keyser Soze, and his entire testimony is a lie. This is an example of classical deductive reasoning: when the premises are true, the conclusion follows with certainty.
The reasoning process of an LLM is more accurately called abductive reasoning. Abductive reasoning is a process used to arrive at the most probable hypothesis. The result is highly probable but not guaranteed to be true. An LLM arrives at the most likely version of the truth using pattern matching.
If A and B are true, then C is the best explanation. The “best explanation” does not require conclusive evidence; it requires identifying a set of facts and forming the hypothesis that accounts for all of them. An abduction is the scenario that offers the most complete and logical explanation of the circumstances.
The LLM reasoning process mirrors human abductive reasoning. An LLM will answer a question even when there is no guaranteed universal truth. Its conclusions are hypotheses based on its training datasets: it returns the most likely, or best, explanation for a given set of facts. The inner world of an LLM is made up of trillions of tokens, from which it forms probabilistic patterns and generalizations. Thus, the LLM’s reasoning is, by its nature, uncertain.
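To make this concrete, here is a minimal Python sketch of abductive reasoning as “best explanation” selection. The observations, hypotheses and prior weights are invented for illustration; they are not drawn from any model’s internals, but the shape of the computation mirrors the idea above: score candidate explanations against the facts and return the most probable one.

```python
# A minimal sketch of abductive reasoning: given observed facts, score each
# candidate hypothesis by how many facts it explains and by a rough prior,
# then return the "best explanation". All values here are illustrative.

observations = {"wet_street", "people_with_umbrellas"}

# Each hypothesis lists the observations it would explain, plus a prior weight.
hypotheses = {
    "it_rained":          {"explains": {"wet_street", "people_with_umbrellas"}, "prior": 0.6},
    "street_was_cleaned": {"explains": {"wet_street"},                          "prior": 0.3},
    "umbrella_promotion": {"explains": {"people_with_umbrellas"},               "prior": 0.1},
}

def best_explanation(obs, hyps):
    # Score = fraction of observations explained, weighted by the prior.
    def score(name):
        covered = len(hyps[name]["explains"] & obs) / len(obs)
        return covered * hyps[name]["prior"]
    return max(hyps, key=score)

# The result is the most probable hypothesis, not a guaranteed truth.
print(best_explanation(observations, hypotheses))  # -> "it_rained"
```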
Because an LLM cannot make entirely new deductions, I expected that if I asked a question about a new or under-reported scenario, it would fall back on retrieving information from its memory.
Previously, I tested models on abstract information, but they could not perform the research because they lacked specific details about the domain in question and were unable to make certain connections.
To conduct a more rigorous test, I created a task for five LLMs connected to the internet. The task supplied the entities needed to perform a search using the details of a unique architectural project. The test was designed to see whether the models could establish connections or relationships (i.e., a superior reasoning step that includes a deductive step).
Entity A – [Relationship] – Entity B
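As a minimal sketch of this format in Python (the entity names and relationship below are placeholders, not the actual project used in the test), each test item is simply a claimed relationship between two named entities that the model must confirm or reject from retrieved evidence:

```python
# A minimal sketch of the test format: a claimed relationship between two
# entities. The names here are placeholders, not the real project details.

from dataclasses import dataclass

@dataclass
class Claim:
    entity_a: str       # e.g., the architectural project
    relationship: str   # e.g., "was completed at"
    entity_b: str       # e.g., the named location

claim = Claim("Project X", "was completed at", "Location Y")
print(f"{claim.entity_a} – [{claim.relationship}] – {claim.entity_b}")
```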
The project details included:
As a result of my error, the models were presented with three out of four factual named entities. I then provided the steps for the models to follow:
All five models arrived at the same conclusion: ‘No’ (Unbuilt/Cancelled). Nonetheless, the results were surprising! I was able to measure the models’ planning and reasoning. Even though all models arrived at the correct conclusion, not all of their reasoning was sound. In my assessment, only three models passed this test, with one clear winner.
Here is a table of the results:
| Model | Conclusion | Reasoning | Strategy |
|---|---|---|---|
| Perplexity Sonar Reasoning Pro | Correct | Superior | Deductive |
| Gemini 2.5 Pro | Correct | Sound | Deductive |
| Claude Opus 4 | Correct | Sound | Deductive |
| GPT-5 | Correct | Fallacious | Inductive |
| Grok-4 | Correct | Fallacious | Inductive |
Although GPT-5 and Grok-4 were able to arrive at the correct conclusion, their reasoning was not sound and, in a different scenario, would not lead to a correct result.
Grok-4 and GPT-5 concluded that the project remains unbuilt because they both found no evidence of its completion.
Grok-4 directly refers to the “absence of evidence, combined with the erroneous location detail,” indicating the project remains unbuilt. It found 16 sources and searched for direct mentions in these sources, basing its conclusion on what was missing.
GPT-5 found two sources and similarly stated that there is “no mention of completion or opening” to arrive at its conclusion.
The premise of both these models is that there is a limited set of documents or data. They effectively conclude, “I have examined my knowledge base, and it contains no evidence of the building’s status.” Their conclusion about the building’s existence does not follow, because its non-existence is not an established fact.
This reasoning is flawed. The lack of available data does not necessarily mean that there is no building; the building may exist even though it is undocumented. This type of logical fallacy is called an “appeal to ignorance.”
“The absence of evidence is not the evidence of absence.”
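Here is a minimal Python sketch of the difference; the function names and evidence lists are invented for illustration. The fallacious version treats “no evidence found” as proof of absence, while the sound version only commits when there is positive evidence one way or the other:

```python
# A minimal sketch of the "appeal to ignorance" fallacy versus sounder handling.
# The evidence lists are illustrative only.

def fallacious_verdict(evidence_of_completion):
    # "I found no evidence of completion, therefore it was never built."
    return "unbuilt" if not evidence_of_completion else "built"

def sound_verdict(evidence_of_completion, counter_evidence):
    # Only commit when there is positive evidence either way; otherwise
    # report uncertainty instead of asserting absence.
    if evidence_of_completion:
        return "built"
    if counter_evidence:  # e.g., a different building now occupies the site
        return "probably unbuilt"
    return "unknown"

print(fallacious_verdict([]))                                     # -> "unbuilt" (unsupported leap)
print(sound_verdict([], []))                                      # -> "unknown"
print(sound_verdict([], ["another building occupies the site"]))  # -> "probably unbuilt"
```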
Claude Opus 4 found four sources and referenced all of them. It weighted a statement from the original project source heavily and verified this information against three distinct sources as definitive proof. Claude found the authoritative source (the building plan on the architect’s website) and then actively searched for more recent information, which led to a sound conclusion.
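As a rough illustration of this cross-verification strategy (a sketch of the idea, not Claude’s actual internals), the snippet below anchors on one authoritative source and requires corroboration from the remaining sources; the source names, flags, and threshold are assumptions:

```python
# A minimal sketch of cross-verification: anchor on the authoritative source,
# then require agreement from several independent sources. All data is illustrative.

sources = [
    {"name": "architect_site", "authoritative": True,  "says_unbuilt": True},
    {"name": "news_article_1", "authoritative": False, "says_unbuilt": True},
    {"name": "news_article_2", "authoritative": False, "says_unbuilt": True},
    {"name": "listing_page",   "authoritative": False, "says_unbuilt": True},
]

def cross_verified(sources, min_corroborations=3):
    anchor = next((s for s in sources if s["authoritative"]), None)
    if anchor is None:
        return None  # no authoritative anchor, so withhold judgment
    corroborating = [
        s for s in sources
        if not s["authoritative"] and s["says_unbuilt"] == anchor["says_unbuilt"]
    ]
    # Accept the anchor's claim only if enough independent sources agree.
    return anchor["says_unbuilt"] if len(corroborating) >= min_corroborations else None

print(cross_verified(sources))  # -> True: the "unbuilt" claim is corroborated
```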
Gemini 2.5 Pro’s strategy is time-based grounding. This is a sound approach because earlier articles seemed to contradict one another. Gemini used the most recent official documents to verify that the project was not complete by that date, while confirming this with counter-evidence. In this scenario, Gemini’s high-authority filtering worked in its favor; however, Gemini treats the most recent official document as the final word.
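Here is a minimal sketch of time-based grounding under the same caveat: the dates, statuses, and “official” flags below are invented for illustration, not the documents Gemini actually retrieved:

```python
# A minimal sketch of time-based grounding: when sources conflict, anchor on the
# most recent official document. All dates and statuses are illustrative.

from datetime import date

sources = [
    {"date": date(2019, 5, 1),  "official": True,  "status": "planned"},
    {"date": date(2021, 8, 12), "official": False, "status": "under construction"},
    {"date": date(2024, 3, 3),  "official": True,  "status": "not complete"},
]

def latest_official_status(sources):
    official = [s for s in sources if s["official"]]
    if not official:
        return None
    # Older claims are treated as superseded by the newest official document.
    return max(official, key=lambda s: s["date"])["status"]

print(latest_official_status(sources))  # -> "not complete"
```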
The surprise winner is Perplexity Sonar Reasoning Pro. Perplexity found 16 sources, including the official architect’s design, and used this as its first stage of evidence. Perplexity did not use the same sources as Gemini (i.e., official documents) but found other relevant sources. Furthermore, Perplexity used additional sources to establish the building’s proposed operation and the site’s current state of operations through the following multi-step searches:
1. Perplexity found the current reality at that location as evidence.
2. It then performed an inverse check, using unrelated sources to combine the evidence.
3. Next, it used these sources to conclude that the building is probably unbuilt.
4. Finally, it searched for a contradiction as definitive proof.

This type of reasoning is superior because it is the only approach that logically demonstrates the building has not been completed or occupied, based on current proof that an alternative building occupies the site.
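As a rough sketch of that multi-step strategy (not Perplexity’s actual pipeline), the snippet below mocks the search step with canned results; the queries and returned snippets are invented for illustration:

```python
# A minimal sketch of a multi-step check: gather design-stage evidence, run an
# inverse check on the location, then look for a contradiction before concluding.
# The search function is a placeholder with canned, illustrative results.

def search(query):
    canned = {
        "original design":     ["architect's plan for the project"],
        "current use of site": ["a different building now operates at the address"],
        "project opening":     [],  # no record of completion or opening found
    }
    return canned.get(query, [])

def multi_step_verdict():
    design_evidence = search("original design")      # stage 1: the proposal existed
    site_evidence   = search("current use of site")  # stage 2: inverse check on the location
    contradiction   = search("project opening")      # stage 3: look for disproof

    if design_evidence and site_evidence and not contradiction:
        # Positive evidence that something else occupies the site, plus no record
        # of the project opening: conclude it is probably unbuilt.
        return "probably unbuilt"
    return "unknown"

print(multi_step_verdict())  # -> "probably unbuilt"
```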
AI Data Specialist & Trainer
Nadine van der Haar is an AI Data Specialist with three years of experience in AI data operations, two of which were spent in leadership roles, as a supervisor and performance manager. She has served as a reviewer, supervisor, and performance manager on large-scale computer vision initiatives, where maintaining data integrity was critical to model success. Nadine has also contributed to training large language models through diverse multimodal datasets. Her work focuses on human-in-the-loop strategies and quality control processes that ensure AI systems perform reliably.
With academic backgrounds in both Science and Philosophy, she brings a unique analytical lens to AI development. This interdisciplinary foundation enables her to translate complex technical and philosophical concepts into accessible content for both technical and non-technical audiences.
Topics Covered in Writing:
Nadine writes about AI data quality governance, LLM reasoning patterns (inductive, abductive, and deductive logic), computer vision applications, and ethical AI development. Her articles help readers understand how rigorous quality control builds trustworthy AI systems. She specializes in making complex AI concepts accessible through real-world examples and philosophical frameworks.
Author Motto:
"Nature does not hurry, but every iteration counts."
Personal Touch:
Outside of AI, Nadine is an avid road runner with a passion for half marathons. Her dedication to steady, consistent performance in running mirrors her approach to rigorous quality control in AI data work.