Ever Wondered How AI Agents Think? Here’s the Truth—Most Don’t.
In Part 1, we talked about the basics of this emerging technology and how AI agents are everywhere—at least, that’s what the marketing says.
But when you look under the hood, most of these ‘agents’ aren’t much more than glorified chatbots with function-calling capabilities.
They execute tasks, but do they really ‘think’? Do they adapt, learn, or remember?
That’s where things get interesting.
So, what does it actually mean for an AI agent to think?
Thinking isn’t just spitting out a response from a language model. It’s about making decisions, applying quality metrics and operating with controlled autonomy.
Today, we’re going to break down the three main pillars of AI agent cognition:
Decision-Making – How agents decide what to do next.
Quality – How agents decide what’s good and what’s rubbish.
Autonomy – What agents do with new information to improve their actions.
By the end of this, you’ll understand how I’ve gone about building cognition into my own software to create AI agents that can actually do some thinking.
And... bonus this week: there’s a video demo 👇
Let’s get into it.
How AI Agents Make Decisions
When we talk about AI agents “thinking,” it’s important to recognise that not all decision-making approaches are the same. While many AI agents today rely on simple function-calling, rule-based heuristics, or API-driven workflows, more advanced cognition patterns exist, each with trade-offs.
Before I show you how my framework uses ReAct (Reasoning + Acting) to actually make decisions in real time, let’s take a quick look at some of the other ways AI tries to “think.”
Chain of Thought (CoT)
Imagine trying to solve a math problem in your head.
You don’t just blurt out the answer—you break it down into steps.
That’s what CoT does. It lets LLMs reason through problems step by step, improving logical accuracy.
But here’s the catch: it follows predefined reasoning steps and doesn’t change course dynamically.
Great for structured problem-solving, not so great when things go off-script.
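If you want to see the difference in code, here’s a minimal sketch of CoT prompting. The call_llm() helper is a hypothetical placeholder for whichever model client you actually use.

```python
# A minimal sketch of Chain-of-Thought prompting. call_llm() is a hypothetical
# placeholder for whichever model client you actually use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Wire this up to your model provider of choice")

def answer_directly(question: str) -> str:
    # Direct prompting: ask for the answer in one shot.
    return call_llm(f"Answer the following question:\n{question}")

def answer_with_cot(question: str) -> str:
    # CoT prompting: ask for intermediate steps first, which tends to improve
    # logical accuracy on multi-step problems.
    prompt = (
        "Answer the following question. Work through it step by step, "
        "then give the final answer on a new line prefixed with 'Answer:'.\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```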
Tree of Thought (ToT)
This is CoT’s more strategic sibling. Instead of following one straight line of thought, ToT explores multiple pathways and picks the best one. Think of it like playing chess—evaluating different moves before deciding which one to make.
Some prompt-chaining techniques use this method, especially when combining multiple LLMs to cast a wider net of responses. Strong for strategy, but again—not great at reacting to unexpected situations.
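Here’s a rough sketch of that branch-and-score idea. The propose and score callables stand in for LLM-backed helpers and are purely illustrative.

```python
# A rough sketch of Tree-of-Thought style search. The propose() and score()
# callables are illustrative stand-ins for LLM-backed helpers.

from typing import Callable

def tree_of_thought(
    problem: str,
    propose: Callable[[str, str, int], list[str]],  # (problem, path, n) -> thoughts
    score: Callable[[str, str], float],             # (problem, path) -> quality
    breadth: int = 3,
    depth: int = 2,
) -> str:
    frontier = [""]  # partial reasoning paths
    for _ in range(depth):
        candidates = [
            f"{path}\n{thought}"
            for path in frontier
            for thought in propose(problem, path, breadth)
        ]
        # Evaluate each branch and keep the strongest few, like weighing
        # chess moves before committing to one.
        candidates.sort(key=lambda p: score(problem, p), reverse=True)
        frontier = candidates[:breadth] or frontier
    return frontier[0]
```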
Graph of Thought (GoT)
Now we’re getting somewhere. This one’s fancy.
Instead of just moving in a straight line (CoT) or branching in a tree (ToT), GoT creates a web of interconnected thoughts.
Picture a detective’s corkboard—strings connecting different pieces of evidence, shifting as new clues emerge.
That’s GoT. Unlike ToT, which follows strict layers, GoT lets AI agents dynamically adjust execution paths, rerouting decisions based on new inputs.
The ESCARGOT framework showed how this approach significantly improves reliability by organising execution steps on the fly and using knowledge graphs for context-aware reasoning.
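To give a feel for the structure, here’s a toy graph of thoughts where edges can be rerouted mid-execution. The class and method names are illustrative, not the ESCARGOT API.

```python
# A toy Graph-of-Thought structure: thoughts are nodes, and edges can be added
# or rerouted as new information arrives. Purely illustrative, not ESCARGOT's API.

from dataclasses import dataclass, field

@dataclass
class ThoughtGraph:
    nodes: dict[str, str] = field(default_factory=dict)        # id -> thought text
    edges: dict[str, list[str]] = field(default_factory=dict)  # id -> downstream ids

    def add_thought(self, node_id: str, text: str) -> None:
        self.nodes[node_id] = text
        self.edges.setdefault(node_id, [])

    def connect(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def reroute(self, src: str, old_dst: str, new_dst: str) -> None:
        # Unlike a tree, a node's downstream path can change mid-execution,
        # which is what lets the agent adjust as new clues emerge.
        self.edges[src] = [new_dst if d == old_dst else d for d in self.edges[src]]
```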
CoT, ToT, and GoT all have their strengths, but they’re better for structured problem-solving rather than dynamic, real-world adaptability. They work well when you already know what kind of problem you’re solving. But when things change on the fly? They struggle.
That’s why ReAct matters. It merges reasoning and real-time action, so instead of just “thinking” about a problem, the AI actually does something, learns from the outcome, and adjusts in real time.
That’s the difference between an AI that just “plans” and an AI that reacts and adapts.
ReAct in Action: How My Agents Actually Make Decisions
Unlike basic API-driven automation tools, Templonix follows the ReAct model. Why? Because it needs to! The system is designed for flexibility across multiple workflows rather than being locked into rigid, single-purpose automation. This horizontal integration means it can be plugged into different domains and adapt dynamically, instead of being confined to a single, predefined process.
This architectural decision means that a Templonix agent doesn’t just retrieve information or execute predefined steps—it’s got little choice but to iterate, reflect, and refine its approach dynamically. Which is a good thing, because it reinforces agentic behaviours all the time.
As you can see above (in yellow), if I need to, I can also introduce domain-specific capabilities by building plugins. These vertically align to a process and also give me the chance to introduce the other reasoning patterns, if needed.
ReAct with Quality Assessment
Traditional ReAct follows Observe-Think-Act, but in practice, we need more nuance.
I’ve extended ReAct a bit with quality-aware decision-making (there’s a condensed code sketch after the breakdown below):
Observe
Instead of just gathering data, I evaluate quality through multiple lenses:
Relevance scoring against goals and tasks
Content quality assessment (quantitative/qualitative analysis)
Source credibility checks
Recency validation
Think
The reflection phase includes:
Content quality scoring (0-100 scale)
Coverage analysis across multiple relevance factors
Aggregate quality metrics for multiple sources
Pattern recognition in content quality
Act
Actions are quality-guided:
Retry strategies for low-quality results
Dynamic query regeneration based on quality feedback
Adaptive search patterns based on content scores
Quality-threshold gating for content acceptance
This is the pattern in diagram form and how you’ll see it working with the Web Search tool in the demo.
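If you prefer code to diagrams, here’s a condensed sketch of that quality-aware loop in Python. The three helper functions, the threshold, and the retry count are illustrative stand-ins for Templonix internals rather than the actual implementation.

```python
# A condensed sketch of the quality-aware ReAct loop described above. The three
# helpers are stand-ins for Templonix internals that aren't shown here; the
# threshold and retry count are illustrative, not the production values.

QUALITY_THRESHOLD = 30   # 0-100 scale, as in the scoring described above
MAX_ATTEMPTS = 3

def run_search(query: str) -> list[str]:              # placeholder web search
    raise NotImplementedError

def score_content(result: str, goal: str) -> float:   # placeholder 0-100 scorer
    raise NotImplementedError

def regenerate_query(goal: str, query: str, scored: list[dict]) -> str:
    raise NotImplementedError                          # placeholder query rewriter

def quality_aware_react(goal: str) -> list[dict]:
    query = goal
    accepted: list[dict] = []

    for _ in range(MAX_ATTEMPTS):
        # Observe: gather results and score each against the goal.
        scored = [{"result": r, "score": score_content(r, goal)} for r in run_search(query)]

        # Think: reflect on aggregate quality before deciding the next action.
        accepted = [s for s in scored if s["score"] >= QUALITY_THRESHOLD]
        avg = sum(s["score"] for s in scored) / max(len(scored), 1)

        # Act: accept good content, or regenerate the query using the feedback.
        if accepted and avg >= QUALITY_THRESHOLD:
            return accepted
        query = regenerate_query(goal, query, scored)

    return accepted  # best effort after exhausting retries
```

The key point is that quality scores drive the control flow: a poor Observe phase changes what the next Act does.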
For instance, when searching for market analysis, the agent doesn’t just collect URLs. It will:
Score content quality (based on comprehensiveness, structure, etc.)
Analyse relevance factors (quantitative data, expert citations, etc.)
Calculate coverage statistics across sources
Dynamically adjust its strategy based on quality metrics
This means if initial searches return low-quality content (scores < 30), the agent automatically analyses why the content failed quality checks, regenerates queries to target higher-quality sources and adjusts its acceptance criteria based on observed patterns.
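To make that concrete, here’s a toy version of a per-result breakdown for market content. The heuristics are deliberately crude, illustrative assumptions rather than the real Templonix scorers.

```python
# A toy per-result quality breakdown for market content. The heuristics below
# are illustrative assumptions, not the real Templonix scoring logic.

def score_market_content(text: str) -> dict:
    # Each factor lands in [0, 1] and contributes equally to a 0-100 score.
    factors = {
        "comprehensiveness": min(len(text.split()) / 1500, 1.0),        # enough depth?
        "quantitative_data": 1.0 if any(ch.isdigit() for ch in text) else 0.0,
        "structure": 1.0 if "\n\n" in text else 0.5,                     # sectioned content?
        "expert_citations": 1.0 if "according to" in text.lower() else 0.0,
    }
    overall = 100 * sum(factors.values()) / len(factors)
    return {"overall": round(overall, 1), "factors": factors}

# A result scoring under 30 overall would trigger the retry behaviour above:
# the agent inspects which factors failed and regenerates the query accordingly.
```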
Let’s jump in and see how this works in practice 👇
How the Big Players Approach the Topic
Now that I’ve shown you how my agents “think”, how does this compare to what the big boys are doing?
Recent publications from Google, Anthropic, and Microsoft reveal fascinating convergences in agent architecture, particularly around cognitive patterns and quality control.
In their recent papers, Google and Anthropic talk a lot about "simple, composable patterns" - which is just a fancy way of saying "don't overengineer it." Fair play to them, they've got three solid points:
Start Small - Begin with basic LLM APIs before adding complexity
Focus on Composability - Build modular workflows that can dynamically adjust
Right Tool, Right Task - Use smaller models for quick decisions, larger ones for deep reasoning
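As a rough illustration of the third point, here’s what a simple routing rule can look like. The model names and the complexity heuristic are placeholder assumptions, not recommendations from either paper.

```python
# A minimal sketch of "right tool, right task" routing. The model names and the
# complexity heuristic are placeholder assumptions.

def classify_complexity(task: str) -> str:
    # Crude heuristic: long or analysis-heavy prompts count as "deep" work.
    return "deep" if len(task.split()) > 50 or "analyse" in task.lower() else "quick"

def pick_model(task: str) -> str:
    return {
        "quick": "small-fast-model",      # cheap, low-latency decisions
        "deep": "large-reasoning-model",  # heavier reasoning, higher cost
    }[classify_complexity(task)]
```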
What's really interesting though is how they both straight-up discuss quality control. Google's whitepaper goes on about "iterative refinement" (otherwise known as testing it till it works properly), while Anthropic are very keen on "grounding" their outputs (making sure the thing's not talking rubbish).
But here's the rub - neither of them actually tells you HOW to do it. That's where I had to roll up my sleeves and sort out my own quality-aware ReAct pattern. For all the talk around Model Context Protocol (MCP) and "orchestration layers", the industry leaders leave a lot open to interpretation when ideally they could pick a side.
Beyond the Basics: Quality-Aware Agency
What I think is becoming clear is that, whilst the industry recognises the importance of quality control in what AI agents are doing, there's still a significant gap between theory and implementation.
✅ Quality-Aware ReAct
While both Google and Anthropic advocate for ReAct patterns, I've extended this with quantifiable quality assessment.
Rather than simple accept/reject decisions, my agents use a scoring system (0-100) that evaluates multiple quality dimensions: content relevance, information recency, source credibility, and analytical depth.
This gives my framework more nuanced decision-making that delivers consistently better results. But I’ve had to build this all myself.
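To show what "multiple quality dimensions" can mean in practice, here’s a small sketch of how they might roll up into a single 0-100 score. The weights and dimension names are illustrative, not my exact configuration.

```python
# A sketch of rolling several quality dimensions up into one 0-100 score.
# Weights and dimension names are illustrative assumptions.

DIMENSION_WEIGHTS = {
    "relevance": 0.35,
    "recency": 0.15,
    "credibility": 0.25,
    "analytical_depth": 0.25,
}

def aggregate_quality(dimension_scores: dict[str, float]) -> float:
    # Each dimension score is expected on a 0-100 scale; missing ones count as 0.
    return sum(
        weight * dimension_scores.get(name, 0.0)
        for name, weight in DIMENSION_WEIGHTS.items()
    )

# Example: highly relevant but slightly stale content still scores around 71,
# so it would clear a threshold of 60 but not one of 80.
score = aggregate_quality({"relevance": 90, "recency": 40,
                           "credibility": 70, "analytical_depth": 65})
```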
✅ Dynamic Quality Thresholds
Instead of fixed "good enough" criteria, I've built the system to adjust quality thresholds dynamically. It considers task complexity, historical performance, confidence scores, and required accuracy levels in real-time.
This adaptive approach means my agents maintain high standards whilst avoiding unnecessary iterations - they know when to press on and when to dig deeper.
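Here’s a rough sketch of how a threshold might flex with those factors; the formula itself is an illustrative assumption, not the production logic.

```python
# A sketch of a dynamic quality threshold. Inputs mirror the factors mentioned
# above; the weightings are illustrative assumptions.

def dynamic_threshold(base: float, task_complexity: float,
                      recent_pass_rate: float, required_accuracy: float) -> float:
    """base is a 0-100 score; the other inputs are normalised to [0, 1]."""
    threshold = base
    threshold += 15 * task_complexity     # harder tasks demand better sources
    threshold -= 10 * recent_pass_rate    # relax if recent results have been strong
    threshold += 10 * required_accuracy   # tighten when accuracy really matters
    return max(0.0, min(100.0, threshold))
```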
✅ Transparent Quality Metrics
Unlike black-box approaches, I've built towards providing clear visibility into the decision-making process. Each action comes with a quality breakdown - the stuff you saw in the terminal in the video. This transparency means you can see not just what the agent did, but why it made the choice.
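As a toy example of that kind of breakdown, the snippet below emits a per-action quality report. The field names are illustrative rather than the exact output you saw in the terminal.

```python
import json

# A toy per-action quality report that makes the agent's choice auditable.
# Field names are illustrative, not the exact Templonix terminal output.

def explain_action(action: str, scores: dict[str, float], threshold: float) -> str:
    decision = "accepted" if scores["overall"] >= threshold else "retry"
    return json.dumps({
        "action": action,
        "scores": scores,
        "threshold": threshold,
        "decision": decision,
        "reason": f"overall score {scores['overall']} vs threshold {threshold}",
    }, indent=2)
```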
While all this talk of quality might be a tad boring, it’s going to become a big talking point as agentic systems become more mainstream and people start building them “for real”.
Why do I say this?
Well, I don’t see a world where the execs that pay for the new shiny agent toys are going to be able to justify their spend or demonstrate ROI without it.
No doubt it won’t be long before someone at one of the hyperscaler shops cottons on to the term Agent Economics. 😉
Curious About What GenAI Could Do for You?
If this article got you thinking about AI agents and their real impact, you’re not alone. Many readers are exploring this new frontier but struggle to separate reality from hype.
That’s exactly why I built ProtoNomics™—a risk-free way to validate GenAI feasibility before you commit resources. No hype. No sales pitch. Just data-driven insights to help you make an informed decision.
If you’re interested, I now run a limited number of GenAI Readiness Assessments each month. If you'd like to see what this technology could do for your business, you can Learn More Here
Or, if you're “just here for the tech” the next article in the series is 👇
Next Time on The Anatomy of an AI Agent
Part 3: Inside the AI Agent’s Brain
Next week we’re diving into how AI agents decompose objectives, manage execution, and coordinate across multiple tools, plugins and components.
🚀 How does an AI agent think at both a micro and macro level?
🤖 Why is task decomposition the key to intelligent automation?
🧠 How does the Memory Manager enable real-time awareness across tools—like the Borg Collective processing shared knowledge?
I’ll be showing you the Goal Manager, Task Executor, and some Memory-Orchestrated Coordination that allow Templonix to function not as a simple workflow automation tool, but as an agentic intelligence with shared awareness across all its components.
Until next time, Chris.
Enjoyed this post? Please share your thoughts in the comments or spread the word by hitting that Restack button.