ZyVOP: Content That Connects
I Built a Tiny AI Agent From Scratch — Every Line Tested Before It Touched a Real API

No frameworks, no magic. Just two Python functions, a loop, and Claude's tool-use API — plus the offline test suite that proves it actually works.

Tags: AI, AI agents, Claude API, Python, Tool Use, agentic AI, Programming, tutorial, software development, LLM
Pushpum Vats

Senior Developer

June 18, 2026

What an "agent" actually is, stripped of the hype

Every few weeks there's a new framework promising to make "agentic AI" easy. Most of them are wrappers around one core idea: the model doesn't just generate text — it can pause, say "I need to call this function with these arguments," wait for the result, and then keep going with that information in hand.

That's it. That's the whole trick. Anthropic calls this tool use, and it's the same mechanism powering everything from "let Claude check the weather" to multi-step coding agents.

This tutorial builds a working version of that loop from the ground up — no LangChain, no agent framework, just the Claude API and plain Python. By the end you'll have a small agent that can do arithmetic and count words by actually calling real Python functions, decide on its own when to use them, and chain them together when a question needs both.


What you'll need

  • Python 3.9 or newer

  • An Anthropic API key (from the Claude Console)

  • The official SDK: pip install anthropic

That's the whole list. No vector databases, no Docker, nothing else.


Step 1: Write the actual tools (just functions)

This is the part people often overcomplicate. A "tool" is just a regular Python function, plus a small JSON description telling Claude what it does and what arguments it takes.

We'll build two: a calculator and a word counter. Save this as tools.py:

"""
The actual Python functions our agent can call, plus the JSON-schema
descriptions of those tools that we hand to the Claude API.
"""

import ast
import operator


_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculate(expression: str) -> str:
    """Safely evaluate a basic arithmetic expression like '12 * (3 + 4)'."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    tree = ast.parse(expression, mode="eval")
    result = _eval(tree.body)
    return str(result)


def count_words(text: str) -> str:
    """Count the words in a piece of text."""
    return str(len(text.split()))


TOOLS = [
    {
        "name": "calculate",
        "description": (
            "Evaluate a basic arithmetic expression and return the numeric "
            "result as a string. Supports +, -, *, /, **, parentheses, and "
            "negative numbers. Use this any time the user asks for a "
            "calculation, even a simple one -- do not do math in your head."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A valid arithmetic expression, e.g. '127 * 38' or '(12 + 4) / 2'.",
                }
            },
            "required": ["expression"],
        },
    },
    {
        "name": "count_words",
        "description": (
            "Count how many words are in a given piece of text and return "
            "the count as a string. Use this when the user asks for a word "
            "count of something rather than estimating it yourself."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "The text to count words in.",
                }
            },
            "required": ["text"],
        },
    },
]

TOOL_FUNCTIONS = {
    "calculate": calculate,
    "count_words": count_words,
}

A couple of deliberate choices are worth calling out. The calculate function uses Python's ast module to parse the expression into a syntax tree and walk it manually, rather than calling eval() directly. eval() will happily run any valid Python expression, including something like eval("__import__('os').system(...)"), and that is exactly the kind of thing you don't want an AI-controlled function anywhere near. Walking the tree means only numbers and the operators listed in _OPS ever get evaluated; anything else is rejected with an error. The description fields are also longer than feels natural at first. That's intentional: Claude's tool selection quality depends heavily on how clearly each tool explains what it does and when to use it.
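To see why the whitelist walk is safer than eval(), it helps to look at what a dangerous but perfectly valid expression parses into. The sketch below uses a hypothetical helper, node_types (not part of tools.py), that only parses and never executes anything. The arithmetic expression contains nothing but constants and operators, while the __import__ one needs Call, Attribute and Name nodes, none of which _eval in calculate accepts.

```python
import ast


def node_types(expression: str) -> set:
    """Hypothetical helper: collect the AST node class names in an expression."""
    # mode="eval" parses a single expression; nothing is executed here.
    tree = ast.parse(expression, mode="eval")
    return {type(node).__name__ for node in ast.walk(tree)}


benign = "12 * (3 + 4)"
# A valid *expression* that eval() would execute: import os, then run a command.
malicious = "__import__('os').system('echo pwned')"

print(sorted(node_types(benign)))
print(sorted(node_types(malicious)))
```

Because _eval raises ValueError as soon as it meets a node type it doesn't recognize, the Call at the root of the malicious tree stops evaluation before anything runs.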


Step 2: The agent loop

This is the part that actually makes it "agentic." Save this as agent.py:

"""
The agent loop: send a message, check whether Claude wants to use a tool,
run that tool locally, send the result back, and repeat until Claude
gives a final text answer.
"""

from tools import TOOLS, TOOL_FUNCTIONS


def run_agent(client, user_message, model="claude-sonnet-4-6", max_iterations=5, verbose=True):
    messages = [{"role": "user", "content": user_message}]

    for step in range(max_iterations):
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )

        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return "".join(
                block.text for block in response.content if block.type == "text"
            )

        tool_results = []
        for block in response.content:
            if block.type == "text" and verbose and block.text.strip():
                print(f"  [Claude says]: {block.text.strip()}")

            if block.type == "tool_use":
                func = TOOL_FUNCTIONS.get(block.name)
                if verbose:
                    print(f"  [tool call]: {block.name}({block.input})")

                if func is None:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"Unknown tool: {block.name}",
                        "is_error": True,
                    })
                    continue

                try:
                    result = func(**block.input)
                    if verbose:
                        print(f"  [tool result]: {result}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
                except Exception as exc:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(exc),
                        "is_error": True,
                    })

        messages.append({"role": "user", "content": tool_results})

    return "Reached max_iterations without a final answer -- something is looping."

Three things in here are easy to get wrong, and the API will reject your request with a 400 error if you do:

  • The tool_result blocks have to go in a new user message, not appended to the assistant's message.

  • The tool_result blocks must come first in that message's content array. Any text from your side has to come after them.

  • Every tool_use block in the assistant's response needs a matching tool_result with the same tool_use_id, including when a tool errors out. That's why even the error cases still append a tool_result, just with is_error: true.
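Those three rules are mechanical enough to check in code before a request ever leaves your machine. Below is a minimal sketch of a hypothetical check_tool_results helper (not part of the Anthropic SDK) that walks a messages list and reports violations. It reads blocks either as dicts or as attribute-style objects, because run_agent stores the assistant's SDK content blocks alongside the plain-dict tool results it builds itself.

```python
def _field(block, key):
    """Read a field from a dict block or an attribute-style block (SDK object, SimpleNamespace)."""
    if isinstance(block, dict):
        return block.get(key)
    return getattr(block, key, None)


def check_tool_results(messages):
    """Hypothetical checker: return a list of tool_use/tool_result problems (empty list = OK)."""
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        use_ids = [_field(b, "id") for b in msg["content"] if _field(b, "type") == "tool_use"]
        if not use_ids:
            continue
        # Rule 1: the results go in the very next message, which must be a user message.
        if i + 1 >= len(messages) or messages[i + 1]["role"] != "user":
            problems.append(f"message {i}: tool_use not followed by a user message")
            continue
        reply = messages[i + 1]["content"]
        if isinstance(reply, str):
            reply = [{"type": "text", "text": reply}]
        types = [_field(b, "type") for b in reply]
        # Rule 2: all tool_result blocks must come before any other block.
        n_results = types.count("tool_result")
        if types[:n_results] != ["tool_result"] * n_results:
            problems.append(f"message {i + 1}: tool_result blocks are not first")
        # Rule 3: every tool_use id needs a tool_result with the same tool_use_id.
        result_ids = {_field(b, "tool_use_id") for b in reply if _field(b, "type") == "tool_result"}
        for missing in sorted(set(use_ids) - result_ids):
            problems.append(f"message {i + 1}: no tool_result for {missing}")
    return problems
```

In the offline test, you could run this over the messages list your fake client receives and assert that it returns an empty list.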


Step 3: Test the loop before it ever calls the real API

Here's the part most tutorials skip, and it's the most useful part for actually trusting your code. The Claude API's tool-use responses have a documented, predictable shape — a stop_reason, and a content list of blocks that are either text or tool_use. So we can fake that shape, feed it to run_agent, and verify the loop, the tool dispatch, and the actual math/word-counting logic all work — without an API key, without spending a token.

Save this as test_agent_offline.py:

from types import SimpleNamespace
from agent import run_agent
from tools import calculate, count_words


def block(**kwargs):
    return SimpleNamespace(**kwargs)


class FakeMessages:
    def __init__(self, script):
        self.script = script
        self.calls = 0

    def create(self, **kwargs):
        response = self.script[self.calls]
        self.calls += 1
        return response


class FakeClient:
    def __init__(self, script):
        self.messages = FakeMessages(script)


def test_parallel_tool_calls():
    turn1 = SimpleNamespace(
        stop_reason="tool_use",
        content=[
            block(type="text", text="I'll do both of those."),
            block(type="tool_use", id="toolu_010", name="calculate",
                  input={"expression": "(12 + 4) / 2"}),
            block(type="tool_use", id="toolu_011", name="count_words",
                  input={"text": "the quick brown fox jumps over the lazy dog"}),
        ],
    )
    turn2 = SimpleNamespace(
        stop_reason="end_turn",
        content=[block(type="text", text="(12 + 4) / 2 is 8.0, and that sentence has 9 words.")],
    )

    client = FakeClient([turn1, turn2])
    answer = run_agent(client, "Two things for you...", verbose=True)

    assert "8.0" in answer
    assert "9 words" in answer
    print("test_parallel_tool_calls passed\n")


def test_underlying_functions_directly():
    assert calculate("127 * 38") == "4826"
    assert calculate("(12 + 4) / 2") == "8.0"
    assert calculate("-3 + 7 ** 2") == "46"
    assert count_words("the quick brown fox jumps over the lazy dog") == "9"

    try:
        calculate("import os")
        raise AssertionError("should have raised")
    except (ValueError, SyntaxError):
        pass

    print("test_underlying_functions_directly passed\n")


if __name__ == "__main__":
    test_underlying_functions_directly()
    test_parallel_tool_calls()
    print("All offline tests passed.")

Running this with python3 test_agent_offline.py produces:

test_underlying_functions_directly passed

  [Claude says]: I'll do both of those.
  [tool call]: calculate({'expression': '(12 + 4) / 2'})
  [tool result]: 8.0
  [tool call]: count_words({'text': 'the quick brown fox jumps over the lazy dog'})
  [tool result]: 9
test_parallel_tool_calls passed

All offline tests passed.

That output is from actually running the code above, not a transcript I wrote by hand. It confirms three things at once: the calculator handles operator precedence and negative numbers correctly, the agent loop dispatches multiple tool calls from a single turn (Claude often requests both tools in parallel rather than one at a time), and run_agent carries a tool-use response through a second request to a final answer without choking on the response shape.
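test_parallel_tool_calls checks the final answer, but it never looks at what run_agent actually sent on its second create() call. One way to cover that, sketched here under the assumption that you keep the FakeClient pattern from test_agent_offline.py, is a fake that snapshots every request. RecordingMessages and RecordingClient are hypothetical names. The snapshot uses copy.deepcopy because run_agent keeps appending to the same messages list after each call returns.

```python
import copy


class RecordingMessages:
    """Fake `messages` endpoint: replays scripted responses and snapshots every request."""

    def __init__(self, script):
        self.script = script
        self.sent = []  # one deep-copied kwargs dict per create() call

    def create(self, **kwargs):
        # Deep copy, because run_agent keeps mutating the same `messages` list
        # after this call returns; a plain reference would only ever show the
        # final state of the conversation.
        self.sent.append(copy.deepcopy(kwargs))
        return self.script[len(self.sent) - 1]


class RecordingClient:
    """Same `client.messages.create` shape as FakeClient, plus the request log."""

    def __init__(self, script):
        self.messages = RecordingMessages(script)
```

With this in place, test_parallel_tool_calls could build its client as RecordingClient([turn1, turn2]) and then assert that client.messages.sent[1]["messages"][-1] is a user message whose content is two tool_result blocks with tool_use_ids toolu_010 and toolu_011, in that order.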

If you change anything (add a tool, change a schema, rewrite the loop), rerun this file first. It catches loop and dispatch bugs in seconds, with zero API cost, before they show up as a confusing "why did my agent just 400" moment against the real API.
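"Add a tool, change a schema" is exactly when TOOLS and TOOL_FUNCTIONS tend to drift apart, so a cheap extra for the offline file is a consistency check. The sketch below uses a hypothetical check_tool_registry built on inspect.signature. It confirms that every schema has a function and every function has a schema. It also checks that every schema property is something the function accepts, since run_agent calls func(**block.input), and that every parameter without a default is listed as required. To keep the sketch runnable on its own, it uses two stand-in tools. In the real file you'd pass TOOLS and TOOL_FUNCTIONS from tools.py.

```python
import inspect

_NAMED_KINDS = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def check_tool_registry(tools, functions):
    """Hypothetical checker: list mismatches between tool schemas and Python functions."""
    problems = []
    schema_names = {tool["name"] for tool in tools}
    for name in sorted(set(functions) - schema_names):
        problems.append(f"{name}: function has no schema")
    for tool in tools:
        name = tool["name"]
        func = functions.get(name)
        if func is None:
            problems.append(f"{name}: schema has no function")
            continue
        params = inspect.signature(func).parameters
        named = {n for n, p in params.items() if p.kind in _NAMED_KINDS}
        takes_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
        schema = tool["input_schema"]
        properties = set(schema.get("properties", {}))
        required = set(schema.get("required", []))
        # Tools are called as func(**input), so every declared property must be accepted.
        if not takes_kwargs:
            for prop in sorted(properties - named):
                problems.append(f"{name}: schema property {prop!r} is not a function parameter")
        # A parameter with no default must be required, or Claude may omit it and the call fails.
        no_default = {n for n in named if params[n].default is inspect.Parameter.empty}
        for param in sorted(no_default - required):
            problems.append(f"{name}: parameter {param!r} should be in 'required'")
    return problems


# Stand-in tools so the sketch runs on its own.
def count_words(text: str) -> str:
    return str(len(text.split()))


def shout(text: str, times: int) -> str:  # hypothetical tool whose schema forgets 'times'
    return text.upper() * times


demo_tools = [
    {"name": "count_words",
     "input_schema": {"type": "object", "properties": {"text": {"type": "string"}},
                      "required": ["text"]}},
    {"name": "shout",
     "input_schema": {"type": "object",
                      "properties": {"text": {"type": "string"}, "times": {"type": "integer"}},
                      "required": ["text"]}},
]

print(check_tool_registry(demo_tools, {"count_words": count_words, "shout": shout}))
# → ["shout: parameter 'times' should be in 'required'"]
```

An empty list means the registry is consistent, so in test_agent_offline.py the natural assertion is check_tool_registry(TOOLS, TOOL_FUNCTIONS) == [].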


Step 4: Run it for real

Once the offline tests pass, swap in the real client. Save this as run.py:

"""
Run with a real API key:

    export ANTHROPIC_API_KEY="sk-ant-..."
    pip install anthropic
    python3 run.py
"""

from anthropic import Anthropic
from agent import run_agent

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

if __name__ == "__main__":
    question = (
        "What's 127 * 38, and how many words are in the sentence "
        "'the quick brown fox jumps over the lazy dog'?"
    )
    answer = run_agent(client, question)
    print("\nFinal answer:", answer)

Set your API key as an environment variable, install the SDK, and run it:

export ANTHROPIC_API_KEY="sk-ant-your-key-here"
pip install anthropic
python3 run.py

Because the offline test already exercised the exact same run_agent function with a response shaped the way the real API responds, what you're really testing here is just "does my API key work and does the real model behave the way the documented shape says it will" — which is a much smaller, cheaper thing to debug.


What's actually happening, step by step

For the question above, here's the real sequence:

Claude receives the question along with the two tool definitions. It decides this needs both tools, and because Claude can request several tools in parallel by default, it can return both tool_use blocks in a single response, often with a short sentence of context first ("I'll calculate that and count the words for you").

Our loop sees stop_reason == "tool_use", runs calculate("127 * 38") and count_words(...) locally, and sends both results back in a single new user message, with the tool_result blocks first.

Claude receives those results and, now having everything it needs, responds with stop_reason == "end_turn" and a plain text answer. Our loop sees that and returns the text. Done: two API calls total when both tools come back in one response (plus one more round trip for each extra turn if Claude calls them one at a time), with real computation happening in real Python in between.
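Written out as data, the conversation at the end of that run looks roughly like the sketch below. The tool_use IDs are hypothetical placeholders (the real API generates its own), and the assistant blocks are shown as plain dicts rather than the SDK objects run_agent actually stores. What matters is that the roles alternate and the IDs pair up. The list has four entries because run_agent appends each assistant response before checking stop_reason.

```python
# Illustrative message history for the parallel case; IDs are hypothetical.
messages = [
    {"role": "user", "content": "What's 127 * 38, and how many words are in the sentence "
                                "'the quick brown fox jumps over the lazy dog'?"},
    {"role": "assistant", "content": [  # API call 1: stop_reason == "tool_use"
        {"type": "text", "text": "I'll calculate that and count the words for you."},
        {"type": "tool_use", "id": "toolu_A", "name": "calculate",
         "input": {"expression": "127 * 38"}},
        {"type": "tool_use", "id": "toolu_B", "name": "count_words",
         "input": {"text": "the quick brown fox jumps over the lazy dog"}},
    ]},
    {"role": "user", "content": [  # built locally by run_agent: results first, one per tool_use
        {"type": "tool_result", "tool_use_id": "toolu_A", "content": "4826"},
        {"type": "tool_result", "tool_use_id": "toolu_B", "content": "9"},
    ]},
    {"role": "assistant", "content": [  # API call 2: stop_reason == "end_turn"
        {"type": "text", "text": "127 * 38 is 4826, and that sentence has 9 words."},
    ]},
]

# Roles strictly alternate, and every tool_use ID has a matching tool_result.
roles = [m["role"] for m in messages]
use_ids = [b["id"] for b in messages[1]["content"] if b["type"] == "tool_use"]
result_ids = [b["tool_use_id"] for b in messages[2]["content"]]
print(roles, use_ids == result_ids)
```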


Things to watch out for as you extend this

Pick the right model for the job. Anthropic's own guidance is to use a larger model like Opus for tools with ambiguous inputs or many options, and a smaller model like Haiku for simple, well-defined tools — smaller models are more likely to guess at missing parameters rather than asking.

Don't skip max_iterations. If a tool's result regularly causes Claude to call the same tool again, you can end up in a loop. The cap in run_agent is a blunt but effective safety net while you're developing.
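If you want an earlier warning than the iteration cap, one option is to notice when Claude requests the same tool with exactly the same input more than once. The sketch below is a hypothetical RepeatGuard helper, not something run_agent or the SDK provides. You'd call is_repeat on each tool_use block before dispatching it. When it returns True, you could stop the loop or send back an is_error tool_result. Be aware that some legitimate workflows do repeat a call, so treat a hit as a signal to investigate rather than proof of a loop.

```python
import json
from collections import Counter


class RepeatGuard:
    """Hypothetical helper: flags a tool call seen more than `limit` times with identical input."""

    def __init__(self, limit=1):
        self.limit = limit
        self.seen = Counter()

    def is_repeat(self, name, tool_input):
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same key.
        key = (name, json.dumps(tool_input, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] > self.limit
```

In run_agent, you'd create one guard per run_agent call and check guard.is_repeat(block.name, block.input) at the top of the tool_use branch.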

Tool descriptions are most of the work. If Claude picks the wrong tool, or the right tool with weird arguments, the fix is almost always a clearer description — what the tool does, when to use it, when not to, and what each parameter means — rather than a change to your loop logic.

For anything beyond a toy, look at the SDK's tool runner. Once you're comfortable with the manual loop above (and understand why it's shaped the way it is), Anthropic's Python, TypeScript, and Ruby SDKs include a beta "tool runner" that handles the request/response cycle and conversation state for you. It's worth learning the manual version first — it's what the tool runner is doing under the hood, and it's much easier to debug when something goes wrong.


Related tutorials on this blog

A couple of places to go from here:

  • Getting Claude to Actually Talk to Your Files: A Real-World MCP Setup Guide — the loop you just built by hand is conceptually what MCP standardizes; this shows the same idea via a config file instead of code.

  • Your Laptop Can Run Its Own AI Now — Here's How to Actually Do It — for experimenting with the agent loop above using a free local model instead of API calls while you're still debugging.

Further reading

  • How tool use works — Claude API docs

  • How to implement tool use — full reference

  • Writing tools for agents — Anthropic engineering blog

  • Anthropic Python SDK on GitHub
