Falsehoods I believed about building agent tool loops
Building an agent loop isn’t hard. The AI SDK gives you streaming, tool calling, stop conditions - every SDK ships these well now.
What’s hard is the rules around the loop: how tools run, how permissions are gated, how artifacts reach the user, how tight your tool schemas are. The LLM picks tools by reading their descriptions every turn - ambiguous descriptions route to the wrong tool, every time.
I just shipped a chat agent platform on QuickFlo. Here are ten things I had to unlearn - most of them about the rules around the loop, not the loop itself. Format borrowed from Patrick McKenzie’s Falsehoods Programmers Believe About Names.
1. The model is the hard part.
The model is 20% of the work. Picking a provider, sending prompts, parsing responses - that part is mature. Every SDK does it well now. The other 80% is the runtime around the model: streaming, state management, persistence, error handling, cost caps, approval flows, observability. If you’re spending most of your time tuning prompts, you’re either really early or you’re not building a product.
2. “Agentic” is a meaningful technical term.
It mostly isn’t. Most “agentic” loops are scripted - the LLM picks a function from a list, the function runs, the output goes back into the next prompt, repeat. There’s exactly one decision point per round. That’s a routing layer, not autonomy - the classifier happens to be a language model. “Agentic” makes it sound emergent. “Function-call loop” is precise, and makes the engineering problem visible.
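For illustration, here is roughly what a function-call loop looks like once the SDK is stripped away. This is a minimal sketch, not QuickFlo's code: the model is stubbed as a plain function, and `Decision`, `PickTool`, `runLoop`, and `stubModel` are hypothetical names.

```typescript
// Hypothetical types for illustration; no real SDK is used here.
type ToolCall = { tool: string; args: Record<string, unknown> };
type Decision = ToolCall | { done: true; answer: string };
type Tool = (args: Record<string, unknown>) => string;

// The "model" is anything that maps the transcript so far to one decision.
type PickTool = (transcript: string[]) => Decision;

function runLoop(
  pickTool: PickTool,
  tools: Map<string, Tool>,
  userMessage: string,
  maxRounds = 5,
): string {
  const transcript = [`user: ${userMessage}`];
  for (let round = 0; round < maxRounds; round++) {
    const decision = pickTool(transcript); // the single decision point per round
    if ("done" in decision) return decision.answer;
    const tool = tools.get(decision.tool);
    const output = tool
      ? tool(decision.args)
      : `error: unknown tool ${decision.tool}`;
    transcript.push(`tool ${decision.tool}: ${output}`); // output feeds the next round
  }
  return "stopped: round limit reached";
}

// Stub model: call lookup_account once, then answer with the last transcript line.
const stubModel: PickTool = (transcript) =>
  transcript.length === 1
    ? { tool: "lookup_account", args: { id: "42" } }
    : { done: true, answer: transcript[transcript.length - 1] };

const loopTools = new Map<string, Tool>([
  ["lookup_account", (args) => `account ${String(args.id)} is active`],
]);

const loopResult = runLoop(stubModel, loopTools, "Is account 42 active?");
```

Everything interesting about the loop is outside `pickTool`: the round limit, the unknown-tool branch, and what gets appended to the transcript.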
3. State correctness is solved once you get the model output right.
It isn’t. Streaming agent runs have multiple state writers - the model, the tool’s execute function, the approval handler, the snapshot at completion - and the bugs live in their interactions. Your assistant bubble shows the wrong tool status because two code paths both wrote to it. Your tool-call card spins forever because a NOTIFY arrived out of order. Your message persists with a stale snapshot. The model never lied. The state syncs lied. Plan to spend more debugging time on state sync than on model output.
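One way to contain this class of bug is to funnel every writer through a single apply function and stamp each update with a sequence number. The sketch below assumes such a number exists; `StatusEvent`, `CardState`, and `applyEvent` are hypothetical names, not part of any SDK.

```typescript
// Hypothetical event shape: every writer stamps its update with a sequence number.
type ToolStatus = "pending" | "running" | "succeeded" | "failed";
type StatusEvent = { toolCallId: string; seq: number; status: ToolStatus };
type CardState = { seq: number; status: ToolStatus };

// Single write path: an event no newer than what is already applied is dropped.
function applyEvent(state: Map<string, CardState>, event: StatusEvent): boolean {
  const current = state.get(event.toolCallId);
  if (current && event.seq <= current.seq) return false; // stale or duplicate
  state.set(event.toolCallId, { seq: event.seq, status: event.status });
  return true;
}

const cards = new Map<string, CardState>();

// "succeeded" (seq 3) arrives before "running" (seq 2).
const applied = [
  applyEvent(cards, { toolCallId: "call_1", seq: 1, status: "pending" }),
  applyEvent(cards, { toolCallId: "call_1", seq: 3, status: "succeeded" }),
  applyEvent(cards, { toolCallId: "call_1", seq: 2, status: "running" }),
];

const finalStatus = cards.get("call_1")?.status;
```

The late "running" event is rejected, so the card stays on "succeeded" instead of spinning forever.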
4. Approvals belong in their own framework.
They don’t. The propose/apply approval pattern doesn’t need workflow suspension or a separate human-in-the-loop state machine. Stage the proposal on the chat session and render an approval card; the user’s click is just another chat turn that the next workflow run sees. Approvals ride on the existing multi-turn channel. Vendors selling “human-in-the-loop infrastructure” as a standalone product are selling you a state machine your chat session already encodes.
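A rough sketch of propose/apply riding on the chat session follows. The session shape and every name in it (`Session`, `Turn`, `stageProposal`, `handleTurn`) are assumptions made for illustration; persistence and the approval card itself are left out.

```typescript
// Hypothetical session model: the pending proposal lives on the chat session itself.
type Proposal = { id: string; tool: string; args: Record<string, unknown> };
type Session = { pending: Proposal | null; log: string[] };
type Turn =
  | { kind: "message"; text: string }
  | { kind: "approval"; proposalId: string; approved: boolean };

// The agent stages a proposal instead of running the tool directly.
function stageProposal(session: Session, proposal: Proposal): void {
  session.pending = proposal;
  session.log.push(`proposed ${proposal.tool}`);
}

// The approval click arrives as an ordinary turn; no workflow is suspended.
function handleTurn(
  session: Session,
  turn: Turn,
  apply: (proposal: Proposal) => string,
): void {
  if (turn.kind === "message") {
    session.log.push(`user: ${turn.text}`);
    return;
  }
  const pending = session.pending;
  if (!pending || pending.id !== turn.proposalId) {
    session.log.push("ignored approval for unknown proposal");
    return;
  }
  session.pending = null; // clear first so a repeated click cannot apply twice
  session.log.push(
    turn.approved ? `applied: ${apply(pending)}` : `rejected ${pending.tool}`,
  );
}

const approvalSession: Session = { pending: null, log: [] };
stageProposal(approvalSession, {
  id: "p1",
  tool: "update_account",
  args: { field: "email" },
});
handleTurn(
  approvalSession,
  { kind: "approval", proposalId: "p1", approved: true },
  (proposal) => `${proposal.tool} ok`,
);
```

After the two calls the log reads "proposed update_account" then "applied: update_account ok", and nothing is left pending.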
5. Tools need their own framework.
They don’t. Anything in your stack that already has an input schema and a description is already a tool. OpenAPI endpoints. Functions with Zod schemas. Workflows. Services. Whatever you’re using to define typed callables - those ARE your tools. Plumbing them into an agent step is configuration, not architecture.
In QuickFlo’s case, the agent step picks tools from the existing workflow library. Workflow name = tool name. Workflow description = tool description. Workflow input schema = tool input schema. There’s no separate tool registry, no separate lifecycle, no separate observability layer. Tools inherit retries, traces, connections, and error handling from the workflow engine - there’s no second copy to maintain.
The lesson generalizes: if your platform already has typed callable units, those are your tools. Building a parallel “tools” abstraction is rebuilding what you already have. Tighter, fewer concepts, less drift.
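As a sketch of how thin that configuration layer can be, the mapping below turns a workflow definition into a tool definition field by field. The `Workflow` and `AgentTool` shapes are hypothetical and simplified; they are not QuickFlo's actual types.

```typescript
// Hypothetical shapes; a real workflow engine will carry more fields.
type InputSchema = {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required: string[];
};

type Workflow = {
  name: string;
  description: string;
  inputSchema: InputSchema;
  run: (input: Record<string, unknown>) => string;
};

type AgentTool = {
  name: string;
  description: string;
  inputSchema: InputSchema;
  execute: (input: Record<string, unknown>) => string;
};

// The mapping is one-to-one: nothing is re-declared for the agent's benefit.
function workflowToTool(workflow: Workflow): AgentTool {
  return {
    name: workflow.name,
    description: workflow.description,
    inputSchema: workflow.inputSchema,
    // Whatever run() already does (retries, tracing) comes along unchanged.
    execute: (input) => workflow.run(input),
  };
}

const refundWorkflow: Workflow = {
  name: "issue_refund",
  description: "Refunds one order in full. Requires user approval.",
  inputSchema: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
  run: (input) => `refunded order ${String(input.orderId)}`,
};

const agentTools = [refundWorkflow].map(workflowToTool);
```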
6. Tool names and descriptions are loose hints.
They’re software contracts. The LLM reads them on every turn - that’s how it picks which tool to call and how to fill in arguments. Treat them like API docs: precise, scoped, with a clear when-to-use sentence and a clear what-not-to-use-for sentence.
A tool named update_account with description “updates the account” is a bug. A tool named update_account with description “Updates one field on a customer’s account record. For multi-field updates use bulk_update_account. Requires user approval.” is a contract. Renaming lookup_account to query_account_by_id changes which prompts route to it. Treat names like REST endpoints - you’re going to live with them.
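One way to keep descriptions from decaying into “updates the account” is to build them from required parts. The sketch below is an assumption about how you might structure that, with hypothetical names (`ToolContract`, `renderDescription`); it reproduces the contract-style description from the example above.

```typescript
// Hypothetical structure: each part of the contract is a required field,
// so a description cannot be written without a scope and a redirect.
type ToolContract = {
  name: string;
  does: string; // what the tool does, including its scope
  notFor: string; // what to use instead for out-of-scope requests
  requiresApproval: boolean;
};

function renderDescription(contract: ToolContract): string {
  const parts = [contract.does, contract.notFor];
  if (contract.requiresApproval) parts.push("Requires user approval.");
  return parts.join(" ");
}

const updateAccount: ToolContract = {
  name: "update_account",
  does: "Updates one field on a customer's account record.",
  notFor: "For multi-field updates use bulk_update_account.",
  requiresApproval: true,
};

const updateAccountDescription = renderDescription(updateAccount);
```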
7. You need a vector store for memory.
You usually don’t. Most “memory” needs are: summarize the previous conversation, surface a few past summaries on the next session. That’s a string in a jsonb column. Vector stores start paying off when you’re searching across thousands of unstructured documents - at which point you have an actual retrieval problem, not a memory problem. Default to summaries on rows. Reach for vectors when you can’t help it.
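A minimal sketch of summaries-on-rows, using an in-memory array as a stand-in for the table. The row shape and the `recentSummaries` helper are hypothetical; in a real system this would be a query ordered by end time with a limit.

```typescript
// Hypothetical row shape: one summary string per finished session.
type SessionRow = { userId: string; endedAt: number; summary: string };

// Newest-first summaries for one user, capped at `limit`.
function recentSummaries(rows: SessionRow[], userId: string, limit = 3): string[] {
  return rows
    .filter((row) => row.userId === userId) // filter() copies, so sort() leaves `rows` untouched
    .sort((a, b) => b.endedAt - a.endedAt)
    .slice(0, limit)
    .map((row) => row.summary);
}

const sessionRows: SessionRow[] = [
  { userId: "u1", endedAt: 100, summary: "Asked about invoice export." },
  { userId: "u2", endedAt: 150, summary: "Reset API key." },
  { userId: "u1", endedAt: 200, summary: "Changed billing email." },
];

// These strings would be prepended to the next session's system prompt.
const memory = recentSummaries(sessionRows, "u1", 2);
```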
8. The right question is “what if the LLM does X?”
The right question is “what does my system do when the LLM does X?” - for any X. The LLM is going to call the wrong tool, hallucinate arguments, fire fifty parallel tool calls, return malformed JSON, and emit prompt injections back to you. Your job is to build a system where that’s a recoverable event, not a disaster. Cost caps, validation, approval gates, idempotent tools, good error handling. The model is part of your input distribution, not your application logic.
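As a sketch of what “recoverable” can mean for a single tool call: parse, validate, and execute behind one function that always returns a result the loop can hand back to the model. The schema format here is deliberately tiny and hypothetical (`ArgSchema`, `safeCall`); a real system would more likely use its existing schema library.

```typescript
// Hypothetical minimal schema: argument name -> expected primitive type.
type FieldType = "string" | "number" | "boolean";
type ArgSchema = Record<string, FieldType>;
type ToolResult = { ok: true; value: string } | { ok: false; error: string };

// Never throws: every failure becomes an error result for the next round.
function safeCall(
  rawArgs: string,
  schema: ArgSchema,
  execute: (args: Record<string, unknown>) => string,
): ToolResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArgs);
  } catch {
    return { ok: false, error: "arguments were not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "arguments must be a JSON object" };
  }
  const args = parsed as Record<string, unknown>;
  for (const [key, type] of Object.entries(schema)) {
    if (typeof args[key] !== type) {
      return { ok: false, error: `argument "${key}" must be a ${type}` };
    }
  }
  try {
    return { ok: true, value: execute(args) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

const lookupSchema: ArgSchema = { accountId: "string" };
const lookup = (args: Record<string, unknown>) => `found ${String(args.accountId)}`;

const malformed = safeCall('{"accountId": ', lookupSchema, lookup);
const hallucinated = safeCall('{"accountId": 42}', lookupSchema, lookup);
const valid = safeCall('{"accountId": "acct_1"}', lookupSchema, lookup);
```

The first two calls come back as error results rather than exceptions; only the third reaches `lookup`.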
9. Cost caps that exist in the schema actually enforce.
Mine didn’t. The agent step had a maxTokens field for months. The runtime computed running totals and emitted a logger.log() when the budget was exceeded. That was the entire enforcement. Under adversarial input - “call every tool forever” - the agent would blow through any budget you set. A schema field that doesn’t bite is a lie. Verify enforcement, not configuration.
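The difference between a cap that logs and a cap that bites fits in a few lines. This is a sketch under assumptions, not the QuickFlo runtime: `TokenBudget` is a hypothetical class, and the per-round token counts are made up.

```typescript
// Hypothetical budget: charging past the limit throws instead of logging.
class TokenBudget {
  private used = 0;

  constructor(private readonly maxTokens: number) {}

  charge(tokens: number): void {
    this.used += tokens;
    if (this.used > this.maxTokens) {
      throw new Error(`token budget exceeded: ${this.used} > ${this.maxTokens}`);
    }
  }
}

const budget = new TokenBudget(1000);
let roundsWithinBudget = 0;
let stopReason = "";

try {
  // Each number stands in for the token usage reported after one model round.
  for (const roundTokens of [400, 400, 400, 400]) {
    budget.charge(roundTokens); // throws on the third round, ending the loop
    roundsWithinBudget++;
  }
} catch (err) {
  stopReason = err instanceof Error ? err.message : String(err);
}
```

Because `charge` throws, the fourth round never runs. A test that feeds the loop more rounds than the budget allows is a cheap way to verify enforcement rather than configuration.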
10. Streaming is free.
It isn’t. Token-by-token rendering looks magical, but it forces you to confront state correctness in ways batch responses don’t. Two systems (the server emitting events, the client rendering them) drift out of sync. Mid-flight UI states need design. Reconnect-on-drop needs replay. Out-of-order delivery needs sequencing. If your use case doesn’t need streaming - internal tools, batch processing, async workflows - don’t pay for it. Streaming is a UX commitment, not a default.
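To give a sense of the client-side cost, here is a sketch of the sequencing and replay handling a streaming consumer ends up needing. It assumes the server numbers events from 1 and can replay from a given sequence number; `StreamEvent` and `OrderedStream` are hypothetical names.

```typescript
// Hypothetical event shape: the server numbers events starting at 1.
type StreamEvent = { seq: number; text: string };

class OrderedStream {
  private nextSeq = 1;
  private held = new Map<number, StreamEvent>();
  private rendered: string[] = [];

  receive(event: StreamEvent): void {
    if (event.seq < this.nextSeq) return; // already rendered (e.g. a replayed duplicate)
    this.held.set(event.seq, event);
    // Render every event that is now contiguous with what has been shown.
    let next = this.held.get(this.nextSeq);
    while (next) {
      this.rendered.push(next.text);
      this.held.delete(this.nextSeq);
      this.nextSeq++;
      next = this.held.get(this.nextSeq);
    }
  }

  // After a dropped connection, ask the server to replay from here.
  get resumeFrom(): number {
    return this.nextSeq;
  }

  get text(): string {
    return this.rendered.join("");
  }
}

const stream = new OrderedStream();
stream.receive({ seq: 1, text: "Hel" });
stream.receive({ seq: 3, text: "wor" }); // arrives early, held back
stream.receive({ seq: 2, text: "lo " }); // fills the gap, releases seq 3
stream.receive({ seq: 2, text: "lo " }); // replayed after reconnect, ignored
stream.receive({ seq: 4, text: "ld" });
```

None of this exists in a batch response, which is the sense in which streaming is not free.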
The meta-pattern: building agent tools is mostly classical software engineering with one decision step that happens to be a language model. The loop is easy. The rules around it - schemas, contracts, permissions, artifact flow, error handling - are the work.
The tightest agent platforms aren’t the ones with the cleverest prompts. They’re the ones with the strictest contracts around the prompt. Tools that are typed and obvious. Schemas that match the read shape. Cost caps that enforce. Approvals that ride on a channel that already exists.
Most of the AI tooling discourse is about prompting. The real lessons are about state, contracts, and graceful degradation. Same as every other distributed system you’ve shipped.