Architecture
We didn't pick a managed agent platform. Here's the calculus.
For most SMB-facing AI products, running your own thin agent loop against an LLM API is the right call. Here's when the platforms make sense and when they're just expensive indirection.
Every few months a builder asks why we run our own agent loop against an LLM API directly instead of using a managed platform โ Bedrock, Vertex AI Agent Builder, Azure AI Foundry, the Assistants API, or any of the agent frameworks that wrap them. The answer is load-bearing for how we built the rest of the product, so it's worth writing down.
What the platforms bundle
A managed agent platform is, almost universally, a bundle of seven things:
- Inference โ the actual model call.
- Agent loop โ the orchestration that takes a user message, gives it to the model with a list of tools, runs the tools the model picks, feeds results back, and loops until the model produces a final answer.
- Knowledge bases / RAG โ chunking, embeddings, vector store, retrieval, all behind a single API.
- Guardrails โ content filters, PII redaction, denied topics, hallucination flags.
- Compliance โ the audit trail and certifications you need to sell to regulated industries (HIPAA, FedRAMP, SOC 2 on the inference layer specifically).
- Multi-model โ switch between vendors without rewriting client code.
- Operations โ IAM, networking, billing, throughput reservation, capacity.
When you adopt a platform you take the whole bundle. When you build directly, you write each piece you actually need and skip the rest.
What "directly" looks like
The thin version of an agent fits in a few hundred lines of code. You hold a connection to the user, you hold a connection to the model provider, and you shuttle structured messages between them. When the model wants a tool, you call it locally and append the result. When the model emits text, you stream it. You stop when the model stops asking for tools.
That's it. No framework, no orchestration product, no DSL. Your tools are just functions. Your "knowledge base" is whichever data store fits the data โ for a small business's facts (services, hours, address, a few dozen pages of website content), a regular database plus full-text search is more than enough.
When the platforms genuinely earn their keep
There are real scenarios where the bundle is the right choice:
- The buyer requires it. If you're selling into healthcare, banking, or government, the buyer is going to ask whether your inference layer is HIPAA-eligible or has a FedRAMP authorization. Managed platforms inside the hyperscalers have done this paperwork. Going direct means doing it yourself or accepting you can't sell into those verticals yet.
- You're already in the hyperscaler's trust boundary. If your app runs in AWS and your customers expect IAM-authenticated, VPC-isolated, no-public-internet inference, Bedrock fits the rest of the architecture. Going to a public API for one component creates a hole in the model.
- You actually do A/B between model vendors. Some companies meaningfully test Claude vs Llama vs Mistral on every call, route the request to whichever is cheapest for the prompt class, and need a single client interface across all of them. That's what a managed multi-model API is for.
- You don't want to operate a vector database. Managed RAG removes a real operational burden โ embeddings cost, chunk pipelines, query latency, version skew between models and embeddings. If your team is small and your retrieval needs are large, paying for "knowledge base in a box" is rational.
- Procurement. A single AWS or Azure invoice, with reserved capacity, sitting inside your existing committed-spend agreement, is sometimes worth a per-token markup that a procurement org would never approve as a separate SaaS line item.
When the platforms are a tax
The opposite cases are common in SMB-facing AI products like ours:
- You're cost-sensitive. Managed inference adds a markup over the underlying model API โ somewhere in the single-to-low-double-digit percent range. At scale, that's real money. You're also paying for features (compliance, multi-model, guardrails) you're not using.
- You care about latency on every call. Routing through a managed service adds a hop. For streaming chat โ where the first token landing fast is the entire UX โ that hop is felt.
- You want new models on day one. Direct model APIs ship new versions on the day of release. Managed platforms typically pick them up days to weeks later, sometimes months.
- You want to control the agent loop. Adding a tool. Changing the system prompt. Capping rounds. Refreshing per-user state mid-conversation. Streaming tool-result hints to the UI alongside the model's text. Implementing per-turn rate limiting. These are knobs you'd otherwise inherit from the platform; if any of them is a competitive surface for you, you don't want them owned by someone else.
- You're not in the hyperscaler. Running on Cloudflare, Vercel, Railway, Fly, or your own boxes? A managed agent platform inside one of the clouds is then a foreign dependency, with auth, billing, and network rules you have to fit into your existing stack.
The single useful question
The question to ask is: do you want to operate the agent loop, or do you want to outsource it?
If the agent loop is part of your product's competitive surface โ the prompts, the tool design, the orchestration policy, the latency budget โ outsourcing it is paying to be slower, more expensive, and less differentiated. If the agent loop is a commodity that you wish you didn't have to think about, the platform is doing you a favor.
For us, the prompts are a competitive surface. The tools are a competitive surface. We change them weekly. They're the difference between a chat that sounds like the business and a chat that sounds like generic AI. We're not going to outsource that.
What this means in practice
A few things follow:
- We picked a single model vendor and stayed on their direct API. Cheapest path to inference, fastest path to new model releases, full control over the request shape. If the day comes that we want to A/B another vendor, we'll add it then โ it'll cost us less than two years of paying a platform tax to enable optionality we never used.
- Our knowledge layer is the database. Structured facts live in a regular database. Free-form content lives in a search-indexed table. The model fetches what it needs per turn via narrow tools. When retrieval quality on real traffic suffers, the column behind the search tool gets a different index โ possibly vector embeddings โ without changing the tool's interface to the model.
- Our guardrails are prompt rules and rate limits. That's adequate for our threat model. When we eventually sell into a vertical that needs automated PII redaction, we'll bolt it on or move that surface to a vendor โ but we'll make that decision then, not preemptively.
- Our agent loop is a file we understand line by line. When it misbehaves we know exactly where to look. When we want to change behavior, we change one function. There is no framework abstraction in the way.
The general lesson
This isn't really about Bedrock. It's about a pattern that recurs every time a new layer of infrastructure becomes hot: hosted databases, container orchestrators, build systems, observability stacks. The platforms exist because at scale, in enterprise contexts, with strict compliance, they're the right answer. They become the wrong answer when they bundle features you don't need, charge you for the bundle, and limit your ability to shape the layer that matters most.
Build small, build directly, until something in the bundle is actually load-bearing for you. Then buy that one thing.
Want this for your business?
NovaBuildBot ships a full AI-managed site + chat assistant for your business. No code, no infra, paid monthly.
Start on Telegram โ