Chat Brain Internals
How the website chat widget works under the hood — system prompt, the 8 tools the brain can call, the WebSocket request lifecycle, security boundaries, and where every piece of knowledge actually lives.
This is the technical companion to the AI Chat Widget page. If you're just trying to install or customize the widget, start there. If you want to understand the engineering — what the AI sees, how a single chat turn is processed, where each fact comes from — keep reading.
What the brain sees (the system prompt)
Built per-turn in web_chat/brain.py:build_system_prompt(). For a business like "Smile Clinic" the prompt looks like:
You are the website chat assistant for Smile Clinic (dental clinic,
located at 12 MG Road). Your job is to answer questions about the
business and help visitors book appointments.
Voice:
- Speak AS Smile Clinic — use 'we', 'our', 'us'. You ARE the business;
you're not 'an AI assistant for' the business.
- Be direct and confident. State facts as facts. Never say 'based on
our website', 'according to our X page', 'I found that …', 'it looks
like …', or any phrase that reveals you searched for the answer —
that breaks the illusion and sounds uncertain.
- Be concise. Two or three sentences per reply unless asked for
detail. Skip filler ('Great question!', 'I'd be happy to help').
- No AI disclaimers, no hedging, no apologizing for limitations
unless you genuinely can't help.
Behavior:
- Use tools to fetch facts you don't already know — don't guess
services, prices, hours, or addresses.
- For anything the structured tools can't answer (specific policies,
team bios, articles, locations, history), call search_site_content
FIRST before saying you don't know — and present the result as your
own knowledge, not as 'something I found'.
- For booking: confirm date and time with the visitor, fetch slots
via list_available_slots, collect contact info via
request_contact_info if missing, then call confirm_booking.
- If the visitor asks something off-topic (jokes, world facts, code),
politely steer back to the business.
- Never invent appointments, prices, or staff.
- Visitor info on file: name: Asha Sharma, phone: +91 98765 43210.
(or: 'Visitor is anonymous — you have NO contact info for them.')
- The visitor is currently viewing: 'Dental Cleaning · Smile Clinic'
at https://smileclinic.com/services/cleaning
The system prompt is deliberately small. Facts come from tool calls — the model fetches what it needs per turn — which keeps the context cache-friendly and forces the brain to use ground truth.
The 8 tools
Each tool is a Python function (input_dict, business, customer) -> {text, ui}. The text is what goes back into the Claude conversation as the tool result. The optional ui is forwarded over the WebSocket as a tool_ui event so the widget can render a richer card.
| Tool | When the model calls it | Hits |
|---|---|---|
get_services |
"What do you offer?" / "How much is X?" | businesses.services JSONB |
get_hours |
"When are you open?" | businesses.hours JSONB |
get_contact_info |
"Where are you?" / "How do I reach you?" | businesses.{address, phone, email} |
list_available_slots |
"What's available Tuesday?" | availability_slots (live-generated if needed) |
request_contact_info |
Before booking when visitor has no phone on file | Renders inline form in the widget |
confirm_booking |
After date+time+service confirmed | INSERTs into appointments |
list_my_appointments |
"When's my appointment?" (visitor must have phone) | appointments by customer_id |
search_site_content |
Anything the structured tools can't answer | Postgres FTS over business_site_pages |
The brain caps itself at 4 tool rounds per turn (cost ceiling) and 1024 output tokens.
Request lifecycle (one chat turn, end to end)
(1) (2) (3)
+----------------+ +-----------+ +-------------------------+
| Customer site | ───> | widget.js | ───> | iframe |
| smileclinic | | mounts | | hub.novabuildbot.com |
| .com | | button | | /chat?biz=42 |
+----------------+ +-----------+ | &parent=smileclinic |
+-----+-------------------+
│
(4)│ WebSocket open
▼
+--------------------------+
| FastAPI /ws/chat |
| - origin allowlist check│
| - upsert visitor row │
| - register socket │
+--------------------------+
│
(8) stream tokens (5)│ user message
◀───────────────────────────────────────────── ┤
(7) tool result back (6)│ Anthropic API stream
◀──── ┐ ▼
│ +--------------------------+
│ | Brain (Claude Haiku 4.5) │
│ | with 8 tools │
▼ +-------+------------------+
+-----------------+ │
| Tool execution | ◀──────────────────────────────┘ (tool_use)
| (Postgres reads,│
| site FTS, │
| booking insert) │
+-----------------+
Stage by stage
- Snippet on the customer site. One line:
<script src="https://hub.novabuildbot.com/widget.js" data-biz="42" async></script> widget.js(vanilla JS, no build step). Readsdata-biz, mounts a 60×60 round button bottom-right, and on click lazily injects:<iframe sandbox="allow-scripts allow-same-origin allow-forms allow-popups" src="https://hub.novabuildbot.com/chat?biz=42&parent=https://smileclinic.com&ref_url=...&ref_title=...">/chathandler. Looks upbusinesses.allowed_originsfor biz 42 and serves the SPA shell withContent-Security-Policy: frame-ancestors 'self' https://smileclinic.com https://www.smileclinic.com. The browser refuses to render the iframe on any other parent.- WebSocket handshake. SPA opens
wss://hub.novabuildbot.com/ws/chat?biz=42&session=<uuid>&parent=.... Server validates UUID, checks origin shape, loads business (rejects ifdeleted_at/disabled_at), upserts the visitor as acustomersrow (channel='web', phone NULL), registers the socket in an in-memory map keyed(biz, session_id)so multiple tabs of the same browser stay in sync, and fires a background re-index if the site hasn't been crawled in 24h. - Visitor types a message. SPA sends
{type:'message', text:'...'}. Server rate-limits (8/min, 60/hr per session), persists toconversations, broadcasts the inbound to all sockets in the session (cross-tab sync), emits atypingevent, then callsbrain.respond(). - Brain loop. Loads last 20 messages from
conversations, builds the system prompt, calls Claude with the 8 tools in streaming mode. - Tool execution. For each
tool_useblock, dispatches intobrain_tools.execute_toolwhich runs the corresponding Python function (Postgres reads, site FTS, booking inserts). Returns text back into Claude's conversation and emits anytool_uievent for rich rendering. - Tokens stream back. For each
text_deltaevent from Claude, the server broadcasts{type:'stream_token', text:<chunk>}to every socket in the session. The UI appends to the streaming bubble. When the brain emits its final canonical message, the bubble's text is replaced and the result is persisted toconversations.
End to end: ~1.5s to first token, ~3-5s to final response in steady state.
Where each fact actually lives
Three sources, in priority order for the model:
| Source | Where stored | How the brain accesses it |
|---|---|---|
| Structured facts (services, hours, contact, staff, availability) | businesses table + JSONB cols + availability_slots + business_members |
Direct DB read via the 7 narrow tools |
| Free-form site content (about page, team bios, policy pages, blog posts) | business_site_pages (full-text indexed via tsvector + GIN) |
search_site_content tool, Postgres FTS |
| This visitor's conversation history | conversations table |
Last 20 messages auto-loaded into the LLM message list each turn |
There is no vector store, no embeddings, no RAG today — just structured DB lookups and Postgres full-text search. That's been adequate for the small, well-shaped corpus each business has (services list, hours, ~10–30 page Eleventy site). If retrieval quality on real traffic suffers, swapping the FTS for pgvector is a column change behind the same tool surface.
Security boundaries
| Layer | What it enforces |
|---|---|
/chat Content-Security-Policy: frame-ancestors |
Browser refuses to render the iframe on any parent that isn't in the business's allowed_origins |
/ws/chat origin check |
Refuses WebSocket upgrades whose parent and Origin headers don't match a permitted shape — defense in depth |
iframe sandbox (omits allow-top-navigation) |
The widget can't redirect the host page |
| Rate limit | 8 messages / 60 seconds (burst) + 60 / hour (sustained) per (biz, session) |
MAX_TOOL_ROUNDS=4 |
Caps tool rounds per turn (cost ceiling) |
MAX_OUTPUT_TOKENS=1024 |
Caps any single Claude response |
MAX_MESSAGE_CHARS=4096 |
Caps inbound text per message |
| Origin allowlist | Per-business; auto-seeded from businesses.site_url (migration 049); managed via the bot's install_chat_widget tool |
Where things run
- Customer site (Cloudflare Pages, or wherever they host): hosts just the one-line
<script>tag. hub.novabuildbot.com(Railway, novachat service): FastAPI + the React SPA + the WebSocket endpoint + the site indexer.- Postgres (Railway): every table referenced above. Shared with the Telegram bot service.
- Anthropic API: where the actual Claude call lands. Nothing else leaves Railway.
File map (for future engineers)
| Path | Role |
|---|---|
novachat/src/web_chat/widget.js |
Embeddable loader |
novachat/src/main.py (/chat, /widget.js) |
SPA + loader routes |
novachat/src/web_chat/ws.py |
WebSocket endpoint, origin check, rate limit, brain dispatch |
novachat/src/web_chat/brain.py |
System prompt + streaming Claude loop |
novachat/src/web_chat/brain_tools.py |
The 8 tools |
novachat/src/web_chat/site_indexer.py |
Sitemap-first / BFS crawler |
novachat/src/web_chat/origin_check.py |
Origin normalization + allowlist |
novachat/src/db/site_pages.py |
FTS query layer |
novachat/migrations/047..049 |
Customer schema, site pages, allowed_origins |
novachat/web/src/pages/chat/ |
React SPA: ChatPage + components + hooks |
bot/tools.py (install_chat_widget) |
Drops snippet into customer site + registers origin |
brain_tools.py:TOOLS, write the handler function, register it in _HANDLERS. The next deploy picks it up. No prompt changes needed — the brain reads tool descriptions at runtime.