Chat Brain Internals

How the website chat widget works under the hood — system prompt, the 8 tools the brain can call, the WebSocket request lifecycle, security boundaries, and where every piece of knowledge actually lives.

This is the technical companion to the AI Chat Widget page. If you're just trying to install or customize the widget, start there. If you want to understand the engineering — what the AI sees, how a single chat turn is processed, where each fact comes from — keep reading.

What the brain sees (the system prompt)

The system prompt is built per turn in web_chat/brain.py:build_system_prompt(). For a business like "Smile Clinic", the prompt looks like:

You are the website chat assistant for Smile Clinic (dental clinic,
located at 12 MG Road). Your job is to answer questions about the
business and help visitors book appointments.

Voice:
- Speak AS Smile Clinic — use 'we', 'our', 'us'. You ARE the business;
  you're not 'an AI assistant for' the business.
- Be direct and confident. State facts as facts. Never say 'based on
  our website', 'according to our X page', 'I found that …', 'it looks
  like …', or any phrase that reveals you searched for the answer —
  that breaks the illusion and sounds uncertain.
- Be concise. Two or three sentences per reply unless asked for
  detail. Skip filler ('Great question!', 'I'd be happy to help').
- No AI disclaimers, no hedging, no apologizing for limitations
  unless you genuinely can't help.

Behavior:
- Use tools to fetch facts you don't already know — don't guess
  services, prices, hours, or addresses.
- For anything the structured tools can't answer (specific policies,
  team bios, articles, locations, history), call search_site_content
  FIRST before saying you don't know — and present the result as your
  own knowledge, not as 'something I found'.
- For booking: confirm date and time with the visitor, fetch slots
  via list_available_slots, collect contact info via
  request_contact_info if missing, then call confirm_booking.
- If the visitor asks something off-topic (jokes, world facts, code),
  politely steer back to the business.
- Never invent appointments, prices, or staff.
- Visitor info on file: name: Asha Sharma, phone: +91 98765 43210.
  (or: 'Visitor is anonymous — you have NO contact info for them.')
- The visitor is currently viewing: 'Dental Cleaning · Smile Clinic'
  at https://smileclinic.com/services/cleaning

The system prompt is deliberately small. Facts come from tool calls — the model fetches what it needs per turn — which keeps the context cache-friendly and forces the brain to use ground truth.
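As a rough illustration of per-turn prompt assembly, the sketch below builds a shortened version of the prompt from a business record, the visitor's known contact details, and the page being viewed. The `Business` and `Visitor` dataclasses, the `visitor_line` helper, and the parameter names are assumptions made for this example; the real `build_system_prompt()` may take different arguments and emit the full text shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Business:
    # Hypothetical shape, for illustration only.
    name: str
    category: str
    address: str


@dataclass
class Visitor:
    # Hypothetical shape: both fields are empty for an anonymous visitor.
    name: Optional[str] = None
    phone: Optional[str] = None


def visitor_line(visitor: Visitor) -> str:
    """Render the 'visitor info on file' line, or the anonymous fallback."""
    known = [f"{key}: {value}"
             for key, value in (("name", visitor.name), ("phone", visitor.phone))
             if value]
    if not known:
        return "Visitor is anonymous - you have NO contact info for them."
    return "Visitor info on file: " + ", ".join(known) + "."


def build_system_prompt(business: Business, visitor: Visitor,
                        page_title: Optional[str] = None,
                        page_url: Optional[str] = None) -> str:
    """Assemble a small prompt; facts such as prices and hours are left to tools."""
    lines = [
        f"You are the website chat assistant for {business.name} "
        f"({business.category}, located at {business.address}).",
        "- Use tools to fetch facts you don't already know.",
        "- " + visitor_line(visitor),
    ]
    # The page context is only added when the widget reported it.
    if page_title and page_url:
        lines.append(f"- The visitor is currently viewing: '{page_title}' at {page_url}")
    return "\n".join(lines)
```

Only the last two lines vary per visitor and per page, so the stable instructions sit at the front of the prompt.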

The 8 tools

Each tool is a Python function (input_dict, business, customer) -> {text, ui}. The text is what goes back into the Claude conversation as the tool result. The optional ui is forwarded over the WebSocket as a tool_ui event so the widget can render a richer card.

Tool	When the model calls it	What it touches
`get_services`	"What do you offer?" / "How much is X?"	`businesses.services` JSONB
`get_hours`	"When are you open?"	`businesses.hours` JSONB
`get_contact_info`	"Where are you?" / "How do I reach you?"	`businesses.{address, phone, email}`
`list_available_slots`	"What's available Tuesday?"	`availability_slots` (live-generated if needed)
`request_contact_info`	Before booking when visitor has no phone on file	Renders inline form in the widget
`confirm_booking`	After date+time+service confirmed	INSERTs into `appointments`
`list_my_appointments`	"When's my appointment?" (visitor must have phone)	`appointments` by `customer_id`
`search_site_content`	Anything the structured tools can't answer	Postgres FTS over `business_site_pages`

The brain caps itself at 4 tool rounds per turn (a cost ceiling) and 1024 output tokens per Claude response.
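The tool contract and the round cap can be sketched together. In the example below, `model_step` is a hypothetical stand-in for the streaming Claude call, `business` and `customer` are plain dicts, and the behavior once the round budget is spent (one last call, then a fallback message) is an assumption rather than a description of the production loop.

```python
from typing import Callable

MAX_TOOL_ROUNDS = 4  # value taken from the text above


def get_hours(input_dict: dict, business: dict, customer: dict) -> dict:
    """Example handler: 'text' goes back to the model, 'ui' goes to the widget."""
    hours = business.get("hours", {})
    text = "; ".join(f"{day}: {span}" for day, span in hours.items()) or "Hours not set."
    return {"text": text, "ui": {"kind": "hours_card", "hours": hours}}


HANDLERS: dict[str, Callable[[dict, dict, dict], dict]] = {"get_hours": get_hours}


def execute_tool(name: str, input_dict: dict, business: dict, customer: dict) -> dict:
    """Dispatch a tool call by name; unknown names produce a text-only error."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"text": f"Unknown tool: {name}", "ui": None}
    return handler(input_dict, business, customer)


def run_turn(model_step, business: dict, customer: dict):
    """Drive one chat turn with a bounded number of tool rounds.

    model_step(tool_results) returns either
    {"tool_use": {"name": ..., "input": ...}} or {"text": ...}.
    """
    tool_results = []
    ui_events = []
    for _ in range(MAX_TOOL_ROUNDS):
        step = model_step(tool_results)
        if "tool_use" not in step:
            return step["text"], ui_events
        call = step["tool_use"]
        result = execute_tool(call["name"], call["input"], business, customer)
        tool_results.append(result["text"])
        if result.get("ui"):
            ui_events.append(result["ui"])  # would be sent as a tool_ui event
    # Budget spent: one last call, falling back if the model still wants a tool.
    final = model_step(tool_results)
    return final.get("text", "Sorry, I couldn't finish that request."), ui_events
```

Because the loop is bounded, a model that keeps requesting tools costs at most `MAX_TOOL_ROUNDS + 1` model calls per turn.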

Request lifecycle (one chat turn, end to end)

 (1)                        (2)                       (3)
+----------------+      +-----------+      +-------------------------+
| Customer site  | ───> | widget.js | ───> | iframe                  |
| smileclinic    |      | mounts    |      | hub.novabuildbot.com    |
|     .com       |      | button    |      |   /chat?biz=42          |
+----------------+      +-----------+      |   &parent=smileclinic   |
                                           +-----+-------------------+
                                                 │
                                              (4)│ WebSocket open
                                                 ▼
                                           +--------------------------+
                                           | FastAPI /ws/chat         |
                                           |  - origin allowlist check│
                                           |  - upsert visitor row    │
                                           |  - register socket       │
                                           +--------------------------+
                                                 │
 (8) stream tokens                            (5)│ user message
 ◀───────────────────────────────────────────── ┤
 (7) tool result back                         (6)│ Anthropic API stream
 ◀──── ┐                                         ▼
       │                                   +--------------------------+
       │                                   | Brain (Claude Haiku 4.5) │
       │                                   |  with 8 tools            │
       ▼                                   +-------+------------------+
+-----------------+                                │
| Tool execution  | ◀──────────────────────────────┘ (tool_use)
| (Postgres reads,│
| site FTS,       │
| booking insert) │
+-----------------+

Stage by stage

(1) Snippet on the customer site. One line:

<script src="https://hub.novabuildbot.com/widget.js" data-biz="42" async></script>

(2) widget.js (vanilla JS, no build step). Reads data-biz, mounts a 60×60 round button at the bottom right, and on click lazily injects:

<iframe sandbox="allow-scripts allow-same-origin allow-forms allow-popups"
        src="https://hub.novabuildbot.com/chat?biz=42&parent=https://smileclinic.com&ref_url=...&ref_title=..."></iframe>

(3) /chat handler. Looks up businesses.allowed_origins for biz 42 and serves the SPA shell with Content-Security-Policy: frame-ancestors 'self' https://smileclinic.com https://www.smileclinic.com. The browser refuses to render the iframe on any other parent.
(4) WebSocket handshake. The SPA opens wss://hub.novabuildbot.com/ws/chat?biz=42&session=<uuid>&parent=.... The server validates the session UUID, checks the origin shape, loads the business (rejecting the connection if deleted_at or disabled_at is set), upserts the visitor as a customers row (channel='web', phone NULL), registers the socket in an in-memory map keyed by (biz, session_id) so multiple tabs of the same browser stay in sync, and fires a background re-index if the site hasn't been crawled in 24h.
(5) Visitor types a message. The SPA sends {type:'message', text:'...'}. The server rate-limits (8/min, 60/hr per session), persists the message to conversations, broadcasts the inbound message to all sockets in the session (cross-tab sync), emits a typing event, then calls brain.respond().
(6) Brain loop. Loads the last 20 messages from conversations, builds the system prompt, and calls Claude in streaming mode with the 8 tools.
(7) Tool execution. For each tool_use block, the server dispatches into brain_tools.execute_tool, which runs the corresponding Python function (Postgres reads, site FTS, booking inserts). The function's text is returned to Claude's conversation as the tool result, and any ui payload is emitted as a tool_ui event for rich rendering.
(8) Tokens stream back. For each text_delta event from Claude, the server broadcasts {type:'stream_token', text:<chunk>} to every socket in the session. The UI appends each chunk to the streaming bubble. When the brain emits its final canonical message, the bubble's text is replaced and the result is persisted to conversations.
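The cross-tab behavior in the handshake and streaming stages can be pictured with a small in-memory registry. This is a sketch under assumptions: the `SessionRegistry` class and the `stream_reply` helper are hypothetical names, and sockets are assumed to expose an async `send_json` method, as Starlette and FastAPI WebSocket objects do.

```python
import asyncio
from collections import defaultdict


class SessionRegistry:
    """In-memory map of (biz, session_id) -> set of open sockets."""

    def __init__(self):
        self._sockets = defaultdict(set)

    def register(self, biz, session_id, socket):
        self._sockets[(biz, session_id)].add(socket)

    def unregister(self, biz, session_id, socket):
        key = (biz, session_id)
        self._sockets[key].discard(socket)
        if not self._sockets[key]:
            del self._sockets[key]  # forget sessions with no open tabs

    async def broadcast(self, biz, session_id, event):
        """Send one event to every socket (browser tab) in the session."""
        sockets = list(self._sockets.get((biz, session_id), ()))
        results = await asyncio.gather(
            *(sock.send_json(event) for sock in sockets),
            return_exceptions=True,
        )
        for sock, result in zip(sockets, results):
            if isinstance(result, Exception):
                self.unregister(biz, session_id, sock)  # drop dead sockets


async def stream_reply(registry, biz, session_id, chunks):
    """Broadcast each text chunk as a stream_token event; return the full text."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        await registry.broadcast(biz, session_id, {"type": "stream_token", "text": chunk})
    return "".join(parts)
```

Keying by `(biz, session_id)` rather than by socket is what lets a second tab receive the same tokens as the tab that sent the message.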

End to end: ~1.5s to first token, ~3-5s to final response in steady state.
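Both the /chat handler and the WebSocket handshake depend on the per-business origin allowlist. The sketch below shows one way to normalize origins, build the `frame-ancestors` directive, and test a `parent` value against the allowlist. The function names are hypothetical and the exact rules in origin_check.py are not shown in this document, so treat the normalization choices (lowercasing the host, dropping default ports) as assumptions.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def normalize_origin(value: str) -> Optional[str]:
    """Reduce a URL or origin to 'scheme://host[:port]', or None if unusable."""
    try:
        parts = urlsplit(value.strip())
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    host = parts.hostname.lower()
    default_port = 443 if parts.scheme == "https" else 80
    if port is None or port == default_port:
        return f"{parts.scheme}://{host}"
    return f"{parts.scheme}://{host}:{port}"


def frame_ancestors_header(allowed_origins: Iterable[str]) -> str:
    """Build the CSP directive value that limits which parents may frame /chat."""
    origins = [o for o in map(normalize_origin, allowed_origins) if o]
    return "frame-ancestors " + " ".join(["'self'", *origins])


def is_allowed_parent(parent_param: str, allowed_origins: Iterable[str]) -> bool:
    """Check the 'parent' query parameter against the business's allowlist."""
    allowed = {o for o in map(normalize_origin, allowed_origins) if o}
    parent = normalize_origin(parent_param or "")
    return parent is not None and parent in allowed
```

Normalizing both sides before comparing means a trailing slash or an explicit `:443` does not cause a legitimate parent to be rejected.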

Where each fact actually lives

Three sources, in priority order for the model:

Source	Where stored	How the brain accesses it
Structured facts (services, hours, contact, staff, availability)	`businesses` table + JSONB columns + `availability_slots` + `business_members`	Direct DB access via the 7 structured tools (all except `search_site_content`)
Free-form site content (about page, team bios, policy pages, blog posts)	`business_site_pages` (full-text indexed via `tsvector` + GIN)	`search_site_content` tool, Postgres FTS
This visitor's conversation history	`conversations` table	Last 20 messages auto-loaded into the LLM message list each turn

There is no vector store, no embeddings, no RAG today — just structured DB lookups and Postgres full-text search. That's been adequate for the small, well-shaped corpus each business has (services list, hours, ~10–30 page Eleventy site). If retrieval quality on real traffic suffers, swapping the FTS for pgvector is a column change behind the same tool surface.
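For a sense of what the full-text lookup might involve, here is a sketch of a query builder for `search_site_content`. The column names (`business_id`, `url`, `title`, `body`, `search_vector`), the `'english'` text search configuration, and the function name are assumptions; only the table name and the use of `tsvector` with a GIN index come from the text above. The placeholders follow the named-parameter style used by psycopg.

```python
def build_site_search_query(business_id: int, question: str, limit: int = 5):
    """Return (sql, params) for a ranked full-text search over one business's pages.

    All column names are hypothetical. The SQL is parameterized, so the
    visitor's text never becomes part of the statement itself.
    """
    sql = """
        SELECT url,
               title,
               ts_headline('english', body, query) AS snippet,
               ts_rank(search_vector, query) AS rank
        FROM business_site_pages,
             websearch_to_tsquery('english', %(q)s) AS query
        WHERE business_id = %(biz)s
          AND search_vector @@ query
        ORDER BY rank DESC
        LIMIT %(limit)s
    """
    params = {"q": question.strip(), "biz": business_id, "limit": limit}
    return sql, params


# Usage with a DB-API cursor (not executed here):
#   sql, params = build_site_search_query(42, "cancellation policy")
#   cursor.execute(sql, params)
```

`websearch_to_tsquery` accepts free-form visitor text without raising syntax errors, which makes it a reasonable fit for chat input. Whether the production query uses it is not stated here.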

Security boundaries

Layer	What it enforces
`/chat` `Content-Security-Policy: frame-ancestors`	Browser refuses to render the iframe on any parent that isn't in the business's `allowed_origins`
`/ws/chat` origin check	Refuses WebSocket upgrades whose `parent` query parameter and `Origin` header don't match a permitted shape (defense in depth)
`iframe sandbox` (omits `allow-top-navigation`)	The widget can't redirect the host page
Rate limit	8 messages / 60 seconds (burst) + 60 / hour (sustained) per `(biz, session)`
`MAX_TOOL_ROUNDS=4`	Caps tool rounds per turn (cost ceiling)
`MAX_OUTPUT_TOKENS=1024`	Caps any single Claude response
`MAX_MESSAGE_CHARS=4096`	Caps inbound text per message
Origin allowlist	Per-business; auto-seeded from `businesses.site_url` (migration 049); managed via the bot's `install_chat_widget` tool
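The two-tier rate limit in the table can be expressed as a sliding-window check over recent message timestamps. This is a minimal in-memory sketch, not the production limiter: the class name is hypothetical, state is per process, and rejected messages are assumed not to count toward the limits.

```python
import time
from collections import defaultdict, deque

# (max messages, window in seconds), taken from the table above
LIMITS = ((8, 60.0), (60, 3600.0))


class SlidingWindowLimiter:
    """Allow a message only if every configured window still has room."""

    def __init__(self, limits=LIMITS, clock=time.monotonic):
        self._limits = limits
        self._clock = clock  # injectable so tests can control time
        self._longest = max(window for _, window in limits)
        self._hits = defaultdict(deque)  # (biz, session_id) -> timestamps

    def allow(self, biz, session_id) -> bool:
        now = self._clock()
        hits = self._hits[(biz, session_id)]
        # Forget timestamps older than the longest window.
        while hits and now - hits[0] >= self._longest:
            hits.popleft()
        for max_count, window in self._limits:
            recent = sum(1 for t in hits if now - t < window)
            if recent >= max_count:
                return False
        hits.append(now)
        return True
```

Calling `limiter.allow(biz, session_id)` before persisting each inbound message enforces both the burst and the sustained limit with a single data structure.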

Where things run

Customer site (Cloudflare Pages, or wherever they host): hosts just the one-line <script> tag.
hub.novabuildbot.com (Railway, novachat service): FastAPI + the React SPA + the WebSocket endpoint + the site indexer.
Postgres (Railway): every table referenced above. Shared with the Telegram bot service.
Anthropic API: where the actual Claude call lands. Nothing else leaves Railway.

File map (for future engineers)

Path	Role
`novachat/src/web_chat/widget.js`	Embeddable loader
`novachat/src/main.py` (`/chat`, `/widget.js`)	SPA + loader routes
`novachat/src/web_chat/ws.py`	WebSocket endpoint, origin check, rate limit, brain dispatch
`novachat/src/web_chat/brain.py`	System prompt + streaming Claude loop
`novachat/src/web_chat/brain_tools.py`	The 8 tools
`novachat/src/web_chat/site_indexer.py`	Sitemap-first / BFS crawler
`novachat/src/web_chat/origin_check.py`	Origin normalization + allowlist
`novachat/src/db/site_pages.py`	FTS query layer
`novachat/migrations/047..049`	Customer schema, site pages, allowed_origins
`novachat/web/src/pages/chat/`	React SPA: ChatPage + components + hooks
`bot/tools.py` (`install_chat_widget`)	Drops snippet into customer site + registers origin

Want to extend the brain? Adding a tool takes three steps: add the definition to brain_tools.py:TOOLS, write the handler function, and register it in _HANDLERS. The next deploy picks it up. No prompt changes are needed: the brain reads tool descriptions at runtime.