Chat Brain Internals

How the website chat widget works under the hood — system prompt, the 8 tools the brain can call, the WebSocket request lifecycle, security boundaries, and where every piece of knowledge actually lives.

This is the technical companion to the AI Chat Widget page. If you're just trying to install or customize the widget, start there. If you want to understand the engineering — what the AI sees, how a single chat turn is processed, where each fact comes from — keep reading.

What the brain sees (the system prompt)

The prompt is built once per turn in web_chat/brain.py:build_system_prompt(). For a business like "Smile Clinic" it looks like this:

You are the website chat assistant for Smile Clinic (dental clinic,
located at 12 MG Road). Your job is to answer questions about the
business and help visitors book appointments.

Voice:
- Speak AS Smile Clinic — use 'we', 'our', 'us'. You ARE the business;
  you're not 'an AI assistant for' the business.
- Be direct and confident. State facts as facts. Never say 'based on
  our website', 'according to our X page', 'I found that …', 'it looks
  like …', or any phrase that reveals you searched for the answer —
  that breaks the illusion and sounds uncertain.
- Be concise. Two or three sentences per reply unless asked for
  detail. Skip filler ('Great question!', 'I'd be happy to help').
- No AI disclaimers, no hedging, no apologizing for limitations
  unless you genuinely can't help.

Behavior:
- Use tools to fetch facts you don't already know — don't guess
  services, prices, hours, or addresses.
- For anything the structured tools can't answer (specific policies,
  team bios, articles, locations, history), call search_site_content
  FIRST before saying you don't know — and present the result as your
  own knowledge, not as 'something I found'.
- For booking: confirm date and time with the visitor, fetch slots
  via list_available_slots, collect contact info via
  request_contact_info if missing, then call confirm_booking.
- If the visitor asks something off-topic (jokes, world facts, code),
  politely steer back to the business.
- Never invent appointments, prices, or staff.
- Visitor info on file: name: Asha Sharma, phone: +91 98765 43210.
  (or: 'Visitor is anonymous — you have NO contact info for them.')
- The visitor is currently viewing: 'Dental Cleaning · Smile Clinic'
  at https://smileclinic.com/services/cleaning

The system prompt is deliberately small. Facts come from tool calls — the model fetches what it needs per turn — which keeps the context cache-friendly and forces the brain to use ground truth.
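
A minimal sketch of how a per-turn prompt like this could be assembled. The real build_system_prompt() signature is not shown on this page, so the Business and Visitor dataclasses, their field names, and the heavily abbreviated prompt body are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Business:  # hypothetical shape, not the real businesses row
    name: str
    category: str
    address: str


@dataclass
class Visitor:  # hypothetical shape, not the real customers row
    name: Optional[str] = None
    phone: Optional[str] = None


def build_system_prompt(business: Business, visitor: Visitor,
                        page_title: Optional[str] = None,
                        page_url: Optional[str] = None) -> str:
    """Assemble a small prompt; everything beyond identity comes from tools."""
    lines = [
        f"You are the website chat assistant for {business.name} "
        f"({business.category}, located at {business.address}).",
        "Use tools to fetch facts you don't already know.",
    ]
    # Only list the contact fields that are actually on file.
    known = [f"{label}: {value}"
             for label, value in (("name", visitor.name), ("phone", visitor.phone))
             if value]
    if known:
        lines.append("- Visitor info on file: " + ", ".join(known) + ".")
    else:
        lines.append("- Visitor is anonymous; you have NO contact info for them.")
    # Page context is optional: it is only added when the widget reported it.
    if page_title and page_url:
        lines.append(f"- The visitor is currently viewing: '{page_title}' at {page_url}")
    return "\n".join(lines)
```

Because only the visitor and page lines vary between turns, the stable prefix of the prompt stays identical, which is what makes it cache-friendly.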

The 8 tools

Each tool is a Python function (input_dict, business, customer) -> {text, ui}. The text is what goes back into the Claude conversation as the tool result. The optional ui is forwarded over the WebSocket as a tool_ui event so the widget can render a richer card.

| Tool | When the model calls it | Hits |
|---|---|---|
| get_services | "What do you offer?" / "How much is X?" | businesses.services JSONB |
| get_hours | "When are you open?" | businesses.hours JSONB |
| get_contact_info | "Where are you?" / "How do I reach you?" | businesses.{address, phone, email} |
| list_available_slots | "What's available Tuesday?" | availability_slots (live-generated if needed) |
| request_contact_info | Before booking when visitor has no phone on file | Renders inline form in the widget |
| confirm_booking | After date+time+service confirmed | INSERTs into appointments |
| list_my_appointments | "When's my appointment?" (visitor must have phone) | appointments by customer_id |
| search_site_content | Anything the structured tools can't answer | Postgres FTS over business_site_pages |
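
To make the handler contract concrete, here is a sketch of what one tool could look like. The (input_dict, business, customer) -> {text, ui} shape comes from this page; the ToolResult type, the dict-based business row, the layout of the hours JSON, and the "hours_card" payload are assumptions, not the real brain_tools.py code.

```python
from typing import Any, Optional, TypedDict


class ToolResult(TypedDict):
    text: str                      # goes back to Claude as the tool result
    ui: Optional[dict[str, Any]]   # optional payload for a tool_ui event


def get_hours(input_dict: dict, business: dict, customer: dict) -> ToolResult:
    """Read opening hours from the business row (JSONB layout assumed)."""
    hours = business.get("hours") or {}
    if not hours:
        # No data: tell the model plainly so it does not guess.
        return {"text": "No opening hours are on file.", "ui": None}
    text = "; ".join(f"{day}: {span}" for day, span in hours.items())
    # The ui payload is hypothetical; the widget would render it as a card.
    return {"text": text, "ui": {"type": "hours_card", "hours": hours}}
```

Keeping text and ui separate means the model only ever sees compact text, while the widget can still show a richer card.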

The brain is capped at 4 tool rounds per turn (a cost ceiling) and 1024 output tokens per Claude response.
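
The tool-round cap can be pictured as a bounded loop. In this sketch call_model stands in for the streaming Anthropic call and its return shape is invented so the loop can run on its own. What the real brain does once the cap is reached is not documented here, so the fallback reply is also an assumption.

```python
from typing import Callable

MAX_TOOL_ROUNDS = 4  # value taken from this page


def run_turn(call_model: Callable[[list], dict],
             run_tool: Callable[[str, dict], str],
             messages: list) -> str:
    """Alternate model -> tool -> model until a text answer or the cap.

    call_model is a stand-in that returns either {"text": str} or
    {"tool": name, "input": dict}; the real loop is not shown on this page.
    """
    for _ in range(MAX_TOOL_ROUNDS):
        reply = call_model(messages)
        if "tool" not in reply:
            return reply["text"]          # final answer, stop looping
        result = run_tool(reply["tool"], reply["input"])
        # Feed the tool result back so the next model call can use it.
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    # Cap reached without a final answer (fallback behavior is assumed).
    return "Sorry, I couldn't complete that request."
```

The cap bounds the number of model calls per turn, so a model that keeps requesting tools cannot run up an unbounded bill.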

Request lifecycle (one chat turn, end to end)

 (1)                        (2)                       (3)
+----------------+      +-----------+      +-------------------------+
| Customer site  | ───> | widget.js | ───> | iframe                  |
| smileclinic    |      | mounts    |      | hub.novabuildbot.com    |
|     .com       |      | button    |      |   /chat?biz=42          |
+----------------+      +-----------+      |   &parent=smileclinic   |
                                           +-----+-------------------+
                                                 │
                                              (4)│ WebSocket open
                                                 ▼
                                           +--------------------------+
                                           | FastAPI /ws/chat         |
                                           |  - origin allowlist check│
                                           |  - upsert visitor row    │
                                           |  - register socket       │
                                           +--------------------------+
                                                 │
 (8) stream tokens                            (5)│ user message
 ◀───────────────────────────────────────────── ┤
 (7) tool result back                         (6)│ Anthropic API stream
 ◀──── ┐                                         ▼
       │                                   +--------------------------+
       │                                   | Brain (Claude Haiku 4.5) │
       │                                   |  with 8 tools            │
       ▼                                   +-------+------------------+
+-----------------+                                │
| Tool execution  | ◀──────────────────────────────┘ (tool_use)
| (Postgres reads,│
| site FTS,       │
| booking insert) │
+-----------------+

Stage by stage

  1. Snippet on the customer site. One line:
    <script src="https://hub.novabuildbot.com/widget.js" data-biz="42" async></script>
    
  2. widget.js (vanilla JS, no build step). Reads data-biz, mounts a 60×60 round button bottom-right, and on click lazily injects:
    <iframe sandbox="allow-scripts allow-same-origin allow-forms allow-popups"
            src="https://hub.novabuildbot.com/chat?biz=42&parent=https://smileclinic.com&ref_url=...&ref_title=...">
    
  3. /chat handler. Looks up businesses.allowed_origins for biz 42 and serves the SPA shell with Content-Security-Policy: frame-ancestors 'self' https://smileclinic.com https://www.smileclinic.com. The browser refuses to render the iframe on any other parent.
  4. WebSocket handshake. The SPA opens wss://hub.novabuildbot.com/ws/chat?biz=42&session=<uuid>&parent=.... The server validates the session UUID, checks the origin shape, and loads the business, rejecting the connection if deleted_at or disabled_at is set. It then upserts the visitor as a customers row (channel='web', phone NULL) and registers the socket in an in-memory map keyed by (biz, session_id), so that multiple tabs in the same browser stay in sync. Finally, it fires a background re-index if the site hasn't been crawled in the last 24 hours.
  5. Visitor types a message. SPA sends {type:'message', text:'...'}. Server rate-limits (8/min, 60/hr per session), persists to conversations, broadcasts the inbound to all sockets in the session (cross-tab sync), emits a typing event, then calls brain.respond().
  6. Brain loop. Loads last 20 messages from conversations, builds the system prompt, calls Claude with the 8 tools in streaming mode.
  7. Tool execution. For each tool_use block, dispatches into brain_tools.execute_tool which runs the corresponding Python function (Postgres reads, site FTS, booking inserts). Returns text back into Claude's conversation and emits any tool_ui event for rich rendering.
  8. Tokens stream back. For each text_delta event from Claude, the server broadcasts {type:'stream_token', text:<chunk>} to every socket in the session. The UI appends to the streaming bubble. When the brain emits its final canonical message, the bubble's text is replaced and the result is persisted to conversations.
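
The rate limit in stage 5 could be implemented as a sliding window over recent message timestamps. The sketch below uses the 8/min and 60/hr limits from this page; the in-memory storage, the class name, and the injectable clock are assumptions, and the real limiter in ws.py may work differently.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# (window in seconds, max messages in that window), values from this page
LIMITS = ((60.0, 8), (3600.0, 60))


class SessionRateLimiter:
    """Sliding-window limiter keyed by (biz, session); in-memory only."""

    def __init__(self) -> None:
        self._hits: dict[tuple, deque] = defaultdict(deque)

    def allow(self, biz_id: int, session_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[(biz_id, session_id)]
        # Drop timestamps older than the longest window.
        longest = max(window for window, _ in LIMITS)
        while hits and now - hits[0] >= longest:
            hits.popleft()
        # Reject if any window is already full.
        for window, cap in LIMITS:
            if sum(1 for t in hits if now - t < window) >= cap:
                return False
        hits.append(now)
        return True
```

Rejected messages are not recorded, so a visitor who keeps typing while throttled does not extend their own lockout.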

End to end: ~1.5s to first token, ~3-5s to final response in steady state.
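
The cross-tab sync in stages 4, 5 and 8 relies on a registry of sockets per session. A rough sketch, assuming each socket exposes an async send_json method (as FastAPI's WebSocket does); the SessionRegistry class and its method names are hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Protocol


class JsonSocket(Protocol):
    async def send_json(self, data: dict) -> None: ...


class SessionRegistry:
    """In-memory map of (biz, session_id) -> open sockets, one per tab."""

    def __init__(self) -> None:
        self._sockets: dict[tuple[int, str], set] = defaultdict(set)

    def register(self, biz: int, session_id: str, ws: JsonSocket) -> None:
        self._sockets[(biz, session_id)].add(ws)

    def unregister(self, biz: int, session_id: str, ws: JsonSocket) -> None:
        key = (biz, session_id)
        self._sockets[key].discard(ws)
        if not self._sockets[key]:
            del self._sockets[key]   # forget sessions with no open tabs

    async def broadcast(self, biz: int, session_id: str, event: dict) -> int:
        """Send one event to every tab in the session; returns the tab count."""
        targets = list(self._sockets.get((biz, session_id), ()))
        await asyncio.gather(*(ws.send_json(event) for ws in targets))
        return len(targets)
```

With this shape, streaming a token is a single call such as broadcast(42, session_id, {"type": "stream_token", "text": chunk}). Because the map lives in process memory, this only works while all sockets for a session land on the same server process.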

Where each fact actually lives

Three sources, in priority order for the model:

| Source | Where stored | How the brain accesses it |
|---|---|---|
| Structured facts (services, hours, contact, staff, availability) | businesses table + JSONB cols + availability_slots + business_members | Direct DB read via the 7 narrow tools |
| Free-form site content (about page, team bios, policy pages, blog posts) | business_site_pages (full-text indexed via tsvector + GIN) | search_site_content tool, Postgres FTS |
| This visitor's conversation history | conversations table | Last 20 messages auto-loaded into the LLM message list each turn |

There is no vector store, no embeddings, no RAG today — just structured DB lookups and Postgres full-text search. That's been adequate for the small, well-shaped corpus each business has (services list, hours, ~10–30 page Eleventy site). If retrieval quality on real traffic suffers, swapping the FTS for pgvector is a column change behind the same tool surface.
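
For a sense of what the full-text lookup might involve, here is a sketch of a query builder. The table name comes from this page, but the column names (business_id, url, title, body, search_vector), the 'english' configuration, and the function name are assumptions; the real query lives in db/site_pages.py and is not shown here. The SQL uses standard Postgres functions (websearch_to_tsquery, ts_rank, ts_headline) with psycopg-style named placeholders.

```python
def build_site_search_query(business_id: int, query: str, limit: int = 5):
    """Return (sql, params) for a ranked full-text search over one site.

    Column names are hypothetical; only business_site_pages is from the docs.
    """
    sql = """
        SELECT url, title,
               ts_headline('english', body, q, 'MaxWords=40') AS snippet
        FROM business_site_pages,
             websearch_to_tsquery('english', %(query)s) AS q
        WHERE business_id = %(business_id)s
          AND search_vector @@ q
        ORDER BY ts_rank(search_vector, q) DESC
        LIMIT %(limit)s
    """
    # The visitor's text is passed as a bound parameter, never interpolated.
    params = {"business_id": business_id, "query": query.strip(), "limit": limit}
    return sql, params
```

The business_id filter keeps each business's pages isolated from every other tenant's, and the GIN index on the tsvector column is what makes the @@ match fast.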

Security boundaries

| Layer | What it enforces |
|---|---|
| /chat Content-Security-Policy: frame-ancestors | Browser refuses to render the iframe on any parent that isn't in the business's allowed_origins |
| /ws/chat origin check | Refuses WebSocket upgrades whose parent parameter and Origin header don't match a permitted shape (defense in depth) |
| iframe sandbox (omits allow-top-navigation) | The widget can't redirect the host page |
| Rate limit | 8 messages / 60 seconds (burst) + 60 / hour (sustained) per (biz, session) |
| MAX_TOOL_ROUNDS=4 | Caps tool rounds per turn (cost ceiling) |
| MAX_OUTPUT_TOKENS=1024 | Caps any single Claude response |
| MAX_MESSAGE_CHARS=4096 | Caps inbound text per message |
| Origin allowlist | Per-business; auto-seeded from businesses.site_url (migration 049); managed via the bot's install_chat_widget tool |
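
As an illustration of the first two rows, the sketch below normalizes origins, builds the frame-ancestors value, and checks a parent parameter against the allowlist. The function names and normalization rules are assumptions; the real logic in origin_check.py is not shown on this page and may also inspect the Origin header in ways this sketch does not.

```python
from typing import Optional
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_origin(value: str) -> Optional[str]:
    """Reduce a URL to 'scheme://host[:port]', or None if not http(s)."""
    parts = urlsplit(value.strip())
    if parts.scheme not in _DEFAULT_PORTS or not parts.hostname:
        return None
    try:
        port = parts.port
    except ValueError:          # malformed port such as ':abc'
        return None
    suffix = f":{port}" if port and port != _DEFAULT_PORTS[parts.scheme] else ""
    return f"{parts.scheme}://{parts.hostname}{suffix}"


def frame_ancestors_header(allowed_origins: list[str]) -> str:
    """Build the Content-Security-Policy value served with /chat."""
    origins = [o for o in map(normalize_origin, allowed_origins) if o]
    return "frame-ancestors " + " ".join(["'self'", *origins])


def parent_allowed(parent_param: str, allowed_origins: list[str]) -> bool:
    """True if the parent query parameter maps to an allowlisted origin."""
    allowed = {o for o in map(normalize_origin, allowed_origins) if o}
    return normalize_origin(parent_param) in allowed
```

The CSP header is enforced by the browser, while the parent check runs on the server, so a client that ignores CSP still cannot open a socket on behalf of an unlisted site by default.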

Where things run

  • Customer site (Cloudflare Pages, or wherever they host): hosts just the one-line <script> tag.
  • hub.novabuildbot.com (Railway, novachat service): FastAPI + the React SPA + the WebSocket endpoint + the site indexer.
  • Postgres (Railway): every table referenced above. Shared with the Telegram bot service.
  • Anthropic API: where the actual Claude call lands. Nothing else leaves Railway.

File map (for future engineers)

| Path | Role |
|---|---|
| novachat/src/web_chat/widget.js | Embeddable loader |
| novachat/src/main.py (/chat, /widget.js) | SPA + loader routes |
| novachat/src/web_chat/ws.py | WebSocket endpoint, origin check, rate limit, brain dispatch |
| novachat/src/web_chat/brain.py | System prompt + streaming Claude loop |
| novachat/src/web_chat/brain_tools.py | The 8 tools |
| novachat/src/web_chat/site_indexer.py | Sitemap-first / BFS crawler |
| novachat/src/web_chat/origin_check.py | Origin normalization + allowlist |
| novachat/src/db/site_pages.py | FTS query layer |
| novachat/migrations/047..049 | Customer schema, site pages, allowed_origins |
| novachat/web/src/pages/chat/ | React SPA: ChatPage + components + hooks |
| bot/tools.py (install_chat_widget) | Drops snippet into customer site + registers origin |

Want to extend the brain? Adding a tool takes three steps: add the definition to brain_tools.py:TOOLS, write the handler function, and register it in _HANDLERS. The next deploy picks it up. No prompt changes are needed, because the brain reads tool descriptions at runtime.