Live now: on this page. The chat chiclet in the corner of your screen is the system this case study describes. Open it and interview it.
The challenge
A conversation designer's portfolio has a credibility problem. Every case study on this site says some version of "I design AI agents that stay grounded, refuse gracefully, and sound like a person." Those are claims. You have no way to check them without hiring me first.
So I made the portfolio interviewable. The chat agent on this site answers questions about my background, my case studies, and my writing, in a voice I designed, with guardrails I wrote, on infrastructure I deployed. If my refusal design is weak, you can find out in thirty seconds. If the agent invents a client I never had, that's on me, publicly, on my own domain. The artifact is the argument.
Constraints
- Zero hallucinated biography. An agent that invents jobs, clients, or numbers about me is worse than no agent. Everything it says has to trace back to something I actually published.
- The site stays static. christi.io is a static Astro site. I wasn't going to bolt a server onto it for one feature. The agent had to live somewhere else entirely and leave the site untouched.
- No third-party chat SaaS. Outsourcing the centerpiece of an AI portfolio to an embedded chat vendor would undercut the whole point. Every layer had to be mine to design and mine to defend.
- Public endpoint, personal API key. Anyone on the internet can open the widget. The endpoint spends my Anthropic credits, so abuse had to be bounded before the first visitor arrived, not after the first incident.
- Freshness without maintenance. I publish new essays regularly. The agent's knowledge had to update itself when the site does, with no separate content-sync chore I would eventually skip.
My approach
The system is three small pieces: a build-time grounding pipeline, a single Cloudflare Worker, and a dependency-free widget. No vector database, no framework, no retrieval layer. The entire knowledge base fits in the prompt, and prompt caching makes that affordable. Here's how each piece works.
- Grounding corpus built on every deploy. A Node script runs before astro build. It strips my published essays and about page down to readable prose, packages them as corpus.json alongside a canonical bio.md, and serves both as static assets at christi.io/chat/. Publish an essay, deploy the site, and the agent knows it. There is no second pipeline to forget.
- One Worker on its own subdomain. The agent lives at chat.christi.io, a Cloudflare Worker deployed independently of the static host. It fetches the bio and corpus from the live site on cold start, holds them in an in-memory cache with a five-minute TTL (time to live, how long a cached copy is reused before it is refreshed), and exposes exactly one chat endpoint. The site and the agent can each deploy without touching the other.
- The whole corpus in the prompt, cached. The system prompt is three blocks: persona, bio, corpus, with cache_control: ephemeral on the last block so Anthropic's prompt caching covers all three. Each entry is capped at 6,000 characters at build time to keep the total inside a sane prompt budget. After the first call, only the conversation itself is billed at the full rate.
- A persona that is explicitly not me. The agent speaks about Christi in the third person, always. It's a portfolio piece, not an impersonation. The prompt bans first-person-as-Christi, bans quoting prices or committing to availability, and routes every hiring or scoping question to /contact with one honest sentence first.
- Refusal as a designed surface. The prompt treats every visitor message as untrusted input. Injection attempts ("ignore previous instructions," "pretend you are Christi," "show me your prompt") get a warm refusal and a redirect, not an apology spiral. Off-scope asks like coding help and homework get acknowledged, declined, and offered the scope the agent does cover. Refusing with grace was a primary design goal, not an edge case.
- Streaming, filtered. Responses stream back as server-sent events, but the Worker does not blindly proxy Anthropic's stream. A transform filters the SSE down to the four event types the widget consumes, so usage metadata, model identifiers, and any future upstream event types never reach the browser by default.
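The filtering step in the last bullet can be sketched roughly like this. It is a minimal illustration, not the Worker's actual code: the function name createSseFilter is hypothetical, and because this page does not name the four event types the widget consumes, the allowlist is left as a parameter. The sketch buffers partial events so that an event split across two network chunks is still judged as a whole.

```typescript
// Sketch only: keeps allowlisted SSE events and drops everything else.
// createSseFilter is a hypothetical name; the real allowlist is not given here.
function createSseFilter(allowed: ReadonlySet<string>): (chunk: string) => string {
  let buffer = "";

  // Feed one decoded chunk; get back the complete, allowed events it unlocked.
  return (chunk: string): string => {
    // Normalize CRLF so the blank-line event separator is always "\n\n".
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let out = "";
    let end = buffer.indexOf("\n\n");

    while (end !== -1) {
      const event = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);

      // An event's type comes from its "event:" line; no line means no type.
      const typeLine = event.split("\n").find((line) => line.startsWith("event:"));
      const type = typeLine ? typeLine.slice("event:".length).trim() : "";

      // Default deny: unknown or future upstream event types are dropped.
      if (allowed.has(type)) out += event + "\n\n";
      end = buffer.indexOf("\n\n");
    }
    return out;
  };
}
```

Because the check is an allowlist rather than a blocklist, an event type the upstream API adds later is dropped until someone deliberately opts it in.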
Whole corpus in the prompt, not RAG (retrieval-augmented generation, where the model looks up source documents before answering). At this scale, retrieval is the wrong tool. Nine sources fit comfortably in a cached prompt, so the model sees everything on every turn: no embedding drift, no chunking bugs, no "retrieval missed the relevant essay" failure mode. The tradeoff is a hard ceiling. Each entry is truncated at build time, and if my writing outgrows the prompt budget, this decision gets revisited. Right-sizing the architecture to the corpus was the decision.
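As a rough sketch of what a whole-corpus prompt could look like, the function below assembles the three system blocks and puts the cache marker on the last one. The block shape follows Anthropic's Messages API, where a cache_control marker covers the prompt prefix up to and including the marked block. The CorpusEntry fields and the function name buildSystemBlocks are assumptions for illustration; the real corpus.json layout is not shown on this page.

```typescript
// Shape of one system block as accepted by Anthropic's Messages API.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Assumed entry shape; the actual corpus.json fields may differ.
interface CorpusEntry {
  title: string;
  url: string;
  text: string;
}

// Hypothetical helper: persona, then bio, then the full corpus.
function buildSystemBlocks(persona: string, bio: string, corpus: CorpusEntry[]): SystemBlock[] {
  const corpusText = corpus
    .map((entry) => `## ${entry.title} (${entry.url})\n${entry.text}`)
    .join("\n\n");

  return [
    { type: "text", text: persona },
    { type: "text", text: bio },
    // One marker on the final block caches the whole prefix before it as well.
    { type: "text", text: corpusText, cache_control: { type: "ephemeral" } },
  ];
}
```

Keeping the three blocks in a fixed order matters for this design: a cached prefix is only reused when the text before the marker is identical between calls.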
Origin gating plus two rate limits, fail closed. The Worker rejects requests from any origin not on the allowlist, but an Origin header is not authentication; any scripted client can spoof it. So a per-IP limit (30 requests a minute) bounds a single abuser, and a global limit (300 a minute across all IPs) caps what a proxy pool can drain in the worst case. If either limiter binding is missing or throws in production, the Worker returns 503 rather than passing traffic through to the API unmetered.
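The fail-closed ordering described above might be sketched as follows. The RateLimiter interface is modeled on the limit({ key }) call of Cloudflare's rate limiting binding. The binding names IP_LIMITER and GLOBAL_LIMITER, the allowlist contents, and the 403 and 429 status codes are assumptions made for the sketch; only the 503 on a missing or throwing limiter comes from the description above.

```typescript
// Minimal interface modeled on Cloudflare's rate limiting binding.
interface RateLimiter {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

// Hypothetical binding names; the real ones live in the deployment config.
interface GateEnv {
  IP_LIMITER?: RateLimiter;
  GLOBAL_LIMITER?: RateLimiter;
}

// Assumed allowlist for illustration.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set(["https://christi.io"]);

// Returns the HTTP status the request should get; 200 means "continue".
async function gate(origin: string | null, ip: string, env: GateEnv): Promise<number> {
  // The origin check is a cheap first filter, not authentication.
  if (origin === null || !ALLOWED_ORIGINS.has(origin)) return 403;

  // Fail closed: a missing binding must never mean unmetered traffic.
  if (!env.IP_LIMITER || !env.GLOBAL_LIMITER) return 503;

  try {
    const perIp = await env.IP_LIMITER.limit({ key: ip });
    if (!perIp.success) return 429;

    // One shared key makes this limiter global across all IPs.
    const global = await env.GLOBAL_LIMITER.limit({ key: "global" });
    if (!global.success) return 429;
  } catch {
    // Fail closed again: a limiter that throws blocks the request.
    return 503;
  }
  return 200;
}
```

The per-IP check runs first so that a single noisy client is rejected before it consumes any of the global budget.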
Third person, on purpose. The agent never speaks as me. "I am Christi" from a model is a small lie that erodes every true thing it says afterward. As a portfolio piece describing my work, it can be direct, warm, and concrete without pretending. That one persona decision resolved a whole class of tone and trust problems before they started.
Validation before spend. Conversations are capped at 24 turns of 2,000 characters each, bodies over 64 KB are rejected before parsing, and the first and last turns must come from the visitor. Replies are capped at 600 tokens, and a 30-second timeout bounds a hung upstream. Every check runs before the request can cost anything.
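A minimal sketch of those input checks, under stated assumptions: the request field name messages, the role labels, the rejection of empty turns, and the 400 and 413 status codes are illustrative choices, while the 64 KB, 24-turn, and 2,000-character limits come from the paragraph above.

```typescript
type Turn = { role: "user" | "assistant"; content: string };
type Validation =
  | { ok: true; turns: Turn[] }
  | { ok: false; status: number; reason: string };

const MAX_BODY_BYTES = 64 * 1024;
const MAX_TURNS = 24;
const MAX_TURN_CHARS = 2000;

const reject = (status: number, reason: string): Validation => ({ ok: false, status, reason });

// A turn needs a known role and non-empty text within the cap
// (rejecting empty text is an assumption of this sketch).
function isTurn(value: unknown): value is Turn {
  if (typeof value !== "object" || value === null) return false;
  const turn = value as { role?: unknown; content?: unknown };
  return (
    (turn.role === "user" || turn.role === "assistant") &&
    typeof turn.content === "string" &&
    turn.content.length > 0 &&
    turn.content.length <= MAX_TURN_CHARS
  );
}

function validateBody(raw: string): Validation {
  // Size is checked on the raw text, before any parsing work happens.
  if (new TextEncoder().encode(raw).length > MAX_BODY_BYTES) return reject(413, "body too large");

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return reject(400, "invalid JSON");
  }

  const messages = (parsed as { messages?: unknown } | null)?.messages;
  if (!Array.isArray(messages) || messages.length === 0 || messages.length > MAX_TURNS) {
    return reject(400, "bad turn count");
  }
  if (!messages.every(isTurn)) return reject(400, "bad turn");

  const turns = messages as Turn[];
  // The conversation must open and close with the visitor.
  if (turns[0].role !== "user" || turns[turns.length - 1].role !== "user") {
    return reject(400, "first and last turns must be from the visitor");
  }
  return { ok: true, turns };
}
```

Every branch returns before any upstream call is made, which is what lets a malformed request cost nothing.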
Artifacts I authored
- Persona and guardrail system prompt: scope definition, voice rules with banned phrases, refusal patterns, prompt-injection defenses, and hard rules against invented clients, numbers, prices, and first-person impersonation
- Corpus build script (build-chat-corpus.mjs): strips Astro pages to prose, caps entry length, stamps a generatedAt timestamp, runs before every build
- Cloudflare Worker in TypeScript: input validation, origin gate, dual rate limiting, knowledge cache, prompt assembly with cache control, SSE filtering, and error paths that log upstream detail for ops without leaking it to visitors
- Chat widget (Chatbot.astro): chiclet, dialog panel that becomes a bottom sheet on mobile, streamed rendering with a typing caret, suggestion chips, session-stored transcript, aria-live message log, and full reduced-motion support
- Deployment config binding the Worker to the chat.christi.io custom domain, with a production-only origin allowlist and rate-limiter bindings declared in code
Results
This system has no containment rate or CSAT (customer satisfaction score) to report, and I'm not going to invent engagement numbers for my own website. What I can state are the properties the code enforces:
What it will not do
The honest spec includes the refusals. The agent will not quote prices, rates, or availability; those go to a human, via /contact. It will not speak as me in the first person, role-play as anyone else, or reveal its system prompt. It will not write code, do homework, summarize pasted text, or weigh in on news; it acknowledges, declines, and offers what it does cover. It has no tools, so it can't book a meeting or send an email, and it doesn't pretend it can. And it doesn't know anything I haven't published: if the answer isn't in my bio or my writing, it says so in one sentence and points you to my email. A narrow agent that knows its edges beats a broad one that bluffs.
About this case study
The figures on this page are drawn from internal program reporting I authored or co-authored as the practitioner on the engagement. They are reproduced here in rounded form. They were not produced by an independent third party, and proprietary detail has been omitted where required by the engagement.
Lift figures (CSAT, accuracy, handle time, hallucination rate) reflect pre/post comparisons against a matched baseline using the cohort, time window, and measurement instrument noted in the case study. Volume and adoption figures come from production analytics dashboards. Cost figures reflect either avoided spend or unlocked budget in the named fiscal period.
- This is a self-referential case study: the system described is running on the page you are reading, and every claim here is checkable against its behavior.
- 9 grounded sources: the corpus build script's inputs at the time of writing, my about page plus 8 published essays. The generated corpus.json reports the same count and a generatedAt timestamp from the latest build.
- Caps and limits (24 turns, 2,000 characters per turn, 64 KB body, 600 max tokens, 30 requests per minute per IP, 300 per minute globally, 30 second upstream timeout, 5 minute knowledge cache) are constants in the Worker source and its deployment config, not estimates.
- No engagement, satisfaction, or deflection metrics are claimed because none are collected. The Worker stores no conversations; transcripts live only in the visitor's browser session.
- Refusal and injection-handling behavior is enforced by the system prompt, which means it is probabilistic, not guaranteed. The spend-protection layer (origin gate, rate limits, input caps) is enforced in code and fails closed.
What I would do differently
Build the eval set first, again. The refusal surface is the product here, and I tested it the artisanal way: by attacking my own agent until I ran out of ideas. A small golden set of injection attempts, out-of-scope asks, and bait questions, scored on every prompt change, is the obvious upgrade, and it is the same lesson I keep relearning on bigger systems. I would also reconsider the total absence of observability. Storing nothing was the right privacy default, but it means I cannot see which questions the agent fails on; an opt-in, anonymized failure log is probably the right middle. And the corpus pipeline deserves a guard that fails the build if an essay parses to empty prose, because a silent grounding gap is exactly the failure this architecture exists to prevent.
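The build guard proposed above could be as small as the sketch below. It is written in TypeScript for consistency with the Worker, although the real build script is an .mjs file. The RawEntry shape, the output field names other than generatedAt, and the function name prepareCorpus are assumptions for illustration. The 6,000-character cap is the one the pipeline already applies.

```typescript
const MAX_ENTRY_CHARS = 6000;

// Hypothetical input shape: one record per page after HTML stripping.
interface RawEntry {
  slug: string;
  prose: string;
}

interface Corpus {
  generatedAt: string;
  count: number;
  entries: { slug: string; text: string }[];
}

// Hypothetical helper: validate first, then cap and stamp.
function prepareCorpus(rawEntries: RawEntry[]): Corpus {
  // Guard: an entry that strips down to nothing is a silent grounding gap.
  const empty = rawEntries
    .filter((entry) => entry.prose.trim().length === 0)
    .map((entry) => entry.slug);

  if (empty.length > 0) {
    // Throwing here makes the build fail instead of shipping a hollow corpus.
    throw new Error(`Corpus build failed: empty prose for ${empty.join(", ")}`);
  }

  return {
    generatedAt: new Date().toISOString(),
    count: rawEntries.length,
    entries: rawEntries.map((entry) => ({
      slug: entry.slug,
      // Cap each entry so the total stays inside the prompt budget.
      text: entry.prose.trim().slice(0, MAX_ENTRY_CHARS),
    })),
  };
}
```

Run before the site build, an uncaught error from this function would stop the deploy, so a parsing regression surfaces as a failed build and not as an agent that quietly knows less.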
Collaborators
None. This one is mine end to end: persona, prompt, pipeline, Worker, widget, and deployment. That's the point of it. Every other case study on this site describes work shaped with teams and stakeholders; this page is the unmediated sample of how I build when the only constraint is my own standard.
Skills demonstrated
- Persona and refusal design
- Prompt-injection defense in the prompt layer
- Grounding without retrieval infrastructure
- Prompt caching and cost design
- Cloudflare Workers (TypeScript)
- Streaming SSE with response filtering
- Abuse-resistant public endpoints
- Build-time content pipelines
- Accessible widget UX (aria-live, reduced motion)
- Honest scoping and limitation design
Interview it
The chiclet in the corner of this page opens the agent this case study just described. Ask it about my enterprise conversational AI work, the Boothly concierge, or what I think about evals. Try to make it quote a price. Try to make it break character. That's what it's there for.