Live now: www.boothlyevents.com — open the chat widget on the homepage to see the concierge in action.
Stack at a glance
Claude Sonnet 4.6
Next.js 16
React 19
TypeScript
Tailwind v4
Microsoft Graph
Twilio SMS
Vercel
Brand marks via Simple Icons (CC0 1.0). Trademarks belong to their respective owners; used here for accurate factual reference to real integrations.
The challenge
Boothly Events was running a small empire on three inboxes and a phone: Instagram DMs, email, the contact form, and the call line. Quotes drifted. "Did we ever reply to her?" was a recurring question. On nights and weekends, when most events are actually being planned, response time was effectively the next morning. A photo-booth booking is impulse-shaped: if a host does not get a quote during their planning sprint, they book the next vendor on the list.
The goal was to turn the website into a real concierge that triages every lead, builds a real quote with real pricing, and lands a tentative hold on the actual Outlook calendar, at 11pm on a Tuesday, on autopilot, without pretending to be more capable than it is.
Constraints
- One operator, no margin for error. A solo owner cannot afford a bot that misquotes, double books, or claims a date is held when it is not. Hallucination at the booking layer is a business-ending failure mode.
- Real systems of record. Bookings have to live on the actual Outlook calendar the owner already runs. A parallel "AI calendar" is not a feature, it is a second source of truth that will eventually disagree with the first.
- Voice that does not sound like a form. Most "AI booking assistants" sound like a contact form with sentences glued on. The studio's brand is warm, DFW-rooted, and conversational. The agent had to match.
- Cost discipline. A side hustle cannot subsidize OpenAI-grade per-conversation cost. The unit economics of a single quote had to stay well under the cost of one paid-ad click.
- Owner notification is the moat. The owner needed to know about every quote and every hold the moment it happened, on a phone she already carries, without opening a laptop.
My approach
I designed the system around four boring, deterministic tools and one non-deterministic agent that decides when to call them. The agent has no special powers. It has four functions, and it picks which one to use. That is the entire pattern.
- Four-tool template. check_calendar_availability reads the Outlook free/busy view. propose_hold writes a tentative event flagged for owner approval. send_quote_email sends the customer a branded quote and the owner a lead-summary email. escalate_to_team texts the owner when the agent is stuck or sees a hot lead. Every tool is plain TypeScript that does exactly one thing. The model decides when to call them and what arguments to pass, but it cannot invent calendar holds or invoice numbers.
- One pricing source of truth. Pricing is loaded into the system prompt at build time from a single packages.ts module. To add an add-on or change a price, edit one object; once that change deploys, the next request gets the rebuilt system prompt. No retraining, no second prompt-engineering session.
- Prompt caching on the system prompt. The system prompt is ~140 lines of pricing tables, FAQs, and voice rules. Marking it with cache_control: { type: "ephemeral" } cuts cost per conversation dramatically, because cached prompt reads are billed at a fraction of the normal input-token price, and it shaves first-token latency on every turn after the first.
- Voice tuned by example, not by rule. The system prompt forbids the bulleted-checklist energy that most chatbots default to, and includes one good response shown end to end so the model can pattern-match the tone. On every prompt iteration the test was the same: read the response aloud. If it sounds like a form, rewrite the rule. If it sounds like a person, ship.
- Anti-hallucination guard at the prompt level, enforced by the loop. One line of the system prompt reads: NEVER claim a date is held without ACTUALLY calling propose_hold. If you say "tentatively held" in a message, you MUST have called propose_hold in that same turn. Combined with the multi-iteration tool loop in the API route, this structurally pushes the model toward "call the tool, then announce" rather than "announce, then maybe call."
- Owner-notification flow as a first-class feature. A booking concierge that tells the human nothing is a customer-facing toy. Every hold sends the owner an SMS with a deep link to the calendar entry. Every quote sends the owner a separate lead-summary email and an SMS. Both internal notifications are wrapped in their own try/catch, so a Graph or Twilio hiccup can never make a successful customer send look failed to the agent.
Agent loop: the API route repeats while stop_reason === "tool_use", with a 5-iteration cap; the model picks
tools, and the tools do the real work.
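To make the four-tool template concrete, here is a minimal sketch of the request an API route could assemble, assuming the Anthropic Messages API JSON format: four tool definitions plus a system block marked with cache_control. Only the tool names come from this case study; the descriptions, argument names, buildAgentRequest, and the max_tokens value are hypothetical, and the types are declared locally instead of imported from the SDK.

```typescript
// Local copies of the Messages API JSON shapes, so this sketch runs without the SDK.
interface PropertySchema {
  type: string;
  description?: string;
  items?: { type: string };
}

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, PropertySchema>; required: string[] };
}

interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Tool names come from the case study; descriptions and argument names are hypothetical.
const TOOLS: ToolDefinition[] = [
  {
    name: "check_calendar_availability",
    description: "Check the studio's Outlook free/busy view for an event window.",
    input_schema: {
      type: "object",
      properties: {
        start: { type: "string", description: "ISO 8601 local start time" },
        end: { type: "string", description: "ISO 8601 local end time" },
      },
      required: ["start", "end"],
    },
  },
  {
    name: "propose_hold",
    description: "Create a tentative calendar event flagged for owner approval.",
    input_schema: {
      type: "object",
      properties: {
        start: { type: "string" },
        end: { type: "string" },
        customerName: { type: "string" },
        packageId: { type: "string" },
      },
      required: ["start", "end", "customerName", "packageId"],
    },
  },
  {
    name: "send_quote_email",
    description: "Email the customer a branded quote and the owner a lead summary.",
    input_schema: {
      type: "object",
      properties: {
        customerEmail: { type: "string" },
        packageId: { type: "string" },
        addOnIds: { type: "array", items: { type: "string" } },
      },
      required: ["customerEmail", "packageId"],
    },
  },
  {
    name: "escalate_to_team",
    description: "Text the owner about an unusual request or a hot lead.",
    input_schema: {
      type: "object",
      properties: { reason: { type: "string" }, leadSummary: { type: "string" } },
      required: ["reason", "leadSummary"],
    },
  },
];

// Assemble the request body; max_tokens is a placeholder value.
function buildAgentRequest(model: string, systemPrompt: string, messages: unknown[]) {
  const system: SystemTextBlock[] = [
    // Marks the prompt prefix up to and including this block as cacheable.
    { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
  ];
  return { model, max_tokens: 1024, system, tools: TOOLS, messages };
}
```

Because the cached prefix runs through the tools first and then the system prompt, a cache_control marker on the system block also covers the tool definitions, so both are reused on later turns.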
Lifecycle of a lead
- 01
Customer opens the chat on boothlyevents.com
One question at a time, contractions everywhere, no bulleted intake form.
Next.js
React
Tailwind
- 02
Agent confirms the date is open
Calls check_calendar_availability, which hits /calendar/getSchedule on the studio's Outlook calendar.
Claude tool use
Microsoft Graph
- 03
Agent builds a real quote
Line items composed from the single packages.ts source of truth: never guessed, always sourced.
TypeScript
Claude (prompt-cached)
- 04
Agent places a tentative hold
Calls propose_hold, which writes a [HOLD - APPROVAL NEEDED] event with the full lead in the body and the Boothly Lead category, and texts the owner.
Outlook Calendar
Twilio SMS
- 05
Customer + owner emails go out, owner gets a second SMS
Branded HTML quote to the customer; lead-summary email to the owner; SMS recap with total + date. Internal sends are isolated from the customer-facing send.
Graph Mail
Twilio SMS
- 06
Anything weird escalates straight to the owner
Custom asks, distant venues, unusual dates: escalate_to_team texts the owner with the reason and a compact lead summary.
Twilio SMS
Vercel (always-on)
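Steps 04 through 06 all end in an owner text. A minimal sketch of sending one, assuming Twilio's Messages REST resource (form-encoded To, From, and Body fields with HTTP Basic auth from the account SID and auth token) called through fetch rather than the Twilio Node helper library. buildTwilioSmsRequest and escalationText are hypothetical names, and the message wording is illustrative.

```typescript
// Fetch-ready request kept as plain data, so it can be tested without sending a text.
interface SmsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Twilio Messages resource: form-encoded POST, Basic auth with "AccountSid:AuthToken".
function buildTwilioSmsRequest(
  accountSid: string,
  authToken: string,
  from: string,
  to: string,
  body: string,
): SmsRequest {
  const form = new URLSearchParams({ To: to, From: from, Body: body });
  return {
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Messages.json`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Basic ${btoa(`${accountSid}:${authToken}`)}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: form.toString(),
    },
  };
}

// Hypothetical escalation format: reason first, then a compact lead summary.
function escalationText(reason: string, lead: { name: string; date: string; venue: string }): string {
  return `Boothly escalation: ${reason}. Lead: ${lead.name}, ${lead.date}, ${lead.venue}.`;
}
```

A server-side tool would then call fetch(request.url, request.init). Keeping the builder pure means the credentials only ever appear in server code, and the message format can be checked without a live Twilio account.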
Tool use over prompt-only. A larger prompt could have "described" pricing and dates well enough to fool a casual reader. Tool use forces the model to commit: capacity is checked against a real calendar, holds are written to a real event, quotes are built from a typed line-item schema. Failures are localizable to the tool that produced them.
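Checking capacity against a real calendar, as in step 02, can be as small as the sketch below. It assumes the Microsoft Graph getSchedule action, whose response carries an availabilityView string with one digit per interval. The mailbox address, the 30-minute interval, the helper names, and the rule that a tentative slot also blocks a date are assumptions, not details from this case study.

```typescript
interface DateTimeTimeZone {
  dateTime: string;
  timeZone: string;
}

// Request body for Graph's calendar getSchedule action.
interface GetScheduleRequest {
  schedules: string[];
  startTime: DateTimeTimeZone;
  endTime: DateTimeTimeZone;
  availabilityViewInterval: number;
}

function buildGetScheduleRequest(
  mailbox: string,
  start: string,
  end: string,
  timeZone: string,
  intervalMinutes = 30,
): GetScheduleRequest {
  return {
    schedules: [mailbox],
    startTime: { dateTime: start, timeZone },
    endTime: { dateTime: end, timeZone },
    availabilityViewInterval: intervalMinutes,
  };
}

// availabilityView digits: 0 free, 1 tentative, 2 busy, 3 out of office, 4 working elsewhere.
// Assumption: only an all-free window counts as open, so existing tentative holds block it too.
function isWindowOpen(availabilityView: string): boolean {
  return /^0+$/.test(availabilityView);
}
```

The tool would POST this body to the mailbox's getSchedule endpoint and pass value[0].availabilityView to isWindowOpen, so "the date is open" is always a statement about the real calendar.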
Tool-use loop capped at 5 iterations. The API route
loops until stop_reason is no longer tool_use,
with a hard cap. The cap bounds runaway cost and the rare case of the
model getting stuck in a tool/think/tool ping-pong. If it ever trips,
the user sees a graceful "our team has been notified" fallback and the
owner gets an SMS.
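A minimal sketch of a capped loop like this one, assuming the Anthropic Messages API content-block shapes (tool_use blocks from the model, tool_result blocks sent back in a user message). The model call, tool handlers, and owner alert are injected so the sketch runs without the SDK; runAgentTurn, notifyOwner, FALLBACK_REPLY, and the choice to report tool errors back with is_error are hypothetical.

```typescript
// Minimal local versions of the content blocks the loop touches.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ContentBlock = TextBlock | ToolUseBlock;
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

interface ModelResponse {
  stop_reason: string;
  content: ContentBlock[];
}

type Message = { role: "user" | "assistant"; content: string | ContentBlock[] | ToolResultBlock[] };
type CallModel = (messages: Message[]) => Promise<ModelResponse>;
type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

const MAX_ITERATIONS = 5;
const FALLBACK_REPLY = "Thanks for your patience! Our team has been notified and will follow up shortly.";

async function runAgentTurn(
  callModel: CallModel,
  tools: Record<string, ToolHandler>,
  history: Message[],
  notifyOwner: (reason: string) => Promise<void>,
): Promise<string> {
  const messages: Message[] = [...history];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const response = await callModel(messages);
    if (response.stop_reason !== "tool_use") {
      // No more tools requested: the text blocks are the reply.
      return response.content
        .filter((block): block is TextBlock => block.type === "text")
        .map((block) => block.text)
        .join("\n");
    }
    messages.push({ role: "assistant", content: response.content });
    const results: ToolResultBlock[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const handler = tools[block.name];
      try {
        if (!handler) throw new Error(`Unknown tool: ${block.name}`);
        results.push({ type: "tool_result", tool_use_id: block.id, content: await handler(block.input) });
      } catch (err) {
        // Report the failure to the model instead of crashing the route.
        results.push({ type: "tool_result", tool_use_id: block.id, content: String(err), is_error: true });
      }
    }
    // Tool results go back to the model in a user message.
    messages.push({ role: "user", content: results });
  }
  // Cap reached: fall back gracefully and alert the owner.
  await notifyOwner(`Tool loop hit the ${MAX_ITERATIONS}-iteration cap`);
  return FALLBACK_REPLY;
}
```

Injecting callModel keeps the cap testable: a scripted fake that always answers with tool_use should produce exactly five model calls, one owner alert, and the fallback reply.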
Pricing in the prompt, not in the model. Pricing lives
in src/content/packages.ts and is injected into the system
prompt at build time. The model never "knows" prices in any durable
sense; it reads them on every turn. Price changes ship in a single PR
and take effect on the first request after that PR deploys.
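A minimal sketch of a single pricing module that feeds both the prompt and the quote builder, assuming flat-priced packages and add-ons. The ids, names, prices, and function names are placeholders, not Boothly's actual packages.ts.

```typescript
// packages.ts (sketch): placeholder names and prices, not the studio's real rates.
export interface Package { id: string; name: string; hours: number; priceUsd: number }
export interface AddOn { id: string; name: string; priceUsd: number }

export const PACKAGES: Package[] = [
  { id: "classic", name: "Classic Booth", hours: 2, priceUsd: 400 },
  { id: "premium", name: "Premium Booth", hours: 3, priceUsd: 600 },
];

export const ADD_ONS: AddOn[] = [
  { id: "guestbook", name: "Scrapbook guestbook", priceUsd: 100 },
  { id: "extra-hour", name: "Extra hour", priceUsd: 150 },
];

// Render the pricing section that gets embedded in the system prompt.
export function pricingPromptSection(): string {
  const pkgLines = PACKAGES.map((p) => `- ${p.name} (${p.id}): ${p.hours} hours, $${p.priceUsd}`);
  const addOnLines = ADD_ONS.map((a) => `- ${a.name} (${a.id}): $${a.priceUsd}`);
  return ["PACKAGES", ...pkgLines, "ADD-ONS", ...addOnLines].join("\n");
}

// The quote builder reads the same arrays, so the prompt and the quote cannot disagree.
export function buildQuote(packageId: string, addOnIds: string[]) {
  const pkg = PACKAGES.find((p) => p.id === packageId);
  if (!pkg) throw new Error(`Unknown package: ${packageId}`);
  const lineItems = [{ label: pkg.name, priceUsd: pkg.priceUsd }];
  for (const id of addOnIds) {
    const addOn = ADD_ONS.find((a) => a.id === id);
    if (!addOn) throw new Error(`Unknown add-on: ${id}`);
    lineItems.push({ label: addOn.name, priceUsd: addOn.priceUsd });
  }
  const totalUsd = lineItems.reduce((sum, item) => sum + item.priceUsd, 0);
  return { lineItems, totalUsd };
}
```

A price edit changes the prompt text and the quoted total together, and an id the model makes up throws instead of producing a guessed line item.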
Server-only credentials. The browser never sees Graph
or Twilio secrets. The chat widget is a 125-line React component
(input box, scrollable transcript, dot-loader) that posts to /api/agent.
All intelligence and all secrets live on the server.
Artifacts I authored
- System prompt: ~140 lines covering brand voice, pricing tables, FAQs, escalation rules, and the anti-hallucination guard, wrapped in ephemeral prompt caching
- Four tool definitions and their TypeScript implementations against Microsoft Graph and Twilio
- Tool-use loop in the Next.js API route with a 5-iteration cap and graceful fallback path
- Pricing source of truth (packages.ts): a single typed file used by both the system prompt and the quote builder
- Branded customer quote email template (HTML) sent from the studio's events mailbox
- Internal lead-summary email and Twilio SMS templates for owner notification, isolated in their own try/catch so internal failures never break the customer experience
- Tentative-hold convention: [HOLD - APPROVAL NEEDED] subject prefix, Boothly Lead Outlook category, lead details in the event body
- Chat widget React component (ChatWidget.tsx): input, transcript, and loading state; deliberately no streaming
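The tentative-hold convention maps directly onto a Microsoft Graph event. A minimal sketch of the payload, assuming the hold is created by POSTing to the studio mailbox's events collection: subject, body, start, end, showAs, and categories are standard Graph event properties, while the Lead fields, buildHoldEvent, and the body layout are hypothetical.

```typescript
interface DateTimeTimeZone {
  dateTime: string;
  timeZone: string;
}

interface HoldEvent {
  subject: string;
  body: { contentType: "text"; content: string };
  start: DateTimeTimeZone;
  end: DateTimeTimeZone;
  showAs: "tentative";
  categories: string[];
}

// Hypothetical lead shape; field names are illustrative.
interface Lead {
  name: string;
  email: string;
  phone: string;
  venue: string;
  packageName: string;
  totalUsd: number;
}

const HOLD_PREFIX = "[HOLD - APPROVAL NEEDED]";
const LEAD_CATEGORY = "Boothly Lead";

function buildHoldEvent(lead: Lead, start: string, end: string, timeZone: string): HoldEvent {
  return {
    subject: `${HOLD_PREFIX} ${lead.name}, ${lead.packageName}`,
    body: {
      contentType: "text",
      content: [
        `Name: ${lead.name}`,
        `Email: ${lead.email}`,
        `Phone: ${lead.phone}`,
        `Venue: ${lead.venue}`,
        `Package: ${lead.packageName}`,
        `Quoted total: $${lead.totalUsd}`,
      ].join("\n"),
    },
    start: { dateTime: start, timeZone },
    end: { dateTime: end, timeZone },
    showAs: "tentative", // stays tentative until the owner confirms the booking
    categories: [LEAD_CATEGORY],
  };
}
```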
Results
Lead capture is now genuinely 24/7. Late-night planners get a full quote and a tentative hold before the lead has time to comparison-shop the next vendor on the list. Every quote is a paper trail: the owner's inbox becomes a lead pipeline by accident, every quote searchable, replyable, and itemized. SMS pings on every quote and hold mean the owner can vet a hot lead from a venue site visit without opening a laptop. And the "tentatively held" guarantee is structurally backed by an actual Outlook event, not by the model's good intentions.
About this case study
The figures on this page are drawn from internal program reporting I authored or co-authored as the practitioner on the engagement. They are reproduced here in rounded form. They were not produced by an independent third party, and proprietary detail has been omitted where required by the engagement.
Lift figures (CSAT, accuracy, handle time, hallucination rate) reflect pre/post comparisons against a matched baseline using the cohort, time window, and measurement instrument noted in the case study. Volume and adoption figures come from production analytics dashboards. Cost figures reflect either avoided spend or unlocked budget in the named fiscal period.
- Boothly Events is a side project I designed and shipped solo (architecture, prompt design, implementation, deployment).
- Latency claim (<60s lead → quote + hold + SMS): measured end-to-end on cached system-prompt turns in production. Conversational turns excluding the initial intake are typically <2s.
- Cost claim ('well under the cost of a single paid-ad click'): based on Anthropic per-token pricing for Sonnet 4.6 with prompt caching on the system prompt across the typical greeting → details → quote → hold → confirmation arc. Estimate, not audited.
- Anti-hallucination guarantee: enforced jointly by a system-prompt rule and the tool-use loop in the API route, not by post-hoc filtering. The guard is structural, not statistical.
- Owner notifications: customer email and internal owner email/SMS are dispatched from separate try/catch blocks so an internal-notification failure never causes the customer-facing send to be reported as failed.
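A minimal sketch of that isolation, assuming the three sends are passed in as async functions (sendQuoteWithNotifications and its result shape are hypothetical). The customer send can still fail the tool call; each internal notification has its own try/catch and only records its error.

```typescript
type Send = () => Promise<void>;

interface QuoteSendResult {
  customerSent: boolean;
  // Internal failures are kept for logging, never reported as a customer-send failure.
  internalErrors: string[];
}

async function sendQuoteWithNotifications(
  sendCustomerQuote: Send,
  sendOwnerSummary: Send,
  sendOwnerSms: Send,
): Promise<QuoteSendResult> {
  // Only the customer-facing send is allowed to reject the tool call.
  await sendCustomerQuote();

  const internalErrors: string[] = [];
  try {
    await sendOwnerSummary();
  } catch (err) {
    internalErrors.push(`owner email: ${String(err)}`);
  }
  try {
    await sendOwnerSms();
  } catch (err) {
    internalErrors.push(`owner sms: ${String(err)}`);
  }
  return { customerSent: true, internalErrors };
}
```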
What I would do differently
Build the evaluation harness alongside the system prompt, not after it. A small set of golden conversations, scored on "did the right tool get called with the right arguments," would have caught two prompt regressions earlier than read-aloud testing did. The other note to self: streaming would have been worth the extra complexity from day one. The agent's first-token latency is fine, but a streamed response feels twice as fast even when it is not, and feel is the product on a booking site.
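A minimal sketch of the scoring half of such a harness, assuming each golden conversation stores the tool calls the agent should make. The types, the strict match rule (same tools, same order, same arguments, key order ignored), and the pass-rate summary are hypothetical choices; replaying the conversations against the live agent is left out.

```typescript
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

interface GoldenCase {
  id: string;
  expected: ToolCall[]; // tool calls the agent should make, in order
}

// Canonical JSON with sorted object keys, so argument key order does not matter.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonical(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// A case passes only if the same tools were called, in order, with the same arguments.
function scoreCase(golden: GoldenCase, actual: ToolCall[]): boolean {
  return (
    golden.expected.length === actual.length &&
    golden.expected.every(
      (call, i) => call.name === actual[i].name && canonical(call.input) === canonical(actual[i].input),
    )
  );
}

function passRate(results: { golden: GoldenCase; actual: ToolCall[] }[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => scoreCase(r.golden, r.actual)).length / results.length;
}
```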
The reusable pattern is not photo-booth-specific. Any service business with three traits drops into the same shape: schedulable capacity that already lives in a system of record, a pricing model that can be expressed as packages plus add-ons plus rules, and an owner who currently triages leads from a phone. The four-tool template — check capacity, hold capacity, send proposal, escalate — covers wedding photographers, dog trainers, mobile detailers, tutoring services, mobile bartending, event rental, lawn care. Swap the calendar source. Swap the quote schema. Swap the brand voice. The agent loop is the same.
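One way to express the template's swap points in TypeScript, as a hypothetical sketch rather than the project's actual interface: each business supplies its own lead and quote types plus the four operations, while the loop that calls them stays unchanged. The in-memory adapter stands in for a real system-of-record calendar and is only meant for tests.

```typescript
interface CapacityWindow {
  start: string;
  end: string;
}

// Hypothetical interface: the four template operations, generic over lead and quote types.
interface ServiceAgentTemplate<Lead, Quote> {
  voicePrompt: string; // brand-voice section of the system prompt
  checkCapacity(slot: CapacityWindow): Promise<boolean>;
  holdCapacity(slot: CapacityWindow, lead: Lead): Promise<string>; // returns a hold id
  sendProposal(lead: Lead, quote: Quote): Promise<void>;
  escalate(reason: string, lead: Lead): Promise<void>;
}

// In-memory stand-in for a real calendar; ISO strings in one time zone compare correctly as text.
function inMemoryTemplate<Lead, Quote>(
  voicePrompt: string,
): ServiceAgentTemplate<Lead, Quote> & { holds: CapacityWindow[]; escalations: string[] } {
  const holds: CapacityWindow[] = [];
  const escalations: string[] = [];
  const overlaps = (a: CapacityWindow, b: CapacityWindow) => a.start < b.end && b.start < a.end;
  return {
    voicePrompt,
    holds,
    escalations,
    async checkCapacity(slot) {
      return !holds.some((h) => overlaps(h, slot));
    },
    async holdCapacity(slot) {
      holds.push(slot);
      return `hold-${holds.length}`;
    },
    async sendProposal() {},
    async escalate(reason) {
      escalations.push(reason);
    },
  };
}
```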
Collaborators
Built solo, in close partnership with the studio owner on brand voice, pricing rules, and the operational guardrails for what the agent should and should not commit to. The owner remains the final authority on every booking — the agent only places tentative holds, every one of which is reviewed and confirmed by a human before it becomes a real booking.
Skills demonstrated
- Tool-use agent design (Anthropic SDK)
- Prompt architecture with prompt caching
- System-level anti-hallucination guardrails
- Microsoft Graph integration (calendar + mail)
- Twilio SMS notification design
- Next.js 16 App Router + server-only secrets
- Single-source-of-truth pricing modules
- Brand-voice tuning by example
- Cost-aware LLM engineering
- Owner-facing operational design
- Reusable agent templates for service businesses