Technical Deep Dive

100% Cloudflare-native.
Zero origin servers.

No AWS. No Vercel. No containers. Every component — inference, storage, sessions, queues — runs on Cloudflare's global edge across 300+ points of presence. Here's exactly how it works.

Platform Architecture

Every request enters through Cloudflare Workers and stays on the edge. No round-trips to a centralised cloud region.

Ingress Channels
Twilio Voice (voice calls) · Resend / Email Routing (inbound email) · Twilio SMS (text messages) · Web Chat (browser widget)
↓ Webhooks
Worker Layer
Channel Router (Hono middleware) · AI Gateway (cache / rate-limit / fallback) · Agent Orchestrator (Durable Object)
↓ Function Calling
AI + Tool Chain
Kimi K2.6 (LLM inference) · Whisper v3 (STT) · ElevenLabs (TTS) · check_availability, quote_price, book_slip (tools)
↓ Read / Write
Storage Layer
D1 (relational, SQLite) · KV (config & rate cards) · R2 (contracts & recordings) · Vectorize (RAG over policies) · Queues (async tasks)
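
As a rough sketch of the Channel Router in Hono, each ingress channel gets its own webhook route on the Worker (route paths, the Media Stream URL, and the stub responses below are illustrative assumptions, not the production implementation):

import { Hono } from 'hono'

const app = new Hono()

// Voice: answer the call and ask Twilio to open a Media Stream
// WebSocket back to this Worker (URL is a placeholder).
app.post('/webhooks/twilio/voice', (c) => {
  const twiml =
    '<Response><Connect><Stream url="wss://example.invalid/voice/stream" /></Connect></Response>'
  return c.text(twiml, 200, { 'Content-Type': 'text/xml' })
})

// SMS, email, and web chat land on their own routes and are handed
// to the same Agent Orchestrator downstream.
app.post('/webhooks/twilio/sms', (c) => c.text('ok'))
app.post('/webhooks/email', (c) => c.text('ok'))
app.post('/api/chat', (c) => c.json({ reply: 'pending' }))

export default app
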
Workers AI
Inference

Kimi K2.6 inference — MoE model with native function-calling. Runs on Cloudflare GPUs at the edge with zero cold starts.

D1 Database
Relational

SQLite at the edge. 7 tables with multi-tenant indexes, foreign keys, and CHECK constraints. Global replication.

Durable Objects
Sessions

Stateful agent sessions per (marina_id, conversation_id). Maintains context across multi-turn voice conversations.
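
A minimal sketch of what such a session object could look like (class name, binding, and stored shape are assumptions for illustration):

// One Durable Object instance per conversation; the Workers runtime
// routes every turn for that conversation to the same instance.
export class AgentSession {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    // Append the new turn to the stored history and return the context.
    const turns = (await this.state.storage.get<string[]>('turns')) ?? []
    const body = (await request.json()) as { text: string }
    turns.push(body.text)
    await this.state.storage.put('turns', turns)
    return Response.json({ turns })
  }
}

// Caller side: deriving the object ID from (marina_id, conversation_id)
// guarantees the same conversation always reaches the same state.
// const id = env.AGENT_SESSION.idFromName(`${marinaId}:${conversationId}`)
// const stub = env.AGENT_SESSION.get(id)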

Workers KV
Config

Globally replicated key-value store for hot-path data: rate cards, agent configs. Sub-ms reads from 300+ PoPs.

R2 Storage
Files

S3-compatible object storage for contracts (PDF), call recordings, and email attachments. Zero egress fees.

Vectorize
RAG

Vector database for RAG over marina policies, FAQ docs, and historical interactions. Enables policy citation.

AI Gateway
Gateway

Sits in front of all LLM calls. Response caching, rate limiting, cost tracking, and automatic fallback routing.

Queues
Async

Async task processing: DockMaster sync, email dispatch, Slack notifications, contract generation.
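
A hedged sketch of one such task flowing through Queues (the binding, message shape, and handler names are assumptions):

// Producer: enqueue the DockMaster sync instead of blocking the
// conversation on a third-party API call.
type SyncMessage = { kind: 'dockmaster_sync'; marinaId: string; bookingId: string }

export async function enqueuePmsSync(queue: Queue<SyncMessage>, marinaId: string, bookingId: string) {
  await queue.send({ kind: 'dockmaster_sync', marinaId, bookingId })
}

// Consumer: the Worker's queue handler processes batches out of band,
// acking successes and retrying failures individually.
export const consumer = {
  async queue(batch: MessageBatch<SyncMessage>) {
    for (const msg of batch.messages) {
      try {
        // ...call DockMaster, flip dm_synced in D1, post to Slack...
        msg.ack()
      } catch {
        msg.retry()
      }
    }
  },
}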

AI Stack

Three models working together: one thinks, one listens, one speaks.

Kimi K2.6

@cf/moonshotai/kimi-k2.6

Primary reasoning model. MoE architecture activates only relevant expert sub-networks per token, keeping inference cost low at high volume.

Architecture: MoE — 1T total, ~32B active
Context: 128K tokens
Function calling: Native (not prompt-injected)
Temperature: 0.2 (low creativity for reliability)
Tool chain: 5 tools in booking sequence
Fallback: Llama 3.3 70B via AI Gateway
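
In Worker terms, the primary/fallback call might look roughly like this, routed through AI Gateway (the binding is typed loosely here, and the gateway name and fallback model ID are assumptions):

// Loosely typed Workers AI binding so the sketch stands alone.
type AiBinding = {
  run(model: string, inputs: Record<string, unknown>, options?: Record<string, unknown>): Promise<unknown>
}

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

export async function reason(ai: AiBinding, messages: ChatMessage[]) {
  // Route through AI Gateway for caching, rate limits, and cost tracking.
  const options = { gateway: { id: 'harbourmaster-gateway' } }

  try {
    // Primary: Kimi K2.6 at low temperature for reliable tool use.
    return await ai.run('@cf/moonshotai/kimi-k2.6', { messages, temperature: 0.2 }, options)
  } catch {
    // Fallback: Llama 3.3 70B when the primary model is unavailable.
    return await ai.run('@cf/meta/llama-3.3-70b-instruct-fp8-fast', { messages, temperature: 0.2 }, options)
  }
}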

Whisper Large v3 Turbo

@cf/openai/whisper-large-v3-turbo

Speech-to-text for all voice interactions. Processes Twilio Media Stream audio chunks in near real-time with speaker diarisation.

Latency: < 300ms per chunk
Languages: 99+ (English primary)
Input: Twilio Media Streams (mulaw/8kHz)
Features: Timestamps, confidence scores
Runtime: Cloudflare Workers AI GPUs

ElevenLabs TTS

Primary + Workers AI fallback

Text-to-speech for voice responses. ElevenLabs for production-quality voices; Workers AI MeloTTS as a low-latency fallback that stays on the edge.

Primary: ElevenLabs (configurable voice)
Fallback: Workers AI MeloTTS
Latency: < 200ms first byte
Output: PCM/mulaw streamed to Twilio
Voice IDs: Per-marina configurable

5-Tool Booking Chain

Kimi K2.6 calls these tools via native function-calling. The model receives JSON-schema definitions and returns structured tool_call objects.

1. check_availability
Queries D1 for matching slips with date-overlap exclusion and vessel dimension filtering.

2. quote_price
Deterministic pricing engine: base × season × DOW × occupancy × events. Never LLM-generated.

3. draft_contract
Generates PDF rental agreement from template, stores in R2, returns DocuSign e-sign link.

4. take_payment
Creates Stripe Checkout session with booking amount. Returns payment link to guest.

5. book_slip
Writes confirmed booking to D1, syncs to DockMaster PMS via API, logs audit event.

Escape hatch: escalate_to_human

A 6th tool the agent can call at any point to route the conversation to a human. Triggered automatically when confidence drops below threshold, dollar cap is exceeded, or max turns is reached.
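
As a sketch of the shapes involved, a couple of the tool definitions and the Worker-side dispatch might look like this (parameter schemas are abbreviated and the dispatch helper is an assumption):

// JSON-schema style definitions the model receives. Two of the six
// tools shown; parameters trimmed for brevity.
const tools = [
  {
    name: 'check_availability',
    description: 'Find open slips for a date range and vessel size',
    parameters: {
      type: 'object',
      properties: {
        start_date: { type: 'string' },
        end_date: { type: 'string' },
        length_ft: { type: 'number' },
      },
      required: ['start_date', 'end_date', 'length_ft'],
    },
  },
  {
    name: 'escalate_to_human',
    description: 'Hand the conversation to marina staff',
    parameters: { type: 'object', properties: { reason: { type: 'string' } } },
  },
]

// The model returns structured tool calls; the Worker executes them
// and feeds the results back into the next turn.
type ToolCall = { name: string; arguments: Record<string, unknown> }

async function dispatch(call: ToolCall): Promise<unknown> {
  switch (call.name) {
    case 'check_availability':
      return { slips: [] } // would query D1 here
    case 'escalate_to_human':
      return { escalated: true } // would notify staff via Slack
    default:
      throw new Error(`Unknown tool: ${call.name}`)
  }
}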

Data Layer — D1 Schema

7 tables, all scoped by marina_id for strict multi-tenant isolation.

marinas: Tenant root table. One row per marina property.
Columns: id TEXT PK, name TEXT, timezone TEXT, address TEXT, lat REAL / lng REAL, total_slips INTEGER, dm_api_endpoint TEXT

slips: Physical slip inventory with dimensions and amenities.
Columns: id TEXT PK, marina_id TEXT FK, slip_no TEXT, dock_section TEXT, length_ft / beam_ft / depth_ft, has_power_30a / has_power_50a, has_water / has_wifi, status ENUM

rate_cards: Pricing configuration with JSON curve definitions.
Columns: id TEXT PK, marina_id TEXT FK, base_rate_json, season_curve_json, dow_curve_json, event_premiums_json, occupancy_curve_json, cancellation_policy_json

agent_configs: Per-marina AI agent personality and guardrails.
Columns: marina_id TEXT FK, system_prompt TEXT, greeting_message TEXT, voice_id TEXT, dollar_cap_per_booking INT, confidence_threshold REAL, max_turns_before_escalation INT, escalation_rules_json

inquiries: Every inbound interaction across all channels.
Columns: id TEXT PK, marina_id TEXT FK, channel ENUM, caller_info TEXT, transcript_text TEXT, confidence_score REAL, status ENUM, assigned_to TEXT

bookings: Confirmed reservations with PMS sync status.
Columns: id TEXT PK, marina_id TEXT FK, slip_id TEXT FK, inquiry_id TEXT FK, guest_name / guest_email / vessel_name, start_ts / end_ts DATETIME, price_cents INT, dm_synced BOOLEAN, agent_attributed BOOLEAN

events: Full audit trail — every action the agent takes.
Columns: id INTEGER PK AUTOINCREMENT, marina_id TEXT FK, inquiry_id TEXT FK, event_type TEXT, actor TEXT, detail_json TEXT, ts DATETIME

Multi-Tenancy Rule

Every D1 table, KV key prefix, Vectorize namespace, R2 bucket prefix, and Durable Object ID includes marina_id as a scoping dimension. Zero cross-tenant data leakage by design.
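
For example, a check_availability-style lookup would bind marina_id on every table it touches and exclude overlapping bookings (column names follow the schema above; the status value and exact query are illustrative assumptions):

// A slip qualifies only if no booking for the same marina and slip
// overlaps the requested [start, end) window.
export async function findOpenSlips(
  db: D1Database,
  marinaId: string,
  startTs: string,
  endTs: string,
  lengthFt: number
) {
  return db
    .prepare(
      `SELECT s.id, s.slip_no, s.dock_section
         FROM slips s
        WHERE s.marina_id = ?
          AND s.status = 'available'
          AND s.length_ft >= ?
          AND NOT EXISTS (
                SELECT 1
                  FROM bookings b
                 WHERE b.marina_id = ?
                   AND b.slip_id = s.id
                   AND b.start_ts < ?  -- existing stay begins before the requested end
                   AND b.end_ts > ?    -- and ends after the requested start: overlap
              )`
    )
    .bind(marinaId, lengthFt, marinaId, endTs, startTs)
    .all()
}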

Deterministic Pricing Engine

The LLM never generates prices. Every dollar amount comes from this formula, executed deterministically on the Worker.

// Final price calculation
total = base_rate × vessel_length × nights
        × season_multiplier
        × avg(dow_multipliers)
        × occupancy_multiplier
        × event_premium
        + add_ons
Season Curve: Peak (Dec-Mar) 1.55× · Shoulder (Apr-May, Oct-Nov) 1.20× · Off-Peak (Jun-Sep) 0.80×
Day of Week: Mon–Wed 1.00× · Thu 1.05× · Fri 1.15× · Sat 1.25× · Sun 1.10×
Occupancy: > 90% 1.35× · > 80% 1.15× · > 70% 1.00× · < 60% 0.85× (fill incentive)
Events: FLIBS (Oct) 2.00× · Winterfest Parade (Dec) 1.50× · July 4th 1.40×
Base Rates: Nightly $2.75/ft · Weekly $16.50/ft · Monthly $45.00/ft
Add-Ons: 30A power $15/night · 50A power $25/night · WiFi $8 · Pump-out $50

Why deterministic?

LLMs are great at conversation but unreliable at arithmetic. A hallucinated price creates legal liability and erodes guest trust. By running pricing as a pure function on the Worker, the agent can confidently quote exact rates that match your published rate card.
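
A condensed sketch of that pure function, using the multipliers listed above (the input shape is an assumption; real rate-card values come from KV/D1):

type QuoteInput = {
  baseRatePerFt: number        // e.g. 2.75 for nightly
  vesselLengthFt: number
  nights: number
  seasonMultiplier: number     // 0.80 to 1.55 from the season curve
  dowMultipliers: number[]     // one entry per night, 1.00 to 1.25
  occupancyMultiplier: number  // 0.85 to 1.35
  eventPremium: number         // 1.00 when no event applies
  addOns: number               // power, WiFi, pump-out, in dollars
}

// Pure function: the same inputs always produce the same price, so a
// quoted rate always matches the published rate card.
export function quotePrice(q: QuoteInput): number {
  const avgDow = q.dowMultipliers.reduce((sum, m) => sum + m, 0) / q.dowMultipliers.length
  const total =
    q.baseRatePerFt * q.vesselLengthFt * q.nights *
    q.seasonMultiplier * avgDow * q.occupancyMultiplier * q.eventPremium +
    q.addOns
  return Math.round(total * 100) / 100 // round to cents
}

// 40 ft vessel, 3 peak-season weeknights, ~75% occupancy, no event:
// 2.75 × 40 × 3 × 1.55 × 1.00 × 1.00 × 1.00 + 0 = $511.50
quotePrice({
  baseRatePerFt: 2.75, vesselLengthFt: 40, nights: 3,
  seasonMultiplier: 1.55, dowMultipliers: [1.0, 1.0, 1.0],
  occupancyMultiplier: 1.0, eventPremium: 1.0, addOns: 0,
})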

Guardrails & Security

Production AI needs more than vibes. These are hard constraints, not suggestions.

Agent Guardrails

Dollar cap — Max booking value before auto-escalation. Default: $15,000.
Confidence threshold — Below this score, the agent escalates. Default: 0.75.
Max turns — Turn limit before forcing human handoff. Default: 20.
Deterministic pricing — Prices always from rate card function, never LLM-generated.
Double-booking lock — D1 query checks date overlaps before any reservation.
Out-of-policy detection — Liveaboards, groups, insurance — auto-routed to staff.
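
A minimal sketch of how these caps could be checked before any reply reaches the guest (the config mirrors the agent_configs fields and defaults above; the function itself is an assumption):

type GuardrailConfig = {
  dollarCapPerBooking: number      // default 15_000 (USD)
  confidenceThreshold: number      // default 0.75
  maxTurnsBeforeEscalation: number // default 20
}

type TurnState = { quoteCents: number; confidence: number; turns: number }

// Returns the reason to escalate, or null if the agent may proceed.
export function shouldEscalate(cfg: GuardrailConfig, s: TurnState): string | null {
  if (s.quoteCents > cfg.dollarCapPerBooking * 100) return 'dollar_cap_exceeded'
  if (s.confidence < cfg.confidenceThreshold) return 'low_confidence'
  if (s.turns >= cfg.maxTurnsBeforeEscalation) return 'max_turns_reached'
  return null
}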

Infrastructure Security

Google OAuth SSO — OAuth 2.0 sign-in with JWT session cookies on all dashboard routes.
API tokens as secrets — Stripe, Twilio, ElevenLabs keys stored as Cloudflare Secrets.
Tenant isolation — All data paths include marina_id. No shared-namespace leaks.
AI Gateway — Rate-limits LLM calls per tenant. Prevents runaway token spend.
Audit trail — Every agent action logged with actor, timestamp, and detail JSON.
Data residency — D1 replication stays within configured jurisdiction.

Voice Pipeline — End to End

From phone ring to spoken response in under 1.5 seconds.

Twilio (~100ms): Guest calls the marina number. Twilio opens a Media Stream WebSocket.
Worker (~5ms): Channel Router receives audio chunks via WebSocket on the Cloudflare edge.
Whisper (~300ms): Audio chunks transcribed to text in near real-time. Partial results streamed.
Kimi K2.6 (~600ms): Transcript → agent reasoning → tool calls → response text.
ElevenLabs (~200ms): Response text → natural speech audio. First byte in < 200ms.
Twilio (~100ms): Audio streamed back to the caller via Media Stream.
Total roundtrip: < 1.5 seconds (vs. 8-15s for typical IVR + hold queue)
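
On the Worker side, the hot path is roughly: accept Twilio's Media Stream WebSocket, buffer inbound audio frames for transcription, and stream synthesized audio back over the same socket. A heavily simplified sketch (frame handling and function names are assumptions):

// Upgrade Twilio's Media Stream request to a WebSocket pair at the edge.
export function handleMediaStream(request: Request): Response {
  const pair = new WebSocketPair()
  const client = pair[0]
  const server = pair[1]
  server.accept()

  const inboundAudio: string[] = [] // base64 mulaw payloads from the caller

  server.addEventListener('message', (event) => {
    const frame = JSON.parse(event.data as string)

    if (frame.event === 'media') {
      // Caller audio chunk: buffer it for Whisper transcription.
      inboundAudio.push(frame.media.payload)
    }

    if (frame.event === 'stop') {
      server.close()
    }
  })

  // Outbound direction (not shown): TTS audio is written back as
  // { event: 'media', media: { payload: <base64 mulaw> } } frames.

  return new Response(null, { status: 101, webSocket: client })
}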

Full Stack Reference

Everything that powers Harbourmaster AI, in one table.

Layer · Technology · Purpose
Framework · Hono 4 · Lightweight, fast web framework for Workers
Build · Vite + @hono/vite-build · SSR bundle for Cloudflare Pages
Runtime · Cloudflare Workers · V8 isolates at 300+ global PoPs
LLM · Kimi K2.6 (MoE) · Reasoning + native function-calling
STT · Whisper Large v3 Turbo · Real-time speech transcription
TTS · ElevenLabs / MeloTTS · Natural voice synthesis
Database · Cloudflare D1 (SQLite) · Relational data, multi-tenant
KV Store · Cloudflare Workers KV · Config, rate cards, session cache
Object Storage · Cloudflare R2 · Contracts, recordings, attachments
Vector DB · Cloudflare Vectorize · RAG over marina policies
Sessions · Durable Objects · Stateful multi-turn agent sessions
Gateway · Cloudflare AI Gateway · LLM caching, rate limits, fallback
Queues · Cloudflare Queues · Async PMS sync, notifications
Auth · Google OAuth 2.0 + JWT · SSO for dashboard with session cookies
Voice · Twilio Media Streams · Telephony ingress/egress
Email · Resend + CF Email Routing · Inbound/outbound email
SMS · Twilio Messaging · Text message channel
Payments · Stripe Checkout · Guest payment collection
Contracts · DocuSign · E-signature for rental agreements
PMS · DockMaster API · Property management sync
Alerts · Slack API · Staff notifications & escalations
Frontend · Tailwind CSS + Space Grotesk · Utility-first styling, Abyssal Intelligence theme
TypeScript · ES2022 target · Type-safe Workers code

Want to kick the tyres?

Open the dashboard, try the live chat, explore the API. Everything's running.