Voice AI (Cloudflare Neuron)

The Cloudflare Stack Behind Voice

The 🎙️ voice button opens a real-time, full-duplex conversation with ProposalForge. Speech-to-text, the reasoning model, and text-to-speech are orchestrated by a stateful agent running on the edge — no phone numbers, no native app, just the browser microphone. Here is exactly what powers it:

Layer	Cloudflare Technology	Role in Voice
🧠 AI compute	Workers AI (Neurons)	Runs the speech + language models on Cloudflare's edge GPUs, metered in Neurons.
🎤 Speech-to-Text	`@cf/deepgram/flux`	Continuous streaming STT with built-in turn detection — transcribes you as you speak.
🔊 Text-to-Speech	`@cf/deepgram/aura-1`	Converts the assistant's reply into natural spoken audio, streamed back to your browser.
🗣️ Voice runtime	@cloudflare/voice SDK	Wires the microphone stream → STT → LLM → TTS pipeline and handles barge-in / interruptions.
📌 Session state	Durable Objects	One `VoiceAgent` instance per call holds the live conversation, auth, and audio session.
🧩 Reasoning (LLM)	Google Gemini → `@cf/google/gemma-4-26b-a4b-it`	Gemini is the primary model; Workers AI Gemma is the automatic fallback, so the call keeps going if Gemini is unavailable.
🔧 Tools / data	MCP Server over HTTP	Lets the voice agent list, create, send, and export your proposals as PDF, with every call scoped to your token.
🌐 Edge routing	Workers (proxy)	A proxy Worker serves the assistant on a clean branded URL at the network edge.

How a Single Voice Turn Flows

🎤 You speak → Flux STT (Neurons) → LLM reasons + calls tools → Aura TTS (Neurons) → 🔊 You hear the reply
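
To make the order of the stages concrete, here is a minimal sketch of one turn written as three injected functions. The names `runTurn`, `TurnStages`, `transcribe`, `reason`, and `synthesize` are hypothetical and are not part of the @cloudflare/voice SDK, which wires these stages internally. The real pipeline also streams audio continuously, whereas this sketch runs the stages one after another for clarity.

```typescript
// Hypothetical stage signatures, for illustration only.
interface TurnStages {
  transcribe: (audio: Uint8Array) => Promise<string>;                  // STT
  reason: (transcript: string, history: string[]) => Promise<string>;  // LLM + tools
  synthesize: (text: string) => Promise<Uint8Array>;                   // TTS
}

// Runs one voice turn: audio in -> transcript -> reply text -> audio out.
// The history array is updated in place so later turns can see earlier ones.
async function runTurn(
  audio: Uint8Array,
  history: string[],
  stages: TurnStages,
): Promise<Uint8Array> {
  const transcript = await stages.transcribe(audio);
  const reply = await stages.reason(transcript, history);
  history.push(`user: ${transcript}`, `assistant: ${reply}`);
  return stages.synthesize(reply);
}
```

Passing the stages in as plain functions keeps the sketch independent of any particular model, and makes it easy to substitute stubs when testing.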

The Voice Agent, in Code

The whole pipeline is declared on a Durable Object. The transcriber and TTS are bound to Workers AI, and therefore metered in Neurons, through this.env.AI:

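import { Agent } from "agents";
import {
  withVoice,
  WorkersAIFluxSTT,
  WorkersAITTS,
} from "@cloudflare/voice";

// Assumption: the original listing omits the base class. The SDK's withVoice
// mixin over the Agents SDK base class is taken here as what makes this a
// Durable Object and supplies this.env.
class VoiceAgent extends withVoice(Agent) {
  // Speech-to-text and text-to-speech run on Workers AI, metered in Neurons
  transcriber = new WorkersAIFluxSTT(this.env.AI);   // 🎤 @cf/deepgram/flux
  tts         = new WorkersAITTS(this.env.AI);        // 🔊 @cf/deepgram/aura-1

  async onTurn(transcript, context) {
    // Reasoning: Google Gemini primary, Workers AI Gemma fallback
    // Tools (list / create / send / PDF) are fetched from the MCP server
    // over HTTP, scoped to the signed-in user's token.
    // this.think is the app's own reasoning helper (not shown here).
    return await this.think(transcript, context);
  }
}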

Why Build Voice on Neurons

⚡ Edge Latency

Models run on Cloudflare's network, close to the Worker that handles the call, so there is no round-trip to a distant GPU cloud and replies come back with little perceptible delay.

🧠 One Currency: Neurons

STT and TTS are billed in Neurons, as is the LLM step whenever it falls back to Workers AI Gemma. The free tier includes a daily Neuron allowance, which can cover light everyday voice use at zero cost.

📌 Stateful Calls

Durable Objects keep each conversation alive with full context, so the assistant remembers what you said earlier in the call.
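
One way to picture "one instance per call" is a deterministic mapping from a call identifier to a Durable Object. The sketch below uses small structural types that mirror the `idFromName` and `get` methods of a Durable Object namespace. The helper `stubForCall` and the `userId:callId` naming scheme are assumptions for illustration, and the actual routing may be handled by the SDK rather than by hand.

```typescript
// Minimal structural types mirroring the two namespace methods used here.
interface ObjectIdLike {
  toString(): string;
}

interface NamespaceLike<Stub> {
  idFromName(name: string): ObjectIdLike;
  get(id: ObjectIdLike): Stub;
}

// Hypothetical helper: the same (userId, callId) pair always resolves to the
// same object, so every message in a call reaches the same conversation state.
function stubForCall<Stub>(
  ns: NamespaceLike<Stub>,
  userId: string,
  callId: string,
): Stub {
  return ns.get(ns.idFromName(`${userId}:${callId}`));
}
```

Because the name is derived from the call, a reconnecting browser lands on the same instance and the earlier context is still there.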

🔐 Token-Scoped Tools

The agent can only touch your proposals — every MCP tool call carries your JWT, with no shared database binding.
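
As an illustration of what "every MCP tool call carries your JWT" could look like on the wire, this sketch builds a JSON-RPC `tools/call` request with a bearer token. The function `buildToolCall`, the tool name `list_proposals`, and the URL are hypothetical. Only the JSON-RPC 2.0 envelope and the `tools/call` method come from the MCP specification, and the real server may expect different headers or arguments.

```typescript
// Plain description of an HTTP request, so the sketch needs no fetch types.
interface ToolCallRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds an MCP tools/call request scoped to one user's token.
function buildToolCall(
  serverUrl: string,
  jwt: string,
  tool: string,
  args: Record<string, unknown>,
  id = 1,
): ToolCallRequest {
  return {
    url: serverUrl,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${jwt}`, // the server authorizes per user from this token
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id,
      method: "tools/call",
      params: { name: tool, arguments: args },
    }),
  };
}

// Hypothetical usage: the tool name, arguments, and URL are placeholders.
const exampleCall = buildToolCall(
  "https://example.com/mcp",
  "USER_JWT",
  "list_proposals",
  { status: "draft" },
);
```

Since the token travels with each request, the server can enforce access per user without the agent holding a direct database binding.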

🔁 Resilient Reasoning

If the primary model provider (Gemini) is unavailable, the agent automatically falls back to Workers AI Gemma so the conversation keeps going.
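
The fallback behaviour can be sketched as a small wrapper that tries the primary model and switches to the backup on an error or a timeout. The names `reasonWithFallback`, `withTimeout`, and `Model`, and the 8-second default, are assumptions chosen for illustration. The source does not say how the agent detects that a provider is unavailable.

```typescript
// A model is anything that turns a prompt into reply text.
type Model = (prompt: string) => Promise<string>;

// Rejects if the promise does not settle within ms milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Tries the primary model first; on any failure or timeout, uses the fallback.
async function reasonWithFallback(
  prompt: string,
  primary: Model,
  fallback: Model,
  timeoutMs = 8000, // hypothetical budget for the primary provider
): Promise<{ text: string; usedFallback: boolean }> {
  try {
    const text = await withTimeout(primary(prompt), timeoutMs);
    return { text, usedFallback: false };
  } catch {
    const text = await fallback(prompt);
    return { text, usedFallback: true };
  }
}
```

Returning `usedFallback` alongside the text lets the caller log how often the backup model is actually serving replies.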

🌍 No Servers to Run

The entire voice stack is serverless and globally distributed — no GPU instances to provision, patch, or scale.

Understanding the Neuron Free Tier

💜 Daily allowance: Workers AI includes a free daily pool of Neurons, resetting every day at 00:00 UTC.
🎤 STT & 🔊 TTS: Each second of audio in and out draws from that Neuron pool.
🧩 Reasoning offload: Routing the LLM step to Gemini keeps more of your Neuron budget available for the speech models.
🚀 Scale up anytime: Upgrading Workers AI to a paid plan lifts the daily cap for uninterrupted, high-volume voice.
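
A rough budget estimate follows directly from the points above. In this sketch the per-second rates are placeholders, not published prices, and the daily allowance of 10,000 Neurons is an assumption taken from Cloudflare's pricing page at the time of writing, so check the current documentation before relying on either. The function names are hypothetical.

```typescript
// Placeholder rates: substitute the real per-model Neuron costs.
interface NeuronRates {
  sttPerAudioSecond: number; // Neurons per second of speech transcribed
  ttsPerAudioSecond: number; // Neurons per second of speech synthesized
}

// Neurons consumed by the speech models for one call.
function neuronsForCall(
  secondsIn: number,
  secondsOut: number,
  rates: NeuronRates,
): number {
  return secondsIn * rates.sttPerAudioSecond + secondsOut * rates.ttsPerAudioSecond;
}

// Whole calls that fit inside the daily allowance.
function callsPerDay(dailyAllowance: number, neuronsPerCall: number): number {
  if (neuronsPerCall <= 0) {
    throw new RangeError("neuronsPerCall must be positive");
  }
  return Math.floor(dailyAllowance / neuronsPerCall);
}

// Example with made-up rates: 60 s of speech in, 30 s of speech out.
const perCall = neuronsForCall(60, 30, { sttPerAudioSecond: 1, ttsPerAudioSecond: 2 });
const dailyCalls = callsPerDay(10_000, perCall); // 83 with these placeholder rates
```

If the LLM step falls back to Workers AI Gemma, its Neurons would need to be added to the per-call total as well.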


🧠 What is a Neuron?