🎙️ Voice AI on Cloudflare Neurons

Talk to ProposalForge in plain English, powered by Cloudflare's Workers AI (metered in Neurons), Durable Objects, and the @cloudflare/voice SDK.

🧠 What is a Neuron?

A Neuron is Cloudflare's unit of AI compute and the metering currency for Workers AI. Every transcription, spoken reply, and model inference that runs on Workers AI consumes Neurons. ProposalForge's voice assistant runs its real-time speech pipeline (speech-to-text and text-to-speech) on Workers AI at the edge. There are no separate GPU servers to manage, and the free tier covers everyday use.

The Cloudflare Stack Behind Voice

The 🎙️ voice button opens a real-time, full-duplex conversation with ProposalForge. Speech-to-text, the reasoning model, and text-to-speech are orchestrated by a stateful agent running on the edge — no phone numbers, no native app, just the browser microphone. Here is exactly what powers it:

| Layer | Cloudflare Technology | Role in Voice |
|---|---|---|
| 🧠 AI compute | Workers AI (Neurons) | Runs the speech and language models on Cloudflare's edge GPUs, metered in Neurons. |
| 🎤 Speech-to-Text | @cf/deepgram/flux | Continuous streaming STT with built-in turn detection. It transcribes you as you speak. |
| 🔊 Text-to-Speech | @cf/deepgram/aura-1 | Converts the assistant's reply into natural spoken audio, streamed back to your browser. |
| 🗣️ Voice runtime | @cloudflare/voice SDK | Wires the microphone stream → STT → LLM → TTS pipeline and handles barge-in (interrupting the assistant mid-reply). |
| 📌 Session state | Durable Objects | One VoiceAgent instance per call holds the live conversation, auth, and audio session. |
| 🧩 Reasoning (LLM) | Google Gemini (primary), @cf/google/gemma-4-26b-a4b-it (fallback) | Gemini answers by default. Workers AI Gemma takes over automatically if Gemini is unavailable. |
| 🔧 Tools / data | MCP server over HTTP | Lets the voice agent list, create, and send your proposals and export them as PDFs. Access is scoped to your token. |
| 🌐 Edge routing | Workers (proxy) | A proxy Worker serves the assistant on a clean branded URL at the network edge. |
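Here is a minimal sketch of a proxy Worker for the edge-routing row. It assumes you choose both the branded hostname and the upstream Worker URL. The UPSTREAM_ORIGIN value below is hypothetical. The proxy rewrites only the scheme, hostname, and port, and keeps the path, query string, method, and headers.

```typescript
// Hypothetical upstream: where the VoiceAgent Worker actually runs.
// This value is an assumption for illustration; substitute your own.
const UPSTREAM_ORIGIN = "https://voice-agent.example.workers.dev";

// Point an incoming request on the branded hostname at the upstream Worker.
// Path, query string, method, and headers are preserved.
export function toUpstream(request: Request, upstreamOrigin: string = UPSTREAM_ORIGIN): Request {
  const target = new URL(request.url);
  const upstream = new URL(upstreamOrigin);
  target.protocol = upstream.protocol;
  target.hostname = upstream.hostname;
  // Set the port explicitly: the URL host setters keep an existing port otherwise.
  target.port = upstream.port;
  // Passing the original Request as init copies its method, headers, and body.
  return new Request(target.toString(), request);
}

export default {
  async fetch(request: Request): Promise<Response> {
    // In the Workers runtime, fetch() also forwards WebSocket upgrade
    // requests, so a voice socket can pass through unchanged.
    return fetch(toUpstream(request));
  },
};
```

The sketch rewrites fields on a parsed URL instead of concatenating the path onto the upstream origin. With concatenation, a path that begins with // would be parsed as a new host.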

How a Single Voice Turn Flows

🎤 You speak → Flux STT (Neurons) → LLM reasons and calls tools → Aura TTS (Neurons) → 🔊 You hear the reply
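The flow above can be sketched as one function with each stage injected. The SpeechToText, Reasoner, and TextToSpeech interfaces are hypothetical stand-ins, not the @cloudflare/voice API. In this sketch each stage waits for the previous one to finish. The real pipeline streams audio and text between stages.

```typescript
// Hypothetical stage interfaces for illustration only; the voice SDK
// manages and streams these stages itself.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface Reasoner {
  respond(transcript: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

interface TurnResult {
  transcript: string;
  replyText: string;
  replyAudio: Uint8Array;
}

// One voice turn: audio in, audio out, run as three sequential stages.
async function runTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  llm: Reasoner,
  tts: TextToSpeech,
): Promise<TurnResult> {
  const transcript = await stt.transcribe(audio);    // Flux STT (Neurons)
  const replyText = await llm.respond(transcript);    // LLM reasons and calls tools
  const replyAudio = await tts.synthesize(replyText); // Aura TTS (Neurons)
  return { transcript, replyText, replyAudio };
}
```

Because the stages are injected, you can test the turn logic with fakes, such as a transcriber that returns a fixed string.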

The Voice Agent, in Code

The whole pipeline is declared on a Durable Object. The transcriber and TTS call Workers AI directly through the this.env.AI binding, so their usage is metered in Neurons:

```typescript
import { WorkersAIFluxSTT, WorkersAITTS } from "@cloudflare/voice";

// Excerpt: the Durable Object base class that provides this.env, and the
// think() reasoning helper, are omitted here. Durable Object classes must be
// exported from the Worker module.
export class VoiceAgent {
  // Speech-to-text and text-to-speech run on Workers AI, metered in Neurons
  transcriber = new WorkersAIFluxSTT(this.env.AI); // 🎤 @cf/deepgram/flux
  tts = new WorkersAITTS(this.env.AI); // 🔊 @cf/deepgram/aura-1

  async onTurn(transcript, context) {
    // Reasoning: Google Gemini primary, Workers AI Gemma fallback.
    // Tools (list / create / send / PDF) are fetched from the MCP server
    // over HTTP, scoped to the signed-in user's token.
    return await this.think(transcript, context);
  }
}
```

Why Build Voice on Neurons

⚡ Edge Latency

Speech models run on Cloudflare's own network at the edge. Audio never travels to a separate GPU cloud and back, which keeps reply latency low.

🧠 One Currency: Neurons

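Speech-to-text, text-to-speech, and the Gemma fallback model are billed in Neurons. Gemini is called through Google's API, so its replies are not metered in Neurons. The Workers AI free tier includes a daily Neuron allowance that covers everyday voice use at zero cost.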
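A rough daily estimate can show whether a usage pattern fits the free tier. The sketch below relies on several assumptions. At the time of writing, Cloudflare lists a free Workers AI allocation of 10,000 Neurons per day. The per-unit rates are placeholders, and so are their units: Neurons per audio minute for STT, per 1,000 characters for TTS, and per Gemma fallback turn. Look up the real figures on the Workers AI pricing page. Turns answered by Gemini are left out because they are not metered in Neurons.

```typescript
// Placeholder rates: look up real per-model Neuron rates, and the units
// each model is billed in, before relying on any result.
interface NeuronRates {
  sttPerAudioMinute: number;   // assumed unit for Flux STT
  ttsPerThousandChars: number; // assumed unit for Aura TTS
  llmPerFallbackTurn: number;  // assumed average for one Gemma fallback turn
}

interface CallProfile {
  minutesOfUserSpeech: number;
  charsOfAssistantSpeech: number;
  fallbackTurns: number; // turns answered by Workers AI Gemma instead of Gemini
}

// Free allocation at the time of writing; confirm against current pricing.
const FREE_NEURONS_PER_DAY = 10_000;

function neuronsPerCall(call: CallProfile, rates: NeuronRates): number {
  return (
    call.minutesOfUserSpeech * rates.sttPerAudioMinute +
    (call.charsOfAssistantSpeech / 1000) * rates.ttsPerThousandChars +
    call.fallbackTurns * rates.llmPerFallbackTurn
  );
}

function fitsFreeTier(callsPerDay: number, call: CallProfile, rates: NeuronRates): boolean {
  return callsPerDay * neuronsPerCall(call, rates) <= FREE_NEURONS_PER_DAY;
}
```

Take placeholder rates of 100 Neurons per STT minute, 50 per 1,000 TTS characters, and 200 per fallback turn. A call with 2 minutes of speech, 4,000 characters of replies, and one fallback turn then costs 600 Neurons, so 16 such calls fit in 10,000.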

📌 Stateful Calls

Durable Objects keep each conversation alive with full context, so the assistant remembers what you said earlier in the call.
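The per-call memory described above might look like the following sketch: a plain class that one VoiceAgent Durable Object instance would own for the life of a call. The Message shape and the maxMessages cap are hypothetical. A real agent could instead persist history with the Durable Object storage API, or summarize older turns rather than drop them.

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  text: string;
}

// In-memory history for one call (hypothetical shape).
class ConversationState {
  private readonly messages: Message[] = [];
  private readonly maxMessages: number;

  constructor(maxMessages = 40) { // hypothetical cap to bound prompt size
    this.maxMessages = maxMessages;
  }

  add(role: Role, text: string): void {
    this.messages.push({ role, text });
    // Drop the oldest messages once the cap is exceeded.
    while (this.messages.length > this.maxMessages) this.messages.shift();
  }

  // Shallow copy handed to the LLM as context for the next turn.
  history(): readonly Message[] {
    return [...this.messages];
  }
}
```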

🔐 Token-Scoped Tools

The agent can only touch your proposals. Every MCP tool call carries your JWT, and the voice agent has no direct database binding of its own.
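Here is a sketch of how each tool call can carry the JWT. It builds a JSON-RPC 2.0 tools/call request in the shape used by the MCP Streamable HTTP transport, and assumes the initialize handshake has already happened. The endpoint URL and the list_proposals tool name are hypothetical. The MCP server is assumed to validate the bearer token and scope results to that user.

```typescript
// Hypothetical endpoint; ProposalForge's real MCP URL is not shown on this page.
const MCP_URL = "https://mcp.proposalforge.example/mcp";

// JSON-RPC request ids, unique within this agent instance.
let nextId = 1;

// Build a tools/call request that carries the signed-in user's JWT.
function buildToolCall(
  userJwt: string,
  toolName: string,
  args: Record<string, unknown>,
  sessionId?: string,
): Request {
  const headers = new Headers({
    "Content-Type": "application/json",
    // Streamable HTTP clients accept either a JSON body or an SSE stream.
    Accept: "application/json, text/event-stream",
    Authorization: `Bearer ${userJwt}`,
  });
  // Session id from the server's initialize response, when it issued one.
  if (sessionId) headers.set("Mcp-Session-Id", sessionId);
  const body = {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
  return new Request(MCP_URL, { method: "POST", headers, body: JSON.stringify(body) });
}
```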

🔁 Resilient Reasoning

If Gemini is unavailable, the agent automatically falls back to Workers AI Gemma, so the conversation keeps going.
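The Gemini-then-Gemma fallback can be sketched as an ordered list of providers tried in turn. ReasoningProvider and the 8-second per-provider timeout are hypothetical. This page does not show the agent's real client code for Gemini or Workers AI.

```typescript
// Hypothetical provider shape: Gemini first, then Workers AI Gemma.
interface ReasoningProvider {
  name: string;
  complete(prompt: string, signal: AbortSignal): Promise<string>;
}

interface FallbackResult {
  text: string;
  provider: string;
}

// Try each provider in order. Move to the next one on an error or timeout.
async function completeWithFallback(
  prompt: string,
  providers: ReasoningProvider[],
  timeoutMs = 8000, // hypothetical per-provider budget
): Promise<FallbackResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    // Rejects when the timer fires, so a hanging provider cannot stall the turn.
    const timedOut = new Promise<never>((_, reject) => {
      controller.signal.addEventListener(
        "abort",
        () => reject(new Error(`timed out after ${timeoutMs} ms`)),
        { once: true },
      );
    });
    try {
      const text = await Promise.race([provider.complete(prompt, controller.signal), timedOut]);
      return { text, provider: provider.name };
    } catch (err) {
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error(`All reasoning providers failed: ${errors.join("; ")}`);
}
```

Each call races against an abort timer. A provider that hangs, rather than failing fast, still hands the turn to the next provider. Passing the signal along also lets well-behaved clients cancel the abandoned request.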

🌍 No Servers to Run

The entire voice stack is serverless and globally distributed — no GPU instances to provision, patch, or scale.

Understanding the Neuron Free Tier

Talk to ProposalForge

Sign in, tap the 🎙️ button, and ask it to draft, send, or summarize a proposal — out loud.

Try Voice AI