Serverless AI inference running on Cloudflare's global GPU network — the engine that drafts your proposals, writes your emails, and powers the voice assistant.
Workers AI lets ProposalForge run large language models directly on Cloudflare's network, on GPUs close to the data center that serves the page. There are no GPU servers to rent and no API region to pick. You describe a proposal in a sentence; a 70-billion-parameter model drafts the full document in seconds.
Cloudflare Workers AI is a serverless platform for running machine-learning models on GPUs distributed across Cloudflare's global edge. ProposalForge calls it through a simple env.AI binding — no API keys, no infrastructure — and is billed in Neurons, Cloudflare's unit of AI compute.
| Feature | What the model does |
|---|---|
| 📝 Proposal generation | Turns a short brief into a structured, multi-section proposal with scope, pricing, and timelines. |
| ✉️ Email drafting | Writes the cover email that accompanies each proposal send. |
| 🎙️ Voice reasoning | Acts as the fallback brain for the voice assistant when Gemini is unavailable. |
| 🧠 Summaries & analytics | Condenses proposals and surfaces insights on demand. |
| Model | Used for |
|---|---|
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Primary text generation for proposals and emails (fast 70B instruct model). |
| @cf/google/gemma-4-26b-a4b-it | Voice assistant reasoning fallback. |
| @cf/deepgram/flux | Streaming speech-to-text for the voice assistant. |
| @cf/deepgram/aura-1 | Text-to-speech for the voice assistant. |
For maximum resilience, ProposalForge pairs Workers AI with a Google Gemini fallback — if one provider is rate-limited or unavailable, generation automatically continues on the other.
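The source does not describe how the fallback is wired, so the following is only a minimal sketch of the idea: try each provider in order and move on when one fails. The `TextGenerator` type and the `generateWithFallback` helper are hypothetical names introduced for illustration, not ProposalForge's actual code.

```typescript
// A provider is anything that turns a prompt into text (hypothetical type).
type TextGenerator = (prompt: string) => Promise<string>;

// Tries providers in order and returns the first successful result.
// If every provider fails, the last error is rethrown to the caller.
async function generateWithFallback(
  prompt: string,
  providers: TextGenerator[],
): Promise<string> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      return await provider(prompt);
    } catch (err) {
      // For example a rate limit or an outage: remember it, try the next one.
      lastError = err;
    }
  }
  throw lastError;
}
```

Under this sketch, the Workers AI call would be the first entry in `providers` and a Gemini call the second; a production version would likely also distinguish retryable errors from permanent ones.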
Calling a 70B model is a single binding call from a Cloudflare Worker, with no SDK, endpoint URL, or key to configure.
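A minimal sketch of such a call, assuming the binding is named `AI` as in `env.AI` above. The local interfaces stand in for the types a real project would usually take from `@cloudflare/workers-types`, and the `draftProposal` helper, the system prompt, and the `max_tokens` value are illustrative assumptions rather than ProposalForge's actual code.

```typescript
// Minimal local typings so the sketch compiles on its own.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AiBinding {
  run(
    model: string,
    inputs: { messages: ChatMessage[]; max_tokens?: number },
  ): Promise<{ response?: string }>;
}

interface Env {
  AI: AiBinding; // assumed binding name, matching env.AI in the text
}

const PROPOSAL_MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

// Hypothetical helper: turns a one-sentence brief into proposal text.
async function draftProposal(env: Env, brief: string): Promise<string> {
  const result = await env.AI.run(PROPOSAL_MODEL, {
    messages: [
      {
        role: "system",
        content:
          "You write structured business proposals with scope, pricing, and timelines.",
      },
      { role: "user", content: brief },
    ],
    max_tokens: 2048, // illustrative cap on output length
  });
  // Non-streaming text generation returns the text in `response`.
  return result.response ?? "";
}
```

Inside a Worker's `fetch` handler, this would be called as `await draftProposal(env, brief)` with the `env` object the runtime passes in.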
The env.AI binding authenticates automatically — no secrets to manage or rotate for inference.
Models execute on Cloudflare's network near where your request arrives, cutting the round-trip latency of a distant AI cloud.
Access state-of-the-art open models like Llama 3.3 70B without provisioning a single GPU.
A daily Neuron allowance — resetting at 00:00 UTC — covers everyday proposal generation at zero cost.
Paired with a Google Gemini fallback so AI features stay up even if one provider is throttled.
Your prompts run on Cloudflare's network and aren't used to train third-party models.
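Because the daily Neuron allowance resets at 00:00 UTC, an application that tracks its own usage may want to know when the next reset happens. The sketch below shows only that date arithmetic; `nextNeuronReset` and `msUntilNeuronReset` are hypothetical helper names, and nothing here queries Cloudflare for actual usage.

```typescript
// Returns the next 00:00 UTC boundary strictly after `now`.
// Date.UTC normalizes day overflow, so month and year ends are handled.
function nextNeuronReset(now: Date): Date {
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1),
  );
}

// Milliseconds remaining until the allowance resets.
function msUntilNeuronReset(now: Date): number {
  return nextNeuronReset(now).getTime() - now.getTime();
}
```

For example, at 23:30 UTC on 31 January the next reset falls on 1 February at 00:00 UTC, 30 minutes (1,800,000 ms) away.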
Describe your project in one sentence and let Workers AI write the full proposal.
Try AI Generation