🤖 Workers AI Edge Inference

Serverless AI inference running on Cloudflare's global GPU network — the engine that drafts your proposals, writes your emails, and powers the voice assistant.

⚡ AI That Runs at the Edge

Workers AI lets ProposalForge run large language models directly on Cloudflare's network, on GPUs close to the data center that serves the page. There are no GPU servers to rent, no API region to pick, and no cold starts. You describe a proposal in a sentence; a 70-billion-parameter model writes the full document in seconds.

Inference Without Servers

Cloudflare Workers AI is a serverless platform for running machine-learning models on GPUs distributed across Cloudflare's global edge. ProposalForge calls it through a simple env.AI binding — no API keys, no infrastructure — and is billed in Neurons, Cloudflare's unit of AI compute.

How ProposalForge Uses Workers AI

| Feature | What the model does |
| --- | --- |
| 📝 Proposal generation | Turns a short brief into a structured, multi-section proposal with scope, pricing, and timelines. |
| ✉️ Email drafting | Writes the cover email that accompanies each proposal send. |
| 🎙️ Voice reasoning | Acts as the fallback brain for the voice assistant when Gemini is unavailable. |
| 🧠 Summaries & analytics | Condenses proposals and surfaces insights on demand. |

The Models Under the Hood

| Model | Used for |
| --- | --- |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | Primary text generation: proposals and emails (fast 70B instruct model). |
| @cf/google/gemma-4-26b-a4b-it | Voice assistant reasoning fallback. |
| @cf/deepgram/flux | Streaming speech-to-text for the voice assistant. |
| @cf/deepgram/aura-1 | Text-to-speech for the voice assistant. |
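
One way to keep these identifiers in a single place is a small lookup keyed by task. This is only a sketch: the task names and the `modelFor` helper are hypothetical and not taken from ProposalForge's code, while the model IDs are the ones listed above.

```javascript
// Hypothetical task-to-model lookup; the task names are illustrative assumptions.
const MODELS = Object.freeze({
  proposal: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
  email: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
  voiceReasoning: "@cf/google/gemma-4-26b-a4b-it",
  speechToText: "@cf/deepgram/flux",
  textToSpeech: "@cf/deepgram/aura-1",
});

// Returns the model ID for a task and fails loudly on unknown task names.
function modelFor(task) {
  if (!Object.prototype.hasOwnProperty.call(MODELS, task)) {
    throw new Error(`No model configured for task: ${task}`);
  }
  return MODELS[task];
}
```

Centralizing the IDs this way means a model swap touches one line rather than every call site.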

For maximum resilience, ProposalForge pairs Workers AI with a Google Gemini fallback — if one provider is rate-limited or unavailable, generation automatically continues on the other.
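
The fallback behavior described above could be sketched roughly as follows. This is not ProposalForge's actual implementation: the `generateWithFallback` helper and the `{ name, generate }` provider shape are assumptions made for illustration.

```javascript
// Tries each provider in order and returns the first successful result.
// `providers` is an array of { name, generate } objects; both field names
// are illustrative assumptions.
async function generateWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      const text = await provider.generate(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Record the failure (rate limit, outage, ...) and try the next provider.
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

In a Worker, the first provider's `generate` might wrap an `env.AI.run(...)` call and the second a request to the Gemini API, so callers never need to know which one answered.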

A Generation Request, in Code

Calling a 70B model is a single binding call from a Cloudflare Worker or Pages Function, with no SDK, endpoint URL, or key:

    // Inside a Cloudflare Pages Function
    export async function onRequestPost({ request, env }) {
      const { brief } = await request.json();

      const result = await env.AI.run(
        "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
        {
          messages: [
            { role: "system", content: "You are an expert proposal writer." },
            { role: "user", content: `Draft a proposal for: ${brief}` },
          ],
        }
      );

      return Response.json({ proposal: result.response });
    }
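
The handler above trusts its input and leaves the output length at the model's default. A slightly more defensive sketch might validate the brief and pass an explicit `max_tokens` cap, which Workers AI text-generation models accept. The helper names and the 2048 value are assumptions for illustration, not ProposalForge's actual settings.

```javascript
const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

// Builds the chat messages for a brief, rejecting empty or non-string input.
function buildMessages(brief) {
  if (typeof brief !== "string" || brief.trim() === "") {
    throw new TypeError("brief must be a non-empty string");
  }
  return [
    { role: "system", content: "You are an expert proposal writer." },
    { role: "user", content: `Draft a proposal for: ${brief.trim()}` },
  ];
}

// Runs the model through any object exposing run(model, inputs), such as env.AI.
// maxTokens = 2048 is an assumed value chosen for long-form output.
async function generateProposal(ai, brief, maxTokens = 2048) {
  const result = await ai.run(MODEL, {
    messages: buildMessages(brief),
    max_tokens: maxTokens,
  });
  return result.response;
}
```

Inside a Pages Function this would be called as `generateProposal(env.AI, brief)`. Taking the binding as a parameter also makes the function easy to exercise with a stub in tests.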

How a Proposal Gets Written

📝 Your brief → env.AI.run() → Llama 3.3 70B (Neurons) → Structured proposal → 📄 Saved to D1
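
The last step of this flow, persisting the result to D1, might look like the sketch below. It uses D1's prepared-statement API (`prepare`, `bind`, `run`); the `proposals` table, its columns, and the `saveProposal` helper are hypothetical, since the page does not describe the schema.

```javascript
// Saves a generated proposal through a D1 binding.
// `db` is assumed to be a D1 database binding (for example env.DB); the
// `proposals` table and its columns are hypothetical.
async function saveProposal(db, { id, brief, body }) {
  const createdAt = new Date().toISOString();
  await db
    .prepare(
      "INSERT INTO proposals (id, brief, body, created_at) VALUES (?, ?, ?, ?)"
    )
    .bind(id, brief, body, createdAt)
    .run();
  return { id, createdAt };
}
```

Binding values with `?` placeholders, rather than interpolating them into the SQL string, keeps user-supplied briefs from being interpreted as SQL.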

Why Workers AI

🔑 No API Keys

The env.AI binding authenticates automatically — no secrets to manage or rotate for inference.

🌍 Runs at the Edge

Models execute on Cloudflare's network close to where your request arrives, cutting the round-trip latency of a distant AI cloud.

🧠 Big Models, Instantly

Access state-of-the-art open models like Llama 3.3 70B without provisioning a single GPU.

💚 Generous Free Tier

A daily Neuron allowance — resetting at 00:00 UTC — covers everyday proposal generation at zero cost.

🔁 Provider Fallback

Paired with a Google Gemini fallback so AI features stay up even if one provider is throttled.

🔒 Private by Design

Your prompts run on Cloudflare's network and aren't used to train third-party models.

Billing in Neurons
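
As a rough illustration of how Neuron billing adds up, the sketch below estimates a day's charge. The defaults assume a free allowance of 10,000 Neurons per day and a rate of $0.011 per 1,000 Neurons beyond it. Treat both figures as assumptions and check them against Cloudflare's current pricing. The `estimateDailyCostUsd` helper is hypothetical.

```javascript
// Estimates the daily Workers AI charge in USD for a given Neuron usage.
// Both defaults are assumptions to verify against Cloudflare's current pricing.
function estimateDailyCostUsd(
  neuronsUsed,
  freeNeurons = 10000,
  usdPer1000Neurons = 0.011
) {
  // Only usage above the daily free allowance is billable.
  const billable = Math.max(0, neuronsUsed - freeNeurons);
  return (billable / 1000) * usdPer1000Neurons;
}
```

Under these assumptions, a day that consumes 8,000 Neurons costs nothing, while a day at 15,000 Neurons is billed for the 5,000 above the allowance, or about $0.055.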

Generate a Proposal with AI

Describe your project in one sentence and let Workers AI write the full proposal.
