Workers AI - ProposalForge

Inference Without Servers

Cloudflare Workers AI is a serverless platform for running machine-learning models on GPUs distributed across Cloudflare's global network. ProposalForge calls it through an env.AI binding, with no API keys to manage and no inference servers to run, and usage is billed in Neurons, Cloudflare's unit of AI compute.

How ProposalForge Uses Workers AI

Feature	What the model does
📝 Proposal generation	Turns a short brief into a structured, multi-section proposal with scope, pricing, and timelines.
✉️ Email drafting	Writes the cover email that accompanies each proposal send.
🎙️ Voice reasoning	Acts as the fallback brain for the voice assistant when Gemini is unavailable.
🧠 Summaries & analytics	Condenses proposals and surfaces insights on demand.

The Models Under the Hood

Model	Used for
`@cf/meta/llama-3.3-70b-instruct-fp8-fast`	Primary text generation — proposals and emails (fast 70B instruct model).
`@cf/google/gemma-4-26b-a4b-it`	Voice assistant reasoning fallback.
`@cf/deepgram/flux`	Streaming speech-to-text for the voice assistant.
`@cf/deepgram/aura-1`	Text-to-speech for the voice assistant.
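
One way to keep these model IDs in a single place is a small lookup table keyed by task. The sketch below is illustrative only: the `MODELS` object, its task names, and the `modelFor` helper are hypothetical names rather than ProposalForge's actual code. Only the model IDs come from the table above.

```javascript
// Hypothetical registry: task names are illustrative; model IDs are from the table above.
const MODELS = Object.freeze({
  text: "@cf/meta/llama-3.3-70b-instruct-fp8-fast", // proposals and emails
  voiceReasoning: "@cf/google/gemma-4-26b-a4b-it",  // voice assistant fallback
  speechToText: "@cf/deepgram/flux",                // streaming speech-to-text
  textToSpeech: "@cf/deepgram/aura-1",              // text-to-speech
});

// Look up the model ID for a task, failing loudly on a typo
// instead of passing `undefined` to env.AI.run().
function modelFor(task) {
  if (!Object.prototype.hasOwnProperty.call(MODELS, task)) {
    throw new Error(`Unknown AI task: ${task}`);
  }
  return MODELS[task];
}
```

A call site could then read `env.AI.run(modelFor("text"), { messages })`, so swapping a model means changing one line.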

For maximum resilience, ProposalForge pairs Workers AI with a Google Gemini fallback — if one provider is rate-limited or unavailable, generation automatically continues on the other.
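
The fallback described above might look roughly like the following. This is a minimal sketch, assuming that a rate-limited or unavailable Workers AI call surfaces as a thrown error. `generateWithFallback` and `callGemini` are hypothetical names: `callGemini` stands in for whatever Gemini client the application uses and is passed in by the caller.

```javascript
// Try Workers AI first; if the call throws, continue on a second provider.
// `callGemini` is a hypothetical caller-supplied async function, not a real SDK call.
async function generateWithFallback(env, messages, callGemini) {
  try {
    const result = await env.AI.run(
      "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
      { messages }
    );
    return { provider: "workers-ai", text: result.response };
  } catch (err) {
    // Assumption: throttling and outages arrive here as thrown errors.
    console.warn("Workers AI failed, falling back:", err?.message ?? err);
    const text = await callGemini(messages);
    return { provider: "gemini", text };
  }
}
```

Returning the provider name alongside the text makes it easy to log which path served each request.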

A Generation Request, in Code

Calling a 70B model takes a single binding call from a Cloudflare Worker or Pages Function, with no SDK, no endpoint URL, and no key:

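// Inside a Cloudflare Pages Function
export async function onRequestPost({ request, env }) {
  // Read the short brief from the JSON request body.
  const { brief } = await request.json();

  // Reject empty input before spending Neurons on inference.
  if (typeof brief !== "string" || brief.trim() === "") {
    return Response.json({ error: "A non-empty brief is required." }, { status: 400 });
  }

  // Run the model through the env.AI binding.
  const result = await env.AI.run(
    "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    {
      messages: [
        { role: "system", content: "You are an expert proposal writer." },
        { role: "user", content: `Draft a proposal for: ${brief}` },
      ],
    }
  );

  // The generated text is returned on the `response` field.
  return Response.json({ proposal: result.response });
}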

How a Proposal Gets Written

📝 Your brief → env.AI.run() → Llama 3.3 70B (Neurons) → Structured proposal → 📄 Saved to D1
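
The last step of the flow, saving to D1, could be sketched as below using D1's prepared-statement API. The binding name `DB`, the `proposals` table, and its `brief`, `body`, and `created_at` columns are assumptions made for illustration; the real schema is not described here.

```javascript
// Persist a generated proposal to D1.
// Assumptions: the D1 binding is named `DB`, and a `proposals` table with
// `brief`, `body`, and `created_at` columns already exists (all hypothetical).
async function saveProposal(env, brief, proposal) {
  const result = await env.DB
    .prepare("INSERT INTO proposals (brief, body, created_at) VALUES (?1, ?2, ?3)")
    .bind(brief, proposal, new Date().toISOString()) // bound values, never string-concatenated
    .run();
  return result.success;
}
```

Binding the values instead of interpolating them into the SQL string keeps user-supplied briefs from being interpreted as SQL.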

Why Workers AI

🔑 No API Keys

The env.AI binding authenticates automatically — no secrets to manage or rotate for inference.

🌍 Runs at the Edge

Inference runs on GPUs inside Cloudflare's network, close to where the request arrives, which avoids the extra round trip to a distant AI cloud.

🧠 Big Models, Instantly

Access state-of-the-art open models like Llama 3.3 70B without provisioning a single GPU.

💚 Generous Free Tier

A daily Neuron allowance — resetting at 00:00 UTC — covers everyday proposal generation at zero cost.

🔁 Provider Fallback

Paired with a Google Gemini fallback so AI features stay up even if one provider is throttled.

🔒 Private by Design

Your prompts run on Cloudflare's network and aren't used to train third-party models.

Billing in Neurons

🧮 Neurons are Cloudflare's single unit of AI compute across every model — text, speech, and embeddings.
💚 Free daily pool: Workers AI includes a free Neuron allowance that resets every day at 00:00 UTC.
🎙️ Shared budget: Proposal generation, summaries, and the voice assistant all draw from the same Neuron pool.
🚀 Scale up: A paid plan lifts the daily cap for high-volume, uninterrupted AI.
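
Because the free pool resets at 00:00 UTC and is shared across features, an application may want to know how long remains until the next reset and whether a request still fits the day's allowance. The helpers below are a sketch of that arithmetic. The function names are hypothetical, and the allowance and per-request cost are passed in as parameters because the actual figures depend on the plan and model.

```javascript
// Milliseconds from `now` until the next 00:00 UTC reset.
function msUntilNeuronReset(now = new Date()) {
  const nextMidnightUtc = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1 // Date.UTC rolls over month and year boundaries
  );
  return nextMidnightUtc - now.getTime();
}

// Hypothetical helper: would spending `cost` Neurons stay within the daily allowance?
// All three values are supplied by the caller; none are Cloudflare-published figures.
function fitsDailyAllowance(usedToday, cost, dailyAllowance) {
  return usedToday + cost <= dailyAllowance;
}
```

For example, at 23:00 UTC `msUntilNeuronReset` returns 3600000, one hour in milliseconds.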
