# Streaming API

MuAPI supports Server-Sent Events (SSE) streaming for LLM text generation endpoints. Streaming delivers tokens progressively as they are generated, enabling real-time display in chat UIs and interactive applications.
## Available Streaming Endpoints

| Endpoint | Model | Description |
|---|---|---|
| `POST /api/v1/gemini-flash/stream` | Gemini 3 Flash | Fast multimodal LLM with vision support |
## How Streaming Works

- Send a POST request to a `/stream` endpoint with your payload.
- Read the response as a stream of `text/event-stream` events.
- Each event contains a JSON delta with partial content.
- The stream ends with a `data: [DONE]` sentinel.

**Billing:** Cost is calculated from actual token usage after the stream completes: $0.30/M input tokens and $1.80/M output tokens for Gemini 3 Flash. A minimum wallet balance of $1.00 is required.
## Request Format

- Method: `POST`
- Authentication: `x-api-key` header
- Content-Type: `application/json`
- Response Content-Type: `text/event-stream`
### Payload

```json
{
  "prompt": "Explain quantum entanglement in simple terms.",
  "image_url": "https://example.com/image.jpg",
  "system_prompt": "You are a concise science communicator."
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | The user message or instruction |
| `image_url` | string (URL) | No | Optional image for multimodal requests |
| `system_prompt` | string | No | System-level instruction to control model behavior |
## SSE Response Format

Each event line starts with `data: ` followed by a JSON object:

```
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":" entanglement"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"..."},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":80,"total_tokens":92}}
data: [DONE]
```
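The delta-accumulation logic can be sketched as a small standalone parser. This is a minimal sketch under the event shape shown above; `iter_sse_deltas` and the sample events are illustrative, not part of any MuAPI client library:

```python
import json

def iter_sse_deltas(lines):
    """Yield text deltas from an iterable of SSE lines.

    Assumes the event shape documented above: each `data:` line carries
    a JSON object with choices[0].delta.content, and the stream ends
    with a `data: [DONE]` sentinel.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        if delta:
            yield delta

# Illustrative sample events in the documented shape
sample = [
    'data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # → Hello world
```

The full client examples below inline this same parsing loop.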
## Code Examples

### Python (httpx — recommended for async)

```python
import httpx
import json

API_KEY = "your_api_key_here"

with httpx.Client(timeout=120) as client:
    with client.stream(
        "POST",
        "https://api.muapi.ai/api/v1/gemini-flash/stream",
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "prompt": "Write a short poem about the ocean.",
            "system_prompt": "You are a creative poet."
        }
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                data = line[6:]
                if data == "[DONE]":
                    break
                chunk = json.loads(data)
                delta = chunk["choices"][0]["delta"].get("content", "")
                if delta:
                    print(delta, end="", flush=True)
print()
```
### Python (requests)

```python
import requests
import json

API_KEY = "your_api_key_here"

response = requests.post(
    "https://api.muapi.ai/api/v1/gemini-flash/stream",
    headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "prompt": "Summarize the history of the internet.",
        "system_prompt": "Be concise and factual."
    },
    stream=True,
    timeout=120
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content", "")
            if delta:
                print(delta, end="", flush=True)
print()
```
### Python (with image — multimodal)

```python
import httpx
import json

API_KEY = "your_api_key_here"

with httpx.Client(timeout=120) as client:
    with client.stream(
        "POST",
        "https://api.muapi.ai/api/v1/gemini-flash/stream",
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "prompt": "Describe what you see in this image.",
            "image_url": "https://example.com/photo.jpg"
        }
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                data = line[6:]
                if data == "[DONE]":
                    break
                chunk = json.loads(data)
                delta = chunk["choices"][0]["delta"].get("content", "")
                if delta:
                    print(delta, end="", flush=True)
print()
```
### JavaScript / Node.js (fetch)

```javascript
const API_KEY = "your_api_key_here";

async function streamGeminiFlash(prompt, systemPrompt = null) {
  const body = { prompt };
  if (systemPrompt) body.system_prompt = systemPrompt;

  const response = await fetch("https://api.muapi.ai/api/v1/gemini-flash/stream", {
    method: "POST",
    headers: {
      "x-api-key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Note: this splits each chunk on newlines directly. Production code
    // should buffer partial lines that span chunk boundaries.
    const chunk = decoder.decode(value, { stream: true });
    for (const line of chunk.split("\n")) {
      if (line.startsWith("data: ")) {
        const data = line.slice(6).trim();
        if (data === "[DONE]") return;
        try {
          const parsed = JSON.parse(data);
          const delta = parsed.choices?.[0]?.delta?.content ?? "";
          if (delta) process.stdout.write(delta);
        } catch {}
      }
    }
  }
}

streamGeminiFlash(
  "Explain how neural networks learn.",
  "You are a clear technical writer."
).then(() => console.log());
```
### JavaScript (Browser — EventSource alternative via fetch)

```javascript
const API_KEY = "your_api_key_here";

async function streamToElement(prompt, targetElement) {
  const response = await fetch("https://api.muapi.ai/api/v1/gemini-flash/stream", {
    method: "POST",
    headers: {
      "x-api-key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ prompt }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    for (const line of text.split("\n")) {
      if (line.startsWith("data: ")) {
        const data = line.slice(6).trim();
        if (data === "[DONE]") return;
        try {
          const parsed = JSON.parse(data);
          const delta = parsed.choices?.[0]?.delta?.content ?? "";
          if (delta) targetElement.textContent += delta;
        } catch {}
      }
    }
  }
}

// Usage
const outputDiv = document.getElementById("output");
streamToElement("Write a haiku about AI.", outputDiv);
```
### cURL

```bash
curl -X POST "https://api.muapi.ai/api/v1/gemini-flash/stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What are the main causes of climate change?"}' \
  --no-buffer
```

To see only the text content (strips SSE framing):

```bash
curl -X POST "https://api.muapi.ai/api/v1/gemini-flash/stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "List 5 programming best practices."}' \
  --no-buffer \
  | grep "^data: " \
  | grep -v "\[DONE\]" \
  | sed 's/^data: //' \
  | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        chunk = json.loads(line)
        delta = chunk['choices'][0]['delta'].get('content', '')
        print(delta, end='', flush=True)
    except: pass
print()
"
```
### TypeScript (with type safety)

```typescript
const API_BASE = "https://api.muapi.ai/api/v1";

interface StreamChunk {
  id: string;
  choices: Array<{
    index: number;
    delta: { content?: string; role?: string };
    finish_reason: string | null;
  }>;
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

async function streamGeminiFlash(
  prompt: string,
  options: { imageUrl?: string; systemPrompt?: string } = {},
  apiKey: string
): Promise<string> {
  const body: Record<string, string> = { prompt };
  if (options.imageUrl) body.image_url = options.imageUrl;
  if (options.systemPrompt) body.system_prompt = options.systemPrompt;

  const response = await fetch(`${API_BASE}/gemini-flash/stream`, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let fullText = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    for (const line of text.split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice(6).trim();
      if (data === "[DONE]") return fullText;
      try {
        const chunk: StreamChunk = JSON.parse(data);
        const delta = chunk.choices[0]?.delta?.content ?? "";
        fullText += delta;
        process.stdout.write(delta);
      } catch {}
    }
  }
  return fullText;
}

// Usage
streamGeminiFlash(
  "Describe the future of AI in healthcare.",
  { systemPrompt: "Be optimistic but realistic." },
  "your_api_key_here"
).then(() => console.log());
```
## Comparison: Streaming vs Standard

| Feature | Standard (`/gemini-flash`) | Streaming (`/gemini-flash/stream`) |
|---|---|---|
| Response | `request_id` → poll for result | Live SSE token stream |
| Latency to first token | Higher (full generation first) | Low (tokens arrive immediately) |
| Best for | Workflows, automation, batch | Chat UIs, real-time display |
| Webhook support | Yes | No (response is the stream) |
| Billing | Post-call, token-based | Post-stream, token-based |
| Minimum balance | $1.00 | $1.00 |
## Error Handling

If an error occurs during streaming, the stream will emit an error event before closing:

```
data: {"error": "upstream provider timeout"}
```

Always handle this case in your client:

```python
chunk = json.loads(data)
if "error" in chunk:
    print(f"Stream error: {chunk['error']}")
    break
delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")
```
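The snippet above can be wrapped into a reusable function. This is a sketch, not part of any MuAPI SDK; the hypothetical `parse_event` helper assumes the error and chunk shapes shown above:

```python
import json

def parse_event(data):
    """Parse one SSE `data:` payload into (delta, error).

    Hypothetical helper: returns ("", message) for the error shape
    shown above, or (delta_text, None) for a normal content chunk.
    """
    chunk = json.loads(data)
    if "error" in chunk:
        return "", chunk["error"]
    delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")
    return delta, None

# Normal chunk yields text; error chunk yields a message to act on
print(parse_event('{"choices":[{"delta":{"content":"hi"}}]}'))
print(parse_event('{"error": "upstream provider timeout"}'))
```

In a streaming loop, break (and surface the message to the user) as soon as `parse_event` returns a non-`None` error.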
## Pricing

Gemini 3 Flash streaming uses token-based billing applied after the stream completes:

| Token Type | Rate |
|---|---|
| Input tokens | $0.30 per million |
| Output tokens | $1.80 per million |

Example: A request with 500 input tokens and 800 output tokens costs:

- Input: 500 × $0.30 / 1,000,000 = $0.00015
- Output: 800 × $1.80 / 1,000,000 = $0.00144
- Total: ~$0.0016
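The arithmetic above as a small helper, useful for estimating spend from the `usage` object in the final SSE chunk. A sketch: `stream_cost` is illustrative, with default rates taken from the table above:

```python
def stream_cost(prompt_tokens, completion_tokens,
                input_rate=0.30, output_rate=1.80):
    """Estimate cost in dollars; rates are per million tokens."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# The worked example above: 500 input + 800 output tokens
print(f"${stream_cost(500, 800):.5f}")  # ≈ $0.00159
```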