# Streaming API

MuAPI supports Server-Sent Events (SSE) streaming for LLM text generation endpoints. Streaming delivers tokens progressively as they are generated, enabling real-time display in chat UIs and interactive applications.
## Available Streaming Endpoints

| Endpoint | Model | Description |
|---|---|---|
| `POST /api/v1/gemini-flash/stream` | Gemini 3 Flash | Fast multimodal LLM with vision support |
## How Streaming Works

- Send a POST request to a `/stream` endpoint with your payload.
- Read the response as a stream of `text/event-stream` events.
- Each event contains a JSON delta with partial content.
- The stream ends with a `data: [DONE]` sentinel.

**Billing:** Cost is calculated from actual token usage after the stream completes: $0.30/M input tokens and $1.80/M output tokens for Gemini 3 Flash. A minimum wallet balance of $1.00 is required.
## Request Format

- Method: `POST`
- Authentication: `x-api-key` header
- Content-Type: `application/json`
- Response Content-Type: `text/event-stream`
### Payload

```json
{
  "prompt": "Explain quantum entanglement in simple terms.",
  "image_url": "https://example.com/image.jpg",
  "system_prompt": "You are a concise science communicator."
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | The user message or instruction |
| `image_url` | string (URL) | No | Optional image for multimodal requests |
| `system_prompt` | string | No | System-level instruction to control model behavior |
## SSE Response Format

Each event line starts with `data: ` followed by a JSON object:

```
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":" entanglement"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"..."},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":80,"total_tokens":92}}
data: [DONE]
```
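The delta-accumulation logic can be sketched as a small standalone parser. This is a minimal sketch under the event shape shown above; `iter_sse_deltas` and the sample events are illustrative, not part of any MuAPI client library:

```python
import json

def iter_sse_deltas(lines):
    """Yield text deltas from an iterable of SSE lines.

    Assumes the event shape documented above: each `data:` line carries
    a JSON object with choices[0].delta.content, and the stream ends
    with a `data: [DONE]` sentinel.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        if delta:
            yield delta

# Illustrative sample events in the documented shape
sample = [
    'data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # → Hello world
```

The full client examples below inline this same parsing loop.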
## Code Examples

### Python (httpx — recommended for async)

```python
import httpx
import json

API_KEY = "your_api_key_here"

with httpx.Client(timeout=120) as client:
    with client.stream(
        "POST",
        "https://api.muapi.ai/api/v1/gemini-flash/stream",
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "prompt": "Write a short poem about the ocean.",
            "system_prompt": "You are a creative poet."
        }
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                data = line[6:]
                if data == "[DONE]":
                    break
                chunk = json.loads(data)
                delta = chunk["choices"][0]["delta"].get("content", "")
                if delta:
                    print(delta, end="", flush=True)
print()
```
### Python (requests)

```python
import requests
import json

API_KEY = "your_api_key_here"

response = requests.post(
    "https://api.muapi.ai/api/v1/gemini-flash/stream",
    headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "prompt": "Summarize the history of the internet.",
        "system_prompt": "Be concise and factual."
    },
    stream=True,
    timeout=120
)

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content", "")
            if delta:
                print(delta, end="", flush=True)
print()
```
### Python (with image — multimodal)

```python
import httpx
import json

API_KEY = "your_api_key_here"

with httpx.Client(timeout=120) as client:
    with client.stream(
        "POST",
        "https://api.muapi.ai/api/v1/gemini-flash/stream",
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "prompt": "Describe what you see in this image.",
            "image_url": "https://example.com/photo.jpg"
        }
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                data = line[6:]
                if data == "[DONE]":
                    break
                chunk = json.loads(data)
                delta = chunk["choices"][0]["delta"].get("content", "")
                if delta:
                    print(delta, end="", flush=True)
print()
```
### JavaScript / Node.js (fetch)

```javascript
const API_KEY = "your_api_key_here";

async function streamGeminiFlash(prompt, systemPrompt = null) {
  const body = { prompt };
  if (systemPrompt) body.system_prompt = systemPrompt;

  const response = await fetch("https://api.muapi.ai/api/v1/gemini-flash/stream", {
    method: "POST",
    headers: {
      "x-api-key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Note: this splits each chunk on newlines directly. Production code
    // should buffer partial lines that span chunk boundaries.
    const chunk = decoder.decode(value, { stream: true });
    for (const line of chunk.split("\n")) {
      if (line.startsWith("data: ")) {
        const data = line.slice(6).trim();
        if (data === "[DONE]") return;
        try {
          const parsed = JSON.parse(data);
          const delta = parsed.choices?.[0]?.delta?.content ?? "";
          if (delta) process.stdout.write(delta);
        } catch {}
      }
    }
  }
}

streamGeminiFlash(
  "Explain how neural networks learn.",
  "You are a clear technical writer."
).then(() => console.log());
```
### JavaScript (Browser — EventSource alternative via fetch)

```javascript
const API_KEY = "your_api_key_here";

async function streamToElement(prompt, targetElement) {
  const response = await fetch("https://api.muapi.ai/api/v1/gemini-flash/stream", {
    method: "POST",
    headers: {
      "x-api-key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ prompt }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    for (const line of text.split("\n")) {
      if (line.startsWith("data: ")) {
        const data = line.slice(6).trim();
        if (data === "[DONE]") return;
        try {
          const parsed = JSON.parse(data);
          const delta = parsed.choices?.[0]?.delta?.content ?? "";
          if (delta) targetElement.textContent += delta;
        } catch {}
      }
    }
  }
}

// Usage
const outputDiv = document.getElementById("output");
streamToElement("Write a haiku about AI.", outputDiv);
```
### cURL

```bash
curl -X POST "https://api.muapi.ai/api/v1/gemini-flash/stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What are the main causes of climate change?"}' \
  --no-buffer
```

To see only the text content (strips SSE framing):

```bash
curl -X POST "https://api.muapi.ai/api/v1/gemini-flash/stream" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "List 5 programming best practices."}' \
  --no-buffer \
  | grep "^data: " \
  | grep -v "\[DONE\]" \
  | sed 's/^data: //' \
  | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        chunk = json.loads(line)
        delta = chunk['choices'][0]['delta'].get('content', '')
        print(delta, end='', flush=True)
    except: pass
print()
"
```
### TypeScript (with type safety)

```typescript
const API_BASE = "https://api.muapi.ai/api/v1";

interface StreamChunk {
  id: string;
  choices: Array<{
    index: number;
    delta: { content?: string; role?: string };
    finish_reason: string | null;
  }>;
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

async function streamGeminiFlash(
  prompt: string,
  options: { imageUrl?: string; systemPrompt?: string } = {},
  apiKey: string
): Promise<string> {
  const body: Record<string, string> = { prompt };
  if (options.imageUrl) body.image_url = options.imageUrl;
  if (options.systemPrompt) body.system_prompt = options.systemPrompt;

  const response = await fetch(`${API_BASE}/gemini-flash/stream`, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let fullText = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    for (const line of text.split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice(6).trim();
      if (data === "[DONE]") return fullText;
      try {
        const chunk: StreamChunk = JSON.parse(data);
        const delta = chunk.choices[0]?.delta?.content ?? "";
        fullText += delta;
        process.stdout.write(delta);
      } catch {}
    }
  }
  return fullText;
}

// Usage
streamGeminiFlash(
  "Describe the future of AI in healthcare.",
  { systemPrompt: "Be optimistic but realistic." },
  "your_api_key_here"
).then(() => console.log());
```
## Comparison: Streaming vs Standard

| Feature | Standard (`/gemini-flash`) | Streaming (`/gemini-flash/stream`) |
|---|---|---|
| Response | `request_id` → poll for result | Live SSE token stream |
| Latency to first token | Higher (full generation first) | Low (tokens arrive immediately) |
| Best for | Workflows, automation, batch | Chat UIs, real-time display |
| Webhook support | Yes | No (response is the stream) |
| Billing | Post-call, token-based | Post-stream, token-based |
| Minimum balance | $1.00 | $1.00 |
## Error Handling

If an error occurs during streaming, the stream will emit an error event before closing:

```
data: {"error": "upstream provider timeout"}
```

Always handle this case in your client:

```python
chunk = json.loads(data)
if "error" in chunk:
    print(f"Stream error: {chunk['error']}")
    break
delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")
```
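The snippet above can be wrapped into a reusable function. This is a sketch, not part of any MuAPI SDK; the hypothetical `parse_event` helper assumes the error and chunk shapes shown above:

```python
import json

def parse_event(data):
    """Parse one SSE `data:` payload into (delta, error).

    Hypothetical helper: returns ("", message) for the error shape
    shown above, or (delta_text, None) for a normal content chunk.
    """
    chunk = json.loads(data)
    if "error" in chunk:
        return "", chunk["error"]
    delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")
    return delta, None

# Normal chunk yields text; error chunk yields a message to act on
print(parse_event('{"choices":[{"delta":{"content":"hi"}}]}'))
print(parse_event('{"error": "upstream provider timeout"}'))
```

In a streaming loop, break (and surface the message to the user) as soon as `parse_event` returns a non-`None` error.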
## Pricing

Gemini 3 Flash streaming uses token-based billing applied after the stream completes:

| Token Type | Rate |
|---|---|
| Input tokens | $0.30 per million |
| Output tokens | $1.80 per million |

Example: A request with 500 input tokens and 800 output tokens costs:

- Input: 500 × $0.30 / 1,000,000 = $0.00015
- Output: 800 × $1.80 / 1,000,000 = $0.00144
- Total: ~$0.0016
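The arithmetic above as a small helper, useful for estimating spend from the `usage` object in the final SSE chunk. A sketch: `stream_cost` is illustrative, with default rates taken from the table above:

```python
def stream_cost(prompt_tokens, completion_tokens,
                input_rate=0.30, output_rate=1.80):
    """Estimate cost in dollars; rates are per million tokens."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# The worked example above: 500 input + 800 output tokens
print(f"${stream_cost(500, 800):.5f}")  # ≈ $0.00159
```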