AI Large Language Models (LLM)

MuApi gives you a unified API to query the major Large Language Models (LLMs) in production. Send a prompt, get back a completion or a live Server-Sent Events (SSE) stream, and control parameters such as reasoning depth and web search, all billed against a single shared credit balance.

→ Access Gemini, Claude, GPT, Grok, Qwen, Llama, DeepSeek and more via one endpoint
→ Support for standard async execution and real-time streaming
→ Credit-based pay-as-you-go pricing with no mandatory monthly commitments
→ Unified JSON schema for chat, reasoning, and document parsing


Quick Start

Every model in this category uses the same submit-then-poll API. Replace gpt-5-5 with any model endpoint from the list below.

# 1. Submit
curl -X POST https://api.muapi.ai/api/v1/gpt-5-5 \
  -H "x-api-key: $MUAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain quantum computing in simple terms."}'
# → {"request_id":"abc123","status":"processing"}

# 2. Poll until completed
curl https://api.muapi.ai/api/v1/predictions/abc123/result \
  -H "x-api-key: $MUAPI_API_KEY"
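
The two curl calls above can be wrapped in a small client. The following is a minimal Python sketch, not an official SDK: the helper names (`http_json`, `run_prompt`), the polling interval, and the rule that any status other than "processing" is terminal are assumptions made for illustration. Only the URLs, the `x-api-key` header, the `prompt` field, and the `request_id` / `status` fields come from the example above.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.muapi.ai/api/v1"


def http_json(method, url, api_key, payload=None):
    """Send one JSON request with the x-api-key header and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("x-api-key", api_key)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_prompt(model, prompt, api_key, transport=http_json,
               interval=2.0, max_polls=60, sleep=time.sleep):
    """Submit a prompt, then poll the result URL until the job stops processing."""
    # Step 1: submit. The response is expected to carry a request_id.
    submitted = transport("POST", f"{BASE_URL}/{model}", api_key, {"prompt": prompt})
    request_id = submitted["request_id"]

    # Step 2: poll the result endpoint at a fixed interval.
    result_url = f"{BASE_URL}/predictions/{request_id}/result"
    for _ in range(max_polls):
        result = transport("GET", result_url, api_key)
        # Assumption: any status other than "processing" is terminal.
        if result.get("status") != "processing":
            return result
        sleep(interval)
    raise TimeoutError(f"request {request_id} still processing after {max_polls} polls")
```

Calling `run_prompt("gpt-5-5", "Explain quantum computing in simple terms.", api_key)` with a real key would perform both steps. The `transport` and `sleep` parameters exist only so the loop can be exercised without network access.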

Top 5 Large Language Models (LLMs)

Model	Provider	Cost	Best For
gemini-3-5-flash	—	$0.000	Gemini 3.5 Flash is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash) and live streaming (/gemini-3-5-flash/stream) via SSE.
claude-sonnet-4-6	—	—	Claude Sonnet 4.6 delivers strong reasoning, advanced coding, and native computer-use functionality. Supports text and image inputs with up to 1M token context. Token-based pricing: $1.80/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/claude-sonnet-4-6) and live streaming (/claude-sonnet-4-6/stream) via SSE.
gpt-codex	—	—	OpenAI GPT Codex delivers advanced coding capabilities with scalable reasoning depth. Supports multiple model variants (gpt-5-codex through gpt-5.4-codex) and multimodal inputs. Token-based pricing: $1.25/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/gpt-codex) and live streaming (/gpt-codex/stream) via SSE.
claude-fable-5	—	—	Claude Fable 5 is the latest flagship model from Anthropic. Supports text and image inputs with advanced reasoning and creative capabilities. Token-based pricing: $8.00/M input tokens, $40.00/M output tokens. Two endpoints: standard async (/claude-fable-5) and live streaming (/claude-fable-5/stream) via SSE.
gemini-3-flash	—	$0.001	Gemini 3 Flash is a fast, multimodal language model for real-time text generation. Supports text and image inputs, function calling, and Google Search grounding. Token-based pricing: $0.30/M input tokens and $1.80/M output tokens. Two endpoints: standard async (/gemini-3-flash) and live streaming (/gemini-3-flash/stream) via SSE.
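
To make the token-based rates in the table concrete, here is a small Python sketch that estimates the dollar cost of one request and ranks the five models for a given workload. The prices are copied from the table; the function names are hypothetical, and how dollar amounts map onto MuApi credits is not stated here, so treat the result as a rough estimate rather than a billing figure.

```python
# Per-million-token prices in USD (input, output), copied from the table above.
PRICES = {
    "gemini-3-5-flash": (0.60, 3.60),
    "claude-sonnet-4-6": (1.80, 9.00),
    "gpt-codex": (1.25, 9.00),
    "claude-fable-5": (8.00, 40.00),
    "gemini-3-flash": (0.30, 1.80),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request at the listed rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """Return model names ordered from cheapest to most expensive for this workload."""
    return sorted(PRICES, key=lambda m: estimate_cost(m, input_tokens, output_tokens))
```

For example, a request to claude-sonnet-4-6 with 10,000 input tokens and 2,000 output tokens comes to 10,000 × $1.80/M + 2,000 × $9.00/M = $0.018 + $0.018 = $0.036.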

All 20 Models

-42%

Text to Text

$0.0007 $0.001

gemini-3-1-pro

Gemini 3.1 Pro is Google's next-generation multimodal model, optimized for complex reasoning, planning, coding, and multi-turn conversation. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-1-pro) and live streaming (/gemini-3-1-pro/stream) via SSE.

10%

Text to Text

$0.1111 $0.100

generate-social-video-script

Generate viral short-form video scripts for social media based on a topic and niche.

Text to Text

claude-opus-4-6

Claude Opus 4.6 is Anthropic's most capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-6) and live streaming (/claude-opus-4-6/stream) via SSE.

Text to Text

claude-opus-4-5

Claude Opus 4.5 is Anthropic's highly capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-5) and live streaming (/claude-opus-4-5/stream) via SSE.

Text to Text

claude-sonnet-4-6

Claude Sonnet 4.6 delivers strong reasoning, advanced coding, and native computer-use functionality. Supports text and image inputs with up to 1M token context. Token-based pricing: $1.80/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/claude-sonnet-4-6) and live streaming (/claude-sonnet-4-6/stream) via SSE.

100%

Text to Text

$0.0001 $0.000

gemini-3-5-flash

Gemini 3.5 Flash is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash) and live streaming (/gemini-3-5-flash/stream) via SSE.

Text to Text

claude-opus-4-7

Claude Opus 4.7 is Anthropic's highly capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-7) and live streaming (/claude-opus-4-7/stream) via SSE.

Text to Text

claude-opus-4-8

Claude Opus 4.8 is Anthropic's most capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-8) and live streaming (/claude-opus-4-8/stream) via SSE.

-42%

Text to Text

$0.0007 $0.001

gemini-3-pro

Gemini 3 Pro is Google's powerful multimodal reasoning model, designed for complex problem solving, coding, and logical tasks. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-pro) and live streaming (/gemini-3-pro/stream) via SSE.

Text to Text

claude-sonnet-4-5

Claude Sonnet 4.5 is Anthropic's state-of-the-art model offering high intelligence, speed, and efficiency for code generation, writing, and logical analysis. Supports text and image inputs. Token-based pricing: $1.80/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/claude-sonnet-4-5) and live streaming (/claude-sonnet-4-5/stream) via SSE.

100%

Text to Text

$0.0001 $0.000

gemini-3-5-flash-openai

Gemini 3.5 Flash (OpenAI-compatible) is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash-openai) and live streaming (/gemini-3-5-flash-openai/stream) via SSE.

Text to Text

claude-haiku-4-5

Claude Haiku 4.5 is Anthropic's fastest and most cost-effective model, designed for high-frequency queries, simple tasks, and near-instant response times. Supports text and image inputs. Token-based pricing: $0.60/M input tokens, $3.00/M output tokens. Two endpoints: standard async (/claude-haiku-4-5) and live streaming (/claude-haiku-4-5/stream) via SSE.

100%

Text to Text

$0.0003 $0.000

gemini-2-5-pro

Gemini 2.5 Pro is Google's advanced multimodal reasoning model, optimized for complex coding, logical tasks, and deep analysis. Supports text and image inputs. Token-based pricing: $1.25/M input tokens, $10.00/M output tokens. Two endpoints: standard async (/gemini-2-5-pro) and live streaming (/gemini-2-5-pro/stream) via SSE.

100%

Text to Text

$0.0001 $0.000

gemini-2-5-flash

Gemini 2.5 Flash is Google's high-speed multimodal language model, optimized for rapid text generation, real-time image understanding, and high-frequency tasks. Supports text and image inputs. Token-based pricing: $0.30/M input tokens, $2.50/M output tokens. Two endpoints: standard async (/gemini-2-5-flash) and live streaming (/gemini-2-5-flash/stream) via SSE.

Text to Text

gpt-5-2

GPT 5.2 is a lightweight reasoning model with fast response times and deep coding capabilities. Supports image inputs, system prompts, web search capabilities, and reasoning effort control. Pricing: $1.25/M input tokens, $9.00/M output tokens.

Text to Text

gpt-5-5

GPT 5.5 is OpenAI's state-of-the-art flagship reasoning model for high-complexity problems. Supports image and file uploads, system prompts, web search capabilities, and reasoning effort control. Pricing: $2.40/M input tokens, $16.00/M output tokens.

Text to Text

claude-fable-5

Claude Fable 5 is the latest flagship model from Anthropic. Supports text and image inputs with advanced reasoning and creative capabilities. Token-based pricing: $8.00/M input tokens, $40.00/M output tokens. Two endpoints: standard async (/claude-fable-5) and live streaming (/claude-fable-5/stream) via SSE.

10%

Text to Text

$0.0011 $0.001

gemini-3-flash

Gemini 3 Flash is a fast, multimodal language model for real-time text generation. Supports text and image inputs, function calling, and Google Search grounding. Token-based pricing: $0.30/M input tokens and $1.80/M output tokens. Two endpoints: standard async (/gemini-3-flash) and live streaming (/gemini-3-flash/stream) via SSE.

Text to Text

gpt-5-4

GPT-5.4 delivers powerful reasoning, coding, and professional knowledge work. Supports multimodal inputs (text and image) with adjustable reasoning depth. Token-based pricing: $1.25/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/gpt-5-4) and live streaming (/gpt-5-4/stream) via SSE.

Text to Text

gpt-codex

OpenAI GPT Codex delivers advanced coding capabilities with scalable reasoning depth. Supports multiple model variants (gpt-5-codex through gpt-5.4-codex) and multimodal inputs. Token-based pricing: $1.25/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/gpt-codex) and live streaming (/gpt-codex/stream) via SSE.

Frequently Asked Questions

Which LLM is best?

GPT 5.5 and Claude Opus 4.8 are the strongest choices for deep logic, complex coding, and multi-step reasoning. Gemini 3.5 Flash is optimized for real-time, low-latency applications. MuApi hosts all of them, so you can benchmark quality against cost on your own workload.

Do you support streaming?

Yes. Every LLM has a streaming endpoint (e.g. `POST /api/v1/{model}/stream`) that pushes Server-Sent Events (SSE), so a chat interface can render text as it arrives.
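
On the client side, an SSE stream is a sequence of text lines in which `data:` fields carry the payload and a blank line ends each event. The Python sketch below parses that generic wire format. The function name `iter_sse_data` is hypothetical, and the shape of each payload on MuApi's streaming endpoints (plain text or JSON chunks) is not specified here, so the parser returns the raw `data` string and leaves decoding to the caller.

```python
def iter_sse_data(lines):
    """Yield the data payload of each complete event from an iterable of SSE text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # One optional leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event that is not terminated by a blank line is dropped,
    # which matches how the SSE format treats an incomplete final event.
```

With a live connection, the decoded lines of the HTTP response body could be passed straight in, for example `iter_sse_data(raw.decode("utf-8") for raw in resp)` where `resp` is the object returned by `urllib.request.urlopen`. That usage assumes the streaming endpoint accepts the same JSON body as the async endpoint.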
