AI Large Language Models (LLM)

MuApi gives you a unified API to query all major Large Language Models (LLMs) in production. Send a message prompt, get back a completion response or a live stream SSE, and control parameters like reasoning depth and web search — all with credit-based pricing on a single, shared credit balance.

  • Access Gemini, Claude, GPT, Grok, Qwen, Llama, DeepSeek and more via one endpoint
  • Support for standard async execution and real-time streaming
  • Credit-based pay-as-you-go pricing with no mandatory monthly commitments
  • Unified JSON schema for chat, reasoning, and document parsing

Quick Start

Every model in this category uses the same submit-then-poll API. Replace gpt-5-5 with any model endpoint from the list below.

# 1. Submit
curl -X POST https://api.muapi.ai/api/v1/gpt-5-5 \
  -H "x-api-key: $MUAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain quantum computing in simple terms."}'
# → {"request_id":"abc123","status":"processing"}

# 2. Poll until completed
curl https://api.muapi.ai/api/v1/predictions/abc123/result \
  -H "x-api-key: $MUAPI_API_KEY"

Top 5 Large Language Models (LLM) Models

ModelProviderCostBest For
gemini-3-5-flash$0.000Gemini 3.5 Flash is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash) and live streaming (/gemini-3-5-flash/stream) via SSE.
claude-sonnet-4-6Claude Sonnet 4.6 delivers strong reasoning, advanced coding, and native computer-use functionality. Supports text and image inputs with up to 1M token context. Token-based pricing: $1.80/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/claude-sonnet-4-6) and live streaming (/claude-sonnet-4-6/stream) via SSE.
gpt-codexOpenAI GPT Codex delivers advanced coding capabilities with scalable reasoning depth. Supports multiple model variants (gpt-5-codex through gpt-5.4-codex) and multimodal inputs. Token-based pricing: $1.25/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/gpt-codex) and live streaming (/gpt-codex/stream) via SSE.
claude-fable-5Claude Fable 5 is the latest flagship model from Anthropic. Supports text and image inputs with advanced reasoning and creative capabilities. Token-based pricing: $8.00/M input tokens, $40.00/M output tokens. Two endpoints: standard async (/claude-fable-5) and live streaming (/claude-fable-5/stream) via SSE.
gemini-3-flash$0.001Gemini 3 Flash is a fast, multimodal language model for real-time text generation. Supports text and image inputs, function calling, and Google Search grounding. Token-based pricing: $0.30/M input tokens and $1.80/M output tokens. Two endpoints: standard async (/gemini-3-flash) and live streaming (/gemini-3-flash/stream) via SSE.

All 20 Models

gemini-3-1-pro
-42%
Text to Text
$0.0007$0.001

gemini-3-1-pro

Gemini 3.1 Pro is Google's next-generation multimodal model, optimized for complex reasoning, planning, coding, and multi-turn conversation. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-1-pro) and live streaming (/gemini-3-1-pro/stream) via SSE.

generate-social-video-script
10%
Text to Text
$0.1111$0.100

generate-social-video-script

Generate viral short-form video scripts for social media based on a topic and niche.

claude-opus-4-6
Text to Text

claude-opus-4-6

Claude Opus 4.6 is Anthropic's most capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-6) and live streaming (/claude-opus-4-6/stream) via SSE.

claude-opus-4-5
Text to Text

claude-opus-4-5

Claude Opus 4.5 is Anthropic's highly capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-5) and live streaming (/claude-opus-4-5/stream) via SSE.

claude-sonnet-4-6
Text to Text

claude-sonnet-4-6

Claude Sonnet 4.6 delivers strong reasoning, advanced coding, and native computer-use functionality. Supports text and image inputs with up to 1M token context. Token-based pricing: $1.80/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/claude-sonnet-4-6) and live streaming (/claude-sonnet-4-6/stream) via SSE.

gemini-3-5-flash
100%
Text to Text
$0.0001$0.000

gemini-3-5-flash

Gemini 3.5 Flash is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash) and live streaming (/gemini-3-5-flash/stream) via SSE.

claude-opus-4-7
Text to Text

claude-opus-4-7

Claude Opus 4.7 is Anthropic's highly capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-7) and live streaming (/claude-opus-4-7/stream) via SSE.

claude-opus-4-8
Text to Text

claude-opus-4-8

Claude Opus 4.8 is Anthropic's most capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-8) and live streaming (/claude-opus-4-8/stream) via SSE.

gemini-3-pro
-42%
Text to Text
$0.0007$0.001

gemini-3-pro

Gemini 3 Pro is Google's powerful multimodal reasoning model, designed for complex problem solving, coding, and logical tasks. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-pro) and live streaming (/gemini-3-pro/stream) via SSE.

claude-sonnet-4-5
Text to Text

claude-sonnet-4-5

Claude Sonnet 4.5 is Anthropic's state-of-the-art model offering high intelligence, speed, and efficiency for code generation, writing, and logical analysis. Supports text and image inputs. Token-based pricing: $1.80/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/claude-sonnet-4-5) and live streaming (/claude-sonnet-4-5/stream) via SSE.

gemini-3-5-flash-openai
100%
Text to Text
$0.0001$0.000

gemini-3-5-flash-openai

Gemini 3.5 Flash (OpenAI-compatible) is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash-openai) and live streaming (/gemini-3-5-flash-openai/stream) via SSE.

claude-haiku-4-5
Text to Text

claude-haiku-4-5

Claude Haiku 4.5 is Anthropic's fastest and most cost-effective model, designed for high-frequency queries, simple tasks, and near-instant response times. Supports text and image inputs. Token-based pricing: $0.60/M input tokens, $3.00/M output tokens. Two endpoints: standard async (/claude-haiku-4-5) and live streaming (/claude-haiku-4-5/stream) via SSE.

gemini-2-5-pro
100%
Text to Text
$0.0003$0.000

gemini-2-5-pro

Gemini 2.5 Pro is Google's advanced multimodal reasoning model, optimized for complex coding, logical tasks, and deep analysis. Supports text and image inputs. Token-based pricing: $1.25/M input tokens, $10.00/M output tokens. Two endpoints: standard async (/gemini-2-5-pro) and live streaming (/gemini-2-5-pro/stream) via SSE.

gemini-2-5-flash
100%
Text to Text
$0.0001$0.000

gemini-2-5-flash

Gemini 2.5 Flash is Google's high-speed multimodal language model, optimized for rapid text generation, real-time image understanding, and high-frequency tasks. Supports text and image inputs. Token-based pricing: $0.30/M input tokens, $2.50/M output tokens. Two endpoints: standard async (/gemini-2-5-flash) and live streaming (/gemini-2-5-flash/stream) via SSE.

gpt-5-2
Text to Text

gpt-5-2

GPT 5.2 is a lightweight reasoning model with fast response times and deep coding capabilities. Supports image inputs, system prompts, web search capabilities, and reasoning effort control. Pricing: $1.25/M input tokens, $9.00/M output tokens.

gpt-5-5
Text to Text

gpt-5-5

GPT 5.5 is OpenAI's state-of-the-art flagship reasoning model for high-complexity problems. Supports image and file uploads, system prompts, web search capabilities, and reasoning effort control. Pricing: $2.40/M input tokens, $16.00/M output tokens.

claude-fable-5
Text to Text

claude-fable-5

Claude Fable 5 is the latest flagship model from Anthropic. Supports text and image inputs with advanced reasoning and creative capabilities. Token-based pricing: $8.00/M input tokens, $40.00/M output tokens. Two endpoints: standard async (/claude-fable-5) and live streaming (/claude-fable-5/stream) via SSE.

gemini-3-flash
10%
Text to Text
$0.0011$0.001

gemini-3-flash

Gemini 3 Flash is a fast, multimodal language model for real-time text generation. Supports text and image inputs, function calling, and Google Search grounding. Token-based pricing: $0.30/M input tokens and $1.80/M output tokens. Two endpoints: standard async (/gemini-3-flash) and live streaming (/gemini-3-flash/stream) via SSE.

gpt-5-4
Text to Text

gpt-5-4

GPT-5.4 delivers powerful reasoning, coding, and professional knowledge work. Supports multimodal inputs (text and image) with adjustable reasoning depth. Token-based pricing: $1.25/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/gpt-5-4) and live streaming (/gpt-5-4/stream) via SSE.

gpt-codex
Text to Text

gpt-codex

OpenAI GPT Codex delivers advanced coding capabilities with scalable reasoning depth. Supports multiple model variants (gpt-5-codex through gpt-5.4-codex) and multimodal inputs. Token-based pricing: $1.25/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/gpt-codex) and live streaming (/gpt-codex/stream) via SSE.

Frequently Asked Questions

Which LLM model is best?

GPT 5.5 and Claude Opus 4.8 are the best for deep logic, complex coding, and multi-step reasoning. Gemini 3.5 Flash is highly optimized for rapid, real-time, low-latency applications. MuApi hosts all of them so you can benchmark quality vs. cost dynamically.

Do you support streaming?

Yes! Every LLM model has a streaming endpoint (e.g. `POST /api/v1/{model}/stream`) that pushes Server-Sent Events (SSE) for near-instant rendering in chat bubbles.