AI Video Generator

MuApi unifies every leading text-to-video model behind a single API. Submit a prompt, optionally a reference image and duration, get a request ID, then poll for an `outputs[]` array of MP4 URLs. Veo 3 and Kling Master deliver cinematic quality at high cost; Seedance Lite and Hunyuan are the fastest and cheapest; Runway and Pixverse fit the social-content pipeline; Hailuo and Vidu offer specialty motion controls. One integration, twenty providers — switch models by changing the URL path, not the SDK.

  • 20+ text-to-video models — Veo 3 / Veo 3 Fast, Kling Master, Seedance Pro/Lite, Hunyuan, Runway, Pixverse, Vidu, Hailuo, Wan 2.1/2.2
  • Same submit-and-poll pattern as image models — one client handles every modality
  • Per-call credits with dynamic cost based on duration / resolution
  • Webhook callbacks for long jobs (videos can take 60-300s)
  • Optional reference image input on most models

Quick Start

Every model in this category uses the same submit-then-poll API. Replace `veo3-fast-text-to-video` with any model endpoint from the list below.

# 1. Submit
curl -X POST https://api.muapi.ai/api/v1/veo3-fast-text-to-video \
  -H "x-api-key: $MUAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a cinematic shot of a city at night with neon reflections"}'
# → {"request_id":"abc123","status":"processing"}

# 2. Poll until completed
curl https://api.muapi.ai/api/v1/predictions/abc123/result \
  -H "x-api-key: $MUAPI_API_KEY"
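
The two calls above can be wrapped in a small polling loop. The Python sketch below is illustrative only: it assumes the response fields mentioned on this page (`request_id`, `status`, `outputs`, `error`) and that `completed` and `failed` are the terminal statuses. The helper names (`generate_video`, `urllib_http`) and the injectable `http` transport are hypothetical conveniences, not part of any MuApi SDK.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.muapi.ai/api/v1"


def urllib_http(method, url, headers, body=None):
    """Default transport: send one JSON request with urllib, return the parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate_video(model, prompt, api_key, http=urllib_http,
                   interval=5.0, timeout=600.0, sleep=time.sleep):
    """Submit a prompt to `model`, then poll until the job finishes.

    Returns the list of MP4 URLs from `outputs`. Assumes `completed` and
    `failed` are the terminal statuses (an assumption, not confirmed here).
    """
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    job = http("POST", f"{BASE_URL}/{model}", headers, {"prompt": prompt})
    request_id = job["request_id"]

    waited = 0.0
    while waited <= timeout:
        result = http("GET", f"{BASE_URL}/predictions/{request_id}/result", headers)
        status = result.get("status")
        if status == "completed":
            return result.get("outputs", [])
        if status == "failed":
            raise RuntimeError(result.get("error", "generation failed"))
        sleep(interval)  # still processing: wait before the next poll
        waited += interval
    raise TimeoutError(f"request {request_id} still running after {timeout}s")
```

Switching models only changes the first argument, for example `generate_video("seedance-lite-t2v", prompt, api_key)`. Passing the transport and `sleep` as parameters keeps the loop testable without network access.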

Top 5 Video Generator Models

| Model | Provider | Cost | Best For |
| --- | --- | --- | --- |
| openai-sora-2-text-to-video | | $0.800 | Sora 2 T2V converts text prompts into short, dynamic 10-second video clips with synchronized audio. Users can describe scenes, motion, camera angles, and sound effects, and Sora 2 brings them to life with cinematic realism or stylized visuals. Perfect for storytelling, social media content, and creative experimentation, while maintaining high-quality visuals and immersive audio. |
| sd-2-t2v | | $0.750 | SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output. |
| seedance-lite-t2v | | $0.100 | Seedance Lite T2V offers quick video generation from text with decent visual quality and motion. Ideal for fast previews, prototyping, or lightweight use cases where speed matters more than fine detail. |
| veo3-fast-text-to-video | | $0.600 | VEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping. |
| grok-imagine-text-to-video | | $0.150 | Grok Imagine is xAI’s fast, creative text-to-video model that generates cinematic clips from 6 to 30 seconds with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video. |

All 70 Models

Text to Video · 11% off · $2.500 (was $2.7778)

veo3-text-to-video

VEO3 T2V generates cinematic videos from text prompts, capturing dynamic motion, rich scenes, and storytelling visuals in stunning detail.

Text to Video · 11% off · $1.500 (was $1.6667)

sd-2-vip-text-to-video

SD 2 Text-to-Video VIP (Pro) by ByteDance. Generates high-quality cinematic video from a text prompt with priority routing, native audio-visual sync, up to 2K resolution, and 4–15 second duration.

Text to Video · 10% off · $0.300 (was $0.3333)

wan2.1-text-to-video

WAN 2.1 turns your written prompts into vivid, cinematic video clips. Ideal for storytelling, content creation, and visualizing abstract ideas, it supports detailed natural scenes, character motion, and dramatic camera movements — all from just text.

Text to Video · 11% off · $0.150 (was $0.1667)

hunyuan-text-to-video

Hunyuan T2V generates detailed and dynamic videos from text prompts with a focus on realism and coherent motion. It handles multi-object scenes, human actions, and cinematic compositions effectively, making it ideal for storytelling and visual concepts.

Text to Video · 11% off · $0.050 (was $0.0556)

hunyuan-fast-text-to-video

Hunyuan Fast T2V provides accelerated video generation from text prompts with slightly reduced detail but excellent speed. Ideal for rapid prototyping, concept testing, and short-form ideas where time is critical.

Text to Video · 10% off · $0.300 (was $0.3333)

wan2.2-text-to-video

Wan 2.2’s T2V mode transforms descriptive text prompts into high-quality, stylized video sequences. It excels at generating anime-style or cinematic visuals with smooth motion and strong thematic consistency.

Text to Video · 10% off · $0.300 (was $0.3333)

pixverse-v4.5-t2v

PixVerse v4.5 transforms descriptive text into vivid, high-resolution video clips. It understands complex scenes, human motion, and cinematic camera angles — great for creative storytelling, trailers, and animated concepts.

Text to Video · 11% off · $0.016 (was $0.0178)

wan2.2-5b-fast-t2v

Wan 2.2 Fast is a lightweight, high-speed version of the Wan 2.2 model, optimized for quick text-to-video generation. It trades some cinematic detail for rapid results, making it perfect for prototyping, previews, social media clips, and quick storytelling.

Text to Video · 10% off · $0.300 (was $0.3333)

minimax-hailuo-02-standard-t2v

Fast and lightweight text-to-video generation. Ideal for quick drafts, previews, or playful content where speed matters more than cinematic quality.

Text to Video · 11% off · $0.600 (was $0.6667)

minimax-hailuo-02-pro-t2v

High-fidelity text-to-video with cinematic rendering. Best for storytelling, cinematic clips, or realistic visuals with depth, atmosphere, and detail.

Text to Video · 10% off · $0.100 (was $0.1111)

seedance-lite-t2v

Seedance Lite T2V offers quick video generation from text with decent visual quality and motion. Ideal for fast previews, prototyping, or lightweight use cases where speed matters more than fine detail.

Text to Video · 11% off · $0.180 (was $0.2000)

seedance-pro-t2v

Seedance Pro delivers high-fidelity video generation from text, producing rich visuals, smooth camera movement, and realistic scenes. Best for storytelling, content creation, and visual production.

Text to Video · 10% off · $0.300 (was $0.3333)

pixverse-v5-t2v

PixVerse V5 delivers a major leap forward in AI-powered video creation — now featuring smoother motion, ultra-high resolution, and expanded visual effects.

Text to Video · 10% off · $0.650 (was $0.7222)

wan2.5-text-to-video

WAN 2.5 Text-to-Video transforms written prompts into cinematic video clips with dynamic motion, realistic physics, and natural animation. It can also generate characters delivering dialogue, making it ideal for storytelling, ads, and creative showcases.

Text to Video · 11% off · $0.440 (was $0.4889)

wan2.5-text-to-video-fast

Transform text prompts into short, cinematic videos with natural motion, realistic environments, and dynamic camera perspectives. Fast mode delivers quick, high-fidelity video generation, ideal for creative storytelling, concept visuals, and social media content.

Text to Video · 11% off · $0.800 (was $0.8889)

openai-sora-2-text-to-video

Sora 2 T2V converts text prompts into short, dynamic 10-second video clips with synchronized audio. Users can describe scenes, motion, camera angles, and sound effects, and Sora 2 brings them to life with cinematic realism or stylized visuals. Perfect for storytelling, social media content, and creative experimentation, while maintaining high-quality visuals and immersive audio.

Text to Video · 10% off · $0.200 (was $0.2222)

ovi-text-to-video

Ovi is a unified model that generates synchronized video and audio from textual input. You write a scene description, including dialogue and ambient sounds, and Ovi produces a short video clip (typically ~5 seconds) where visuals and sound align naturally. Videos are generated in 540p resolution.

Text to Video · 11% off · $2.500 (was $2.7778)

veo3.1-text-to-video

Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.

Text to Video · 11% off · $0.600 (was $0.6667)

veo3.1-fast-text-to-video

Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.

Text to Video · 10% off · $0.580 (was $0.6444)

openai-sora-2-pro-storyboard

Sora 2 Pro enables creators to structure video narratives by chaining multiple scenes through storyboard “cards.” Each card defines a segment of the video—setting, characters, actions, timing—and the model stitches them into a cohesive multi-scene video. This gives you more control over pacing, transitions, and storytelling flow.

Text to Video · 11% off · $0.600 (was $0.6667)

veo3.1-extend-video

Veo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene while keeping visual style, characters, motion, and audio consistent. This model requires the task_id of the original video.

Text to Video · 10% off · $0.460 (was $0.5111)

ltx-2-pro-text-to-video

LTX-2 Pro is the high-fidelity video-generation engine by Lightricks designed for professional workflows, supporting both text-to-video and image-to-video inputs. It enables realistic motion, synchronized audio-video, cinematic camera moves and stylized visuals. Ideal for your timeline-based video interface: you supply a prompt or image, define duration/aspect ratio, then it generates a clip that you can ingest, rename, batch-move, split or timeline-edit.

Text to Video · 10% off · $0.630 (was $0.7000)

minimax-hailuo-2.3-pro-t2v

Hailuo 2.3 Pro T2V turns your imagination into motion-picture realism. It interprets natural language prompts and generates visually stunning cinematic sequences that capture depth, atmosphere, and authentic motion.

Text to Video · 11% off · $0.360 (was $0.4000)

minimax-hailuo-2.3-standard-t2v

Hailuo 2.3 Standard T2V transforms pure imagination into moving cinematic visuals. Simply describe a scene, and this model generates a coherent, high-quality video that captures the prompt’s tone, environment, and emotion. Output resolution is 768p.

Text to Video · 11% off · $0.720 (was $0.8000)

kling-o1-text-to-video

Kling O1 is a unified, multi-modal video generation engine that transforms natural language prompts into short cinematic video clips. It supports text-to-video generation with realistic motion, dynamic camera moves, and coherent scene rendering.

Text to Video · 10% off · $0.900 (was $1.0000)

kling-v2.6-pro-t2v

Kling-v2.6-Pro Text-to-Video generates high-fidelity cinematic videos directly from text prompts. It excels at complex compositions, dramatic lighting, fluid camera motion, and visually rich fantasy or sci-fi sequences.

Text to Video · 10% off · $0.100 (was $0.1111)

pixverse-v5.5-t2v

PixVerse v5.5 T2V generates cinematic short videos directly from text. It excels at stylized fantasy, anime, surreal worlds, atmospheric environments, and fluid camera motion. The model produces vivid lighting, dynamic effects, depth-rich parallax, and smooth motion.

Text to Video · 10% off · $0.650 (was $0.7222)

wan2.6-text-to-video

WAN 2.6 Text-to-Video generates smooth, cinematic videos directly from text prompts. It’s designed for strong scene coherence, atmospheric depth, and fluid camera motion, making it ideal for fantasy and sci-fi worlds, surreal concepts, environmental storytelling, and dramatic visual sequences with rich lighting and motion.

Text to Video · 11% off · $0.340 (was $0.3778)

seedance-v1.5-pro-t2v

Seedance v1.5 Pro Text-to-Video generates high-quality cinematic videos directly from text prompts. It focuses on smooth motion, rich atmosphere, and coherent scene structure, making it ideal for fantasy worlds, sci-fi environments, surreal visuals, and cinematic storytelling shots with detailed lighting and depth.

Text to Video · 11% off · $0.260 (was $0.2889)

seedance-v1.5-pro-t2v-fast

Seedance v1.5 Pro Text-to-Video Fast generates short cinematic videos directly from text with an emphasis on speed and stability. It produces coherent scenes with simple camera motion, light environmental animation, and consistent lighting.

Text to Video · 11% off · $0.600 (was $0.6667)

veo3.1-4k-video

Get the ultra-high-definition 4K version of a Veo3.1 video generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications. It enhances visual fidelity while maintaining temporal coherence and realistic motion.

Text to Video · 11% off · $0.720 (was $0.8000)

kling-v3.0-pro-text-to-video

Kling 3.0 Pro is a high-end video generation model capable of producing longer, smoother, and more realistic cinematic videos with strong motion consistency. It handles complex scenes, realistic physics, natural camera movement, and detailed environments better than earlier versions.

Text to Video · 11% off · $0.104 (was $0.1156)

ltx-2.3-text-to-video

LTX-2.3 Text-to-Video generates cinematic video clips directly from text prompts. Built on an upgraded 2.3B architecture, it delivers sharper temporal consistency, faster synthesis, and more precise motion control than previous LTX versions. Ideal for concept visualization, story beats, and prompt-driven animation.

Text to Video · 10% off · $0.300 (was $0.3333)

openai-sora-2-standard-text-to-video

OpenAI Sora 2 Standard Text to Video model (High Priority). Generate stunning 10s videos from text prompts.

Text to Video · 11% off · $0.295 (was $0.3278)

pixverse-v6-t2v

Generate high-quality videos from text prompts using PixVerse V6. Supports resolutions up to 1080p, durations up to 15 seconds, and optional AI-generated audio.

Text to Video · 10% off · $0.300 (was $0.3333)

veo3.1-lite-text-to-video

Veo 3.1 Lite is a lightweight variant of Google's Veo 3.1 model designed for faster, more accessible video generation.

Text to Video · 11% off · $1.050 (was $1.1667)

sd-2-extend

SD 2.0 Extend Video continues an existing SD 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video · 10% off · $0.100 (was $0.1111)

wan2.7-text-to-video

Alibaba WAN 2.7 Text-to-Video turns plain prompts into coherent, cinematic clips.

Text to Video · 11% off · $1.250 (was $1.3889)

sd-2-text-to-video

SD 2 Text-to-Video (Pro) by ByteDance. Generates high-quality cinematic video from a text prompt with native audio-visual sync, up to 2K resolution, and 4–15 second duration.

Text to Video · 10% off · $0.750 (was $0.8333)

sd-2-text-to-video-fast

SD 2 Text-to-Video (Fast) by ByteDance. Generates video from text at faster speeds with 4–15 second duration and 2K resolution.

Text to Video · 11% off · $1.050 (was $1.1667)

sd-2-vip-text-to-video-fast

SD 2 Text-to-Video VIP Fast by ByteDance. Faster generation with priority routing from a text prompt, 4–15 second duration and 2K resolution.

Text to Video · 10% off · $0.900 (was $1.0000)

happy-horse-1-text-to-video-720p

Happy Horse 1.0 Text to Video (720p) — generate expressive, stylized video clips from text prompts at 720p output resolution.

Text to Video · 10% off · $3.000 (was $3.3333)

veo-4-text-to-video

Veo 4 Text to Video — Google DeepMind's fourth-generation model delivering photorealistic, high-fidelity 1080p videos with exceptional prompt adherence and cinematic camera control.

Text to Video · 10% off · $3.375 (was $3.7500)

sd-2-vip-text-to-video-1080p

SD 2 Text-to-Video VIP 1080p by ByteDance. Generates cinematic 1080p video from a text prompt with priority routing, native audio-visual sync, and 4–15 second duration.

Text to Video · 10% off · $2.000 (was $2.2222)

kling-v3.0-4k-text-to-video

Kling 3.0 4K Text-to-Video generates ultra-high-resolution 3840×2160 cinematic video directly from text prompts with smooth, realistic motion and strong temporal consistency. Choose 4K when you need the sharpest output Kling 3.0 can produce — perfect for high-end advertising, hero shots, and large-screen playback.

Text to Video · 10% off · $0.750 (was $0.8333)

vidu-q3-pro-text-to-video

Vidu Q3 Pro Text-to-Video generates cinematic, prompt-faithful clips with strong temporal consistency, accurate motion, and rich detail across resolutions up to 1080p. Pick this when you want the highest visual fidelity Vidu Q3 can produce — great for hero shots, narrative beats, and stylized sequences driven purely from text.

Text to Video · 10% off · $0.300 (was $0.3333)

vidu-q3-turbo-text-to-video

Vidu Q3 Turbo Text-to-Video is the fast, affordable tier of Vidu Q3 — same prompt understanding and motion quality, optimised for rapid iteration. Use it for storyboards, social cuts, and high-volume generation where speed and cost matter as much as polish.

Text to Video · 10% off · $0.200 (was $0.2222)

vidu-q2-pro-text-to-video

Vidu Q2 Pro Text-to-Video generates cinematic, prompt-faithful clips from text alone with strong temporal consistency and rich detail at up to 1080p. Pick this when you need polished output without a reference frame.

Text to Video · 10% off · $0.130 (was $0.1444)

vidu-q2-turbo-text-to-video

Vidu Q2 Turbo Text-to-Video is the fast, affordable Q2 tier for prompt-only generation. Use it for storyboards, social cuts, and high-volume work where speed and cost matter.

Text to Video · 11% off · $1.050 (was $1.1667)

sd-2-vip-extend

SD 2.0 VIP Extend Video continues an existing SD 2.0 generated video seamlessly at 720p. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video · 11% off · $2.362 (was $2.6250)

sd-2-vip-extend-1080p

SD 2.0 VIP Extend Video 1080p continues an existing SD 2.0 generated video seamlessly at 1080p resolution. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video · 11% off · $0.420 (was $0.4667)

kling-v3.0-omni-standard-text-to-video

Kling v3 Omni at 720P. Multi-image reference video generation — supply up to 4 images and reference them in your prompt with <<<image_N>>>. Apimart-backed.

Text to Video · 10% off · $0.560 (was $0.6222)

kling-v3.0-omni-pro-text-to-video

Kling v3 Omni at 1080P. Multi-image reference video generation — supply up to 4 images and reference them in your prompt with <<<image_N>>>. Apimart-backed.

Text to Video · 10% off · $2.679 (was $2.9761)

kling-v3.0-omni-4k-text-to-video

Kling v3 Omni at 4K. Multi-image reference video generation — supply up to 4 images and reference them in your prompt with <<<image_N>>>. Apimart-backed.

Text to Video · 11% off · $0.600 (was $0.6667)

veo3-fast-text-to-video

VEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping.

Text to Video · 11% off · $0.090 (was $0.1000)

runway-text-to-video

Generate short, high-quality videos from plain text prompts. RunwayML’s text-to-video model interprets your written description and animates it into a moving visual scene with realistic or stylized motion.

Text to Video · 10% off · $0.300 (was $0.3333)

vidu-v2.0-t2v

Vidu's 2.0 model offers enhanced visual quality and comprehensive workflow support across multiple resolution options for versatile content creation.

Text to Video · 10% off · $1.200 (was $1.3333)

kling-v2.1-master-t2v

Kling 2.1 Master’s T2V mode allows users to generate vivid, high-quality videos from detailed text prompts. It supports dynamic scenes, natural motion, and cinematic quality — perfect for storytelling, ads, or content creation from imagination alone.

Text to Video · 11% off · $0.500 (was $0.5556)

openai-sora

Sora is a text-to-video generative AI model developed by OpenAI. It can generate short video clips based on descriptive text inputs, producing content that ranges from photorealistic scenes to stylized animations.

Text to Video · 10% off · $0.450 (was $0.5000)

kling-v2.5-turbo-pro-t2v

Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

Text to Video · 10% off · $0.460 (was $0.5111)

ltx-2-fast-text-to-video

LTX Video Fast is a speed-optimised mode of Lightricks’ video-generation engine, supporting text-to-video workflows. It allows you to input a descriptive prompt and get a short video clip with motion, camera movement, lighting, and stylised visuals. The underlying model (LTX-Video) is built for real-time or near-real-time generation of video clips.

Text to Video · 11% off · $2.400 (was $2.6667)

openai-sora-2-pro-text-to-video

Sora 2 Pro T2V is the high-fidelity version of OpenAI’s video generation model. It converts your text prompts into cinematic, richly detailed video clips with synchronized audio, realistic motion, strong physics, and creative control over style, mood, and pacing. Perfect for creators, storytellers, advertisers, and anyone who wants top-quality video content from text.

Text to Video · 11% off · $0.150 (was $0.1667)

grok-imagine-text-to-video

Grok Imagine is xAI’s fast, creative text-to-video model that generates cinematic clips from 6 to 30 seconds with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video.

Text to Video · 11% off · $0.060 (was $0.0667)

seedance-pro-t2v-fast

Seedance Pro Fast is ByteDance’s advanced text-to-video model that turns natural-language prompts into short, cinematic video clips with realistic motion, camera dynamics, and consistent scene detail.

Text to Video · 11% off · $0.600 (was $0.6667)

ltx-2-19b-text-to-video

LTX-2-19B Text-to-Video generates coherent cinematic videos directly from text, with an emphasis on temporal stability, natural motion, and conceptual clarity. It works best when the scene has a strong visual idea where motion reinforces meaning rather than overwhelming it.

Text to Video · 10% off · $0.750 (was $0.8333)

sd-2-t2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Text to Video · 11% off · $0.720 (was $0.8000)

kling-v3.0-standard-text-to-video

Kling 3.0 Standard Text-to-Video generates smooth, realistic videos from text with stable motion and natural behavior. It works best with clear subjects, simple actions, and one continuous scene, making it ideal for cute animals, small actions, and calm cinematic moments.

Text to Video · 11% off · $0.050 (was $0.0556)

grok-imagine-extend

Grok Imagine Extend lets you continue and expand existing Grok Imagine video generations seamlessly. Starting from a previously generated video, you can extend the scene while maintaining visual style, characters, motion, and audio consistency. Requires the original task_id from the initial video generation.

Text to Video · 11% off · $0.600 (was $0.6667)

sd-2-t2v-480p

SD 2.0 480p text-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Text to Video · 10% off · $1.800 (was $2.0000)

happy-horse-1-text-to-video-1080p

Happy Horse 1.0 Text to Video — generate expressive, stylized video clips from text prompts with vivid character motion and dynamic scene storytelling.

Frequently Asked Questions

Which AI video generator API is the best?

Veo 3 and Kling Master produce the highest-quality cinematic output but cost more credits. Seedance Lite and Hunyuan are the fastest and cheapest for social-format content. Runway delivers the strongest motion fidelity and brand-friendly outputs. The right pick depends on how you weigh quality, cost, and latency; because every model sits behind the same API, you can evaluate several without rewriting your code.

How long does a video take?

Anywhere from 30 seconds (Seedance Lite, 5s clip) to 4-5 minutes (Veo 3, 8s clip). Use the polling endpoint or pass `?webhook=https://your-server/path` on the submit call to receive a callback when it finishes.
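
As a sketch of the webhook option, the submit URL can be built by appending a URL-encoded `webhook` query parameter. The helper name `submit_url` is hypothetical, and percent-encoding the callback URL is standard practice that the API is assumed to accept. The shape of the callback payload is not described on this page, so the example stops at building the URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.muapi.ai/api/v1"


def submit_url(model, webhook=None):
    """Return the submit endpoint for `model`, optionally with a webhook callback.

    The callback URL is percent-encoded so that its own `?`, `&` and `/`
    characters cannot be mistaken for part of the submit URL's query string.
    """
    url = f"{BASE_URL}/{model}"
    if webhook:
        url += "?" + urlencode({"webhook": webhook})
    return url


print(submit_url("veo3-fast-text-to-video", "https://your-server/path"))
# → https://api.muapi.ai/api/v1/veo3-fast-text-to-video?webhook=https%3A%2F%2Fyour-server%2Fpath
```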

Can I generate longer videos?

Most models cap individual clips at 5-10 seconds. For longer sequences, chain multiple clips together using the workflow builder or generate keyframes with image models and stitch them via image-to-video models.

What format do I get back?

An MP4 URL in the `outputs[]` array of the polling response. URLs are hosted on MuApi's CDN and remain available for 30 days.

Do I need to handle retries?

MuApi automatically retries transient provider failures up to 3 times. For client errors (invalid prompt, content moderation), the polling endpoint returns `status: failed` with an `error` message — no retry will help.
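
One way to act on this is to treat a `failed` result as final and surface its `error` instead of resubmitting the same input. The sketch below assumes the polling response is a JSON object with the `status`, `outputs`, and `error` fields mentioned on this page, and that `completed` is the name of the success status. `GenerationFailed` and `outputs_or_raise` are illustrative names, not part of any MuApi SDK.

```python
class GenerationFailed(Exception):
    """Raised when a job ends with status 'failed'; resubmitting the same input will not help."""


def outputs_or_raise(result):
    """Interpret one polling response.

    Returns the list of output URLs when the job is done, None while it is
    still running, and raises GenerationFailed on a failed job.
    """
    status = result.get("status")
    if status == "failed":
        raise GenerationFailed(result.get("error", "unknown error"))
    if status == "completed":  # assumed name of the success status
        return result.get("outputs", [])
    return None  # e.g. "processing": keep polling
```

A caller would typically log the `GenerationFailed` message and ask the user to change the prompt rather than loop on the same request.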