Image-to-Video Generator

Pass a reference image plus a motion prompt and MuApi returns a video clip. This is the most common animation pattern for ads, product demos, social content, and storyboarding. Kling Pro and Veo 3 lead on quality; Wan 2.1/2.2 and Seedance lead on cost; Runway and Pixverse are tuned for vertical social content. Every model uses the same `POST /api/v1/{model}` endpoint, which accepts an image URL or base64 payload and returns a request ID that you poll for the result.

  • Reference image + prompt → MP4 output
  • Models: Kling Std/Pro/Master, Veo 3, Wan 2.1/2.2, Seedance, Hunyuan, Runway, Pixverse, Vidu, Midjourney, Hailuo
  • Optional duration, aspect ratio, and motion-strength controls per model
  • Same submit-and-poll API as every other MuApi endpoint
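The overview says the endpoint accepts an image URL or a base64 payload. The shell sketch below builds a request body that inlines a local JPEG, PNG, or WebP file as a base64 data URI. It is illustrative only: the `image_url` field name and data-URI support are assumptions, so check the chosen model's schema before relying on them.

```bash
# Sketch only: build an image-to-video JSON body with the reference image
# inlined as a base64 data URI.
# ASSUMPTIONS: "image_url" is a hypothetical field name, and the model is
# assumed to accept a data URI there; check the model's schema.

# Escape backslashes and double quotes for a JSON string
# (newlines and other control characters are not handled).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

build_i2v_body() {
  local prompt="$1" image_path="$2" mime b64
  # Map the file extension to a MIME type (JPEG, PNG, WebP only)
  case "$image_path" in
    *.png) mime="image/png" ;;
    *.webp) mime="image/webp" ;;
    *.jpg|*.jpeg) mime="image/jpeg" ;;
    *) echo "unsupported image type: $image_path" >&2; return 1 ;;
  esac
  [ -r "$image_path" ] || { echo "cannot read $image_path" >&2; return 1; }
  # tr removes the line wrapping some base64 implementations insert
  b64=$(base64 < "$image_path" | tr -d '\n')
  printf '{"prompt":"%s","image_url":"data:%s;base64,%s"}\n' \
    "$(json_escape "$prompt")" "$mime" "$b64"
}
```

For example, `build_i2v_body 'slow dolly forward' ./product.png > body.json` writes a body that can then be sent with `curl -d @body.json`. Hosting the image and passing its URL keeps request bodies much smaller than inlining.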

Quick Start

Every model in this category uses the same submit-then-poll API. Replace kling-o1-standard-image-to-video with any model endpoint from the list below.

```bash
# 1. Submit a job. Image-to-video models also expect the reference image in the
#    JSON body; add the image field defined by the chosen model's schema (omitted here).
curl -X POST https://api.muapi.ai/api/v1/kling-o1-standard-image-to-video \
  -H "x-api-key: $MUAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"slow camera dolly forward, gentle wind on leaves"}'
# → {"request_id":"abc123","status":"processing"}

# 2. Poll until the status is "completed"
curl https://api.muapi.ai/api/v1/predictions/abc123/result \
  -H "x-api-key: $MUAPI_API_KEY"
```
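Step 2 of the Quick Start polls once; in practice you repeat that request until the job finishes. The following is a minimal polling sketch in the same shell style. It assumes the result JSON carries a `status` field that ends as `completed` (as in the Quick Start) or `failed` (an assumption), and it extracts that field with `sed` so no extra tools are needed. The retry count and delay defaults are arbitrary.

```bash
# Sketch only: poll a prediction until it reaches a terminal status.
# ASSUMPTIONS: the result JSON has a "status" field; "completed" means done,
# "failed" (assumed) means the job errored; anything else is still running.
poll_result() {
  local id="$1" max_tries="${2:-60}" delay="${3:-5}" body status i=0
  while [ "$i" -lt "$max_tries" ]; do
    i=$((i + 1))
    body=$(curl -sS "https://api.muapi.ai/api/v1/predictions/${id}/result" \
      -H "x-api-key: ${MUAPI_API_KEY}") || return 1
    # Pull the value of "status":"..." (tolerates spaces around the colon)
    status=$(printf '%s' "$body" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    case "$status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed) printf '%s\n' "$body" >&2; return 1 ;;
    esac
    sleep "$delay"
  done
  echo "timed out waiting for ${id}" >&2
  return 1
}
```

After exporting `MUAPI_API_KEY`, `poll_result abc123 120 3` checks every 3 seconds for up to 120 attempts and prints the final JSON on success.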

Top 5 Image-to-Video Generator Models

| Model | Provider | Cost | Description |
|---|---|---|---|
| sd-2-omni-reference | ByteDance | $1.500 | SD 2.0 Omni Reference: generate videos with visual consistency using reference images, videos, and audio. Maintains character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips; use @image1, @video1, @audio1 syntax in your prompt. |
| sd-2-i2v | ByteDance | $0.750 | SD 2.0 is ByteDance's latest multimodal video generation model, offering advanced camera control, native audio-video sync, and high-resolution output. |
| seedance-lite-i2v | ByteDance | $0.100 | Animates static images into short videos quickly, focusing on basic motion effects and efficient processing; best suited for fast demos or mobile-friendly use. |
| grok-imagine-image-to-video | xAI | $0.150 | xAI's multimodal image-to-video model. Animates still images into cinematic 6 to 30 second videos with synchronized ambient audio, focusing on realism, fluid motion, and expressive lighting transitions while keeping generation fast. |
| sd-2-vip-omni-reference | ByteDance | $1.500 | SD 2 Omni Reference VIP (Pro): up to 9 image references, up to 3 video clips, and up to 3 audio references with priority routing. Reference materials in your prompt with @image1…@image9, @video1…@video3, and @audio1…@audio3; also supports `@omni-character:<char_id>` for trained characters. |
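Omni-reference models address their inputs by position inside the prompt (`@image1`, `@image2`, ...). The sketch below builds such a body and enforces the 9-image limit stated in the table. The `images` array field, and the idea that `@imageN` maps to the Nth URL in it, are hypothetical; this sketch also requires at least one image and leaves out video and audio references to stay short.

```bash
# Sketch only: JSON body for an omni-reference model with @imageN prompt tags.
# ASSUMPTIONS: "images" is a hypothetical field name, and @imageN is assumed
# to refer to the Nth URL in that array (1-based).
build_omni_body() {
  local prompt="$1"; shift
  local urls="" url esc
  # The listing documents up to 9 image references
  if [ "$#" -lt 1 ] || [ "$#" -gt 9 ]; then
    echo "omni reference accepts 1 to 9 images (got $#)" >&2
    return 1
  fi
  # Join the URLs into a JSON array body: "a","b",...
  for url in "$@"; do
    urls="${urls:+$urls,}\"$url\""
  done
  # Escape backslashes and double quotes in the prompt for JSON
  esc=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"prompt":"%s","images":[%s]}\n' "$esc" "$urls"
}
```

For instance, `build_omni_body '@image1 hands the cup to @image2' https://example.com/a.png https://example.com/b.png` produces a body whose prompt tags line up with the order of the two URLs.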

All 100 Models

runway-image-to-video
Price: $0.150 (list $0.1667, 11% off)

Animate any image by turning it into a video with motion effects or scene continuity. RunwayML’s I2V model transforms static visuals into short clips by extrapolating depth, movement, and temporal dynamics.

motion-controls
Price: $0.300 (list $0.3333, 10% off)

Motion Controls adds dynamic camera movements, speed ramps, and zoom effects to bring your images to life as smooth, engaging videos.

veo3-fast-image-to-video
Price: $0.600 (list $0.6667, 11% off)

Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality — powered by Google's VEO3 on MuAPI.

hunyuan-image-to-video
Price: $0.150 (list $0.1667, 11% off)

Hunyuan I2V takes a static image and generates realistic video animations by interpreting motion and context. It works well for human portraits, objects, or scenes, adding lifelike movement while maintaining the image's integrity.

kling-v2.1-standard-i2v
Price: $0.225 (list $0.2500, 10% off)

Kling 2.1 Standard (developed by Kuaishou) brings static images to life by generating smooth, realistic video clips from a single frame. It captures subtle motion, background dynamics, and camera movement to produce professional-looking animations — ideal for portraits, digital art, and cinematic illustrations.

wan2.2-image-to-video
Price: $0.300 (list $0.3333, 10% off)

Wan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations. It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos.

vidu-v2.0-i2v
Price: $0.300 (list $0.3333, 10% off)

Vidu 2.0 delivers advanced image-based video generation with enhanced lighting, emotion dynamics, and automatic frame interpolation for polished visual content.

vidu-q1-reference
Price: $0.400 (list $0.4444, 10% off)

Vidu Q1 enables you to generate cinematic 1080p videos using multiple visual references—up to seven images—and text prompts. Designed for consistency, it preserves character appearance, props, and backgrounds across scenes while adding new motion and narrative elements.

minimax-hailuo-02-standard-i2v
Price: $0.150 (list $0.1667, 11% off)

Transforms an image into video with light, natural motion. Great for social media, quick animations, and previews.

minimax-hailuo-02-pro-i2v
Price: $0.600 (list $0.6667, 11% off)

Advanced image-to-video with cinematic realism. Adds dynamic camera motion, realistic physics, and atmospheric detail for storytelling.

video-effects
Price: $0.300 (list $0.3333, 10% off)

AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.

seedance-lite-i2v
Price: $0.100 (list $0.1111, 10% off)

Seedance Lite I2V version animates static images into short videos quickly, focusing on basic motion effects and efficient processing—best suited for fast demos or mobile-friendly use.

seedance-pro-i2v
Price: $0.180 (list $0.2000, 11% off)

Seedance Pro I2V advanced model animates still images into stunning short videos, preserving intricate visual details and applying smooth motion dynamics, ideal for high-end visuals and cinematic edits.

pixverse-v5-i2v
Price: $0.300 (list $0.3333, 10% off)

PixVerse V5 delivers a major leap forward in AI-powered video creation — now featuring smoother motion, ultra-high resolution, and expanded visual effects.

vfx
Price: $0.300 (list $0.3333, 10% off)

VFX delivers high-impact visual effects like explosions, particles, and cinematic overlays to transform static images into action-packed videos.

kling-v2.5-turbo-pro-i2v
Price: $0.450 (list $0.5000, 10% off)

Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

wan2.5-image-to-video-fast
Price: $0.440 (list $0.4889, 11% off)

Convert a single static image into a cinematic short video with realistic motion, dynamic camera movement, and environmental effects. The Fast mode generates high-quality videos quickly, perfect for rapid prototyping, social media clips, and immersive visual storytelling from still images.

ovi-image-to-video
Price: $0.200 (list $0.2222, 10% off)

Ovi is a unified audio–video generation model that can transform a static image plus a descriptive prompt into a short video with synchronized audio. It supports both text-to-video and image-conditioned video inputs. With built-in lip sync, background audio / sound effects, and dialogue support, Ovi brings still visuals to life in cinematic fashion. Videos are generated in 540p resolution.

openai-sora-2-pro-image-to-video
Price: $2.400 (list $2.6667, 11% off)

Sora 2 Pro I2V brings still images to life, transforming them into short videos with natural motion, realistic lighting, and synchronized audio. Upload your image, describe the movement (camera motion, subject action, ambience), add optional dialogue or sound effects, and watch it animate. Ideal for cinematic reveals, promo videos, social content, or storytelling from a static photo.

leonardoai-motion-2.0
Price: $0.400 (list $0.4444, 10% off)

Motion 2.0 is Leonardo.AI's cutting-edge model for creating high-quality 5-second videos from text prompts. It offers enhanced control over animation, including camera movements, lighting, and scene dynamics.

veo3-image-to-video
Price: $2.500 (list $2.7778, 11% off)

VEO3 I2V animates static images into expressive video sequences, adding lifelike movement while preserving the original composition.

veo3.1-fast-image-to-video
Price: $0.600 (list $0.6667, 11% off)

Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews.

veo3.1-reference-to-video
Price: $0.600 (list $0.6667, 11% off)

Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips. It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.

seedance-pro-i2v-fast
Price: $0.060 (list $0.0667, 11% off)

Seedance Pro Fast is the high-speed image-to-video generation variant from ByteDance’s Seedance series. With this model you upload a reference image and—using a text prompt—generate short, dynamic video clips (typically 3-12 seconds) featuring smooth motion, cinematic camera moves, prompt-accurate actions, and high visual fidelity. It supports resolutions up to 1080p, multiple aspect ratios (16:9, 9:16, etc.), and rapid turnaround—ideal for social content, product motion, storytelling from a still, and fast prototyping.

ltx-2-pro-image-to-video
Price: $0.460 (list $0.5111, 10% off)

LTX-2 Pro is Lightricks' high-fidelity video-generation engine for professional workflows, supporting both text-to-video and image-to-video inputs. It produces realistic motion, synchronized audio and video, cinematic camera moves, and stylized visuals. You supply a prompt or image and set the duration and aspect ratio; the resulting clip can then be imported, split, and edited on a timeline.

ltx-2-fast-image-to-video
Price: $0.460 (list $0.5111, 10% off)

LTX-2 Fast is a speed-optimized mode of the LTX-2 engine by Lightricks, focused on generating short video clips from a still image + prompt (I2V) with good fidelity and rapid turnaround. It supports audio/video together, multiple aspect ratios, and is ideal when you need quick output for iteration or storyboarding.

vidu-q2-reference
Price: $0.065 (list $0.0722, 10% off)

Vidu Q2 Reference Video generates cinematic clips from text prompts guided by multiple reference images. Each image refines the model's understanding of subject, environment, and visual tone, helping keep appearance and motion consistent across frames.

vidu-q2-turbo-start-end-video
Price: $0.060 (list $0.0667, 11% off)

Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states — your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.

minimax-hailuo-2.3-pro-i2v
Price: $0.630 (list $0.7000, 10% off)

Hailuo 2.3 Pro I2V breathes life into still images with stunning motion synthesis and cinematic camera control. Using deep motion understanding, it predicts realistic subject movement, depth, and environmental motion from a single input frame — delivering smooth, film-grade clips.

minimax-hailuo-2.3-standard-i2v
Price: $0.360 (list $0.4000, 11% off)

Hailuo 2.3 Standard I2V converts still images into visually immersive motion clips with stable dynamics and realistic movement, balancing quality, speed, and coherence. Output is 768p.

grok-imagine-image-to-video
Price: $0.150 (list $0.1667, 11% off)

Grok Imagine is xAI’s multimodal image-to-video model, capable of animating still images into cinematic videos from 6 to 30 seconds with synchronized ambient audio. It focuses on realism, fluid motion, and expressive lighting transitions while maintaining high generation speed.

kling-o1-image-to-video
Price: $0.720 (list $0.8000, 11% off)

Kling O1’s Image-to-Video mode transforms one or more reference images into short cinematic video clips by adding natural motion, camera choreography, and scene dynamics while preserving subject identity and visual consistency. It supports start/end frames.

kling-o1-reference-to-video
Price: $0.720 (list $0.8000, 11% off)

Kling O1’s Reference-to-Video mode generates a dynamic video using one or multiple reference images as the visual foundation. It preserves identity, style, composition, and key visual details from the references while adding realistic camera motion, environment dynamics, and scene animation.

kling-v2.6-pro-i2v
Price: $0.900 (list $1.0000, 10% off)

Kling-v2.6-Pro Image-to-Video transforms a single creative image into a short cinematic video. It preserves the original style, lighting, and composition while adding smooth camera motion, atmospheric effects, and dynamic environmental animation.

pixverse-v5.5-i2v
Price: $0.100 (list $0.1111, 10% off)

PixVerse v5.5 I2V transforms a single image into a dynamic cinematic video clip. It adds smooth camera motion, atmospheric animation, natural parallax, and environmental effects while preserving the image’s original art style and composition.

wan2.2-spicy-image-to-video
Price: $0.200 (list $0.2222, 10% off)

Wan2.2-spicy Image-to-Video transforms a single creative image into a short dynamic video with bold motion, stylized effects, high-contrast lighting, and energy-driven animations. The “spicy” variant produces more dramatic movement, more vivid colors, and more expressive visual effects.

wan2.6-image-to-video
Price: $0.650 (list $0.7222, 10% off)

WAN 2.6 Image-to-Video converts a single still image into a smooth, cinematic video clip. It preserves the original image’s composition, lighting, and style while adding natural motion, depth parallax, atmospheric effects, and gentle camera movement.

kling-o1-standard-image-to-video
Price: $0.500 (list $0.5556, 11% off)

Kling O1 Standard Image-to-Video converts a single still image into a short, natural-looking video clip. It preserves the original image’s composition and lighting while adding subtle camera motion, gentle parallax, and light environmental animation. This mode focuses on realism and stability rather than heavy effects, making it ideal for clean cinematic shots, environments, characters, and product visuals.

seedance-v1.5-pro-i2v
Price: $0.340 (list $0.3778, 11% off)

Seedance v1.5 Pro Image-to-Video converts a single still image into a smooth cinematic video clip. It preserves the original image’s composition, subject identity, and lighting while adding controlled camera motion, natural parallax, and environmental animation. This mode balances visual quality and motion complexity, making it ideal for cinematic scenes, fantasy worlds, sci-fi environments, and storytelling shots.

seedance-v1.5-pro-i2v-fast
Price: $0.260 (list $0.2889, 11% off)

Seedance v1.5 Pro Image-to-Video Fast converts a single still image into a short cinematic video with quick generation speed. It preserves the original image’s composition, subject identity, and lighting while adding simple camera motion, light parallax, and subtle environmental animation.

ltx-2-19b-image-to-video
Price: $0.600 (list $0.6667, 11% off)

LTX-2-19B Image-to-Video animates a single image into a coherent cinematic clip with strong temporal stability. It preserves composition and lighting while adding controlled camera motion, realistic parallax, and subtle environmental dynamics—well suited for grounded scenes, near-future concepts, and story beats.

kling-v3.0-standard-image-to-video
Price: $0.720 (list $0.8000, 11% off)

Kling 3.0 Standard Image-to-Video animates a single input image into a short, realistic video with smooth, stable motion. It prioritizes temporal consistency, natural physics, and subtle camera movement, making it ideal for everyday scenes, travel moments, people, vehicles, and calm cinematic shots.

sd-2-i2v
Price: $0.750 (list $0.8333, 10% off)

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

ltx-2.3-image-to-video
Price: $0.104 (list $0.1156, 11% off)

LTX-2.3 Image-to-Video animates a single image into a coherent cinematic clip. It preserves scene composition and lighting while adding smooth camera motion, parallax, and environmental dynamics. Built on the upgraded LTX-2.3 architecture for sharper output and improved temporal consistency.

kling-v3.0-pro-image-to-video
Price: $0.720 (list $0.8000, 11% off)

Kling 3.0 Pro Image-to-Video animates a single input image into a high-quality, realistic video with smooth camera motion, natural physics, and strong temporal consistency. It excels at real-world scenes, human motion, environmental details, and cinematic movement while preserving the original image’s structure and lighting.

pixverse-v6-i2v
Price: $0.295 (list $0.3278, 11% off)

Animate any image into a video using PixVerse V6. Supports resolutions up to 1080p, durations up to 15 seconds, and prompt-based motion control.

veo3.1-lite-image-to-video
Price: $0.300 (list $0.3333, 10% off)

Veo 3.1 Lite is a lightweight variant of Google's Veo 3.1 model designed for faster, more accessible video generation from images.

pixverse-v6-transition
Price: $0.300 (list $0.3333, 10% off)

Create a smooth transition between two images (start and end), or generate a video that starts from a single image.

sd-2-i2v-480p
Price: $0.600 (list $0.6667, 11% off)

SD 2.0 480p image-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

sd-2-image-to-video-fast
Price: $0.750 (list $0.8333, 10% off)

SD 2 Image-to-Video (Fast) by ByteDance. Quickly animates a start-frame image into video with 4–15 second duration at reduced cost.

sd-2-first-last-frame
Price: $1.250 (list $1.3889, 11% off)

SD 2 First & Last Frame (Pro) by ByteDance. Generate video that transitions between two reference images. Provide 1 image to set only the start frame, or 2 images to set both the start and end frames.

sd-2-first-last-frame-fast
Price: $0.750 (list $0.8333, 10% off)

SD 2 First & Last Frame (Fast) by ByteDance. Quickly generate video that transitions between reference images at reduced cost. Provide 1 or 2 images.

sd-2-vip-image-to-video
Price: $1.500 (list $1.6667, 11% off)

SD 2 Image-to-Video VIP (Pro) by ByteDance. Animates a start-frame image into a high-quality video with priority routing, native audio, 4–15 second duration, and 2K resolution.

sd-2-vip-image-to-video-fast
Price: $1.050 (list $1.1667, 11% off)

SD 2 Image-to-Video VIP Fast by ByteDance. Faster animation of a start-frame image with priority routing, 4–15 second duration, and 2K resolution.

sd-2-vip-first-last-frame
Price: $1.500 (list $1.6667, 11% off)

SD 2 First & Last Frame VIP (Pro) by ByteDance. Generate video that transitions between two reference images with priority routing. Provide 1 image to set only the start frame, or 2 images to set both the start and end frames.

sd-2-vip-first-last-frame-fast
Price: $1.050 (list $1.1667, 11% off)

SD 2 First & Last Frame VIP Fast by ByteDance. Faster generation of video transitions between two reference images with priority routing.

sd-2-vip-omni-reference-fast
Price: $1.050 (list $1.1667, 11% off)

SD 2 Omni Reference VIP Fast by ByteDance. Faster video generation using up to 9 image references, up to 3 video clips, and up to 3 audio references with priority routing. Reference materials in your prompt with @image1…@image9, @video1…@video3, and @audio1…@audio3.

happy-horse-1-image-to-video-1080p
Price: $1.800 (list $2.0000, 10% off)

Happy Horse 1.0 Image to Video — bring still images to life with fluid, expressive animation and fine-grained motion control.

happy-horse-1-image-to-video-720p
Price: $0.900 (list $1.0000, 10% off)

Happy Horse 1.0 Image to Video (720p) — bring still images to life with fluid, expressive animation at 720p output resolution.

veo-4-image-to-video
Price: $3.000 (list $3.3333, 10% off)

Veo 4 Image to Video — animate any still image with Veo 4's motion synthesis engine, supporting fine-grained camera control and realistic physics at up to 1080p.

sd-2-vip-image-to-video-1080p
Price: $3.375 (list $3.7500, 10% off)

SD 2 Image-to-Video VIP 1080p by ByteDance. Animates a still image into a cinematic 1080p video with priority routing, 4–15 second duration.

sd-2-vip-omni-reference-1080p
Price: $3.375 (list $3.7500, 10% off)

SD 2 Omni Reference VIP 1080p by ByteDance. Generate full HD videos using up to 9 image references, up to 3 video clips, and up to 3 audio references with priority routing. Reference materials in your prompt with @image1…@image9, @video1…@video3, and @audio1…@audio3.

sd-2-vip-first-last-frame-1080p
Price: $3.375 (list $3.7500, 10% off)

SD 2 First & Last Frame VIP 1080p by ByteDance. Generate 1080p video that transitions between two reference images with priority routing. Provide 1 image to set only the start frame, or 2 images to set both the start and end frames.

vidu-q3-pro-image-to-video
Price: $0.750 (list $0.8333, 10% off)

Vidu Q3 Pro Image-to-Video animates a single starting image into a smooth, prompt-guided clip up to 1080p. It preserves character identity, lighting, and composition while introducing natural motion, camera moves, and atmosphere — ideal for bringing concept art, product shots, and stills to life.

vidu-q3-pro-first-last-frames
Price: $0.750 (list $0.8333, 10% off)

Vidu Q3 Pro First-Last Frames interpolates a smooth, cinematic transition between two key images — your start frame and end frame — guided by a text prompt. Perfect for transformation reveals, scene transitions, product morphs, and storytelling beats that need a clean, controlled arc from A to B.

vidu-q3-turbo-first-last-frames
Price: $0.300 (list $0.3333, 10% off)

Vidu Q3 Turbo First-Last Frames interpolates a quick, cost-efficient transition between two key images — your start frame and end frame — guided by a text prompt. Great for transformation reveals, transitions, and short-form storytelling at scale.

vidu-q2-pro-image-to-video
Price: $0.200 (list $0.2222, 10% off)

Vidu Q2 Pro Image-to-Video animates a single starting image into a smooth, prompt-guided clip up to 1080p while preserving subject identity, lighting, and composition.

vidu-q2-turbo-image-to-video
Price: $0.130 (list $0.1444, 10% off)

Vidu Q2 Turbo Image-to-Video animates a starting image into a fast, prompt-guided clip while preserving subject identity. Built for speed and cost efficiency.

happy-horse-1-reference-to-video-1080p
Price: $2.100 (list $2.3333, 10% off)

Happy Horse 1.0 Reference to Video (1080p): generate expressive 1080p video clips conditioned on 1 to 9 reference images plus a text prompt.

kling-v3.0-omni-standard-image-to-video
Price: $0.420 (list $0.4667, 11% off)

Kling v3 Omni at 720P. Multi-image reference video generation — supply up to 4 images and reference them in your prompt with <<<image_N>>>. Apimart-backed.
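A Kling v3 Omni prompt can mention `<<<image_N>>>` tokens that have no matching image, and a local check before submitting catches that mistake early. The sketch below assumes N is 1-based and follows the order in which the images are sent; only the 4-image limit comes from the model notes above.

```bash
# Sketch only: verify that every <<<image_N>>> token in a Kling v3 Omni prompt
# points at one of the supplied images.
# ASSUMPTION: N is 1-based and follows the order the images are sent in.
check_omni_refs() {
  local prompt="$1" count="$2" n
  # The model notes allow up to 4 images
  if [ "$count" -lt 1 ] || [ "$count" -gt 4 ]; then
    echo "supply between 1 and 4 images (got $count)" >&2
    return 1
  fi
  # Extract each referenced index once (digits only, numerically sorted)
  for n in $(printf '%s' "$prompt" | grep -o '<<<image_[0-9][0-9]*>>>' |
             tr -dc '0-9\n' | sort -un); do
    if [ "$n" -lt 1 ] || [ "$n" -gt "$count" ]; then
      echo "<<<image_${n}>>> has no matching image" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_omni_refs 'Place <<<image_1>>> beside <<<image_3>>>' 2` fails because only two images are supplied.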

kling-v3.0-omni-pro-image-to-video
Price: $0.560 (list $0.6222, 10% off)

Kling v3 Omni at 1080P. Multi-image reference video generation — supply up to 4 images and reference them in your prompt with <<<image_N>>>. Apimart-backed.

sd-2-omni-reference-no-video
Price: $1.250 (list $1.3889, 11% off)

SD 2 Omni Reference by ByteDance. Generate videos using up to 9 image references and up to 3 audio references. Reference images in your prompt with @image1, @image2, etc. and audio with @audio1, @audio2, etc.

kling-v3.0-omni-4k-image-to-video
Price: $2.679 (list $2.9761, 10% off)

Kling v3 Omni at 4K. Multi-image reference video generation — supply up to 4 images and reference them in your prompt with <<<image_N>>>. Apimart-backed.

kling-v2.1-master-i2v
Price: $0.300 (list $0.3333, 10% off)

Kling 2.1 Master’s I2V animates a still image into a coherent video sequence. It interprets motion, environment, and context to create realistic, visually stunning video outputs — ideal for animating portraits, scenes, or concept art.

kling-v2.1-pro-i2v
Price: $0.400 (list $0.4444, 10% off)

Kling 2.1 Pro is the high-end version of Kuaishou’s video generation model, offering enhanced realism, longer motion sequences, and cinematic quality. In I2V mode, it animates static images with fluid environmental effects.

runway-act-two-i2v
Price: $0.070 (list $0.0778, 11% off)

Upload a single character image and a driving video — the model transfers facial expressions and head movements from the video onto your image, bringing it to life. It works with photos, illustrations, or stylized portraits, making them speak, blink, and move naturally. Ideal for avatars, AI presenters, digital actors, and story scenes.

wan2.1-image-to-video
Price: $0.300 (list $0.3333, 10% off)

Animate static images into expressive video sequences with WAN 2.1. Upload any image and guide its transformation into a moving scene — great for bringing art, characters, or photos to life with smooth motion and consistent style.

pixverse-v4.5-i2v
Price: $0.300 (list $0.3333, 10% off)

Upload an image and PixVerse v4.5 will breathe life into it with smooth camera motion, realistic effects, and animated elements. Whether it’s a portrait, landscape, or concept art, this mode turns still visuals into dynamic short videos.

wan2.1-reference-video
Price: $0.100 (list $0.1111, 10% off)

WAN 2.1 is an advanced AI model that transforms one or more reference images into a coherent, animated video. By combining characters, objects, or environments from multiple images, it creates smooth motion sequences while preserving realism, style, and fine details.

hf-dop-image-to-video
Price: $0.300 (list $0.3333, 10% off)

Higgsfield’s DOP (Director of Photography) Motion Effects empower creators to combine cinematic camera moves with built-in visual effects—like explosions, fire, distortion, disintegration, and transitions—directly in AI video generation. You choose from a library of motion presets (e.g. Earth Zoom, Bullet Time, Dolly Zoom) and overlay dynamic effects that accentuate storytelling without needing a full VFX pipeline.

seedance-lite-reference-video
Price: $0.100 (list $0.1111, 10% off)

Seedance Lite's Reference-to-Video feature allows you to supply up to 4 images as reference inputs. The model intelligently blends aspects from these images to generate a cohesive, high-quality video.

wan2.5-image-to-video
Price: $0.650 (list $0.7222, 10% off)

WAN 2.5 Image-to-Video takes your image as the starting frame and turns it into a dynamic video, preserving realism, motion, and camera effects. Upload a static image, add a descriptive text prompt, and the model generates cinematic motion—camera pans, environmental movement, and realistic physics—across the result.

veo3.1-image-to-video
Price: $2.500 (list $2.7778, 11% off)

Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.

openai-sora-2-image-to-video
Price: $0.800 (list $0.8889, 11% off)

Sora 2’s I2V lets you bring still images to life by animating them into short video clips with natural motion, audio, and visual effects. While realistic portraits of people aren’t allowed at launch, you can use objects, landscapes, stylized characters or scenes. Use detailed prompts for camera movement, atmosphere, and pacing to get the best results.

minimax-hailuo-2.3-fast
Price: $0.240 (list $0.2667, 11% off)

Minimax Hailuo 2.3 Fast is the lightweight, high-speed version of the Hailuo 2.3 family, designed for creators who need fast video generation with cinematic motion and scene consistency. Output is 768p.

vidu-q2-pro-start-end-video
Price: $0.130 (list $0.1444, 10% off)

Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.

kling-v2.5-turbo-std-i2v
Price: $0.280 (list $0.3111, 10% off)

Kling 2.5 Turbo Std: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

kling-o1-standard-reference-to-video
Price: $0.720 (list $0.8000, 11% off)

Kling O1 Standard Reference-to-Video generates a smooth, realistic video using one or multiple reference images as visual guidance. It preserves the visual identity, composition, and lighting from the references while adding subtle camera motion, natural parallax, and light environmental animation. This mode prioritizes stability and realism, making it ideal for character shots, environments, product visuals, and calm cinematic scenes.

openai-sora-2-standard-image-to-video
Price: $0.300 (list $0.3333, 10% off)

OpenAI Sora 2 Standard image-to-video model (high priority). Generates 10-second videos from an image and a text prompt.

sd-2-omni-reference
Price: $1.500 (list $1.6667, 11% off)

SD 2.0 Omni Reference — generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

wan2.7-image-to-video
Price: $0.100 (list $0.1111, 10% off)

Alibaba WAN 2.7 converts images into videos with optional audio.

wan2.7-reference-to-video
Price: $0.100 (list $0.1111, 10% off)

Alibaba WAN 2.7 Reference-to-Video. Reference characters/props to generate new shots.

sd-2-image-to-video
Price: $1.250 (list $1.3889, 11% off)

SD 2 Image-to-Video (Pro) by ByteDance. Animates a start-frame image into a high-quality video with native audio, 4–15 second duration, and 2K resolution.

sd-2-omni-reference-480p
Price: $1.440 (list $1.6000, 11% off)

SD 2.0 480p Omni Reference — generate videos with visual consistency using reference images, videos, and audio at 480p resolution. More cost-effective than the 720p variant. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

sd-2-omni-reference-no-video-fast
Price: $0.750 (list $0.8333, 10% off)

SD 2 Omni Reference (Fast) by ByteDance. Quickly generate videos using up to 9 image references and up to 3 audio references at reduced cost. Reference images in your prompt with @image1, @image2, etc. and audio with @audio1, @audio2, etc.

sd-2-vip-omni-reference
Price: $1.500 (list $1.6667, 11% off)

SD 2 Omni Reference VIP (Pro) by ByteDance. Generate videos using up to 9 image references, up to 3 video clips, and up to 3 audio references with priority routing. Reference materials in your prompt with @image1…@image9, @video1…@video3, and @audio1…@audio3. Also supports @omni-character:<char_id> for trained characters.

kling-v3.0-4k-image-to-video
Price: $2.000 (list $2.2222, 10% off)

Kling 3.0 4K Image-to-Video animates a single input image into ultra-high-resolution 3840×2160 video with smooth camera motion, natural physics, and strong temporal consistency. 4K mode delivers the sharpest detail in Kling 3.0 — ideal for cinematic shots, product showcases, and premium content where pixel-level clarity matters.

vidu-q3-turbo-image-to-video
Price: $0.300 (list $0.3333, 10% off)

Vidu Q3 Turbo Image-to-Video animates a starting image into a fast, prompt-guided clip while keeping subject identity and composition intact. Built for speed and cost efficiency — perfect for batch animation, social content, and quick creative exploration.

happy-horse-1-reference-to-video-720p
Price: $1.050 (list $1.1667, 11% off)

Happy Horse 1.0 Reference to Video (720p): generate expressive 720p video clips conditioned on 1 to 9 reference images plus a text prompt.

ai-video-effects
Price: $0.300 (list $0.3333, 10% off)

AI Video Effects applies advanced visual transformations, color grading, and cinematic filters to create stunning videos from images.

Frequently Asked Questions

What image formats are supported?

Pass a public URL to a JPEG, PNG, or WebP image — or upload via `POST /api/v1/upload_file` first to get a hosted URL.
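The upload route can be wrapped in a small helper that prints the hosted URL, ready to drop into a request body. This is a sketch under two assumptions that are not confirmed here: the multipart form field is named `file`, and the JSON response exposes the hosted URL as `url`. Check both against the API reference.

```bash
# Sketch only: upload a local image and print the hosted URL.
# ASSUMPTIONS: form field "file" and response field "url" are hypothetical.
upload_image() {
  local path="$1" body url
  body=$(curl -sS -X POST "https://api.muapi.ai/api/v1/upload_file" \
    -H "x-api-key: ${MUAPI_API_KEY}" \
    -F "file=@${path}") || return 1
  # Pull the value of "url":"..." from the response without jq
  url=$(printf '%s' "$body" |
    sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  [ -n "$url" ] || { echo "no url in response: $body" >&2; return 1; }
  printf '%s\n' "$url"
}
```

A typical call is `img=$(upload_image ./product.png)`, after which `$img` can be used as the image URL in a submit request.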

How do I keep the subject consistent across multiple shots?

Use the same reference image across multiple image-to-video calls with different prompts. For tighter consistency, use the workflow builder to chain a face-preservation node between clips.
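Reusing one reference image across shots amounts to a loop that sends the same image with a different prompt each time. This sketch submits each prompt and prints one request ID per line for later polling. `image_url` is a hypothetical field name; `request_id` is taken from the Quick Start response.

```bash
# Sketch only: submit several shots that share one reference image.
# ASSUMPTION: "image_url" is a hypothetical request field name.
submit_shots() {
  local model="$1" image_url="$2"; shift 2
  local prompt p body id
  for prompt in "$@"; do
    # Escape backslashes and double quotes so the prompt stays valid JSON
    p=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
    body=$(curl -sS -X POST "https://api.muapi.ai/api/v1/${model}" \
      -H "x-api-key: ${MUAPI_API_KEY}" \
      -H "Content-Type: application/json" \
      -d "{\"prompt\":\"${p}\",\"image_url\":\"${image_url}\"}") || return 1
    # The Quick Start shows the submit response carrying "request_id"
    id=$(printf '%s' "$body" |
      sed -n 's/.*"request_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    [ -n "$id" ] || { echo "submit failed: $body" >&2; return 1; }
    printf '%s\n' "$id"
  done
}
```

For example, `submit_shots kling-o1-standard-image-to-video https://example.com/hero.png 'wide establishing shot' 'slow push-in on the face'` prints two request IDs, each of which can then be polled.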

Which model handles complex camera moves best?

Kling Pro and Veo 3 handle dolly, pan, and orbit moves most reliably. Pass camera instructions in the prompt — most models parse English camera terminology.