muapi/vidu-q2-reference

Image to Video

Vidu Q2 Reference Video generates cinematic clips from a text prompt guided by up to 7 reference images. Each image refines the model's understanding of subject, environment, and visual tone, keeping appearance and motion consistent across frames.

Related Models

vidu-q2-reference-to-image

VIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.

Image to Image

vidu-q2-turbo-image-to-video

Vidu Q2 Turbo Image-to-Video animates a starting image into a fast, prompt-guided clip while preserving subject identity. Built for speed and cost efficiency.

Image to Video

vidu-q2-turbo-text-to-video

Vidu Q2 Turbo Text-to-Video is the fast, affordable Q2 tier for prompt-only generation. Use it for storyboards, social cuts, and high-volume work where speed and cost matter.

Text to Video

vidu-q2-pro-text-to-video

Vidu Q2 Pro Text-to-Video generates cinematic, prompt-faithful clips from text alone with strong temporal consistency and rich detail at up to 1080p. Pick this when you need polished output without a reference frame.

Text to Video

vidu-q2-turbo-start-end-video

Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states — your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.

Image to Video

vidu-q2-pro-start-end-video

Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.

Image to Video

vidu-q2-text-to-image

VIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.

Text to Image

vidu-q2-pro-image-to-video

Vidu Q2 Pro Image-to-Video animates a single starting image into a smooth, prompt-guided clip up to 1080p while preserving subject identity, lighting, and composition.

Image to Video

Overview

About this model

Vidu Q2 Reference Video is an image-to-video generation model that turns a text prompt and multiple reference images into cinematic clips. It uses the references to keep each frame's subject, environment, and visual tone consistent in appearance and motion, combining the detail of a written description with the specificity of visual examples.

The model generates video at resolutions up to 1080p and exposes parameters for aspect ratio, duration, and movement amplitude. Typical uses span professional filmmaking, advertising, and social media content:

1. Creating cinematic trailers and teasers for films and video games.
2. Generating high-impact advertising videos and promotional content.
3. Storyboarding and visual development for film and animation projects.
4. Producing immersive social media content with dynamic motion effects.
5. Enhancing virtual presentations with consistent and visually appealing video clips.

Pricing & Value

Cost analysis

Provider	Cost	Notes
muapiapp	$0.065 per generation	35% below the $0.10 per-generation estimate for Fal.ai and Replicate, with quality described as comparable or better.
Fal.ai	$0.10 per generation	Estimated price; $0.035 more per generation than muapiapp.
Replicate	$0.10 per generation	Estimated price; the same as Fal.ai, so the same 35% saving applies.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
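
The per-generation prices in the table can be compared directly. The short sketch below computes the relative saving and the cost of a batch; the helper names and the batch size of 1,000 are illustrative assumptions, and the competitor price is the estimate quoted above rather than a verified rate.

```python
from decimal import Decimal

# Prices as listed in the table; the competitor figure is an estimate.
MUAPI_PRICE = Decimal("0.065")
COMPETITOR_PRICE = Decimal("0.10")


def savings_percent(price, reference_price):
    """Relative saving of `price` against `reference_price`, in percent."""
    return (reference_price - price) / reference_price * 100


def batch_cost(price, generations):
    """Total cost in dollars for a number of generations."""
    return price * generations


if __name__ == "__main__":
    print("saving (%):", savings_percent(MUAPI_PRICE, COMPETITOR_PRICE))
    print("1,000 generations:", batch_cost(MUAPI_PRICE, 1000),
          "vs", batch_cost(COMPETITOR_PRICE, 1000))
```

Using `Decimal` avoids binary floating-point rounding when working with currency amounts.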

Technical Details

Configuration schema

Parameter	Type	Description	Default
Prompt	string	The prompt to generate the video	`The female explorer walks slowly across the alien terrain, crystals glimmering around her. The camera glides beside her as light from twin suns scatters across her reflective suit. Wind stirs the mist as she looks up toward the horizon, where a colossal planet looms above — evoking awe and wonder.`
Image URLs	array	Reference image URLs, uploaded or provided directly (up to 7). Used for image-to-video generation.	`https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/vidu-q2-reference-1.jpg`
Resolution	Enum (4 options)	The resolution of the generated video: 360p, 540p, 720p, or 1080p.	`720p`
Aspect Ratio	Enum (5 options)	Aspect ratio of the output video: 16:9, 9:16, 4:3, 3:4, or 1:1.	`16:9`
Duration	int	The duration of the generated video in seconds (2 to 8).	`5`
Movement Amplitude	Enum (4 options)	The movement amplitude of objects in the frame: auto, small, medium, or large.	`auto`
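
The schema above maps naturally onto a small validation helper. The following is a minimal sketch, not muapi's client code: the function name `build_payload` and the snake_case keys are hypothetical, while the allowed values and limits are the ones this page lists in its implementation guide. Requiring at least one reference image is an assumption.

```python
# Allowed values as stated on this page; the key names used below are hypothetical.
RESOLUTIONS = ("360p", "540p", "720p", "1080p")
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4", "1:1")
MOVEMENT_AMPLITUDES = ("auto", "small", "medium", "large")
MAX_REFERENCE_IMAGES = 7
MIN_DURATION_S, MAX_DURATION_S = 2, 8


def _check_choice(name, value, allowed):
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


def build_payload(prompt, image_urls, resolution="720p", aspect_ratio="16:9",
                  duration=5, movement_amplitude="auto"):
    """Validate the inputs and return them as a dict (hypothetical key names)."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if isinstance(image_urls, str):
        raise ValueError("image_urls must be a list of URLs, not a single string")
    image_urls = list(image_urls)
    # Assumption: at least one reference image is required.
    if not 1 <= len(image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"provide 1 to {MAX_REFERENCE_IMAGES} reference images")
    _check_choice("resolution", resolution, RESOLUTIONS)
    _check_choice("aspect_ratio", aspect_ratio, ASPECT_RATIOS)
    _check_choice("movement_amplitude", movement_amplitude, MOVEMENT_AMPLITUDES)
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer number of seconds")
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        raise ValueError(f"duration must be {MIN_DURATION_S} to {MAX_DURATION_S} seconds")
    return {
        "prompt": prompt,
        "image_urls": image_urls,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "movement_amplitude": movement_amplitude,
    }
```

Calling `build_payload("A fox runs through snow", ["https://example.com/fox.jpg"])` returns a dict filled with the defaults from the table; out-of-range values raise `ValueError` before any request is made.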

Implementation Guide

Developer documentation

How to Use Vidu Q2 Reference Video

Prepare Your Inputs
- Write a detailed text prompt that describes the scene: subject, environment, and desired mood.
- Select and upload up to 7 reference images that best represent the visuals you want. The model uses them to guide frame-by-frame detail.

Set Your Parameters
- Choose the resolution: 360p, 540p, 720p (default), or 1080p.
- Pick an aspect ratio: 16:9 (default), 9:16, 4:3, 3:4, or 1:1.
- Set the duration, from 2 to 8 seconds (default 5).
- Select the movement amplitude (auto, small, medium, or large) to control how much objects in the frame move.

Generate and Review
- Submit your inputs to start generation. The model combines your prompt and reference images to produce the clip.
- Review the result. If it does not match your intent, adjust the prompt, images, or parameters and regenerate.

Download and Share
- Save the video and use it in your projects, presentations, or online platforms.
- Share your work with peers and audiences.
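
The generate-and-review step can be sketched as a submit-then-poll loop. This page does not document the request or response format, so everything specific here is an assumption: the `submit` and `get_status` callables stand in for whatever client calls the service actually provides, and the `state`, `video_url`, and `error` fields are hypothetical names.

```python
import time


def generate_and_wait(submit, get_status, payload,
                      poll_interval=2.0, timeout=300.0, sleep=time.sleep):
    """Submit a job and poll until it completes, fails, or times out.

    `submit(payload)` must return a job id and `get_status(job_id)` a dict;
    both are hypothetical stand-ins for the real client calls.
    """
    job_id = submit(payload)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        state = status.get("state")
        if state == "completed":
            return status["video_url"]
        if state == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(poll_interval)
```

Passing the transport functions in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with stubs before wiring in real calls.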

Common Questions

Frequently asked

What resolutions are supported by Vidu Q2 Reference Video?

The model supports multiple resolutions, including 360p, 540p, 720p (default), and 1080p, allowing you to choose the best quality for your needs.
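
To estimate frame dimensions for a given resolution and aspect ratio, one common convention treats the "p" value as the shorter side of the frame. The sketch below applies that convention; it is an assumption, not a documented property of this model, and the actual output dimensions may differ. The function name `frame_size` is illustrative.

```python
def frame_size(resolution, aspect_ratio):
    """Return (width, height), assuming the 'p' value is the shorter side."""
    short_side = int(resolution.rstrip("p"))
    w, h = (int(part) for part in aspect_ratio.split(":"))
    # Scale the longer side by the aspect ratio using integer arithmetic.
    long_side = short_side * max(w, h) // min(w, h)
    return (long_side, short_side) if w >= h else (short_side, long_side)


if __name__ == "__main__":
    for res in ("360p", "540p", "720p", "1080p"):
        print(res, frame_size(res, "16:9"))
```

Under this convention, portrait ratios such as 9:16 swap width and height, and 1:1 yields a square frame.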

How do the reference images influence the generated video?

The provided reference images help guide the model by refining details like subject appearance, environmental elements, and overall visual tone. This ensures consistency in motion and appearance throughout every frame of the video.

What is the maximum number of reference images I can use?

You can upload up to 7 reference images to guide the video generation process.

How customizable is the video output?

The model allows you to adjust several parameters including resolution, aspect ratio, duration, and movement amplitude, giving you full control over the cinematic output.
