
muapi/vidu-q2-reference

Image to Video

Vidu Q2 Reference Video generates cinematic clips from text prompts guided by multiple reference images. Each image refines the model’s understanding of subject, environment, and visual tone, which helps keep appearance and motion consistent across frames.


Related Models

vidu-q2-reference-to-image

VIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.

Image to Image

vidu-q2-turbo-image-to-video

Vidu Q2 Turbo Image-to-Video animates a starting image into a fast, prompt-guided clip while preserving subject identity. Built for speed and cost efficiency.

Image to Video

vidu-q2-turbo-text-to-video

Vidu Q2 Turbo Text-to-Video is the fast, affordable Q2 tier for prompt-only generation. Use it for storyboards, social cuts, and high-volume work where speed and cost matter.

Text to Video

vidu-q2-pro-text-to-video

Vidu Q2 Pro Text-to-Video generates cinematic, prompt-faithful clips from text alone with strong temporal consistency and rich detail at up to 1080p. Pick this when you need polished output without a reference frame.

Text to Video

vidu-q2-turbo-start-end-video

Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states — your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.

Image to Video

vidu-q2-pro-start-end-video

Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.

Image to Video

vidu-q2-text-to-image

VIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.

Text to Image

vidu-q2-pro-image-to-video

Vidu Q2 Pro Image-to-Video animates a single starting image into a smooth, prompt-guided clip up to 1080p while preserving subject identity, lighting, and composition.

Image to Video

Overview

About this model

Vidu Q2 Reference Video is an image-to-video generation model that turns a text prompt and multiple reference images into cinematic clips. The reference images inform each frame’s subject, environment, and visual tone so that appearance and motion stay consistent, while the text prompt describes the scene to generate.

The model generates video at resolutions up to 1080p and exposes adjustable parameters for aspect ratio, duration, and movement amplitude. These controls make it suitable for professional filmmaking, advertising, and social media content creation.

1. Creating cinematic trailers and teasers for films and video games.
2. Generating high-impact advertising videos and promotional content.
3. Storyboarding and visual development for film and animation projects.
4. Producing immersive social media content with dynamic motion effects.
5. Enhancing virtual presentations with consistent and visually appealing video clips.

Pricing & Value

Cost analysis

muapiapp: $0.065 per generation

Offers high quality at 20-50% below leading competitors. Against the $0.10 estimates listed here, the saving is 35%.

Fal.ai: $0.10 per generation

Priced higher than muapiapp, making muapiapp a more cost-effective choice with comparable or superior quality.

Replicate: $0.10 per generation

Priced the same as Fal.ai. At $0.065 per generation, muapiapp is 35% cheaper than this estimate while matching its performance and quality.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
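The comparison above reduces to simple arithmetic. The sketch below only restates the per-generation prices listed on this page (the competitor figures are estimates, as noted) and computes the relative saving and the cost of a batch. The function names and the batch size are illustrative, not part of any muapi tooling.

```python
# Per-generation prices as listed on this page; competitor values are estimates.
PRICE_PER_GENERATION = {
    "muapiapp": 0.065,
    "Fal.ai": 0.10,
    "Replicate": 0.10,
}


def saving_percent(ours: float, theirs: float) -> float:
    """Relative saving of `ours` versus `theirs`, as a percentage."""
    return (theirs - ours) / theirs * 100


def batch_cost(provider: str, generations: int) -> float:
    """Total cost in dollars for a number of generations, rounded to cents."""
    return round(PRICE_PER_GENERATION[provider] * generations, 2)


if __name__ == "__main__":
    ours = PRICE_PER_GENERATION["muapiapp"]
    for name in ("Fal.ai", "Replicate"):
        pct = saving_percent(ours, PRICE_PER_GENERATION[name])
        print(f"muapiapp vs {name}: {pct:.0f}% cheaper")
    # Hypothetical batch of 200 generations on each provider.
    for name in PRICE_PER_GENERATION:
        print(f"{name}: ${batch_cost(name, 200):.2f} for 200 generations")
```

Because every regenerated clip is billed as a new generation, the batch figure is the more useful number when planning several review rounds.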

Technical Details

Configuration schema

Prompt (string)

The prompt to generate the video.

Default Value: The female explorer walks slowly across the alien terrain, crystals glimmering around her. The camera glides beside her as light from twin suns scatters across her reflective suit. Wind stirs the mist as she looks up toward the horizon, where a colossal planet looms above — evoking awe and wonder.

Image URLs (array)

Upload or provide image URLs. Used for image-to-video generation.

Default Value: https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/vidu-q2-reference-1.jpg

Resolution (Enum, 4 options)

The resolution of the generated video.

Default Value: 720p

Aspect Ratio (Enum, 5 options)

Aspect ratio of the output video.

Default Value: 16:9

Duration (int)

The duration of the generated video in seconds.

Default Value: 5

Movement Amplitude (Enum, 4 options)

The movement amplitude of objects in the frame.

Default Value: auto
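A client can check these parameters before spending a generation on an invalid request. The following is a minimal sketch, not the muapi API: the field names (`prompt`, `image_urls`, `movement_amplitude`, and so on) are hypothetical, since this page only gives display labels. The allowed values, the 7-image limit, and the 2 to 8 second range come from the guide and FAQ on this page. Requiring at least one reference image is an assumption.

```python
from dataclasses import asdict, dataclass

# Allowed values as documented on this page.
RESOLUTIONS = ("360p", "540p", "720p", "1080p")
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4", "1:1")
MOVEMENT_AMPLITUDES = ("auto", "small", "medium", "large")
MAX_REFERENCE_IMAGES = 7
DURATION_SECONDS = range(2, 9)  # 2 to 8 seconds, inclusive


@dataclass
class ReferenceVideoInput:
    # Hypothetical field names; defaults mirror the schema's default values.
    prompt: str
    image_urls: list
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    duration: int = 5
    movement_amplitude: str = "auto"

    def validate(self) -> None:
        """Raise ValueError if any field falls outside the documented limits."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # Assumption: at least one reference image is required.
        if not 1 <= len(self.image_urls) <= MAX_REFERENCE_IMAGES:
            raise ValueError(f"provide 1 to {MAX_REFERENCE_IMAGES} reference images")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
        if self.duration not in DURATION_SECONDS:
            raise ValueError("duration must be between 2 and 8 seconds")
        if self.movement_amplitude not in MOVEMENT_AMPLITUDES:
            raise ValueError(f"movement_amplitude must be one of {MOVEMENT_AMPLITUDES}")

    def to_payload(self) -> dict:
        """Validate, then return a plain dict suitable for JSON encoding."""
        self.validate()
        return asdict(self)


if __name__ == "__main__":
    request = ReferenceVideoInput(
        prompt="The explorer walks across the alien terrain.",
        image_urls=["https://example.com/reference-1.jpg"],  # placeholder URL
        resolution="1080p",
        duration=8,
    )
    print(request.to_payload())
```

Keeping validation separate from transport means the same checks work whether the payload is sent over HTTP or entered through the web form.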

Implementation Guide

Developer documentation

How to Use Vidu Q2 Reference Video

  1. Prepare Your Inputs

    • Craft a detailed text prompt that describes the scene, including key details such as subject, environment, and desired mood.
    • Select and upload multiple reference images (up to 7) that best represent your envisioned visuals. These images will guide the model in refining the frame-by-frame details.
  2. Set Your Parameters

    • Choose the desired resolution (360p, 540p, 720p, or 1080p).
    • Pick an aspect ratio from options such as 16:9, 9:16, 4:3, 3:4, or 1:1 to match your video format needs.
    • Set the duration of the video to between 2 and 8 seconds.
    • Select the movement amplitude (auto, small, medium, or large) to control the dynamic motions within the frame.
  3. Generate and Review

    • Submit your inputs to initiate the video generation process. The model processes the data, leveraging your text and image references to produce a cinematic clip.
    • Once generated, review the video to ensure it meets your creative criteria. Adjust inputs if necessary and regenerate for refinements.
  4. Download and Share

    • Save the high-quality video output and use it directly in your projects, presentations, or online platforms.
    • Share your work with peers and audiences to showcase the innovative use of AI-driven video creation.
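The generate, review, and adjust cycle in step 3 can be sketched as a bounded retry loop. This is an illustration under stated assumptions rather than the service's client: `submit`, `accept`, and `revise` are hypothetical callables supplied by the caller, and no muapi endpoint or response format is assumed. The attempt cap is there because each regeneration is billed as a separate generation.

```python
from typing import Callable, Optional


def generate_until_accepted(
    payload: dict,
    submit: Callable[[dict], str],
    accept: Callable[[str], bool],
    revise: Callable[[dict, int], dict],
    max_attempts: int = 3,
) -> Optional[str]:
    """Submit, review, and adjust inputs until a result is accepted.

    submit: sends a payload and returns a reference to the generated video.
    accept: the review step; returns True when the video meets the criteria.
    revise: returns an adjusted payload given the current one and the
            number of attempts made so far.
    Returns the accepted video reference, or None if attempts run out.
    """
    current = dict(payload)  # copy so the caller's payload is not mutated
    for attempt in range(1, max_attempts + 1):
        video = submit(current)
        if accept(video):
            return video
        if attempt < max_attempts:
            current = revise(current, attempt)
    return None


if __name__ == "__main__":
    # Stand-ins for a real client and a human review step.
    amplitudes = ["small", "medium", "large"]

    def fake_submit(p: dict) -> str:
        return "video-" + p["movement_amplitude"]

    def fake_accept(video: str) -> bool:
        return video.endswith("large")

    def fake_revise(p: dict, attempt: int) -> dict:
        return {**p, "movement_amplitude": amplitudes[attempt]}

    print(generate_until_accepted(
        {"movement_amplitude": "small"}, fake_submit, fake_accept, fake_revise
    ))
```

Passing the three steps in as callables keeps the loop testable without network access and lets a manual review step stand in for `accept`.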

Common Questions

Frequently asked

What resolutions are supported by Vidu Q2 Reference Video?

The model supports multiple resolutions, including 360p, 540p, 720p (default), and 1080p, allowing you to choose the best quality for your needs.

How do the reference images influence the generated video?

The reference images guide the model on details such as subject appearance, environmental elements, and overall visual tone. This helps keep motion and appearance consistent across the frames of the video.

What is the maximum number of reference images I can use?

You can upload up to 7 reference images to guide the video generation process.

How customizable is the video output?

You can adjust four parameters: resolution, aspect ratio, duration, and movement amplitude. Together with the prompt and reference images, these control the format, length, and motion of the output.