muapi/seedance-2-omni-reference

Image to Video

SD 2.0 Omni Reference — generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Pricing: $0.30/sec ($1.50 for 5s, $3.00 for 10s, $4.50 for 15s). Output is billed at a flat per-second rate; video reference inputs add a 30% surcharge per second of input video. Supports multi-modal references (image + video + audio) in a single request.

Related Models


seedance-2-character

[Beta] Turn fictional character references into reusable video characters. Upload reference images and describe the outfit to get a character_id you can use in SD 2.0 Omni Reference.

Image to Image

seedance-2-t2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Text to Video

seedance-2-watermark-remover

🎉 FREE for a limited time — Remove SD 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits deducted — requires a positive balance to access.

Video to Video

seedance-2-video-watermark-remover-pro

SD 2 Video Watermark Remover Pro uses the SD 2 AI model to remove watermarks, logos, and overlaid text from videos with high accuracy. Powered by ByteDance's SD 2 engine, it delivers superior quality compared to traditional inpainting approaches. Pricing: $0.013 per second, minimum charge for 5 seconds ($0.065).

Video to Video

seedance-2-i2v-480p

SD 2.0 480p image-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Image to Video

seedance-2-omni-reference-train

Train a reusable character from a reference photo. Once complete, reference the character in Omni Reference video prompts using @omni-character:<request_id> to generate videos featuring that character consistently.

Training

seedance-2-i2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Image to Video

seedance-2-video-edit

SD 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.

Video to Video

seedance-2-extend

SD 2.0 Extend Video continues an existing SD 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-omni-reference-480p

SD 2.0 480p Omni Reference — generate videos with visual consistency using reference images, videos, and audio at 480p resolution. More cost-effective than the 720p variant. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-t2v-480p

SD 2.0 480p text-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Text to Video

seedance-2-vip-extend

SD 2.0 VIP Extend Video continues an existing SD 2.0 generated video seamlessly at 720p. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-vip-extend-1080p

SD 2.0 VIP Extend Video 1080p continues an existing SD 2.0 generated video seamlessly at 1080p resolution. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

Overview

About this model

SD 2.0 Omni Reference generates videos with visual consistency using reference images, videos, and audio. Unlike standard Image-to-Video which animates a single image, Omni Reference uses your uploaded materials as creative guides — maintaining character identity, visual style, and scene continuity. Combine up to 9 images, 3 video clips, and 3 audio files in a single request. Use @image1, @video1, @audio1 syntax in your prompt to precisely control how each reference influences the generated video.

1. Character Consistency: Keep a character's appearance consistent across multiple scenes by providing a portrait as @image1.
2. Style Transfer: Apply the visual style of a reference image to a newly generated video scene.
3. Audio-Synced Video: Generate video synchronized to a reference music clip or voice recording via @audio1.
4. Scene Continuity: Provide a scene screenshot and generate a visually matching continuation.
5. Multi-Modal Control: Combine a character image, a motion video, and background audio for rich creative control.

Pricing & Value

Cost analysis

muapiapp: $0.30/sec ($1.50 for 5s, $3.00 for 10s, $4.50 for 15s)

Output is billed at a flat per-second rate; video reference inputs add a 30% surcharge per second of input video. Supports multi-modal references (image + video + audio) in a single request.

Fal.ai: $0.3024/sec (high) / $0.2419/sec (basic)

Fal.ai charges $0.3024/sec for high quality and $0.2419/sec for basic. muapiapp is under 1% cheaper on high ($0.30/sec) and about 13% cheaper on basic ($0.21/sec).

Replicate: $0.3024/sec (high) / $0.2419/sec (basic)

Replicate charges the same as Fal.ai: $0.3024/sec (high) and $0.2419/sec (basic). muapiapp is under 1% cheaper on high and about 13% cheaper on basic.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
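The percentages in the comparison above follow directly from the listed per-second rates. A small sketch of that arithmetic, using only the figures quoted on this page (the function name `percent_cheaper` is an illustrative helper, not part of any API):

```python
def percent_cheaper(our_rate: float, their_rate: float) -> float:
    """Return how much cheaper our_rate is than their_rate, as a percentage."""
    return (their_rate - our_rate) / their_rate * 100


# (muapiapp rate, competitor rate) in USD per second, as quoted above.
RATES = {
    "high": (0.30, 0.3024),
    "basic": (0.21, 0.2419),
}

for tier, (ours, theirs) in RATES.items():
    print(f"{tier}: {percent_cheaper(ours, theirs):.1f}% cheaper")
```

Because the competitor figures are estimates, the resulting percentages are estimates as well.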


Technical Details

Configuration schema

Prompt (string)

Video description. Use @image1…@image9 to reference images, @video1…@video3 for videos, @audio1…@audio3 for audio. To use a character sheet, reference it with @character:<request_id> (from a completed Seedance 2 Character generation). To use a trained Omni Reference character, reference it with @omni-character:<character_id> where character_id is the value returned by Omni Reference Train Character (e.g. char_1775422630065_4vbana). Both methods can be combined in the same prompt. Multiple characters are supported. Example: '@omni-character:char_1775422630065_4vbana walking through a neon-lit city at night'.

Default Value: @image1 is the main character reference. A person walking on the beach at sunset, cinematic lighting

Image URLs (array)

Up to 9 reference image URLs (JPEG/PNG/WebP). Each Nth image corresponds to @imageN in the prompt.

Default Value: https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/seedance-v2.0-omni-reference.png

Video Reference URLs (array)

Up to 3 reference video clip URLs (MP4, max 15s each). Each Nth video corresponds to @videoN in the prompt.

Default Value: undefined

Audio Reference URLs (array)

Up to 3 reference audio clip URLs (MP3/WAV, total max 15s). Each Nth audio corresponds to @audioN in the prompt.

Default Value: undefined

Aspect Ratio (enum, 6 options)

Output video aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4, or 21:9.

Default Value: 16:9

Quality (enum, 2 options)

Generation quality. 'high' uses the standard model: $0.30/sec of output plus $0.09/sec of input video. 'basic' uses the fast model (about 2x faster): $0.21/sec of output plus $0.063/sec of input video. The input-video charge is a 30% surcharge on the output rate, applied to the combined duration of all video references.

Default Value: high

Duration (int, seconds)

Video duration in seconds (4–15).

Default Value: 5
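The limits in the schema above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the array names `images_list`, `video_files`, and `audio_files` come from the implementation guide on this page, while the field names `prompt`, `aspect_ratio`, `quality`, and `duration` and the class `OmniReferenceRequest` are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}
QUALITIES = {"high", "basic"}


@dataclass
class OmniReferenceRequest:
    """Hypothetical container mirroring the configuration schema."""

    prompt: str
    images_list: list[str] = field(default_factory=list)
    video_files: list[str] = field(default_factory=list)
    audio_files: list[str] = field(default_factory=list)
    aspect_ratio: str = "16:9"  # assumed field name
    quality: str = "high"       # assumed field name
    duration: int = 5           # assumed field name, seconds

    def validate(self) -> list[str]:
        """Return a list of human-readable problems (empty if none)."""
        errors = []
        if len(self.images_list) > 9:
            errors.append("at most 9 reference images are allowed")
        if len(self.video_files) > 3:
            errors.append("at most 3 reference videos are allowed")
        if len(self.audio_files) > 3:
            errors.append("at most 3 reference audio clips are allowed")
        if self.aspect_ratio not in ASPECT_RATIOS:
            errors.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.quality not in QUALITIES:
            errors.append(f"unsupported quality: {self.quality}")
        if not 4 <= self.duration <= 15:
            errors.append("duration must be between 4 and 15 seconds")
        return errors
```

This sketch only checks counts and enum values. Clip lengths (15s per video, 15s of audio in total) would require inspecting the media files and are left out.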

Implementation Guide

Developer documentation

  1. Upload reference images (JPEG/PNG/WebP) as 'images_list' — up to 9 images.
  2. Optionally upload video clips (MP4, max 15s each) as 'video_files' — up to 3 videos.
  3. Optionally upload audio files (MP3/WAV) as 'audio_files' — up to 3 files, total max 15s.
  4. Write a prompt describing the scene. Reference files with @image1…@image9 for images, @video1…@video3 for videos, @audio1…@audio3 for audio.
  5. Set duration (4–15s) and aspect ratio.
  6. Submit the request, then poll or use a webhook to retrieve the completed video.
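The steps above can be sketched as a payload builder plus a generic polling loop. This page does not give the endpoint URL, authentication scheme, or response format, so the sketch leaves the HTTP call to a caller-supplied function. The keys `prompt`, `duration`, `aspect_ratio`, and `quality`, and the `status` values `completed` and `failed`, are assumptions for illustration only; `images_list`, `video_files`, and `audio_files` are the names used in the steps.

```python
import time
from typing import Callable, Optional, Sequence


def build_payload(
    prompt: str,
    images_list: Optional[Sequence[str]] = None,
    video_files: Optional[Sequence[str]] = None,
    audio_files: Optional[Sequence[str]] = None,
    duration: int = 5,
    aspect_ratio: str = "16:9",
    quality: str = "high",
) -> dict:
    """Assemble a request body; reference arrays are included only when given."""
    payload = {
        "prompt": prompt,
        "duration": duration,          # assumed key name
        "aspect_ratio": aspect_ratio,  # assumed key name
        "quality": quality,            # assumed key name
    }
    for key, urls in (
        ("images_list", images_list),
        ("video_files", video_files),
        ("audio_files", audio_files),
    ):
        if urls:
            payload[key] = list(urls)
    return payload


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until it reports a terminal state or the timeout passes.

    fetch_status is supplied by the caller and wraps whatever HTTP request
    the real API requires. The "status" key and its values are hypothetical.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        if result.get("status") in ("completed", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("generation did not finish before the timeout")
        sleep(interval_s)
```

Passing the status fetcher as a function keeps the loop testable without network access. A webhook, where available, avoids the polling loop entirely.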

Common Questions

Frequently asked

How is Omni Reference different from Image-to-Video?

Image-to-Video uses your image as a literal first frame and animates it. Omni Reference uses images, videos, and audio as creative guides — the model generates a new scene that visually matches your references without using them as literal frames. It also supports video and audio references, which Image-to-Video does not.

How do I reference my uploaded files in the prompt?

Use @image1, @image2, etc. to reference images by position in images_list. Use @video1, @video2 for videos by position in video_files. Use @audio1, @audio2 for audio files by position in audio_files. If you don't use @ syntax, the first reference is automatically used as the primary reference.
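The positional mapping described above is 1-based: @image1 is the first entry of images_list, @video2 the second entry of video_files, and so on. A minimal sketch of how a client might check a prompt against its reference arrays before submitting (the helper `resolve_references` is illustrative and not part of the API; it ignores @character and @omni-character tokens):

```python
import re

# Matches @image3, @video1, @audio2, etc.
REF_PATTERN = re.compile(r"@(image|video|audio)(\d+)")


def resolve_references(prompt, images_list=(), video_files=(), audio_files=()):
    """Map each @kindN token in the prompt to its URL.

    Returns (resolved, missing): a dict of token -> URL, and a list of
    tokens that point past the end of the corresponding array.
    """
    arrays = {"image": images_list, "video": video_files, "audio": audio_files}
    resolved = {}
    missing = []
    for kind, number in REF_PATTERN.findall(prompt):
        token = f"@{kind}{number}"
        index = int(number) - 1  # @image1 is position 0 in images_list
        urls = arrays[kind]
        if 0 <= index < len(urls):
            resolved[token] = urls[index]
        elif token not in missing:
            missing.append(token)
    return resolved, missing
```

A non-empty `missing` list indicates that the prompt refers to a file that was not supplied.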

Do I need to provide all types of references?

No. All reference arrays are optional. You can provide just images, just a video, just audio, or any combination. A text-only prompt is also valid.

What file formats are supported?

Images: JPEG, PNG, or WebP (up to 9). Videos: MP4 only, max 15 seconds each (up to 3). Audio: MP3, WAV, or other common formats, total max 15 seconds (up to 3 files). All URLs must be publicly accessible.

How is cost calculated?

Cost = (rate × output_duration) + (0.3 × rate × total_input_video_duration). 'high' quality: $0.30/sec output. 'basic' quality: $0.21/sec output. If video_files are provided, an additional 30% surcharge applies per second of combined input video duration. Example: 5s output (high) + two 5s input videos = 5×$0.30 + 10×$0.09 = $1.50 + $0.90 = $2.40.
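The formula above translates directly into a small estimator. This is a sketch based only on the rates and surcharge stated on this page; the function name `estimate_cost` is illustrative, and how the service rounds or bills fractional seconds is not specified here.

```python
from decimal import Decimal

# USD per second of output video, by quality tier.
OUTPUT_RATE = {"high": Decimal("0.30"), "basic": Decimal("0.21")}
# Input video is charged at 30% of the output rate per second.
INPUT_VIDEO_SURCHARGE = Decimal("0.3")


def estimate_cost(output_seconds, quality="high", input_video_seconds=()):
    """Estimate the cost in USD of one generation.

    input_video_seconds holds the duration of each reference video clip.
    """
    rate = OUTPUT_RATE[quality]
    output_cost = rate * Decimal(str(output_seconds))
    total_input = sum(Decimal(str(s)) for s in input_video_seconds)
    surcharge = INPUT_VIDEO_SURCHARGE * rate * total_input
    return output_cost + surcharge
```

Calling `estimate_cost(5, "high", [5, 5])` reproduces the worked example: $1.50 for output plus $0.90 for ten seconds of input video, $2.40 in total. `Decimal` is used so that currency amounts are not affected by binary floating-point rounding.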

What is the difference between 'high' and 'basic' quality?

'high' uses the standard model for best quality output. 'basic' uses the fast model which generates at approximately 2x speed with slightly reduced quality — ideal for quick iterations and previews. Existing requests without a quality field default to 'high'.

What aspect ratios are supported?

16:9, 9:16, 1:1, 4:3, 3:4, and 21:9. Default is 16:9.