muapi/seedance-2-omni-reference

Image to Video

SD 2.0 Omni Reference — generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Input

Configure the model parameters below.

Prompt (required): Video description. Use @image1…@image9 to reference images, @video1…@video3 for videos, @audio1…@audio3 for audio. To use a character sheet, reference it with @character:<request_id> (from a completed Seedance 2 Character generation). To use a trained Omni Reference character, reference it with @omni-character:<character_id> where character_id is the value returned by Omni Reference Train Character (e.g. char_1775422630065_4vbana). Both methods can be combined in the same prompt. Multiple characters are supported. Example: '@omni-character:char_1775422630065_4vbana walking through a neon-lit city at night'.

Image URLs: Up to 9 reference image URLs (JPEG/PNG/WebP). Each Nth image corresponds to @imageN in the prompt.

Video Reference URLs: Up to 3 reference video clip URLs (MP4, max 15s each). Each Nth video corresponds to @videoN in the prompt.

Audio Reference URLs: Up to 3 reference audio clip URLs (MP3/WAV, total max 15s). Each Nth audio corresponds to @audioN in the prompt.

Aspect Ratio: Output video aspect ratio. (Default: 16:9)

Quality: Generation quality. 'high' uses the standard model ($0.30/sec output + $0.09/sec per input video second). 'basic' uses the fast model (~2x speed, $0.21/sec output + $0.063/sec per input video second). Video reference inputs incur an additional 30% surcharge based on their combined duration. (Default: high)

Duration (seconds): Video duration in seconds (4–15).

Result

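$0.30/sec at 'high' quality ($1.50 for 5s, $3.00 for 10s, $4.50 for 15s). Billing is per second of output; video reference inputs add a 30% surcharge per second of input video, as described under Quality. Supports multi-modal references (image + video + audio) in a single request.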

Related Models

seedance-2-character

[Beta] Turn fictional character references into reusable video characters. Upload reference images and describe the outfit to get a character_id you can use in SD 2.0 Omni Reference.

Image to Image

seedance-2-t2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Text to Video

seedance-2-watermark-remover

🎉 FREE for a limited time — Remove SD 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits deducted — requires a positive balance to access.

Video to Video

seedance-2-video-watermark-remover-pro

SD 2 Video Watermark Remover Pro uses the SD 2 AI model to remove watermarks, logos, and overlaid text from videos with high accuracy. Powered by ByteDance's SD 2 engine, it delivers superior quality compared to traditional inpainting approaches. Pricing: $0.013 per second, minimum charge for 5 seconds ($0.065).

Video to Video

seedance-2-i2v-480p

SD 2.0 480p image-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Image to Video

seedance-2-omni-reference-train

Train a reusable character from a reference photo. Once complete, reference the character in Omni Reference video prompts using @omni-character:<character_id> to generate videos featuring that character consistently.

Training

seedance-2-i2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Image to Video

seedance-2-video-edit

SD 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.

Video to Video

seedance-2-extend

SD 2.0 Extend Video continues an existing SD 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-omni-reference-480p

SD 2.0 480p Omni Reference — generate videos with visual consistency using reference images, videos, and audio at 480p resolution. More cost-effective than the 720p variant. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-t2v-480p

SD 2.0 480p text-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Text to Video

seedance-2-vip-extend

SD 2.0 VIP Extend Video continues an existing SD 2.0 generated video seamlessly at 720p. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-vip-extend-1080p

SD 2.0 VIP Extend Video 1080p continues an existing SD 2.0 generated video seamlessly at 1080p resolution. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

Overview

About this model

SD 2.0 Omni Reference generates videos with visual consistency using reference images, videos, and audio. Unlike standard Image-to-Video which animates a single image, Omni Reference uses your uploaded materials as creative guides — maintaining character identity, visual style, and scene continuity. Combine up to 9 images, 3 video clips, and 3 audio files in a single request. Use @image1, @video1, @audio1 syntax in your prompt to precisely control how each reference influences the generated video.

1. Character Consistency: Keep a character's appearance consistent across multiple scenes by providing a portrait as @image1.

2. Style Transfer: Apply the visual style of a reference image to a newly generated video scene.

3. Audio-Synced Video: Generate video synchronized to a reference music clip or voice recording via @audio1.

4. Scene Continuity: Provide a scene screenshot and generate a visually matching continuation.

5. Multi-Modal Control: Combine a character image, a motion video, and background audio for rich creative control.

Pricing & Value

Cost analysis

Provider	Cost	Notes
muapiapp	$0.30/sec ($1.50 for 5s, $3.00 for 10s, $4.50 for 15s)	Per-second billing on output duration at 'high' quality; video reference inputs add a 30% surcharge per second of input video. Supports multi-modal references (image + video + audio) in a single request.
Fal.ai	$0.3024/sec (high) / $0.2419/sec (basic)	Fal.ai charges $0.3024/sec for high quality and $0.2419/sec for basic. muapiapp is roughly the same on high ($0.30/sec) and 13% cheaper on basic ($0.21/sec).
Replicate	$0.3024/sec (high) / $0.2419/sec (basic)	Replicate charges the same as Fal.ai — $0.3024/sec (high), $0.2419/sec (basic). muapiapp is competitive on high quality and 13% cheaper on basic.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
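The percentages quoted in the comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the per-second rates listed above; the competitor rates are the page's own estimates, and the function name is illustrative.

```python
# Rates quoted in the pricing table (USD per second of output).
MUAPI = {"high": 0.30, "basic": 0.21}
COMPETITOR = {"high": 0.3024, "basic": 0.2419}  # listed on this page as estimates


def percent_cheaper(ours: float, theirs: float) -> float:
    """Return how much cheaper `ours` is than `theirs`, as a percentage."""
    return (theirs - ours) / theirs * 100


# Percentage saved per quality tier, rounded to one decimal place.
savings = {
    tier: round(percent_cheaper(MUAPI[tier], COMPETITOR[tier]), 1)
    for tier in MUAPI
}
print(savings)
```

On 'high' the difference is under 1%, which matches "roughly the same"; on 'basic' it comes to about 13%.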

Technical Details

Configuration schema

Parameter	Type	Description	Default
Prompt	string	Video description. Use @image1…@image9 to reference images, @video1…@video3 for videos, @audio1…@audio3 for audio. To use a character sheet, reference it with @character:<request_id> (from a completed Seedance 2 Character generation). To use a trained Omni Reference character, reference it with @omni-character:<character_id> where character_id is the value returned by Omni Reference Train Character (e.g. char_1775422630065_4vbana). Both methods can be combined in the same prompt. Multiple characters are supported. Example: '@omni-character:char_1775422630065_4vbana walking through a neon-lit city at night'.	`@image1 is the main character reference. A person walking on the beach at sunset, cinematic lighting`
Image URLs	array	Up to 9 reference image URLs (JPEG/PNG/WebP). Each Nth image corresponds to @imageN in the prompt.	`https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/seedance-v2.0-omni-reference.png`
Video Reference URLs	array	Up to 3 reference video clip URLs (MP4, max 15s each). Each Nth video corresponds to @videoN in the prompt.	(none)
Audio Reference URLs	array	Up to 3 reference audio clip URLs (MP3/WAV, total max 15s). Each Nth audio corresponds to @audioN in the prompt.	(none)
Aspect Ratio	Enum (6 options)	Output video aspect ratio.	`16:9`
Quality	Enum (2 options)	Generation quality. 'high' uses the standard model ($0.30/sec output + $0.09/sec per input video second). 'basic' uses the fast model (~2x speed, $0.21/sec output + $0.063/sec per input video second). Video reference inputs incur an additional 30% surcharge based on their combined duration.	`high`
Duration (seconds)	int	Video duration in seconds (4–15).	`5`
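
The limits in the table can be checked client-side before a request is sent. The sketch below is one possible way to do that, not the service's own validation. The keys `images_list`, `video_files`, `audio_files`, and `quality` appear elsewhere on this page; `prompt`, `aspect_ratio`, and `duration` are assumed key names, and the class itself is hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import List

# Allowed values as documented on this page.
ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "21:9")
QUALITIES = ("high", "basic")


@dataclass
class OmniReferenceRequest:
    """Hypothetical container for the documented parameters."""
    prompt: str
    images_list: List[str] = field(default_factory=list)
    video_files: List[str] = field(default_factory=list)
    audio_files: List[str] = field(default_factory=list)
    aspect_ratio: str = "16:9"  # assumed key name
    quality: str = "high"
    duration: int = 5           # assumed key name

    def validate(self) -> None:
        """Raise ValueError if a documented limit is violated."""
        if not self.prompt.strip():
            raise ValueError("prompt is required")
        for name, limit in (("images_list", 9), ("video_files", 3), ("audio_files", 3)):
            if len(getattr(self, name)) > limit:
                raise ValueError(f"{name} accepts at most {limit} URLs")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.quality not in QUALITIES:
            raise ValueError(f"unsupported quality: {self.quality}")
        if not 4 <= self.duration <= 15:
            raise ValueError("duration must be between 4 and 15 seconds")

    def to_payload(self) -> dict:
        """Validate, then return the parameters as a plain dict."""
        self.validate()
        return asdict(self)
```

Clip lengths (15s per video, 15s total audio) are not checked here because they cannot be determined from a URL alone.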

Implementation Guide

Developer documentation

1. Upload reference images (JPEG/PNG/WebP) as 'images_list' (up to 9 images).
2. Optionally upload video clips (MP4, max 15s each) as 'video_files' (up to 3 videos).
3. Optionally upload audio files (MP3/WAV) as 'audio_files' (up to 3 files, total max 15s).
4. Write a prompt describing the scene. Reference files with @image1…@image9 for images, @video1…@video3 for videos, @audio1…@audio3 for audio.
5. Set duration (4–15s) and aspect ratio.
6. Poll for the result, or use a webhook, to retrieve the completed video.
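
The polling step above could look roughly like the sketch below. This page does not document the status endpoint or its response shape, so the HTTP call is left to a caller-supplied `fetch_status` function, and the `status` key with the values `"completed"` and `"failed"` is an assumption to be replaced with whatever the API actually returns.

```python
import time
from typing import Callable, Dict


def wait_for_video(
    fetch_status: Callable[[], Dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status() repeatedly until the job finishes or times out.

    fetch_status is a hypothetical callable that returns the job's current
    state as a dict; the "status" key and its values are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"generation failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError("video was not ready before the timeout")
        sleep(interval_s)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP client and makes it testable without network access. A webhook avoids the loop entirely when the deployment can receive callbacks.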

Common Questions

Frequently asked

How is Omni Reference different from Image-to-Video?

Image-to-Video uses your image as a literal first frame and animates it. Omni Reference uses images, videos, and audio as creative guides — the model generates a new scene that visually matches your references without using them as literal frames. It also supports video and audio references, which Image-to-Video does not.

How do I reference my uploaded files in the prompt?

Use @image1, @image2, etc. to reference images by position in images_list. Use @video1, @video2 for videos by position in video_files. Use @audio1, @audio2 for audio files by position in audio_files. If you don't use @ syntax, the first reference is automatically used as the primary reference.
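
Because references are positional, a prompt can point at a file that was never supplied. The following sketch, with a hypothetical helper name, scans a prompt for @imageN, @videoN, and @audioN tokens and reports any whose index has no matching entry in the corresponding list. It does not handle the @character: or @omni-character: forms.

```python
import re
from typing import List, Sequence

# Matches positional tokens such as @image1, @video2, @audio3.
TOKEN_RE = re.compile(r"@(image|video|audio)(\d+)")


def unresolved_references(
    prompt: str,
    images_list: Sequence[str] = (),
    video_files: Sequence[str] = (),
    audio_files: Sequence[str] = (),
) -> List[str]:
    """Return tokens in the prompt that point past the end of their list."""
    counts = {
        "image": len(images_list),
        "video": len(video_files),
        "audio": len(audio_files),
    }
    missing = []
    for kind, index in TOKEN_RE.findall(prompt):
        n = int(index)
        # Tokens are 1-based: @image1 is the first entry of images_list.
        if n < 1 or n > counts[kind]:
            missing.append(f"@{kind}{index}")
    return missing


problems = unresolved_references(
    "@image1 dances to @audio1, moving like @video2",
    images_list=["https://example.com/portrait.png"],
    video_files=["https://example.com/motion.mp4"],
)
print(problems)  # ['@audio1', '@video2']
```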

Do I need to provide all types of references?

No. All reference arrays are optional. You can provide just images, just a video, just audio, or any combination. A text-only prompt is also valid.

What file formats are supported?

Images: JPEG, PNG, or WebP (up to 9). Videos: MP4 only, max 15 seconds each (up to 3). Audio: MP3, WAV, or other common formats, total max 15 seconds (up to 3 files). All URLs must be publicly accessible.

How is cost calculated?

Cost = (rate × output_duration) + (0.3 × rate × total_input_video_duration). 'high' quality: $0.30/sec output. 'basic' quality: $0.21/sec output. If video_files are provided, an additional 30% surcharge applies per second of combined input video duration. Example: 5s output (high) + two 5s input videos = 5×$0.30 + 10×$0.09 = $1.50 + $0.90 = $2.40.
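
The formula translates directly into a small estimator. This is a sketch based only on the rates and surcharge stated above; the function and constant names are illustrative, and actual billing may round differently.

```python
from typing import Iterable

# Output rates in USD per second, as stated on this page.
RATES = {"high": 0.30, "basic": 0.21}
# Input video is billed at 30% of the output rate per second.
INPUT_VIDEO_SURCHARGE = 0.3


def estimate_cost(
    output_seconds: float,
    quality: str = "high",
    input_video_seconds: Iterable[float] = (),
) -> float:
    """Estimate the cost of one generation in USD.

    input_video_seconds holds the duration of each reference video clip.
    """
    rate = RATES[quality]
    total_input = sum(input_video_seconds)
    cost = rate * output_seconds + INPUT_VIDEO_SURCHARGE * rate * total_input
    return round(cost, 4)


# Worked example from the text: 5s output (high) with two 5s input videos.
print(estimate_cost(5, "high", [5, 5]))  # 2.4
```

Without reference videos the surcharge term is zero, so a 5s 'high' request costs $1.50.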

What is the difference between 'high' and 'basic' quality?

'high' uses the standard model for best quality output. 'basic' uses the fast model which generates at approximately 2x speed with slightly reduced quality — ideal for quick iterations and previews. Existing requests without a quality field default to 'high'.

What aspect ratios are supported?

16:9, 9:16, 1:1, 4:3, 3:4, and 21:9. Default is 16:9.