muapi/seedance-2-omni-reference-480p

Image to Video

SD 2.0 480p Omni Reference — generate videos with visual consistency using reference images, videos, and audio at 480p resolution. More cost-effective than the 720p variant. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Input

Configure the model parameters below.

Prompt (required): Video description. Use @image1…@image9 to reference images, @video1…@video3 for videos, and @audio1…@audio3 for audio. To use a fictional character, reference it with @character:<id>, where <id> is the request_id from a completed Seedance 2 Character generation; characters are automatically appended to images_list. Multiple characters are supported.

Image URLs: Up to 9 reference image URLs (JPEG/PNG/WebP). The Nth image corresponds to @imageN in the prompt.

Video Reference URLs: Up to 3 reference video clip URLs (MP4, max 15s each). The Nth video corresponds to @videoN in the prompt.

Audio Reference URLs: Up to 3 reference audio clip URLs (MP3/WAV, total max 15s). The Nth audio clip corresponds to @audioN in the prompt.

Aspect Ratio: Output video aspect ratio. (Default: 16:9)

Quality: Generation quality. 'high' uses the standard model ($0.24/sec of output); 'basic' uses the fast model ($0.18/sec of output). Each second of combined video reference input adds 30% of that rate ($0.072/sec for high, $0.054/sec for basic). (Default: basic)

Duration (seconds): Video duration in seconds (8–15).

Price: $0.24/sec (high) / $0.18/sec (basic). Per-second billing for 480p output. Each second of combined input video adds 30% of the per-second rate.

🚀Related Models

seedance-2-character

[Beta] Turn fictional character references into reusable video characters. Upload reference images and describe the outfit to get a character_id you can use in SD 2.0 Omni Reference.

Image to Image

seedance-2-t2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Text to Video

seedance-2-watermark-remover

🎉 FREE for a limited time — Remove SD 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits deducted — requires a positive balance to access.

Video to Video

seedance-2-video-watermark-remover-pro

SD 2 Video Watermark Remover Pro uses the SD 2 AI model to remove watermarks, logos, and overlaid text from videos with high accuracy. Powered by ByteDance's SD 2 engine, it delivers superior quality compared to traditional inpainting approaches. Pricing: $0.013 per second, minimum charge for 5 seconds ($0.065).

Video to Video

seedance-2-i2v-480p

SD 2.0 480p image-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Image to Video

seedance-2-omni-reference

SD 2.0 Omni Reference — generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-omni-reference-train

Train a reusable character from a reference photo. Once complete, reference the character in Omni Reference video prompts using @omni-character:<request_id> to generate videos featuring that character consistently.

Training

seedance-2-i2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Image to Video

seedance-2-video-edit

SD 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.

Video to Video

seedance-2-extend

SD 2.0 Extend Video continues an existing SD 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-t2v-480p

SD 2.0 480p text-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Text to Video

seedance-2-vip-extend

SD 2.0 VIP Extend Video continues an existing SD 2.0 generated video seamlessly at 720p. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-vip-extend-1080p

SD 2.0 VIP Extend Video 1080p continues an existing SD 2.0 generated video seamlessly at 1080p resolution. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

📝

Overview

About this model

SD 2.0 Omni Reference 480p generates videos with visual consistency using reference images, videos, and audio at 480p resolution. It offers the same multi-modal reference capabilities as the 720p variant — maintaining character identity, visual style, and scene continuity — at a lower cost. Combine up to 9 images, 3 video clips, and 3 audio files in a single request. Use @image1, @video1, @audio1 syntax in your prompt to precisely control how each reference influences the generated video.

1. Character Consistency: Keep a character's appearance consistent across multiple scenes by providing a portrait as @image1.
2. Style Transfer: Apply the visual style of a reference image to a newly generated video scene.
3. Audio-Synced Video: Generate video synchronized to a reference music clip or voice recording via @audio1.
4. Scene Continuity: Provide a scene screenshot and generate a visually matching continuation.
5. Draft Previews: Quickly prototype multi-modal video concepts at 480p before committing to 720p generation.

💰

Pricing & Value

Cost analysis

Provider	Cost	Notes
muapiapp	$0.24/sec (high) / $0.18/sec (basic)	Per-second billing for 480p output. Each second of combined input video adds 30% of the per-second rate.
Fal.ai	$0.3024/sec (high) / $0.2419/sec (basic)	Estimated. muapiapp is about 21% cheaper on high and 26% cheaper on basic.
Replicate	$0.3024/sec (high) / $0.2419/sec (basic)	Estimated, same rates as Fal.ai. muapiapp is about 21–26% cheaper at 480p.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
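
The savings percentages in the table can be checked with a few lines of arithmetic. This is only a sketch: the helper name `percent_cheaper` is made up for illustration, and the competitor rates are the estimated figures quoted above.

```python
def percent_cheaper(ours: float, theirs: float) -> int:
    """Return how much cheaper `ours` is than `theirs`, as a rounded percentage."""
    return round((theirs - ours) / theirs * 100)


# Rates in USD per second of 480p output, as listed in the table above.
print(percent_cheaper(0.24, 0.3024))  # high quality: 21
print(percent_cheaper(0.18, 0.2419))  # basic quality: 26
```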

⚙️

Technical Details

Configuration schema

Parameter	Type	Description	Default
Prompt	string	Video description. Use @image1…@image9 to reference images, @video1…@video3 for videos, @audio1…@audio3 for audio. To use a fictional character, reference it with @character:<id> (request_id from a completed Seedance 2 Character generation) — characters are automatically appended to images_list. Multiple characters are supported.	`@image1 is the main character reference. A person walking on the beach at sunset, cinematic lighting`
Image URLs	array	Up to 9 reference image URLs (JPEG/PNG/WebP). Each Nth image corresponds to @imageN in the prompt.	`https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/seedance-v2.0-omni-reference.png`
Video Reference URLs	array	Up to 3 reference video clip URLs (MP4, max 15s each). Each Nth video corresponds to @videoN in the prompt.	`undefined`
Audio Reference URLs	array	Up to 3 reference audio clip URLs (MP3/WAV, total max 15s). Each Nth audio corresponds to @audioN in the prompt.	`undefined`
Aspect Ratio	Enum (4 options)	Output video aspect ratio: 16:9, 9:16, 4:3, or 3:4.	`16:9`
Quality	Enum (2 options)	Generation quality. 'high' uses the standard model ($0.24/sec of output); 'basic' uses the fast model ($0.18/sec of output). Each second of combined video reference input adds 30% of that rate ($0.072/sec for high, $0.054/sec for basic).	`basic`
Duration (seconds)	int	Video duration in seconds (8–15).	`8`

📖

Implementation Guide

Developer documentation

1. Optionally upload reference images (JPEG/PNG/WebP) as 'images_list': up to 9 images.
2. Optionally upload video clips (MP4, max 15s each) as 'video_files': up to 3 videos.
3. Optionally upload audio files (MP3/WAV) as 'audio_files': up to 3 files, total max 15s.
4. Write a prompt describing the scene. Reference files with @image1…@image9 for images, @video1…@video3 for videos, and @audio1…@audio3 for audio.
5. Set the duration (8–15s) and aspect ratio.
6. Poll for the result, or use a webhook, to retrieve the completed video.
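
The limits in the steps above can be checked on the client before a request is sent. The following is a minimal sketch, not the official client: `build_payload` is a hypothetical helper, and while `images_list`, `video_files`, and `audio_files` are the field names used in this guide, the keys `prompt`, `aspect_ratio`, `quality`, and `duration` are assumptions about the request body. No endpoint URL or authentication is shown because this page does not specify them.

```python
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4"}
QUALITIES = {"high", "basic"}


def build_payload(prompt, images_list=(), video_files=(), audio_files=(),
                  aspect_ratio="16:9", quality="basic", duration=8):
    """Validate the documented limits and return a request body as a dict."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(images_list) > 9:
        raise ValueError("at most 9 reference images are allowed")
    if len(video_files) > 3:
        raise ValueError("at most 3 reference videos are allowed")
    if len(audio_files) > 3:
        raise ValueError("at most 3 reference audio clips are allowed")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality!r}")
    if not 8 <= duration <= 15:
        raise ValueError("duration must be between 8 and 15 seconds")

    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,  # assumed key name
        "quality": quality,            # assumed key name
        "duration": duration,          # assumed key name
    }
    # Reference arrays are optional, so include only the ones that were supplied.
    if images_list:
        payload["images_list"] = list(images_list)
    if video_files:
        payload["video_files"] = list(video_files)
    if audio_files:
        payload["audio_files"] = list(audio_files)
    return payload


payload = build_payload(
    "@image1 is the main character reference. A person walking on the beach at sunset",
    images_list=["https://example.com/portrait.png"],  # placeholder URL
)
```

Clip lengths (15s per video, 15s total audio) are not checked here because that would require inspecting the media files themselves.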

❓

Common Questions

Frequently asked

How is this different from the 720p Omni Reference?

This endpoint generates 480p video, which is faster and more cost-effective ($0.18/sec basic, $0.24/sec high output) compared to the 720p variant ($0.21/sec basic, $0.30/sec high output). Both variants apply a 30% surcharge on the per-second rate for each second of input video provided. The minimum duration is 8 seconds and aspect ratio options are limited to 16:9, 9:16, 4:3, and 3:4.

How do I reference my uploaded files in the prompt?

Use @image1, @image2, etc. to reference images by position in images_list. Use @video1, @video2 for videos by position in video_files. Use @audio1, @audio2 for audio files by position in audio_files.
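
Because references are positional, a prompt that mentions @image3 when only two images were supplied points at nothing. A small check like the one below can catch that before submission. It is a sketch under assumptions: `find_unresolved_refs` is a hypothetical helper, it covers only the @imageN, @videoN, and @audioN forms, and it ignores @character:<id> references (which, per the prompt description, are appended to images_list by the service).

```python
import re

# Matches @image3, @video1, @audio2 and captures the kind and the 1-based index.
REF_PATTERN = re.compile(r"@(image|video|audio)(\d+)")


def find_unresolved_refs(prompt, images=(), videos=(), audios=()):
    """Return the @-references in `prompt` that point past the supplied lists."""
    counts = {"image": len(images), "video": len(videos), "audio": len(audios)}
    unresolved = []
    for kind, index in REF_PATTERN.findall(prompt):
        if not 1 <= int(index) <= counts[kind]:
            unresolved.append(f"@{kind}{index}")
    return unresolved


missing = find_unresolved_refs(
    "@image1 walks while @audio1 plays, in the style of @image2",
    images=["portrait.png"],
    audios=["music.mp3"],
)
print(missing)  # ['@image2']
```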

Do I need to provide all types of references?

No. All reference arrays are optional. You can provide just images, just a video, just audio, or any combination. A text-only prompt is also valid.

What file formats are supported?

Images: JPEG, PNG, or WebP (up to 9). Videos: MP4 only, max 15 seconds each (up to 3). Audio: MP3, WAV, or other common formats, total max 15 seconds (up to 3 files).

How is cost calculated?

Cost = (rate × output_duration) + (0.3 × rate × total_input_video_duration). 'high' quality: $0.24/sec output. 'basic' quality: $0.18/sec output. If video_files are provided, a 30% surcharge applies per second of combined input video duration. Example: 8s output (basic) + two 5s input videos = 8×$0.18 + 10×$0.054 = $1.44 + $0.54 = $1.98.
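
The formula above translates directly into code. This is an illustrative sketch rather than a billing implementation: `estimate_cost` is a hypothetical helper, the rates are the 480p figures quoted on this page, and rounding to four decimal places is a choice made here to keep floating-point noise out of the result.

```python
# Per-second output rates for the 480p endpoint, taken from the pricing above.
RATES = {"high": 0.24, "basic": 0.18}
SURCHARGE = 0.3  # fraction of the rate charged per second of input video


def estimate_cost(output_seconds, quality="basic", input_video_seconds=()):
    """Return the estimated charge in USD for one generation."""
    if quality not in RATES:
        raise ValueError(f"unknown quality: {quality!r}")
    if not 8 <= output_seconds <= 15:
        raise ValueError("output duration must be between 8 and 15 seconds")
    rate = RATES[quality]
    total_input = sum(input_video_seconds)
    return round(rate * output_seconds + SURCHARGE * rate * total_input, 4)


# The worked example from the answer above: 8s basic output plus two 5s input videos.
print(estimate_cost(8, "basic", [5, 5]))  # 1.98
```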

Why is the minimum duration 8 seconds?

The 480p Omni Reference variant requires at least 8 seconds of output to properly incorporate multi-modal references.
