muapi/seedance-2-omni-reference-480p

Image to Video

SD 2.0 480p Omni Reference — generate videos with visual consistency using reference images, videos, and audio at 480p resolution. More cost-effective than the 720p variant. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

$0.24/sec (high) / $0.18/sec (basic). Per-second billing for 480p output. Each second of combined reference-video input adds a surcharge of 30% of the per-second rate.

🚀Related Models


seedance-2-character

[Beta] Turn fictional character references into reusable video characters. Upload reference images and describe the outfit to get a character_id you can use in SD 2.0 Omni Reference.

Image to Image

seedance-2-t2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Text to Video

seedance-2-watermark-remover

🎉 FREE for a limited time — Remove SD 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits deducted — requires a positive balance to access.

Video to Video

seedance-2-video-watermark-remover-pro

SD 2 Video Watermark Remover Pro uses the SD 2 AI model to remove watermarks, logos, and overlaid text from videos with high accuracy. Powered by ByteDance's SD 2 engine, it delivers superior quality compared to traditional inpainting approaches. Pricing: $0.013 per second, minimum charge for 5 seconds ($0.065).

Video to Video

seedance-2-i2v-480p

SD 2.0 480p image-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Image to Video

seedance-2-omni-reference

SD 2.0 Omni Reference — generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-omni-reference-train

Train a reusable character from a reference photo. Once complete, reference the character in Omni Reference video prompts using @omni-character:<request_id> to generate videos featuring that character consistently.

Training

seedance-2-i2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Image to Video

seedance-2-video-edit

SD 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.

Video to Video

seedance-2-extend

SD 2.0 Extend Video continues an existing SD 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-t2v-480p

SD 2.0 480p text-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Text to Video

seedance-2-vip-extend

SD 2.0 VIP Extend Video continues an existing SD 2.0 generated video seamlessly at 720p. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-vip-extend-1080p

SD 2.0 VIP Extend Video 1080p continues an existing SD 2.0 generated video seamlessly at 1080p resolution. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video
📝

Overview

About this model

SD 2.0 Omni Reference 480p generates videos with visual consistency using reference images, videos, and audio at 480p resolution. It offers the same multi-modal reference capabilities as the 720p variant — maintaining character identity, visual style, and scene continuity — at a lower cost. Combine up to 9 images, 3 video clips, and 3 audio files in a single request. Use @image1, @video1, @audio1 syntax in your prompt to precisely control how each reference influences the generated video.

1. Character Consistency: Keep a character's appearance consistent across multiple scenes by providing a portrait as @image1.
2. Style Transfer: Apply the visual style of a reference image to a newly generated video scene.
3. Audio-Synced Video: Generate video synchronized to a reference music clip or voice recording via @audio1.
4. Scene Continuity: Provide a scene screenshot and generate a visually matching continuation.
5. Draft Previews: Quickly prototype multi-modal video concepts at 480p before committing to 720p generation.
💰

Pricing & Value

Cost analysis

muapiapp: $0.24/sec (high) / $0.18/sec (basic)

Per-second billing for 480p output. Each second of combined reference-video input adds a surcharge of 30% of the per-second rate.

Fal.ai: $0.3024/sec (high) / $0.2419/sec (basic)

Fal.ai charges $0.3024/sec for high quality and $0.2419/sec for basic. muapiapp is about 21% cheaper on high ($0.24/sec) and about 26% cheaper on basic ($0.18/sec).

Replicate: $0.3024/sec (high) / $0.2419/sec (basic)

Replicate charges the same as Fal.ai: $0.3024/sec (high) and $0.2419/sec (basic). muapiapp saves about 21–26% vs Replicate at 480p resolution.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
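The savings percentages quoted above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the function name percent_savings is hypothetical, and the rates are the ones listed in this section, including the estimated competitor prices.

```python
def percent_savings(our_rate: float, their_rate: float) -> float:
    """Return how much cheaper our_rate is than their_rate, in percent."""
    return (1 - our_rate / their_rate) * 100


# Rates in USD per second of output, as listed in the pricing section.
high_savings = percent_savings(0.24, 0.3024)
basic_savings = percent_savings(0.18, 0.2419)

print(f"high:  {high_savings:.1f}% cheaper")
print(f"basic: {basic_savings:.1f}% cheaper")
```

Rounded to whole percentages, the two results match the 21% and 26% figures stated above.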

⚙️

Technical Details

Configuration schema

Prompt (string)

Video description. Use @image1…@image9 to reference images, @video1…@video3 for videos, @audio1…@audio3 for audio. To use a fictional character, reference it with @character:<id> (request_id from a completed Seedance 2 Character generation). Characters are automatically appended to images_list. Multiple characters are supported.

Default value: @image1 is the main character reference. A person walking on the beach at sunset, cinematic lighting

Image URLs (array)

Up to 9 reference image URLs (JPEG/PNG/WebP). The Nth image corresponds to @imageN in the prompt.

Default value: https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/seedance-v2.0-omni-reference.png

Video Reference URLs (array)

Up to 3 reference video clip URLs (MP4, max 15s each). The Nth video corresponds to @videoN in the prompt.

Default value: none

Audio Reference URLs (array)

Up to 3 reference audio clip URLs (MP3/WAV, total max 15s). The Nth audio clip corresponds to @audioN in the prompt.

Default value: none

Aspect Ratio (enum, 4 options)

Output video aspect ratio: 16:9, 9:16, 4:3, or 3:4.

Default value: 16:9

Quality (enum, 2 options)

Generation quality. 'high' uses the standard model: $0.24 per second of output plus $0.072 per second of input video. 'basic' uses the fast model: $0.18 per second of output plus $0.054 per second of input video. The input-video charge is the 30% surcharge, applied to the combined duration of all reference videos.

Default value: basic

Duration (seconds) (int)

Video duration in seconds (8–15).

Default value: 8
📖

Implementation Guide

Developer documentation

  1. Upload reference images (JPEG/PNG/WebP) as 'images_list' — up to 9 images.
  2. Optionally upload video clips (MP4, max 15s each) as 'video_files' — up to 3 videos.
  3. Optionally upload audio files (MP3/WAV) as 'audio_files' — up to 3 files, total max 15s.
  4. Write a prompt describing the scene. Reference files with @image1…@image9 for images, @video1…@video3 for videos, @audio1…@audio3 for audio.
  5. Set duration (8–15s) and aspect ratio.
  6. Submit the request, then poll or use a webhook to retrieve the completed video.
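The steps above can be sketched as a small helper that assembles and validates a request body before submission. This is a minimal sketch under stated assumptions: the field names images_list, video_files, and audio_files come from this guide, while prompt, aspect_ratio, quality, and duration are assumed key names, and build_payload is a hypothetical helper. No endpoint URL or authentication scheme is shown because this page does not document them.

```python
from typing import Any, Optional

MAX_IMAGES, MAX_VIDEOS, MAX_AUDIOS = 9, 3, 3
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4"}
QUALITIES = {"high", "basic"}


def build_payload(
    prompt: str,
    images: Optional[list] = None,
    videos: Optional[list] = None,
    audios: Optional[list] = None,
    aspect_ratio: str = "16:9",
    quality: str = "basic",
    duration: int = 8,
) -> dict:
    """Validate inputs against the documented limits and return a request body."""
    images, videos, audios = images or [], videos or [], audios or []
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images are allowed")
    if len(videos) > MAX_VIDEOS:
        raise ValueError(f"at most {MAX_VIDEOS} videos are allowed")
    if len(audios) > MAX_AUDIOS:
        raise ValueError(f"at most {MAX_AUDIOS} audio clips are allowed")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of {sorted(QUALITIES)}")
    if not 8 <= duration <= 15:
        raise ValueError("duration must be between 8 and 15 seconds")

    # Key names other than the three reference arrays are assumptions.
    payload: dict = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "quality": quality,
        "duration": duration,
    }
    # All reference arrays are optional, so only include the ones provided.
    if images:
        payload["images_list"] = images
    if videos:
        payload["video_files"] = videos
    if audios:
        payload["audio_files"] = audios
    return payload


example = build_payload(
    "@image1 is the main character reference. A person walking on the beach at sunset",
    images=["https://example.com/portrait.png"],  # placeholder URL
)
print(example)
```

Checking limits on the client side avoids submitting a request that would be rejected for exceeding the documented counts or the 8–15 second duration range. Clip length limits (15s per video, 15s total audio) are not checked here because they require inspecting the media files.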

Common Questions

Frequently asked

How is this different from the 720p Omni Reference?

This endpoint generates 480p video, which is faster and cheaper than the 720p variant: $0.18/sec (basic) or $0.24/sec (high) for output, versus $0.21/sec (basic) or $0.30/sec (high) at 720p. Both variants add a surcharge of 30% of the per-second rate for each second of input video provided. The minimum duration is 8 seconds, and aspect ratio options are limited to 16:9, 9:16, 4:3, and 3:4.

How do I reference my uploaded files in the prompt?

Use @image1, @image2, etc. to reference images by position in images_list. Use @video1, @video2 for videos by position in video_files. Use @audio1, @audio2 for audio files by position in audio_files.
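Because references are positional, a prompt that mentions @video2 when only one video was uploaded points at nothing. The sketch below is one way to catch that before submitting. The helper name find_unresolved_refs is hypothetical, and the check deliberately ignores @character:<id> references, which may add entries to images_list on the server side.

```python
import re

# Matches positional references such as @image1, @video2, @audio3.
REF_PATTERN = re.compile(r"@(image|video|audio)(\d+)")


def find_unresolved_refs(
    prompt: str, n_images: int = 0, n_videos: int = 0, n_audios: int = 0
) -> list:
    """Return the references in the prompt that have no matching uploaded file."""
    counts = {"image": n_images, "video": n_videos, "audio": n_audios}
    unresolved = []
    for match in REF_PATTERN.finditer(prompt):
        kind, index = match.group(1), int(match.group(2))
        # References are 1-based: @image1 is the first entry in images_list.
        if not 1 <= index <= counts[kind]:
            unresolved.append(match.group(0))
    return unresolved


prompt = "@image1 dances to @audio1 in the style of @video2"
print(find_unresolved_refs(prompt, n_images=1, n_videos=1, n_audios=1))
# prints ['@video2']
```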

Do I need to provide all types of references?

No. All reference arrays are optional. You can provide just images, just a video, just audio, or any combination. A text-only prompt is also valid.

What file formats are supported?

Images: JPEG, PNG, or WebP (up to 9). Videos: MP4 only, max 15 seconds each (up to 3). Audio: MP3, WAV, or other common formats, total max 15 seconds (up to 3 files).

How is cost calculated?

Cost = (rate × output_duration) + (0.3 × rate × total_input_video_duration). 'high' quality: $0.24/sec output. 'basic' quality: $0.18/sec output. If video_files are provided, a 30% surcharge applies per second of combined input video duration. Example: 8s output (basic) + two 5s input videos = 8×$0.18 + 10×$0.054 = $1.44 + $0.54 = $1.98.
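The formula above translates directly into a short estimator. This is a sketch for illustration only: the rates and the 30% surcharge come from this page, while the function name estimate_cost and its signature are hypothetical.

```python
# Output rates in USD per second, as listed on this page.
RATES = {"high": 0.24, "basic": 0.18}
INPUT_VIDEO_SURCHARGE = 0.3  # 30% of the output rate per input video second


def estimate_cost(quality: str, output_seconds: int, input_video_seconds=()) -> float:
    """Estimate the charge: rate x output + 0.3 x rate x total input video."""
    rate = RATES[quality]
    total_input = sum(input_video_seconds)
    cost = rate * output_seconds + INPUT_VIDEO_SURCHARGE * rate * total_input
    # Round to hide floating-point noise in the displayed amount.
    return round(cost, 4)


# The worked example from the text: 8s basic output with two 5s input videos.
print(estimate_cost("basic", 8, [5, 5]))  # prints 1.98
```

Without any reference videos the second term is zero, so a 10-second 'high' request costs 10 × $0.24 = $2.40.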

Why is the minimum duration 8 seconds?

The 480p Omni Reference variant requires at least 8 seconds of output to properly incorporate multi-modal references.