muapi/seedance-2-t2v

Text to Video

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

🚀 Related Models

seedance-2-character

[Beta] Turn fictional character references into reusable video characters. Upload reference images and describe the outfit to get a character_id you can use in SD 2.0 Omni Reference.

Image to Image

seedance-2-watermark-remover

🎉 FREE for a limited time — Remove SD 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits deducted — requires a positive balance to access.

Video to Video

seedance-2-video-watermark-remover-pro

SD 2 Video Watermark Remover Pro uses the SD 2 AI model to remove watermarks, logos, and overlaid text from videos with high accuracy. Powered by ByteDance's SD 2 engine, it delivers superior quality compared to traditional inpainting approaches. Pricing: $0.013 per second, minimum charge for 5 seconds ($0.065).

Video to Video

seedance-2-i2v-480p

SD 2.0 480p image-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Image to Video

seedance-2-omni-reference

SD 2.0 Omni Reference — generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-omni-reference-train

Train a reusable character from a reference photo. Once complete, reference the character in Omni Reference video prompts using @omni-character:<request_id> to generate videos featuring that character consistently.

Training

seedance-2-i2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Image to Video

seedance-2-video-edit

SD 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.

Video to Video

seedance-2-extend

SD 2.0 Extend Video continues an existing SD 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-omni-reference-480p

SD 2.0 480p Omni Reference — generate videos with visual consistency using reference images, videos, and audio at 480p resolution. More cost-effective than the 720p variant. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-t2v-480p

SD 2.0 480p text-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Text to Video

seedance-2-vip-extend

SD 2.0 VIP Extend Video continues an existing SD 2.0 generated video seamlessly at 720p. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-vip-extend-1080p

SD 2.0 VIP Extend Video 1080p continues an existing SD 2.0 generated video seamlessly at 1080p resolution. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video
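
The Omni Reference descriptions above quote limits of 9 images, 3 video clips, and 3 audio clips, addressed in the prompt as @image1, @video1, @audio1, and so on. A minimal sketch of a client-side check follows. The function name, the shape of the `supplied` dictionary, and the idea of validating before submission are illustrative assumptions, not part of the muapi API. It covers Omni Reference numbering only; the Extend models reserve @image1 for the source video's last frame.

```python
import re

# Limits quoted in the Omni Reference description on this page.
MAX_REFS = {"image": 9, "video": 3, "audio": 3}

# Matches tags such as @image1, @video2, @audio3.
TAG_PATTERN = re.compile(r"@(image|video|audio)(\d+)")


def check_reference_tags(prompt, supplied):
    """Return a list of human-readable problems (empty if none).

    `supplied` maps a reference kind to how many files of that kind
    the caller intends to upload, e.g. {"image": 2, "audio": 1}.
    This is a hypothetical helper, not an official client function.
    """
    problems = []
    # Check the per-kind upload limits first.
    for kind, limit in MAX_REFS.items():
        count = supplied.get(kind, 0)
        if count > limit:
            problems.append(f"{count} {kind} references exceed the limit of {limit}")
    # Then check that every tag in the prompt points at a supplied file.
    for kind, index in TAG_PATTERN.findall(prompt):
        n = int(index)
        if n < 1 or n > supplied.get(kind, 0):
            problems.append(f"@{kind}{n} has no matching {kind} reference")
    return problems
```

Calling `check_reference_tags("@image1 dances to @audio1", {"image": 1, "audio": 1})` returns an empty list, while a prompt that mentions @image2 with only one image supplied returns one problem.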
📝

Overview

About this model

SD 2.0 Text-to-Video is ByteDance's most advanced text-driven video generation model. Describe any scene in natural language and the model produces a cinematic clip with director-level camera control, native audio-video sync, and up to 2K resolution output. It understands complex prompts — lighting, motion physics, mood, and multi-shot storytelling — turning words into high-fidelity video sequences up to 15 seconds long.

  1. Social Media: Viral short-form content generated entirely from text prompts.
  2. Advertising: Cinematic product promos and brand story videos from a single description.
  3. Filmmaking: Pre-visualization and storyboard generation with realistic camera movements.
  4. AI Films: Multi-shot storytelling with consistent environments and characters across scenes.
💰

Pricing & Value

Cost analysis

muapiapp: $0.60 per video

muapiapp offers SD 2.0 Text-to-Video starting at $0.60 per video (5s, basic quality), scaling at $0.12/sec for basic and $0.25/sec for high quality across 5–15 second durations.
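
The per-second rates above can be turned into a per-video estimate. The sketch below assumes billing is strictly linear (duration times the per-second rate) and that only the 5, 10, and 15 second durations are billable; both are readings of this page, not confirmed billing rules. The function names are illustrative. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Per-second rates from the pricing text, in cents.
RATE_CENTS_PER_SEC = {"basic": 12, "high": 25}

# Durations offered by the Duration option, in seconds.
DURATIONS = (5, 10, 15)


def estimate_cost_cents(duration_s, quality="basic"):
    """Estimate the price of one video in cents.

    Assumes cost = duration * per-second rate, which matches the quoted
    $0.60 for a 5 second basic video.
    """
    if quality not in RATE_CENTS_PER_SEC:
        raise ValueError(f"unknown quality: {quality!r}")
    if duration_s not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    return RATE_CENTS_PER_SEC[quality] * duration_s


def format_usd(cents):
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    for quality in RATE_CENTS_PER_SEC:
        for duration in DURATIONS:
            cost = format_usd(estimate_cost_cents(duration, quality))
            print(f"{quality:>5} {duration:>2}s: {cost}")
```

Under these assumptions a 5 second basic video comes to $0.60 and a 15 second high quality video to $3.75.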

Fal.ai: $0.3024/sec (high) / $0.2419/sec (basic)

Fal.ai charges $0.3024/sec for high quality and $0.2419/sec for basic. At $0.25/sec and $0.12/sec, muapiapp is about 17% cheaper on high quality and about 50% cheaper on basic quality.

Replicate: $0.3024/sec (high) / $0.2419/sec (basic)

Replicate charges the same as Fal.ai — $0.3024/sec (high), $0.2419/sec (basic). muapiapp saves you 17–50% depending on quality tier.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
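
The savings figures can be reproduced from the listed per-second rates. This is a small worked calculation, not an official pricing tool; the function name is illustrative, and the competitor rates are the estimates quoted on this page.

```python
def savings_percent(competitor_rate, own_rate):
    """Percentage saved relative to the competitor's per-second rate."""
    return (competitor_rate - own_rate) / competitor_rate * 100


# (competitor rate, muapiapp rate) in dollars per second, from the text above.
RATES = {
    "high": (0.3024, 0.25),
    "basic": (0.2419, 0.12),
}

if __name__ == "__main__":
    for tier, (competitor, own) in RATES.items():
        print(f"{tier}: {savings_percent(competitor, own):.1f}% cheaper")
```

The results come out near 17.3% for high quality and 50.4% for basic, in line with the rounded 17% and 50% quoted above.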

⚙️

Technical Details

Configuration schema

Prompt (string)

Text prompt describing the video. To use a fictional character, reference it inline with @character:<id> (the request_id from a completed Seedance 2 Character generation). Multiple characters are supported. Example: '@character:ab539e5f walks on the beach at sunset'.

Default Value: A determined penguin straps itself into a homemade rocket sled on an icy mountain. The rocket ignites with a massive burst and launches the penguin across the frozen landscape at insane speed, blasting through snowdrifts and leaving a fiery trail behind.

Aspect Ratio (Enum, 4 options)

Options: 16:9, 9:16, 4:3, 3:4

Default Value: 16:9

Duration (Enum, 3 options)

Options: 5, 10, 15 (seconds)

Default Value: 5

Quality (Enum, 2 options)

Options: basic, high

Default Value: basic
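
To make the schema concrete, here is a hedged sketch of building and validating a request body before submission. The JSON key names (`prompt`, `aspect_ratio`, `duration`, `quality`) are assumptions derived from the field labels, and the character id pattern is guessed from the single example id on this page; check both against the actual API reference before relying on them.

```python
import re

# Allowed values as listed on this page.
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4")
DURATIONS = (5, 10, 15)
QUALITIES = ("basic", "high")

# Assumption: ids look like the example "ab539e5f" (letters, digits, - or _).
CHARACTER_TAG = re.compile(r"@character:([A-Za-z0-9_-]+)")


def build_payload(prompt, aspect_ratio="16:9", duration=5, quality="basic"):
    """Validate the options and return a request body as a dict.

    Key names are hypothetical; defaults mirror the schema defaults.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    return {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "quality": quality,
    }


def find_character_ids(prompt):
    """Return the ids of all @character:<id> references in the prompt."""
    return CHARACTER_TAG.findall(prompt)
```

For the example prompt '@character:ab539e5f walks on the beach at sunset', `find_character_ids` returns a one-element list containing 'ab539e5f'.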
📖

Implementation Guide

Developer documentation

How to Use SD 2.0 Text-to-Video

  1. Write a Detailed Prompt: Describe the scene, subjects, lighting, mood, and camera movement. Be specific — 'slow dolly zoom into a neon-lit street at night' will outperform 'city street'.

  2. Choose Quality: Select basic ($0.12/sec) for fast drafts or high ($0.25/sec) for final cinematic output.

  3. Set Duration: Choose 5, 10, or 15 seconds. Longer durations allow richer storytelling.

  4. Pick Aspect Ratio: Use 16:9 for widescreen, 9:16 for mobile/social, 4:3 or 3:4 for other formats.

  5. Submit and Poll: You'll receive a request_id immediately. Poll the result endpoint until status is completed.
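
Step 5 can be sketched as a polling loop. Because this page does not give the endpoint URLs, authentication scheme, or response format, the sketch takes the status lookup as a caller-supplied function instead of making HTTP calls itself. The response shape (a dict with a `status` key) and the `failed` status are assumptions; only `request_id` and the `completed` status come from the text above.

```python
import time


def poll_until_completed(get_status, request_id, interval_s=2.0,
                         max_attempts=60, sleep=time.sleep):
    """Call get_status(request_id) until it reports completion.

    get_status is a caller-supplied function that queries the result
    endpoint and returns a dict such as {"status": "completed", ...}.
    The dict shape and the "failed" status are assumptions.
    """
    for _ in range(max_attempts):
        result = get_status(request_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":  # assumed failure status, not documented here
            raise RuntimeError(f"request {request_id} failed: {result}")
        # Any other status is treated as still in progress.
        sleep(interval_s)
    raise TimeoutError(f"request {request_id} not completed after {max_attempts} polls")
```

Injecting `get_status` and `sleep` keeps the loop testable without network access: a test can pass a function that returns canned responses and a no-op sleep. A fixed interval is the simplest choice; a production client might prefer exponential backoff.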

Common Questions

Frequently asked

What is SD 2.0 Text-to-Video?

It's ByteDance's state-of-the-art text-to-video model that generates cinematic clips from natural language prompts, with support for complex camera movements, native audio, and up to 2K resolution.

What's the difference between basic and high quality?

Basic quality uses the fast-t2v model at $0.12/sec — ideal for drafts and iteration. High quality uses the standard-t2v model at $0.25/sec for final, cinema-grade output with richer detail and smoother motion.

Does it generate audio?

Yes, SD 2.0 generates audio natively alongside video, ensuring cinema-grade sound synchronized with the visual content.

What is the maximum resolution?

SD 2.0 supports up to 2K resolution output.