muapi/seedance-2-t2v

Text to Video

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

From $0.60 per video (5 s, basic quality). muapiapp prices SD 2.0 Text-to-Video at $0.12/sec for basic quality and $0.25/sec for high quality, for clips of 5, 10, or 15 seconds.

🚀Related Models

seedance-2-character

[Beta] Turn fictional character references into reusable video characters. Upload reference images and describe the outfit to get a character_id you can use in SD 2.0 Omni Reference.

Image to Image

seedance-2-watermark-remover

🎉 FREE for a limited time — Remove SD 2.0 watermarks from videos using LaMa AI inpainting. Automatically detects the watermark region, builds a precise mask via Canny edge detection, and inpaints each frame for artifact-free results. No credits deducted — requires a positive balance to access.

Video to Video

seedance-2-video-watermark-remover-pro

SD 2 Video Watermark Remover Pro uses the SD 2 AI model to remove watermarks, logos, and overlaid text from videos with high accuracy. Powered by ByteDance's SD 2 engine, it delivers superior quality compared to traditional inpainting approaches. Pricing: $0.013 per second, minimum charge for 5 seconds ($0.065).

Video to Video

seedance-2-i2v-480p

SD 2.0 480p image-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Image to Video

seedance-2-omni-reference

SD 2.0 Omni Reference — generate videos with visual consistency using reference images, videos, and audio. Maintain character identity, style, and scene continuity. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-omni-reference-train

Train a reusable character from a reference photo. Once complete, reference the character in Omni Reference video prompts using @omni-character:<request_id> to generate videos featuring that character consistently.

Training

seedance-2-i2v

SD 2.0 is the latest multimodal video generation model by ByteDance, offering advanced camera control, native audio-video sync, and high-resolution output.

Image to Video

seedance-2-video-edit

SD 2.0 Video Edit modifies existing videos based on text prompts and optional reference images.

Video to Video

seedance-2-extend

SD 2.0 Extend Video continues an existing SD 2.0 generated video seamlessly. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-omni-reference-480p

SD 2.0 480p Omni Reference — generate videos with visual consistency using reference images, videos, and audio at 480p resolution. More cost-effective than the 720p variant. Supports up to 9 images, 3 video clips, and 3 audio clips. Use @image1, @video1, @audio1 syntax in your prompt.

Image to Video

seedance-2-t2v-480p

SD 2.0 480p text-to-video generation. Faster and more cost-effective than the 720p variant, ideal for previews and drafts.

Text to Video

seedance-2-vip-extend

SD 2.0 VIP Extend Video continues an existing SD 2.0 generated video seamlessly at 720p. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

seedance-2-vip-extend-1080p

SD 2.0 VIP Extend Video 1080p continues an existing SD 2.0 generated video seamlessly at 1080p resolution. Provide the original request ID and an optional prompt to guide the extension — the model preserves visual style, motion, characters, and audio consistency across the new segment. Optional image, video, and audio references can be supplied to steer the extension: user-supplied references map to @image2…@image9, @video1…@video3, @audio1…@audio3 in the prompt (the source video's last frame is always @image1).

Text to Video

📝

Overview

About this model

SD 2.0 Text-to-Video is ByteDance's most advanced text-driven video generation model. Describe any scene in natural language and the model produces a cinematic clip with director-level camera control, native audio-video sync, and up to 2K resolution output. It understands complex prompts — lighting, motion physics, mood, and multi-shot storytelling — turning words into high-fidelity video sequences up to 15 seconds long.

1. Social Media: Viral short-form content generated entirely from text prompts.

2. Advertising: Cinematic product promos and brand story videos from a single description.

3. Filmmaking: Pre-visualization and storyboard generation with realistic camera movements.

4. AI Films: Multi-shot storytelling with consistent environments and characters across scenes.

💰

Pricing & Value

Cost analysis

Provider	Cost	Notes
muapiapp	$0.60 per video	muapiapp offers SD 2.0 Text-to-Video starting at $0.60 per video (5s, basic quality), scaling at $0.12/sec for basic and $0.25/sec for high quality across 5–15 second durations.
Fal.ai	$0.3024/sec (high) / $0.2419/sec (basic)	Fal.ai charges $0.3024/sec for high quality and $0.2419/sec for basic. muapiapp is 17% cheaper on high quality ($0.25/sec) and 50% cheaper on basic quality ($0.12/sec).
Replicate	$0.3024/sec (high) / $0.2419/sec (basic)	Replicate charges the same as Fal.ai — $0.3024/sec (high), $0.2419/sec (basic). muapiapp saves you 17–50% depending on quality tier.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
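The per-second rates quoted above can be turned into a quick cost estimate. The following is a minimal sketch that uses only the figures on this page; the function and constant names are illustrative and are not part of any muapiapp SDK, and the competitor rates are the estimated values from the table.

```python
# Per-second rates in USD as quoted on this page. The names below are
# illustrative assumptions, not identifiers from a muapiapp library.
RATES_PER_SECOND = {"basic": 0.12, "high": 0.25}
COMPETITOR_RATES_PER_SECOND = {"basic": 0.2419, "high": 0.3024}  # estimated
ALLOWED_DURATIONS = (5, 10, 15)


def estimate_cost(duration_s: int, quality: str = "basic") -> float:
    """Return the estimated price in USD for one generated clip."""
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if quality not in RATES_PER_SECOND:
        raise ValueError(f"quality must be one of {sorted(RATES_PER_SECOND)}")
    return round(duration_s * RATES_PER_SECOND[quality], 2)


def savings_percent(quality: str) -> float:
    """Percentage saved per second versus the quoted competitor rate."""
    ours = RATES_PER_SECOND[quality]
    theirs = COMPETITOR_RATES_PER_SECOND[quality]
    return round((1 - ours / theirs) * 100, 1)


if __name__ == "__main__":
    for quality in RATES_PER_SECOND:
        for duration in ALLOWED_DURATIONS:
            print(f"{quality:>5} {duration:>2}s: ${estimate_cost(duration, quality):.2f}")
```

A 5-second basic clip comes out at the $0.60 starting price, and a 15-second high-quality clip at $3.75. The savings helper reproduces the roughly 17% and 50% figures in the table.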

⚙️

Technical Details

Configuration schema

Parameter	Type	Description	Default
Prompt	string	Text prompt describing the video. To use a fictional character, reference it inline with @character:<id> (the request_id from a completed Seedance 2 Character generation). Multiple characters are supported. Example: '@character:ab539e5f walks on the beach at sunset'.	`A determined penguin straps itself into a homemade rocket sled on an icy mountain. The rocket ignites with a massive burst and launches the penguin across the frozen landscape at insane speed, blasting through snowdrifts and leaving a fiery trail behind.`
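Aspect Ratio	Enum (4 options)	Output aspect ratio: `16:9`, `9:16`, `4:3`, or `3:4`.	`16:9`
Duration	Enum (3 options)	Clip length in seconds: `5`, `10`, or `15`.	`5`
Quality	Enum (2 options)	Quality tier: `basic` or `high`.	`basic`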
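As a rough illustration of the schema, the sketch below validates the four parameters before a request is sent. The snake_case field names (`prompt`, `aspect_ratio`, `duration`, `quality`) are assumptions, since this page lists only display names. The option values are taken from the implementation guide on this page. The character-ID pattern is likewise a guess based on the single example ID shown.

```python
import re
from dataclasses import asdict, dataclass

# Option values as listed on this page; the JSON key names used below are
# assumed, because the page shows display names only.
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4")
DURATIONS = (5, 10, 15)
QUALITIES = ("basic", "high")

# Assumed ID format: letters, digits, hyphens, underscores.
CHARACTER_REF = re.compile(r"@character:([A-Za-z0-9_-]+)")


@dataclass(frozen=True)
class T2VRequest:
    prompt: str
    aspect_ratio: str = "16:9"
    duration: int = 5
    quality: str = "basic"

    def __post_init__(self) -> None:
        # Reject values outside the documented enums before any network call.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
        if self.duration not in DURATIONS:
            raise ValueError(f"duration must be one of {DURATIONS}")
        if self.quality not in QUALITIES:
            raise ValueError(f"quality must be one of {QUALITIES}")

    def to_payload(self) -> dict:
        """Return a plain dict suitable for JSON encoding."""
        return asdict(self)


def character_ids(prompt: str) -> list[str]:
    """List the @character:<id> references found in a prompt, in order."""
    return CHARACTER_REF.findall(prompt)
```

For example, `T2VRequest(prompt="@character:ab539e5f walks on the beach at sunset", duration=10, quality="high").to_payload()` yields a dict with the defaults filled in, and `character_ids` on the same prompt returns the single referenced ID.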

📖

Implementation Guide

Developer documentation

How to Use SD 2.0 Text-to-Video

1. Write a Detailed Prompt: Describe the scene, subjects, lighting, mood, and camera movement. Be specific: 'slow dolly zoom into a neon-lit street at night' will outperform 'city street'.
2. Choose Quality: Select basic ($0.12/sec) for fast drafts or high ($0.25/sec) for final cinematic output.
3. Set Duration: Choose 5, 10, or 15 seconds. Longer durations allow richer storytelling.
4. Pick Aspect Ratio: Use 16:9 for widescreen, 9:16 for mobile/social, 4:3 or 3:4 for other formats.
5. Submit and Poll: You'll receive a request_id immediately. Poll the result endpoint until the status is `completed`.
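The submit-and-poll step can be sketched as a small loop. This page does not give the endpoint URLs or the response shape, so the example takes the HTTP call as an injected function (`fetch_status`, a hypothetical name) and assumes the response is a mapping with a `status` key. Only the `completed` status comes from this page; the `failed` status, the polling interval, and the timeout are assumptions.

```python
import time
from typing import Callable, Mapping


def wait_for_result(
    fetch_status: Callable[[str], Mapping[str, object]],
    request_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, object]:
    """Poll until the job reports status 'completed', then return the response.

    fetch_status is a caller-supplied function that queries the result
    endpoint for request_id and returns the decoded response body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(request_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":  # assumed terminal error status
            raise RuntimeError(f"request {request_id} failed: {result!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} not completed after {timeout_s}s")
        sleep(interval_s)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub before wiring in real credentials.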

❓

Common Questions

Frequently asked

What is SD 2.0 Text-to-Video?

It's ByteDance's state-of-the-art text-to-video model that generates cinematic clips from natural language prompts, with support for complex camera movements, native audio, and up to 2K resolution.

What's the difference between basic and high quality?

Basic quality uses the fast-t2v model at $0.12/sec — ideal for drafts and iteration. High quality uses the standard-t2v model at $0.25/sec for final, cinema-grade output with richer detail and smoother motion.

Does it generate audio?

Yes, SD 2.0 generates audio natively alongside video, ensuring cinema-grade sound synchronized with the visual content.

What is the maximum resolution?

SD 2.0 supports up to 2K resolution output.
