Alibaba AI API models

Alibaba AI models on MuAPI

Explore Alibaba models for image and video generation, including the Wan, Qwen Image, Z-Image and Happy Horse families, available through MuAPI.

62 Models

Video Generation Models

Video

$0.300 / second

wan2.2-edit-video

Modify existing videos with simple text commands. Wan 2.2 Video-Edit can change attire, character appearance, or other visual elements directly within a video, with no need to start from scratch. It accepts 480p or 720p uploads of up to two minutes.

Video

$0.100 / second

wan2.7-reference-to-video

Alibaba WAN 2.7 Reference-to-Video generates new shots from reference characters and props.

Video

$1.800 / second

happy-horse-1-text-to-video-1080p

Happy Horse 1.0 Text to Video — generate expressive, stylized video clips from text prompts with vivid character motion and dynamic scene storytelling.

Video

$0.300 / second

wan2.2-text-to-video

Wan 2.2’s T2V mode transforms descriptive text prompts into high-quality, stylized video sequences. It excels at generating anime-style or cinematic visuals with smooth motion and strong thematic consistency.

Video

$0.300 / second

wan2.2-image-to-video

Wan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations. It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos.

Video

$0.650 / second

wan2.6-image-to-video

WAN 2.6 Image-to-Video converts a single still image into a smooth, cinematic video clip. It preserves the original image’s composition, lighting, and style while adding natural motion, depth parallax, atmospheric effects, and gentle camera movement.

Video

$0.350 / second

wan2.2-animate

Wan2.2 Animate is a video-to-video model for animating a character or replacing a character in existing video clips. It replicates holistic movement and facial expressions from a reference video or pose while preserving the target character’s appearance. You upload an image of the character and a video containing the motion and expression, and the model generates a video in which the character from your image moves like the reference. It supports 480p or 720p for clips of up to 120 seconds.

Video

$0.900 / second

happy-horse-1.1-image-to-video-1080p

Happy Horse 1.1 Image to Video (1080p) — bring still images to life with fluid, expressive 1080p animation.

Video

$2.100 / second

happy-horse-1-reference-to-video-1080p

Happy Horse 1.0 Reference to Video (1080p) - generate expressive 1080p video clips conditioned on 1-9 reference images plus a text prompt.

Video

$1.050 / second

happy-horse-1-video-edit-720p

Happy Horse 1.0 Video Edit (720p) - modify an input video at 720p using a natural-language instruction with optional reference images.

Video

$0.700 / second

happy-horse-1.1-text-to-video-720p

Happy Horse 1.1 Text to Video (720p) — generate expressive 720p video clips from text prompts with vivid character motion.

Video

$0.900 / second

happy-horse-1-image-to-video-720p

Happy Horse 1.0 Image to Video (720p) — bring still images to life with fluid, expressive animation at 720p output resolution.

Video

$0.700 / second

happy-horse-1.1-image-to-video-720p

Happy Horse 1.1 Image to Video (720p) — bring still images to life with fluid, expressive 720p animation.

Video

$0.700 / second

happy-horse-1.1-reference-to-video-720p

Happy Horse 1.1 Reference to Video (720p) — generate 720p video conditioned on 1-9 reference images plus a text prompt.

Video

$0.200 / second

wan2.2-spicy-image-to-video

Wan2.2-spicy Image-to-Video transforms a single creative image into a short dynamic video with bold motion, stylized effects, high-contrast lighting, and energy-driven animations. The “spicy” variant produces more dramatic movement, more vivid colors, and more expressive visual effects.

Video

$0.900 / second

happy-horse-1.1-text-to-video-1080p

Happy Horse 1.1 Text to Video (1080p) — generate expressive 1080p video clips from text prompts with vivid character motion and dynamic scene storytelling.

Video

$0.100 / second

wan2.1-reference-video

WAN 2.1 is an advanced AI model that transforms one or more reference images into a coherent, animated video. By combining characters, objects, or environments from multiple images, it creates smooth motion sequences while preserving realism, style, and fine details.

Video

$0.100 / second

wan2.7-text-to-video

Alibaba WAN 2.7 Text-to-Video turns plain prompts into coherent, cinematic clips.

Video

$0.900 / second

happy-horse-1.1-video-edit-1080p

Happy Horse 1.1 Video Edit (1080p) — modify an input video using natural-language instructions with optional reference images.

Video

$0.100 / second

wan2.7-video-edit

Alibaba WAN 2.7 Video Edit performs prompt-driven video editing with multi-image reference support.

Video

$1.800 / second

happy-horse-1-image-to-video-1080p

Happy Horse 1.0 Image to Video — bring still images to life with fluid, expressive animation and fine-grained motion control.

Video

$0.900 / second

happy-horse-1-text-to-video-720p

Happy Horse 1.0 Text to Video (720p) — generate expressive, stylized video clips from text prompts at 720p output resolution.

Video

$0.900 / second

happy-horse-1.1-reference-to-video-1080p

Happy Horse 1.1 Reference to Video (1080p) — generate 1080p video conditioned on 1-9 reference images plus a text prompt.

Video

$0.700 / second

happy-horse-1.1-video-edit-720p

Happy Horse 1.1 Video Edit (720p) — modify an input video using natural-language instructions with optional reference images.

Video

$0.300 / second

wan2.1-text-to-video

WAN 2.1 turns your written prompts into vivid, cinematic video clips. Ideal for storytelling, content creation, and visualizing abstract ideas, it supports detailed natural scenes, character motion, and dramatic camera movements — all from just text.

Video

$0.016 / second

wan2.2-5b-fast-t2v

Wan 2.2 Fast is a lightweight, high-speed version of the Wan 2.2 model, optimized for quick text-to-video generation. It trades some cinematic detail for rapid results, making it perfect for prototyping, previews, social media clips, and quick storytelling.

Video

$0.200 / second

wan2.2-speech-to-video

WAN 2.2 Speech-to-Video transforms a static image into a talking video by synchronizing lip movements and facial expressions with an audio input. Provide a character image and a speech audio clip, and the model generates a natural, expressive video in which the subject speaks your lines.

Video

$0.300 / second

wan2.1-image-to-video

Animate static images into expressive video sequences with WAN 2.1. Upload any image and guide its transformation into a moving scene — great for bringing art, characters, or photos to life with smooth motion and consistent style.

Video

$0.650 / second

wan2.5-image-to-video

WAN 2.5 Image-to-Video uses your image as the starting frame and turns it into a dynamic video. Upload a static image, add a descriptive text prompt, and the model generates cinematic motion, including camera pans, environmental movement, and realistic physics.

Video

$0.650 / second

wan2.5-text-to-video

WAN 2.5 Text-to-Video transforms written prompts into cinematic video clips with dynamic motion, realistic physics, and natural animation. It can also generate characters delivering dialogue, making it ideal for storytelling, ads, and creative showcases.

Video

$0.440 / second

wan2.5-image-to-video-fast

Convert a single static image into a cinematic short video with realistic motion, dynamic camera movement, and environmental effects. The Fast mode generates high-quality videos quickly, perfect for rapid prototyping, social media clips, and immersive visual storytelling from still images.

Video

$0.440 / second

wan2.5-text-to-video-fast

Transform text prompts into short, cinematic videos with natural motion, realistic environments, and dynamic camera perspectives. Fast mode delivers quick, high-fidelity video generation, ideal for creative storytelling, concept visuals, and social media content.

Video

$0.200 / second

wan2.2-spicy-video-extend

Wan2.2-spicy Video Extend continues an existing video by generating new frames that match the original style while adding stronger motion, bolder effects, and more dramatic action.

Video

$0.650 / second

wan2.6-text-to-video

WAN 2.6 Text-to-Video generates smooth, cinematic videos directly from text prompts. It’s designed for strong scene coherence, atmospheric depth, and fluid camera motion, making it ideal for fantasy and sci-fi worlds, surreal concepts, environmental storytelling, and dramatic visual sequences with rich lighting and motion.

Video

$0.100 / second

wan2.7-image-to-video

Alibaba WAN 2.7 converts images into videos with optional audio.

Video

$0.100 / second

wan2.7-video-extend

Extend existing videos seamlessly with Wan 2.7.

Video

$1.050 / second

happy-horse-1-reference-to-video-720p

Happy Horse 1.0 Reference to Video (720p) - generate expressive 720p video clips conditioned on 1-9 reference images plus a text prompt.

Video

$2.100 / second

happy-horse-1-video-edit-1080p

Happy Horse 1.0 Video Edit (1080p) - modify an input video at 1080p using a natural-language instruction with optional reference images.
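
Because every video model above is priced per second, a rough budget can be worked out before generating anything. The sketch below is illustrative only: the three prices are copied from the listing, while the function name `estimate_video_cost` and the assumption that billing is strictly linear in clip length (no minimum charge, no rounding to whole seconds) are not stated on this page.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the listing above.
PRICE_PER_SECOND = {
    "wan2.2-5b-fast-t2v": Decimal("0.016"),
    "wan2.2-edit-video": Decimal("0.300"),
    "happy-horse-1-video-edit-1080p": Decimal("2.100"),
}


def estimate_video_cost(model: str, seconds: int) -> Decimal:
    """Estimate the cost of one clip.

    Assumption (not confirmed by the listing): the charge is simply
    price-per-second multiplied by the clip length in seconds.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return PRICE_PER_SECOND[model] * seconds


if __name__ == "__main__":
    # Compare the same 10-second clip across three price points.
    for name in PRICE_PER_SECOND:
        print(f"{name}: ${estimate_video_cost(name, 10):.2f} for 10 s")
```

Under that linear assumption, a two-minute edit with wan2.2-edit-video would come to 120 × $0.300 = $36.00, which shows how quickly the per-second rate dominates for longer clips.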

Image Generation Models

Image

$0.004 / 1K tokens

z-image-p

Z-Image P is based on PiAPI's Qubico/z-image text-to-image model.

Image

$0.030 / 1K tokens

wan2.1-text-to-image

WAN 2.1 is a powerful AI model that transforms text prompts into high-resolution, photorealistic images. It excels at detailed object rendering, realistic lighting, and fine textures, making it ideal for visual content, concept art, advertising, and digital storytelling.

Image

$0.030 / 1K tokens

qwen-image

Generate high-quality, detailed images from text prompts in various styles — from realistic to artistic — perfect for creative visuals, product shots, and concept art.

Image

$0.030 / generation

qwen-image-edit-plus

Qwen Image Edit Plus is an upgraded image-editing model that supports multiple image references and superior text editing. Powered by the 20B-parameter Qwen architecture, it allows changes like background swap, style transfer, object removal/addition, and precise text edits (bilingual: English/Chinese) while maintaining visual consistency and preserving details of the original images.

Image

$0.040 / generation

qwen-image-edit-plus-lora

Qwen-Image-Edit-Plus (2509) is a 20B MMDiT image-to-image editor that supports multi-image edits, single-image consistency, and native ControlNet. It is available through a ready-to-use REST inference API with no cold starts.

Image

$0.007 / 1K tokens

z-image-turbo

Z-Image Turbo is a high-speed text-to-image model optimized for fast creative generation. It produces detailed, high-contrast, high-resolution images with strong stylization control. Ideal for rapid concept creation, visual exploration, product ideas, fantasy scenes, and cinematic composition tests. Designed for low latency and strong prompt adherence.

Image

$0.045 / generation

wan2.6-image-edit

WAN 2.6 Image Edit applies targeted, instruction-based edits to an existing image while preserving composition, perspective, and lighting. It’s ideal for object replacement, material changes, environment tweaks, and style adjustments with clean integration and minimal artifacts—keeping the original scene coherent and cinematic.

Image

$0.100 / 1K tokens

wan2.7-text-to-image-pro

Alibaba WAN 2.7 Text-to-Image Pro generates high-quality images up to 4K from text prompts with thinking mode for enhanced image quality.

Image

$0.040 / 1K tokens

wan2.5-text-to-image

WAN 2.5 Text-to-Image generates high-quality, realistic or stylized images from textual descriptions. It supports detailed visual storytelling, cinematic compositions, and versatile styles — from portraits and product shots to landscapes and fantasy scenes.

Image

$0.050 / generation

wan2.7-image-edit

Alibaba WAN 2.7 Image Edit performs prompt-driven image editing with support for multiple-image references.

Image

$0.040 / generation

qwen-image-edit-2511

Qwen Image Edit 2511 performs precise, instruction-driven edits on an existing image while preserving composition, lighting, and overall style. It’s well-suited for object replacement, material changes, localized edits, and subtle scene adjustments with strong visual consistency and minimal artifacts.

Image

$0.040 / generation

qwen-text-to-image-2512

Qwen Image Text-to-Image 2512 generates high-resolution, visually consistent images from text prompts. It focuses on strong scene structure, clean composition, and atmospheric lighting, making it well-suited for cinematic environments, surreal concepts, fantasy and sci-fi worlds.

Image

$0.300 / generation

wan2.1-lora-t2v

WAN 2.1 LoRA T2V enables users to generate videos from text prompts with custom-trained LoRA modules. Tailor the generation to specific characters, outfits, or animation styles — ideal for brand storytelling, fan content, and stylized animations.

Image

$0.013 / 1K tokens

z-image-base

Z-Image Base is a general-purpose text-to-image model designed for reliable, high-quality image generation from natural language prompts. It focuses on clear composition, good prompt adherence, and versatile output across everyday scenes, product-style visuals, characters, and creative concepts.

Image

$0.040 / 1K tokens

qwen-image-2.0

Qwen 2.0 Text to Image model with enhanced realism.

Image

$0.090 / 1K tokens

qwen-image-2.0-pro

Qwen 2.0 Pro Text to Image model with maximum realism and fidelity.

Image

$0.090 / generation

qwen-image-2.0-pro-edit

Qwen 2.0 Pro Image Edit model for maximum-precision image modifications.

Image

$0.050 / 1K tokens

wan2.7-text-to-image

Alibaba WAN 2.7 Text-to-Image generates high-quality images from text prompts with thinking mode for enhanced image quality.

Image

$0.300 / generation

wan2.1-lora-i2v

Bring still images to life using WAN 2.1 LoRA I2V, which supports custom LoRA fine-tunes for identity consistency. Animate expressions, subtle movements, or full-body actions while preserving personalized features from the image and LoRA.

Image

$0.030 / generation

qwen-image-edit

The Qwen Edit Image Model allows you to modify existing images using text-based editing prompts. Instead of generating from scratch, you can upload a base image and describe the desired changes (e.g., replacing objects, altering colors, adding new elements).

Image

$0.040 / generation

wan2.5-image-edit

The Wan2.5 Edit Image model allows you to transform existing images with precision and creativity. By providing an image along with an edit prompt, you can make realistic changes, enhancements, or stylistic adjustments—whether it’s altering objects, changing backgrounds, adding details, or applying an entirely new artistic style.

Image

$0.040 / 1K tokens

wan2.6-text-to-image

WAN 2.6 Text-to-Image generates detailed, cinematic still images from text prompts. It focuses on strong composition, atmospheric lighting, and clear subject structure, making it suitable for fantasy and sci-fi environments, surreal concepts, architectural visuals, and dramatic world-building imagery.

Image

$0.040 / generation

qwen-image-2.0-edit

Qwen 2.0 Image Edit model with precise background modification and enhancements.

Image

$0.100 / generation

wan2.7-image-edit-pro

Alibaba WAN 2.7 Image Edit Pro performs prompt-driven image editing with multi-image reference support and up to 2K output.
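
The image models above use two billing units, per generation and per 1K tokens, so a batch estimate has to branch on the unit. The following is a minimal sketch under stated assumptions: the prices are copied from the listing, but the helper name `estimate_image_cost` is hypothetical, and the page does not say how tokens are counted for image models, so the token count is left as a caller-supplied input.

```python
from decimal import Decimal

# (price in USD, billing unit), copied from the listing above.
IMAGE_PRICES = {
    "qwen-image-edit-plus": (Decimal("0.030"), "generation"),
    "wan2.7-image-edit-pro": (Decimal("0.100"), "generation"),
    "z-image-turbo": (Decimal("0.007"), "1k_tokens"),
}


def estimate_image_cost(model: str, generations: int = 0, tokens: int = 0) -> Decimal:
    """Estimate batch cost for one image model.

    Per-generation models use `generations`; per-1K-token models use
    `tokens`. How tokens are measured is an assumption left to the caller.
    """
    price, unit = IMAGE_PRICES[model]
    if unit == "generation":
        return price * generations
    # Token-priced models: scale the price by thousands of tokens.
    return price * Decimal(tokens) / Decimal(1000)


if __name__ == "__main__":
    print(estimate_image_cost("qwen-image-edit-plus", generations=50))
    print(estimate_image_cost("z-image-turbo", tokens=2500))
```

Keeping the unit next to the price in one table avoids accidentally multiplying a per-1K-token rate by a generation count.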
