
muapi/grok-imagine-image-to-video

Image to Video

Grok Imagine is xAI’s multimodal image-to-video model, capable of animating still images into cinematic videos from 6 to 30 seconds with synchronized ambient audio. It focuses on realism, fluid motion, and expressive lighting transitions while maintaining high generation speed.

Input

Configure the model parameters below.

Prompt (required): Text prompt describing the video. Reference uploaded images using @image(n) followed by a space, e.g. @image1 a sunset over the ocean.

Image URLs (required): Upload or provide image URLs for image-to-video generation. Supports up to 7 images. Reference each image in the prompt with @image1, @image2, etc.
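The @image reference rule can be checked before submitting. The following is a minimal sketch, not part of the service: the function name check_image_references is hypothetical, and it only verifies that every @imageN in the prompt points at an uploaded image and that the image count stays within the stated limit of 7. It does not check the trailing-space rule.

```python
import re


def check_image_references(prompt: str, image_urls: list[str], max_images: int = 7) -> list[int]:
    """Return the sorted image numbers referenced in the prompt.

    Raises ValueError if the image count is outside 1..max_images or if the
    prompt references an image number that was not uploaded.
    """
    if not 1 <= len(image_urls) <= max_images:
        raise ValueError(f"expected 1 to {max_images} images, got {len(image_urls)}")

    # Collect every @image<number> reference, e.g. @image1, @image2.
    referenced = sorted({int(n) for n in re.findall(r"@image(\d+)", prompt)})

    # Image numbers are 1-based and must not exceed the number of uploads.
    out_of_range = [n for n in referenced if not 1 <= n <= len(image_urls)]
    if out_of_range:
        raise ValueError(f"prompt references missing images: {out_of_range}")
    return referenced
```

For example, a prompt of "@image1 a sunset over the ocean" with one uploaded image passes, while "@image3 ..." with one uploaded image raises an error.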

Aspect Ratio: Aspect ratio of the output video. Note: ignored when only a single image is provided. (Default: 2:3)

Mode: Spicy mode is not supported when generating videos from external image inputs and automatically switches to Normal. (Default: normal)

Resolution: Output video resolution. (Default: 480p)

Duration (seconds): Video duration in seconds (6–30). Cost: $0.025/s at 480p, $0.05/s at 720p. (Default: 6)
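The per-second pricing can be turned into a quick cost estimate. This is a sketch under the rates listed above; the names RATE_PER_SECOND and estimate_cost are illustrative and not part of any muapi client.

```python
from decimal import Decimal

# Per-second rates in USD, taken from the Duration field description.
RATE_PER_SECOND = {"480p": Decimal("0.025"), "720p": Decimal("0.05")}


def estimate_cost(duration_s: int, resolution: str = "480p") -> Decimal:
    """Estimate the cost in USD of one generation."""
    if not 6 <= duration_s <= 30:
        raise ValueError("duration must be between 6 and 30 seconds")
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return RATE_PER_SECOND[resolution] * duration_s


# A default 6-second clip at 480p comes to 0.15 USD,
# and a 30-second clip at 720p comes to 1.50 USD.
default_cost = estimate_cost(6)
longest_720p_cost = estimate_cost(30, "720p")
```

Decimal is used instead of float so that currency amounts are not affected by binary rounding.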

🚀Related Models

grok-imagine-text-to-video

Grok Imagine is xAI’s fast, creative text-to-video model that generates cinematic clips from 6 to 30 seconds with smooth motion, expressive lighting, and ambient audio. It turns a written idea into a visually rich video.

Text to Video

grok-imagine-text-to-image

Grok Imagine is xAI’s high-quality image generation model that transforms text prompts into detailed, stylish, and visually expressive images. It excels at creating vivid scenes, characters, environments, and concept art with strong lighting, depth, and artistic clarity. Get 6 images each time.

Text to Image

grok-imagine-image-to-image

Grok Imagine Image-to-Image transforms an existing image using natural language instructions while preserving scene structure, perspective, and lighting. It is ideal for object replacement, environment evolution, concept re-imagining, and creative edits that feel grounded and visually coherent rather than over-stylized.

Image to Image

grok-imagine-extend

Grok Imagine Extend lets you continue and expand existing Grok Imagine video generations seamlessly. Starting from a previously generated video, you can extend the scene while maintaining visual style, characters, motion, and audio consistency. Requires the original task_id from the initial video generation.

Text to Video

grok-imagine-text-to-image-quality

Grok Imagine Quality is xAI's high-fidelity text-to-image mode that prioritizes accuracy and detail over speed. It produces sharper, more visually accurate images with stronger lighting, depth, and artistic clarity. Get 6 images each time.

Text to Image

📝

Overview

About this model

Grok Imagine is xAI’s multimodal image-to-video model, designed to turn static images into cinematic videos. With a focus on realism, fluid motion, and expressive lighting transitions, it produces videos from 6 to 30 seconds long (6 seconds by default) with synchronized ambient audio. The model is engineered for high generation speed, so results are typically ready shortly after submission.

Grok Imagine is built on deep learning architectures and adapts to a wide range of creative prompts, from subtle scenic transitions to dynamic, adventure-style visual narratives. This makes it a good fit for content creators, marketers, and developers who want to enhance digital storytelling through automated, high-quality video generation. Typical use cases include:

1. Cinematic social media content creation
2. Marketing and advertising video snippets
3. Dynamic storytelling for digital campaigns
4. Artistic renderings from static images
5. Enhanced presentation visuals
6. Virtual tour and immersive experience generation

💰

Pricing & Value

Cost analysis

Provider	Cost	Notes
muapiapp	$0.15 per generation	Corresponds to the default 6-second clip at 480p ($0.025/s × 6 s); longer or 720p clips cost more. About 40% below the estimated competitor price, with quality described as comparable or superior.
Fal.ai	$0.25 per generation	Estimated price, the same as Replicate.
Replicate	$0.25 per generation	Estimated price, the same as Fal.ai.


* Competitor pricing is estimated based on similar model architectures and usage tiers.

⚙️

Technical Details

Configuration schema

Parameter	Type	Description	Default
Prompt	string	Text prompt describing the video. Reference uploaded images using @image(n) followed by a space — e.g. @image1 a sunset over the ocean.	`@image1 Camera glides through vines toward temple entrance, mist disperses as sunlight pierces canopy, birds fly off, subtle dust motes in the air, adventure-style cinematic score.`
Aspect Ratio	Enum (5 options)	Aspect ratio of the output video. Note: ignored when only a single image is provided.	`2:3`
Mode	Enum (3 options)	Note: When generating videos using external image inputs, Spicy mode is not supported and will automatically switch to Normal.	`normal`
Resolution	Enum (2 options)	Output video resolution.	`480p`
Duration (seconds)	int	Video duration in seconds (6–30). Cost: $0.025/s at 480p, $0.05/s at 720p.	`6`
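The schema above can be mirrored in a small client-side validator. This is a sketch under assumptions: only images_list and the endpoint name appear on this page, so the JSON keys prompt, aspect_ratio, mode, resolution, and duration are hypothetical, as is the function build_payload. The five aspect-ratio options are not listed here, so that value is passed through unchecked.

```python
MODES = ("fun", "normal", "spicy")
RESOLUTIONS = ("480p", "720p")
MAX_IMAGES = 7


def build_payload(
    prompt: str,
    images_list: list[str],
    aspect_ratio: str = "2:3",
    mode: str = "normal",
    resolution: str = "480p",
    duration: int = 6,
) -> dict:
    """Validate inputs against the documented ranges and return a request body."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 1 <= len(images_list) <= MAX_IMAGES:
        raise ValueError(f"images_list must contain 1 to {MAX_IMAGES} URLs")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if not 6 <= duration <= 30:
        raise ValueError("duration must be between 6 and 30 seconds")

    payload = {
        "prompt": prompt,
        "images_list": list(images_list),
        "mode": mode,
        "resolution": resolution,
        "duration": duration,
    }
    # The aspect ratio is ignored for a single image, so this sketch
    # only includes it when several images are provided.
    if len(images_list) > 1:
        payload["aspect_ratio"] = aspect_ratio
    return payload
```

Omitting aspect_ratio for single-image requests is a design choice of this sketch; sending it anyway should be harmless, since the page states the value is ignored in that case.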

📖

Implementation Guide

Developer documentation

How to Use Grok Imagine

Prepare Your Inputs
- Ensure you have at least one high-quality image available. Upload the images or provide direct URLs in the images_list field (up to 7 images, referenced in the prompt as @image1, @image2, etc.).
- Create a descriptive text prompt that outlines the desired video elements, such as lighting transitions, camera movements, and accompanying ambient audio.
Configure the Settings
- Choose the mode of animation: fun, normal, or spicy (note that when using external image inputs, the model automatically switches spicy mode to normal).
- Select the duration of your video (6 to 30 seconds). The default is 6 seconds.
Submit and Generate
- Send your configured input to the /grok-imagine-image-to-video endpoint.
- The model processes your image and text prompt, outputting a high-quality cinematic video with synchronized audio.
Review and Iterate
- Once your video is generated, review the output. Adjust your prompt or settings if needed to fine-tune the video’s aesthetic and narrative impact.
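The submit step above can be sketched with the Python standard library. Only the endpoint path and the images_list field come from this page. The base URL, the x-api-key header name, and the prompt and duration keys are placeholders and assumptions, so check them against the provider's API reference before use. The sketch builds the request without sending it.

```python
import json
import urllib.request

ENDPOINT = "/grok-imagine-image-to-video"  # endpoint path named in the guide


def build_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumption: hypothetical auth header name
        },
        method="POST",
    )


request = build_request(
    "https://api.example.com/v1",  # placeholder base URL, not the real one
    "YOUR_API_KEY",
    {
        "prompt": "@image1 a sunset over the ocean",  # assumed key name
        "images_list": ["https://example.com/sunset.jpg"],
        "duration": 6,  # assumed key name
    },
)
```

Once the real base URL and authentication scheme are known, the request can be sent with urllib.request.urlopen(request). The response format is not described on this page, so parsing it is left out.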

❓

Common Questions

Frequently asked

What types of images can I use with Grok Imagine?

Grok Imagine supports high-resolution images provided via file upload or URL. For optimal results, use images with clear details and strong visual elements.

How long does it take to generate a video?

The generation process is optimized for speed and typically produces a cinematic video in just a few moments after submission, depending on the complexity of your prompt and system load.

What is the difference between the 'fun', 'normal', and 'spicy' modes?

Each mode adjusts the video generation style. 'Fun' offers a playful rendition, 'normal' provides a balanced and true-to-life output, while 'spicy' (when available) intensifies the visual effects. Note that 'spicy' mode is automatically switched to 'normal' when external image inputs are used.
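The mode fallback described in this answer can be expressed as a small helper, which is useful for showing users which mode will actually apply. This is an illustrative sketch: effective_mode is a hypothetical name, and the switch itself is performed by the service, not by client code.

```python
MODES = ("fun", "normal", "spicy")


def effective_mode(requested: str, uses_external_images: bool) -> str:
    """Return the mode that applies after the documented fallback rule."""
    if requested not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    # Spicy is not supported with external image inputs and becomes normal.
    if requested == "spicy" and uses_external_images:
        return "normal"
    return requested
```

With external images, a request for 'spicy' resolves to 'normal', while 'fun' and 'normal' are returned unchanged.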
