Google AI API models

Google AI models on MuAPI

Explore Google models for chat, code, image and video generation, including Gemini, Nano Banana and Veo-style workflows available through MuAPI.

44 Models

Video Generation Models

Video

$2.500 / second

veo3-image-to-video

VEO3 I2V animates static images into expressive video sequences, adding lifelike movement while preserving the original composition.

Video

$0.600 / second

veo3-fast-text-to-video

VEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping.

Video

$0.600 / second

veo3.1-fast-image-to-video

Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews.
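Per-second pricing can be turned into a per-clip estimate with simple multiplication. The sketch below uses the $0.600 / second rate and the 8-second clip length listed for veo3.1-fast-image-to-video. It assumes billing is strictly rate times duration with no minimum charge or rounding, which the listing does not confirm; `clip_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Per-second rate copied from the listing for veo3.1-fast-image-to-video.
RATE_PER_SECOND = Decimal("0.600")


def clip_cost(seconds: int, rate_per_second: Decimal = RATE_PER_SECOND) -> Decimal:
    """Estimate the USD price of one clip.

    Assumption: cost = rate * duration, with no minimums or rounding rules.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rate_per_second * seconds


print(clip_cost(8))  # 4.800
```

Decimal is used instead of float so that prices such as 0.600 are not distorted by binary rounding.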

Video

$0.600 / second

veo3.1-fast-text-to-video

Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.

Video

$0.600 / second

veo3.1-extend-video

Veo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene while keeping visual style, characters, motion, and audio consistent. This model requires the task_id of the original video generation.
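Because the extend mode depends on an earlier generation, a client has to keep the original task_id and send it along with the new prompt. The following is a minimal sketch of assembling such a request body. Only `task_id` is named in the listing; the `prompt` field name, the body shape, and the `build_extend_request` helper are assumptions for illustration, not MuAPI's documented schema.

```python
import json


def build_extend_request(task_id: str, prompt: str) -> dict:
    """Assemble a JSON-serializable body for a hypothetical extend-video call."""
    if not task_id or not task_id.strip():
        # The listing states that the original video's task_id is required.
        raise ValueError("task_id of the original video is required")
    return {
        "task_id": task_id.strip(),  # identifier of the original generation
        "prompt": prompt,            # assumed field name for the continuation text
    }


# "example-task-id" is a placeholder, not a real identifier.
body = build_extend_request("example-task-id", "The camera pulls back to reveal the city.")
print(json.dumps(body, indent=2))
```

Validating the identifier before sending avoids paying for a round trip that cannot succeed.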

Video

$0.600 / second

veo3.1-4k-video

Get the ultra-high-definition 4K version of a video produced by a Veo 3.1 generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications. It enhances visual fidelity while maintaining temporal coherence and realistic motion.

Video

$0.300 / second

veo3.1-lite-text-to-video

Veo 3.1 Lite is a lightweight variant of Google's Veo 3.1 model designed for faster, more accessible video generation.

Video

$3.000 / second

veo-4-image-to-video

Veo 4 Image to Video — animate any still image with Veo 4's motion synthesis engine, supporting fine-grained camera control and realistic physics at up to 1080p.

Video

$1.500 / second

gemini-omni-image-to-video

Gemini Omni Image to Video — animate one or more reference images with a text prompt. Unified reasoning across modalities preserves subject identity and generates synchronized audio natively.

Video

$2.500 / second

veo3-text-to-video

VEO3 T2V generates cinematic videos from text prompts, capturing dynamic motion, rich scenes, and storytelling visuals in stunning detail.

Video

$0.600 / second

veo3-fast-image-to-video

Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality — powered by Google's VEO3 on MuAPI.

Video

$2.500 / second

veo3.1-image-to-video

Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.

Video

$2.500 / second

veo3.1-text-to-video

Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.

Video

$0.600 / second

veo3.1-reference-to-video

Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips. It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.

Video

$0.300 / second

veo3.1-lite-image-to-video

Veo 3.1 Lite is a lightweight variant of Google's Veo 3.1 model designed for faster, more accessible video generation from images.

Video

$3.000 / second

veo-4-text-to-video

Veo 4 Text to Video — Google DeepMind's fourth-generation model delivering photorealistic, high-fidelity 1080p videos with exceptional prompt adherence and cinematic camera control.

Video

$1.500 / second

gemini-omni-text-to-video

Gemini Omni — natively multimodal any-to-any model. Generates high-fidelity video with synchronized audio directly from text prompts, with unified reasoning across modalities for more coherent scenes and fewer pipeline artifacts.

Video

$2.400 / second

gemini-omni-video-edit

Gemini Omni Video Edit — natively multimodal video-to-video editing. Restyle, relight, swap subjects, or rewrite scenes from a source clip with a single prompt. Unified reasoning across modalities preserves motion and audio continuity while applying the edit.

Image Generation Models

Image

$0.030 / 1K tokens

nano-banana-2-lite

Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image) is Google's fastest and most cost-efficient text-to-image model, delivering 4-second generation with exceptional prompt adherence, character consistency, and legible in-image text rendering.

Image

$0.030 / 1K tokens

nano-banana

Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing. It produces hyper-realistic, physics-aware visuals with seamless style transformations.

Image

$0.030 / generation

nano-banana-effects

Nano Banana Effects is a creative visual effects model designed to transform ordinary images into fun, stylized, and eye-catching results. It applies artistic filters, 3D styles, cartoon transformations, and trending viral looks with a single click.

Image

$0.020 / 1K tokens

google-imagen4-fast

Imagen 4 Fast is optimized for speed and accessibility, allowing you to generate high-quality images in seconds. While slightly less detailed than the Ultra version, it excels at rapid ideation, drafts, storyboarding, and casual creativity.

Image

$0.030 / generation

nano-banana-edit

Nano Banana is a mysterious, high-performance image model. It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands.

Image

$0.120 / generation

nano-banana-pro-edit

Nano Banana 2 Edit is the next-generation image editing model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced image-editing capabilities with improved resolution.

Image

$0.060 / 1K tokens

nano-banana-2

Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.

Image

$0.300 / generation

photo-pack

Generate a pack of high-quality, professional portraits in various styles (LinkedIn, CEO, Tinder, etc.) while preserving your facial features.

Image

$0.000 / generation

gemini-omni-character

Generate a reusable character from a single reference image and a text description. Optionally attach a voice profile created with Gemini Omni Audio to give the character a consistent voice in future video generations.

Image

$0.030 / 1K tokens

google-imagen4

Google Imagen 4 is the latest text-to-image AI model from DeepMind, designed to produce stunningly photorealistic images with crisp detail, accurate text rendering, and creative flexibility. It supports high-resolution output (up to 2K), generates visuals in seconds, and embeds SynthID watermarks for authenticity.

Image

$0.030 / generation

nano-banana-2-lite-edit

Nano Banana 2 Lite Edit (Gemini 3.1 Flash Lite Image) is Google's fastest and most cost-efficient image editing model, blending up to 14 reference images with exceptional prompt adherence and character consistency.

Image

$0.060 / 1K tokens

google-imagen4-ultra

Imagen 4 Ultra is Google’s flagship model, designed for photorealism, rich textures, and production-level imagery. It produces crisp, high-resolution visuals with advanced detail, lighting precision, and natural compositions.

Image

$0.120 / 1K tokens

nano-banana-pro

Nano Banana 2 is the next-generation image generation model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced text-to-image capabilities with improved resolution.

Image

$0.060 / generation

nano-banana-2-edit

Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.

Text & Chat Models

Chat

$0.001 / 1K tokens

gemini-3-1-pro

Gemini 3.1 Pro is Google's next-generation multimodal model, optimized for complex reasoning, planning, coding, and multi-turn conversation. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-1-pro) and live streaming (/gemini-3-1-pro/stream) via SSE.
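The per-million-token rates quoted above translate into a per-request cost as follows. This is a rough sketch that assumes input and output tokens are billed independently at the listed $4.00/M and $24.00/M rates, with no extra fees or rounding; `request_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Rates copied from the gemini-3-1-pro description (USD per million tokens).
INPUT_RATE_PER_M = Decimal("4.00")
OUTPUT_RATE_PER_M = Decimal("24.00")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
    return total / TOKENS_PER_M


print(request_cost(10_000, 2_000))
```

For 10,000 input tokens and 2,000 output tokens this works out to $0.04 + $0.048 = $0.088.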

Chat

$0.000 / 1K tokens

gemini-2-5-flash

Gemini 2.5 Flash is Google's high-speed multimodal language model, optimized for rapid text generation, real-time image understanding, and high-frequency tasks. Supports text and image inputs. Token-based pricing: $0.30/M input tokens, $2.50/M output tokens. Two endpoints: standard async (/gemini-2-5-flash) and live streaming (/gemini-2-5-flash/stream) via SSE.
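The chat models in this listing follow the same path pattern: the model id for the standard async endpoint, and the same path with `/stream` appended for SSE. A small sketch of that convention is shown below. The base URL, HTTP method, and authentication are not given in the listing and are left out; `endpoint_path` is a hypothetical helper.

```python
def endpoint_path(model: str, stream: bool = False) -> str:
    """Return the relative path for a chat model, per the pattern in the listing."""
    if not model or "/" in model:
        raise ValueError("model must be a bare model id such as 'gemini-2-5-flash'")
    path = f"/{model}"
    # The streaming variant appends /stream to the standard path.
    return f"{path}/stream" if stream else path


print(endpoint_path("gemini-2-5-flash"))               # /gemini-2-5-flash
print(endpoint_path("gemini-2-5-flash", stream=True))  # /gemini-2-5-flash/stream
```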

Chat

$0.004 / 1K tokens

gemini-audio-vision

Gemini Audio Vision uses Google Gemini's native audio understanding to analyze and describe audio content in detail — speech, tone, background sounds, speaker changes, and more. Upload an audio URL and a prompt, and Gemini returns a detailed text analysis. Token-based pricing.

Chat

$0.001 / 1K tokens

gemini-3-pro

Gemini 3 Pro is Google's powerful multimodal reasoning model, designed for complex problem solving, coding, and logical tasks. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-pro) and live streaming (/gemini-3-pro/stream) via SSE.

Chat

$0.000 / 1K tokens

gemini-3-5-flash-openai

Gemini 3.5 Flash (OpenAI-compatible) is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash-openai) and live streaming (/gemini-3-5-flash-openai/stream) via SSE.

Chat

$0.000 / 1K tokens

gemini-3-5-flash

Gemini 3.5 Flash is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash) and live streaming (/gemini-3-5-flash/stream) via SSE.
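The streaming endpoints deliver output via SSE (Server-Sent Events), a line-based text format in which each event carries one or more `data:` lines and ends with a blank line. The sketch below parses that generic wire format only. What MuAPI actually places inside each `data:` payload (plain text, JSON, an end-of-stream marker) is not stated in the listing, so the sample lines are invented for illustration, and `iter_sse_data` is a hypothetical helper that ignores the `event`, `id`, and `retry` fields.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is dropped, as in the SSE format.


# Invented sample stream; real payloads may look different.
sample = ["data: Hello\n", "\n", ": ping\n", "data: wor\n", "data: ld\n", "\n"]
print(list(iter_sse_data(sample)))  # ['Hello', 'wor\nld']
```

In practice the `lines` argument would be the decoded line iterator of whichever HTTP client is in use.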

Chat

$0.000 / 1K tokens

gemini-2-5-pro

Gemini 2.5 Pro is Google's advanced multimodal reasoning model, optimized for complex coding, logical tasks, and deep analysis. Supports text and image inputs. Token-based pricing: $1.25/M input tokens, $10.00/M output tokens. Two endpoints: standard async (/gemini-2-5-pro) and live streaming (/gemini-2-5-pro/stream) via SSE.

Music

$0.035 / 1K tokens

gemini-3-1-flash-tts

Gemini 3.1 Flash TTS turns written dialogue into expressive, natural multi-speaker speech with fine-grained control over voice, accent, emotional style, and pace. Ideal for fast, affordable voiceovers, character dialogue, and narration.

Chat

$0.004 / 1K tokens

gemini-video-vision

Gemini Video Vision uses Google Gemini's native video understanding to analyze and describe video content in detail — motion, composition, subjects, on-screen text, and more. Upload a video URL and a prompt, and Gemini returns a detailed text analysis. Token-based pricing.

Music

$0.035 / 1K tokens

gemini-2-5-pro-tts

Gemini 2.5 Pro TTS is Google's premium text-to-speech model for studio-quality, high-fidelity multi-speaker audio with expressive control over voice, accent, emotional style, and pace.

Chat

$0.001 / 1K tokens

gemini-3-flash

Gemini 3 Flash is a fast, multimodal language model for real-time text generation. Supports text and image inputs, function calling, and Google Search grounding. Token-based pricing: $0.30/M input tokens and $1.80/M output tokens. Two endpoints: standard async (/gemini-3-flash) and live streaming (/gemini-3-flash/stream) via SSE.

Other Utility Models

Tools

$0.000 / generation

gemini-omni-audio

Create a named voice profile with custom timbre, style, and emotion. The returned voice ID can be used in Gemini Omni video generation to assign a consistent voice character to your videos.
