Google AI models on MuAPI
Back to Providers
Explore/Google Models
Google

Google AI API models

Google AI models on MuAPI

Explore Google models for chat, code, image and video generation, including Gemini, Nano Banana and Veo-style workflows available through MuAPI.

All models

38 Models

Video Generation Models

Video

$2.500 / second

veo3-image-to-video

VEO3 I2V animates static images into expressive video sequences, adding lifelike movement while preserving the original composition.

Video

$0.600 / second

veo3-fast-text-to-video

VEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping.

Video

$0.600 / second

veo3.1-fast-text-to-video

Veo 3.1 Fast T2V is a high-speed AI video model that transforms text prompts into realistic 8-second videos. It emphasizes rapid generation while maintaining visual quality, accurate scene representation, and smooth motion. Ideal for social media, creative storytelling, or rapid concept visualization, it supports cinematic framing, dynamic lighting, and natural object movements.

Video

$0.600 / second

veo3.1-extend-video

Veo 3.1’s Extend Video mode lets you continue or expand an existing video clip seamlessly. Starting from a short generated video, you can prompt the model to extend the scene—keeping visual style, characters, motion, and audio consistent. This model needs original task_id of the video.

Video

$0.600 / second

veo3.1-fast-image-to-video

Veo 3.1 Fast is an optimized version of Google’s Veo 3.1 AI that transforms static images into dynamic 8-second videos at higher speed. It preserves visual fidelity while enabling rapid generation, making it ideal for social media clips, storyboards, and quick creative previews.

Video

$0.300 / second

veo3.1-lite-text-to-video

Veo 3.1 Lite is a lightweight variant of Google's Veo 3.1 model designed for faster, more accessible video generation.

Video

$3.000 / second

veo-4-image-to-video

Veo 4 Image to Video — animate any still image with Veo 4's motion synthesis engine, supporting fine-grained camera control and realistic physics at up to 1080p.

Video

$0.600 / second

veo3.1-4k-video

Get the ultra-high-definition 4K version of a Veo3.1 video generation task. This model is optimized for producing crisp, detailed videos suitable for professional and cinematic applications. It enhances visual fidelity while maintaining temporal coherence and realistic motion.

Video

$1.500 / second

gemini-omni-image-to-video

Gemini Omni Image to Video — animate one or more reference images with a text prompt. Unified reasoning across modalities preserves subject identity and generates synchronized audio natively.

Video

$2.500 / second

veo3-text-to-video

VEO3 T2V generates cinematic videos from text prompts, capturing dynamic motion, rich scenes, and storytelling visuals in stunning detail.

Video

$0.600 / second

veo3-fast-image-to-video

Quickly transform static images into short, motion-rich video clips with fast rendering and impressive quality — powered by Google's VEO3 on MuAPI.

Video

$2.500 / second

veo3.1-image-to-video

Veo 3.1 is Google's advanced AI video generation model that allows users to create high-quality, 8-second videos from static images. This feature is particularly useful for transforming concept art, storyboards, or static visuals into dynamic video clips with synchronized audio.

Video

$2.500 / second

veo3.1-text-to-video

Veo 3.1 is Google's advanced AI video generation model that transforms text prompts into high-quality videos. This model offers enhanced realism, richer audio, and improved narrative control, making it suitable for creators seeking cinematic-quality content.

Video

$0.600 / second

veo3.1-reference-to-video

Veo 3.1 R2V allows creators to generate dynamic videos using up to three reference images. The model maintains visual consistency of characters, objects, and style throughout the video, producing cinematic-quality 8-second clips. It’s perfect for turning concept art, storyboards, or character designs into short, animated sequences while preserving original aesthetics.

Video

$0.300 / second

veo3.1-lite-image-to-video

Veo 3.1 Lite is a lightweight variant of Google's Veo 3.1 model designed for faster, more accessible video generation from images.

Video

$3.000 / second

veo-4-text-to-video

Veo 4 Text to Video — Google DeepMind's fourth-generation model delivering photorealistic, high-fidelity 1080p videos with exceptional prompt adherence and cinematic camera control.

Video

$1.500 / second

gemini-omni-text-to-video

Gemini Omni — natively multimodal any-to-any model. Generates high-fidelity video with synchronized audio directly from text prompts, with unified reasoning across modalities for more coherent scenes and fewer pipeline artifacts.

Video

$2.400 / second

gemini-omni-video-edit

Gemini Omni Video Edit — natively multimodal video-to-video editing. Restyle, relight, swap subjects, or rewrite scenes from a source clip with a single prompt. Unified reasoning across modalities preserves motion and audio continuity while applying the edit.

Image Generation Models

nano-banana
Image

$0.030 / 1K tokens

nano-banana

Nano Banana is an advanced AI model excelling in natural language-driven image generation and editing. It produces hyper-realistic, physics-aware visuals with seamless style transformations.

google-imagen4-fast
Image

$0.020 / 1K tokens

google-imagen4-fast

Imagen 4 Fast is optimized for speed and accessibility, allowing you to generate high-quality images in seconds. While slightly less detailed than the Ultra version, it excels at rapid ideation, drafts, storyboarding, and casual creativity.

nano-banana-effects
Image

$0.030 / generation

nano-banana-effects

Nano Banana Effects is a creative visual effects model designed to transform ordinary images into fun, stylized, and eye-catching results. It applies artistic filters, 3D styles, cartoon transformations, and trending viral looks with a single click.

nano-banana-edit
Image

$0.030 / generation

nano-banana-edit

Nano Banana is a mysterious, high-performance image model. It excels at precise, language-driven edits and consistent character preservation, allowing users to modify images with natural text commands.

nano-banana-pro-edit
Image

$0.120 / generation

nano-banana-pro-edit

Nano Banana 2 Edit is the next-generation image editing model developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced image-edit capabilitie with improved resolution.

Image

$0.000 / generation

gemini-omni-character

Generate a reusable character from a single reference image and a text description. Optionally attach a voice profile created with Gemini Omni Audio to give the character a consistent voice in future video generations.

nano-banana-2
Image

$0.060 / 1K tokens

nano-banana-2

Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.

photo-pack
Image

$0.300 / generation

photo-pack

Generate a pack of high-quality, professional portraits in various styles (LinkedIn, CEO, Tinder, etc.) while preserving your facial features.

google-imagen4
Image

$0.030 / 1K tokens

google-imagen4

Google Imagen 4 is the latest text-to-image AI model from DeepMind, designed to produce stunningly photorealistic images with crisp detail, accurate text rendering, and creative flexibility. It supports high-resolution output (up to 2K), generates visuals in seconds, and embeds SynthID watermarks for authenticity.

google-imagen4-ultra
Image

$0.060 / 1K tokens

google-imagen4-ultra

Imagen 4 Ultra is Google’s flagship model, designed for photorealism, rich textures, and production-level imagery. It produces crisp, high-resolution visuals with advanced detail, lighting precision, and natural compositions.

nano-banana-pro
Image

$0.120 / 1K tokens

nano-banana-pro

Nano Banana 2 is the next-generation image generation developed by Google DeepMind, following the original Nano Banana (also known as Gemini 2.5 Flash Image). It offers advanced text-to-image capabilitie with improved resolution.

nano-banana-2-edit
Image

$0.060 / generation

nano-banana-2-edit

Nano Banana 2 (Gemini 3.1 Flash Image) is Google's most advanced image generation model, combining speed with high-fidelity 4K output and revolutionary character consistency.

Text & Chat Models

gemini-3-5-flash
Chat

$0.000 / 1K tokens

gemini-3-5-flash

Gemini 3.5 Flash is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash) and live streaming (/gemini-3-5-flash/stream) via SSE.

gemini-3-5-flash-openai
Chat

$0.000 / 1K tokens

gemini-3-5-flash-openai

Gemini 3.5 Flash (OpenAI-compatible) is a high-speed, multimodal language model built for real-time text generation, supporting text and image inputs natively. Token-based pricing: $0.60/M input tokens and $3.60/M output tokens. Two endpoints: standard async (/gemini-3-5-flash-openai) and live streaming (/gemini-3-5-flash-openai/stream) via SSE.

gemini-3-1-pro
Chat

$0.001 / 1K tokens

gemini-3-1-pro

Gemini 3.1 Pro is Google's next-generation multimodal model, optimized for complex reasoning, planning, coding, and multi-turn conversation. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-1-pro) and live streaming (/gemini-3-1-pro/stream) via SSE.

gemini-3-pro
Chat

$0.001 / 1K tokens

gemini-3-pro

Gemini 3 Pro is Google's powerful multimodal reasoning model, designed for complex problem solving, coding, and logical tasks. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-pro) and live streaming (/gemini-3-pro/stream) via SSE.

gemini-2-5-pro
Chat

$0.000 / 1K tokens

gemini-2-5-pro

Gemini 2.5 Pro is Google's advanced multimodal reasoning model, optimized for complex coding, logical tasks, and deep analysis. Supports text and image inputs. Token-based pricing: $1.25/M input tokens, $10.00/M output tokens. Two endpoints: standard async (/gemini-2-5-pro) and live streaming (/gemini-2-5-pro/stream) via SSE.

gemini-2-5-flash
Chat

$0.000 / 1K tokens

gemini-2-5-flash

Gemini 2.5 Flash is Google's high-speed multimodal language model, optimized for rapid text generation, real-time image understanding, and high-frequency tasks. Supports text and image inputs. Token-based pricing: $0.30/M input tokens, $2.50/M output tokens. Two endpoints: standard async (/gemini-2-5-flash) and live streaming (/gemini-2-5-flash/stream) via SSE.

gemini-3-flash
Chat

$0.001 / 1K tokens

gemini-3-flash

Gemini 3 Flash is a fast, multimodal language model for real-time text generation. Supports text and image inputs, function calling, and Google Search grounding. Token-based pricing: $0.30/M input tokens and $1.80/M output tokens. Two endpoints: standard async (/gemini-3-flash) and live streaming (/gemini-3-flash/stream) via SSE.