
Gemini Omni API

2 variants

Google's first natively multimodal any-to-any model, unveiled at I/O 2026. The Gemini Omni API on Muapi delivers text-to-video and video-edit generation, with synchronized audio produced in the same forward pass as the visuals: one model and one API for both workflows.

⚡ Gemini Omni Flash
Natively multimodal · any-to-any · synchronized audio · 2 models

T2V · New · ⚡ Flash

Gemini Omni Text to Video

Google's natively multimodal any-to-any model. Generate cinematic video with synchronized dialogue, ambient audio, and music from a single prompt — all in one forward pass.

1080p
$2.00 / generation

V2V · New · ⚡ Flash

Gemini Omni Video Edit

Source-driven video editing with the Gemini Omni any-to-any model. Restyle, relight, swap subjects, or rewrite dialogue while preserving original motion and timing.

1080p
$2.50 / generation

Why use the Gemini Omni API on Muapi?

Any-to-any in one pass

Text, image, audio, and video reasoned together — no chained pipelines, no cross-model drift.

Native synchronized audio

Dialogue, ambient sound, and music generated in the same forward pass as the visuals.

Unified API surface

Single REST endpoint, async webhook callback, OpenAPI spec, and structured error envelopes.

Up to 1080p video

16:9, 9:16, and 1:1 aspect ratios — cinematic, vertical, and square from the same model.

Flat per-generation pricing

$2.00 for text-to-video, $2.50 for video-edit. Audio included. No surprise per-second billing.

Drop-in for Veo or Sora workflows

Same submit-then-poll pattern as every other Muapi model — swap the endpoint and ship.
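
The submit-then-poll pattern mentioned above can be sketched in a transport-agnostic way. This is a minimal illustration, not Muapi's client: the `submit` and `fetch_status` callables, the `status` field, and the `"completed"` / `"failed"` values are all assumptions made for the example, so check the actual API reference for the real endpoint paths and response shape.

```python
import time
from typing import Any, Callable, Dict


def submit_and_poll(
    submit: Callable[[], str],
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a job, then poll until it reaches a terminal state or times out.

    `submit` returns a job id and `fetch_status` returns the latest job record.
    Both are hypothetical wrappers around whatever HTTP calls the API requires.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout_s
    while True:
        record = fetch_status(job_id)
        # Assumed terminal status names; the real API may use different ones.
        if record.get("status") in ("completed", "failed"):
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Passing the two calls in as functions keeps the loop independent of any HTTP library, so switching from another model's endpoint to this one only means changing what `submit` and `fetch_status` do.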

Gemini Omni API — Frequently Asked Questions

What is the Gemini Omni API?

Gemini Omni is Google's first natively multimodal any-to-any model, unveiled at I/O 2026. The Gemini Omni API on Muapi exposes its text-to-video and video-edit capabilities through a single REST endpoint with synchronized audio generation in the same forward pass.

How is Gemini Omni different from Veo or Sora?

Gemini Omni reasons across text, image, audio, and video in one forward pass instead of relaying through specialized models. The result is native synchronized audio, fewer cross-modality artifacts, and cleaner edits than a chained pipeline can produce.

How much does the Gemini Omni API cost on Muapi?

Pricing is flat per generation: $2.00 for Gemini Omni Text to Video and $2.50 for Gemini Omni Video Edit. Synchronized audio is included — no separate audio-track billing.
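
Because pricing is flat per generation, a batch budget is simple arithmetic. The sketch below uses the two prices quoted above; the dictionary keys and the `estimate_cost` helper are hypothetical names chosen for the example, not API identifiers.

```python
from decimal import Decimal
from typing import Dict

# Prices as listed on this page (USD per generation, audio included).
PRICE_PER_GENERATION: Dict[str, Decimal] = {
    "text-to-video": Decimal("2.00"),
    "video-edit": Decimal("2.50"),
}


def estimate_cost(counts: Dict[str, int]) -> Decimal:
    """Return the total cost in USD for a batch of generations per model."""
    total = Decimal("0.00")
    for kind, n in counts.items():
        if kind not in PRICE_PER_GENERATION:
            raise ValueError(f"unknown model kind: {kind!r}")
        if n < 0:
            raise ValueError("generation count must be non-negative")
        total += PRICE_PER_GENERATION[kind] * n
    return total


# 10 text-to-video clips plus 4 edits: 10 * 2.00 + 4 * 2.50
print(estimate_cost({"text-to-video": 10, "video-edit": 4}))  # 30.00
```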

What inputs and outputs does Gemini Omni support?

Gemini Omni accepts any combination of text, image, audio, and video as input. The first models on Muapi focus on text-to-video and video-to-video editing, with synchronized audio generated natively as part of the output.
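
To make the two current workflows concrete, here is a rough sketch of how a client might pick between them when assembling a request body. Every field name here (`mode`, `prompt`, `video_url`, `aspect_ratio`) is an assumption for illustration only; the page does not document the request schema.

```python
from typing import Any, Dict, Optional


def build_request(
    prompt: str,
    source_video_url: Optional[str] = None,
    aspect_ratio: str = "16:9",
) -> Dict[str, Any]:
    """Build a hypothetical request body for text-to-video or video-edit."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    body: Dict[str, Any] = {"prompt": prompt}
    if source_video_url is None:
        # No source clip: treat as text-to-video, where aspect ratio applies.
        body["mode"] = "text-to-video"
        body["aspect_ratio"] = aspect_ratio
    else:
        # A source clip is present: treat as an edit of that clip.
        body["mode"] = "video-edit"
        body["video_url"] = source_video_url
    return body
```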

Can I generate vertical or square video with Gemini Omni?

Yes — Gemini Omni Text to Video supports 16:9, 9:16, and 1:1 aspect ratios at up to 1080p, so the same API powers cinematic widescreen, TikTok-style vertical, and feed-friendly square clips.
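
As a rough guide to what those ratios mean in pixels, the sketch below derives frame sizes on the assumption that "1080p" fixes the shorter side at 1080 pixels. The page does not state exact output dimensions, so treat these numbers as illustrative; `frame_size` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Tuple

# Aspect ratios listed on this page for Gemini Omni Text to Video.
SUPPORTED_RATIOS = ("16:9", "9:16", "1:1")


def frame_size(aspect_ratio: str, short_side: int = 1080) -> Tuple[int, int]:
    """Return (width, height), assuming the shorter side equals `short_side`."""
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    w, h = (int(part) for part in aspect_ratio.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:
        # Landscape or square: height is the short side.
        return (int(short_side * ratio), short_side)
    # Portrait: width is the short side.
    return (short_side, int(short_side / ratio))


for r in SUPPORTED_RATIOS:
    print(r, frame_size(r))
# 16:9 (1920, 1080)
# 9:16 (1080, 1920)
# 1:1 (1080, 1080)
```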

Need video generation today?

Veo 3 and SD 2.0 are live now, with text-to-video and image-to-video available via API on Muapi.