Natively multimodal any-to-any model. The Gemini Omni API on Muapi delivers text-to-video, image-to-video, and video-edit with synchronized audio generated in the same forward pass — plus custom voice profiles and character profiles for consistent character-driven generation.
Natively multimodal any-to-any model. Generate cinematic video with synchronized dialogue, ambient audio, and music from a single text prompt — all in one forward pass.
Animate up to 5 reference images with a text prompt. Gemini Omni preserves subject identity across frames and generates synchronized audio natively in the same forward pass.
Source-driven video editing with the Gemini Omni any-to-any model. Restyle, relight, swap subjects, or rewrite dialogue while preserving original motion and timing.
Create a reusable voice profile from any of 30 preset voices. Describe timbre and style, then pass the returned audioId in audio_ids when generating Gemini Omni video.
Create a reusable character profile from one reference image. Describe the character, optionally attach voice profiles, and pass the returned characterId in character_ids for consistent character-driven video generation.
Any-to-any in one pass
Text, image, audio, and video reasoned together — no chained pipelines, no cross-model drift.
Native synchronized audio
Dialogue, ambient sound, and music generated in the same forward pass as the visuals.
T2V, I2V, and V2V
Generate from text, animate up to 5 reference images, or edit existing clips — three modes, one API surface.
Custom voice profiles
Create reusable voice profiles from 30 preset voices. Attach up to 3 per generation via audio_ids.
Character-consistent generation
Create character profiles from a reference image. Reuse the characterId across generations for a consistent visual identity.
Drop-in for Veo or Sora workflows
Same submit-then-poll pattern as every other Muapi model — swap the endpoint and ship.
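As a rough illustration of the submit-then-poll pattern, the sketch below keeps the HTTP layer out of the picture: `submit` and `fetch_status` are hypothetical callables you would wrap around your own requests, and the `"completed"` / `"failed"` status strings and the `status` / `error` keys are assumptions, not documented Muapi values.

```python
import time
from typing import Any, Callable, Dict


def submit_then_poll(
    submit: Callable[[], str],
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit one job, then poll until it finishes, fails, or times out.

    `submit` returns a request id; `fetch_status` returns the latest job
    state for that id. Both are placeholders for real API calls.
    """
    request_id = submit()
    waited = 0.0
    while True:
        result = fetch_status(request_id)
        status = result.get("status")  # assumed key name
        if status == "completed":      # assumed terminal value
            return result
        if status == "failed":         # assumed terminal value
            raise RuntimeError(
                f"generation {request_id} failed: {result.get('error')}"
            )
        if waited >= timeout_s:
            raise TimeoutError(
                f"generation {request_id} still {status!r} after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

Because the callables are injected, switching from another model to Gemini Omni would only change what `submit` sends; the polling loop stays the same.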
Gemini Omni is a natively multimodal any-to-any model. The Gemini Omni API on Muapi exposes text-to-video, image-to-video, and video-edit capabilities with synchronized audio generated in the same forward pass.
Gemini Omni reasons across text, image, audio, and video in one forward pass instead of relaying through specialized models. The result is native synchronized audio, fewer cross-modality artifacts, and cleaner edits than a chained pipeline can produce.
Text-to-video and image-to-video are priced by duration and resolution: from $1.50 for an 8-second 720p clip to $2.70 for an 8-second 4K clip. Video-edit is a flat $2.40 (720p/1080p) or $3.60 (4K) per generation. Synchronized audio is included at no extra charge.
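For budgeting, the quoted prices can be kept in a small lookup. This is only a sketch: the resolution labels (`"720p"`, `"1080p"`, `"4k"`) and function names are illustrative, and the text-to-video / image-to-video table deliberately contains only the two price points stated above rather than guessing the rest of the grid.

```python
from decimal import Decimal

# Flat per-generation video-edit prices quoted above (USD).
VIDEO_EDIT_PRICE = {
    "720p": Decimal("2.40"),
    "1080p": Decimal("2.40"),
    "4k": Decimal("3.60"),
}

# Only the two text-to-video / image-to-video prices quoted above.
# Other duration/resolution combinations are intentionally left out.
T2V_I2V_PRICE = {
    (8, "720p"): Decimal("1.50"),
    (8, "4k"): Decimal("2.70"),
}


def video_edit_cost(resolution: str, generations: int = 1) -> Decimal:
    """Total cost of `generations` video-edit runs at one resolution."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    try:
        unit = VIDEO_EDIT_PRICE[resolution.lower()]
    except KeyError:
        raise ValueError(f"no quoted video-edit price for {resolution!r}") from None
    return unit * generations


def t2v_i2v_cost(duration_s: int, resolution: str) -> Decimal:
    """Price of one text-to-video or image-to-video clip, if quoted above."""
    key = (duration_s, resolution.lower())
    if key not in T2V_I2V_PRICE:
        raise ValueError(f"no quoted price for {duration_s}s at {resolution}")
    return T2V_I2V_PRICE[key]
```

`Decimal` is used instead of `float` so that summed dollar amounts stay exact.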
Text-to-video takes a text prompt. Image-to-video takes up to 5 reference images plus a prompt. Video-edit takes a source clip and a prompt describing the edit to apply. All three produce video with natively synchronized audio. You can also attach up to 3 voice profile IDs (audio_ids) and up to 3 character IDs (character_ids) to any generation.
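The limits above lend themselves to a client-side check before a request is sent. In this sketch only `audio_ids` and `character_ids` are field names taken from the text; `prompt`, `image_urls`, and `video_url` are hypothetical names chosen for illustration, so check the actual request schema before relying on them.

```python
from typing import Any, Dict, Optional, Sequence

MAX_REFERENCE_IMAGES = 5  # image-to-video limit stated above
MAX_AUDIO_IDS = 3         # voice profiles per generation
MAX_CHARACTER_IDS = 3     # character profiles per generation


def build_generation_payload(
    prompt: str,
    image_urls: Sequence[str] = (),
    video_url: Optional[str] = None,
    audio_ids: Sequence[str] = (),
    character_ids: Sequence[str] = (),
) -> Dict[str, Any]:
    """Validate the documented limits and return a request body as a dict."""
    if not prompt.strip():
        raise ValueError("a text prompt is required in every mode")
    if len(image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if len(audio_ids) > MAX_AUDIO_IDS:
        raise ValueError(f"at most {MAX_AUDIO_IDS} audio_ids")
    if len(character_ids) > MAX_CHARACTER_IDS:
        raise ValueError(f"at most {MAX_CHARACTER_IDS} character_ids")

    payload: Dict[str, Any] = {"prompt": prompt}
    if image_urls:                       # image-to-video (hypothetical field)
        payload["image_urls"] = list(image_urls)
    if video_url is not None:            # video-edit (hypothetical field)
        payload["video_url"] = video_url
    if audio_ids:
        payload["audio_ids"] = list(audio_ids)
    if character_ids:
        payload["character_ids"] = list(character_ids)
    return payload
```

Calling `build_generation_payload("A fox runs through snow")` yields a text-to-video body with just the prompt; adding `image_urls` or `video_url` switches the same builder to the other two modes.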
Gemini Omni Audio lets you create a reusable voice profile by picking one of 30 preset voices, giving it a name, and optionally describing the timbre, pacing, and style. The API returns an audioId that you pass in the audio_ids field when generating Gemini Omni video — up to 3 voice profiles per generation.
Gemini Omni Character creates a reusable character profile from a single reference image and a text description. The returned characterId can be passed in the character_ids field (up to 3 per generation) to anchor the character's visual identity across multiple video generations.
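One detail that is easy to trip over is that the profile endpoints return singular camelCase IDs (`audioId`, `characterId`) while generation requests take plural snake_case lists (`audio_ids`, `character_ids`). A minimal sketch of that hand-off follows; it assumes each profile response is a dict with the ID as a top-level key, which is a guess about the response shape.

```python
from typing import Any, Dict, List, Mapping, Sequence

MAX_PROFILES_PER_GENERATION = 3  # limit stated for both audio_ids and character_ids


def ids_for_generation(
    voice_profiles: Sequence[Mapping[str, Any]],
    character_profiles: Sequence[Mapping[str, Any]],
) -> Dict[str, List[str]]:
    """Map saved profile responses onto the generation request fields."""
    # Assumed response shape: {"audioId": "..."} and {"characterId": "..."}.
    audio_ids = [profile["audioId"] for profile in voice_profiles]
    character_ids = [profile["characterId"] for profile in character_profiles]
    if len(audio_ids) > MAX_PROFILES_PER_GENERATION:
        raise ValueError("at most 3 voice profiles per generation")
    if len(character_ids) > MAX_PROFILES_PER_GENERATION:
        raise ValueError("at most 3 character profiles per generation")
    return {"audio_ids": audio_ids, "character_ids": character_ids}
```

The returned dict can be merged into whatever request body you send for text-to-video, image-to-video, or video-edit.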
All three Gemini Omni variants support 16:9 and 9:16 aspect ratios, so the same API powers cinematic widescreen and TikTok-style vertical clips.
Gemini Omni is available on the Pro and Business plans. Upgrade at muapi.ai/topup to get access.