Question 1

How many reference images can I provide?

Accepted Answer

You can provide between 1 and 5 reference images. Each image counts as 1 unit toward the 7-unit capacity; a video counts as 2 units and a character ID counts as 1.
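
To make the unit arithmetic concrete, here is a minimal sketch of a budget check. The unit costs, the 7-unit capacity, and the 1 to 5 image range come from the answer above; the names `units_used` and `fits` are hypothetical and not part of any official API. It also assumes the image range and the capacity limit apply together, which the answer implies but does not state.

```python
# Hypothetical helper: names are illustrative, not an official API.
# Unit costs and limits are taken from the answer above.
UNIT_COST = {"image": 1, "video": 2, "character_id": 1}
CAPACITY = 7
MIN_IMAGES, MAX_IMAGES = 1, 5


def units_used(images: int, videos: int = 0, character_ids: int = 0) -> int:
    """Return the total capacity units consumed by a set of references."""
    return (
        images * UNIT_COST["image"]
        + videos * UNIT_COST["video"]
        + character_ids * UNIT_COST["character_id"]
    )


def fits(images: int, videos: int = 0, character_ids: int = 0) -> bool:
    """Check the image-count range and the overall unit capacity."""
    if not MIN_IMAGES <= images <= MAX_IMAGES:
        return False
    return units_used(images, videos, character_ids) <= CAPACITY
```

Under these assumptions, `fits(5, videos=1)` is accepted because 5 + 2 = 7 units, while `fits(5, videos=1, character_ids=1)` is rejected because it would need 8.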

Question 2

Does it preserve the subject's appearance across frames?

Accepted Answer

Yes. Gemini Omni Image to Video is designed to maintain the subject's identity and appearance from the reference images throughout the generated clip.

Question 3

Is audio generated automatically?

Accepted Answer

Yes. Synchronized dialogue, ambient sound, and music are generated natively alongside the video in the same forward pass.