Generate a reusable character from a single reference image and a text description. Optionally attach a voice profile created with Gemini Omni Audio to give the character a consistent voice in future video generations.
Happy Horse 1.0 Image to Video — bring still images to life with fluid, expressive animation and fine-grained motion control.
Generate a clean 3D mesh from a single reference image. Output is a textured .glb plus FBX/OBJ/USDZ alternatives. Optional PBR materials and rigging-ready output.
VEO3 Fast T2V creates short videos from text instantly, balancing speed and quality for quick content generation and prototyping.
Claude Opus 4.8 is Anthropic's most capable model for complex coding, long-context reasoning, and agentic workflows. Supports text and image inputs. Token-based pricing: $3.00/M input tokens, $15.00/M output tokens. Two endpoints: standard async (/claude-opus-4-8) and live streaming (/claude-opus-4-8/stream) via SSE.
Takes an input image and transforms it based on a new prompt. Keeps structure or pose while changing style, appearance, or details.
OpenAI GPT Codex delivers advanced coding capabilities with scalable reasoning depth. Supports multiple model variants (gpt-5-codex through gpt-5.4-codex) and multimodal inputs. Token-based pricing: $1.25/M input tokens, $9.00/M output tokens. Two endpoints: standard async (/gpt-codex) and live streaming (/gpt-codex/stream) via SSE.
Gemini 3.1 Pro is Google's next-generation multimodal model, optimized for complex reasoning, planning, coding, and multi-turn conversation. Supports text and image inputs. Token-based pricing: $4.00/M input tokens, $24.00/M output tokens. Two endpoints: standard async (/gemini-3-1-pro) and live streaming (/gemini-3-1-pro/stream) via SSE.
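The token-based prices quoted for the three language models above can be turned into a rough per-request estimate. The following is a minimal sketch, not an official billing formula: it assumes cost is simply input tokens times the input rate plus output tokens times the output rate, with no minimum charge, rounding, or discounts. The dictionary keys and the `estimate_cost` helper are hypothetical names chosen for illustration (the keys mirror the endpoint paths in the entries).

```python
# Prices in USD per million tokens (input, output), copied from the entries above.
# The keys are illustrative labels based on the endpoint paths, not confirmed model IDs.
PRICES_PER_MILLION = {
    "claude-opus-4-8": (3.00, 15.00),
    "gpt-codex": (1.25, 9.00),
    "gemini-3-1-pro": (4.00, 24.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in USD, assuming simple linear per-token pricing."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 200k input tokens and 50k output tokens on each model.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 200_000, 50_000):.2f}")
```

Under these assumptions, a request with 200,000 input tokens and 50,000 output tokens on the Claude Opus 4.8 entry works out to 0.60 + 0.75 = 1.35 USD.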
GPT-Image-1.5 is a high-quality text-to-image generation model designed for rich visual reasoning, detailed compositions, and strong prompt understanding. It excels at complex scenes, symbolic imagery, cinematic lighting, surreal concepts, product visuals, and imaginative world-building while maintaining coherence and fine detail.
Happy Horse 1.0 Text to Video (720p) — generate expressive, stylized video clips from text prompts at 720p output resolution.
Enables text-to-image generation using custom LoRA models. Generate consistent characters, styles, or branded visuals with high quality and fast results.
Sora 2 Pro T2V is the high-fidelity version of OpenAI’s video generation model. It converts your text prompts into cinematic, richly detailed video clips with synchronized audio, realistic motion, strong physics, and creative control over style, mood, and pacing. Perfect for creators, storytellers, advertisers, and anyone who wants top-quality video content from text.
Wan 2.2’s I2V mode brings static visuals to life with vivid, expressive animations. It interprets motion, emotion, and background dynamics from a single image to generate smooth and cinematic short videos.
Reconstruct a 3D mesh from 1-4 reference images. Multi-view inputs produce more accurate geometry for complex or asymmetric objects.
Create professional-grade product photos using AI. Upload your item image and describe it with a prompt, and get studio-style, lifestyle, or creative backgrounds in seconds.
