muapi/minimax-voice-clone

Text to Audio

Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.

Input

Configure the model parameters below.

Audio URL (required): URL of the reference audio sample to clone.

Custom Voice ID (required): A user-defined ID for the cloned voice. It must be at least 8 characters long, start with a letter, and contain both letters and numbers. Reusing an existing voice ID returns an error.

Model: The TTS model used to generate the preview. This setting affects only the preview produced after cloning. Once the clone is generated, any Minimax Turbo or HD voice model can be used for inference. (Default: speech-02-hd)

Need Noise Reduction: Enables noise reduction. (Default: false, no noise reduction)

Need Volume Normalization: Specifies whether to enable volume normalization. (Default: false)

Accuracy: Text validation accuracy threshold, in the range [0, 1]. (Default: 0.7)

Prompt: Text for the audio preview. Limited to 2000 characters.
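The Custom Voice ID format rules can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the function name `is_valid_voice_id` is hypothetical, and it assumes that "letters" means ASCII letters. The page does not say whether other characters such as hyphens or underscores are allowed, so the sketch does not reject them.

```python
import string

MIN_LENGTH = 8  # "Minimum 8 characters" per the parameter description


def is_valid_voice_id(voice_id: str) -> bool:
    """Return True if voice_id satisfies the documented format rules."""
    if len(voice_id) < MIN_LENGTH:
        return False
    # Must start with a letter (assumed here to mean an ASCII letter).
    if voice_id[0] not in string.ascii_letters:
        return False
    # Must contain both letters and numbers; the first character already
    # guarantees at least one letter, so only a digit remains to check.
    return any(ch in string.digits for ch in voice_id)


print(is_valid_voice_id("narrator01"))  # True
print(is_valid_voice_id("1narrator"))   # False: starts with a digit
print(is_valid_voice_id("narrators"))   # False: contains no digit
```

A local check like this cannot detect duplicate voice IDs. That error can only come from the service itself.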

Result

Flat rate per run

Cost
$0.65

🚀Related Models

minimax-hailuo-2.3-pro-i2v

Hailuo 2.3 Pro I2V breathes life into still images with stunning motion synthesis and cinematic camera control. Using deep motion understanding, it predicts realistic subject movement, depth, and environmental motion from a single input frame — delivering smooth, film-grade clips.

Image to Video

minimax-hailuo-2.3-pro-t2v

Hailuo 2.3 Pro T2V turns your imagination into motion-picture realism. It interprets natural language prompts and generates visually stunning cinematic sequences that capture depth, atmosphere, and authentic motion.

Text to Video

minimax-hailuo-2.3-standard-i2v

Hailuo 2.3 Standard I2V converts still images into visually immersive motion clips with stable dynamics and realistic movement. It provides a balanced mix of quality, speed, and coherence, and generates video at 768p.

Image to Video

minimax-hailuo-2.3-fast

Minimax Hailuo 2.3 Fast is the lightweight, high-speed version of the Hailuo 2.3 family, designed for creators who need instant video generation with cinematic motion and scene consistency. It generates video at 768p.

Image to Video

minimax-hailuo-2.3-standard-t2v

Hailuo 2.3 Standard T2V transforms pure imagination into moving cinematic visuals. Simply describe a scene, and this model generates a coherent, high-quality video that captures the prompt’s tone, environment, and emotion. It generates video at 768p.

Text to Video

📝

Overview

About this model

Minimax Voice Clone is a text-to-speech model that creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It captures the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech in that voice from any text input. The model is built on deep neural network architectures.

Minimax Voice Clone suits applications ranging from personalized voice assistants and automated narration to audiobooks. Its ability to reproduce nuanced vocal traits, combined with flat per-run pricing, makes it a cost-effective option for voice cloning. Typical use cases include:

1. Personalized virtual assistants and customer service bots
2. Audiobook and podcast narration with a custom voice
3. Voiceovers for video content and advertising
4. Custom audio messages and alerts for apps
5. Accessible reading solutions for visually impaired users

💰

Pricing & Value

Cost analysis

Provider	Cost (per generation)	Notes
muapiapp	$0.65	Flat rate per run. About 24% less than the estimated $0.85 competitor rate listed in this table.
Fal.ai	$0.85	Estimated at around $0.85 per generation, which is $0.20 more than muapiapp.
Replicate	$0.85	Estimated at around $0.85 per generation, which is $0.20 more than muapiapp.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
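To make the comparison concrete, the saving implied by the listed prices can be worked out directly. This is a small sketch using only the figures from the table. The helper name `savings_percent` is hypothetical, and the competitor prices are estimates, as the footnote states.

```python
def savings_percent(our_price: float, competitor_price: float) -> float:
    """Percentage saved relative to the competitor's price."""
    return (competitor_price - our_price) / competitor_price * 100


MUAPI_PRICE = 0.65       # per generation, from the table
COMPETITOR_PRICE = 0.85  # estimated, per the footnote

saving = savings_percent(MUAPI_PRICE, COMPETITOR_PRICE)
print(f"Saving per generation: ${COMPETITOR_PRICE - MUAPI_PRICE:.2f} ({saving:.1f}%)")
# Saving per generation: $0.20 (23.5%)
```

With these figures the saving sits at the low end of the quoted 20-50% range.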

⚙️

Technical Details

Configuration schema

Parameter	Type	Description	Default
Audio URL	string	URL of the reference audio sample to clone.	`https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/minimax-voice-clone-in.wav`
Custom Voice ID	string	Custom user-defined ID. Must be at least 8 characters long, start with a letter, and contain both letters and numbers. Duplicate voice IDs return an error.	(none)
Model	Enum (6 options)	Specify the TTS model to be used for the preview. This is only a preview after cloning. Once the clone is generated, any Minimax Turbo or HD voice model can be used for inference.	`speech-02-hd`
Need Noise Reduction	boolean	Enable noise reduction. Default is false (no noise reduction).	`false`
Need Volume Normalization	boolean	Specify whether to enable volume normalization.	`false`
Accuracy	float	Text validation accuracy threshold, with a value range of [0, 1].	`0.7`
Prompt	string	Text for audio preview. Limited to 2000 characters.	`Hello! Welcome to Muapiapp! This is a preview of your cloned voice. I hope you enjoy it!`
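The schema can be mirrored in a small typed structure so that defaults and range limits are applied before a request is built. The sketch below is illustrative only: the class name `VoiceCloneRequest` is hypothetical, the snake_case field names follow those used in the implementation guide, and the six allowed `model` values are not listed on this page, so the sketch does not validate that field.

```python
from dataclasses import asdict, dataclass

MAX_PROMPT_CHARS = 2000  # "Limited to 2000 characters"


@dataclass
class VoiceCloneRequest:
    audio_url: str
    custom_voice_id: str
    model: str = "speech-02-hd"
    need_noise_reduction: bool = False
    need_volume_normalization: bool = False
    accuracy: float = 0.7
    prompt: str = (
        "Hello! Welcome to Muapiapp! This is a preview of your cloned voice. "
        "I hope you enjoy it!"
    )

    def __post_init__(self) -> None:
        # Enforce the documented limits on accuracy and prompt length.
        if not 0 <= self.accuracy <= 1:
            raise ValueError("accuracy must be within [0, 1]")
        if len(self.prompt) > MAX_PROMPT_CHARS:
            raise ValueError("prompt must be at most 2000 characters")


request = VoiceCloneRequest(
    audio_url="https://example.com/sample.wav",  # placeholder URL
    custom_voice_id="narrator01",
)
print(asdict(request)["model"])  # speech-02-hd
```

Calling `asdict(request)` yields a plain dictionary that can be serialized to JSON for submission.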

📖

Implementation Guide

Developer documentation

How to Use Minimax Voice Clone

Prepare Your Input Audio
- Ensure you have a clear audio sample of the speaker. The sample should represent the voice characteristics you want to clone.
- Upload the audio file and pass its URL in the audio_url field.
Set Up Your Request
- Use the input schema to structure your request. Key fields include custom_voice_id, model, need_noise_reduction, need_volume_normalization, accuracy, and prompt.
- Customize parameters like noise reduction and volume normalization based on your audio quality.
Generate the Voice Clone
- Submit your request to the minimax-voice-clone endpoint.
- Wait for processing to finish. During this step the system extracts the voice characteristics and synthesizes the preview in the cloned voice.
Review and Utilize the Output
- The system returns a generated audio file accessible via the audio field.
- Play back the output and integrate it as needed in your projects.
Iterate and Optimize
- Experiment with different prompts or settings to refine the output.
- Adjust the accuracy threshold if a higher degree of fidelity is required for specific applications.
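The steps above can be sketched as a single request using only the Python standard library. Several details here are assumptions rather than documented facts: the endpoint URL and the `x-api-key` header are placeholders that must be replaced with values from the provider's API reference, the function names are hypothetical, and the sketch assumes the response is a JSON object returned synchronously with the result in its `audio` field. If the service instead returns a job ID to poll, `clone_voice` would need to change accordingly.

```python
import json
import urllib.request

# Placeholder values: the real endpoint URL and auth header are not given
# on this page and must come from the provider's API reference.
ENDPOINT = "https://api.example.com/minimax-voice-clone"
API_KEY_HEADER = "x-api-key"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def clone_voice(payload: dict, api_key: str) -> str:
    """Send the request and return the value of the "audio" field."""
    with urllib.request.urlopen(build_request(payload, api_key), timeout=120) as resp:
        result = json.load(resp)
    return result["audio"]


payload = {
    "audio_url": "https://example.com/sample.wav",  # placeholder URL
    "custom_voice_id": "narrator01",
    "model": "speech-02-hd",
    "need_noise_reduction": False,
    "need_volume_normalization": False,
    "accuracy": 0.7,
    "prompt": "This is a preview of the cloned voice.",
}
request = build_request(payload, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Separating `build_request` from `clone_voice` lets the request body be inspected or logged before anything is sent, which is useful because each run is billed at a flat rate.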

❓

Common Questions

Frequently asked

How does Minimax Voice Clone work?

The model analyzes a short reference audio to capture essential voice features such as tone, accent, emotion, and rhythm. It then uses state-of-the-art TTS technology to generate speech that mirrors the reference voice, ensuring high fidelity and naturalness in the synthesized output.

What input formats are accepted?

The primary input is an audio URL provided in the `audio_url` field. Additional parameters such as custom voice IDs and text prompts must adhere to the defined input schema.
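Because the audio is passed by URL rather than as raw bytes, a quick structural check on the `audio_url` value can catch local file paths before a request is submitted. This is a minimal sketch with a hypothetical helper name, `is_http_url`. It assumes the service expects an HTTP or HTTPS URL, and it does not verify that the URL is reachable or that it points to an audio file.

```python
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """Return True if value is an absolute http(s) URL with a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


sample = "https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/minimax-voice-clone-in.wav"
print(is_http_url(sample))             # True
print(is_http_url("/tmp/sample.wav"))  # False: a local path, not a URL
```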

Is any special software required to use this model?

No special software is required. The service is accessed via an API endpoint where you can submit your JSON-formatted request, making integration into existing workflows straightforward.

What is the cost per generation?

Minimax Voice Clone is offered at a competitive rate of $0.65 per generation.
