muapi/minimax-speech-2.6-turbo

Text to Audio

Speech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model designed for quick audio generation while maintaining good natural voice quality. It produces clear speech with smooth pacing and minimal delay.

Input

Configure the model parameters below.

Prompt (required): Text to convert to speech. Every character counts as 1 token. Maximum 10,000 characters. Use <#x#> between words to insert a pause of x seconds (0.01-99.99).

Voice ID (required): Desired voice ID. Use a voice ID you have trained (https://muapi.ai/playground/minimax-voice-clone) or one of the system voice IDs. (Default: Friendly_Person)

Speed: Speech speed. Range: 0.5-2.0, where 1.0 is normal speed.

Volume: Speech volume. Range: 0.1-10.0, where 1.0 is normal volume.

Pitch: Speech pitch. Range: -12 to 12, where 0 is normal pitch.

Emotion: The emotion of the generated speech. (Default: surprised)

English Normalization: Enables English text normalization, which improves performance in number-reading scenarios.

Sample Rate: Sample rate of the generated audio. (Default: 8000)

Bitrate: Bitrate of the generated audio. (Default: 32000)

Channel: The number of channels in the generated audio. 1: mono, 2: stereo. (Default: 1)

Format: Format of the generated audio. (Default: mp3)

Language Boost: Enhances recognition of the specified language or dialect. (Default: auto)
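Because the prompt is capped at 10,000 characters, longer scripts have to be spread over several requests. The sketch below shows one way to do that, assuming it is acceptable to break the text at sentence boundaries. `split_prompt` and `MAX_PROMPT_CHARS` are illustrative names, not part of the muapi API.

```python
import re

MAX_PROMPT_CHARS = 10_000  # limit stated in the Prompt description


def split_prompt(text: str, limit: int = MAX_PROMPT_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Sentence boundaries are preferred; a single sentence longer than
    `limit` is hard-cut as a fallback.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut sentences that cannot fit into one chunk on their own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be sent as its own prompt. Note that the hard-cut fallback may land inside a `<#x#>` pause tag, so very long unpunctuated text is best checked by hand.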

🚀Related Models

minimax-speech-2.6-hd

Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio. It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.

Text to Audio

📝

Overview

About this model

Minimax-Speech-2.6-Turbo is a cutting-edge text-to-speech model from Minimax that blends speed and quality in audio generation. Built with a focus on quick output and efficient processing, this model is optimized to deliver clear and natural-sounding speech with smooth pacing and minimal delay. It harnesses advanced deep learning techniques that ensure each generated audio clip maintains human-like intonation and a realistic tone.

Designed for both developers and businesses, this lightweight model handles a range of applications, from interactive voice applications to audio content creation. Adjustable parameters such as speed, volume, pitch, and emotion let users tune the output to their needs, making it a versatile text-to-audio tool. Typical use cases include:

1. Creating engaging podcast intros with lifelike narration
2. Generating dynamic voiceovers for video content
3. Enabling interactive voice responses in customer service applications
4. Producing accessible audio content for visually impaired users
5. Automating audio content for e-learning platforms

💰

Pricing & Value

Cost analysis

Provider	Cost	Notes
muapiapp	$0.65 per generation	About 24% cheaper per generation than the listed competitors ($0.65 vs. $0.85), with comparable or superior quality claimed.
Fal.ai	$0.85 per generation	Charges about 31% more per generation than muapiapp at the listed prices.
Replicate	$0.85 per generation	Listed at the same price as Fal.ai, so muapiapp is likewise about 24% cheaper.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
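The percentage gap between the listed prices depends on which price is used as the base. The short sketch below works through both directions using the per-generation figures from the table; the function names are illustrative, and the competitor prices are the page's own estimates.

```python
def percent_cheaper(price: float, competitor_price: float) -> float:
    """How much lower `price` is, as a percentage of the competitor price."""
    return (competitor_price - price) / competitor_price * 100


def percent_premium(price: float, competitor_price: float) -> float:
    """How much higher the competitor price is, as a percentage of `price`."""
    return (competitor_price - price) / price * 100


# Per-generation prices from the table above (competitor prices are estimates).
MUAPIAPP = 0.65
COMPETITOR = 0.85

print(f"muapiapp is {percent_cheaper(MUAPIAPP, COMPETITOR):.1f}% cheaper")
print(f"competitors charge {percent_premium(MUAPIAPP, COMPETITOR):.1f}% more")
```

With these inputs the saving is 23.5% of the competitor price, and the competitor premium is 30.8% of the muapiapp price.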

⚙️

Technical Details

Configuration schema

Parameter	Type	Description	Default
Prompt	string	Text to convert to speech. Every character counts as 1 token. Maximum 10,000 characters. Use <#x#> between words to insert a pause of x seconds (0.01-99.99).	`Welcome to Minimax-Speech 2.6 by Muapiapp! Get ready for an audio revolution! We are thrilled to introduce a model so realistic, it's virtually indistinguishable from a human voice. You're going to be amazed by its lifelike delivery!`
Voice ID	Enum (472 options)	Desired voice ID. Use a voice ID you have trained (https://muapi.ai/playground/minimax-voice-clone) or one of the system voice IDs.	`Friendly_Person`
Speed	float	Speech speed. Range: 0.5-2.0, where 1.0 is normal speed.	`1`
Volume	float	Speech volume. Range: 0.1-10.0, where 1.0 is normal volume.	`1`
Pitch	int	Speech pitch. Range: -12 to 12, where 0 is normal pitch.	`0`
Emotion	Enum (7 options)	The emotion of the generated speech.	`surprised`
English Normalization	boolean	Enables English text normalization, which improves performance in number-reading scenarios.	`false`
Sample Rate	Enum (6 options)	Sample rate of the generated audio.	`8000`
Bitrate	Enum (4 options)	Bitrate of the generated audio.	`32000`
Channel	Enum (2 options)	The number of channels in the generated audio. 1: mono, 2: stereo.	`1`
Format	Enum (4 options)	Format of the generated audio.	`mp3`
Language Boost	Enum (41 options)	Enhances recognition of the specified language or dialect.	`auto`
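The schema above can be mirrored client-side so that out-of-range values are caught before a request is sent. The following is a minimal sketch, not an official client: the class name `SpeechRequest` is hypothetical, the key names `volume`, `pitch`, `emotion`, and `english_normalization` are assumed from the parameter labels, and `speed` and `volume` are typed as floats because their ranges include fractional values. Enum parameters whose options are not listed on this page (emotion, sample rate, bitrate, format, language boost) are passed through unchecked.

```python
from dataclasses import asdict, dataclass

MAX_PROMPT_CHARS = 10_000  # limit stated in the Prompt description


@dataclass
class SpeechRequest:
    """Client-side mirror of the schema, using the documented defaults."""

    prompt: str
    voice_id: str = "Friendly_Person"
    speed: float = 1.0
    volume: float = 1.0  # assumed key name
    pitch: int = 0  # assumed key name
    emotion: str = "surprised"  # assumed key name
    english_normalization: bool = False  # assumed key name
    sample_rate: int = 8000
    bitrate: int = 32000
    channel: int = 1
    format: str = "mp3"
    language_boost: str = "auto"

    def validate(self) -> None:
        """Raise ValueError if a value falls outside the documented ranges."""
        if not self.prompt:
            raise ValueError("prompt is required")
        if len(self.prompt) > MAX_PROMPT_CHARS:
            raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
        if not self.voice_id:
            raise ValueError("voice_id is required")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if not 0.1 <= self.volume <= 10.0:
            raise ValueError("volume must be between 0.1 and 10.0")
        if not -12 <= self.pitch <= 12:
            raise ValueError("pitch must be between -12 and 12")
        if self.channel not in (1, 2):
            raise ValueError("channel must be 1 (mono) or 2 (stereo)")

    def to_payload(self) -> dict:
        """Validate, then return a plain dict ready for JSON encoding."""
        self.validate()
        return asdict(self)
```

For example, `SpeechRequest(prompt="Hello", speed=1.2).to_payload()` returns a dict with every default filled in, while `speed=3.0` raises a `ValueError`.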

📖

Implementation Guide

Developer documentation

How to Use Minimax-Speech-2.6-Turbo

Prepare Your Input:
- Write the text you want to convert to speech in the prompt field. Use special tags like <#x#> to control pause durations between words.
- Select a voice_id from the provided list or use your custom trained voice.
- Adjust parameters such as speed, volume, pitch, and emotion as needed.
Configure Technical Settings:
- Choose the sample_rate and bitrate to match your desired audio quality.
- Set your preferred channel (mono or stereo) and format (e.g., mp3, wav) for the output.
- Optionally, enable English normalization for better performance in number-reading scenarios and specify a language_boost if needed.
Submit Your Request:
- Use the provided API endpoint minimax-speech-2.6-turbo to send your configured JSON payload.
Interpret the Results:
- Once the API returns the output, access the audio link to download or play the generated speech.
- Review the audio quality and adjust any parameters if necessary for subsequent requests.
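The steps above can be sketched with the Python standard library. This is an illustration under stated assumptions: the page names the endpoint `minimax-speech-2.6-turbo` but gives neither the base URL nor the authentication scheme, so `BASE_URL` and the `x-api-key` header below are placeholders to replace with the values from your muapi account. The function only builds the request; sending it is left as a commented final step.

```python
import json
import urllib.request

# Placeholder: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"
ENDPOINT = "minimax-speech-2.6-turbo"  # endpoint name from the guide above


def build_request(payload: dict, api_key: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{ENDPOINT}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed header name
        },
        method="POST",
    )


payload = {
    "prompt": "Hello there.<#0.5#>Welcome back.",
    "voice_id": "Friendly_Person",
    "speed": 1.0,
    "format": "mp3",
}
request = build_request(payload, api_key="YOUR_API_KEY")
print(request.full_url)

# To send it, call urllib.request.urlopen(request), decode the JSON
# response, and follow the audio link it contains.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any generation is billed.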

Enjoy high-quality, natural-sounding audio generation with minimal delay from Minimax-Speech-2.6-Turbo!

❓

Common Questions

Frequently asked

What makes Minimax-Speech-2.6-Turbo stand out among other text-to-speech models?

Minimax-Speech-2.6-Turbo offers a unique blend of speed and quality, ensuring rapid audio generation while maintaining a natural and clear voice. Its highly customizable parameters allow users to fine-tune speed, volume, pitch, and emotion, providing a versatile tool for a wide range of applications.

How can I control the speech pacing and pauses?

You can control the pacing by using `<#x#>` tags within your prompt text to specify the pause duration in seconds. Additionally, adjusting the `speed` parameter helps in managing the overall tempo of the speech.
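As a small illustration of the pause syntax, the helper below builds `<#x#>` tags and joins text segments with them. The function names are hypothetical, and formatting the duration with two decimal places is an assumption based on the documented 0.01-99.99 range.

```python
def pause(seconds: float) -> str:
    """Return a pause tag such as <#0.50#> for the given duration."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join text segments with the same pause tag between each pair."""
    return pause(seconds).join(segment.strip() for segment in segments)


prompt = join_with_pauses(["Hello there.", "Welcome back."], 0.5)
print(prompt)  # Hello there.<#0.50#>Welcome back.
```

The resulting string can be used directly as the `prompt` value; remember that the tag characters count toward the 10,000-character limit.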

Can I use custom voices with this model?

Yes, besides selecting from the available system voices via the `voice_id` parameter, you can also integrate custom trained voices through the Minimax voice cloning tool provided at https://muapi.ai/playground/minimax-voice-clone.

What output formats and quality settings are available?

The model supports multiple output formats including mp3, wav, pcm, and flac. You can also adjust the sample rate, bitrate, and channel settings to meet your specific quality requirements.
