AI Audio Generator

Generate, remix, and extend music; create speech and sound effects from text; or generate audio that matches a video. All models are exposed through the same MuApi JSON API with the standard submit-and-poll flow.

  • Music: Suno create / remix / extend
  • Sound effects & ambience: MMAudio text-to-audio, MMAudio video-to-audio
  • Pair with lipsync for end-to-end dubbed video pipelines

Quick Start

Every model in this category uses the same submit-then-poll API. Replace suno-create-music with any model endpoint from the list below.

# 1. Submit
curl -X POST https://api.muapi.ai/api/v1/suno-create-music \
  -H "x-api-key: $MUAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"an upbeat lofi hip-hop track with mellow piano"}'
# → {"request_id":"abc123","status":"processing"}

# 2. Poll until completed
curl https://api.muapi.ai/api/v1/predictions/abc123/result \
  -H "x-api-key: $MUAPI_API_KEY"
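
The poll step can be wrapped in a small loop. The following is a minimal sketch, not an official client. It assumes the result body carries the same `status` field shown in the submit response, and it treats any value other than "processing" as terminal. The function names, the 5-second interval, and the 60-attempt limit are illustrative choices.

```shell
# fetch_result REQUEST_ID: GET the current result for a submitted request.
fetch_result() {
  curl -sS "https://api.muapi.ai/api/v1/predictions/$1/result" \
    -H "x-api-key: $MUAPI_API_KEY"
}

# extract_status: read a JSON body on stdin and print the value of "status".
# A sed one-liner keeps the sketch dependency-free; a real JSON parser is safer.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# poll_until_done REQUEST_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints the final response body, or fails after MAX_ATTEMPTS polls.
poll_until_done() {
  local id interval max i body status
  id=$1
  interval=${2:-5}
  max=${3:-60}
  i=0
  while [ "$i" -lt "$max" ]; do
    body=$(fetch_result "$id") || return 1
    status=$(printf '%s\n' "$body" | extract_status)
    if [ "$status" != "processing" ]; then
      # Assumption: anything other than "processing" means the job has finished.
      printf '%s\n' "$body"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for $id" >&2
  return 1
}
```

Usage: `poll_until_done abc123` prints the final JSON body once the status is no longer "processing".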

Top 5 Audio Generator Models

Model | Provider | Cost | Best For
suno-create-music | | $0.090 | Suno's music generation model turns text prompts into full songs, complete with vocals, lyrics, and instrumentation. Describe a mood, genre, or specific lyric idea, and Suno creates a realistic, studio-quality track in seconds.
suno-remix-music | | $0.090 | Covers an audio track by transforming it into a new style while retaining its core melody. It uses Suno's upload capability, so you can upload an audio file for processing. The result is a refreshed track in a new style with the original melody intact.
minimax-speech-2.6-hd | | $0.650 | Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio. It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.
mmaudio-v2-text-to-audio | | $0.010 | Convert text into natural-sounding speech using mmAudio-v2. Ideal for voiceovers, virtual assistants, and content narration with lifelike clarity and tone.
minimax-voice-clone | | $0.650 | Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.
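
The listed costs lend themselves to quick budget arithmetic. Below is a rough sketch that assumes each cost is charged once per request; the page does not state the billing unit, so treat that as an assumption. `price_of` and `estimate_cost` are hypothetical helper names, and the prices are copied from the table above.

```shell
# price_of MODEL: print the listed price in USD for a model from the table above.
price_of() {
  case $1 in
    suno-create-music|suno-remix-music) echo 0.090 ;;
    minimax-speech-2.6-hd|minimax-voice-clone) echo 0.650 ;;
    mmaudio-v2-text-to-audio) echo 0.010 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# estimate_cost MODEL COUNT: price times number of requests, to three decimals.
# Assumption: the listed cost applies per request.
estimate_cost() {
  local price
  price=$(price_of "$1") || return 1
  awk -v p="$price" -v n="$2" 'BEGIN { printf "%.3f\n", p * n }'
}
```

For example, `estimate_cost suno-create-music 10` prints `0.900`.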

All 11 Models

minimax-speech-2.6-hd
Text to Audio | 10% off | $0.7222 → $0.650

Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio. It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.

suno-create-music
Text to Audio | 11% off | $0.1000 → $0.090

Suno's music generation model turns text prompts into full songs, complete with vocals, lyrics, and instrumentation. Describe a mood, genre, or specific lyric idea, and Suno creates a realistic, studio-quality track in seconds.

minimax-voice-clone
Text to Audio | 10% off | $0.7222 → $0.650

Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.

suno-generate-sounds
Text to Audio | 10% off | $0.0222 → $0.020

Generate sound effects using Suno's chirp-crow model.

suno-add-vocals
Text to Audio | 11% off | $0.1000 → $0.090

Add vocals to an instrumental track.

suno-generate-mashup
Text to Audio | 11% off | $0.1000 → $0.090

Create a mashup using 1-5 audio tracks.

suno-add-instrumental
Text to Audio | 11% off | $0.1000 → $0.090

Add instrumental backing to a cappella audio.

mmaudio-v2-text-to-audio
Text to Audio | 10% off | $0.0111 → $0.010

Convert text into natural-sounding speech using mmAudio-v2. Ideal for voiceovers, virtual assistants, and content narration with lifelike clarity and tone.

suno-remix-music
Text to Audio | 11% off | $0.1000 → $0.090

Covers an audio track by transforming it into a new style while retaining its core melody. It uses Suno's upload capability, so you can upload an audio file for processing. The result is a refreshed track in a new style with the original melody intact.

suno-extend-music
Text to Audio | 11% off | $0.1000 → $0.090

Extends an audio track while preserving its original style. It uses Suno's upload capability, so you can upload an audio file for processing. The result is a longer track that seamlessly continues the input style.

minimax-speech-2.6-turbo
Text to Audio | 10% off | $0.7222 → $0.650

Speech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model designed for quick audio generation while maintaining good natural voice quality. It produces clear speech with smooth pacing and minimal delay.

Frequently Asked Questions

Can I extend an existing track?

Yes. `suno-extend-music` accepts an audio URL and a continuation prompt and returns a longer clip that preserves the original style.
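
An extend request might be submitted as sketched below. The endpoint name `suno-extend-music` comes from the model list above, but the JSON field names `audio_url` and `prompt` are assumptions made for illustration, as are the helper function names; check the model's schema before relying on them.

```shell
# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Does not handle newlines or control characters; enough for one-line prompts.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_extend_payload AUDIO_URL PROMPT: print the request body.
# Assumption: the fields are named "audio_url" and "prompt".
build_extend_payload() {
  printf '{"audio_url":"%s","prompt":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# submit_extend AUDIO_URL PROMPT: POST the payload using the Quick Start headers.
submit_extend() {
  curl -sS -X POST "https://api.muapi.ai/api/v1/suno-extend-music" \
    -H "x-api-key: $MUAPI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_extend_payload "$1" "$2")"
}
```

Calling `submit_extend "https://example.com/track.mp3" "continue with a softer outro"` should return a `request_id` to poll, following the same submit-and-poll flow as the Quick Start.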

How do I generate audio that matches a video?

Use `mmaudio-v2v` (video-to-audio) — it analyzes the video and generates a matching ambient track or sound effect.