AI Audio Generator

Generate, remix, and extend music; create speech and sound effects from text; or generate audio that matches a video. All of these are exposed through the same MuApi JSON API and use the standard submit-and-poll flow.

→ Music: Suno create / remix / extend
→ Sound effects & ambience: MMAudio text-to-audio, MMAudio video-to-audio
→ Pair with lipsync for end-to-end dubbed video pipelines


Quick Start

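Every model in this category uses the same submit-then-poll API. To call a different model, replace suno-create-music in the submit URL with that model's endpoint name from the list below, and send the inputs that model expects in the JSON body.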

# 1. Submit
curl -X POST https://api.muapi.ai/api/v1/suno-create-music \
  -H "x-api-key: $MUAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"an upbeat lofi hip-hop track with mellow piano"}'
# → {"request_id":"abc123","status":"processing"}

# 2. Poll until completed
curl https://api.muapi.ai/api/v1/predictions/abc123/result \
  -H "x-api-key: $MUAPI_API_KEY"
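The two curl calls above can be wrapped into a small submit-and-wait helper. This is a sketch, not an official client: it assumes the result JSON carries a top-level "status" string that ends as "completed" (as the Quick Start implies) or "failed" (an assumption not confirmed here), and it reads JSON fields with sed instead of a real parser so it has no dependencies. The function names json_field, muapi_submit, and muapi_wait and the MUAPI_BASE variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a submit-and-wait helper for the MuApi flow shown above.
# ASSUMPTIONS: the result JSON has a top-level "status" string that ends as
# "completed" (per the Quick Start) or "failed" (not confirmed by these docs).
# Helper names and MUAPI_BASE are hypothetical, not part of any official client.

MUAPI_BASE="${MUAPI_BASE:-https://api.muapi.ai/api/v1}"

# Print the string value of a JSON key read from stdin, using sed so that no
# jq install is needed. Only handles "key":"value" pairs that sit on one line;
# this is not a general JSON parser.
json_field() {  # usage: json_field KEY < json
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# POST a JSON body to a model endpoint and print the returned request_id.
muapi_submit() {  # usage: muapi_submit ENDPOINT JSON_BODY
  curl -sS -X POST "$MUAPI_BASE/$1" \
    -H "x-api-key: $MUAPI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$2" | json_field request_id
}

# Poll the result endpoint until the job finishes, printing the final JSON.
# Returns 0 on "completed", 1 on "failed", 2 if MAX_TRIES polls pass first.
muapi_wait() {  # usage: muapi_wait REQUEST_ID [INTERVAL_SECONDS] [MAX_TRIES]
  local id="$1" interval="${2:-5}" tries="${3:-60}" i=0 body status
  while [ "$i" -lt "$tries" ]; do
    body="$(curl -sS "$MUAPI_BASE/predictions/$id/result" \
      -H "x-api-key: $MUAPI_API_KEY")"
    status="$(printf '%s\n' "$body" | json_field status)"
    case "$status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed) printf '%s\n' "$body" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for request $id" >&2
  return 2
}

# Example (sends real requests):
#   id="$(muapi_submit suno-create-music '{"prompt":"an upbeat lofi hip-hop track with mellow piano"}')"
#   muapi_wait "$id"
```

The caller passes a complete JSON body to muapi_submit, so the same helper works for any endpoint in the list; only the body changes per model.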

Top 5 Audio Generator Models

Model	Provider	Cost	Best For
suno-create-music	Suno	$0.090	Full songs with vocals, lyrics, and instrumentation from a text prompt describing a mood, genre, or lyric idea
minimax-speech-2.6-hd	Minimax	$0.650	High-definition text-to-speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise
minimax-speech-2.6-turbo	Minimax	$0.650	Fast, lightweight text-to-speech with natural voice quality and minimal delay
mmaudio-v2-text-to-audio	MMAudio	$0.010	Sound effects and ambient audio from a text prompt
minimax-voice-clone	Minimax	$0.650	Cloning a speaker’s voice from a short reference sample, then generating new speech from any text

All 12 Models

mmaudio-v2-text-to-audio

Text to Audio | $0.010 (was $0.0111, 10% off)

Generates sound effects and ambient audio from a text prompt using MMAudio v2.

suno-add-vocals

Text to Audio | $0.090 (was $0.1000, 10% off)

Adds vocals to an instrumental track.

suno-remix-music

Text to Audio | $0.090 (was $0.1000, 10% off)

Covers an uploaded audio track in a new style while keeping its core melody intact. The track is supplied through Suno's upload capability, and the result is a restyled version of the original melody.

suno-extend-music

Text to Audio | $0.090 (was $0.1000, 10% off)

Extends an uploaded audio track while preserving its original style. The track is supplied through Suno's upload capability, and the result is a longer track that continues seamlessly in the style of the input.

minimax-voice-clone

Text to Audio | $0.650 (was $0.7222, 10% off)

Minimax Voice Clone creates a high-fidelity digital clone of a speaker’s voice from a short reference audio sample. It reproduces the speaker’s tone, emotion, accent, rhythm, and speaking style, then generates new speech from any text input.

minimax-speech-2.6-hd

Text to Audio | $0.650 (was $0.7222, 10% off)

Speech-2.6-hd is Minimax’s high-definition text-to-speech model that turns written text into natural, human-like audio. It produces studio-quality speech with clear pronunciation, smooth pacing, realistic emotion, and no background noise.

minimax-speech-2.6-turbo

Text to Audio | $0.650 (was $0.7222, 10% off)

Speech-2.6-turbo is Minimax’s fast, lightweight text-to-speech model, designed for quick audio generation while keeping the voice natural. It produces clear speech with smooth pacing and minimal delay.

suno-generate-sounds

Text to Audio | $0.020 (was $0.0222, 10% off)

Generates sound effects using Suno's chirp-crow model.

suno-add-instrumental

Text to Audio | $0.090 (was $0.1000, 10% off)

Adds instrumental backing to a cappella audio.

suno-voice-clone

Text to Audio | Free during preview

Clones your singing voice in two takes for use with Suno music generation. Submit a 10-second sample, then read back a fresh random phrase that the system generates (an anti-deepfake liveness check). You receive a reusable voice_id that you can use in Suno music creation.

suno-create-music

Text to Audio | $0.090 (was $0.1000, 10% off)

Generates full songs from text prompts, complete with vocals, lyrics, and instrumentation. Describe a mood, a genre, or a specific lyric idea, and Suno creates a realistic, studio-quality track in seconds.

suno-generate-mashup

Text to Audio | $0.090 (was $0.1000, 10% off)

Creates a mashup from 1 to 5 audio tracks.

Frequently Asked Questions

Can I extend an existing track?

Yes. `suno-extend-music` accepts an audio URL and a continuation prompt and returns a longer clip that preserves the original style.
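A request to the suno-extend-music endpoint listed above might be built as in the sketch below. The docs only say it takes an audio URL and a continuation prompt, so the JSON field names "audio_url" and "prompt" are hypothetical, as are the helper names json_escape, extend_payload, and suno_extend; check the model's actual input schema before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: build and submit a suno-extend-music request.
# HYPOTHETICAL field names: "audio_url" and "prompt" are assumptions; the docs
# only say the endpoint takes an audio URL and a continuation prompt.

MUAPI_BASE="${MUAPI_BASE:-https://api.muapi.ai/api/v1}"

# Escape backslashes, then double quotes, so a value is safe inside a JSON
# string. Single-line values only: newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the JSON body for an extend request.
extend_payload() {  # usage: extend_payload AUDIO_URL PROMPT
  printf '{"audio_url":"%s","prompt":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Submit the request and print the raw submit response (it should contain a
# request_id that you poll exactly as in the Quick Start).
suno_extend() {  # usage: suno_extend AUDIO_URL PROMPT
  curl -sS -X POST "$MUAPI_BASE/suno-extend-music" \
    -H "x-api-key: $MUAPI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(extend_payload "$1" "$2")"
}
```

Escaping the prompt before embedding it matters here because continuation prompts often contain quotes (for example a lyric line), which would otherwise produce invalid JSON.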

How do I generate audio that matches a video?

Use `mmaudio-v2v` (video-to-audio) — it analyzes the video and generates a matching ambient track or sound effect.
