Question 1

What input format does mmaudio-v2-text-to-audio accept?

Accepted Answer

It accepts a JSON object with a required `prompt` field (a string) and an optional `duration` field (an integer number of seconds between 1 and 30, defaulting to 8). Because the schema is this small, it is straightforward to call from most applications.
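As an illustration, the schema above can be checked on the client before sending a request. This is a minimal sketch, assuming only the field names and limits stated in this answer. `build_request` is a hypothetical helper, and how the body is sent to the service (endpoint, authentication) is not covered here.

```python
import json

# Limits from the FAQ: `duration` is an integer number of seconds from 1 to 30.
# If `duration` is omitted, the service is said to default to 8 seconds.
MIN_DURATION = 1
MAX_DURATION = 30


def build_request(prompt, duration=None):
    """Build the JSON input for mmaudio-v2-text-to-audio (hypothetical helper)."""
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    payload = {"prompt": prompt}
    if duration is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(duration, bool) or not isinstance(duration, int):
            raise TypeError("duration must be an integer")
        if not MIN_DURATION <= duration <= MAX_DURATION:
            raise ValueError(
                f"duration must be between {MIN_DURATION} and {MAX_DURATION}"
            )
        payload["duration"] = duration
    return json.dumps(payload)


print(build_request("rain on a tin roof", duration=10))
# prints {"prompt": "rain on a tin roof", "duration": 10}
```

Leaving `duration` out of the payload, rather than filling in 8 locally, lets the service apply its own default.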

Question 2

How does the model ensure natural-sounding audio?

Accepted Answer

The model is a deep generative model trained on large-scale audio datasets, which lets it produce realistic, high-fidelity audio that follows the text prompt. It is aimed at sound effects and ambient or environmental audio rather than speech, so it is not a substitute for a dedicated text-to-speech model when you need a natural-sounding voice.

Question 3

Can I customize the duration of the audio output?

Accepted Answer

Yes. Set the optional `duration` field in the input JSON to an integer number of seconds between 1 and 30. If you omit it, the output defaults to 8 seconds. This lets you match the clip length to your content.

Question 4

What is the cost per generation with this model?

Accepted Answer

Each generation costs $0.01, which makes the model an affordable option for high-quality text-to-audio generation.
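For budgeting, the total is simply the number of generations times $0.01. The sketch below assumes a flat per-generation price that does not depend on `duration` (the answer above does not say otherwise); `estimate_cost` is a hypothetical helper, and `Decimal` is used so that currency amounts are exact.

```python
from decimal import Decimal

# Price per generation as stated in this FAQ (assumed flat, independent of duration).
PRICE_PER_GENERATION = Decimal("0.01")


def estimate_cost(generations):
    """Return the total cost in USD for a number of generations (hypothetical helper)."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return PRICE_PER_GENERATION * generations


print(estimate_cost(250))   # prints 2.50
print(estimate_cost(1000))  # prints 10.00
```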