Clipform

Generate text-to-speech audio

Generate narration audio from text using Edge TTS. Returns a public URL to the audio file and word-level caption timing data. Text is moderated before generation.

POST /creative/tts

Authorization

Authorization
Required
Bearer <token>

API key (cf_*) passed as Bearer token

In: header

Request Body

application/json
Required

text
Required
string

Narration text

Minimum length: 1
Maximum length: 5000

voice
string

Voice selection

Default: "ryan"
Value in: "ryan" | "sonia" | "andrew" | "ava" | "guy"

rate
number

Speech rate multiplier
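
The constraints listed for the request body can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: `build_tts_payload` is a hypothetical helper name, and it assumes the length limit counts characters the way Python's `len` does (this page does not say whether the limit is in characters or bytes). No valid range for `rate` is documented here, so the sketch only checks that it is a number.

```python
VOICES = ("ryan", "sonia", "andrew", "ava", "guy")  # values listed for `voice`
MAX_TEXT_LENGTH = 5000  # documented maximum length of `text`


def build_tts_payload(text, voice=None, rate=None):
    """Return a request-body dict, or raise ValueError on invalid input.

    Optional fields are left out when not given, so the server-side
    defaults (for example voice "ryan") apply.
    """
    if not isinstance(text, str) or not (1 <= len(text) <= MAX_TEXT_LENGTH):
        raise ValueError(f"text must be a string of 1 to {MAX_TEXT_LENGTH} characters")
    payload = {"text": text}

    if voice is not None:
        if voice not in VOICES:
            raise ValueError(f"voice must be one of {VOICES}")
        payload["voice"] = voice

    if rate is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(rate, bool) or not isinstance(rate, (int, float)):
            raise ValueError("rate must be a number")
        payload["rate"] = rate

    return payload
```

Omitting `voice` and `rate` instead of sending placeholder values keeps the request minimal and avoids overriding the documented default voice by accident.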

curl -X POST "https://api.clipform.io/v1/creative/tts" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "string",
    "voice": "ryan",
    "rate": 0
  }'
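
The same request can be made from Python using only the standard library. This is a sketch under stated assumptions rather than an official client: the function names `make_tts_request` and `generate_tts` are hypothetical, the URL is copied from the curl example, and the response is assumed to be a JSON object as in the sample response on this page.

```python
import json
import urllib.request

API_URL = "https://api.clipform.io/v1/creative/tts"  # taken from the curl example


def make_tts_request(token, text, voice="ryan"):
    """Build (but do not send) the POST request for the TTS endpoint."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # API key (cf_*) as Bearer token
            "Content-Type": "application/json",
        },
    )


def generate_tts(token, text, voice="ryan", timeout=30):
    """Send the request and return the decoded JSON response as a dict.

    urllib raises urllib.error.HTTPError for non-2xx responses; the error
    format of this API is not documented here, so it is not parsed.
    """
    request = make_tts_request(token, text, voice)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `generate_tts(token, "Hello from Clipform.")` with a valid API key should return a dict whose `audio_url` key holds the public audio URL. Building the request separately from sending it makes the headers and body easy to inspect without network access.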

Audio generated

{
  "audio_url": "http://example.com",
  "storage_path": "string",
  "captions": [
    {
      "word": "string",
      "start": 0,
      "end": 0
    }
  ],
  "voice": "string"
}
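
Word-level entries in `captions` are often too fine-grained to display one at a time. The sketch below groups them into short caption lines, assuming each entry has the `word`, `start`, and `end` keys shown in the sample response. `group_captions` is a hypothetical helper name. The page does not state the unit of `start` and `end`, so the values are passed through unchanged rather than converted.

```python
def group_captions(captions, max_words=5):
    """Group word-level captions into lines of at most `max_words` words.

    Each line starts at its first word's `start` and ends at its last
    word's `end`; timing values are copied as-is, whatever their unit.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    lines = []
    for i in range(0, len(captions), max_words):
        chunk = captions[i:i + max_words]
        lines.append({
            "text": " ".join(entry["word"] for entry in chunk),
            "start": chunk[0]["start"],
            "end": chunk[-1]["end"],
        })
    return lines
```

A fixed word count is the simplest grouping rule. Splitting on punctuation or on pauses between words would also work but depends on details of the timing data that this page does not specify.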