To authenticate with the Voxygen Cloud Service, you must have an active Voxygen Cloud account and a valid access token. If you do not yet have a token, see the Token generation section below for the complete procedure.

Include your token in the `Authorization` header of each request as follows:

`Authorization: Bearer [ACCESS_TOKEN]`
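
A minimal sketch of building this header in Python. The environment variable name `VOXYGEN_ACCESS_TOKEN` is a hypothetical choice made for this example, not something the service defines; it simply keeps the token out of source files.

```python
import os

def auth_header(token: str) -> dict:
    """Return the Authorization header expected by Voxygen Cloud requests."""
    if not token:
        raise ValueError("an access token is required")
    return {"Authorization": f"Bearer {token}"}

# Hypothetical variable name: read the token from the environment
# instead of hard-coding it.
token = os.environ.get("VOXYGEN_ACCESS_TOKEN", "YOUR_ACCESS_TOKEN")
headers = auth_header(token)
```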
The following example shows how to call the Voxygen Cloud Text-to-Speech API using Python and save the generated audio file locally.
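
```python
import requests

url = "https://ws.voxygen.fr/tts"
headers = {
    # Replace with your own access token
    "Authorization": "Bearer YOUR_ACCESS_TOKEN"
}
data = {
    "text": "Hello world",    # text to synthesize (max. 2000 characters)
    "voice": "Judith_NTTS",   # must be a voice available to your account
    "language": "en-GB",      # IETF BCP 47 language tag
}

# requests sends `data` as application/x-www-form-urlencoded
response = requests.post(url, data=data, headers=headers, timeout=30)

if response.ok:
    # By default, the audio is returned directly in the response body
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
else:
    print("Error:", response.status_code, response.text)
```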

`POST ws.voxygen.fr/tts`

The Text-to-Speech (TTS) service converts the value of the `text` parameter into speech.

Below are the parameters for TTS requests. They are divided into two groups: speech synthesis parameters, which influence how the speech is generated, and audio output parameters, which define the format of the resulting audio file. If optional parameters are omitted, the server assigns default values that depend on the account configuration.

| Parameter | Required | Value |
|---|---|---|
| `text` | yes | Text to be synthesized, UTF-8 encoded (max. 2000 characters). Two mark-up formats are available: SSML and tags. |
| `voice` | yes | Voice name to use for synthesis. Must be one of the voices available for the account (required only if the voice is not specified through mark-up). |
| `language` | yes | Language identifier as defined by IETF BCP 47 (e.g. `fr-FR`). Required only if the language is not specified through mark-up. |
| | no | Sets the current volume to `<volume>`. Accepted values for `<volume>` are: |
| | no | Sets the speech rate to `<articulation-rate>`. |
| | no | Sets the pause rate to `<pause-rate>`. |
| | no | Sets the timbre coefficient to `<timbre>`. |
| | no | Sets the pitch baseline to `<pitch-height>`. |
| | no | Sets the pitch range to `<pitch-range>`. |
| | no | Name of a user normaliser file to use. User normalisers are stored in the user's cloud account. |
| | no | Name of a user lexicon file to use. User lexicons are stored in the user's cloud account. |
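
The required parameters and the 2000-character limit on `text` can be checked client-side before sending a request. The sketch below is an assumption about how one might do that, not part of the service: `build_tts_params` is a hypothetical helper, it counts Python characters (code points), and it leaves the "voice or language set through mark-up" case for the server to judge. Optional parameters that are left as `None` are simply omitted so that the server applies its account defaults.

```python
MAX_TEXT_CHARS = 2000  # limit stated in the parameter table

def build_tts_params(text, voice=None, language=None, **optional):
    """Assemble the fields of a TTS request (hypothetical helper).

    voice and language may be omitted only when the text mark-up sets them;
    this sketch does not parse mark-up, so it leaves that check to the server.
    """
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    params = {"text": text}
    if voice is not None:
        params["voice"] = voice
    if language is not None:
        params["language"] = language
    # Omitted optional parameters fall back to server-side defaults,
    # so only include the ones the caller actually set.
    params.update({k: v for k, v in optional.items() if v is not None})
    return params
```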

| Parameter | Required | Value |
|---|---|---|
| | no | Integer sampling frequency in hertz, between 6000 Hz and 48000 Hz. Default value: 24000 Hz. |
| | no | Defines the audio container format. The generated audio file can be WAV or MP3. The final format depends on application/octet-stream: |
| | no | 0xFFFFFFF. |
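
The sampling-frequency rule in the table can be expressed as a small validation function. This is a sketch only: `sampling_frequency` is a hypothetical helper, the table does not say whether the 6000 Hz and 48000 Hz bounds are inclusive (inclusive is assumed here), and the parameter's actual name in the request is not shown above, so the function only validates the value.

```python
MIN_HZ, MAX_HZ, DEFAULT_HZ = 6000, 48000, 24000

def sampling_frequency(hz=None):
    """Return a valid sampling frequency in hertz, defaulting to 24000 Hz."""
    if hz is None:
        return DEFAULT_HZ
    # bool is a subclass of int, so reject it explicitly
    if isinstance(hz, bool) or not isinstance(hz, int):
        raise TypeError("sampling frequency must be an integer")
    # Assumption: both bounds are inclusive
    if not MIN_HZ <= hz <= MAX_HZ:
        raise ValueError(f"sampling frequency must be between {MIN_HZ} and {MAX_HZ} Hz")
    return hz
```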

By default, the service returns the generated speech directly in the HTTP response body. The audio can therefore be streamed, which minimizes your application's latency.

Alternatively, the client application can send the header `Accept: application/json`. In this mode, the service returns a JSON object that includes the URL from which the audio can be retrieved. The response has the following structure:

```
{
  "url": "https://ws.voxygen.fr/ws/audio/abcdefg", // URL of the generated audio file
  "warnings": [], // optional warning messages
  "events": []    // optional event data (see 'event' parameter below)
}
```
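
When a request is sent with `"Accept": "application/json"` added to its headers, the body can be decoded as sketched below. `parse_tts_json` is a hypothetical helper; it treats `url` as always present and `warnings` and `events` as optional, as the structure above indicates. The comments shown in the structure are explanatory only and would not appear in real JSON.

```python
import json

def parse_tts_json(body):
    """Extract the audio URL, warnings and events from a JSON-mode TTS response."""
    payload = json.loads(body)  # accepts str or UTF-8 bytes
    url = payload["url"]                     # always present in this mode
    warnings = payload.get("warnings", [])   # optional
    events = payload.get("events", [])       # optional
    return url, warnings, events
```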

An additional parameter, `event`, sets the level of event data returned in the `events` field:

| Value | Events |
|---|---|
| 1 | Marker |
| 2 | Events of level 1 + Edge of a sentence, Silence, Voice change, Word separator, Punctuation |
| 3 | Events of level 2 + Syllable, Viseme |
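
Because each level includes all events of the level below it, the table can be encoded as cumulative sets. The sketch below transcribes the table directly; `events_for_level` is a hypothetical helper, and the strings are the table's labels, not necessarily the identifiers the service uses inside the `events` array.

```python
# Event types first introduced at each level, transcribed from the table;
# each level also includes everything from the levels below it.
_NEW_AT_LEVEL = {
    1: ["Marker"],
    2: ["Edge of a sentence", "Silence", "Voice change", "Word separator", "Punctuation"],
    3: ["Syllable", "Viseme"],
}

def events_for_level(level):
    """Return the event types reported at a given level (cumulative)."""
    if level not in _NEW_AT_LEVEL:
        raise ValueError("event level must be 1, 2 or 3")
    types = []
    for lvl in range(1, level + 1):
        types.extend(_NEW_AT_LEVEL[lvl])
    return types
```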

`POST ws.voxygen.fr/tts/info`

This request returns details about the account. The response is an `application/json` object with the following structure (example values shown):

```
{
  // List of voices
  "voices": [
    {                          // each voice is described in an object
      "name": "Helene_NTTS",   // the voice name, used as the value of parameter voice
      "language": "fr-FR",     // language identifier as defined by IETF BCP 47
      "gender": "female"       // "male" or "female"
    },
    {
      "name": "Judith_NTTS",
      "language": "en-GB",
      "gender": "female"
    },
    {
      "name": "Quentin_NTTS",
      "language": "fr-FR",
      "gender": "male"
    }
  ],
  // user lexicons
  "lexicons": [
    "french.bin",
    "english.bin"
  ]
}
```
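
A client can use this response to pick a `voice` that matches a given `language`. A minimal sketch, assuming the response body has already been fetched and that `lexicons` is a JSON array as shown: `voices_by_language` is a hypothetical helper that groups voice names by their BCP 47 tag.

```python
import json
from collections import defaultdict

def voices_by_language(info_body):
    """Group voice names from a /tts/info response by language tag."""
    info = json.loads(info_body)
    grouped = defaultdict(list)
    for voice in info.get("voices", []):
        grouped[voice["language"]].append(voice["name"])
    return dict(grouped)
```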

The service is invoked with an HTTPS POST request that carries a set of parameters, some of which are required (see the parameter tables above).

The `Content-Type` header must match one of the following supported body formats: `application/x-www-form-urlencoded`, `application/json`, or `multipart/form-data`. Parameter names and values must be UTF-8 encoded.
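
The same parameters can be serialized for two of the supported body formats as sketched below; the matching value must then be sent in the `Content-Type` header. `encode_body` is a hypothetical helper, and `multipart/form-data` is deliberately left out of this sketch. In the form-encoded case, non-ASCII characters are percent-encoded from their UTF-8 bytes; in the JSON case, the whole document is encoded as UTF-8.

```python
import json
import urllib.parse

def encode_body(params, content_type="application/x-www-form-urlencoded"):
    """Encode request parameters as UTF-8 bytes for a supported body format."""
    if content_type == "application/x-www-form-urlencoded":
        # urlencode percent-encodes non-ASCII characters using their UTF-8 bytes,
        # so the result is plain ASCII.
        return urllib.parse.urlencode(params, encoding="utf-8").encode("ascii")
    if content_type == "application/json":
        return json.dumps(params, ensure_ascii=False).encode("utf-8")
    raise ValueError(f"unsupported content type in this sketch: {content_type}")
```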

Mark-up in the text to vocalize lets you dynamically influence the behaviour of the speech synthesis system, for example by changing the voice, regulating the flow, or changing the volume. Two mark-up formats are available: SSML and tags.
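
As an illustration of the SSML format, the sketch below wraps two sentences in a W3C SSML document with a pause between them, escaping the text so that characters such as `&` do not break the mark-up. This is an assumption-laden example: it follows the W3C SSML 1.1 `speak` and `break` elements, and whether the Voxygen service requires the `version` and `xmlns` attributes, or supports every SSML element, is not stated here and should be checked. `ssml_with_pause` is a hypothetical helper; its output would be sent as the value of `text`.

```python
from xml.sax.saxutils import escape, quoteattr

def ssml_with_pause(first, second, lang="en-GB", pause_ms=500):
    """Wrap two sentences in a W3C SSML document with a pause between them."""
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"{escape(first)}"               # escapes &, < and >
        f'<break time="{int(pause_ms)}ms"/>'
        f"{escape(second)}"
        "</speak>"
    )
```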
Once you have set your password, you can generate your token by following these steps: