Voxygen Cloud
Service Documentation

Authentication

Prerequisites

To authenticate with the Voxygen Cloud Service, you must have an active Voxygen Cloud account and a valid access token.
If you do not yet have a token, see the Token generation section below for the complete procedure.

Authenticate to the service with a token

Include your token in the Authorization header of each request, as follows: Authorization: Bearer [ACCESS_TOKEN]
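As a minimal sketch of the header described above, the helper below builds the Authorization header from a token kept outside the source code. The function names and the VOXYGEN_TOKEN environment variable name are assumptions made for illustration; they are not part of the service.

```python
import os


def auth_headers(token: str) -> dict:
    """Return the Authorization header for the given access token."""
    token = token.strip()
    if not token:
        raise ValueError("access token is empty")
    return {"Authorization": f"Bearer {token}"}


def token_from_env(var_name: str = "VOXYGEN_TOKEN") -> str:
    """Read the token from an environment variable (the name is hypothetical)."""
    try:
        return os.environ[var_name]
    except KeyError:
        raise RuntimeError(f"environment variable {var_name} is not set") from None
```

A call such as auth_headers(token_from_env()) then yields a dictionary that can be passed as the headers of each request.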

Quickstart

The following example shows how to call the Voxygen Cloud Text-to-Speech API using Python and save the generated audio file locally.


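    import requests

    # TTS endpoint. Replace YOUR_ACCESS_TOKEN with the token generated for your account.
    url = "https://ws.voxygen.fr/tts"
    headers = {
        "Authorization": "Bearer YOUR_ACCESS_TOKEN"
    }
    # Required synthesis parameters: the text, the voice name and the language.
    data = {
        "text": "Hello world",
        "voice": "Judith_NTTS",
        "language": "en-GB"
    }

    # Send the parameters as a form-encoded body; the timeout avoids waiting indefinitely.
    response = requests.post(url, data=data, headers=headers, timeout=30)

    if response.ok:
        # The response body contains the generated audio.
        with open("output.wav", "wb") as f:
            f.write(response.content)
        print("Audio saved to output.wav")
    else:
        print("Error:", response.status_code, response.text)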

Text-to-Speech API

The Text-to-Speech (TTS) service converts the value of the text parameter into an audio file. The language of the text, and the name of the voice used to read the text, must be specified. The audio format depends on the header, coding and frequency parameters.

`POST` ws.voxygen.fr/tts

Below are the parameters for TTS requests.
They are divided into two groups: speech synthesis parameters that influence how the speech is generated, and audio output parameters that define the format of the resulting audio file.
If optional parameters are omitted, the server assigns default values depending on the account configuration.

Speech synthesis parameters

Parameter	Required	Value
text	yes	Text to be synthesized, UTF-8 encoded (max = 2000 characters). The ssml and tags mark-up formats are available.
voice	yes	Name of the voice to use for synthesis. Must be one of the voices available for the account. Required only if the voice is not specified through mark-up.
language	yes	Language identifier as defined by IETF BCP 47 (e.g. `fr-FR`). Required only if the language is not specified through mark-up.
volume	no	Set current volume to <volume>. Accepted values for <volume> are:
articulation-rate	no	Set speech rate to <articulation-rate>.
pause-rate	no	Set pause rate to <pause-rate>.
timbre	no	Set timbre coefficient to <timbre>.
pitch-height	no	Set pitch baseline to <pitch-height>.
pitch-range	no	Set pitch range to <pitch-range>.
normaliser	no	The name of a user normaliser file to be used. The user normalisers are stored in the user's cloud account.
lexicon	no	The name of a user lexicon file to be used. The user lexicons are stored in the user's cloud account.
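The constraints in the table above can be checked on the client before a request is sent. The following is a sketch only: the function name is hypothetical, and it assumes the 2000-character limit applies to the whole value of text, which the table does not state explicitly.

```python
from typing import Optional

MAX_TEXT_LENGTH = 2000  # limit given for the text parameter


def build_tts_params(text: str,
                     voice: Optional[str] = None,
                     language: Optional[str] = None,
                     optional: Optional[dict] = None) -> dict:
    """Assemble the synthesis parameters and check the documented limit."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(
            f"text has {len(text)} characters; the limit is {MAX_TEXT_LENGTH}"
        )
    params = {"text": text}
    # voice and language may be left out when they are set through mark-up.
    if voice is not None:
        params["voice"] = voice
    if language is not None:
        params["language"] = language
    # Optional parameters are sent only when set, so server defaults apply otherwise.
    for name, value in (optional or {}).items():
        if value is not None:
            params[name] = value
    return params
```

For example, build_tts_params("Hello", "Judith_NTTS", "en-GB", {"lexicon": "english.bin"}) returns a dictionary that can be used as the body of the request.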

Audio output parameters

Parameter	Required	Value
frequency	no	Integer value, the sampling frequency in Hertz between 6000Hz and 48000Hz. Default value: 24000Hz.
header	no	Defines the audio container format. The generated audio file can be WAV or MP3.
coding	no	Used together with header to select the final format.

The final format depends on the header and coding values:

Format	header	coding
application/octet-stream	headerless	lin or A or mu or LIN
audio/x-wav	wav-header or wav-stream-header	lin or A or mu or LIN
audio/mpeg	headerless	mp3:<bitrate>-<quality> with <bitrate> in {16, 32, 64, 96, 128, 160} and <quality> between 0 (best) and 9
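The header and coding combinations listed above, together with the frequency range, can be expressed as a small lookup. This is an illustrative sketch; the function names are hypothetical, and how the server reacts to an unlisted combination is not covered here.

```python
import re

PCM_CODINGS = {"lin", "A", "mu", "LIN"}
MP3_BITRATES = {16, 32, 64, 96, 128, 160}
# mp3:<bitrate>-<quality>, with a single-digit quality from 0 (best) to 9
MP3_CODING = re.compile(r"mp3:(\d+)-(\d)")


def expected_content_type(header: str, coding: str) -> str:
    """Return the format listed for a header/coding pair, or raise ValueError."""
    if coding in PCM_CODINGS:
        if header == "headerless":
            return "application/octet-stream"
        if header in ("wav-header", "wav-stream-header"):
            return "audio/x-wav"
    match = MP3_CODING.fullmatch(coding)
    if match and header == "headerless" and int(match.group(1)) in MP3_BITRATES:
        return "audio/mpeg"
    raise ValueError(f"unlisted combination: header={header!r}, coding={coding!r}")


def check_frequency(frequency: int) -> int:
    """Check that the sampling frequency lies in the documented range."""
    if not 6000 <= frequency <= 48000:
        raise ValueError("frequency must be between 6000 and 48000 Hz")
    return frequency
```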

Important notes for streaming usage:

Select an audio format compatible with streaming: header must be headerless or wav-stream-header. With wav-header, the response is delayed until the full text has been processed, because the correct signal length must be written in the header. To avoid this delay, use wav-stream-header; in this case, the length in the header is fixed to 0xFFFFFFF.
Your HTTP client must process the streamed chunks as they are received to benefit from real-time playback.
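The sketch below shows one way to consume the response chunk by chunk with the requests library, using its stream=True option and iter_content. The function names are hypothetical, and a file stands in for the audio player: a real-time application would hand each chunk to its playback device instead.

```python
from typing import BinaryIO, Iterable

TTS_URL = "https://ws.voxygen.fr/tts"


def write_audio_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each chunk to sink as soon as it arrives; return the byte count."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            sink.write(chunk)
            total += len(chunk)
    return total


def stream_tts_to_file(token: str, params: dict, path: str) -> int:
    """Request streamed audio and store it while it is being received."""
    import requests  # third-party; imported here so the helper above needs only the standard library

    # Force a streaming-compatible container, as recommended above.
    request_params = dict(params, header="wav-stream-header")
    with requests.post(
        TTS_URL,
        data=request_params,
        headers={"Authorization": f"Bearer {token}"},
        stream=True,  # do not download the whole body before returning
        timeout=30,
    ) as response:
        response.raise_for_status()
        with open(path, "wb") as sink:
            return write_audio_chunks(response.iter_content(chunk_size=4096), sink)
```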

Response modes

By default, the service outputs the generated speech directly in the HTTP response body. The audio can thus be streamed, which minimizes the latency of your application.

Alternatively, the client application can specify the following header: Accept: application/json.
In this mode, the service returns a JSON object that includes the URL where the audio can be retrieved. The response structure is as follows:

    {
        "url": "https://ws.voxygen.fr/ws/audio/abcdefg",  // URL of the generated audio file
        "warnings": [],                                   // optional warning messages
        "events": []                                      // optional event data (see 'event' parameter below)
    }
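A client using this mode sends the Accept header and then reads the fields of the returned object. The sketch below covers only those two steps; the function names are hypothetical. The comments in the structure above are annotations, so the sample body used here omits them, as valid JSON cannot contain comments.

```python
import json


def json_mode_headers(token: str) -> dict:
    """Headers that ask the service for a JSON response instead of audio."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }


def parse_tts_json(body: str) -> tuple:
    """Extract the audio URL, warnings and events from a JSON response body."""
    payload = json.loads(body)
    url = payload["url"]                    # location of the generated audio
    warnings = payload.get("warnings", [])  # documented as optional
    events = payload.get("events", [])      # documented as optional
    return url, warnings, events


sample_body = '{"url": "https://ws.voxygen.fr/ws/audio/abcdefg", "warnings": [], "events": []}'
audio_url, warnings, events = parse_tts_json(sample_body)
print(audio_url)
```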

Events

An additional parameter, event, is available when requesting a JSON response. It adds event information, used for synchronization, to the JSON response. The event parameter takes an integer value from 1 to 3; each value corresponds to a different level of information.

Value	Events
1	Marker
2	Events of level 1 + Edge of a sentence, Silence, Voice change, Word separator, Punctuation
3	Events of level 2 + Syllable, Viseme
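Because each level includes the events of the level before it, the table above can be read as a cumulative lookup. The sketch below uses the labels from the table as plain strings; they are not claimed to be the identifiers that appear in the events array, and the function names are hypothetical.

```python
# Events added at each level, as listed in the table above.
EVENTS_ADDED_AT_LEVEL = {
    1: ["Marker"],
    2: ["Edge of a sentence", "Silence", "Voice change", "Word separator", "Punctuation"],
    3: ["Syllable", "Viseme"],
}


def events_for_level(level: int) -> list:
    """Return every event kind reported at the given level (cumulative)."""
    if level not in EVENTS_ADDED_AT_LEVEL:
        raise ValueError("event must be 1, 2 or 3")
    names = []
    for current in range(1, level + 1):
        names.extend(EVENTS_ADDED_AT_LEVEL[current])
    return names


def lowest_level_for(event_name: str) -> int:
    """Return the smallest event value that reports the given event kind."""
    for level in sorted(EVENTS_ADDED_AT_LEVEL):
        if event_name in EVENTS_ADDED_AT_LEVEL[level]:
            return level
    raise ValueError(f"unknown event kind: {event_name!r}")
```

For instance, an application that only needs word boundaries could request lowest_level_for("Word separator") rather than the most detailed level.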

Account information API

`POST` ws.voxygen.fr/tts/info

This request returns details about the account.
The response is an application/json object with the following structure (shown with example values):

    {
        // list of voices
        "voices": [
            {   // each voice is described in an object
                "name": "Helene_NTTS",   // the voice name, which is used as a value for parameter voice
                "language": "fr-FR",     // language identifier as defined by IETF BCP 47
                "gender": "female"       // "male" or "female"
            },
            {
                "name": "Judith_NTTS",
                "language": "en-GB",
                "gender": "female"
            },
            {
                "name": "Quentin_NTTS",
                "language": "fr-FR",
                "gender": "male"
            }
        ],

        // user lexicons
        "lexicons": [
            "french.bin",
            "english.bin"
        ]
    }
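The list of voices is typically used to choose a valid value for the voice parameter. The sketch below filters the example voices shown above by language and gender; the function name is hypothetical, and the sample data is copied from the example rather than fetched from the service.

```python
from typing import Optional

# Example values copied from the response structure above.
SAMPLE_INFO = {
    "voices": [
        {"name": "Helene_NTTS", "language": "fr-FR", "gender": "female"},
        {"name": "Judith_NTTS", "language": "en-GB", "gender": "female"},
        {"name": "Quentin_NTTS", "language": "fr-FR", "gender": "male"},
    ],
}


def voices_for_language(info: dict, language: str,
                        gender: Optional[str] = None) -> list:
    """Return the names of the account voices matching a language (and gender)."""
    return [
        voice["name"]
        for voice in info.get("voices", [])
        if voice["language"] == language
        and (gender is None or voice["gender"] == gender)
    ]


print(voices_for_language(SAMPLE_INFO, "fr-FR"))
```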

Request format

The service is invoked with an HTTPS POST request carrying a list of [parameter, value] pairs, which must include all required parameters.
The Content-Type header must match one of the following supported body formats: application/x-www-form-urlencoded, application/json or multipart/form-data.
Parameter names and values must be UTF-8 encoded.
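To illustrate two of the supported body formats, the sketch below encodes the same parameters as a form body and as a JSON body, both UTF-8 encoded, using only the standard library. The function names are hypothetical, and the sketch assumes that a JSON body is a flat object mapping parameter names to values.

```python
import json
from urllib.parse import urlencode


def encode_form(params: dict) -> tuple:
    """Return (headers, body) for an application/x-www-form-urlencoded request."""
    # Non-ASCII characters are percent-encoded from their UTF-8 bytes.
    body = urlencode(params, encoding="utf-8").encode("ascii")
    return {"Content-Type": "application/x-www-form-urlencoded"}, body


def encode_json(params: dict) -> tuple:
    """Return (headers, body) for an application/json request."""
    body = json.dumps(params, ensure_ascii=False).encode("utf-8")
    return {"Content-Type": "application/json"}, body


params = {"text": "Été chaud", "voice": "Helene_NTTS"}
form_headers, form_body = encode_form(params)
json_headers, json_body = encode_json(params)
print(form_body)
```

Either pair can be handed to an HTTP client together with the Authorization header.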

Mark-up of the text

Marking up the text to be vocalized lets you dynamically influence the behaviour of the speech synthesis system, for example by changing the voice, controlling the flow, or changing the volume. Two mark-up formats are available:

The standardized SSML format, described in the document VOX31_SSML_reference_manual
The in-house baratinoo format, described in the document VOX32_Baratinoo_tags_reference_manual
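When plain text is wrapped in SSML, characters such as & and < must be escaped so that the document stays well-formed XML. The sketch below wraps text in the speak root element defined by the W3C SSML standard. The function name is hypothetical, and which elements and attributes the service actually accepts is specified in VOX31_SSML_reference_manual, not here.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"  # namespace of W3C SSML


def to_ssml(text: str, language: str) -> str:
    """Wrap plain text in a minimal SSML document, escaping XML special characters."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NAMESPACE}" '
        f"xml:lang={quoteattr(language)}>{escape(text)}</speak>"
    )


print(to_ssml("Tom & Jerry", "en-GB"))
```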

Token generation

Account setup

Go to ws.voxygen.fr/gettoken/password
Enter the username provided by Voxygen. You will then receive an email to set your password.

Once you have set your password, you can generate your token by following these steps:

Generate a token

Go to ws.voxygen.fr/gettoken and sign in with your credentials
Click Generate token
Copy and store the generated token securely
