Voxygen Cloud
Service Documentation

Authentication

Prerequisites

To authenticate with the Voxygen Cloud Service, you must have an active Voxygen Cloud account and a valid access token.
If you do not yet have a token, see the Token generation section below for the complete procedure.


Authenticate to the service with a token

Include your token in the Authorization header of each request, as follows: Authorization: Bearer [ACCESS_TOKEN]

Quickstart

The following example shows how to call the Voxygen Cloud Text-to-Speech API using Python and save the generated audio file locally.


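    import requests

    url = "https://ws.voxygen.fr/tts"

    # Every request carries the access token in the Authorization header.
    headers = {
        "Authorization": "Bearer YOUR_ACCESS_TOKEN"
    }

    # Required speech synthesis parameters: text, voice and language.
    data = {
        "text": "Hello world",
        "voice": "Judith_NTTS",
        "language": "en-GB"
    }

    # Passing a dict as data sends an application/x-www-form-urlencoded body.
    response = requests.post(url, data=data, headers=headers)

    if response.ok:
        # The response body is the generated audio; write it out as raw bytes.
        with open("output.wav", "wb") as f:
            f.write(response.content)
        print("Audio saved to output.wav")
    else:
        print("Error:", response.status_code, response.text)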

      

Text-to-Speech API

The Text-to-Speech (TTS) service converts the value of the text parameter into an audio file. The language of the text, and the name of the voice used to read the text, must be specified. The audio format depends on the header, coding and frequency parameters.


POST ws.voxygen.fr/tts

Below are the parameters for TTS requests.
They are divided into two groups: speech synthesis parameters that influence how the speech is generated, and audio output parameters that define the format of the resulting audio file.
If optional parameters are omitted, the server assigns default values depending on the account configuration.

Speech synthesis parameters

Parameter          Required  Value
text               yes       Text to be synthesized, UTF-8 encoded. Mark-up formats ssml and tags are available (max = 2000 characters).
voice              yes       Voice name to use for synthesis. Must be one of the voices available for the account.
                             (required only if the voice is not specified through mark-up)
language           yes       Language identifier as defined by IETF BCP 47 (e.g. fr-FR).
                             (required only if the language is not specified through mark-up)
volume             no        Set current volume to <volume>. Accepted values for <volume> are:
articulation-rate  no        Set speech rate to <articulation-rate>.
pause-rate         no        Set pause rate to <pause-rate>.
timbre             no        Set timbre coefficient to <timbre>.
pitch-height       no        Set pitch baseline to <pitch-height>.
pitch-range        no        Set pitch range to <pitch-range>.
normaliser         no        The name of a user normaliser file to be used. The user normalisers are stored in the user's cloud account.
lexicon            no        The name of a user lexicon file to be used. The user lexicons are stored in the user's cloud account.
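
The 2000-character limit on text means longer documents have to be sent as several requests. The sketch below shows one possible way to split plain text on sentence boundaries before calling the service. The helper name split_text is hypothetical, and the sketch assumes plain text only: how the service counts mark-up towards the limit is not stated here, so marked-up input would need its own handling.

```python
import re

MAX_TEXT_LENGTH = 2000  # limit stated for the text parameter


def split_text(text, limit=MAX_TEXT_LENGTH):
    """Split plain text into pieces of at most `limit` characters.

    Pieces end on sentence boundaries where possible (hypothetical helper).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

Each returned piece can then be sent as the text parameter of its own request, and the resulting audio segments played or concatenated in order.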

Audio output parameters

Parameter Required Value
frequency  no        Integer value, the sampling frequency in Hertz, between 6000 Hz and 48000 Hz. Default value: 24000 Hz.
header     no        Defines the audio container format. The generated audio file can be WAV or MP3.
coding     no        See the accepted header and coding combinations below.

The final format depends on the header and coding values:

application/octet-stream:
  • header=headerless
  • coding=lin or A or mu or LIN
audio/x-wav:
  • header=wav-header or wav-stream-header
  • coding=lin or A or mu or LIN
audio/mpeg:
  • header=headerless
  • coding=mp3:<bitrate>-<quality> with <bitrate> in {16, 32, 64, 96, 128, 160} and <quality> between 0 (best) and 9.
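
The header and coding combinations listed above can be checked on the client before a request is sent. The following is a minimal sketch that encodes only the combinations given in this section; the function name expected_content_type is hypothetical, and rejecting every other combination is an assumption of the sketch, not documented server behaviour.

```python
import re

PCM_CODINGS = {"lin", "A", "mu", "LIN"}
# mp3:<bitrate>-<quality>, e.g. "mp3:64-5"
MP3_CODING = re.compile(r"^mp3:(16|32|64|96|128|160)-[0-9]$")


def expected_content_type(header, coding):
    """Return the media type listed for a header/coding pair (hypothetical helper)."""
    if coding in PCM_CODINGS:
        if header == "headerless":
            return "application/octet-stream"
        if header in ("wav-header", "wav-stream-header"):
            return "audio/x-wav"
    elif MP3_CODING.match(coding) and header == "headerless":
        return "audio/mpeg"
    raise ValueError(f"combination not listed: header={header!r}, coding={coding!r}")
```

For example, expected_content_type("wav-header", "lin") returns "audio/x-wav", while an MP3 coding combined with wav-header raises ValueError.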

Important notes for streaming usage:

  • Select an audio format compatible with streaming: header must be headerless or wav-stream-header. With the wav-header format, the response is delayed until the full text has been processed, so that the correct signal length can be written in the header. To avoid this delay, use wav-stream-header; in this case, the signal length written in the header is fixed to 0xFFFFFFF.
  • Your HTTP client must handle streaming chunks as they are received to benefit from real-time playback.
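
The two notes above can be sketched with the requests library used in the Quickstart: the request asks for wav-stream-header, and the client consumes the body chunk by chunk instead of waiting for the full response. The function names, the 4096-byte chunk size and the 30-second timeout are arbitrary choices for this illustration. Writing to a file stands in for handing the chunks to an audio player.

```python
TTS_URL = "https://ws.voxygen.fr/tts"


def write_chunks(chunks, path):
    """Write audio chunks to `path` as they arrive; return the number of bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total


def stream_tts(text, voice, language, token, path="output.wav"):
    """Request streamable WAV audio and save it while it is being received."""
    import requests  # third-party; imported here so write_chunks has no dependencies

    data = {
        "text": text,
        "voice": voice,
        "language": language,
        "header": "wav-stream-header",  # streaming-compatible header
        "coding": "lin",
    }
    headers = {"Authorization": f"Bearer {token}"}
    # stream=True stops requests from downloading the whole body up front.
    with requests.post(TTS_URL, data=data, headers=headers,
                       stream=True, timeout=30) as response:
        response.raise_for_status()
        return write_chunks(response.iter_content(chunk_size=4096), path)
```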

Response modes

By default, the service outputs the generated speech directly in the HTTP response body. The audio can therefore be streamed, which minimizes latency in your application.

Alternatively, the client application can send the following header: Accept: application/json.
In this mode, the service returns a JSON object that includes the URL where the audio can be retrieved. The response structure is as follows:

    {
        "url": "https://ws.voxygen.fr/ws/audio/abcdefg",  // URL of the generated audio file
        "warnings": [],                                    // optional warning messages
        "events": []                                       // optional event data (see 'event' parameter below)
    }
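
As an illustration of the JSON response mode, the sketch below sends a request with the Accept header and reads the three fields shown above. The function names and the timeout are assumptions of this example. Whether retrieving the returned URL also requires the Authorization header is not stated in this section, so the download step is left out.

```python
import json

TTS_URL = "https://ws.voxygen.fr/tts"


def parse_tts_json(body):
    """Return (url, warnings, events) from a JSON-mode response body."""
    payload = json.loads(body)
    # warnings and events are optional, so fall back to empty lists.
    return payload["url"], payload.get("warnings", []), payload.get("events", [])


def request_audio_url(text, voice, language, token):
    """Ask the service for a JSON response instead of the audio itself."""
    import requests  # third-party; imported here so parse_tts_json has no dependencies

    response = requests.post(
        TTS_URL,
        data={"text": text, "voice": voice, "language": language},
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return parse_tts_json(response.text)
```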
                                

Events

An additional parameter, event, is available when requesting a JSON response. It adds event information for synchronization to the JSON response. The event parameter takes an integer value from 1 to 3; each value corresponds to a different level of information.

Value  Events
1      Marker
2      Events of level 1 + Edge of a sentence, Silence, Voice change, Word separator, Punctuation
3      Events of level 2 + Syllable, Viseme
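
Because the event parameter only applies to JSON responses, it is convenient to build the Accept header and the event level together. The helper below is a hypothetical sketch that does so and rejects levels outside 1 to 3 on the client side. The structure of individual event entries is not described in this section, so the sketch does not attempt to interpret them.

```python
def build_json_request(text, voice, language, token, event=None):
    """Return (headers, data) for a TTS request in JSON response mode."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",  # selects the JSON response mode
    }
    data = {"text": text, "voice": voice, "language": language}
    if event is not None:
        if event not in (1, 2, 3):
            raise ValueError("event must be 1, 2 or 3")
        data["event"] = str(event)
    return headers, data
```

The returned pair can be passed to requests.post as headers=headers and data=data, in the same way as in the Quickstart.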

Account information API

POST ws.voxygen.fr/tts/info

This request gives details about the account.
The response is an application/json object with the following structure (shown with example values):

    {
        // List of voices
        "voices": [
            {   // each voice is described in an object
                "name": "Helene_NTTS",   // the voice name, which is used as a value for parameter voice
                "language": "fr-FR",     // language identifier as defined by IETF BCP 47
                "gender": "female"       // "male" or "female"
            },
            {
                "name": "Judith_NTTS",
                "language": "en-GB",
                "gender": "female"
            },
            {
                "name": "Quentin_NTTS",
                "language": "fr-FR",
                "gender": "male"
            }
        ],

        // user lexicons
        "lexicons": [
            "french.bin",
            "english.bin"
        ]
    }
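
A common use of this response is to find which voices are available for a given language before calling the TTS API. The sketch below groups voice names by language from an already decoded response (for example the result of response.json() with the requests library). The helper name voices_by_language is hypothetical.

```python
from collections import defaultdict


def voices_by_language(info):
    """Map each language identifier to the list of voice names available for it."""
    grouped = defaultdict(list)
    for voice in info.get("voices", []):
        grouped[voice["language"]].append(voice["name"])
    return dict(grouped)


# Example values taken from the response structure above.
sample = {
    "voices": [
        {"name": "Helene_NTTS", "language": "fr-FR", "gender": "female"},
        {"name": "Judith_NTTS", "language": "en-GB", "gender": "female"},
        {"name": "Quentin_NTTS", "language": "fr-FR", "gender": "male"},
    ]
}
print(voices_by_language(sample))
```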
                                

Request format

The service is invoked through an HTTPS POST request that carries the required parameters as a list of [parameter, value] pairs.
The Content-Type header must match one of the following supported body formats: application/x-www-form-urlencoded, application/json or multipart/form-data.
Parameter names and values must be UTF-8 encoded.
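
To make the body formats concrete, the sketch below encodes the same parameters as application/x-www-form-urlencoded and as application/json using only the Python standard library, with UTF-8 applied in both cases. The helper names are hypothetical, and multipart/form-data is left out for brevity.

```python
import json
from urllib.parse import urlencode

params = {"text": "Bonjour à tous", "voice": "Helene_NTTS", "language": "fr-FR"}


def form_body(parameters):
    """Return (Content-Type, body) for a form-encoded request."""
    # Non-ASCII characters are percent-encoded from their UTF-8 bytes.
    body = urlencode(parameters, encoding="utf-8").encode("ascii")
    return "application/x-www-form-urlencoded", body


def json_body(parameters):
    """Return (Content-Type, body) for a JSON request."""
    body = json.dumps(parameters, ensure_ascii=False).encode("utf-8")
    return "application/json", body
```

With the requests library, passing data=params produces the form-encoded body and json=params the JSON body, and the matching Content-Type header is set automatically in both cases.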

Mark-up of the text

Marking up the text to vocalize lets you influence the behaviour of the speech synthesis system dynamically: change the voice, regulate the flow, change the volume, and so on. Two mark-up formats are available:

  1. The standardized SSML format, described in the document VOX31_SSML_reference_manual
  2. The baratinoo in-house format, described in the document VOX32_Baratinoo_tags_reference_manual

Token generation

Account setup

Once you have set your password, you can generate your token by following these steps:

Generate a token

  • Go to ws.voxygen.fr/gettoken and sign in with your credentials
  • Click Generate token
  • Copy and store the generated token securely
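
One possible way to follow the last step is to keep the token out of source code and read it from an environment variable at run time. The sketch below assumes a variable named VOXYGEN_ACCESS_TOKEN; that name is hypothetical and not defined by the service.

```python
import os

TOKEN_ENV_VAR = "VOXYGEN_ACCESS_TOKEN"  # hypothetical variable name


def auth_headers(env=os.environ):
    """Build the Authorization header from a token stored in the environment."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of each request.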