Speech vendors in Fonoster provide speech-to-text and text-to-speech services for voice applications: they convert a caller's speech into text, and text into speech.

Overview

Both AUTOPILOT and EXTERNAL applications use speech APIs. The following example uses the SDK to create an application with both speech services configured:

const SDK = require("@fonoster/sdk");

const client = new SDK.Client({ accessKeyId: "WO000000000000000000000000000000" });

const appConfig = {
  name: "Custom Voice App",
  type: "EXTERNAL",
  endpoint: "welcome.demo.fonoster.local",
  speechToText: {
    productRef: "stt.deepgram",
    config: {
      languageCode: "en-US"
    }
  },
  textToSpeech: {
    productRef: "tts.deepgram",
    config: {
      voice: "aura-asteria-en"
    }
  }
}

client.loginWithApiKey("AP0eerv2g7qow3e950k7twu4rvydcunq3k", "fNc...")
  .then(async() => new SDK.Applications(client).createApplication(appConfig))
  .catch(console.error);

In the example above, we create a new voice application using the SDK. The application uses Deepgram for both speech-to-text and text-to-speech, with the “aura-asteria-en” voice for text-to-speech.

In Fonoster, you can mix and match speech vendors to suit your needs. For example, you can use Deepgram for speech-to-text and Google for text-to-speech.
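As a minimal sketch of mixing vendors, the configuration below pairs Deepgram for speech-to-text with Google for text-to-speech. The "tts.google" product reference and the "en-US-Neural2-F" voice name are assumptions for illustration and are not confirmed by this page. The vendorOf helper is hypothetical and only there to show which vendor each section points to.

```javascript
// Sketch only: each speech section carries its own productRef,
// so the two sections can point at different vendors.
const mixedSpeechConfig = {
  name: "Mixed Vendors App",
  type: "EXTERNAL",
  endpoint: "welcome.demo.fonoster.local",
  speechToText: {
    productRef: "stt.deepgram",
    config: { languageCode: "en-US" }
  },
  textToSpeech: {
    productRef: "tts.google", // assumed product reference for Google
    config: { voice: "en-US-Neural2-F" } // assumed voice name
  }
};

// Hypothetical helper: returns the vendor part of a productRef
// such as "stt.deepgram".
function vendorOf(productRef) {
  return productRef.split(".")[1];
}

console.log(vendorOf(mixedSpeechConfig.speechToText.productRef));
console.log(vendorOf(mixedSpeechConfig.textToSpeech.productRef));
```

The resulting object can be passed to createApplication in the same way as appConfig in the first example.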

Configuring speech-to-text

The speechToText object allows you to define the speech-to-text engine to use. The speech-to-text engine is responsible for converting the caller’s speech into text.

The speechToText object has the productRef and config properties. The productRef property identifies the speech-to-text vendor you want to use. The config property is an object that contains the configuration settings for the speech-to-text engine. The configuration settings vary depending on the vendor.

Currently, only Deepgram is supported as a speech-to-text vendor, but we are working on adding more vendors.

Deepgram configuration

Deepgram is a speech-to-text vendor that provides high-quality transcription services. The Deepgram configuration supports the languageCode and model properties. The languageCode property is the language code of the speech you want to transcribe. The model property is the model to use for transcription and defaults to nova-2-phonecall.

The Autopilot supports the models nova-2, nova-2-phonecall, and nova-2-conversationalai.

Example of a Deepgram configuration for Spanish:

const appConfig = {
  // ...
  speechToText: {
    productRef: "stt.deepgram",
    config: {
      model: "nova-2",
      languageCode: "es"
    }
  },
  // ...
}

For any languageCode other than en-US, you need to use the nova-2 model.
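The model rules above can be sketched as a small helper that builds the speechToText object. This is an illustration, not part of the SDK: the function name deepgramSpeechToText and its error messages are hypothetical, and the list of models is the one this page gives for the Autopilot.

```javascript
// Models this page lists as supported by the Autopilot.
const SUPPORTED_MODELS = ["nova-2", "nova-2-phonecall", "nova-2-conversationalai"];
// Default model according to this page.
const DEFAULT_MODEL = "nova-2-phonecall";

// Hypothetical helper: builds a speechToText object for Deepgram and
// applies the rule that languages other than en-US need nova-2.
function deepgramSpeechToText(languageCode, model) {
  const isUsEnglish = languageCode === "en-US";
  // With no explicit model, pick the default that fits the language.
  const resolved = model || (isUsEnglish ? DEFAULT_MODEL : "nova-2");

  if (!SUPPORTED_MODELS.includes(resolved)) {
    throw new Error(`Unsupported model: ${resolved}`);
  }
  if (!isUsEnglish && resolved !== "nova-2") {
    throw new Error(`Language ${languageCode} requires the nova-2 model`);
  }

  return {
    productRef: "stt.deepgram",
    config: { model: resolved, languageCode }
  };
}

console.log(deepgramSpeechToText("es"));
console.log(deepgramSpeechToText("en-US"));
```

Failing early in the helper keeps an invalid language and model pair from reaching createApplication.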

Please refer to the Deepgram documentation for more information.

Configuring text-to-speech

The textToSpeech object allows you to define the text-to-speech engine. The text-to-speech engine is responsible for converting the Autopilot’s responses into speech.

The textToSpeech object has the productRef and config properties. The productRef property identifies the text-to-speech vendor you want to use. The config property is an object that contains the configuration settings for the text-to-speech engine. The configuration settings vary depending on the vendor.

We currently support Google, Azure, Deepgram, and ElevenLabs as text-to-speech vendors.

Most vendors support only the voice property, a string that identifies the voice to use for text-to-speech. The available voices depend on the vendor.

Please visit the vendor’s documentation for more information on the available voices.

In addition to the voice property, the ElevenLabs vendor supports the model property. The model property is the model to use for text-to-speech and defaults to eleven_flash_v2_5. Please refer to the ElevenLabs documentation for additional information about the available models.

Example of a text-to-speech configuration for ElevenLabs:

const appConfig = {
  // ...
  textToSpeech: {
    productRef: "tts.elevenlabs",
    config: {
      voice: "CaJslL1xziwefCeTNzHv",
      model: "eleven_flash_v2_5"
    }
  },
  // ...
}
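To show how the voice and model properties fit together across vendors, here is a minimal sketch of a helper that builds the textToSpeech object. The function name textToSpeechConfig is hypothetical. It also assumes that every vendor's product reference follows the "tts.<vendor>" pattern, which this page confirms only for Deepgram and ElevenLabs.

```javascript
// Default ElevenLabs model according to this page.
const ELEVENLABS_DEFAULT_MODEL = "eleven_flash_v2_5";

// Hypothetical helper: builds a textToSpeech object.
// Assumes the productRef pattern "tts.<vendor>" holds for all vendors.
function textToSpeechConfig(vendor, voice, model) {
  const config = { voice };

  // Only ElevenLabs is described as accepting a model property.
  if (vendor === "elevenlabs") {
    config.model = model || ELEVENLABS_DEFAULT_MODEL;
  }

  return { productRef: `tts.${vendor}`, config };
}

console.log(textToSpeechConfig("elevenlabs", "CaJslL1xziwefCeTNzHv"));
console.log(textToSpeechConfig("deepgram", "aura-asteria-en"));
```

For vendors other than ElevenLabs the helper ignores the model argument, so the config carries only the voice.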

Available voices by vendor

The following links provide information on the available voices for each vendor:

If you need a non-default ElevenLabs voice, please let us know, and we will add it for you.
