# Speech vendors

> Speech-to-text and text-to-speech vendors in Fonoster.

Speech vendors provide the speech-to-text and text-to-speech services that Fonoster applications use to convert a caller's speech into text and text responses into speech.

## Overview

Both *AUTOPILOT* and *EXTERNAL* applications use speech APIs. The following example uses the SDK to create an application with speech-to-text and text-to-speech configured:

```javascript theme={"system"}
const SDK = require("@fonoster/sdk");

const client = new SDK.Client({ accessKeyId: "WO000000000000000000000000000000" });

const appConfig = {
  name: "Custom Voice App",
  type: "EXTERNAL",
  endpoint: "welcome.demo.fonoster.local",
  speechToText: {
    productRef: "stt.deepgram",
    config: {
      languageCode: "en-US"
    }
  },
  textToSpeech: {
    productRef: "tts.deepgram",
    config: {
      voice: "aura-asteria-en"
    }
  }
}

client.loginWithApiKey("AP0eerv2g7qow3e950k7twu4rvydcunq3k", "fNc...")
  .then(async() => new SDK.Applications(client).createApplication(appConfig))
  .catch(console.error);
```

The example above creates a new voice application with the SDK. The application uses Deepgram for both speech-to-text and text-to-speech, with the `aura-asteria-en` voice for text-to-speech.

In Fonoster, you can mix and match speech vendors to suit your needs. For example, you can use Deepgram for speech-to-text and Google for text-to-speech.
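As a minimal sketch of mixing vendors, the configuration below pairs Deepgram speech-to-text with Google text-to-speech. The `tts.google` reference is an assumption based on the `tts.<vendor>` naming seen elsewhere on this page, and the voice name is only illustrative; check the vendor's voice list before using it.

```javascript
// Speech-to-text and text-to-speech are configured independently,
// so each block can point at a different vendor.
const mixedVendorConfig = {
  name: "Custom Voice App",
  type: "EXTERNAL",
  endpoint: "welcome.demo.fonoster.local",
  speechToText: {
    productRef: "stt.deepgram",
    config: { languageCode: "en-US" }
  },
  textToSpeech: {
    // Assumption: "tts.google" follows the "tts.<vendor>" pattern.
    productRef: "tts.google",
    // Assumption: illustrative voice name taken from Google's voice list.
    config: { voice: "en-US-Neural2-F" }
  }
};

console.log(JSON.stringify(mixedVendorConfig, null, 2));
```

The object can then be passed to `createApplication` in the same way as the `appConfig` shown earlier.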

## Configuring speech-to-text

The *speechToText* object allows you to define the speech-to-text engine to use. The speech-to-text engine is responsible for converting the caller's speech into text.

The *speechToText* object has the *productRef* and *config* properties. The *productRef* property identifies the speech-to-text vendor you want to use. The *config* property is an object that contains the configuration settings for the speech-to-text engine. The configuration settings vary depending on the vendor.
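To make the expected shape concrete, here is a small sketch of a check that could run before an application is created. `isSpeechBlock` is a hypothetical helper, not part of the Fonoster SDK, and it only verifies the two properties described above.

```javascript
// Hypothetical check (not part of the Fonoster SDK): verifies that a speech
// block has a non-empty string productRef and a plain-object config.
function isSpeechBlock(block) {
  return (
    block !== null &&
    typeof block === "object" &&
    typeof block.productRef === "string" &&
    block.productRef.length > 0 &&
    block.config !== null &&
    typeof block.config === "object" &&
    !Array.isArray(block.config)
  );
}

const speechToText = {
  productRef: "stt.deepgram",
  config: { languageCode: "en-US" }
};

console.log(isSpeechBlock(speechToText));
console.log(isSpeechBlock({ productRef: "stt.deepgram" }));
```

The check does not validate vendor-specific settings inside *config*, since those vary by vendor.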

Currently, only *Deepgram* is supported as a speech-to-text vendor, but we are working on adding more vendors.

### Deepgram configuration

*Deepgram* is a speech-to-text vendor that provides high-quality transcription services. Deepgram supports the *languageCode* as well as *model* properties. The *languageCode* property is the language code of the speech you want to transcribe. The *model* property is the model to use for transcription and defaults to `nova-2-phonecall`.

The Autopilot supports the models `nova-2`, `nova-2-phonecall`, `nova-2-conversationalai`, and `nova-3`.

Example of a Deepgram configuration for Spanish:

```javascript theme={"system"}
const appConfig = {
  // ...other application settings
  speechToText: {
    productRef: "stt.deepgram",
    config: {
      model: "nova-2",
      languageCode: "es"
    }
  }
  // ...other application settings
};
```

<Note>
  For a *languageCode* other than `en-US`, use the `nova-2` model.

  Please refer to the [Deepgram documentation](https://developers.deepgram.com/docs/models-languages-overview) for more information.
</Note>
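The model rule in the note can be sketched as a small helper. `buildDeepgramStt` is a hypothetical function, not part of the Fonoster SDK. It assumes the default model stated above for `en-US` and falls back to `nova-2` for every other language code.

```javascript
// Default Deepgram model named in this section.
const DEFAULT_DEEPGRAM_MODEL = "nova-2-phonecall";

// Hypothetical helper (not part of the Fonoster SDK): builds a speechToText
// block, switching to "nova-2" for any language other than en-US.
function buildDeepgramStt(languageCode = "en-US") {
  const model = languageCode === "en-US" ? DEFAULT_DEEPGRAM_MODEL : "nova-2";
  return {
    productRef: "stt.deepgram",
    config: { model, languageCode }
  };
}

console.log(buildDeepgramStt("es"));
console.log(buildDeepgramStt());
```

Keeping the rule in one place avoids pairing a non-English language code with the phone-call default by accident.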

## Configuring text-to-speech

The *textToSpeech* object allows you to define the text-to-speech engine. The text-to-speech engine is responsible for converting the Autopilot's responses into speech.

The *textToSpeech* object has the *productRef* and *config* properties. The *productRef* property identifies the text-to-speech vendor you want to use. The *config* property is an object that contains the configuration settings for the text-to-speech engine. The configuration settings vary depending on the vendor.

We currently support *Google*, *Azure*, *Deepgram*, and *ElevenLabs* as text-to-speech vendors.

Most vendors support only the *voice* property, a string that identifies the voice to use. The available voices depend on the vendor.

Please visit the vendor's documentation for more information on the available voices.

<Tip>
  In addition to the *voice* property, the *ElevenLabs* vendor supports the *model* property. The *model* property is the model to use for text-to-speech and defaults to `eleven_flash_v2_5`.
  Please refer to the [ElevenLabs documentation](https://elevenlabs.io/docs/developer-guides/models) for additional information about the available models.
</Tip>

Example of a text-to-speech configuration for *ElevenLabs*:

```javascript theme={"system"}
const appConfig = {
  // ...other application settings
  textToSpeech: {
    productRef: "tts.elevenlabs",
    config: {
      voice: "CaJslL1xziwefCeTNzHv",
      model: "eleven_flash_v2_5"
    }
  }
  // ...other application settings
};
```
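Because the *model* property is optional for ElevenLabs, a configuration can be built from the voice alone. The sketch below uses a hypothetical helper, `buildElevenLabsTts`, which is not part of the Fonoster SDK. It fills in the documented default explicitly so the resulting object shows which model applies.

```javascript
// Default ElevenLabs model named in this section.
const DEFAULT_ELEVENLABS_MODEL = "eleven_flash_v2_5";

// Hypothetical helper (not part of the Fonoster SDK): builds a textToSpeech
// block for ElevenLabs, using the default model when none is given.
function buildElevenLabsTts(voice, model = DEFAULT_ELEVENLABS_MODEL) {
  if (typeof voice !== "string" || voice.length === 0) {
    throw new TypeError("voice must be a non-empty string");
  }
  return {
    productRef: "tts.elevenlabs",
    config: { voice, model }
  };
}

console.log(buildElevenLabsTts("CaJslL1xziwefCeTNzHv"));
```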

### Available voices by vendor

The following links provide information on the available voices for each vendor:

* [Deepgram](https://developers.deepgram.com/docs/tts-models)
* [ElevenLabs](https://elevenlabs.io/docs/product/voices/default-voices)
* [Google](https://cloud.google.com/text-to-speech/docs/voices)
* [Azure](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support)

<Warning>
  If you need a non-default *ElevenLabs* voice, please let us know, and we will add it for you.
</Warning>
