Programmable Voice in Fonoster allows you to control the flow of a phone call using a set of verbs. Verbs work in conjunction with the VoiceServer to create a voice application.

Overview

The following is an example of how to create an application in Fonoster using the SDK:

create-app.js
const SDK = require("@fonoster/sdk");

const client = new SDK.Client({ accessKeyId: "WO000000000000000000000000000000" });

const appConfig = {
  name: "Custom Voice App",
  type: "EXTERNAL",
  endpoint: "welcome.demo.fonoster.local", // Demo application
  speechToText: {
    productRef: "stt.deepgram",
    config: {
      languageCode: "en-US"
    }
  },
  textToSpeech: {
    productRef: "tts.deepgram",
    config: {
      voice: "aura-asteria-en"
    }
  }
};

// Log in with an API key, then create the application.
client.loginWithApiKey("AP0eerv2g7qow3e950k7twu4rvydcunq3k", "fNc...")
  .then(() => new SDK.Applications(client).createApplication(appConfig))
  .catch(console.error);

The example above creates a new voice application with the SDK. The application uses Deepgram for both speech-to-text and text-to-speech, with the “aura-asteria-en” voice for speech synthesis.

So far, however, Fonoster knows only the speech configuration and the location of the application, which is given by the endpoint property.

You also need to run a VoiceServer using your application’s logic.

The Voice Server

The VoiceServer works similarly to an Express server. It accepts requests and returns responses. The VoiceServer processes verbs and executes the desired actions.

The following example runs a VoiceServer in Fonoster:

voice-server.js
const VoiceServer = require("@fonoster/voice").default;

new VoiceServer().listen(async (req, response) => {
  // Verbs go here
  await response.answer();
  await response.say("Hello World!");
  await response.hangup();
});

As with Express, the request object gives you access to information about the call. For example, req.callerNumber holds the caller’s phone number.
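As a minimal sketch of reading call information, the handler below greets the caller using req.callerNumber. The name greetCaller and the fallback text are hypothetical, and the sketch assumes only the answer, say, and hangup verbs shown above, with say accepting a plain string.

```javascript
// Hypothetical handler. To use it, pass it to the server:
//   new VoiceServer().listen(greetCaller);
async function greetCaller(req, response) {
  // req.callerNumber is the caller's phone number; the fallback is an
  // assumption for calls where no number is available.
  const caller = req.callerNumber || "an unknown number";

  await response.answer();
  await response.say(`Thanks for calling from ${caller}.`);
  await response.hangup();
}
```

Keeping the handler as a named function, rather than an inline callback, makes it possible to exercise the call flow with a stand-in response object before connecting it to a real VoiceServer.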

Verbs

Verbs are the building blocks of a voice application. They are used to control the flow of a phone call. Verbs are executed in the order they are called.

Here is a list of the available verbs in Fonoster:

  • Answer - Accepts an incoming call
  • Hangup - Closes the call
  • Play - Takes a URL with a media file and streams the sound back to the calling party
  • PlayDtmf - Takes a DTMF sequence and plays it back to the calling party
  • Say - Takes a text, synthesizes the text into audio, and streams back the result
  • Gather - Waits for DTMF or speech events and returns the result
  • SGather - Returns a stream for future DTMF and speech results
  • Stream - Creates a bidirectional stream to send and receive audio from a caller
  • Dial - Passes the call to an Agent or a Number on the PSTN
  • Record - Records the voice of the calling party and saves the audio on the Storage sub-system
  • Mute - Tells the channel to stop sending media, effectively muting the channel
  • Unmute - Tells the channel to allow media flow

The Answer verb should be the first verb in your application, so run any setup code before calling it. Similarly, the Hangup verb should be the last verb in your application.
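The ordering described above can be sketched as follows. The names loadGreeting and orderedCallHandler are hypothetical, and loadGreeting stands in for whatever setup your application needs. Wrapping the middle of the call in try/finally is a design choice of this sketch rather than a Fonoster requirement; it assumes hangup can still be called after another verb fails.

```javascript
// Hypothetical setup step (for example, loading configuration).
async function loadGreeting() {
  return "Welcome to the demo line.";
}

// Hypothetical handler; pass it to new VoiceServer().listen(...).
async function orderedCallHandler(req, response) {
  // Setup runs first, before any verb is issued.
  const greeting = await loadGreeting();

  await response.answer(); // first verb
  try {
    await response.say(greeting);
  } finally {
    // Last verb; runs even if say() rejects.
    await response.hangup();
  }
}
```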

Speech settings

Programmable Voice applications support a variety of speech-to-text and text-to-speech vendors. The speechToText and textToSpeech objects in the application configuration select the engine used for each.

You can mix and match vendors to suit your needs. For example, you can use Deepgram for speech-to-text and Google for text-to-speech. Please check the Speech Vendors section for more information on configuring speech-to-text and text-to-speech.
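As an illustration of mixing vendors, the configuration below pairs Deepgram speech-to-text with Google text-to-speech. Only "stt.deepgram" and its languageCode come from the earlier example; the "tts.google" product reference and the voice name are assumptions, so confirm both in the Speech Vendors section before relying on them.

```javascript
// Speech settings that combine two vendors.
const mixedSpeechConfig = {
  speechToText: {
    productRef: "stt.deepgram", // from the example earlier on this page
    config: { languageCode: "en-US" }
  },
  textToSpeech: {
    productRef: "tts.google", // assumed product reference
    config: { voice: "en-US-Neural2-F" } // assumed voice name
  }
};

// Merge the speech settings into the rest of the application configuration.
const mixedAppConfig = {
  name: "Custom Voice App",
  type: "EXTERNAL",
  endpoint: "welcome.demo.fonoster.local",
  ...mixedSpeechConfig
};
```

Keeping the speech settings in their own object makes it easy to swap one vendor without touching the rest of the application configuration.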

Exposing the VoiceServer with Ngrok

During development, you can use Ngrok to expose your VoiceServer to the internet. Ngrok creates a secure tunnel to your local machine. This allows you to test your voice application without deploying it to a server.

To use Ngrok, install it on your machine and run the following command:

ngrok tcp 50061

Replace 50061 with the port your VoiceServer is running on. Ngrok prints a forwarding address that you can use to reach your VoiceServer.
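A TCP tunnel is typically reported as an address of the form tcp://host:port. The hypothetical helper below strips the scheme to produce a host:port string. Using that string as the application's endpoint property is an assumption of this sketch (the earlier example shows an endpoint without a scheme), and the address shown is only a placeholder.

```javascript
// Hypothetical helper: converts an Ngrok TCP forwarding address into "host:port".
function toEndpoint(forwardingAddress) {
  const match = /^tcp:\/\/([^:\/]+):(\d+)$/.exec(forwardingAddress.trim());
  if (!match) {
    throw new Error(`Unexpected forwarding address: ${forwardingAddress}`);
  }
  return `${match[1]}:${match[2]}`;
}

// Placeholder address; Ngrok may assign a different one on each run.
const ngrokEndpoint = toEndpoint("tcp://0.tcp.ngrok.io:12345");
console.log(ngrokEndpoint); // prints "0.tcp.ngrok.io:12345"
```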

See NPM for details

For full documentation, please visit NPM.