Programmable Voice
Verb-based voice call control in Fonoster.
Programmable Voice in Fonoster allows you to control the flow of a phone call using a set of verbs. Verbs work in conjunction with the VoiceServer to create a voice application.
Overview
The following is an example of how to create an application in Fonoster using the SDK:
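A minimal sketch of such a request, under stated assumptions: only the endpoint property, the speechToText and textToSpeech objects, Deepgram, and the "aura-asteria-en" voice come from this page. The type, productRef, model, and languageCode fields, their values, and the createApplication method name are assumptions for illustration, and the apps argument stands in for the SDK's applications client.

```javascript
// Sketch of an application request. Field names other than `endpoint`,
// `speechToText`, and `textToSpeech` are assumptions, not confirmed API.
const request = {
  name: "My application",
  type: "EXTERNAL", // assumed: the logic lives outside Fonoster, in your VoiceServer
  endpoint: "localhost:50061", // hypothetical location of the VoiceServer
  speechToText: {
    productRef: "stt.deepgram", // assumed vendor identifier
    config: { model: "nova-2", languageCode: "en-US" } // assumed settings
  },
  textToSpeech: {
    productRef: "tts.deepgram", // assumed vendor identifier
    config: { voice: "aura-asteria-en" }
  }
};

// Local sanity check: returns the names of required fields that are missing.
function missingFields(req) {
  return ["name", "endpoint", "speechToText", "textToSpeech"].filter(
    (key) => req[key] === undefined
  );
}

// `apps` is injected so the sketch does not depend on a specific SDK version;
// `createApplication` is an assumed method name.
async function createApp(apps, req) {
  const missing = missingFields(req);
  if (missing.length > 0) {
    throw new Error(`Missing fields: ${missing.join(", ")}`);
  }
  return apps.createApplication(req);
}

console.log(missingFields(request)); // prints []
```

Injecting the client keeps the request itself easy to inspect and test before it is sent to Fonoster.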
In the example above, we created a new voice application using the SDK. The application is configured to use Deepgram for speech-to-text and text-to-speech. The application is also configured to use the “aura-asteria-en” voice for text-to-speech.
So far, however, we have only told Fonoster the speech configuration and where the application lives, which is what the endpoint property represents.
To handle calls, you also need to run a VoiceServer that implements your application’s logic.
The Voice Server
The VoiceServer works similarly to an Express server. It accepts requests and returns responses. The VoiceServer processes verbs and executes the desired actions.
An example of running a VoiceServer in Fonoster:
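A minimal sketch of a call handler, assuming the Answer, Say, and Hangup verbs are exposed as voice.answer, voice.say, and voice.hangup. Those method names are assumptions, and the fakeVoice helper is a hypothetical stand-in so the handler can run without a live call.

```javascript
// Call handler sketch. The method names on `voice` are assumed to mirror
// the Answer, Say, and Hangup verbs.
async function handler(req, voice) {
  await voice.answer();
  await voice.say(`Hello! You are calling from ${req.callerNumber}.`);
  await voice.hangup();
}

// Hypothetical stand-in for the real voice object: it records each verb
// instead of acting on a call.
function fakeVoice(log) {
  return {
    answer: async () => { log.push("answer"); },
    say: async (text) => { log.push(`say: ${text}`); },
    hangup: async () => { log.push("hangup"); }
  };
}

const log = [];
handler({ callerNumber: "+15555550100" }, fakeVoice(log)).then(() => {
  console.log(log.join("\n"));
});
```

With the real server, a function like handler would be registered with the VoiceServer rather than called directly; the exact registration call is not shown here.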
Like with Express, you can use the request object to access information about the call. For example, you can access the caller’s phone number with req.callerNumber.
Verbs
Verbs are the building blocks of a voice application. They are used to control the flow of a phone call. Verbs are executed in the order they are called.
Here is a list of the available verbs in Fonoster:
- Answer - Accepts an incoming call
- Hangup - Closes the call
- Play - Takes a URL with a media file and streams the sound back to the calling party
- PlayDtmf - Takes a DTMF sequence and plays it back to the calling party
- Say - Takes a text, synthesizes it into audio, and streams back the result
- Gather - Waits for DTMF or speech events and returns the result
- SGather - Returns a stream for future DTMF and speech results
- Stream - Creates a bidirectional stream to send and receive audio from a caller
- Dial - Passes the call to an Agent or a Number at the PSTN
- Record - Records the voice of the calling party and saves the audio on the Storage sub-system
- Mute - Tells the channel to stop sending media, effectively muting the channel
- Unmute - Tells the channel to allow media flow
Run any setup code before calling the Answer verb. The Answer verb should be the first verb in your application. Similarly, the Hangup verb should be the last in your application.
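The ordering rule can be sketched as follows, assuming the verbs are exposed as methods such as voice.answer, voice.say, voice.play, and voice.hangup (assumed names). The loadGreeting helper, the media URL, and the recordingVoice stub are hypothetical and exist only to make the order visible.

```javascript
// Hypothetical setup step: pick a greeting before the call is answered.
async function loadGreeting(callerNumber) {
  return callerNumber.startsWith("+1") ? "Welcome!" : "Hello!";
}

async function handleCall(req, voice) {
  const greeting = await loadGreeting(req.callerNumber); // setup runs first
  await voice.answer(); // first verb
  await voice.say(greeting);
  await voice.play("https://example.com/notice.wav"); // hypothetical media URL
  await voice.hangup(); // last verb
}

// Hypothetical stub that records the order in which verbs are called.
function recordingVoice() {
  const calls = [];
  const verb = (name) => async () => { calls.push(name); };
  return {
    calls,
    answer: verb("answer"),
    say: verb("say"),
    play: verb("play"),
    hangup: verb("hangup")
  };
}

const voice = recordingVoice();
handleCall({ callerNumber: "+15555550100" }, voice).then(() => {
  console.log(voice.calls.join(" -> ")); // prints "answer -> say -> play -> hangup"
});
```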
Speech settings
Programmable Voice applications support a variety of speech-to-text and text-to-speech vendors. The speechToText and textToSpeech objects allow you to define the speech-to-text and text-to-speech engines to use.
You can mix and match vendors to suit your needs. For example, you can use Deepgram for speech-to-text and Google for text-to-speech. Please check the Speech Vendors section for more information on configuring speech-to-text and text-to-speech.
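As an illustration of mixing vendors, here is a sketch that pairs Deepgram speech-to-text with Google text-to-speech. The productRef identifiers, the config keys, the voice name, and the endpoint value are assumptions; the Speech Vendors section is the place to confirm the real values.

```javascript
// Speech settings that mix two vendors. All identifiers below are assumed.
const speechSettings = {
  speechToText: {
    productRef: "stt.deepgram", // assumed identifier for Deepgram STT
    config: { model: "nova-2", languageCode: "en-US" } // assumed settings
  },
  textToSpeech: {
    productRef: "tts.google", // assumed identifier for Google TTS
    config: { voice: "en-US-Neural2-F" } // assumed voice name
  }
};

// Merge the speech settings into an application request.
const mixedRequest = {
  name: "Mixed vendors demo",
  endpoint: "localhost:50061", // hypothetical VoiceServer location
  ...speechSettings
};

// Derive the vendor from an identifier such as "stt.deepgram".
const vendorOf = (engine) => engine.productRef.split(".")[1];

console.log(vendorOf(mixedRequest.speechToText), vendorOf(mixedRequest.textToSpeech));
// prints "deepgram google"
```

Keeping the speech settings in their own object makes it straightforward to swap one vendor without touching the rest of the request.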
Exposing the VoiceServer with Ngrok
During development, you can use Ngrok to expose your VoiceServer to the internet. Ngrok creates a secure tunnel to your local machine. This allows you to test your voice application without deploying it to a server.
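To use Ngrok, install it on your machine and start a tunnel to your VoiceServer's port. For a VoiceServer listening on port 50061, a TCP tunnel would be started with: ngrok tcp 50061
Replace 50061 with the port your VoiceServer is running on. Ngrok will provide you with a URL that you can use to access your VoiceServer.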
See NPM for details
For full documentation, please visit NPM.