Bidirectional streams are the foundation of modern voice applications. They give you granular control over call audio, letting you process it in real time. In Fonoster, bidirectional streams are implemented by the Stream verb.

The Stream verb

The Stream verb lets you create a bidirectional stream to send and receive audio from a caller. Unlike other verbs in Fonoster, the Stream verb is asynchronous, meaning it does not block the execution of subsequent verbs in the script.
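The following is a minimal, self-contained model of that non-blocking behavior, not Fonoster code. MockResponse and MockStream are hypothetical stand-ins. They only mimic the shape used in the example later on this page (a stream() call that resolves to an object with onPayload), so you can observe the order of events without a live call.

```javascript
// Hypothetical stand-ins for illustration only; these are NOT Fonoster classes.
class MockStream {
  constructor() {
    this.listeners = [];
  }

  // Register a callback that receives each audio payload.
  onPayload(listener) {
    this.listeners.push(listener);
  }

  // Simulate audio arriving from the caller.
  emit(data) {
    for (const listener of this.listeners) listener({ data });
  }
}

class MockResponse {
  constructor(log) {
    this.log = log;
    this.streams = [];
  }

  // Resolves as soon as the stream is set up; audio keeps flowing afterwards.
  async stream() {
    const stream = new MockStream();
    this.streams.push(stream);
    this.log.push("stream opened");
    return stream;
  }

  // Models a verb that the handler awaits until it finishes.
  async say(text) {
    this.log.push(`say: ${text}`);
  }
}

async function handler(response) {
  const stream = await response.stream();
  stream.onPayload((payload) => response.log.push(`audio: ${payload.data.length} bytes`));

  // Runs right away; the open stream does not hold up the next verb.
  await response.say("Hello, please speak now.");
}

async function main() {
  const log = [];
  const response = new MockResponse(log);
  await handler(response);

  // Audio that arrives after the handler has moved on still reaches the listener.
  response.streams[0].emit([1, 2, 3, 4]);
  console.log(log);
  return log;
}

main();
```

The log shows the say step completing before any audio is handled. Once the stream is set up, the handler moves on, and payloads keep arriving through the callback.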

A common use case for the Stream verb is real-time transcription. In that scenario, your application forwards the caller's audio from the stream to a transcription service, which returns the transcribed text.

Example using the Stream verb

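The most critical parts of the code are the call to response.stream(), which opens the stream, and the stream.onPayload() callback, which forwards each audio payload to Deepgram.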

voice-server.js
const VoiceServer = require("@fonoster/voice").default;
const { createClient, LiveTranscriptionEvents } = require("@deepgram/sdk");

const deepgram = createClient("YOUR_DEEPGRAM_API_KEY");

async function handler(request, response) {
  await response.answer();

  const stream = await response.stream({ direction: "OUT" });

  const connection = deepgram.listen.live({
    model: "nova-2-phonecall",
    language: "en-US",
    encoding: "linear16",
    sample_rate: 16000
  });

  connection.on(LiveTranscriptionEvents.Transcript, (data) => {
    // Optional chaining on alternatives avoids a TypeError when a result has no alternatives
    const transcript = data?.channel?.alternatives?.[0]?.transcript;
    if (transcript && data.speech_final) {
      console.log("Transcription:", transcript);
    }
  });

  connection.on(LiveTranscriptionEvents.Open, () => {
    stream.onPayload(payload => connection.send(Buffer.from(payload.data)));
  });

  connection.on(LiveTranscriptionEvents.Error, (err) => {
    console.error("Deepgram error:", err);
    connection.destroy();
  });

  // Await the verb so it finishes before the rest of the handler runs
  await response.say("Hello, please speak now.");

  // TODO: Add the rest of the call logic here, including hangup
}

new VoiceServer().listen(handler);

In the example above, we created a VoiceServer that listens for incoming calls. Upon receiving a call, the server answers and initializes a new stream configured to receive audio from the caller.

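Next, the handler opens a live connection to a transcription service (Deepgram in this case). Once the connection opens, each audio payload from the stream is forwarded to Deepgram, which returns transcripts. The handler logs only final results (those with speech_final set).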
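In the example, stream.onPayload is registered only after Deepgram's Open event fires, so audio that arrives before the connection opens may never reach the transcriber. One way to avoid that, sketched below, is to subscribe to the stream right away and queue payloads until the connection is open. createAudioRelay and finalTranscript are hypothetical helpers, not part of Fonoster or the Deepgram SDK. The fields that finalTranscript reads (channel.alternatives[0].transcript and speech_final) are the same ones the example already uses.

```javascript
// Hypothetical helper: queues audio chunks until the transcriber is ready,
// then forwards them in arrival order. `send` is whatever delivers a chunk,
// e.g. (chunk) => connection.send(chunk) in the example above.
function createAudioRelay(send) {
  let isOpen = false;
  const pending = [];

  return {
    // Call from stream.onPayload(...) as soon as the stream exists.
    push(data) {
      const chunk = Buffer.from(data);
      if (isOpen) {
        send(chunk);
      } else {
        pending.push(chunk);
      }
    },

    // Call from the transcriber's Open event; flushes everything queued so far.
    open() {
      isOpen = true;
      while (pending.length > 0) {
        send(pending.shift());
      }
    },

    pendingCount() {
      return pending.length;
    },
  };
}

// Hypothetical helper: returns the transcript text only for final results,
// or null for interim, empty, or malformed results.
function finalTranscript(result) {
  const text = result?.channel?.alternatives?.[0]?.transcript;
  return text && result.speech_final ? text : null;
}

// Usage with fake data (no network involved).
const sent = [];
const relay = createAudioRelay((chunk) => sent.push(chunk));
relay.push([1, 2]);    // queued: transcriber not open yet
relay.push([3]);       // queued
relay.open();          // flushes both queued chunks, in order
relay.push([4, 5, 6]); // forwarded immediately

console.log(sent.map((chunk) => chunk.length)); // [ 2, 1, 3 ]
console.log(finalTranscript({ channel: { alternatives: [{ transcript: "hello" }] }, speech_final: true })); // hello
console.log(finalTranscript({ channel: { alternatives: [{ transcript: "hel" }] }, speech_final: false })); // null
```

In the handler, you would create the relay after the Deepgram connection, call stream.onPayload((payload) => relay.push(payload.data)) immediately, and call relay.open() inside the LiveTranscriptionEvents.Open listener. The queue is unbounded, so a production version would likely cap it or drop old audio if the connection never opens.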

The Stream verb does not yet support the “IN” direction (audio sent toward the caller). To send audio to the caller, use the Play and Say verbs instead.
