The Autopilot is currently in overview mode and the KnowledgeBase features have been disabled.

Fonoster’s Autopilot is a component within the platform that allows you to create powerful conversational experiences. It is built on top of Fonoster Programmable Voice and uses the latest advances in Large Language Models (LLMs) to provide a natural and engaging experience.

Overview

The following is an example of creating an Autopilot application using the SDK.

First, add the following content to a file named appConfig.yaml:

appConfig.yaml
name: "Awesome Autopilot"
type: "AUTOPILOT"
speechToText:
  productRef: "stt.deepgram"
  config:
    model: "nova-3"
    languageCode: "en-US"
textToSpeech:
  productRef: "tts.deepgram"
  config:
    voice: "aura-asteria-en"
intelligence:
  productRef: "llm.groq"
  config:
    conversationSettings:
      firstMessage: "Hello, I'm your AI assitant."
      systemPrompt: |
        You are a Customer Service Representative. You are here to help the caller with their needs.
      goodbyeMessage: "Thank you so much, bye!"
      systemErrorMessage: "I'm sorry, I didn't understand that. Can you please repeat it?"
      idleOptions:
        message: "Are you still there?"
    languageModel:
      provider: "groq"
      model: "llama-3.3-70b-versatile"
      maxTokens: 240
      temperature: 0.4

Then, create the application as follows:

fonoster applications:create --from-file appConfig.yaml

Similarly, to update the application, you can use the applications:update command with the --from-file flag, as shown below.
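
For example, a sketch of the update command, assuming the CLI accepts the application reference as a positional argument (check the CLI help for the exact usage):

fonoster applications:update <application-ref> --from-file appConfig.yaml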

General configuration

The Autopilot configuration is divided into a general section and three sub-sections: speechToText, textToSpeech, and intelligence.

The general section contains name, type, and endpoint properties.

The name property is the name of the Autopilot application. The type property is the type of the application, which should always be set to AUTOPILOT. The endpoint is an optional property allowing you to specify the endpoint for self-hosted Autopilots.
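
For example, a minimal general section for a self-hosted Autopilot might look like this (the endpoint value below is a hypothetical placeholder):

name: "Awesome Autopilot"
type: "AUTOPILOT"
endpoint: "my-autopilot.example.com:50061" # hypothetical address of a self-hosted Autopilot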

Speech settings

Autopilot applications support a variety of speech-to-text and text-to-speech vendors. The speechToText and textToSpeech objects allow you to define the speech-to-text and text-to-speech engines to use.

You can mix and match vendors to suit your needs. For example, you can use Deepgram for speech-to-text and Google for text-to-speech. Please check the Speech Vendors section for more information on configuring speech-to-text and text-to-speech.
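
For instance, a sketch combining Deepgram speech-to-text with Google text-to-speech (the tts.google product reference and the voice name are assumptions; see the Speech Vendors section for the exact values):

speechToText:
  productRef: "stt.deepgram"
  config:
    model: "nova-3"
    languageCode: "en-US"
textToSpeech:
  productRef: "tts.google" # assumed product reference for Google text-to-speech
  config:
    voice: "en-US-Neural2-F" # hypothetical Google voice name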

Conversational settings

The conversationSettings object allows you to define the Autopilot’s conversational behavior. The conversation settings are independent of the language model used.

The following is a list of the supported settings:

| Setting | Description | Default Value |
|---|---|---|
| firstMessage | The first message the Autopilot will say when the conversation starts | |
| systemPrompt | A prompt that describes the behavior of the Autopilot and sets the context of the conversation | |
| systemErrorMessage | The message the Autopilot will say when an error occurs | |
| maxSessionDuration | Maximum length of the session (in milliseconds) before it is automatically terminated, regardless of activity | 1800000 (30 minutes) |
| maxSpeechWaitTimeout | The maximum amount of time (in milliseconds) to wait for the user to begin speaking before sending the captured audio for processing | 0 |
| initialDtmf | A DTMF to play prior to starting the conversation | |
| allowUserBargeIn | Determines whether the user can interrupt the voice agent while it is speaking | true |
| transferOptions | The options to transfer the call to a live agent | |
| transferOptions.phoneNumber | The phone number to transfer the call to | |
| transferOptions.message | The message to play before transferring the call | |
| transferOptions.timeout | Time to wait for a transfer answer before the transfer attempt is considered failed | 30000 |
| idleOptions | The options to handle idle time during the conversation | |
| idleOptions.message | The message to play after the idle time is reached | |
| idleOptions.timeout | Duration of user inactivity (in milliseconds) before the system triggers an idle event | 30000 |
| idleOptions.maxTimeoutCount | The maximum number of times the idle message will be played before hanging up the call | 2 |
| vad | The voice activity detection settings | |
| vad.activationThreshold | See VAD section | 0.4 |
| vad.deactivationThreshold | See VAD section | 0.25 |
| vad.debounceFrames | See VAD section | 4 |

A few noteworthy settings include the maxSpeechWaitTimeout, initialDtmf, idleOptions, and vad.

Max Speech Wait Timeout

The maxSpeechWaitTimeout property allows you to specify the maximum time in milliseconds to wait for the caller before returning the speech-to-text result. If the caller does not speak within the specified time, the speech-to-text engine will return the result.

A value that is too low may result in the speech-to-text engine returning the result before the caller finishes speaking. A value that is too high may result in the speech-to-text engine waiting too long for the caller to speak.
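
For example, to wait up to five seconds for the caller to begin speaking (the value below is only illustrative):

conversationSettings:
  maxSpeechWaitTimeout: 5000 # milliseconds; illustrative value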

Initial DTMF

Sometimes, users will use call forwarding to reach the number in Fonoster. Some telephony service providers require a Dual-tone multi-frequency (DTMF) to be played before connecting the call. The initialDtmf property allows you to specify a DTMF to play when the session starts.
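
For example, to play a single digit as soon as the session starts (the digit below is only illustrative):

conversationSettings:
  initialDtmf: "1" # DTMF to play when the session starts; illustrative value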

Voice Activity Detection (VAD)

The vad object allows you to configure the voice activity detection settings. Voice activity detection is used to detect when the caller is speaking and when they are not speaking.

The vad object has three properties: activationThreshold, deactivationThreshold, and debounceFrames. The activationThreshold property controls the threshold at which voice activity detection activates, the deactivationThreshold property controls the threshold at which it deactivates, and the debounceFrames property sets the number of frames used to debounce the detection.

A lower activation threshold will make the detection more sensitive to the caller’s speech. A higher activation threshold will make detecting voice activity less sensitive to the caller’s speech.

A lower deactivation threshold will result in more aggressive voice activity detection deactivation. A higher deactivation threshold will result in less aggressive voice activity detection deactivation.

The debounceFrames parameter introduces a delay mechanism that ensures that transitions between “speech” and “non-speech” states are stable and not too sensitive to small fluctuations in the input audio signal. Here’s how it works:

By requiring multiple consecutive frames (debounceFrames) to confirm speech or non-speech, the system filters out short bursts of noise or brief gaps in speech that might otherwise cause erratic state changes.
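
As a reference, a vad block that uses the default values from the table above would look like this:

conversationSettings:
  vad:
    activationThreshold: 0.4 # higher values make speech detection less sensitive
    deactivationThreshold: 0.25 # higher values make deactivation less aggressive
    debounceFrames: 4 # consecutive frames required to confirm a state change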

Language model configuration

The languageModel object allows you to define the language model the Autopilot uses. The language model is responsible for generating responses to the user’s input.

The following is a list of the supported settings:

| Setting | Description |
|---|---|
| provider | Model provider |
| model | The model to use. The available models depend on the provider |
| maxTokens | The maximum number of tokens the language model can generate in a single response |
| temperature | The randomness of the language model. A higher temperature will result in more random responses |
| knowledgeBase | A list of knowledge bases to use for the language model |
| tools | A list of tools to use for the language model |

LLM providers and models

The Autopilot supports multiple language model providers. The following is a list of the supported providers:

| Provider | Description | Supported models |
|---|---|---|
| OpenAI | OpenAI provides various GPT models for conversational AI | gpt-4o, gpt-4o-mini, gpt-3.5-turbo, gpt-4-turbo |
| Groq | Groq offers high-performance AI models optimized for speed | llama-3.3-70b-versatile |
| Google | Google offers various LLM models for conversational AI | gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.0-pro-exp-02-05 |
| Anthropic | Anthropic offers various LLM models for conversational AI | claude-3-5-haiku-latest, claude-3-7-sonnet-latest |

We have noticed that Groq models, particularly llama-3.3-70b-versatile, often require more specific prompting for effective tool usage. Also, Google's gemini-2.0-flash-lite does not support tool calling. We will share best practices for more consistent behavior as we gain more insights.

Knowledge bases

Coming soon…

Tools

Fonoster’s Autopilot allows you to use tools to enhance the conversational experience. Tools are used to perform specific actions during the conversation.

Built-in tools

The following is a list of built-in tools available for an agent:

| Tool | Description |
|---|---|
| hangup | A tool to end the conversation |
| transfer | A tool to transfer the call to a live agent |
| hold | A tool to put the call on hold (Coming soon) |

Custom tools

You can add custom tools under intelligence.config.languageModel.tools, which is an array where each tool is defined as an object. These tools enable your assistant to interact with external services, APIs, or execute specific actions.

Each tool must follow the tool schema for consistency and compatibility.

The following example demonstrates how to add a custom tool that fetches available appointment times for a specific date:

name: getAvailableTimes
description: Get available appointment times for a specific date.
requestStartMessage: "I'm looking for available appointment times for the date you provided."
parameters:
  type: object
  properties:
    date:
      type: string
      format: date-time # only 'enum' and 'date-time' are supported
  required:
    - date
operation:
  method: get
  url: https://api.example.com/appointment-times
  headers:
    x-api-key: your-api-key

The response from your endpoint must be a JSON object containing a result property. For example: { "result": "We have open slots for Thursday and Friday." }

Key Components of a Tool Definition:

  • name: A unique identifier for the tool
  • description: A brief explanation of what the tool does
  • requestStartMessage: The message sent when the tool is triggered
  • parameters: Defines the expected input structure in accordance with the JSON Schema standard, which is also required for OpenAI-compatible tool calling
    • type: Defines the structure of the input (typically object)
    • properties: Specifies the fields expected in the input
    • required: Lists the fields that must be provided
  • operation:
    • method: The HTTP method (get and post are supported)
    • url: The endpoint to send the request to
    • headers: Any necessary headers, such as authentication keys

For additional details, refer to the tool schema documentation.

Use operation.method “post” for POST requests. If you want the Autopilot to wait for POST requests to complete, set operation.waitForResponse to true. For “get” requests, the Autopilot will wait for the response by default.
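
For example, a sketch of a POST tool operation that waits for the request to complete (the URL and header values are placeholders):

operation:
  method: post
  url: https://api.example.com/appointments # hypothetical endpoint
  waitForResponse: true # wait for the POST request to complete before continuing
  headers:
    x-api-key: your-api-key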

Autopilot’s Test Cases

Test cases are an experimental feature and the behavior might change in the future.

The Autopilot supports automated testing through test cases defined in the configuration. Test cases allow you to verify the behavior of your Autopilot before deploying it to production.

The following is an example of creating a test case for Fonoster Autopilot:

testCases:
  evalsLanguageModel:
    provider: openai
    model: gpt-4o
    apiKey: sk-proj-REDACTED
  scenarios:
    - ref: test-case-1
      description: Test Case 1 Description
      telephonyContext:
        callDirection: FROM_PSTN
        ingressNumber: '+1234567890'
        callerNumber: '+1234567890'
      conversation:
        - userInput: 'Hi, can you tell me what''s in the menu?'
          expected:
            text:
              type: similar
              response: |
                We have a variety of sandwiches, salads, and drinks. Anything
                in particular you're looking for?
        - userInput: 'Nevermind, I want to speak to a human'
          expected:
            text:
              type: similar
              response: |
                I'll transfer you to a human representative. Please hold while I
                connect you.
            tools:
              - tool: transfer
                parameters: {}

Evaluation Language Model

The evalsLanguageModel section defines the model used to evaluate test cases:

| Setting | Description |
|---|---|
| provider | Evaluation provider |
| model | The OpenAI model to use for evaluations |
| apiKey | The API key for the evaluation model |

The evaluation model is separate from the model used in actual conversations. This separation allows for consistent evaluation results regardless of the production model being used.

Test Scenarios

Each test scenario represents a complete conversation flow. The scenario includes:

  • ref: A unique identifier for the test case
  • description: A brief description of what the test case verifies
  • telephonyContext: Emulates the context of a real phone call with the following properties:
    • callDirection: The direction of the call (e.g., “FROM_PSTN”)
    • ingressNumber: The number being called
    • callerNumber: The number making the call

This information is available to the AI model to help it understand the context of the call and may be used in your prompts.

Conversation Turns

Each scenario contains a series of conversation turns. A turn represents a single interaction between the user and the Autopilot, consisting of:

| Component | Description |
|---|---|
| userInput | The text representing what the user says |
| expected | The expected response from the Autopilot |
| expected.text | The expected text response from the Autopilot |
| expected.text.type | The type of comparison used to evaluate the text response (for example, similar) |
| expected.text.response | The expected response text from the Autopilot |
| expected.tools | The expected tools to be used in the response |
| expected.tools.tool | The name of the tool to be used |
| expected.tools.parameters | The expected parameters for the tool |

Use type: "similar" for text responses to allow for natural language variations in the Autopilot’s responses while maintaining the same semantic meaning.

The expected object can validate:

  • Text responses via the text property:

    text:
      type: "similar"
      response: "Expected response..."
    
  • Tool usage via the tools property:

    tools:
      - tool: "toolName"
        parameters:
          param1: "value1"
          param2: "valid-date" # Special keyword to test for a valid date
    
