What is an AI Voice Agent?

An AI Voice Agent is a software application that interacts with users through spoken input and output, typically combining Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS).

Why are AI Voice Agents important in the insurance industry?

They streamline customer service by handling tasks such as answering policy questions and assisting with claims, which improves customer satisfaction and reduces the workload on human agents.

What are the core components of an AI Voice Agent?

The core components are Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS).

How do I generate a VideoSDK meeting ID?

Send a POST request to the VideoSDK rooms endpoint with `curl`, authenticating with your VideoSDK API key (see Step 4.1).

What should I do if I encounter API key errors?

Ensure all API keys are correctly configured in your `.env` file.

Build an AI Voice Agent for Insurance

Step-by-step guide to building an AI Voice Agent for the insurance industry using VideoSDK.

Introduction to AI Voice Agents in the Insurance Industry

AI Voice Agents are software applications designed to interact with users through voice commands and responses. These agents combine Speech-to-Text (STT), Large Language Models (LLMs), and Text-to-Speech (TTS) to transcribe user input, generate appropriate responses, and deliver them in a human-like voice.

In the insurance industry, AI Voice Agents play a pivotal role by streamlining customer service operations. They can handle a variety of tasks, including answering policy-related questions, assisting with claims processes, and providing general insurance advice. This not only enhances customer satisfaction by providing instant support but also reduces the workload on human agents.

Core Components of a Voice Agent

STT (Speech-to-Text): Converts spoken language into text.
LLM (Large Language Model): Interprets the transcribed text and generates a response.
TTS (Text-to-Speech): Converts the response text back into spoken language.
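The three stages form a chain in which each component's output is the next one's input: audio becomes text, text becomes a reply, and the reply becomes audio again. The sketch below expresses that contract with typing.Protocol and toy stand-ins so it runs without any external service. The names SpeechToText, LanguageModel, TextToSpeech, the Fake* classes and run_turn are hypothetical and are not VideoSDK classes; in this tutorial's real pipeline the roles are filled by DeepgramSTT, OpenAILLM and ElevenLabsTTS.

```python
from typing import Protocol


# Hypothetical interfaces describing what each stage consumes and produces.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def respond(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Toy stand-ins so the sketch runs offline.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        # Pretend the "audio" bytes are already UTF-8 text.
        return audio.decode("utf-8")


class FakeLLM:
    def respond(self, text: str) -> str:
        # A keyword rule standing in for a real model.
        if "claim" in text.lower():
            return "I can walk you through filing a claim."
        return "How can I help with your policy?"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        # Pretend the encoded text is synthesized speech.
        return text.encode("utf-8")


def run_turn(audio: bytes, stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech) -> bytes:
    """One conversational turn: speech in, speech out."""
    transcript = stt.transcribe(audio)
    reply_text = llm.respond(transcript)
    return tts.synthesize(reply_text)


reply = run_turn(b"I need to file a claim", FakeSTT(), FakeLLM(), FakeTTS())
print(reply.decode("utf-8"))  # I can walk you through filing a claim.
```

Because each stage only depends on the interface of its neighbours, any one provider can be swapped without touching the other two.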

For a comprehensive understanding of these components, refer to the AI Voice Agent core components overview.

What You'll Build in This Tutorial

In this tutorial, you will build a fully functional AI Voice Agent tailored for the insurance industry using the VideoSDK framework. This agent will be capable of understanding and responding to insurance-related queries, providing users with a seamless interaction experience.

Architecture and Core Concepts

High-Level Architecture Overview

The AI Voice Agent architecture involves a seamless flow of data from user speech to agent response. The process begins with capturing the user's voice input, which is then converted to text using STT. The text is processed by an LLM to generate a suitable response, which is finally converted back to speech using TTS.

sequenceDiagram
    participant User
    participant Agent
    participant STT
    participant LLM
    participant TTS
    User->>Agent: Speak
    Agent->>STT: Convert Speech to Text
    STT-->>Agent: Text
    Agent->>LLM: Process Text
    LLM-->>Agent: Response Text
    Agent->>TTS: Convert Text to Speech
    TTS-->>Agent: Speech
    Agent->>User: Respond

Understanding Key Concepts in the VideoSDK Framework

Agent: The core class representing your bot, responsible for managing interactions.
CascadingPipeline: Manages the flow of audio through STT, LLM, and TTS in sequence. Learn more about the cascading pipeline in AI Voice Agents.
VAD & TurnDetector: Voice Activity Detection (VAD) detects when the user is speaking, and the TurnDetector decides when the user has finished their turn, so the agent knows when to listen and when to speak. Discover more about the Turn detector for AI Voice Agents.
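To make the division of labour concrete, here is a minimal sketch that turns per-frame VAD speech probabilities into end-of-turn decisions. It assumes a VAD that emits one speech probability per audio frame (as models such as Silero do), and it substitutes a simple silence-counting rule for the end-of-turn decision; VideoSDK's TurnDetector uses its own model, so detect_turn_end and silence_frames_to_end are hypothetical. The default vad_threshold of 0.35 is borrowed from the SileroVAD setting used later in this tutorial and is treated here only as a probability cut-off.

```python
from typing import Iterable, List


def detect_turn_end(
    speech_probs: Iterable[float],
    vad_threshold: float = 0.35,
    silence_frames_to_end: int = 3,
) -> List[int]:
    """Return the frame indices at which a user turn is considered finished.

    A frame counts as speech when its probability is >= vad_threshold.
    A turn ends after `silence_frames_to_end` consecutive non-speech frames
    that follow at least one speech frame.
    """
    turn_ends: List[int] = []
    in_speech = False   # has the user started talking in this turn?
    silent_run = 0      # consecutive non-speech frames since the last speech
    for i, p in enumerate(speech_probs):
        if p >= vad_threshold:
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= silence_frames_to_end:
                turn_ends.append(i)  # long enough pause: hand the turn to the agent
                in_speech = False
                silent_run = 0
    return turn_ends


# Two utterances separated by silence
probs = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.05, 0.6, 0.9, 0.1, 0.1, 0.1]
print(detect_turn_end(probs))  # [6, 11]
```

A short pause inside a sentence (fewer silent frames than silence_frames_to_end) does not end the turn, which is the behaviour that keeps an agent from interrupting mid-sentence.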

Setting Up the Development Environment

Prerequisites

Before you begin, ensure you have Python 3.11+ installed and a VideoSDK account. You can sign up at app.videosdk.live.

Step 1: Create a Virtual Environment

To avoid conflicts with other projects, create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
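Before installing packages, you can confirm both prerequisites from Python itself. This is a small sketch using only the standard library; the file name (for example a hypothetical check_env.py) and the helper names are illustrative. It relies on the documented behaviour that inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base interpreter.

```python
import sys
from typing import Tuple


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix differs from sys.base_prefix.
    return sys.prefix != sys.base_prefix


def python_ok(minimum: Tuple[int, int] = (3, 11)) -> bool:
    # sys.version_info compares element-wise with a plain tuple.
    return sys.version_info >= minimum


if __name__ == "__main__":
    status = "ok" if python_ok() else "too old"
    print(f"Python {sys.version.split()[0]} (3.11+ required: {status})")
    print(f"Virtual environment active: {in_virtualenv()}")
```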

Step 2: Install Required Packages

Install the necessary packages using pip:

pip install videosdk-agents videosdk-plugins-silero videosdk-plugins-turn-detector videosdk-plugins-deepgram videosdk-plugins-openai videosdk-plugins-elevenlabs

Step 3: Configure API Keys in a `.env` File

Create a .env file in your project directory and add your API keys:

VIDEOSDK_API_KEY=your_videosdk_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
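To check that every key is actually present, the sketch below parses a simple KEY=value .env file with the standard library, optionally copies the values into os.environ, and reports anything missing. It is a minimal stand-in that assumes plain KEY=value lines; the python-dotenv package is the common choice when you need full .env support (quoting rules, export prefixes, interpolation). The helper names are hypothetical.

```python
import os
from typing import Dict, Iterable, List, Mapping

REQUIRED_KEYS = (
    "VIDEOSDK_API_KEY",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Copy the file's values into os.environ without overriding existing ones."""
    with open(path, encoding="utf-8") as f:
        values = parse_env(f.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]


sample = "VIDEOSDK_API_KEY=abc\nDEEPGRAM_API_KEY=def\n# OPENAI_API_KEY=\nELEVENLABS_API_KEY=ghi\n"
print(missing_keys(parse_env(sample)))  # ['OPENAI_API_KEY']
```

In a real project you would call load_env_file() once at startup and then missing_keys(os.environ) to fail fast with a readable message.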

Building the AI Voice Agent: A Step-by-Step Guide

Let's start by presenting the complete code for the AI Voice Agent, which we'll break down in the following subsections.

import asyncio
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# Pre-download the Turn Detector model
pre_download_model()

agent_instructions = "You are an AI Voice Agent specialized in the insurance industry, designed to assist users with their insurance-related inquiries. Your persona is that of a knowledgeable and friendly insurance advisor. Your primary capabilities include answering questions about different types of insurance policies, explaining coverage details, assisting with claims processes, and providing general advice on selecting suitable insurance plans. You can also guide users on how to contact human agents for more complex issues. However, you must adhere to certain constraints: you are not a licensed insurance agent, so you cannot provide personalized financial advice or make binding commitments. Always remind users to consult with a licensed insurance professional for specific advice and decisions. Your responses should be clear, concise, and informative, ensuring users feel supported and informed."

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )

    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

Step 4.1: Generating a VideoSDK Meeting ID

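To create a meeting ID in advance, run the following curl command. Passing the returned ID as room_id in RoomOptions (the commented-out line in make_context) makes the agent join that room; if room_id is omitted, a room is created automatically.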

curl -X POST "https://api.videosdk.live/v1/rooms" \
-H "Authorization: Bearer YOUR_VIDEOSDK_API_KEY"
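If you prefer to stay in Python, the same request can be built with urllib from the standard library. This sketch mirrors the curl command exactly (same URL, method and Authorization header) and assumes those are what your account expects; it returns the raw response body rather than parsing it, because the response format is not described here. The helper names are hypothetical.

```python
import urllib.request

# Endpoint and header copied from the curl example above.
ROOMS_URL = "https://api.videosdk.live/v1/rooms"


def build_create_room_request(api_key: str) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    return urllib.request.Request(
        ROOMS_URL,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def create_room(api_key: str, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    req = build_create_room_request(api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage, with the key taken from the environment: `print(create_room(os.environ["VIDEOSDK_API_KEY"]))` (requires `import os`). Splitting request construction from sending keeps the request easy to inspect without touching the network.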

Step 4.2: Creating the Custom Agent Class

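The MyVoiceAgent class defines the behavior of your AI Voice Agent. It inherits from the Agent class, passes the insurance-specific instructions to the base class, and overrides the on_enter and on_exit hooks so the agent greets the user when it joins and says goodbye when it leaves.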

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)
    async def on_enter(self):
        await self.session.say("Hello! How can I help?")
    async def on_exit(self):
        await self.session.say("Goodbye!")
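The hook pattern itself can be sketched without VideoSDK: a base class defines empty on_enter and on_exit coroutines, a subclass overrides them, and the session calls them when the agent joins and leaves. ToyAgent, ToySession and run_lifecycle are hypothetical stand-ins, not VideoSDK internals, and ToySession.say records text instead of synthesizing speech.

```python
import asyncio
from typing import List, Optional


class ToySession:
    """Stand-in for a session: records what the agent says instead of speaking."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    async def say(self, text: str) -> None:
        self.spoken.append(text)


class ToyAgent:
    """Minimal base class with overridable lifecycle hooks (default: do nothing)."""

    def __init__(self, instructions: str) -> None:
        self.instructions = instructions
        self.session: Optional[ToySession] = None

    async def on_enter(self) -> None:
        pass

    async def on_exit(self) -> None:
        pass


class InsuranceAgent(ToyAgent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a friendly insurance advisor.")

    async def on_enter(self) -> None:
        await self.session.say("Hello! How can I help?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")


async def run_lifecycle(agent: ToyAgent) -> List[str]:
    session = ToySession()
    agent.session = session
    await agent.on_enter()  # called once when the agent joins
    # ... conversation turns would happen here ...
    await agent.on_exit()   # called once when the agent leaves
    return session.spoken


print(asyncio.run(run_lifecycle(InsuranceAgent())))  # ['Hello! How can I help?', 'Goodbye!']
```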

Step 4.3: Defining the Core Pipeline

The CascadingPipeline is crucial because it defines how audio is processed. It includes components for STT, LLM, TTS, VAD, and turn detection. For more details, refer to the Voice Agent Quick Start Guide.

pipeline = CascadingPipeline(
    stt=DeepgramSTT(model="nova-2", language="en"),
    llm=OpenAILLM(model="gpt-4o"),
    tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
    vad=SileroVAD(threshold=0.35),
    turn_detector=TurnDetector(threshold=0.8)
)

Step 4.4: Managing the Session and Startup Logic

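The start_session function creates the agent, conversation flow, pipeline, and session, connects to the room, and keeps the session alive until the process is stopped. The finally block closes the session and shuts down the context even if startup fails.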

async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)
    pipeline = CascadingPipeline(...)  # same stt, llm, tts, vad and turn_detector arguments as in Step 4.3
    session = AgentSession(agent=agent, pipeline=pipeline, conversation_flow=conversation_flow)
    try:
        await context.connect()
        await session.start()
        # Block indefinitely so the session stays alive until the process is stopped
        await asyncio.Event().wait()
    finally:
        # Always release resources, even if startup fails or the task is cancelled
        await session.close()
        await context.shutdown()
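The try/finally structure is what keeps the agent online while still guaranteeing cleanup. The sketch below reproduces that pattern with plain asyncio, replacing the VideoSDK calls with log entries (run_until_cancelled and demo are hypothetical names). Waiting on an Event that is never set blocks until the task is cancelled; cancellation raises CancelledError at that await, so the finally block still runs.

```python
import asyncio
from typing import List


async def run_until_cancelled(log: List[str]) -> None:
    try:
        log.append("connected")        # stands in for context.connect()
        log.append("session started")  # stands in for session.start()
        await asyncio.Event().wait()   # never set: blocks until the task is cancelled
    finally:
        # Runs on cancellation as well as on errors
        log.append("session closed")     # stands in for session.close()
        log.append("context shut down")  # stands in for context.shutdown()


async def demo() -> List[str]:
    log: List[str] = []
    task = asyncio.create_task(run_until_cancelled(log))
    await asyncio.sleep(0.01)  # let the task reach the wait()
    task.cancel()              # simulate stopping the worker
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log


print(asyncio.run(demo()))
# ['connected', 'session started', 'session closed', 'context shut down']
```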

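The make_context function builds a JobContext from the room options, and the __main__ block wraps start_session and make_context in a WorkerJob and starts it. With playground=True, the agent can be tested in the browser playground described in Step 5.2.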

def make_context() -> JobContext:
    room_options = RoomOptions(
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

Running and Testing the Agent

Step 5.1: Running the Python Script

Save the complete code as main.py, then run it:

python main.py

Step 5.2: Interacting with the Agent in the Playground

Once the script is running, you will receive a playground link in the console. Open this link in a browser to interact with your AI Voice Agent.

Advanced Features and Customizations

Extending Functionality with Custom Tools

You can extend the agent by integrating custom tools through VideoSDK's function_tool feature, which lets the LLM call your own functions for specialized tasks such as looking up a policy or a claim.
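VideoSDK's function_tool has its own API, so the sketch below does not use it; instead it illustrates the general tool-calling idea with the standard library. A hypothetical @tool decorator registers functions, describe_tools builds the name, description and parameter list a model would be shown, and dispatch runs a call the model requests. get_claim_status and its data are invented for illustration.

```python
import inspect
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by function name.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_claim_status(claim_id: str) -> str:
    """Look up the status of an insurance claim (toy data)."""
    fake_db = {"CLM-1001": "approved", "CLM-1002": "under review"}
    return fake_db.get(claim_id, "not found")


def describe_tools() -> Dict[str, Dict[str, Any]]:
    """Name, docstring, and parameter names the LLM would be shown."""
    return {
        name: {
            "description": inspect.getdoc(fn),
            "parameters": list(inspect.signature(fn).parameters),
        }
        for name, fn in TOOLS.items()
    }


def dispatch(name: str, arguments: Dict[str, Any]) -> Any:
    """Execute a tool call requested by the model."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


print(dispatch("get_claim_status", {"claim_id": "CLM-1002"}))  # under review
```

Deriving the description from the docstring and signature keeps what the model sees in sync with the code it calls.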

Exploring Other Plugins

Consider experimenting with other plugins for STT, LLM, and TTS to enhance the agent's capabilities and tailor it to specific needs. For instance, explore the ElevenLabs TTS Plugin for voice agent and the Deepgram STT Plugin for voice agent.

Troubleshooting Common Issues

API Key and Authentication Errors

Ensure that all API keys are correctly configured in your .env file.

Audio Input/Output Problems

Check your microphone and speaker settings to ensure proper audio input and output.
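One quick way to rule out a muted or misrouted microphone is to record a short clip with any recorder and confirm it is not silent. The sketch below measures the RMS level of a 16-bit PCM WAV clip using the standard library's wave and array modules; make_tone only exists so the example runs without a recording, and the SILENCE_THRESHOLD value is a hypothetical cut-off you would tune for your own microphone. It handles uncompressed PCM WAV only.

```python
import array
import io
import math
import sys
import wave


def clip_rms(wav_bytes: bytes) -> float:
    """RMS level of a 16-bit PCM WAV clip, normalised to 0.0-1.0."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def make_tone(amplitude: float, seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Build a mono 440 Hz test tone as WAV bytes (stand-in for a recording)."""
    n = int(seconds * rate)
    samples = array.array(
        "h", (int(amplitude * 32767 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(n))
    )
    if sys.byteorder == "big":
        samples.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(samples.tobytes())
    return buf.getvalue()


SILENCE_THRESHOLD = 0.01  # hypothetical cut-off; tune for your microphone
print(clip_rms(make_tone(0.0)) < SILENCE_THRESHOLD)  # True: silent clip
print(clip_rms(make_tone(0.5)) > SILENCE_THRESHOLD)  # True: audible tone
```

For a real check, read your recording with `open("clip.wav", "rb").read()` and pass the bytes to clip_rms.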

Dependency and Version Conflicts

Make sure you are running Python 3.11+ inside the virtual environment, and that all installed packages are up to date and compatible with that Python version.
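To see exactly which versions of this tutorial's packages are installed in the active environment, you can query package metadata with importlib.metadata from the standard library (pip list or pip show give the same information from the shell). The PACKAGES tuple is the install list from Step 2; installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names from the pip install command in Step 2.
PACKAGES = (
    "videosdk-agents",
    "videosdk-plugins-silero",
    "videosdk-plugins-turn-detector",
    "videosdk-plugins-deepgram",
    "videosdk-plugins-openai",
    "videosdk-plugins-elevenlabs",
)


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:35} {ver or 'NOT INSTALLED'}")
```

A None entry usually means the script is running outside the virtual environment where the packages were installed.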

Conclusion

Summary of What You've Built

In this tutorial, you have built a fully functional AI Voice Agent for the insurance industry, capable of handling various insurance-related queries. For a quick setup, refer to the