Build an AI Voice Agent for Insurance with VideoSDK

Comprehensive tutorial to build and deploy an AI voice agent for insurance using VideoSDK, Python, and best-in-class plugins. Includes full code, setup, and testing.

Introduction to AI Voice Agents for Insurance

AI voice agents are transforming the way insurance companies interact with their customers. These intelligent virtual assistants use advanced speech and language technologies to understand, process, and respond to user queries in real time. In this tutorial, we'll explore how to build a robust AI voice agent tailored for the insurance industry using the VideoSDK AI Agents framework.

What is an AI Voice Agent?

An AI voice agent is a software application that can engage in spoken conversations with users. It leverages speech-to-text (STT) to convert spoken words into text, a large language model (LLM) to understand and generate responses, and text-to-speech (TTS) to reply in natural-sounding audio.

Why are they important for the insurance industry?

In the insurance sector, voice agents can:
  • Answer questions about policy coverage, claims, and quotes
  • Guide customers through the claims process
  • Provide 24/7 support and reduce call center workload
  • Improve customer satisfaction with quick, accurate responses
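The routing behind these use cases can be sketched as a tiny intent classifier. The keywords and intent names below are hypothetical stand-ins; a real agent would rely on the LLM (or a trained classifier) rather than substring matching.

```python
# Hypothetical keyword-to-intent map; a real deployment would use the
# LLM itself or a trained classifier rather than substring matching.
INTENTS = {
    "claim": "claims_guidance",
    "quote": "quote_request",
    "coverage": "policy_coverage",
}

def route(utterance: str) -> str:
    """Map a customer utterance to one of the support intents above."""
    text = utterance.lower()
    for keyword, intent in INTENTS.items():
        if keyword in text:
            return intent
    return "general_support"  # fall through to general 24/7 support

print(route("I'd like to file a claim"))
```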

Core Components of a Voice Agent

  • STT (Speech-to-Text): Transcribes user speech into text.
  • LLM (Large Language Model): Understands intent and generates responses.
  • TTS (Text-to-Speech): Converts the agent's reply back into audio.
To get a comprehensive understanding of these elements, refer to the AI voice Agent core components overview in the VideoSDK documentation.

What You'll Build in This Tutorial

By the end of this guide, you'll have a fully functional AI voice agent for insurance that you can test in the VideoSDK playground. The agent will be able to answer insurance-related questions, guide users through claims, and provide general support.

Architecture and Core Concepts

High-Level Architecture Overview

Let's look at how the voice agent processes a conversation:
  1. User speaks into their microphone.
  2. STT transcribes the audio to text.
  3. LLM analyzes the text and generates a response.
  4. TTS converts the response to audio.
  5. Agent delivers the reply back to the user.
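The five steps above can be condensed into a single turn handler. This is a conceptual sketch only: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for the real STT, LLM, and TTS stages the pipeline wires up later.

```python
# Conceptual sketch of one cascaded voice-agent turn.
# transcribe / generate_reply / synthesize are hypothetical stand-ins
# for real STT, LLM, and TTS services.

def transcribe(audio: bytes) -> str:
    # STT: a real service (e.g. Deepgram) would convert audio to text.
    return "What does my auto policy cover?"

def generate_reply(text: str) -> str:
    # LLM: a real model would reason over the transcript.
    return f"Let me look up coverage details for: {text}"

def synthesize(text: str) -> bytes:
    # TTS: a real service (e.g. ElevenLabs) would return audio.
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    transcript = transcribe(audio)      # step 2: speech -> text
    reply = generate_reply(transcript)  # step 3: text -> text
    return synthesize(reply)            # step 4: text -> audio

print(handle_turn(b"\x00\x01"))
```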

Understanding Key Concepts in the VideoSDK Framework

  • Agent: The main class that defines your AI assistant's behavior and persona.
  • CascadingPipeline: Manages the flow of audio and text through STT, LLM, TTS, VAD, and turn detection. For more details on how this works, check out the Cascading pipeline in AI voice Agents guide.
  • VAD (Voice Activity Detection): Detects when the user is speaking.
  • TurnDetector: Determines when it's the agent's turn to respond.
These components work together to create a seamless conversational experience.
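To build intuition for what the VAD stage does, here is a toy energy-threshold detector. A production VAD such as Silero uses a trained neural model whose threshold is a probability rather than raw energy; the 0.35 here simply mirrors the number used later in the pipeline configuration.

```python
def is_speech(frame, threshold=0.35):
    """Toy VAD: flag a frame as speech when its mean energy beats a threshold."""
    # frame is a list of audio samples in the range [-1.0, 1.0]
    energy = sum(sample * sample for sample in frame) / len(frame)
    return energy > threshold

# A loud frame versus near-silence
print(is_speech([0.9, -0.8, 0.7]), is_speech([0.01, -0.02, 0.01]))
```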

Setting Up the Development Environment

Prerequisites

Before you begin, ensure you have:
  • Python 3.11+ installed
  • A VideoSDK account (sign up at app.videosdk.live)
  • API keys for Deepgram (STT), ElevenLabs (TTS), and OpenAI (LLM)
If you're just getting started, the Voice Agent Quick Start Guide provides step-by-step instructions for setup.

Step 1: Create a Virtual Environment

Open your terminal and run:
python3.11 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Step 2: Install Required Packages

Install the necessary dependencies:
pip install videosdk-agents videosdk-plugins-openai videosdk-plugins-deepgram videosdk-plugins-elevenlabs videosdk-plugins-silero videosdk-plugins-turn-detector

Step 3: Configure API Keys in a .env File

Create a .env file in your project directory and add your API keys:
VIDEOSDK_API_KEY=your_videosdk_api_key
OPENAI_API_KEY=your_openai_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
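The VideoSDK plugins read these keys from environment variables. A common approach is the python-dotenv package, but if you prefer to stay dependency-free, a minimal stdlib-only loader looks like this (`load_env` is a helper defined here, not part of the SDK):

```python
import os

def load_env(path: str = ".env") -> None:
    """Load KEY=value pairs from a .env file into os.environ (stdlib only)."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # setdefault: real environment variables win over the file
                os.environ.setdefault(key.strip(), value.strip())

load_env()
print(os.getenv("VIDEOSDK_API_KEY"))
```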

Building the AI Voice Agent: A Step-by-Step Guide

Let's dive into the code! Here's the complete, runnable script for your insurance voice agent. Save it as main.py:
import asyncio
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# Pre-download the Turn Detector model so the first session starts quickly
pre_download_model()

agent_instructions = """You are an AI Voice Agent specializing in insurance services. Your persona is that of a knowledgeable, friendly, and patient insurance assistant. Your primary goal is to help users understand insurance products, answer questions about policy coverage, assist with claims processes, provide quotes, and guide users through common insurance procedures. You can explain different types of insurance (health, auto, home, life), clarify policy terms, and help users find relevant resources or contact information.

Capabilities:
- Answer questions about various insurance products and coverage options.
- Guide users through the process of filing a claim or checking claim status.
- Provide general information about premiums, deductibles, and policy benefits.
- Offer quotes or estimate premiums based on user input (if data is available).
- Direct users to appropriate resources or connect them with a human agent for complex issues.

Constraints and Limitations:
- You are not a licensed insurance agent and cannot provide legally binding advice or finalize policy sales.
- Do not collect or store sensitive personal information such as Social Security Numbers or payment details.
- Always recommend that users consult with a licensed insurance professional for personalized advice or to complete transactions.
- If unsure or if the user's request is outside your scope, politely suggest speaking with a human insurance representative.
- Maintain user privacy and confidentiality at all times."""

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
Let's break down each part of this code to understand how the agent works.

Step 4.1: Generating a VideoSDK Meeting ID

Before launching your agent, you need a meeting room where users and the agent can interact. You can generate a meeting ID using the VideoSDK API:
curl -X POST "https://api.videosdk.live/v2/rooms" \
  -H "Authorization: your_videosdk_api_key" \
  -H "Content-Type: application/json"
The response will include a roomId you can use in your agent configuration. However, if you set playground=True in RoomOptions, the agent will auto-create a room and provide a test link for you.
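If you prefer to create the room from Python rather than curl, the same call can be made with the standard library. The helper names `build_room_request` and `create_room` are chosen here for illustration; the endpoint, headers, and the `roomId` response field match the curl example above.

```python
import json
import urllib.request

# Same room-creation call as the curl command, stdlib only.
def build_room_request(api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        "https://api.videosdk.live/v2/rooms",
        method="POST",
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
    )

def create_room(api_key: str) -> str:
    """POST to the rooms endpoint and return the new roomId."""
    req = build_room_request(api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["roomId"]
```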

Step 4.2: Creating the Custom Agent Class

The heart of your voice agent is the custom class that defines its persona and behavior.
agent_instructions = "You are an AI Voice Agent specializing in insurance services. ..."

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")
  • The agent_instructions string defines the agent's capabilities, constraints, and persona.
  • The MyVoiceAgent class inherits from Agent and sets up the initial greeting and goodbye messages.

Step 4.3: Defining the Core Pipeline

The pipeline coordinates all the plugins that power your agent:
pipeline = CascadingPipeline(
    stt=DeepgramSTT(model="nova-2", language="en"),
    llm=OpenAILLM(model="gpt-4o"),
    tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
    vad=SileroVAD(threshold=0.35),
    turn_detector=TurnDetector(threshold=0.8)
)
You can swap out these plugins for others supported by VideoSDK as needed.

Step 4.4: Managing the Session and Startup Logic

The session orchestrates the entire interaction lifecycle.
async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)
    pipeline = CascadingPipeline(...)
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )
    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
  • start_session sets up the agent, pipeline, and session, then starts everything.
  • make_context configures the meeting room, enabling the playground for easy testing.
  • The main block launches the agent as a job.

Running and Testing the Agent

Step 5.1: Running the Python Script

Start your agent by running:
python main.py
In the console, you'll see a playground link (e.g., https://playground.videosdk.live/agent/XXXX).

Step 5.2: Interacting with the Agent in the Playground

  1. Open the playground link in your browser.
  2. Allow microphone access when prompted.
  3. Speak to the agent and listen to its responses.
  4. To stop the agent, press Ctrl+C in your terminal. This will gracefully shut down the session and free resources.
For hands-on experimentation, try the AI Agent playground to test and refine your agent's capabilities in real time.

Advanced Features and Customizations

Extending Functionality with Custom Tools

You can add custom tools (functions) to your agent for more advanced tasks, such as fetching policy details from a database or integrating with CRM systems. Implement a function_tool and register it with your agent for specialized actions.
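As a sketch of what such a tool's body might look like, here is a hypothetical claim-status lookup backed by mock data. In the actual framework you would decorate it (e.g. with the function_tool helper mentioned above) and register it with your Agent; treat that wiring as an assumption to verify against the VideoSDK docs.

```python
# Mock claims data standing in for a real claims database or CRM.
CLAIMS = {"CLM-1001": "approved", "CLM-1002": "under review"}

async def get_claim_status(claim_id: str) -> str:
    """Look up the status of an insurance claim by its ID.

    In a real agent this function would be registered as a tool so the
    LLM can call it when a caller asks about a claim.
    """
    status = CLAIMS.get(claim_id)
    if status is None:
        return f"No claim found with ID {claim_id}."
    return f"Claim {claim_id} is currently {status}."
```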

Exploring Other Plugins

VideoSDK supports various plugins. For example:
  • STT: Cartesia (best quality), Deepgram (cost-effective), Rime (low-cost)
  • TTS: ElevenLabs (high quality), Deepgram (cost-effective)
  • LLM: OpenAI GPT-4, Google Gemini
Experiment with different plugins to optimize cost, speed, and accuracy for your use case.
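As an illustration only: swapping providers is a matter of constructing the pipeline with different plugin classes. This is a non-runnable configuration sketch; the import paths, class names, and model identifiers below are assumptions to verify against the VideoSDK plugin documentation before use.

```python
# HYPOTHETICAL configuration; confirm plugin package names, class names,
# and model identifiers in the official VideoSDK plugin documentation.
from videosdk.agents import CascadingPipeline
from videosdk.plugins.deepgram import DeepgramSTT, DeepgramTTS  # assumed TTS class
from videosdk.plugins.google import GeminiLLM                   # assumed plugin

pipeline = CascadingPipeline(
    stt=DeepgramSTT(model="nova-2", language="en"),
    llm=GeminiLLM(model="gemini-1.5-flash"),  # assumed model name
    tts=DeepgramTTS(),                        # cost-effective TTS option
)
```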

Troubleshooting Common Issues

API Key and Authentication Errors

  • Double-check that all API keys are correct and active in your .env file.
  • Ensure your VideoSDK account is in good standing.

Audio Input/Output Problems

  • Make sure your browser has microphone permissions enabled.
  • Test with different browsers if issues persist.

Dependency and Version Conflicts

  • Use Python 3.11+ as required.
  • If you encounter errors, recreate your virtual environment and reinstall dependencies.

Conclusion

Congratulations! You've built a fully functional AI voice agent for insurance using VideoSDK and Python. This agent can answer policy questions, guide users through claims, and provide quotes.
To take your agent further, consider:
  • Integrating with backend insurance systems
  • Adding multilingual support
  • Deploying on cloud infrastructure
For guidance on taking your solution live, see the AI voice Agent deployment documentation.
Continue exploring the VideoSDK documentation to unlock even more capabilities for your AI voice agents.
