Introduction to AI Voice Agents for Insurance
AI voice agents are transforming the way insurance companies interact with their customers. These intelligent virtual assistants use advanced speech and language technologies to understand, process, and respond to user queries in real time. In this tutorial, we'll explore how to build a robust AI voice agent tailored for the insurance industry using the VideoSDK AI Agents framework.
What is an AI Voice Agent?
An AI voice agent is a software application that can engage in spoken conversations with users. It leverages speech-to-text (STT) to convert spoken words into text, a large language model (LLM) to understand and generate responses, and text-to-speech (TTS) to reply in natural-sounding audio.
Why are they important for the insurance industry?
In the insurance sector, voice agents can:
- Answer questions about policy coverage, claims, and quotes
- Guide customers through the claims process
- Provide 24/7 support and reduce call center workload
- Improve customer satisfaction with quick, accurate responses
Core Components of a Voice Agent
- STT (Speech-to-Text): Transcribes user speech into text.
- LLM (Large Language Model): Understands intent and generates responses.
- TTS (Text-to-Speech): Converts the agent's reply back into audio.
To get a comprehensive understanding of these elements, refer to the AI voice Agent core components overview in the VideoSDK documentation.

What You'll Build in This Tutorial
By the end of this guide, you'll have a fully functional AI voice agent for insurance that you can test in the VideoSDK playground. The agent will be able to answer insurance-related questions, guide users through claims, and provide general support.
Architecture and Core Concepts
High-Level Architecture Overview
Let's look at how the voice agent processes a conversation:
- User speaks into their microphone.
- STT transcribes the audio to text.
- LLM analyzes the text and generates a response.
- TTS converts the response to audio.
- Agent delivers the reply back to the user.
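The loop above can be sketched as a toy cascade. This is only an illustration of the data flow, not the VideoSDK API; the three stub functions are hypothetical placeholders for the real STT, LLM, and TTS plugins:

```python
# Toy sketch of the STT -> LLM -> TTS cascade. The stub functions below are
# hypothetical placeholders, not part of the VideoSDK framework.

def stt(audio: bytes) -> str:
    # A real STT plugin would transcribe the audio; here we return a canned question.
    return "What does my auto policy cover?"

def llm(text: str) -> str:
    # A real LLM would reason over the text; here we produce a canned reply.
    return f"You asked: '{text}'. A licensed agent can confirm the specifics."

def tts(text: str) -> bytes:
    # A real TTS plugin would synthesize speech; here we just encode the text.
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    # One conversational turn: transcribe, reason, speak.
    return tts(llm(stt(audio)))

reply_audio = handle_turn(b"\x00\x01")  # fake microphone frame
print(reply_audio.decode("utf-8"))
```

The VideoSDK CascadingPipeline automates exactly this hand-off, plus voice activity and turn detection, over live audio.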
Understanding Key Concepts in the VideoSDK Framework
- Agent: The main class that defines your AI assistant's behavior and persona.
- CascadingPipeline: Manages the flow of audio and text through STT, LLM, TTS, VAD, and turn detection. For more details on how this works, check out the Cascading pipeline in AI voice Agents.
- VAD (Voice Activity Detection): Detects when the user is speaking.
- TurnDetector: Determines when it's the agent's turn to respond.
These components work together to create a seamless conversational experience.
Setting Up the Development Environment
Prerequisites
Before you begin, ensure you have:
- Python 3.11+ installed
- A VideoSDK account (sign up at app.videosdk.live)
- API keys for Deepgram (STT), ElevenLabs (TTS), and OpenAI (LLM)
If you're just getting started, the Voice Agent Quick Start Guide provides step-by-step instructions for setup.

Step 1: Create a Virtual Environment
Open your terminal and run:
```bash
python3.11 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Step 2: Install Required Packages
Install the necessary dependencies:
```bash
pip install videosdk-agents videosdk-plugins-openai videosdk-plugins-deepgram videosdk-plugins-elevenlabs videosdk-plugins-silero videosdk-plugins-turn-detector
```
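Before moving on, you can sanity-check that the modules imported later in this tutorial are resolvable. This is a small stdlib-only helper (the module names mirror the packages installed above):

```python
import importlib.util

def missing_packages(module_names):
    # Return the subset of module names that cannot be resolved.
    missing = []
    for name in module_names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # The parent package itself is absent (e.g. videosdk not installed).
            missing.append(name)
    return missing

required = [
    "videosdk.agents",
    "videosdk.plugins.openai",
    "videosdk.plugins.deepgram",
    "videosdk.plugins.elevenlabs",
    "videosdk.plugins.silero",
    "videosdk.plugins.turn_detector",
]
print(missing_packages(required))  # [] once every plugin is installed
```

If any names are printed, re-run the pip command above inside your activated virtual environment.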
Step 3: Configure API Keys in a .env File
Create a .env file in your project directory and add your API keys:

```bash
VIDEOSDK_API_KEY=your_videosdk_api_key
OPENAI_API_KEY=your_openai_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
```
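It's worth failing fast at startup if a key is missing. Here is a minimal stdlib check (the key names match the .env file above; actually loading the file into the environment is normally handled by a tool like python-dotenv or by the framework):

```python
import os

REQUIRED_KEYS = [
    "VIDEOSDK_API_KEY",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
]

def check_env(env=None):
    # Return the names of required keys that are unset or empty.
    if env is None:
        env = os.environ
    return [k for k in REQUIRED_KEYS if not env.get(k)]

missing = check_env()
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
```

Calling check_env() before starting the agent turns a confusing runtime authentication failure into an immediate, readable error.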
Building the AI Voice Agent: A Step-by-Step Guide
Let's dive into the code! Here's the complete, runnable script for your insurance voice agent.
```python
import asyncio, os
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# Pre-download the Turn Detector model
pre_download_model()

agent_instructions = """You are an AI Voice Agent specializing in insurance services. Your persona is that of a knowledgeable, friendly, and patient insurance assistant. Your primary goal is to help users understand insurance products, answer questions about policy coverage, assist with claims processes, provide quotes, and guide users through common insurance procedures. You can explain different types of insurance (health, auto, home, life), clarify policy terms, and help users find relevant resources or contact information.

Capabilities:
- Answer questions about various insurance products and coverage options.
- Guide users through the process of filing a claim or checking claim status.
- Provide general information about premiums, deductibles, and policy benefits.
- Offer quotes or estimate premiums based on user input (if data is available).
- Direct users to appropriate resources or connect them with a human agent for complex issues.

Constraints and Limitations:
- You are not a licensed insurance agent and cannot provide legally binding advice or finalize policy sales.
- Do not collect or store sensitive personal information such as Social Security Numbers or payment details.
- Always recommend that users consult with a licensed insurance professional for personalized advice or to complete transactions.
- If unsure or if the user's request is outside your scope, politely suggest speaking with a human insurance representative.
- Maintain user privacy and confidentiality at all times."""

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
```

Let's break down each part of this code to understand how the agent works.
Step 4.1: Generating a VideoSDK Meeting ID
Before launching your agent, you need a meeting room where users and the agent can interact. You can generate a meeting ID using the VideoSDK API:
```bash
curl -X POST "https://api.videosdk.live/v2/rooms" \
  -H "Authorization: your_videosdk_api_key" \
  -H "Content-Type: application/json"
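If you prefer Python over curl, the same request can be built with the standard library. This sketch mirrors the curl call above; it only constructs the request (the network call is left commented out), and the Authorization header carries your VideoSDK credential exactly as in the curl example:

```python
import json
import urllib.request

def build_create_room_request(auth_token: str) -> urllib.request.Request:
    # Mirrors the curl call: POST https://api.videosdk.live/v2/rooms
    return urllib.request.Request(
        url="https://api.videosdk.live/v2/rooms",
        method="POST",
        headers={
            "Authorization": auth_token,
            "Content-Type": "application/json",
        },
        data=b"{}",  # empty JSON body
    )

req = build_create_room_request("your_videosdk_api_key")
print(req.get_method(), req.full_url)

# To actually create the room (requires network access and a valid credential):
# with urllib.request.urlopen(req) as resp:
#     room_id = json.loads(resp.read())["roomId"]
```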
```

The response will include a roomId you can use in your agent configuration. However, if you set playground=True in RoomOptions, the agent will auto-create a room and provide a test link for you.

Step 4.2: Creating the Custom Agent Class
The heart of your voice agent is the custom class that defines its persona and behavior.
```python
agent_instructions = "You are an AI Voice Agent specializing in insurance services. ..."

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")
```

- The agent_instructions string defines the agent's capabilities, constraints, and persona.
- The MyVoiceAgent class inherits from Agent and sets up the initial greeting and goodbye messages.
Step 4.3: Defining the Core Pipeline
The pipeline coordinates all the plugins that power your agent:
```python
pipeline = CascadingPipeline(
    stt=DeepgramSTT(model="nova-2", language="en"),
    llm=OpenAILLM(model="gpt-4o"),
    tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
    vad=SileroVAD(threshold=0.35),
    turn_detector=TurnDetector(threshold=0.8)
)
```

- DeepgramSTT: Converts speech to text. Learn more about integrating this with your agent in the Deepgram STT Plugin for voice agent documentation.
- OpenAILLM: Processes the text and generates a response. For setup and configuration, see the OpenAI LLM Plugin for voice agent.
- ElevenLabsTTS: Synthesizes the agent's reply into audio. Explore options in the ElevenLabs TTS Plugin for voice agent.
- SileroVAD: Detects when the user is speaking. For more on this technology, refer to Silero Voice Activity Detection.
- TurnDetector: Determines when the agent should respond. Detailed guidance is available in the Turn detector for AI voice Agents.
You can swap out these plugins for others supported by VideoSDK as needed.
Step 4.4: Managing the Session and Startup Logic
The session orchestrates the entire interaction lifecycle.
```python
async def start_session(context: JobContext):
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)
    pipeline = CascadingPipeline(...)
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )
    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
```

- start_session sets up the agent, pipeline, and session, then starts everything.
- make_context configures the meeting room, enabling the playground for easy testing.
- The main block launches the agent as a job.
Running and Testing the Agent
Step 5.1: Running the Python Script
Start your agent by running:
```bash
python main.py
```

In the console, you'll see a playground link (e.g., https://playground.videosdk.live/agent/XXXX).

Step 5.2: Interacting with the Agent in the Playground
- Open the playground link in your browser.
- Allow microphone access when prompted.
- Speak to the agent and listen to its responses.
- To stop the agent, press Ctrl+C in your terminal. This will gracefully shut down the session and free resources.
For hands-on experimentation, try the AI Agent playground to test and refine your agent's capabilities in real time.

Advanced Features and Customizations
Extending Functionality with Custom Tools
You can add custom tools (functions) to your agent for more advanced tasks, such as fetching policy details from a database or integrating with CRM systems. Implement a function_tool and register it with your agent for specialized actions.
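As a sketch, such a tool might look like the following. The claim-status lookup and its data are hypothetical; in a real agent you would wrap a function like this with the framework's function_tool decorator and register it with your agent, which is omitted here so the sketch stays self-contained:

```python
import asyncio

# Hypothetical stand-in for a claims database or CRM lookup.
FAKE_CLAIMS = {
    "CLM-1001": "approved",
    "CLM-1002": "under review",
}

async def get_claim_status(claim_id: str) -> str:
    """Return a human-readable status for a claim ID.

    In a real agent this would be registered as a function_tool so the
    LLM can invoke it when a caller asks about a claim.
    """
    status = FAKE_CLAIMS.get(claim_id)
    if status is None:
        return f"I couldn't find claim {claim_id}. Please verify the ID."
    return f"Claim {claim_id} is currently {status}."

# Example invocation (the agent framework would normally drive this):
print(asyncio.run(get_claim_status("CLM-1001")))
```

Returning plain sentences rather than raw records keeps the tool output ready for the TTS stage without extra formatting by the LLM.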
Exploring Other Plugins
VideoSDK supports various plugins. For example:
- STT: Cartesia (best quality), Deepgram (cost-effective), Rime (low-cost)
- TTS: ElevenLabs (high quality), Deepgram (cost-effective)
- LLM: OpenAI GPT-4, Google Gemini
Experiment with different plugins to optimize cost, speed, and accuracy for your use case.
Troubleshooting Common Issues
API Key and Authentication Errors
- Double-check that all API keys in your .env file are correct and active.
- Ensure your VideoSDK account is in good standing.
Audio Input/Output Problems
- Make sure your browser has microphone permissions enabled.
- Test with different browsers if issues persist.
Dependency and Version Conflicts
- Use Python 3.11+ as required.
- If you encounter errors, recreate your virtual environment and reinstall dependencies.
Conclusion
Congratulations! You've built a fully functional AI voice agent for insurance using VideoSDK and Python. This agent can answer policy questions, guide users through claims, and provide quotes.
To take your agent further, consider:
- Integrating with backend insurance systems
- Adding multilingual support
- Deploying on cloud infrastructure
For guidance on taking your solution live, see the AI voice Agent deployment documentation. Continue exploring the VideoSDK documentation to unlock even more capabilities for your AI voice agents.