Introduction to AI Voice Agents in the Insurance Industry
AI Voice Agents are intelligent systems designed to interact with users through voice commands. These agents use technologies such as Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Models (LLMs) to understand and respond to user queries. In the insurance industry, AI Voice Agents can streamline customer service by answering policy questions, providing quotes, and guiding users through claims processes.
In this tutorial, we will build an AI Voice Assistant tailored for the insurance industry using the VideoSDK AI Agents framework. This agent will provide helpful information about insurance policies and assist users in a friendly and informative manner.
Architecture and Core Concepts
High-Level Architecture Overview
The AI Voice Agent processes user speech through a series of steps: converting speech to text, analyzing the text to generate a response, and converting the response back to speech. This flow is managed by the VideoSDK framework, which integrates various plugins for each task. To get started quickly, refer to the Voice Agent Quick Start Guide.
sequenceDiagram
    participant User
    participant Agent
    participant STT
    participant LLM
    participant TTS

    User->>Agent: Speak
    Agent->>STT: Convert Speech to Text
    STT-->>Agent: Text
    Agent->>LLM: Analyze Text
    LLM-->>Agent: Response
    Agent->>TTS: Convert Text to Speech
    TTS-->>Agent: Audio
    Agent->>User: Respond
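The speech-to-text, response generation, and text-to-speech flow described above can be sketched in plain Python. This is a toy illustration only: `ToyCascade`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names invented here and are not part of the VideoSDK framework, which handles streaming audio rather than whole byte strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyCascade:
    """Hypothetical stand-in for a cascading pipeline: STT -> LLM -> TTS."""
    stt: Callable[[bytes], str]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt(audio)    # speech -> text
        reply = self.llm(text)    # text -> response text
        return self.tts(reply)    # response text -> speech


# Stub stages: here "audio" is just UTF-8 bytes so the flow is easy to follow.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You asked about: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


cascade = ToyCascade(stt=fake_stt, llm=fake_llm, tts=fake_tts)
print(cascade.handle_turn(b"claim status"))
```

Each stage only sees the output of the previous one, which is why the stages can be swapped independently.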
Understanding Key Concepts in the VideoSDK Framework
- Agent: This is the core class representing your AI bot, responsible for handling interactions.
- CascadingPipeline: Manages the flow of audio processing, converting user speech to text, generating responses, and converting them back to speech. Learn more about the Cascading pipeline in AI voice Agents.
- VAD & TurnDetector: These components help the agent determine when to listen and when to respond, ensuring smooth interactions. For more details, see the Turn detector for AI voice Agents.
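To make the "when to listen, when to respond" idea concrete, here is a toy energy-threshold sketch. It is not how SileroVAD or TurnDetector work internally (those rely on trained models). The function names, the default threshold, and the `silence_frames` parameter are all assumptions made for illustration.

```python
def frame_energy(frame):
    """Mean squared amplitude of one non-empty frame of samples in [-1.0, 1.0]."""
    return sum(sample * sample for sample in frame) / len(frame)


def is_speech(frame, threshold=0.01):
    """Toy voice-activity check: a frame counts as speech if it is loud enough."""
    return frame_energy(frame) >= threshold


def end_of_turn(frames, threshold=0.01, silence_frames=3):
    """Return True once speech has been heard and is followed by
    `silence_frames` consecutive quiet frames."""
    heard_speech = False
    silent_run = 0
    for frame in frames:
        if is_speech(frame, threshold):
            heard_speech = True
            silent_run = 0
        else:
            silent_run += 1
        if heard_speech and silent_run >= silence_frames:
            return True
    return False


loud = [0.5, -0.5, 0.4]
quiet = [0.0, 0.0, 0.0]
print(end_of_turn([loud, loud, quiet, quiet, quiet]))
```

Raising the threshold in a sketch like this makes the check less sensitive to background noise, at the cost of missing quiet speech.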
Setting Up the Development Environment
Prerequisites
To get started, ensure you have Python 3.11+ installed. You will also need a VideoSDK account, which you can create at app.videosdk.live.
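If you are unsure which interpreter your shell picks up, a short check like the following can confirm the Python 3.11+ requirement. The helper name `meets_minimum` is made up for this sketch.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 11)):
    """Compare the (major, minor) part of a version tuple against a minimum."""
    return tuple(version_info[:2]) >= minimum


if not meets_minimum():
    print(f"Python 3.11+ is required, found {sys.version.split()[0]}")
```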
Step 1: Create a Virtual Environment
Create a virtual environment to manage dependencies:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Step 2: Install Required Packages
Install the necessary packages using pip:
pip install videosdk-agents videosdk-plugins-silero videosdk-plugins-turn-detector videosdk-plugins-deepgram videosdk-plugins-openai videosdk-plugins-elevenlabs
Step 3: Configure API Keys in a .env file
Create a .env file in your project directory to store API keys securely.
Building the AI Voice Agent: A Step-by-Step Guide
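Before running the agent, it can help to confirm that the keys from Step 3 are actually visible to the process. The sketch below is a preflight check under stated assumptions: the variable names in `REQUIRED_KEYS` are guesses at what each provider expects and should be confirmed against the provider documentation. It also assumes the values have already been exported to the environment or loaded from the .env file by a loader of your choice.

```python
import os

# Assumed variable names, one per service used in this tutorial.
REQUIRED_KEYS = [
    "VIDEOSDK_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Failing fast here tends to produce a clearer message than an authentication error raised later inside a plugin.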
Here is the complete code to build your AI Voice Agent:
import asyncio, os
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from typing import AsyncIterator

# Pre-downloading the Turn Detector model
pre_download_model()

agent_instructions = "You are an AI Voice Assistant specialized in the insurance industry. Your persona is that of a knowledgeable and friendly insurance advisor. Your primary capabilities include answering questions about various insurance policies, providing quotes, explaining coverage details, and assisting with claims processes. You can also guide users through the steps of purchasing insurance and provide reminders for policy renewals. However, you are not a licensed insurance agent, and you must include a disclaimer advising users to consult with a licensed professional for personalized advice. You should not provide financial or legal advice and must refrain from making any guarantees about policy approvals or claims outcomes. Your responses should be clear, concise, and informative, ensuring users feel supported and informed throughout their interaction with you."

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    # Greet the user when the session starts
    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    # Say goodbye when the session ends
    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID", # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )

    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
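The try/finally shape used in start_session above can be tried in isolation, without any VideoSDK dependency. In this sketch `StubSession` and `run_until_stopped` are hypothetical names, and the stop event stands in for manual termination; the point is only that cleanup in `finally` runs once the wait ends.

```python
import asyncio


class StubSession:
    """Hypothetical stand-in for a session; it only records lifecycle calls."""

    def __init__(self):
        self.events = []

    async def start(self):
        self.events.append("start")

    async def close(self):
        self.events.append("close")


async def run_until_stopped(session, stop: asyncio.Event):
    try:
        await session.start()
        await stop.wait()      # block here until something sets the event
    finally:
        await session.close()  # runs on normal exit, errors, and cancellation


async def demo():
    session = StubSession()
    stop = asyncio.Event()
    task = asyncio.create_task(run_until_stopped(session, stop))
    await asyncio.sleep(0)     # give the task a chance to start
    stop.set()                 # simulate manual termination
    await task
    return session.events


print(asyncio.run(demo()))
```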
Step 4.1: Generating a VideoSDK Meeting ID
To generate a meeting ID, use the following curl command:
curl -X POST \
  https://api.videosdk.live/v1/meetings \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
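If you prefer to stay in Python, the same request can be assembled with the standard library. This sketch mirrors the curl command above (same URL, headers, and empty JSON body) and only builds the request object. The helper name `build_meeting_request` is made up here, and the shape of the response is not shown because it is not described in this tutorial.

```python
import json
import urllib.request


def build_meeting_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        url="https://api.videosdk.live/v1/meetings",
        data=json.dumps({}).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_meeting_request("YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then read and parse the body.
```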
Step 4.2: Creating the Custom Agent Class
The MyVoiceAgent class extends the Agent class and defines the agent's behavior. Its on_enter and on_exit methods greet the user when the session starts and say goodbye when it ends.
Step 4.3: Defining the Core Pipeline
The CascadingPipeline manages the flow of audio processing. It wires together the STT, LLM, TTS, VAD, and TurnDetector plugins, each of which handles one stage of the conversation. For a comprehensive understanding, refer to the AI voice Agent core components overview.
Step 4.4: Managing the Session and Startup Logic
The start_session function initializes the agent session, connects to the VideoSDK platform, and keeps the session running until it is manually terminated, closing the session and shutting down the context on exit. The make_context function sets up the room options for the session. Explore more about AI voice Agent Sessions.
Running and Testing the Agent
Step 5.1: Running the Python Script
To start the agent, execute the script:
python main.py
Step 5.2: Interacting with the Agent in the Playground
Once the agent is running, use the AI Agent playground link provided in the console to interact with your AI Voice Assistant.
Advanced Features and Customizations
Extending Functionality with Custom Tools
You can extend the agent's capabilities by integrating custom tools, allowing for more specialized interactions. Consider using the ElevenLabs TTS Plugin for voice agent and Deepgram STT Plugin for voice agent for enhanced text-to-speech and speech-to-text functionalities.
Exploring Other Plugins
Consider experimenting with different plugins for STT, LLM, and TTS to enhance the agent's performance and capabilities. The OpenAI LLM Plugin for voice agent is an excellent choice for improving language model interactions.
Troubleshooting Common Issues
API Key and Authentication Errors
Ensure your API keys are correctly configured in the .env file to avoid authentication issues.
Audio Input/Output Problems
Check your microphone and speaker settings if you encounter audio issues during interactions.
Dependency and Version Conflicts
Ensure all dependencies are up-to-date and compatible with your Python version to prevent conflicts.
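When chasing a version conflict, it helps to see exactly what is installed in the active virtual environment. The sketch below uses the standard library's importlib.metadata to look up the packages from Step 2. The helper name `installed_versions` is made up here, and a value of None simply means the package was not found.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the pip install command in Step 2.
PACKAGES = [
    "videosdk-agents",
    "videosdk-plugins-silero",
    "videosdk-plugins-turn-detector",
    "videosdk-plugins-deepgram",
    "videosdk-plugins-openai",
    "videosdk-plugins-elevenlabs",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(PACKAGES).items():
    print(f"{name}: {installed or 'not installed'}")
```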
Conclusion
In this tutorial, you built an AI Voice Assistant for the insurance industry using the VideoSDK framework. As a next step, consider exploring advanced features and customizations to further tailor the agent to your needs.
FAQ