What is an AI Voice Agent?

An AI Voice Agent is a system that uses voice recognition and natural language processing to interact with users through voice commands.

Why use AI Voice Agents in the insurance industry?

AI Voice Agents can streamline customer service in the insurance industry by answering policy questions, providing quotes, and guiding users through claims processes.

What is the VideoSDK framework?

The VideoSDK framework is a platform that provides tools and plugins for building AI Voice Agents, integrating STT, LLM, TTS, and other functionalities.

How do I generate a VideoSDK meeting ID?

Use a `curl` command with your API key to generate a meeting ID via the VideoSDK API.

What are the key components of a CascadingPipeline?

The CascadingPipeline includes components for STT, LLM, TTS, VAD, and TurnDetector, managing the flow of audio processing in the agent.

Build an AI Voice Assistant for Insurance

Step-by-step guide to building an AI voice assistant for the insurance industry with VideoSDK.

Introduction to AI Voice Agents in the Insurance Industry

AI Voice Agents are intelligent systems designed to interact with users through voice commands. These agents use technologies such as Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Models (LLMs) to understand and respond to user queries. In the insurance industry, AI Voice Agents can streamline customer service by answering policy questions, providing quotes, and guiding users through claims processes.

In this tutorial, we will build an AI Voice Assistant tailored for the insurance industry using the VideoSDK AI Agents framework. This agent will provide helpful information about insurance policies and assist users in a friendly and informative manner.

Architecture and Core Concepts

High-Level Architecture Overview

The AI Voice Agent processes user speech through a series of steps: converting speech to text, analyzing the text to generate a response, and converting the response back to speech. This flow is managed by the VideoSDK framework, which integrates various plugins for each task. To get started quickly, refer to the Voice Agent Quick Start Guide.

sequenceDiagram
    participant User
    participant Agent
    participant STT
    participant LLM
    participant TTS

    User->>Agent: Speak
    Agent->>STT: Convert Speech to Text
    STT-->>Agent: Text
    Agent->>LLM: Analyze Text
    LLM-->>Agent: Response
    Agent->>TTS: Convert Text to Speech
    TTS-->>Agent: Audio
    Agent->>User: Respond
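The flow in the diagram can be sketched in plain Python. This is only an illustration of the order of hand-offs: `fake_stt`, `fake_llm`, `fake_tts`, and `run_turn` are hypothetical stand-ins, not VideoSDK APIs, and real plugins stream audio and call remote services rather than transforming a single value.

```python
from typing import Callable


# Stand-in stages. Real STT, LLM, and TTS plugins stream audio and call
# remote services; these stubs only illustrate the order of the hand-offs.
def fake_stt(audio: bytes) -> str:
    """Pretend to transcribe audio by decoding it as UTF-8 text."""
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    """Pretend to generate a reply for the transcribed text."""
    return f"You asked: {text}"


def fake_tts(text: str) -> bytes:
    """Pretend to synthesize speech by encoding the reply as bytes."""
    return text.encode("utf-8")


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: speech -> text -> reply -> speech."""
    transcript = stt(audio)
    reply = llm(transcript)
    return tts(reply)


if __name__ == "__main__":
    out = run_turn(b"What does my policy cover?", fake_stt, fake_llm, fake_tts)
    print(out.decode("utf-8"))
```

Passing the stages in as callables mirrors the idea that each step can be swapped for a different provider without changing the overall flow.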

Understanding Key Concepts in the VideoSDK Framework

Agent: This is the core class representing your AI bot, responsible for handling interactions.
CascadingPipeline: Manages the flow of audio processing, converting user speech to text, generating responses, and converting them back to speech. Learn more about the Cascading pipeline in AI voice Agents.
VAD & TurnDetector: These components help the agent determine when to listen and when to respond, ensuring smooth interactions. For more details, see the Turn detector for AI voice Agents.
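To make the idea of "when to listen and when to respond" more concrete, here is a toy sketch of threshold-based end-of-turn detection over per-frame speech probabilities. It is not how the Silero VAD or TurnDetector plugins work internally (both rely on their own models); `detect_turn_end` and `silence_frames` are hypothetical names, and the default threshold of 0.35 simply mirrors the value passed to `SileroVAD` in this tutorial's pipeline.

```python
def detect_turn_end(speech_probs, vad_threshold=0.35, silence_frames=3):
    """Return the index of the frame at which the user's turn is treated
    as finished, or None if the turn is still open.

    A frame counts as speech when its probability is >= vad_threshold.
    The turn ends once speech has been heard and is followed by
    `silence_frames` consecutive non-speech frames.
    """
    heard_speech = False
    silent_run = 0
    for i, prob in enumerate(speech_probs):
        if prob >= vad_threshold:
            heard_speech = True
            silent_run = 0  # any speech frame resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_frames:
                return i
    return None


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.8, 0.2, 0.1, 0.05, 0.9]
    print(detect_turn_end(probs))
```

A lower threshold makes the sketch more willing to treat quiet frames as speech, while a larger `silence_frames` makes it wait longer before deciding the user has finished.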

Setting Up the Development Environment

Prerequisites

To get started, ensure you have Python 3.11+ installed. You will also need a VideoSDK account, which you can create at app.videosdk.live.

Step 1: Create a Virtual Environment

Create a virtual environment to manage dependencies:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Step 2: Install Required Packages

Install the necessary packages using pip:

pip install videosdk-agents videosdk-plugins-silero videosdk-plugins-turn-detector videosdk-plugins-deepgram videosdk-plugins-openai videosdk-plugins-elevenlabs

Step 3: Configure API Keys in a `.env` file

Create a .env file in your project directory to hold the API keys for VideoSDK and for the Deepgram, OpenAI, and ElevenLabs plugins. Keep this file out of version control.
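A quick way to catch a missing key before starting the agent is to check the environment up front. The sketch below assumes the variable names `VIDEOSDK_AUTH_TOKEN`, `DEEPGRAM_API_KEY`, `OPENAI_API_KEY`, and `ELEVENLABS_API_KEY`; the tutorial does not list them, so confirm the exact names against each plugin's documentation. It reads the process environment only, so the `.env` file must already have been loaded into it by whatever mechanism your setup uses.

```python
import os

# Assumed variable names; confirm them against each plugin's documentation.
REQUIRED_KEYS = (
    "VIDEOSDK_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```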

Building the AI Voice Agent: A Step-by-Step Guide

Here is the complete code to build your AI Voice Agent:

import asyncio

from videosdk.agents import Agent, AgentSession, CascadingPipeline, ConversationFlow, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

# Download the Turn Detector model before any session starts
pre_download_model()

agent_instructions = "You are an AI Voice Assistant specialized in the insurance industry. Your persona is that of a knowledgeable and friendly insurance advisor. Your primary capabilities include answering questions about various insurance policies, providing quotes, explaining coverage details, and assisting with claims processes. You can also guide users through the steps of purchasing insurance and provide reminders for policy renewals. However, you are not a licensed insurance agent, and you must include a disclaimer advising users to consult with a licensed professional for personalized advice. You should not provide financial or legal advice and must refrain from making any guarantees about policy approvals or claims outcomes. Your responses should be clear, concise, and informative, ensuring users feel supported and informed throughout their interaction with you."


class MyVoiceAgent(Agent):
    """Insurance assistant that greets the user on entry and says goodbye on exit."""

    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")


async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline: one plugin per stage of the speech -> text -> reply -> speech flow
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8),
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow,
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()


def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True,
    )

    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

Step 4.1: Generating a VideoSDK Meeting ID

To generate a meeting ID, use the following curl command:

curl -X POST \
  https://api.videosdk.live/v1/meetings \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
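If you would rather make the same call from Python, the request can be built with the standard library. This is a minimal sketch that mirrors the curl command above (same URL, headers, and empty JSON body); `build_request` and `create_meeting` are hypothetical helper names, and the shape of the JSON response is not described in this tutorial, so the function simply returns whatever the API sends back.

```python
import json
import urllib.request

# Endpoint taken from the curl command in this step.
API_URL = "https://api.videosdk.live/v1/meetings"


def build_request(api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({}).encode("utf-8"),  # same empty JSON body as -d '{}'
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_meeting(api_key: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_request(api_key), timeout=10) as resp:
        return json.load(resp)
```

Calling `create_meeting("YOUR_API_KEY")` performs the network request; inspect the returned dictionary to find the meeting ID field.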

Step 4.2: Creating the Custom Agent Class

The MyVoiceAgent class extends the Agent class and defines the agent's behavior. Its constructor passes the instruction prompt to the base class, and its on_enter and on_exit methods speak a greeting when the agent enters the session and a farewell when it exits.

Step 4.3: Defining the Core Pipeline

The CascadingPipeline is a crucial component that manages the flow of audio processing. It integrates plugins for STT, LLM, TTS, VAD, and TurnDetector, each playing a vital role in the agent's functionality. For a comprehensive understanding, refer to the AI voice Agent core components overview.

Step 4.4: Managing the Session and Startup Logic

The start_session function initializes the agent session, connects to the VideoSDK platform, and maintains the session until manually terminated. The make_context function sets up the room options for the session. Explore more about AI voice Agent Sessions.
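The try/finally pattern used in start_session is what guarantees cleanup when the process is stopped. The sketch below isolates that pattern with the standard library only: `FakeSession`, `run_until_stopped`, and `demo` are hypothetical stand-ins, not VideoSDK classes, and cancellation is used here to simulate a manual shutdown.

```python
import asyncio


class FakeSession:
    """Stand-in for a session object; it only records lifecycle calls."""

    def __init__(self):
        self.events = []

    async def start(self):
        self.events.append("start")

    async def close(self):
        self.events.append("close")


async def run_until_stopped(session, stop: asyncio.Event):
    """Start the session, wait for a stop signal, and always clean up."""
    try:
        await session.start()
        await stop.wait()
    finally:
        # Runs on normal exit, on errors, and on task cancellation.
        await session.close()


async def demo():
    session = FakeSession()
    task = asyncio.create_task(run_until_stopped(session, asyncio.Event()))
    await asyncio.sleep(0)  # let the task run until it blocks on stop.wait()
    task.cancel()           # simulate a manual shutdown
    try:
        await task
    except asyncio.CancelledError:
        pass
    return session.events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the cleanup lives in `finally`, the session is closed even though the wait never completes on its own.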

Running and Testing the Agent

Step 5.1: Running the Python Script

To start the agent, save the complete code above as main.py and run it:

python main.py

Step 5.2: Interacting with the Agent in the Playground

Once the agent is running, use the AI Agent playground link provided in the console to interact with your AI Voice Assistant.

Advanced Features and Customizations

Extending Functionality with Custom Tools

You can extend the agent's capabilities by integrating custom tools, allowing for more specialized interactions. Consider using the ElevenLabs TTS Plugin for voice agent and Deepgram STT Plugin for voice agent for enhanced text-to-speech and speech-to-text functionalities.
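As a rough idea of what a custom tool might look like, the sketch below shows a plain Python function that answers a policy renewal question. Everything here is hypothetical: the `POLICIES` table is made-up sample data, `get_renewal_date` is an illustrative name, and the tutorial does not show how tools are registered with the agent, so consult the framework's documentation for that step.

```python
from datetime import date

# Hypothetical sample data; a real tool would query a policy system.
POLICIES = {
    "P-1001": {"holder": "A. Example", "renewal": date(2030, 1, 15)},
}


def get_renewal_date(policy_id: str) -> str:
    """Return a short, speakable sentence about a policy's renewal date."""
    normalized = policy_id.strip().upper()
    policy = POLICIES.get(normalized)
    if policy is None:
        return f"I could not find a policy with ID {policy_id}."
    return f"Policy {normalized} renews on {policy['renewal'].isoformat()}."


if __name__ == "__main__":
    print(get_renewal_date("p-1001"))
```

Returning a full sentence rather than raw data keeps the result easy for the language model to relay in a spoken reply.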

Exploring Other Plugins

Consider experimenting with different plugins for STT, LLM, and TTS to enhance the agent's performance and capabilities. The OpenAI LLM Plugin for voice agent is an excellent choice for improving language model interactions.

Troubleshooting Common Issues

API Key and Authentication Errors

Ensure your API keys are correctly configured in the .env file to avoid authentication issues.

Audio Input/Output Problems

Check your microphone and speaker settings if you encounter audio issues during interactions.

Dependency and Version Conflicts

Ensure all dependencies are up-to-date and compatible with your Python version to prevent conflicts.
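When diagnosing a version conflict, it helps to see exactly what is installed. The sketch below uses the standard library to report the interpreter version and the installed version of each package from the install step; `installed_versions` and `python_ok` are hypothetical helper names, and the minimum of 3.11 is the one stated in the prerequisites.

```python
import sys
from importlib import metadata

# Package names from the pip install command in this tutorial.
PACKAGES = (
    "videosdk-agents",
    "videosdk-plugins-silero",
    "videosdk-plugins-turn-detector",
    "videosdk-plugins-deepgram",
    "videosdk-plugins-openai",
    "videosdk-plugins-elevenlabs",
)


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_ok(minimum=(3, 11)):
    """Check the running interpreter against the tutorial's stated minimum."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python OK:", python_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```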

Conclusion

In this tutorial, you built an AI Voice Assistant for the insurance industry using the VideoSDK framework. As a next step, consider exploring advanced features and customizations to further tailor the agent to your needs.
