Introduction to AI Voice Agents in BPO Companies
AI voice agents are automated systems designed to interact with humans using natural language. They play a crucial role in enhancing customer service experiences, especially in Business Process Outsourcing (BPO) companies. These agents can handle routine inquiries, provide quick responses, and assist human agents, thereby improving efficiency and customer satisfaction.
In this tutorial, we will build an AI voice assistant tailored for BPO companies using the VideoSDK framework. This assistant uses Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS) technologies to create a seamless interaction flow.
Architecture and Core Concepts
High-Level Architecture Overview
The architecture of an AI voice agent involves several components that work together to process user input and generate a response. The flow typically starts with capturing user speech, converting it to text, processing it through a language model, and then converting the response back to speech.
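As a rough illustration of that flow, the sketch below chains three stand-in functions in the same order. The function names and their canned behaviour are hypothetical placeholders, not VideoSDK APIs; a real deployment would call actual STT, LLM, and TTS services at each stage.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:
    """Stand-in for raw audio; here it simply carries the words it 'contains'."""
    payload: str


def speech_to_text(audio: AudioChunk) -> str:
    # Hypothetical STT stage: a real one would transcribe audio samples.
    return audio.payload.strip().lower()


def generate_reply(transcript: str) -> str:
    # Hypothetical LLM stage: a real one would prompt a language model.
    if "refund" in transcript:
        return "Refund requests are handled by a human agent."
    return "Could you tell me more about the issue?"


def text_to_speech(reply: str) -> AudioChunk:
    # Hypothetical TTS stage: a real one would synthesize audio.
    return AudioChunk(payload=reply)


def handle_turn(audio: AudioChunk) -> AudioChunk:
    """Run one user turn through the stages in order: STT, LLM, TTS."""
    transcript = speech_to_text(audio)
    reply = generate_reply(transcript)
    return text_to_speech(reply)
```

Each stage only depends on the output of the previous one, which is what makes the components individually replaceable.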
Understanding Key Concepts in the VideoSDK Framework
- Agent: The core class representing your bot, responsible for managing interactions.
- Cascading pipeline: This structure defines the flow of audio processing, integrating STT, LLM, and TTS.
- VAD & TurnDetector: These components help the agent determine when to listen and respond.
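To make the roles of VAD and turn detection concrete, here is a deliberately simplified sketch. It is not how SileroVAD or TurnDetector work internally (both are model-based); it assumes a plain energy threshold for speech detection and a fixed count of trailing silent frames for end-of-turn, and every name and default value in it is hypothetical.

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    if not frame:
        return 0.0
    return sum(sample * sample for sample in frame) / len(frame)


def is_speech(frame: Sequence[float], threshold: float = 0.01) -> bool:
    # Toy VAD: call a frame speech when its energy exceeds the threshold.
    return frame_energy(frame) > threshold


def turn_ended(frames: Sequence[Sequence[float]], silence_frames: int = 3) -> bool:
    """Toy turn detector: the user has spoken at some point and the last
    `silence_frames` frames are all non-speech."""
    flags = [is_speech(frame) for frame in frames]
    if len(flags) < silence_frames or not any(flags):
        return False
    return not any(flags[-silence_frames:])
```

The split mirrors the two questions the agent has to answer: "is someone speaking right now?" and "have they finished, so that I should respond?".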
Setting Up the Development Environment
Prerequisites
Before you begin, ensure you have Python 3.11+ installed and a VideoSDK account. You can sign up at app.videosdk.live.
Step 1: Create a Virtual Environment
Create a virtual environment to manage dependencies:
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Step 2: Install Required Packages
Install the necessary packages using pip:
pip install videosdk-agents videosdk-plugins
Step 3: Configure API Keys
Create a .env file in your project directory and add your VideoSDK API key:
VIDEOSDK_API_KEY=your_api_key_here
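Before starting the agent, it can help to confirm that the key is actually readable. The sketch below is a minimal stand-in for a library such as python-dotenv: it parses simple KEY=value lines and reports which required names are missing. The helper names are hypothetical, and only VIDEOSDK_API_KEY comes from this guide; any provider keys your STT, LLM, and TTS plugins need would be added to the required list.

```python
from pathlib import Path


def parse_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not env.get(name)]
```

Calling missing_keys(parse_env_file(".env"), ["VIDEOSDK_API_KEY"]) returns an empty list when the key is present and non-empty.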
Building the AI Voice Agent: A Step-by-Step Guide
Here is the complete code for the AI voice agent:
import asyncio

from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# Pre-downloading the Turn Detector model
pre_download_model()

agent_instructions = (
    "You are an AI Voice Assistant designed specifically for BPO (Business Process Outsourcing) companies. "
    "Your primary role is to assist customer service representatives by providing quick and accurate "
    "information to enhance customer interactions.\n\n"
    "**Persona:**\n"
    "- You are a knowledgeable and efficient assistant, always ready to support BPO agents in delivering "
    "excellent customer service.\n\n"
    "**Capabilities:**\n"
    "- Provide real-time information on company policies, procedures, and product details.\n"
    "- Assist in handling common customer queries and issues.\n"
    "- Offer suggestions for upselling or cross-selling based on customer interactions.\n"
    "- Log customer interactions and feedback for quality assurance purposes.\n"
    "- Support agents in managing call queues and prioritizing tasks.\n\n"
    "**Constraints and Limitations:**\n"
    "- You are not authorized to make decisions on behalf of the company or handle sensitive customer data.\n"
    "- You must always defer to a human agent for complex issues or when unsure about the information.\n"
    "- You cannot process payments or handle financial transactions.\n"
    "- Ensure compliance with data protection regulations and company policies at all times.\n"
    "- Include a disclaimer that you are an AI assistant and not a human representative."
)


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        # Greet the user when the session starts
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        # Say goodbye when the session ends
        await self.session.say("Goodbye!")


async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()


def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )

    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

Step 4.1: Generating a VideoSDK Meeting ID
To create a meeting ID, use the following curl command:
curl -X POST \
  https://api.videosdk.live/v1/rooms \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Test Room"}'
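The same request can be issued from Python. The sketch below only builds the request with the standard library, reusing the endpoint, headers, and body from the curl command above; the function name is hypothetical, and the shape of the response is not covered in this guide, so it is left unparsed.

```python
import json
import urllib.request

ROOMS_URL = "https://api.videosdk.live/v1/rooms"  # endpoint as given in this guide


def build_create_room_request(api_key: str, name: str = "Test Room") -> urllib.request.Request:
    """Build (but do not send) the POST request that mirrors the curl command."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        ROOMS_URL,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen inside a with block and read the response body; keeping construction separate from sending makes the request easy to inspect first.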
Step 4.2: Creating the Custom Agent Class
The MyVoiceAgent class extends the Agent class, providing a custom implementation for entering and exiting interactions. This class is where you define how the agent greets and says goodbye to users.
Step 4.3: Defining the Core Pipeline
The CascadingPipeline is crucial as it defines the flow of data through the system. It integrates the STT, LLM, and TTS components, allowing the agent to process and respond to user inputs seamlessly.
- STT (DeepgramSTT): Converts user speech to text.
- LLM (OpenAILLM): Processes the text to generate a response.
- TTS (ElevenLabsTTS): Converts the response text back to speech.
- VAD (SileroVAD): Detects when the user is speaking.
- TurnDetector: Identifies when the agent should respond.
Step 4.4: Managing the Session and Startup Logic
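The start, wait, and always-clean-up pattern used by start_session can be tried in isolation. The sketch below replaces the real session with a hypothetical FakeSession that only records calls; the tutorial's start_session waits on an Event that is never set, whereas this sketch sets it explicitly so the clean-up path can be observed.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for an agent session; it only records calls."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def start(self) -> None:
        self.events.append("started")

    async def close(self) -> None:
        self.events.append("closed")


async def run_until_stopped(session: FakeSession, stop: asyncio.Event) -> None:
    """Start the session, wait for a stop signal, and always clean up."""
    try:
        await session.start()
        await stop.wait()
    finally:
        await session.close()


async def demo() -> list[str]:
    session = FakeSession()
    stop = asyncio.Event()
    task = asyncio.create_task(run_until_stopped(session, stop))
    await asyncio.sleep(0)  # let the task start the session
    stop.set()              # stands in for a manual termination
    await task
    return session.events
```

Because close() sits in the finally block, it runs whether the wait ends normally, raises, or is cancelled.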
The start_session function initializes the agent session and manages the lifecycle of the interaction. The make_context function sets up the room options, and the main block starts the job, ensuring the agent is ready to interact.
Running and Testing the Agent
Step 5.1: Running the Python Script
To start the agent, run the following command in your terminal:
python main.py
Step 5.2: Interacting with the Agent in the Playground
Once the agent is running, you will receive a playground link in the console. Open this link in your browser to interact with the agent.
Advanced Features and Customizations
Extending Functionality with Custom Tools
You can extend the agent's capabilities by integrating custom tools. This allows for specialized processing or additional data handling.
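One common way to structure such tools is a small registry that maps tool names to plain Python functions the agent can call. The sketch below shows that generic pattern only; the decorator, the registry, and the example policy data are hypothetical and do not reproduce VideoSDK's own tool API, which is described in its documentation.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so it can be looked up later."""
    TOOLS[func.__name__] = func
    return func


@tool
def lookup_policy(topic: str) -> str:
    # Hypothetical data: a real tool would query a knowledge base.
    policies = {"returns": "Items can be returned within 30 days."}
    return policies.get(topic, "No policy found; please ask a human agent.")


def call_tool(name: str, **kwargs: str) -> str:
    """Dispatch a tool call by name, failing clearly on unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

The fallback message keeps the tool consistent with the agent's instruction to defer to a human when information is missing.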
Exploring Other Plugins
The VideoSDK framework supports various plugins for STT, LLM, and TTS. Explore alternatives to find the best fit for your needs.
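Swapping providers is easiest when each stage is written against a small interface rather than a concrete class. The sketch below illustrates the idea with typing.Protocol; the SpeechToText interface and both stub implementations are hypothetical and are not VideoSDK classes.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class UppercaseSTT:
    """Hypothetical provider A: decodes bytes and upper-cases the text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").upper()


class TrimmingSTT:
    """Hypothetical provider B: decodes bytes and trims whitespace."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").strip()


def transcribe_with(stt: SpeechToText, audio: bytes) -> str:
    # Any object with a matching transcribe() method can be passed in.
    return stt.transcribe(audio)
```

The calling code stays the same when the provider changes, which is the property that makes trying alternative plugins cheap.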
Troubleshooting Common Issues
API Key and Authentication Errors
Ensure your API key is correctly set in the .env file and that it has the necessary permissions.
Audio Input/Output Problems
Check your microphone and speaker settings. Ensure they are correctly configured and not muted.
Dependency and Version Conflicts
Ensure all dependencies are up to date and compatible with your Python version.
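A quick preflight check can surface these conflicts before the agent starts. The sketch below uses only the standard library; the 3.11 minimum comes from the prerequisites in this guide, while the function name and the idea of passing in the distribution names to check are assumptions for illustration.

```python
import sys
from importlib import metadata


def preflight(packages: list[str], min_python: tuple[int, int] = (3, 11)) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: list[str] = []
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:2]))
        needed = ".".join(map(str, min_python))
        problems.append(f"Python {needed}+ required, found {found}")
    for name in packages:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"Package not installed: {name}")
    return problems
```

For example, preflight(["videosdk-agents"]) checks the interpreter version and whether that distribution is installed in the active virtual environment.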
Conclusion
Summary of What You've Built
In this guide, you've built an AI voice assistant tailored for BPO companies using the VideoSDK framework. This agent can effectively assist customer service representatives by providing real-time information and handling routine inquiries.
Next Steps and Further Learning
Explore additional features and plugins to enhance the agent's capabilities. Consider diving deeper into the VideoSDK documentation for more advanced use cases and integrations.
FAQ