Introduction to AI Voice Agents for Sales Calls
AI Voice Agents automate routine sales-call interactions and give sales teams real-time assistance. They combine speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) to understand customer queries and respond to them.
What is an AI Voice Agent?
An AI Voice Agent is a software application designed to interact with users through voice commands. It processes spoken language, understands the intent, and generates appropriate responses. These agents are becoming increasingly popular in various industries, including sales, where they help streamline communication and enhance customer experience.
Why are they important for the AI Sales Call Agent industry?
In the sales industry, AI Voice Agents can handle routine inquiries, schedule follow-ups, and provide product information, freeing up human agents to focus on more complex tasks. They improve efficiency, reduce response times, and can operate 24/7, making them invaluable assets for sales teams.
Core Components of a Voice Agent
- STT (Speech-to-Text): Converts spoken language into text.
- LLM (Large Language Model): Processes the text to understand and generate responses.
- TTS (Text-to-Speech): Converts text responses back into spoken language.
What You'll Build in This Tutorial
In this tutorial, you will build an AI Sales Call Agent using the VideoSDK framework. This agent will engage with customers, answer frequently asked questions, and assist in the sales process.
Architecture and Core Concepts
High-Level Architecture Overview
The AI Voice Agent architecture involves a series of components that work together to process user input and generate responses. The flow begins with capturing user speech, converting it to text, processing the text to understand the intent, generating a response, and finally converting the response back to speech.
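The flow described above can be sketched with three stand-in functions chained in order. This is only an illustration of the data flow, not the VideoSDK implementation: `fake_stt`, `fake_llm`, `fake_tts`, and `run_cascade` are hypothetical names, and "audio" is represented as plain bytes so the example runs without any external service.

```python
from typing import Callable


# Hypothetical stand-ins for real STT, LLM, and TTS services.
def fake_stt(audio: bytes) -> str:
    """Pretend to transcribe audio: here the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    """Pretend to understand the request and return a canned reply."""
    if "price" in text.lower():
        return "Our starter plan is listed on the pricing page."
    return "Could you tell me more about what you need?"


def fake_tts(text: str) -> bytes:
    """Pretend to synthesize speech: encode the reply back to bytes."""
    return text.encode("utf-8")


def run_cascade(
    audio: bytes,
    stt: Callable[[bytes], str] = fake_stt,
    llm: Callable[[str], str] = fake_llm,
    tts: Callable[[str], bytes] = fake_tts,
) -> bytes:
    """Pass one user turn through STT -> LLM -> TTS in sequence."""
    transcript = stt(audio)
    reply = llm(transcript)
    return tts(reply)
```

Calling `run_cascade(b"What is the price?")` walks a single turn through all three stages. Because each stage is passed in as a parameter, any one of them can be swapped without touching the others, which is the main appeal of a cascading design.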
Understanding Key Concepts in the VideoSDK Framework
- Agent: The core class representing your bot, responsible for managing interactions.
- Cascading Pipeline: A sequence of processing steps (STT -> LLM -> TTS) that handle user input and generate responses.
- VAD & TurnDetector: Tools that help the agent determine when to listen and when to speak, ensuring smooth interactions.
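To make the role of voice activity detection and turn detection more concrete, here is a deliberately simplified sketch. It uses frame energy as a stand-in for a real detector; the plugins used later in this tutorial rely on trained models, so the `threshold` values below are not comparable to theirs. The helper names (`frame_energy`, `is_speech`, `end_of_turn_index`) are hypothetical.

```python
import math
from typing import Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: Sequence[float], threshold: float = 0.1) -> bool:
    """Toy VAD: a frame counts as speech when its energy exceeds the threshold."""
    return frame_energy(frame) > threshold


def end_of_turn_index(
    frames: Sequence[Sequence[float]],
    silence_frames: int = 3,
    threshold: float = 0.1,
) -> Optional[int]:
    """Return the index of the frame at which the user's turn is considered over.

    The turn ends once speech has been heard and is then followed by
    `silence_frames` consecutive non-speech frames. Returns None if that
    never happens.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if is_speech(frame, threshold):
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_frames:
                return i
    return None
```

The trade-off is visible in `silence_frames`: a small value makes the agent answer quickly but risks cutting the caller off mid-sentence, while a large value feels sluggish.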
Setting Up the Development Environment
Prerequisites
Before you begin, ensure you have Python 3.11+ installed and a VideoSDK account. You can sign up at app.videosdk.live.
Step 1: Create a Virtual Environment
    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Step 2: Install Required Packages
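Install the agents framework together with the plugins used in this tutorial using pip:
    pip install "videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector]"
Step 3: Configure API Keys in a .env file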
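Create a .env file in your project directory and add your VideoSDK API key:
    VIDEOSDK_API_KEY=your_api_key_here
The Deepgram, OpenAI, and ElevenLabs plugins used below also need credentials from their respective providers; check each plugin's documentation for the variable name it expects.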
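A small start-up check can catch a missing or empty key before the agent tries to connect. The sketch below is an assumption-laden illustration rather than part of the VideoSDK framework: `parse_env_text` and `missing_keys` are hypothetical helpers, and the parser only understands simple `KEY=value` lines. In a real project a dedicated package such as python-dotenv is the more common choice.

```python
from typing import Dict, Iterable, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required: Iterable[str], env: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]
```

For example, `missing_keys(["VIDEOSDK_API_KEY"], parse_env_text(open(".env").read()))` returns an empty list when the key is present and non-empty.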
Building the AI Voice Agent: A Step-by-Step Guide
Let's start by presenting the complete code for your AI Sales Call Agent. This code will serve as the foundation for the agent you'll build and test.
    import asyncio
    from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
    from videosdk.plugins.silero import SileroVAD
    from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
    from videosdk.plugins.deepgram import DeepgramSTT
    from videosdk.plugins.openai import OpenAILLM
    from videosdk.plugins.elevenlabs import ElevenLabsTTS

    # Pre-downloading the Turn Detector model
    pre_download_model()

    agent_instructions = "You are an AI Sales Call Agent designed to assist sales teams in managing and optimizing their call processes. Your primary role is to engage with potential customers over the phone, provide information about products and services, and facilitate the sales process. You are capable of answering frequently asked questions about the products, scheduling follow-up calls, and collecting customer feedback. You can also update customer records in the CRM system based on the interactions. However, you are not authorized to make final sales decisions or handle sensitive customer information such as credit card details. Always ensure to transfer such calls to a human sales representative. Additionally, you must adhere to all applicable privacy laws and regulations during interactions."

    class MyVoiceAgent(Agent):
        def __init__(self):
            super().__init__(instructions=agent_instructions)

        async def on_enter(self):
            await self.session.say("Hello! How can I help?")

        async def on_exit(self):
            await self.session.say("Goodbye!")

    async def start_session(context: JobContext):
        # Create agent and conversation flow
        agent = MyVoiceAgent()
        conversation_flow = ConversationFlow(agent)

        # Create pipeline
        pipeline = CascadingPipeline(
            stt=DeepgramSTT(model="nova-2", language="en"),
            llm=OpenAILLM(model="gpt-4o"),
            tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
            vad=SileroVAD(threshold=0.35),
            turn_detector=TurnDetector(threshold=0.8)
        )

        session = AgentSession(
            agent=agent,
            pipeline=pipeline,
            conversation_flow=conversation_flow
        )

        try:
            await context.connect()
            await session.start()
            # Keep the session running until manually terminated
            await asyncio.Event().wait()
        finally:
            # Clean up resources when done
            await session.close()
            await context.shutdown()

    def make_context() -> JobContext:
        room_options = RoomOptions(
            # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
            name="VideoSDK Cascaded Agent",
            playground=True
        )

        return JobContext(room_options=room_options)

    if __name__ == "__main__":
        job = WorkerJob(entrypoint=start_session, jobctx=make_context)
        job.start()
Step 4.1: Generating a VideoSDK Meeting ID
To generate a meeting ID, use the following curl command:
    curl -X POST "https://api.videosdk.live/v1/meetings" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json"
This command will return a meeting ID that you can use to connect your agent.
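If you would rather make this call from Python, the same request can be built with the standard library. This sketch mirrors the curl command above exactly (same URL, method, and headers) and only constructs the request; `build_create_meeting_request` is a hypothetical helper name, and the shape of the JSON response is not shown in this tutorial, so inspect it before relying on any field.

```python
import urllib.request

# Same endpoint as the curl command in this step.
MEETINGS_URL = "https://api.videosdk.live/v1/meetings"


def build_create_meeting_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request equivalent to the curl command."""
    return urllib.request.Request(
        MEETINGS_URL,
        data=b"",  # empty body: the curl command sends no payload
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the body with `json.loads`. Keeping construction separate from sending makes the request easy to inspect or log without touching the network.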
Step 4.2: Creating the Custom Agent Class
The MyVoiceAgent class extends the Agent class from the VideoSDK framework. It initializes with specific instructions and defines behavior for entering and exiting a session.
    class MyVoiceAgent(Agent):
        def __init__(self):
            super().__init__(instructions=agent_instructions)

        async def on_enter(self):
            await self.session.say("Hello! How can I help?")

        async def on_exit(self):
            await self.session.say("Goodbye!")
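To see when the two hooks fire relative to the rest of a call, the following framework-free sketch replays the same pattern with stand-ins. `StubSession`, `StubAgent`, and `simulate_call` are hypothetical names; the real framework decides when to invoke `on_enter` and `on_exit`, and this example only assumes they bracket the conversation.

```python
import asyncio
from typing import List


class StubSession:
    """Hypothetical stand-in for a session: records what would be spoken."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    async def say(self, text: str) -> None:
        self.spoken.append(text)


class StubAgent:
    """Mimics the shape of the agent class above without the framework."""

    def __init__(self, session: StubSession) -> None:
        self.session = session

    async def on_enter(self) -> None:
        await self.session.say("Hello! How can I help?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")


async def simulate_call(agent: StubAgent) -> None:
    """Invoke the hooks in order: enter, conversation, exit."""
    await agent.on_enter()
    try:
        await agent.session.say("(conversation happens here)")
    finally:
        # The exit hook runs even if the conversation raises.
        await agent.on_exit()


def run_demo() -> List[str]:
    """Run one simulated call and return everything that was 'spoken'."""
    session = StubSession()
    asyncio.run(simulate_call(StubAgent(session)))
    return session.spoken
```

A stub like this is also a convenient way to unit-test greeting and farewell wording without starting a real session.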
Step 4.3: Defining the Core Pipeline
The CascadingPipeline is central to processing audio input and generating responses. It integrates various plugins for STT, LLM, TTS, VAD, and turn detection.
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )
Step 4.4: Managing the Session and Startup Logic
The start_session function sets up the agent session, connects to the context, and manages the session lifecycle.
    async def start_session(context: JobContext):
        agent = MyVoiceAgent()
        conversation_flow = ConversationFlow(agent)
        pipeline = CascadingPipeline(
            stt=DeepgramSTT(model="nova-2", language="en"),
            llm=OpenAILLM(model="gpt-4o"),
            tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
            vad=SileroVAD(threshold=0.35),
            turn_detector=TurnDetector(threshold=0.8)
        )
        session = AgentSession(
            agent=agent,
            pipeline=pipeline,
            conversation_flow=conversation_flow
        )
        try:
            await context.connect()
            await session.start()
            await asyncio.Event().wait()
        finally:
            await session.close()
            await context.shutdown()
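The try/finally structure is what guarantees cleanup when the long-running wait is interrupted. The sketch below isolates that pattern with a stand-in session so it can run on its own; `StubSession`, `run_until_stopped`, and `demo` are hypothetical names, and cancellation is used here as one example of how the wait might end.

```python
import asyncio
from typing import List


class StubSession:
    """Hypothetical stand-in that logs lifecycle calls instead of doing real work."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    async def start(self) -> None:
        self.log.append("started")

    async def close(self) -> None:
        self.log.append("closed")


async def run_until_stopped(session: StubSession, stop: asyncio.Event) -> None:
    """Start the session, wait indefinitely, and always close on the way out."""
    try:
        await session.start()
        await stop.wait()  # blocks until the event is set or the task is cancelled
    finally:
        await session.close()


async def demo() -> List[str]:
    """Cancel the waiting task and return the order of lifecycle events."""
    log: List[str] = []
    stop = asyncio.Event()
    task = asyncio.create_task(run_until_stopped(StubSession(log), stop))
    await asyncio.sleep(0)  # yield once so the task reaches stop.wait()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log
```

Running `asyncio.run(demo())` shows that `close` is recorded before the cancellation reaches the caller, which is the behavior the `finally` block in `start_session` depends on.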
The make_context function defines the room options and returns a JobContext object.
    def make_context() -> JobContext:
        room_options = RoomOptions(
            name="VideoSDK Cascaded Agent",
            playground=True
        )
        return JobContext(room_options=room_options)
The main block initializes and starts the job.
    if __name__ == "__main__":
        job = WorkerJob(entrypoint=start_session, jobctx=make_context)
        job.start()
Running and Testing the Agent
Step 5.1: Running the Python Script
To run your AI Sales Call Agent, execute the following command in your terminal:
    python main.py
Step 5.2: Interacting with the Agent in the AI Agent playground
Once the script is running, you will receive a playground link in the console. Use this link to interact with your agent in a simulated environment.
Advanced Features and Customizations
Extending Functionality with Custom Tools
You can extend your agent's capabilities by integrating custom tools. These tools can handle specific tasks or enhance the agent's functionality.
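As a framework-agnostic illustration of the idea, a "tool" can be thought of as a named function the agent is allowed to call on the customer's behalf. The sketch below is not the VideoSDK tool API, which has its own registration mechanism; `tool`, `TOOLS`, `call_tool`, and both example tools are hypothetical.

```python
from typing import Callable, Dict

# Registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so it can be looked up later."""
    TOOLS[func.__name__] = func
    return func


@tool
def schedule_follow_up(customer: str, day: str) -> str:
    """Example tool: confirm a follow-up call."""
    return f"Follow-up with {customer} scheduled for {day}."


@tool
def product_info(product: str) -> str:
    """Example tool: look up a product in a tiny in-memory catalog."""
    catalog = {"starter": "Starter plan: basic features for small teams."}
    return catalog.get(product.lower(), f"No information found for {product}.")


def call_tool(name: str, **kwargs: str) -> str:
    """Dispatch a tool call by name, as an LLM-driven agent might request."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**kwargs)
```

Keeping tools as small, side-effect-light functions makes them easy to test in isolation before wiring them into a live agent.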
Exploring Other Plugins
The VideoSDK framework supports various plugins for STT, LLM, and TTS. Consider exploring options like Cartesia for STT or Google Gemini for LLM to optimize your agent's performance.
Troubleshooting Common Issues
API Key and Authentication Errors
Ensure your API keys are correctly configured in the .env file. Double-check for typos or missing keys.
Audio Input/Output Problems
Verify that your microphone and speakers are functioning correctly. Check the agent's audio settings if issues persist.
Dependency and Version Conflicts
Ensure all required packages are installed and up-to-date. Use a virtual environment to manage dependencies effectively.
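When diagnosing version problems, it helps to print exactly what is installed in the active environment. This sketch uses only the standard library; `meets_minimum` and `installed_versions` are hypothetical helper names, and the package names to pass in are whichever ones you installed in Step 2.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple


def meets_minimum(current: Tuple[int, int], minimum: Tuple[int, int] = (3, 11)) -> bool:
    """Check a (major, minor) Python version against the tutorial's minimum."""
    return current >= minimum


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


def current_python() -> Tuple[int, int]:
    """Return the running interpreter's (major, minor) version."""
    return (sys.version_info.major, sys.version_info.minor)
```

A `None` entry in the result means the package is missing from the current virtual environment, which often indicates that the environment was not activated before installing.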
Conclusion
Summary of What You've Built
In this tutorial, you built an AI Sales Call Agent using the VideoSDK framework. This agent can engage with customers, answer questions, and assist in the sales process.
Next Steps and Further Learning
Consider expanding your agent's capabilities by exploring additional plugins and custom tools. Stay updated with the latest developments in AI technology to enhance your agent's performance.
FAQ