Introduction to AI Voice Agents in AI Call for Property Management
AI Voice Agents are revolutionizing the way businesses interact with customers by providing automated, intelligent responses to voice queries. In the context of property management, these agents can handle inquiries about property listings, schedule viewings, provide rental agreement details, and assist with maintenance requests. By automating these tasks, property management companies can improve efficiency and customer satisfaction.
What is an AI Voice Agent?
An AI Voice Agent is a software application designed to interact with users through voice commands. It uses technologies like Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS) to understand and respond to user queries.
Why are they important for the property management industry?
AI Voice Agents streamline operations by handling routine inquiries, thus freeing up human agents for more complex tasks. They provide 24/7 service availability, ensuring that potential tenants or property owners can get the information they need anytime.
Core Components of a Voice Agent
- STT (Speech-to-Text): Converts spoken language into text.
- LLM (Large Language Model): Processes the text to understand the query and generate a response.
- TTS (Text-to-Speech): Converts text responses back into spoken language.
For a comprehensive understanding, refer to the AI voice Agent core components overview.
What You'll Build in This Tutorial
In this tutorial, you'll build a fully functional AI Voice Agent tailored for property management using the VideoSDK framework. You'll learn how to set up the development environment, create an agent, and test it using the AI Agent playground.
Architecture and Core Concepts
High-Level Architecture Overview
The AI Voice Agent processes user speech through a series of stages: capturing audio, converting it to text, generating a response, and finally converting the response back to audio. This flow is managed by the VideoSDK framework, which integrates various plugins to handle each stage efficiently.
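The stages described above can be sketched with plain Python stand-ins. This is only an illustration of the data flow, not the VideoSDK API: `fake_stt`, `fake_llm`, `fake_tts`, and `run_turn` are hypothetical names, and "audio" is represented as UTF-8 bytes so the example runs without any audio or network dependencies.

```python
from typing import Callable

# Hypothetical stand-ins for the three stages; a real agent would call
# speech and language services here instead.
def fake_stt(audio: bytes) -> str:
    """Pretend transcription: the 'audio' is just UTF-8 encoded text."""
    return audio.decode("utf-8")

def fake_llm(text: str) -> str:
    """Pretend response generation using a single canned rule."""
    if "viewing" in text.lower():
        return "I can help you schedule a viewing."
    return "Could you tell me more about what you need?"

def fake_tts(text: str) -> bytes:
    """Pretend synthesis: encode the reply text back to bytes."""
    return text.encode("utf-8")

def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str] = fake_stt,
    llm: Callable[[str], str] = fake_llm,
    tts: Callable[[str], bytes] = fake_tts,
) -> bytes:
    """Run one user turn through the cascade: audio -> text -> reply -> audio."""
    transcript = stt(audio)
    reply = llm(transcript)
    return tts(reply)
```

Because each stage is passed in as a function, swapping one provider for another only changes the argument given to `run_turn`, which is the same idea the pipeline's plugin slots express.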
Understanding Key Concepts in the VideoSDK Framework
- Agent: The core class that represents your bot. It handles interactions with users.
- CascadingPipeline: Manages the flow of data through STT, LLM, and TTS processes. Learn more about the Cascading pipeline in AI voice Agents.
- VAD & TurnDetector: These components determine when the agent should listen and when it should speak.
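To make the role of a detection threshold more concrete, here is a toy sketch of voice activity and end-of-turn detection. It is an assumption-laden simplification: the plugins used in this tutorial are model-based, whereas this example uses plain signal energy and a silence count, and every name in it (`frame_energy`, `is_speech`, `turn_ended`) is hypothetical.

```python
from typing import List, Sequence

def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0

def is_speech(frame: Sequence[float], threshold: float = 0.01) -> bool:
    """Toy voice activity check: energy above a fixed threshold."""
    return frame_energy(frame) > threshold

def turn_ended(frames: List[Sequence[float]], silence_frames: int = 3) -> bool:
    """Treat the turn as over once speech has occurred and the most recent
    `silence_frames` frames are all silent."""
    flags = [is_speech(f) for f in frames]
    if not any(flags) or len(flags) < silence_frames:
        return False
    return not any(flags[-silence_frames:])

# Example frames: one clearly voiced, one near silence.
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.001, -0.001, 0.001, -0.001]
```

In this sketch, raising `threshold` makes the detector less sensitive to background noise, while raising `silence_frames` makes the agent wait longer before it replies.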
Setting Up the Development Environment
Prerequisites
To get started, ensure you have Python 3.11+ installed and a VideoSDK account. You can sign up at app.videosdk.live.
Step 1: Create a Virtual Environment
Open your terminal and run the following commands to create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Step 2: Install Required Packages
Install the necessary packages using pip:

    pip install videosdk

Step 3: Configure API Keys in a .env file
Create a .env file in your project directory and add your VideoSDK API key:

    VIDEOSDK_API_KEY=your_api_key_here
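A .env file is only useful if its values reach the process environment. The following is a minimal, standard-library-only sketch of one way to load and validate it; `load_env_file` and `require_env` are hypothetical helper names, and many projects use a dedicated package for this instead.

```python
import os
from pathlib import Path
from typing import Dict

def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Blank lines and lines starting with '#' are skipped. Variables that
    are already set in the environment are left untouched.
    """
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded

def require_env(name: str) -> str:
    """Return the variable's value or raise a clear error if it is missing."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `load_env_file()` followed by `require_env("VIDEOSDK_API_KEY")` at startup surfaces a missing key immediately instead of as an authentication error later.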
Building the AI Voice Agent: A Step-by-Step Guide
Below is the complete code for your AI Voice Agent. We'll break it down in the following sections.

    import asyncio
    import os
    from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
    from videosdk.plugins.silero import SileroVAD
    from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
    from videosdk.plugins.deepgram import DeepgramSTT
    from videosdk.plugins.openai import OpenAILLM
    from videosdk.plugins.elevenlabs import ElevenLabsTTS
    from typing import AsyncIterator

    # Pre-downloading the Turn Detector model
    pre_download_model()

    agent_instructions = "You are an AI Voice Agent specialized in property management. Your persona is that of a knowledgeable and efficient property management assistant. Your primary capabilities include answering questions related to property listings, scheduling property viewings, providing information on rental agreements, and assisting with maintenance requests. You can also offer guidance on property management best practices and connect callers with human agents for complex inquiries. However, you are not a licensed real estate agent, and you must inform users that any legal or financial advice should be sought from qualified professionals. Additionally, you should not handle any transactions or collect personal financial information from users."

    class MyVoiceAgent(Agent):
        def __init__(self):
            super().__init__(instructions=agent_instructions)

        async def on_enter(self):
            # Greet the caller when the session starts
            await self.session.say("Hello! How can I help?")

        async def on_exit(self):
            # Say goodbye when the session ends
            await self.session.say("Goodbye!")

    async def start_session(context: JobContext):
        # Create agent and conversation flow
        agent = MyVoiceAgent()
        conversation_flow = ConversationFlow(agent)

        # Create pipeline
        pipeline = CascadingPipeline(
            stt=DeepgramSTT(model="nova-2", language="en"),
            llm=OpenAILLM(model="gpt-4o"),
            tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
            vad=SileroVAD(threshold=0.35),
            turn_detector=TurnDetector(threshold=0.8)
        )

        session = AgentSession(
            agent=agent,
            pipeline=pipeline,
            conversation_flow=conversation_flow
        )

        try:
            await context.connect()
            await session.start()
            # Keep the session running until manually terminated
            await asyncio.Event().wait()
        finally:
            # Clean up resources when done
            await session.close()
            await context.shutdown()

    def make_context() -> JobContext:
        room_options = RoomOptions(
            # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
            name="VideoSDK Cascaded Agent",
            playground=True
        )

        return JobContext(room_options=room_options)

    if __name__ == "__main__":
        job = WorkerJob(entrypoint=start_session, jobctx=make_context)
        job.start()

Step 4.1: Generating a VideoSDK Meeting ID
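If you want the agent to join a pre-created room, you'll need a meeting ID to pass as room_id; when room_id is omitted, as in the code above, a room is created automatically. Use the following curl command to generate a meeting ID:

    curl -X POST https://api.videosdk.live/v1/rooms -H "Authorization: YOUR_API_KEY"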
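The same request can be issued from Python with the standard library. This is a sketch that mirrors the curl command as written (same URL, method, and header); `build_room_request` and `create_room` are hypothetical helper names, and the response format is not described in this tutorial, so the code returns the decoded JSON without assuming any particular field.

```python
import json
import urllib.request

ROOMS_URL = "https://api.videosdk.live/v1/rooms"  # endpoint taken from the curl command

def build_room_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a room."""
    return urllib.request.Request(
        ROOMS_URL,
        method="POST",
        headers={"Authorization": api_key},
    )

def create_room(api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body.

    The caller should inspect the returned dict for the meeting ID, since
    the exact response fields are an unknown here.
    """
    request = build_room_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to check the URL and headers without touching the network.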
Step 4.2: Creating the Custom Agent Class
The MyVoiceAgent class extends the Agent class from the VideoSDK framework. It defines the agent's behavior when a session starts or ends:

    class MyVoiceAgent(Agent):
        def __init__(self):
            super().__init__(instructions=agent_instructions)

        async def on_enter(self):
            # Greet the caller when the session starts
            await self.session.say("Hello! How can I help?")

        async def on_exit(self):
            # Say goodbye when the session ends
            await self.session.say("Goodbye!")

This class uses predefined instructions to guide its interactions, ensuring it stays within its role as a property management assistant.
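The enter and exit hooks can be exercised without the framework by using small stand-ins. The sketch below is an illustration only: `RecordingSession`, `ToyAgent`, `GreeterAgent`, and `run_lifecycle` are hypothetical, and the real base class does considerably more than this.

```python
import asyncio
from typing import List

class RecordingSession:
    """Hypothetical session that records what the agent says."""
    def __init__(self) -> None:
        self.spoken: List[str] = []

    async def say(self, text: str) -> None:
        self.spoken.append(text)

class ToyAgent:
    """Hypothetical base class exposing the same two hooks as the tutorial's agent."""
    def __init__(self, instructions: str) -> None:
        self.instructions = instructions
        self.session = RecordingSession()

    async def on_enter(self) -> None:
        """Called when a session starts; does nothing by default."""

    async def on_exit(self) -> None:
        """Called when a session ends; does nothing by default."""

class GreeterAgent(ToyAgent):
    async def on_enter(self) -> None:
        await self.session.say("Hello! How can I help?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")

async def run_lifecycle(agent: ToyAgent) -> List[str]:
    """Call the hooks in the order a session would: enter, then exit."""
    await agent.on_enter()
    await agent.on_exit()
    return agent.session.spoken
```

Running `asyncio.run(run_lifecycle(GreeterAgent("property assistant")))` returns the greeting followed by the farewell, which is a quick way to unit test hook behavior before connecting real audio.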
Step 4.3: Defining the Core Pipeline
The CascadingPipeline is where the magic happens. It connects various plugins to process audio and text:

    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

Each component plays a critical role:
- STT (DeepgramSTT): Converts speech to text.
- LLM (OpenAILLM): Processes the text to generate responses.
- TTS (ElevenLabsTTS): Converts the text responses back into speech.
- VAD (Silero Voice Activity Detection): Detects when the user is speaking.
- TurnDetector: Determines when the agent should respond.
Step 4.4: Managing the Session and Startup Logic
The start_session function manages the agent's lifecycle, ensuring it connects and interacts correctly:

    async def start_session(context: JobContext):
        # Create agent and conversation flow
        agent = MyVoiceAgent()
        conversation_flow = ConversationFlow(agent)

        # Create pipeline
        pipeline = CascadingPipeline(
            stt=DeepgramSTT(model="nova-2", language="en"),
            llm=OpenAILLM(model="gpt-4o"),
            tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
            vad=SileroVAD(threshold=0.35),
            turn_detector=TurnDetector(threshold=0.8)
        )

        # Bind the agent, pipeline, and conversation flow into one session
        session = AgentSession(
            agent=agent,
            pipeline=pipeline,
            conversation_flow=conversation_flow
        )

        try:
            await context.connect()
            await session.start()
            # Keep the session running until manually terminated
            await asyncio.Event().wait()
        finally:
            # Clean up resources when done
            await session.close()
            await context.shutdown()

    def make_context() -> JobContext:
        room_options = RoomOptions(
            # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
            name="VideoSDK Cascaded Agent",
            playground=True
        )

        return JobContext(room_options=room_options)

Finally, the main block starts the agent:

    if __name__ == "__main__":
        job = WorkerJob(entrypoint=start_session, jobctx=make_context)
        job.start()
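The try/finally structure in start_session is what guarantees cleanup when the process is stopped. The following framework-free sketch shows the same pattern with a stop signal that can actually be set, which is convenient for testing. `LoggingSession`, `run_until`, and `demo` are hypothetical names used only for illustration.

```python
import asyncio
from typing import List

class LoggingSession:
    """Hypothetical stand-in that logs lifecycle calls instead of doing real work."""
    def __init__(self, log: List[str]) -> None:
        self.log = log

    async def start(self) -> None:
        self.log.append("start")

    async def close(self) -> None:
        self.log.append("close")

async def run_until(stop: asyncio.Event, log: List[str]) -> None:
    """Start the session, wait for a stop signal, and always clean up."""
    session = LoggingSession(log)
    try:
        await session.start()
        await stop.wait()      # the tutorial waits on an Event that is never set
    finally:
        await session.close()  # runs on normal exit, on errors, and on cancellation

async def demo() -> List[str]:
    """Run the session in a task, request shutdown, and return the call log."""
    log: List[str] = []
    stop = asyncio.Event()
    task = asyncio.create_task(run_until(stop, log))
    await asyncio.sleep(0)     # yield once so the task can reach stop.wait()
    stop.set()                 # request shutdown
    await task
    return log
```

Passing the event in from outside, rather than creating it inline, is a small design change that lets a signal handler or a test trigger a graceful shutdown.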
Running and Testing the Agent
Step 5.1: Running the Python Script
Run the script using the following command:

    python main.py

Step 5.2: Interacting with the Agent in the Playground
Once the script is running, check the console for a playground link. Open it in your browser to interact with your agent. Speak into your microphone to test its responses.
Advanced Features and Customizations
Extending Functionality with Custom Tools
You can extend your agent's capabilities by integrating custom tools. This involves creating new functions that the agent can call during its operation.
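As a rough illustration of what such a function might look like, here is a sketch of a viewing-scheduling tool with a simple name-based registry. Everything in it is hypothetical: the `tool` decorator, the `TOOLS` registry, and `call_tool` stand in for whatever registration mechanism the framework provides, which is not shown in this tutorial.

```python
from datetime import date
from typing import Callable, Dict

# Hypothetical registry; the framework's own registration mechanism is not shown here.
TOOLS: Dict[str, Callable[..., str]] = {}

def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so it can be looked up later."""
    TOOLS[func.__name__] = func
    return func

@tool
def schedule_viewing(property_id: str, day: str) -> str:
    """Validate an ISO date (YYYY-MM-DD) and confirm a viewing request."""
    try:
        when = date.fromisoformat(day)
    except ValueError:
        return f"'{day}' is not a valid date; please use YYYY-MM-DD."
    return f"Viewing for property {property_id} requested on {when.isoformat()}."

def call_tool(name: str, **kwargs: str) -> str:
    """Dispatch a tool call by name, as an agent runtime might."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**kwargs)
```

Returning a plain sentence from each tool, including for invalid input, gives the language model something it can read back to the caller directly.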
Exploring Other Plugins
The VideoSDK framework supports various plugins for STT, LLM, and TTS. Explore options like Cartesia for STT or Google Gemini for LLM to enhance your agent's performance.
Troubleshooting Common Issues
API Key and Authentication Errors
Ensure your API key is correctly set in the .env file. Double-check for typos or missing entries.
Audio Input/Output Problems
Verify that your microphone and speakers are working correctly. Check your system settings and permissions.
Dependency and Version Conflicts
Ensure all packages are up-to-date and compatible with Python 3.11+. Use a virtual environment to avoid conflicts.
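A quick way to rule out the most common environment problems is to check the interpreter version and whether a virtual environment is active. The sketch below uses only the standard library; the helper names are hypothetical, and the minimum version comes from the prerequisites in this tutorial.

```python
import sys
from typing import Sequence, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 11)  # minimum version stated in the prerequisites

def python_is_supported(version: Sequence[int], minimum: Tuple[int, int] = MIN_PYTHON) -> bool:
    """Compare only the major and minor numbers against the minimum."""
    return tuple(version[:2]) >= minimum

def in_virtual_env() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix

def describe_environment() -> str:
    """Summarize the interpreter version and virtual environment status."""
    major, minor = sys.version_info[:2]
    status = "supported" if python_is_supported((major, minor)) else "too old"
    location = "inside" if in_virtual_env() else "outside"
    return f"Python {major}.{minor} ({status}), running {location} a virtual environment"
```

Printing `describe_environment()` at the top of the script makes version mismatches visible before any plugin import fails.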
Conclusion
Summary of What You've Built
You've successfully built an AI Voice Agent for property management using VideoSDK. This agent can handle property-related inquiries, schedule viewings, and more.
Next Steps and Further Learning
To further enhance your agent, consider exploring additional plugins and custom tools. Continue learning about AI and voice technologies to expand your skill set.
FAQ