Build an AI Voice Agent for Property Management

Step-by-step guide to building an AI Voice Agent for property management using VideoSDK.

Introduction to AI Voice Agents for Property Management

AI Voice Agents are revolutionizing the way businesses interact with customers by providing automated, intelligent responses to voice queries. In the context of property management, these agents can handle inquiries about property listings, schedule viewings, provide rental agreement details, and assist with maintenance requests. By automating these tasks, property management companies can improve efficiency and customer satisfaction.

What is an AI Voice Agent?

An AI Voice Agent is a software application that interacts with users through spoken conversation. It uses Speech-to-Text (STT), Large Language Models (LLMs), and Text-to-Speech (TTS) to understand and respond to user queries.

Why are they important for the property management industry?

AI Voice Agents streamline operations by handling routine inquiries, thus freeing up human agents for more complex tasks. They provide 24/7 service availability, ensuring that potential tenants or property owners can get the information they need anytime.

Core Components of a Voice Agent

  • STT (Speech-to-Text): Converts spoken language into text.
  • LLM (Large Language Model): Processes the text to understand and generate responses.
  • TTS (Text-to-Speech): Converts text responses back into spoken language.
For a comprehensive understanding, refer to the AI voice Agent core components overview.
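The three stages above can be pictured as plain function composition. The following is a minimal sketch with stand-in stages, not the VideoSDK implementation; `run_cascade` and the `fake_*` functions are hypothetical names used only for illustration.

```python
from typing import Callable


def run_cascade(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Pass one user utterance through STT, then the LLM, then TTS."""
    transcript = stt(audio)        # speech -> text
    reply_text = llm(transcript)   # text -> response text
    return tts(reply_text)         # response text -> speech


# Stand-in stages for illustration only; real stages would call
# speech and language services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You asked: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = run_cascade(b"Is unit 4B available?", fake_stt, fake_llm, fake_tts)
print(reply_audio.decode("utf-8"))  # prints "You asked: Is unit 4B available?"
```

Because each stage is just a callable, any one of them can be swapped without touching the other two, which is the idea behind a cascaded design.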

What You'll Build in This Tutorial

In this tutorial, you'll build a fully functional AI Voice Agent tailored for property management using the VideoSDK framework. You'll learn how to set up the development environment, create an agent, and test it using the AI Agent playground.

Architecture and Core Concepts

High-Level Architecture Overview

The AI Voice Agent processes user speech through a series of stages: capturing audio, converting it to text, generating a response, and finally converting the response back to audio. This flow is managed by the VideoSDK framework, which integrates various plugins to handle each stage efficiently.

Understanding Key Concepts in the VideoSDK Framework

  • Agent: The core class that represents your bot. It handles interactions with users.
  • CascadingPipeline: Manages the flow of data through STT, LLM, and TTS processes. Learn more about the Cascading pipeline in AI voice Agents.
  • VAD & TurnDetector: These components determine when the agent should listen and when it should speak.
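To make the listening and speaking decision concrete, here is a deliberately simplified sketch. It assumes a stream of per-frame speech probabilities and treats the user's turn as finished after a run of quiet frames. The real VAD and TurnDetector components are model-based and more sophisticated; `find_end_of_turn`, `threshold`, and `silence_frames` are hypothetical names for this illustration.

```python
from typing import Optional, Sequence


def find_end_of_turn(
    speech_probs: Sequence[float],
    threshold: float = 0.35,
    silence_frames: int = 3,
) -> Optional[int]:
    """Return the frame index where the user's turn is judged to end, or None."""
    speaking = False
    quiet_run = 0
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            # Frame counts as speech: the user is (still) talking.
            speaking = True
            quiet_run = 0
        elif speaking:
            # Quiet frame after speech: count toward the end-of-turn pause.
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None


probs = [0.1, 0.2, 0.9, 0.8, 0.7, 0.2, 0.1, 0.1, 0.05]
print(find_end_of_turn(probs))  # prints 7
```

A lower `threshold` makes the detector more willing to treat a frame as speech, while a larger `silence_frames` makes the agent wait longer before replying.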

Setting Up the Development Environment

Prerequisites

To get started, ensure you have Python 3.11+ installed and a VideoSDK account. You can sign up at app.videosdk.live.

Step 1: Create a Virtual Environment

Open your terminal and run the following commands to create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Step 2: Install Required Packages

Install the necessary packages using pip:
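pip install videosdk-agents videosdk-plugins-silero videosdk-plugins-turn-detector videosdk-plugins-deepgram videosdk-plugins-openai videosdk-plugins-elevenlabs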

Step 3: Configure API Keys in a .env file

Create a .env file in your project directory and add your VideoSDK API key:
VIDEOSDK_API_KEY=your_api_key_here
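The pipeline built later in this tutorial also uses Deepgram, OpenAI, and ElevenLabs plugins, which will likely need credentials of their own. The sketch below reports which variables are missing from the environment before you start the agent. The provider variable names are assumptions for illustration, so confirm the exact names in each provider's documentation.

```python
import os
from typing import Iterable, List, Mapping

# VIDEOSDK_API_KEY comes from this tutorial; the other three names are
# assumptions and may differ from what each plugin actually reads.
REQUIRED_KEYS = (
    "VIDEOSDK_API_KEY",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


def missing_keys(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the names of required variables that are absent or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Note that this checks the process environment, so values stored only in a .env file must be loaded into the environment first.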

Building the AI Voice Agent: A Step-by-Step Guide

Below is the complete code for your AI Voice Agent. Save it as main.py, the file name used when running the agent later. We'll break it down in the following sections.
import asyncio

from videosdk.agents import (
    Agent,
    AgentSession,
    CascadingPipeline,
    ConversationFlow,
    JobContext,
    RoomOptions,
    WorkerJob,
)
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

# Pre-download the Turn Detector model before any session starts
pre_download_model()

agent_instructions = "You are an AI Voice Agent specialized in property management. Your persona is that of a knowledgeable and efficient property management assistant. Your primary capabilities include answering questions related to property listings, scheduling property viewings, providing information on rental agreements, and assisting with maintenance requests. You can also offer guidance on property management best practices and connect callers with human agents for complex inquiries. However, you are not a licensed real estate agent, and you must inform users that any legal or financial advice should be sought from qualified professionals. Additionally, you should not handle any transactions or collect personal financial information from users."


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        # Greet the caller when the session starts
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        # Say goodbye when the session ends
        await self.session.say("Goodbye!")


async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8),
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow,
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()


def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True,
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

Step 4.1: Generating a VideoSDK Meeting ID

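The room is created automatically when room_id is omitted from RoomOptions, so this step is optional. If you want the agent to join a pre-created room instead, generate a meeting ID with the following curl command and pass it as room_id:
curl -X POST https://api.videosdk.live/v2/rooms -H "Authorization: YOUR_API_KEY"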
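If you would rather create the room from Python than from the shell, the same request can be sketched with the standard library. This is an illustrative equivalent of the curl command, not an official client: `build_create_room_request` and `create_room` are hypothetical helper names, and the shape of the JSON response is not assumed here.

```python
import json
import urllib.request


def build_create_room_request(url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": api_key},
    )


def create_room(url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_create_room_request(url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Call `create_room` with the endpoint from the curl command and your key, then inspect the returned dictionary to find the meeting ID.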

Step 4.2: Creating the Custom Agent Class

The MyVoiceAgent class extends the Agent class from the VideoSDK framework. It defines the agent's behavior when a session starts or ends:
class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")
This class uses predefined instructions to guide its interactions, ensuring it stays within its role as a property management assistant.

Step 4.3: Defining the Core Pipeline

The CascadingPipeline is where the magic happens. It connects various plugins to process audio and text:
pipeline = CascadingPipeline(
    stt=DeepgramSTT(model="nova-2", language="en"),
    llm=OpenAILLM(model="gpt-4o"),
    tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
    vad=SileroVAD(threshold=0.35),
    turn_detector=TurnDetector(threshold=0.8)
)
Each component plays a critical role:
  • STT (DeepgramSTT): Converts speech to text.
  • LLM (OpenAILLM): Processes the text to generate responses.
  • TTS (ElevenLabsTTS): Converts the text responses back into speech.
  • VAD (Silero Voice Activity Detection): Detects when the user is speaking.
  • TurnDetector: Determines when the agent should respond.

Step 4.4: Managing the Session and Startup Logic

The start_session function manages the agent's lifecycle, ensuring it connects and interacts correctly:
async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    # AI voice Agent Sessions: https://docs.videosdk.live/ai_agents/core-components/agent-session
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )

    return JobContext(room_options=room_options)
Finally, the main block starts the agent:
if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

Running and Testing the Agent

Step 5.1: Running the Python Script

Run the script using the following command:
python main.py

Step 5.2: Interacting with the Agent in the Playground

Once the script is running, check the console for a playground link. Open it in your browser to interact with your agent. Speak into your microphone to test its responses.

Advanced Features and Customizations

Extending Functionality with Custom Tools

You can extend your agent's capabilities by integrating custom tools. This involves creating new functions that the agent can call during its operation.
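As an illustration of the kind of function an agent might call, the sketch below books property viewings in memory. It covers only the tool logic itself; how a function is registered with the agent is framework-specific and not shown in this article. `ViewingCalendar` and `schedule_viewing` are hypothetical names, and a real deployment would use a persistent calendar or booking system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class ViewingCalendar:
    """In-memory store of booked viewings, keyed by property ID."""

    bookings: Dict[str, List[datetime]] = field(default_factory=dict)

    def schedule_viewing(self, property_id: str, when_iso: str) -> str:
        """Book a viewing slot and return a sentence the agent can speak."""
        when = datetime.fromisoformat(when_iso)
        slots = self.bookings.setdefault(property_id, [])
        if when in slots:
            return f"Sorry, {property_id} is already booked at {when:%Y-%m-%d %H:%M}."
        slots.append(when)
        return f"Viewing for {property_id} confirmed at {when:%Y-%m-%d %H:%M}."


calendar = ViewingCalendar()
print(calendar.schedule_viewing("unit-4B", "2025-03-01T14:00"))
print(calendar.schedule_viewing("unit-4B", "2025-03-01T14:00"))
```

Returning a ready-to-speak sentence keeps the tool's result easy for the language model to relay to the caller.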

Exploring Other Plugins

The VideoSDK framework supports various plugins for STT, LLM, and TTS. Explore options like Cartesia for STT or Google Gemini for LLM to enhance your agent's performance.

Troubleshooting Common Issues

API Key and Authentication Errors

Ensure your API key is correctly set in the .env file. Double-check for typos or missing entries.

Audio Input/Output Problems

Verify that your microphone and speakers are working correctly. Check your system settings and permissions.

Dependency and Version Conflicts

Ensure all packages are up-to-date and compatible with Python 3.11+. Use a virtual environment to avoid conflicts.
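A quick way to narrow down version problems is to print what is actually installed inside the active virtual environment. The sketch below uses only the standard library; `python_is_supported` and `installed_version` are hypothetical helper names, and the package names to check are passed on the command line rather than assumed.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence

MIN_PYTHON = (3, 11)  # minimum version named in the prerequisites


def python_is_supported(
    version: Sequence[int], minimum: Sequence[int] = MIN_PYTHON
) -> bool:
    """True if the major.minor part of version is at least minimum."""
    return tuple(version[:2]) >= tuple(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_is_supported(sys.version_info))
    # Usage: python check_env.py <package-name> [<package-name> ...]
    for name in sys.argv[1:]:
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the names of the packages you installed in Step 2 to confirm they are present in the environment you are using.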

Conclusion

Summary of What You've Built

You've built an AI Voice Agent for property management using VideoSDK. Guided by its instructions, the agent can converse with callers about property listings, viewings, rental agreements, and maintenance requests.

Next Steps and Further Learning

To further enhance your agent, consider exploring additional plugins and custom tools. Continue learning about AI and voice technologies to expand your skill set.
