Introduction to AI Voice Agents in Ticket Booking
AI Voice Agents are revolutionizing the way we interact with technology, providing a seamless interface for users to communicate with systems using natural language. In the context of ticket booking, these agents can streamline the process of finding and reserving tickets for events such as concerts, movies, and flights. By leveraging speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS) technologies, AI Voice Agents can understand user queries, process them, and respond in a human-like manner.
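The STT → LLM → TTS loop described above can be sketched in plain Python with stubbed components. Every function here is an illustrative placeholder, not part of any SDK; the hard-coded transcript and reply stand in for real model output:

```python
# Illustrative sketch of the cascaded voice-agent loop.
# Each stage is a stub standing in for a real STT, LLM, or TTS service.

def speech_to_text(audio: bytes) -> str:
    # A real STT engine (e.g. Deepgram) would transcribe the audio here.
    return "book two tickets for the jazz concert"

def generate_response(text: str) -> str:
    # A real LLM (e.g. GPT-4o) would reason over the transcript here.
    return f"Sure, I can help you {text}."

def text_to_speech(text: str) -> bytes:
    # A real TTS engine (e.g. ElevenLabs) would synthesize audio here.
    return text.encode("utf-8")

def handle_turn(audio_in: bytes) -> bytes:
    """One conversational turn: audio in, audio out."""
    transcript = speech_to_text(audio_in)
    reply = generate_response(transcript)
    return text_to_speech(reply)

print(handle_turn(b"\x00\x01").decode("utf-8"))
```

The VideoSDK pipeline automates exactly this turn-by-turn flow, plus the streaming and interruption handling that a naive loop like this lacks.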
In this tutorial, you will learn how to build an AI Voice Agent specifically designed for ticket booking. We will guide you through the setup process, implementation, and testing, using the VideoSDK framework.
Architecture and Core Concepts
High-Level Architecture Overview
The architecture of an AI Voice Agent involves several components that work together to process audio input and generate audio output. The process begins with the user speaking into a microphone, which is then captured by the agent. The speech is converted into text using STT, processed by an LLM to generate a response, and finally converted back into speech using TTS.
```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant STT
    participant LLM
    participant TTS
    User->>Agent: Speak query
    Agent->>STT: Convert speech to text
    STT->>Agent: Text
    Agent->>LLM: Process text
    LLM->>Agent: Response
    Agent->>TTS: Convert text to speech
    TTS->>Agent: Audio
    Agent->>User: Speak response
```
Understanding Key Concepts in the VideoSDK Framework
- Agent: The core class that represents your bot, responsible for handling user interactions.
- CascadingPipeline: Manages the flow of audio processing, integrating STT, LLM, and TTS. Learn more in the guide to the Cascading pipeline in AI voice Agents.
- VAD & TurnDetector: These components help the agent determine when to listen and when to respond, ensuring smooth interactions.
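As a rough intuition for what a VAD does, here is a toy energy-threshold detector. Real VADs such as Silero use trained neural models rather than a simple amplitude threshold, so this is a teaching sketch only:

```python
def is_speech(frame, threshold=0.35):
    """Toy VAD: flag a frame as speech if its mean absolute amplitude exceeds a threshold."""
    energy = sum(abs(sample) for sample in frame) / len(frame)
    return energy > threshold

silence = [0.01, -0.02, 0.015, -0.01]   # low-energy frame
speech = [0.6, -0.7, 0.55, -0.8]        # high-energy frame
print(is_speech(silence), is_speech(speech))  # → False True
```

The `threshold=0.35` default mirrors the `SileroVAD(threshold=0.35)` setting used later in this tutorial, but the two thresholds operate on very different signals.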
Setting Up the Development Environment
Prerequisites
Before you begin, ensure you have Python 3.11+ installed and a VideoSDK account, which you can create at the VideoSDK website.
Step 1: Create a Virtual Environment
Create a virtual environment to manage your project dependencies. Run the following command in your terminal:
```shell
python -m venv myenv
source myenv/bin/activate  # On Windows use `myenv\Scripts\activate`
```
Step 2: Install Required Packages
Install the necessary packages using pip:
```shell
pip install videosdk-agents videosdk-plugins
```
Step 3: Configure API Keys in a .env file
Create a .env file in your project directory and add your API keys. This file should contain your VideoSDK authentication token and any other necessary keys.
Building the AI Voice Agent: A Step-by-Step Guide
To build your AI Voice Agent, we'll start by presenting the complete, runnable code. Then, we'll break it down to explain each part.
```python
import asyncio, os
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# Pre-download the Turn Detector model
pre_download_model()

agent_instructions = "You are a friendly and efficient AI Voice Agent specialized in assisting users with ticket booking. Your primary role is to help users find and book tickets for various events, including concerts, movies, and flights. You can provide information about available events, ticket prices, and seating options. You can also guide users through the booking process and confirm their reservations. However, you are not authorized to handle payments directly, and you must inform users to complete transactions through secure payment gateways. Additionally, you should remind users to check event details and terms and conditions before finalizing their bookings. Always prioritize user privacy and data security in your interactions."

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)
    async def on_enter(self): await self.session.say("Hello! How can I help?")
    async def on_exit(self): await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )

    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
```
Step 4.1: Generating a VideoSDK Meeting ID
To generate a meeting ID, use the following curl command. This ID is required for the agent to join a session.
```shell
curl -X POST "https://api.videosdk.live/v1/meetings" \
  -H "Authorization: YOUR_API_KEY"
```
Step 4.2: Creating the Custom Agent Class
The MyVoiceAgent class is where you define the behavior of your AI Voice Agent. It inherits from the Agent class and uses the agent_instructions to guide its interactions.
```python
class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions=agent_instructions)
    async def on_enter(self): await self.session.say("Hello! How can I help?")
    async def on_exit(self): await self.session.say("Goodbye!")
```
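The on_enter/on_exit hooks follow a common lifecycle-callback pattern: the framework calls them at the start and end of a session. Stripped of the SDK, the idea looks like this (a plain-Python illustration, not VideoSDK code; the `run` method stands in for the framework's session loop):

```python
import asyncio

class BaseAgent:
    """Minimal stand-in for an SDK agent with lifecycle hooks."""
    async def on_enter(self): pass
    async def on_exit(self): pass

    async def run(self):
        await self.on_enter()
        # ... conversation turns would happen here ...
        await self.on_exit()

class GreetingAgent(BaseAgent):
    async def on_enter(self): print("Hello! How can I help?")
    async def on_exit(self): print("Goodbye!")

asyncio.run(GreetingAgent().run())
```

In the real SDK, the hooks speak through `self.session.say(...)` instead of printing, but the override mechanics are the same.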
Step 4.3: Defining the Core Pipeline
The CascadingPipeline integrates various plugins to handle STT, LLM, TTS, VAD, and Turn Detection. Each plugin plays a crucial role in processing audio and generating responses. For more details, refer to the Voice Agent Quick Start Guide.
```python
pipeline = CascadingPipeline(
    stt=DeepgramSTT(model="nova-2", language="en"),
    llm=OpenAILLM(model="gpt-4o"),
    tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
    vad=SileroVAD(threshold=0.35),
    turn_detector=TurnDetector(threshold=0.8)
)
```
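Model names and thresholds like these tend to drift out of sync once they are repeated across scripts and environments. One common pattern (a sketch of general Python practice, not a VideoSDK API) is to centralize them in a small config object and build the pipeline from it:

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Centralized pipeline settings; defaults mirror this tutorial's choices."""
    stt_model: str = "nova-2"
    llm_model: str = "gpt-4o"
    tts_model: str = "eleven_flash_v2_5"
    vad_threshold: float = 0.35
    turn_threshold: float = 0.8

config = PipelineConfig()
print(config.llm_model, config.vad_threshold)  # → gpt-4o 0.35
```

You could then pass `config.stt_model`, `config.vad_threshold`, and so on into the `CascadingPipeline` constructor, overriding individual fields per environment.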
Step 4.4: Managing the Session and Startup Logic
The start_session function initializes the agent session, connecting the agent to the conversation flow and pipeline. The make_context function sets up the room options for testing.
```python
async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline
    pipeline = CascadingPipeline(...)

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()
```
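The `await asyncio.Event().wait()` line blocks forever, so the `finally` block only runs once the task is cancelled or the process is interrupted. A small self-contained demo of the same pattern (a stop event plus guaranteed cleanup) shows why the structure is safe:

```python
import asyncio

async def run_until_stopped(stop_event: asyncio.Event) -> str:
    """Wait on a stop event, then run cleanup in a finally block."""
    try:
        await stop_event.wait()
        return "stopped"
    finally:
        # Cleanup always runs, mirroring session.close()/context.shutdown()
        print("cleaning up")

async def main():
    stop = asyncio.Event()
    # Simulate an external shutdown signal arriving after 10 ms
    asyncio.get_running_loop().call_later(0.01, stop.set)
    print(await run_until_stopped(stop))

asyncio.run(main())
```

In production you would typically set such an event from a signal handler (SIGINT/SIGTERM) so the agent disconnects cleanly instead of being killed mid-session.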
Running and Testing the Agent
Step 5.1: Running the Python Script
To run your agent, execute the following command in your terminal:
```shell
python main.py
```
Step 5.2: Interacting with the Agent in the Playground
When you run the script, a test URL will be provided in the console. Use this URL to interact with your AI Voice Agent in the AI Agent playground.
Advanced Features and Customizations
Extending Functionality with Custom Tools
The VideoSDK framework allows for the integration of custom tools, enabling you to extend the functionality of your AI Voice Agent beyond the default capabilities.
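For a ticket-booking agent, a natural custom tool is an availability check the LLM can call before confirming a reservation. The function below is a hypothetical illustration: the in-memory `EVENTS` table and the function name are invented for this sketch, and the actual tool-registration mechanism should be taken from the VideoSDK documentation:

```python
# Hypothetical in-memory event catalog, for illustration only.
EVENTS = {
    "jazz-night": {"price": 45.0, "seats_left": 12},
    "rock-fest": {"price": 80.0, "seats_left": 0},
}

def check_availability(event_id: str, quantity: int) -> dict:
    """Tool-style function: report whether `quantity` tickets are available."""
    event = EVENTS.get(event_id)
    if event is None:
        return {"available": False, "reason": "unknown event"}
    if event["seats_left"] < quantity:
        return {"available": False, "reason": "not enough seats"}
    return {"available": True, "total_price": event["price"] * quantity}

print(check_availability("jazz-night", 2))  # → {'available': True, 'total_price': 90.0}
```

Returning structured dictionaries rather than free text makes tool results easy for the LLM to summarize back to the caller, and keeps the payment step out of the agent entirely, as the instructions require.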
Exploring Other Plugins
While this tutorial uses specific plugins, VideoSDK supports a variety of STT, LLM, and TTS options that you can explore to enhance your agent's performance. Consider the Deepgram STT Plugin, OpenAI LLM Plugin, and ElevenLabs TTS Plugin for voice agents to customize your agent's capabilities further.
Troubleshooting Common Issues
API Key and Authentication Errors
Ensure your API keys are correctly configured in the .env file and that your VideoSDK account is active.
Audio Input/Output Problems
Verify that your microphone and speakers are properly connected and configured.
Dependency and Version Conflicts
Check that all required packages are installed and compatible with your Python version.
Conclusion
In this tutorial, you have built a functional AI Voice Agent for ticket booking using the VideoSDK framework. You learned how to set up the development environment, implement the agent, and test it in a playground. As next steps, consider exploring additional features and plugins to further enhance your agent's capabilities, including understanding AI voice Agent Sessions and utilizing Silero Voice Activity Detection for improved interaction management.