This guide provides a step-by-step tutorial on how to build, containerize, and deploy a fully functional AI telephony agent using VideoSDK's open-source Agent SDK. We will cover the complete workflow, from writing the agent's logic in main.py to configuring SIP trunks for live inbound and outbound phone calls.
If you're a developer looking to bridge the gap between your AI models and real-world telephony, you're in the right place. Follow along to turn a few simple files into a globally accessible voice agent.
Project Directory
This simple structure is our final goal for the worker. By following along, you'll create this complete project from scratch.
worker/
├── Dockerfile # Instructions to build the Docker container
├── main.py # The core logic for your AI voice agent
├── requirements.txt # Python package dependencies
├── .env # Environment variables
└── videosdk.yaml # VideoSDK configuration for deploying the agent
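The Dockerfile itself is not listed in this guide, so here is a minimal sketch of what it might contain. The base image, layer ordering, and entrypoint are our assumptions, not VideoSDK requirements; adjust them to your environment:

```dockerfile
# Assumed base image; match the Python version you develop against
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the agent code (and .env, if you bake it into the image)
COPY . .

# Run the agent worker
CMD ["python", "main.py"]
```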
Architecture - Inbound/Outbound Calls
This architecture shows how your AI Voice Agent connects to the global phone network for both inbound and outbound SIP calls.
Our container-based deployment flow handles the complex telephony plumbing for you.
Setting Up Agent Worker
Let's build our main.py. This file uses three core VideoSDK components to define our agent's logic and connect it to a call.
- Agent:
This class defines your agent's personality and conversational flow. You can also integrate advanced protocols like MCP and Agent-to-Agent (A2A) to provide external context to your agent.
- Pipeline:
The Pipeline is the engine that processes the audio stream: it takes the user's voice as input and produces the agent's voice as output. VideoSDK offers two types to fit your needs: a real-time pipeline, which uses a single speech-to-speech model (the approach in this guide), and a cascading pipeline, which chains separate STT, LLM, and TTS stages.
- Agent Session:
This brings it all together, connecting your Agent and Pipeline to a live VideoSDK Room to start the call.
Project Dependencies: requirements.txt
videosdk-agents
videosdk-plugins-google
videosdk
python-dotenv
Here’s how these components are implemented in main.py:
import asyncio
import os

from dotenv import load_dotenv
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

load_dotenv(override=True)

# Agent Component: defines the agent's personality and lifecycle hooks
class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are VideoSDK's AI Avatar Voice Agent with real-time capabilities. You are a helpful virtual assistant with a visual avatar that can answer questions about the weather and help with other tasks in real time.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm your real-time AI avatar assistant. How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")

async def start_session(context: JobContext):
    # Initialize the Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # If GOOGLE_API_KEY is set in .env, the plugin can pick it up
        # automatically; we pass it explicitly here for clarity.
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(
            voice="Leda",  # other options include Puck, Charon, etc.
            response_modalities=["AUDIO"]
        )
    )

    # Pipeline Component: the real-time audio processing engine
    pipeline = RealTimePipeline(
        model=model,
    )

    # Agent Session: ties the agent and pipeline to a live room
    session = AgentSession(
        agent=MyVoiceAgent(),
        pipeline=pipeline
    )

    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()  # keep the session alive until interrupted
    finally:
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        auth_token=os.getenv("VIDEOSDK_TOKEN"),
        room_id="ln4i-tuwm-yzkq",  # replace with your own room ID
        name="AI Agent",
        playground=True,
        recording=False
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
Running the Agent Locally
Before we containerize our agent, let's run the Python script directly in a local development environment. This is the fastest way to test your agent's core logic.
Prerequisite: Ensure you have Python 3.12 or newer installed on your machine.
1. Create and Activate a Virtual Environment
First, open your terminal in the worker directory. It's a best practice to use a virtual environment to manage project dependencies.
Create the environment:
python3 -m venv .venv
Next, activate it. The command differs based on your operating system:
- On MacOS/Linux
source .venv/bin/activate
- On Windows
.venv\Scripts\activate
You'll know the environment is active when you see (.venv) at the beginning of your terminal prompt.
2. Install Dependencies
With the virtual environment active, install the necessary Python packages listed in your requirements.txt file:
pip install -r requirements.txt
3. Run the Python Script
Finally, run the agent:
python main.py
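If the agent fails to start, missing credentials are the usual culprit. A small pre-flight check along these lines can save a debugging round trip; the helper below is our own convenience, not part of the SDK, and it assumes the two variable names used in main.py:

```python
import os

# Variables main.py reads via os.getenv (assumed required)
REQUIRED_VARS = ["VIDEOSDK_TOKEN", "GOOGLE_API_KEY"]

def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]

missing = missing_env_vars()
if missing:
    print(f"Set these before running main.py: {', '.join(missing)}")
else:
    print("Environment looks good.")
```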
This will start your agent and connect it to a playground session, as defined by playground=True in the make_context() function in your code. You can now interact with it for initial testing before moving on to a full deployment.
Build and Test Your Worker
Before deploying our agent to the cloud, it's crucial to ensure it runs correctly on our local machine. The VideoSDK CLI makes this incredibly simple.
In your terminal, navigate to the worker directory and run the following command:
videosdk run
This command reads your videosdk.yaml and Dockerfile, builds a local container, and starts your agent. You should see output confirming that your worker is running. This is the perfect time to test your agent's basic functionality in a playground environment to catch any bugs before going live.
Deploy Your Agent to the Cloud
Once you've confirmed the agent works locally, it's time to deploy it to VideoSDK's global infrastructure.
1. Configure the Deployment Manifest
First, make sure your videosdk.yaml file is configured for a cloud deployment. The most important line is cloud: true, which tells the CLI to push your container to our servers instead of just running it locally.
version: "1.0"
deployment:
  id: 82fed3a5-5316-4273-b29f-4bb26e885842
  entry:
    path: main.py
  deploy:
    cloud: true
  env:
    path: "./.env"
  secrets:
    VIDEOSDK_AUTH_TOKEN: # your_videosdk_token
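As a quick guard against accidentally running a local build when you meant to deploy, you can sanity-check the manifest before calling videosdk deploy. This tiny checker is our own convenience, not part of the CLI, and it does a line-level scan rather than full YAML parsing:

```python
def is_cloud_deploy(manifest_text: str) -> bool:
    """Return True if the manifest contains an uncommented `cloud: true` line."""
    for line in manifest_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comment lines
        if stripped == "cloud: true":
            return True
    return False

manifest = """
version: "1.0"
deployment:
  deploy:
    cloud: true
"""

print(is_cloud_deploy(manifest))  # prints: True
```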
2. Deploy with a Single Command
Now for the magic. Run the deploy command:
videosdk deploy
The CLI will now package your worker, build the container, and upload it to the VideoSDK cloud. You'll see a live log of the progress.
Upon completion, you will get a Success! message along with your unique Worker ID.
Crucial Step: Copy this Worker ID! You will need this unique identifier in the next step to connect your deployed agent to a phone number using a Routing Rule in the VideoSDK dashboard.
Connect Your Agent to the Phone Network
Now we'll use the VideoSDK dashboard to connect our deployed agent to a real phone number.
1. Set up an Inbound Gateway
This tells your SIP provider (e.g., Twilio) where to forward incoming calls.
- In the VideoSDK Dashboard, go to Telephony, open Inbound Gateways, and click Add Inbound Gateway.
- Give it a name and add your phone number.
- The dashboard will generate a unique Inbound Gateway URL. Copy this URL.
- In your SIP provider's dashboard (like Twilio), paste this URL into the Origination SIP URI field for your SIP trunk.
2. Set up an Outbound Gateway
This tells VideoSDK where to send outgoing calls from your agent.
- Go to Telephony, open Outbound Gateways, and click Add Outbound Gateway.
- Give it a name and add your phone number.
- In the Address field, enter the Termination SIP URI provided by your SIP provider.
3. Create a Routing Rule
This is the final step that connects everything. The routing rule links a phone number to your specific deployed agent.
- Go to Telephony, open Routing Rules, and click Add Routing Rule.
- Select a direction (e.g., inbound).
- Choose the gateway you just created.
- Set Agent Type to Cloud.
- In the Deployment ID field, paste the Worker ID you copied after deploying.
- Click Create.
Making an Outbound Call
To trigger an outbound call from your agent, you can make a simple API request to the VideoSDK SIP endpoint.
Use a POST request with your VIDEOSDK_TOKEN for authorization. In the body, specify the gatewayId (from your Outbound Gateway) and the phone number to call in sipCallTo.
curl --request POST \
  --url https://api.videosdk.live/v2/sip/call \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "gatewayId": "gw_123456789",
    "sipCallTo": "+14155550123"
  }'
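The same call can be triggered from Python using only the standard library. The sketch below builds the request without sending it, so you can inspect the payload first; the token, gateway ID, and number are placeholders, and the actual network call is left commented out:

```python
import json
import urllib.request

def build_sip_call_request(token: str, gateway_id: str, phone: str):
    """Build a urllib Request for the VideoSDK SIP call endpoint."""
    body = json.dumps({"gatewayId": gateway_id, "sipCallTo": phone}).encode()
    return urllib.request.Request(
        url="https://api.videosdk.live/v2/sip/call",
        data=body,
        headers={
            "Authorization": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_sip_call_request("YOUR_VIDEOSDK_TOKEN", "gw_123456789", "+14155550123")
print(req.get_full_url())
# To actually place the call:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```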
Managing Agent Sessions Programmatically
While routing rules automatically manage sessions, you can also control them via API for advanced use cases, like starting a session on demand or for cost management.
Start a Deployment Session:
curl --request POST \
  --url https://api.videosdk.live/ai/v1/ai-deployment-sessions/start \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "deploymentId": "<your-deployment-id>"
  }'
End a Deployment Session:
curl --request POST \
  --url https://api.videosdk.live/ai/v1/ai-deployment-sessions/end \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "sessionId": "<session-id-from-start-response>"
  }'
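For scripting, the start/end pair maps naturally onto two small helpers. These build only the URLs and JSON bodies; the endpoint paths are taken from the curl examples above, and threading the sessionId from the start response into the end call is an assumption based on those examples:

```python
import json

BASE = "https://api.videosdk.live/ai/v1/ai-deployment-sessions"

def start_session_request(deployment_id: str):
    """URL and JSON body for starting a deployment session."""
    return f"{BASE}/start", json.dumps({"deploymentId": deployment_id})

def end_session_request(session_id: str):
    """URL and JSON body for ending a deployment session."""
    return f"{BASE}/end", json.dumps({"sessionId": session_id})

url, body = start_session_request("<your-deployment-id>")
print(url)
print(body)
```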
Next Step
That's it! You've successfully built a Python AI agent, deployed it to the cloud, and connected it to the global telephone network for both inbound and outbound calls.