This guide provides a step-by-step tutorial on how to build, containerize, and deploy a fully functional AI telephony agent using VideoSDK's open-source Agent SDK. We will cover the complete workflow, from writing the agent's logic in main.py to configuring SIP trunks for live inbound and outbound phone calls.

If you're a developer looking to bridge the gap between your AI models and real-world telephony, you're in the right place. Follow along to turn a few simple files into a globally accessible voice agent.

Project Directory

This simple structure is our final goal for the worker. By following along, you'll create this complete project from scratch.

worker/
├── Dockerfile              # Instructions to build the Docker container
├── main.py                 # The core logic for your AI voice agent
├── requirements.txt        # Python package dependencies
├── .env                    # Environment variables (API keys and tokens)
└── videosdk.yaml           # VideoSDK configuration for deploying the agent
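The Dockerfile in the tree above is not shown elsewhere in this guide, so here is a minimal sketch you could start from. The base image, working directory, and build steps are assumptions, not an official template; adjust them to match your project.

```dockerfile
# Minimal sketch of a worker Dockerfile (assumed layout, not an official template)
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the worker (main.py, videosdk.yaml, etc.)
COPY . .

CMD ["python", "main.py"]
```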

Architecture - Inbound/Outbound Calls

This architecture shows how your AI Voice Agent connects to the global phone network for both inbound and outbound SIP calls.

Our container-based deployment flow handles the complex telephony plumbing for you.


Setting Up the Agent Worker

Let's build our main.py. This file uses three core VideoSDK components to define our agent's logic and connect it to a call.

  1. Agent:

This class defines your agent's personality and conversational flow. You can also integrate advanced protocols like MCP (Model Context Protocol) and A2A (Agent2Agent) to provide external context to your agent.

  2. Pipeline:

The Pipeline is the audio processing engine: it takes the user's voice as input and produces the agent's voice as output. VideoSDK offers two types to fit your needs: a real-time pipeline, which uses a single speech-to-speech model (the approach used in this guide), and a cascading pipeline, which chains separate STT, LLM, and TTS components.

  3. Agent Session

This brings it all together, connecting your Agent and Pipeline to a live VideoSDK Room to start the call.

Project Dependencies (requirements.txt)

videosdk-agents
videosdk-plugins-google
videosdk
python-dotenv
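The .env file from the project tree supplies the credentials that main.py reads via python-dotenv. Based on the os.getenv calls in the code below, it needs at least these two variables (the values here are placeholders):

```
GOOGLE_API_KEY=your_google_api_key
VIDEOSDK_TOKEN=your_videosdk_token
```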

Here's how these components are implemented in main.py:

import asyncio
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob, MCPServerStdio
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv
import os

load_dotenv(override=True)

# Agent Component
class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are VideoSDK's AI Avatar Voice Agent with real-time capabilities. You are a helpful virtual assistant with a visual avatar that can answer questions about the weather and help with other tasks in real-time.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm your real-time AI avatar assistant. How can I help you today?")
    
    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")

async def start_session(context: JobContext):

    # Initialize Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # Optional: when GOOGLE_API_KEY is set in .env, the SDK can pick it up
        # automatically and you can omit the api_key parameter entirely.
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(
            voice="Leda",  # Puck, Charon etc
            response_modalities=["AUDIO"]
        )
    )

    # Create the real-time pipeline around the Gemini model
    pipeline = RealTimePipeline(
        model=model,
    )
    
    session = AgentSession(
        agent=MyVoiceAgent(),
        pipeline=pipeline
    )

    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        auth_token=os.getenv("VIDEOSDK_TOKEN"),
        room_id="ln4i-tuwm-yzkq",
        name="AI Agent",
        playground=True,
        recording=False
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start() 

Running the Agent Locally

Before we containerize our agent, let's run the Python script directly in a local development environment. This is the fastest way to test your agent's core logic.

Prerequisite: Ensure you have Python 3.12 or newer installed on your machine.

1. Create and Activate a Virtual Environment

First, open your terminal in the worker directory. It's a best practice to use a virtual environment to manage project dependencies.

Create the environment:

python3 -m venv .venv

Next, activate it. The command differs based on your operating system:

  • On macOS/Linux
source .venv/bin/activate
  • On Windows
.venv\Scripts\activate

You'll know the environment is active when you see (.venv) at the beginning of your terminal prompt.

2. Install Dependencies

With the virtual environment active, install the necessary Python packages listed in your requirements.txt file:

pip install -r requirements.txt

3. Run the Python Script

Finally, run the agent:

python main.py

This will start your agent and connect it to a playground session, as defined by playground=True in the make_context() function in your code. You can now interact with it for initial testing before moving on to a full deployment.

Build and Test Your Worker

Before deploying our agent to the cloud, it's crucial to ensure it runs correctly on our local machine. The VideoSDK CLI makes this incredibly simple.

In your terminal, navigate to the worker directory and run the following command:

videosdk run

This command reads your videosdk.yaml and Dockerfile, builds a local container, and starts your agent. You should see an output confirming that your worker is running. This is the perfect time to test your agent's basic functionality in a playground environment to catch any bugs before going live.


Deploy Your Agent to the Cloud

Once you've confirmed the agent works locally, it's time to deploy it to VideoSDK's global infrastructure.

1. Configure the Deployment Manifest

First, make sure your videosdk.yaml file is configured for a cloud deployment. The most important line is cloud: true, which tells the CLI to push your container to our servers instead of just running it locally.

version: "1.0"
deployment:
  id: 82fed3a5-5316-4273-b29f-4bb26e885842
  entry:
    path: main.py

deploy:
  cloud: true

env:
  path: "./.env"

secrets:
  VIDEOSDK_AUTH_TOKEN: # your_videosdk_token

2. Deploy with a Single Command

Now for the magic. Run the deploy command:

videosdk deploy

The CLI will now package your worker, build the container, and upload it to the VideoSDK cloud. You'll see a live log of the progress.

Upon completion, you will get a Success! message along with your unique Worker ID.


Crucial Step: Copy this Worker ID! You will need this unique identifier in the next step to connect your deployed agent to a phone number using a Routing Rule in the VideoSDK dashboard.

Connect Your Agent to the Phone Network

Now we'll use the VideoSDK dashboard to connect our deployed agent to a real phone number.

1. Set up an Inbound Gateway

This tells your SIP provider (e.g., Twilio) where to forward incoming calls.

  • In the VideoSDK Dashboard, go to Telephony → Inbound Gateways and click Add Inbound Gateway.
  • Give it a name and add your phone number.
  • The dashboard will generate a unique Inbound Gateway URL. Copy this URL.
  • In your SIP provider's dashboard (like Twilio), paste this URL into the Origination SIP URI field for your SIP trunk.

2. Set up an Outbound Gateway

This tells VideoSDK where to send outgoing calls from your agent.

  • Go to Telephony → Outbound Gateways and click Add Outbound Gateway.
  • Give it a name and add your phone number.
  • In the Address field, enter the Termination SIP URI provided by your SIP provider.

3. Create a Routing Rule

This is the final step that connects everything. The routing rule links a phone number to your specific deployed agent.

  • Go to Telephony → Routing Rules and click Add Routing Rule.
  • Select a direction (e.g., inbound).
  • Choose the gateway you just created.
  • Set Agent Type to Cloud.
  • In the Deployment ID field, paste the Worker ID you copied from the deploy output.
  • Click Create.

Making an Outbound Call

To trigger an outbound call from your agent, you can make a simple API request to the VideoSDK SIP endpoint.

Use a POST request with your VIDEOSDK_TOKEN for authorization. In the body, specify the gatewayId (from your Outbound Gateway) and the phone number to call in sipCallTo.

curl --request POST \
  --url https://api.videosdk.live/v2/sip/call \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "gatewayId": "gw_123456789",
    "sipCallTo": "+14155550123"
  }'
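If you'd rather trigger the call from Python, the same request can be built with the standard library. This is a minimal sketch that mirrors the curl call above; the builder function name is my own, and sending the request is left commented out because it would place a real call.

```python
import json
import urllib.request

# Endpoint from the curl example above
SIP_CALL_URL = "https://api.videosdk.live/v2/sip/call"

def build_outbound_call_request(token: str, gateway_id: str, phone_number: str) -> urllib.request.Request:
    """Build the POST request that asks VideoSDK to place an outbound SIP call."""
    body = json.dumps({"gatewayId": gateway_id, "sipCallTo": phone_number}).encode("utf-8")
    return urllib.request.Request(
        SIP_CALL_URL,
        data=body,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_outbound_call_request("YOUR_VIDEOSDK_TOKEN", "gw_123456789", "+14155550123")
    # Uncomment to actually send the request (this places a real phone call):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
    print(req.get_method(), req.full_url)
```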

Managing Agent Sessions Programmatically

While routing rules automatically manage sessions, you can also control them via API for advanced use cases, like starting a session on demand or for cost management.

Start a Deployment Session:

curl --request POST \
  --url https://api.videosdk.live/ai/v1/ai-deployment-sessions/start \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "deploymentId": "<your-deployment-id>"
  }'

End a Deployment Session:

curl --request POST \
  --url https://api.videosdk.live/ai/v1/ai-deployment-sessions/end \
  --header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "sessionId": "<session-id-from-start-response>"
  }'
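The two session endpoints differ only in their path and body, so a single small Python helper can cover both. This is a sketch with the same shape as the curl calls above; the function name is my own, and the request is constructed but not sent.

```python
import json
import urllib.request

# Base path from the curl examples above
SESSIONS_BASE = "https://api.videosdk.live/ai/v1/ai-deployment-sessions"

def build_session_request(token: str, action: str, **body) -> urllib.request.Request:
    """Build a start/end request for a deployment session.

    action is "start" (body: deploymentId=...) or "end" (body: sessionId=...).
    """
    if action not in ("start", "end"):
        raise ValueError(f"unknown action: {action}")
    return urllib.request.Request(
        f"{SESSIONS_BASE}/{action}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    start = build_session_request("YOUR_VIDEOSDK_TOKEN", "start", deploymentId="<your-deployment-id>")
    end = build_session_request("YOUR_VIDEOSDK_TOKEN", "end", sessionId="<session-id>")
    print(start.full_url)
    print(end.full_url)
```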

Next Step

That's it! You've successfully built a Python AI agent, deployed it to the cloud, and connected it to the global telephone network for both inbound and outbound calls.