In this blog, you'll learn how to add an AI Avatar to a VideoSDK agent in a straightforward, practical way. By the end, you’ll have a real-time, talking digital assistant with a face, a voice, and the power to answer live weather questions — all running in your browser.
Project Architecture
├── main.py # Main agent implementation
├── requirements.txt # Python dependencies
├── mcp_weather.py # Weather MCP server
├── .env.example # Environment variables template
└── README.md # This file
Set Up Your Python Project
We'll build this project in Python. Start by ensuring your environment is ready and all required dependencies are installed.
Create and Activate a Virtual Environment
python -m venv .venv
# On Windows
.venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate
Install the Required Dependencies
Create a requirements.txt file and add these lines:
videosdk-agents
videosdk-plugins-google
videosdk-plugins-simli
python-dotenv
fastmcp
Then install them:
pip install -r requirements.txt
The Big Picture: How the Pieces Connect
Before diving into the code, let’s map out the core components and how they interact:
- VideoSDK Agent: the "director" that orchestrates everything. It manages the session, connects to the playground, and coordinates the avatar, voice, and tools.
- Google Gemini (via VideoSDK plugin): the "brain" of your agent, responsible for understanding what you say and generating natural-sounding replies in real time.
- Simli Avatar (via VideoSDK plugin): the "face" and "voice" of your agent. It animates and speaks the responses generated by Gemini, making the agent feel alive.
- MCP Weather Tool (Model Context Protocol): the "specialist prop master." When the conversation calls for weather info, the agent calls out to this separate process, which fetches live weather data and returns it as dialogue.
How it all works in a conversation:
- You speak to the avatar in the browser or a mobile application (using the VideoSDK playground).
- The agent (main.py) receives your message, processes it with Gemini, and speaks the response using Simli.
- If you ask about the weather, the agent reaches out to the MCP weather tool (mcp_weather.py), which fetches the answer and brings it into the conversation in real time.
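The routing above can be sketched with plain stand-in functions. This is a toy model only: `gemini_brain`, `simli_avatar_speak`, and `mcp_weather_tool` are stubs I made up to show the flow, not the real VideoSDK, Gemini, or Simli APIs.

```python
def mcp_weather_tool(city: str) -> str:
    """Stand-in for mcp_weather.py: the real tool fetches live data."""
    return f"Clear sky, 21°C in {city}"

def gemini_brain(user_text: str) -> str:
    """Stand-in for the Gemini model: decides whether a tool call is needed."""
    if "weather" in user_text.lower():
        # In the real pipeline, Gemini emits a tool call and the agent
        # forwards it to the MCP weather server over stdio.
        city = user_text.rsplit(" ", 1)[-1].strip("?")
        return mcp_weather_tool(city)
    return "Happy to chat! Ask me about the weather in any city."

def simli_avatar_speak(reply: str) -> str:
    """Stand-in for the Simli avatar: the real one animates and voices the reply."""
    return f"[avatar speaks] {reply}"

# One turn of the conversation:
print(simli_avatar_speak(gemini_brain("What's the weather in London?")))
```

The real agent does the same thing, except each stub is replaced by a plugin wired into a `RealTimePipeline`, as shown in main.py below.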
For more on how the playground works, check out the VideoSDK AI Playground documentation.
The Heart of the Show — The Key Files
main.py — The Orchestrator
This is the main script where the “performance” comes together:
- It configures your AI agent with a voice, a face, and the ability to call out to external tools (like the weather server).
- When you run it, it spins up a VideoSDK room and connects your agent to the browser-based playground, ready to talk in real time.
import asyncio
import sys
from pathlib import Path
import requests
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob, MCPServerStdio
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.simli import SimliAvatar, SimliConfig
from dotenv import load_dotenv
import os
load_dotenv(override=True)
def get_room_id(auth_token: str) -> str:
    url = "https://api.videosdk.live/v2/rooms"
    headers = {
        "Authorization": auth_token
    }
    response = requests.post(url, headers=headers)
    response.raise_for_status()
    return response.json()["roomId"]
class MyVoiceAgent(Agent):
    def __init__(self):
        mcp_script_weather = Path(__file__).parent / "mcp_weather.py"
        super().__init__(
            instructions=(
                "You are VideoSDK's AI Avatar Voice Agent with real-time capabilities. "
                "You are a helpful virtual assistant with a visual avatar that can answer "
                "questions about the weather and help with other tasks in real time."
            ),
            mcp_servers=[
                MCPServerStdio(
                    command=sys.executable,
                    args=[str(mcp_script_weather)],
                    client_session_timeout_seconds=30
                )
            ]
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm your real-time AI avatar assistant. How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")
async def start_session(context: JobContext):
    # Initialize the Gemini Realtime model.
    # Never hardcode keys in source: read GOOGLE_API_KEY from your .env file.
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(
            voice="Leda",  # Other options: Puck, Charon, Kore, Fenrir, Aoede, Orus, Zephyr
            response_modalities=["AUDIO"]
        )
    )

    # Initialize the Simli avatar with credentials from .env
    # (apiKey and faceId match SIMLI_API_KEY and SIMLI_FACE_ID in the template)
    simli_config = SimliConfig(
        apiKey=os.getenv("SIMLI_API_KEY"),
        faceId=os.getenv("SIMLI_FACE_ID"),
    )
    simli_avatar = SimliAvatar(config=simli_config)

    # Create the real-time pipeline with the avatar attached
    pipeline = RealTimePipeline(
        model=model,
        avatar=simli_avatar
    )

    session = AgentSession(
        agent=MyVoiceAgent(),
        pipeline=pipeline
    )

    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()
def make_context() -> JobContext:
    auth_token = os.getenv("VIDEOSDK_AUTH_TOKEN")
    room_id = get_room_id(auth_token)
    room_options = RoomOptions(
        room_id=room_id,
        auth_token=auth_token,
        name="Simli Avatar Realtime Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
mcp_weather.py — The Weather Specialist (MCP Tool)
About MCP:
The Model Context Protocol (MCP) allows your agent to “call out” to external specialists when it doesn’t know something itself. In this project, mcp_weather.py is that specialist: a dedicated service that fetches live weather data for any city using the OpenWeatherMap API. When you ask your agent about the weather, it seamlessly passes your request to this MCP tool and brings the answer back, all in real time.
from fastmcp import FastMCP
import httpx
import os
from dotenv import load_dotenv
load_dotenv(override=True)
OPENWEATHER_API_KEY = os.getenv("OPENWEATHER_API_KEY")  # set OPENWEATHER_API_KEY in your .env file
OPENWEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"
mcp = FastMCP("CurrentWeatherServer")
@mcp.tool()
async def get_current_weather(city: str) -> str:
    """
    Get the current weather for a given city using the OpenWeatherMap API.
    """
    params = {
        "q": city,
        "appid": OPENWEATHER_API_KEY,
        "units": "metric"
    }
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(OPENWEATHER_URL, params=params, timeout=10)
            # Surface authorization and lookup errors as readable messages
            if response.status_code == 401:
                return "Authorization error: Invalid API key. Please check your OpenWeatherMap API key."
            elif response.status_code == 404:
                return f"City '{city}' not found. Please check the spelling."
            response.raise_for_status()
            data = response.json()
            weather = data["weather"][0]["description"].capitalize()
            temp = data["main"]["temp"]
            feels_like = data["main"]["feels_like"]
            humidity = data["main"]["humidity"]
            wind_speed = data.get("wind", {}).get("speed", "N/A")
            return (f"Current weather in {city}:\n"
                    f"{weather}, temperature: {temp}°C, feels like: {feels_like}°C.\n"
                    f"Humidity: {humidity}%, Wind speed: {wind_speed} m/s")
        except httpx.RequestError as e:
            return f"Network error: Could not retrieve weather data for {city}: {e}"
        except Exception as e:
            return f"Could not retrieve weather data for {city}: {e}"

if __name__ == "__main__":
    mcp.run(transport="stdio")
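To see how the tool turns a raw API payload into dialogue, here is the same field-to-sentence mapping run offline against a mocked response. The sample values below are made up for illustration; no network call or API key is needed.

```python
# A sample OpenWeatherMap response shape (values are illustrative)
sample = {
    "weather": [{"description": "light rain"}],
    "main": {"temp": 14.2, "feels_like": 13.0, "humidity": 82},
    "wind": {"speed": 4.6},
}

def format_weather(city: str, data: dict) -> str:
    """Map the OpenWeatherMap JSON fields to the sentence the avatar speaks."""
    weather = data["weather"][0]["description"].capitalize()
    temp = data["main"]["temp"]
    feels_like = data["main"]["feels_like"]
    humidity = data["main"]["humidity"]
    wind_speed = data.get("wind", {}).get("speed", "N/A")  # "wind" can be absent
    return (f"Current weather in {city}:\n"
            f"{weather}, temperature: {temp}°C, feels like: {feels_like}°C.\n"
            f"Humidity: {humidity}%, Wind speed: {wind_speed} m/s")

print(format_weather("London", sample))
```

Note the defensive `data.get("wind", {})`: the `wind` block is not guaranteed in every response, so the tool degrades to "N/A" rather than crashing mid-conversation.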
Step-by-Step: Bringing Your AI Avatar to Life
1. Set up your environment variables:
- Copy .env.example to .env
- Fill out the following in .env:
VIDEOSDK_AUTH_TOKEN=your-videosdk-token
SIMLI_API_KEY=your-simli-api-key
SIMLI_FACE_ID=your-simli-face-id
OPENWEATHER_API_KEY=your-openweathermap-key
GOOGLE_API_KEY=your-google-api-key
2. Run your agent:
python main.py
3. Open the VideoSDK playground URL printed in your terminal. It will look like:
https://playground.videosdk.live?token=...&meetingId=...
4. Talk to your AI avatar!
- Say hello, ask about the weather (“What’s the weather in London?”), or have a general conversation.
- The avatar will speak and respond using Gemini and Simli, and fetch live weather using the MCP tool.
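A missing key is the most common reason the agent fails at startup, so a quick preflight check helps. This small helper is my own sketch, not part of the project files; it only inspects the environment.

```python
import os

# The five variables the project expects, per the .env template above
REQUIRED_VARS = [
    "VIDEOSDK_AUTH_TOKEN",
    "SIMLI_API_KEY",
    "SIMLI_FACE_ID",
    "OPENWEATHER_API_KEY",
    "GOOGLE_API_KEY",
]

def missing_env_vars(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_env_vars(dict(os.environ))
if missing:
    print("Missing from .env:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Run it once after filling in .env (with `python-dotenv` loaded, or after `source`-ing the values) and fix anything it reports before launching main.py.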
Now step onto the stage, run the code, and meet your creation. The talking avatar is waiting. What will you say first?
You can dive deeper into the playground and agent capabilities in the VideoSDK AI Playground documentation.