In this blog, you'll learn how to add an AI Avatar to a VideoSDK agent in a straightforward, practical way. By the end, you’ll have a real-time, talking digital assistant with a face, a voice, and the power to answer live weather questions — all running in your browser.

Project Architecture

├── main.py              # Main agent implementation
├── requirements.txt     # Python dependencies
├── mcp_weather.py       # Weather MCP server
├── .env.example         # Environment variables template
└── README.md            # Project documentation

Set Up Your Python Project

We'll build this project in Python. Start by ensuring your environment is ready and all required dependencies are installed.

Create and Activate a Virtual Environment

python -m venv .venv
# On Windows
.venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate

Install the Required Dependencies

Create a requirements.txt file and add these lines:

videosdk-agents
videosdk-plugins-google
videosdk-plugins-simli
python-dotenv
fastmcp

Then install them:

pip install -r requirements.txt
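If you want to confirm the install worked before going further, here is a quick sketch using only the standard library. The dotted names match the imports used later in main.py and mcp_weather.py (note that pip package names like videosdk-agents differ from the import names):

```python
import importlib.util

def can_import(name: str) -> bool:
    """Return True if `name` resolves to an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a parent package (e.g. `videosdk`) is absent.
        return False

# Import names used by this project:
for module in ["videosdk.agents", "videosdk.plugins.google",
               "videosdk.plugins.simli", "dotenv", "fastmcp"]:
    print(f"{module}: {'ok' if can_import(module) else 'MISSING'}")
```

Run this in your activated virtual environment; any line showing MISSING means the corresponding package did not install correctly.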

The Big Picture: How the Pieces Connect

Before diving into the code, let’s map out the core components and how they interact:

  • VideoSDK Agent
    The “director” that orchestrates everything. It manages the session, connects to the playground, and coordinates the avatar, voice, and tools.
  • Google Gemini (via VideoSDK plugin)
    The “brain” of your agent, responsible for understanding what you say and generating natural-sounding replies in real time.
  • Simli Avatar (via VideoSDK plugin)
    The “face” and “voice” of your agent. It animates and speaks the responses generated by Gemini, making the agent feel alive.
  • MCP Weather Tool (Model Context Protocol)
    The “specialist prop master.” When the conversation calls for weather info, the agent calls out to this separate process, which fetches live weather data and returns it as dialogue.

How it all works in a conversation:

  1. You speak to the avatar in the browser or a mobile application (using the VideoSDK playground).
  2. The agent (main.py) receives your message, processes it with Gemini, and speaks the response using Simli.
  3. If you ask about the weather, the agent reaches out to the MCP weather tool (mcp_weather.py), which fetches the answer and brings it into the conversation in real time.

For more on how the playground works, check out the VideoSDK AI Playground documentation.

The Heart of the Show — The Key Files

main.py — The Orchestrator

This is the main script where the “performance” comes together:

  • It configures your AI agent with a voice, a face, and the ability to call out to external tools (like the weather server).
  • When you run it, it spins up a VideoSDK room and connects your agent to the browser-based playground, ready to talk in real time.
import asyncio
import sys
from pathlib import Path
import requests
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob, MCPServerStdio
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.simli import SimliAvatar, SimliConfig
from dotenv import load_dotenv
import os

load_dotenv(override=True)

def get_room_id(auth_token: str) -> str:
    url = "https://api.videosdk.live/v2/rooms"
    headers = {
        "Authorization": auth_token
    }
    response = requests.post(url, headers=headers)
    response.raise_for_status()
    return response.json()["roomId"]

class MyVoiceAgent(Agent):
    def __init__(self):
        mcp_script_weather = Path(__file__).parent / "mcp_weather.py"
        super().__init__(
            instructions=(
                "You are VideoSDK's AI Avatar Voice Agent with real-time capabilities. "
                "You are a helpful virtual assistant with a visual avatar that can answer "
                "questions about the weather and help with other tasks in real time."
            ),
            mcp_servers=[
                MCPServerStdio(
                    command=sys.executable,
                    args=[str(mcp_script_weather)],
                    client_session_timeout_seconds=30,
                )
            ],
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm your real-time AI avatar assistant. How can I help you today?")
    
    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")
        

async def start_session(context: JobContext):
    # Initialize the Gemini Realtime model.
    # GOOGLE_API_KEY is read from your .env file — never hardcode real keys in source.
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(
            voice="Leda",  # Also available: Puck, Charon, Kore, Fenrir, Aoede, Orus, Zephyr
            response_modalities=["AUDIO"]
        )
    )

    # Initialize the Simli avatar; the key and face ID come from .env, not the source.
    simli_config = SimliConfig(
        apiKey=os.getenv("SIMLI_API_KEY"),
        faceId=os.getenv("SIMLI_FACE_ID"),
    )
    simli_avatar = SimliAvatar(config=simli_config)

    # Create pipeline with avatar
    pipeline = RealTimePipeline(
        model=model,
        avatar=simli_avatar
    )
    
    session = AgentSession(
        agent=MyVoiceAgent(),
        pipeline=pipeline
    )

    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    auth_token = os.getenv("VIDEOSDK_AUTH_TOKEN")
    room_id = get_room_id(auth_token)
    room_options = RoomOptions(
        room_id=room_id,
        auth_token=auth_token,
        name="Simli Avatar Realtime Agent",
        playground=True 
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start() 

mcp_weather.py — The Weather Specialist (MCP Tool)

About MCP:
The Model Context Protocol (MCP) allows your agent to “call out” to external specialists when it doesn’t know something itself. In this project, mcp_weather.py is that specialist: a dedicated service that fetches live weather data for any city using the OpenWeatherMap API. When you ask your agent about the weather, it seamlessly passes your request to this MCP tool and brings the answer back, all in real time.

from fastmcp import FastMCP
import httpx
import os
from dotenv import load_dotenv

load_dotenv(override=True)

# Your OpenWeatherMap API key, loaded from .env
OPENWEATHER_API_KEY = os.getenv("OPENWEATHER_API_KEY")
OPENWEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"

mcp = FastMCP("CurrentWeatherServer")

@mcp.tool()
async def get_current_weather(city: str) -> str:
    """
    Get the current weather for a given city using OpenWeatherMap API.
    """
    params = {
        "q": city,
        "appid": OPENWEATHER_API_KEY,
        "units": "metric"
    }
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(OPENWEATHER_URL, params=params, timeout=10)
            
            # Better error handling for authorization issues
            if response.status_code == 401:
                return f"Authorization error: Invalid API key. Please check your OpenWeatherMap API key."
            elif response.status_code == 404:
                return f"City '{city}' not found. Please check the spelling."
            
            response.raise_for_status()
            data = response.json()
            weather = data["weather"][0]["description"].capitalize()
            temp = data["main"]["temp"]
            feels_like = data["main"]["feels_like"]
            humidity = data["main"]["humidity"]
            wind_speed = data.get("wind", {}).get("speed", "N/A")
            return (f"Current weather in {city}:\n"
                    f"{weather}, temperature: {temp}°C, feels like: {feels_like}°C.\n"
                    f"Humidity: {humidity}%, Wind speed: {wind_speed} m/s")
        except httpx.RequestError as e:
            return f"Network error: Could not retrieve weather data for {city}: {e}"
        except Exception as e:
            return f"Could not retrieve weather data for {city}: {e}"

if __name__ == "__main__":
    mcp.run(transport="stdio")
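The parsing logic above is easy to sanity-check without a network call. One option: pull the response-to-text step into a pure helper and feed it a sample payload. Note that format_weather and the trimmed sample below are introduced here for illustration and are not part of the original script:

```python
def format_weather(city: str, data: dict) -> str:
    """Turn an OpenWeatherMap /weather JSON payload into a spoken-friendly string."""
    weather = data["weather"][0]["description"].capitalize()
    temp = data["main"]["temp"]
    feels_like = data["main"]["feels_like"]
    humidity = data["main"]["humidity"]
    wind_speed = data.get("wind", {}).get("speed", "N/A")
    return (f"Current weather in {city}:\n"
            f"{weather}, temperature: {temp}°C, feels like: {feels_like}°C.\n"
            f"Humidity: {humidity}%, Wind speed: {wind_speed} m/s")

# A trimmed sample of what the API returns for a /weather request:
sample = {
    "weather": [{"description": "light rain"}],
    "main": {"temp": 18.2, "feels_like": 17.9, "humidity": 82},
    "wind": {"speed": 4.1},
}
print(format_weather("London", sample))
```

Keeping the formatting separate from the HTTP call also makes the MCP tool itself smaller: get_current_weather only fetches and hands the JSON to the helper.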

Step-by-Step: Bringing Your AI Avatar to Life

  1. Set up your environment variables:
    • Copy .env.example to .env and fill in the following values:

VIDEOSDK_AUTH_TOKEN=your-videosdk-token
SIMLI_API_KEY=your-simli-api-key
SIMLI_FACE_ID=your-simli-face-id
OPENWEATHER_API_KEY=your-openweathermap-key
GOOGLE_API_KEY=your-google-api-key

  2. Run your agent:

python main.py

  3. Open the VideoSDK playground URL printed in your terminal. It will look like:

https://playground.videosdk.live?token=...&meetingId=...

  4. Talk to your AI avatar!
    • Say hello, ask about the weather (“What’s the weather in London?”), or have a general conversation.
    • The avatar will speak and respond using Gemini and Simli, and fetch live weather using the MCP tool.

Now step onto the stage, run the code, and meet your creation. The talking avatar is waiting. What will you say first?
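One last practical tip: every piece of this project depends on the values in .env, so it can save a debugging session to fail fast when one is missing. A minimal sketch — the variable names match the template above, and missing_env_vars is a helper introduced here, not part of the project files:

```python
import os

# Variables the agent expects, matching the .env template.
REQUIRED_VARS = [
    "VIDEOSDK_AUTH_TOKEN",
    "SIMLI_API_KEY",
    "SIMLI_FACE_ID",
    "OPENWEATHER_API_KEY",
    "GOOGLE_API_KEY",
]

def missing_env_vars(required=REQUIRED_VARS) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]
```

Calling missing_env_vars() near the top of main.py and exiting with a clear message when the list is non-empty turns a cryptic mid-session failure into an immediate, readable one.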

You can dive deeper into the playground and agent capabilities in the VideoSDK AI Playground documentation.