How to Build a Voice Agent Using Agent2Agent Protocol (A2A) and MCP

What if your AI could do more than just answer questions? What if it could coordinate with other AI agents, handle bookings, and even trigger workflows in your favorite apps? Let’s build that together, step by step.
With the Agent-to-Agent (A2A) protocol and Model Context Protocol (MCP), you can turn a Python conversational agent into a collaborative system. In this post, you’ll wire up a multi-agent AI stack that:

Speaks with users in real time (Gemini TTS/STT, via the VideoSDK pipeline)
Delegates work to specialist agents (flight, hotel, email) through the A2A protocol
Integrates with external tools and workflows (Zapier, calendars, CRMs) using MCP

By following each section and copying the code blocks, you’ll build a working conversational AI orchestration layer—one you can extend for travel automation, email workflows, or any complex, multi-step process.

Why A2A and MCP Matter

Most conversational agents do just one thing. But real-world automation demands collaboration: an agent should delegate, coordinate, and call out to other AIs or SaaS tools. A2A lets your agents discover and message each other, so each capability lives in its own agent instead of one brittle monolith. MCP connects your agents to the outside world, giving them access to tools, APIs, and automations like Zapier. Together, these protocols keep your system modular: you can add or swap an agent or a tool without rewriting the rest.
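To make the discovery idea concrete, here is a toy, in-memory sketch of an agent registry. It is not the VideoSDK implementation: `Card` and `Registry` are hypothetical stand-ins, and only the method name `find_agents_by_domain` is borrowed from the code later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """Hypothetical stand-in for an agent card: who the agent is and what it does."""
    id: str
    domain: str
    capabilities: list[str] = field(default_factory=list)


class Registry:
    """Toy in-memory registry; a real A2A registry comes from the SDK."""

    def __init__(self) -> None:
        self._cards: dict[str, Card] = {}

    def register(self, card: Card) -> None:
        self._cards[card.id] = card

    def unregister(self, agent_id: str) -> None:
        self._cards.pop(agent_id, None)

    def find_agents_by_domain(self, domain: str) -> list[str]:
        # Return agent ids, in registration order, whose card matches the domain.
        return [c.id for c in self._cards.values() if c.domain == domain]


registry = Registry()
registry.register(Card("agent_flight_001", "flight", ["search_flights"]))
registry.register(Card("agent_email_001", "email", ["send_confirmations"]))

print(registry.find_agents_by_domain("flight"))  # ['agent_flight_001']
print(registry.find_agents_by_domain("hotel"))   # []
```

The point of the pattern is that callers look agents up by what they do (a domain), not by a hard-coded reference, and must be ready for an empty result.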

Prerequisites

Requirement	Why you need it
Python 3.10+	Runs the `asyncio`-based agent code in this guide
VideoSDK account	Needed for `VIDEOSDK_AUTH_TOKEN` and meetings
Google AI Studio key	Powers Gemini speech-to-text & text-to-speech

Environment Setup

Let’s get our environment ready:

python -m venv .venv
source .venv/bin/activate        # (Windows: .venv\Scripts\activate)
pip install videosdk-agents==0.7.* \
            videosdk-plugins-google==0.2.* \
            aiohttp python-dotenv

Create a .env file in your project root:

VIDEOSDK_AUTH_TOKEN=your_videosdk_token
GOOGLE_API_KEY=your_google_api_key
ZAPIER_MCP_SERVER=https://hooks.zapier.com/...   # optional for MCP

Project Layout

Here’s how your folder should look:

a2a-mcp-agents/
├── agents/
│   ├── email_agent.py
│   ├── flight_agent.py
│   ├── hotel_agent.py
│   └── travel_agent.py
├── session_manager.py
└── main.py

Create these folders and files as shown. Each agent gets its own file for clarity.
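If you would rather script the layout than create it by hand, a small helper along these lines should work. It is only a convenience sketch: `scaffold` and `FILES` are names made up for this example, and the files are created empty.

```python
from pathlib import Path

# Relative paths taken from the project layout shown above.
FILES = [
    "agents/email_agent.py",
    "agents/flight_agent.py",
    "agents/hotel_agent.py",
    "agents/travel_agent.py",
    "session_manager.py",
    "main.py",
]


def scaffold(root: Path) -> list[Path]:
    """Create the empty project files under `root` and return their paths."""
    created = []
    for rel in FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # creates root and agents/
        path.touch(exist_ok=True)                       # leaves existing files untouched
        created.append(path)
    return created
```

Calling `scaffold(Path("a2a-mcp-agents"))` from the directory where you want the project creates the tree shown above.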

Building Specialist Agents

Specialist agents run “silently” in the background, waiting for A2A messages from other agents. Here’s how to build them.
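Before the real agents, it may help to see the "wait for a message, then run a handler" idea in isolation. The sketch below is a toy router, not the SDK's A2A layer: `Inbox` and `deliver` are hypothetical names, and only the `on_message(type, handler)` shape mirrors the agent code that follows.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[None]]


class Inbox:
    """Toy message router: maps a message type to one async handler."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def on_message(self, message_type: str, handler: Handler) -> None:
        self._handlers[message_type] = handler

    async def deliver(self, message_type: str, content: dict[str, Any]) -> bool:
        handler = self._handlers.get(message_type)
        if handler is None:
            return False  # nobody subscribed to this type; the message is dropped
        await handler(content)
        return True


async def demo() -> list[str]:
    sent: list[str] = []

    async def handle_send_booking_email(content: dict[str, Any]) -> None:
        sent.append(f"{content['email_type']} -> {content['recipient']}")

    inbox = Inbox()
    inbox.on_message("send_booking_email", handle_send_booking_email)
    await inbox.deliver("send_booking_email",
                        {"email_type": "flight_options", "recipient": "a@example.com"})
    await inbox.deliver("unknown_type", {})  # no handler registered, so ignored
    return sent


print(asyncio.run(demo()))  # ['flight_options -> a@example.com']
```

Each specialist below follows the same shape: register a handler per message type when it starts, then do nothing until a matching message arrives.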

EmailAgent (`agents/email_agent.py`)

from videosdk.agents import Agent, AgentCard, A2AMessage
import asyncio

class EmailAgent(Agent):
    """Sends confirmations and updates by email."""

    def __init__(self):
        super().__init__(
            agent_id="agent_email_001",
            instructions="You automate booking confirmations, travel updates, and notifications."
        )

    async def handle_send_booking_email(self, message: A2AMessage):
        email_type  = message.content.get("email_type", "")
        details     = message.content.get("details", "")
        recipient   = message.content.get("recipient", "")
        print(f"[EmailAgent] Sending {email_type} to {recipient}")
        await asyncio.sleep(0.5)  # Simulate I/O
        status = "sent"
        await self.a2a.send_message(
            to_agent="travel_agent_1",
            message_type="email_confirmation",
            content={"status": status, "email_type": email_type}
        )

    async def on_enter(self):
        await self.register_a2a(AgentCard(
            id="agent_email_001",
            name="Email Automation Service",
            domain="email",
            capabilities=["send_confirmations", "send_updates"]
        ))
        self.a2a.on_message("send_booking_email", self.handle_send_booking_email)

    async def on_exit(self):
        await self.unregister_a2a()

FlightAgent (`agents/flight_agent.py`)

from videosdk.agents import Agent, AgentCard, A2AMessage

class FlightAgent(Agent):
    """Finds and books flights."""

    def __init__(self):
        super().__init__(
            agent_id="agent_flight_001",
            instructions="Provide flight options with prices, times, airline names."
        )

    async def handle_flight_search_query(self, message: A2AMessage):
        dest  = message.content["destination"]
        dates = message.content["dates"]
        email = message.content["customer_email"]
        reply = (f"Flights to {dest} on {dates}:\n"
                 f"1) Direct $299 08:00–11:30\n"
                 f"2) Economy Plus $399 14:15–17:45\n"
                 f"3) Premium Eco $549 18:30–22:00")
        await self.a2a.send_message(
            to_agent="travel_agent_1",
            message_type="flight_booking_response",
            content={"response": reply}
        )
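        # Email is best-effort: skip it if no email agent is registered yet,
        # instead of raising IndexError on an empty result.
        email_agents = self.a2a.registry.find_agents_by_domain("email")
        if not email_agents:
            print("[FlightAgent] No email agent registered; skipping email")
            return
        await self.a2a.send_message(
            to_agent=email_agents[0],
            message_type="send_booking_email",
            content={"email_type": "flight_options", "details": reply, "recipient": email}
        )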

    async def on_enter(self):
        await self.register_a2a(AgentCard(
            id="agent_flight_001",
            name="Skymate",
            domain="flight",
            capabilities=["search_flights"]
        ))
        self.a2a.on_message("flight_search_query", self.handle_flight_search_query)

    async def on_exit(self):
        await self.unregister_a2a()

HotelAgent (`agents/hotel_agent.py`)

from videosdk.agents import Agent, AgentCard, A2AMessage

class HotelAgent(Agent):
    """Finds and books hotels."""

    def __init__(self):
        super().__init__(
            agent_id="agent_hotel_001",
            instructions="Suggest hotels with amenities, price, and location."
        )

    async def handle_hotel_search_query(self, message: A2AMessage):
        dest  = message.content["destination"]
        dates = message.content["dates"]
        email = message.content["customer_email"]
        reply = (f"Hotels in {dest} for {dates}:\n"
                 f"1) Grand Plaza $180/night\n"
                 f"2) Comfort Inn $120/night\n"
                 f"3) Luxury Resort $350/night")
        await self.a2a.send_message(
            to_agent="travel_agent_1",
            message_type="hotel_booking_response",
            content={"response": reply}
        )
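        # Email is best-effort: skip it if no email agent is registered yet,
        # instead of raising IndexError on an empty result.
        email_agents = self.a2a.registry.find_agents_by_domain("email")
        if not email_agents:
            print("[HotelAgent] No email agent registered; skipping email")
            return
        await self.a2a.send_message(
            to_agent=email_agents[0],
            message_type="send_booking_email",
            content={"email_type": "hotel_options", "details": reply, "recipient": email}
        )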

    async def on_enter(self):
        await self.register_a2a(AgentCard(
            id="agent_hotel_001",
            name="Hotel Booker",
            domain="hotel",
            capabilities=["search_hotels"]
        ))
        self.a2a.on_message("hotel_search_query", self.handle_hotel_search_query)

    async def on_exit(self):
        await self.unregister_a2a()
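Both search handlers index straight into `message.content`, so a payload with a missing key would raise `KeyError` inside the handler. One way to guard against that, sketched here with a hypothetical helper named `missing_keys`, is to check the payload first:

```python
from typing import Any, Sequence

# Keys the flight and hotel search handlers above read from the payload.
REQUIRED_SEARCH_KEYS = ("destination", "dates", "customer_email")


def missing_keys(content: dict[str, Any],
                 required: Sequence[str] = REQUIRED_SEARCH_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in the payload."""
    return [key for key in required if not content.get(key)]


payload = {"destination": "Tokyo", "dates": "2025-08-03"}
print(missing_keys(payload))  # ['customer_email']
```

Inside a handler, you could call `missing_keys(message.content)` at the top and return early (or send an error message back) when the list is not empty.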

Orchestrator Agent: TravelAgent

The TravelAgent is your “voice” agent. It listens to the user, delegates tasks to other agents using A2A, and (optionally) reaches out to external tools through MCP.

agents/travel_agent.py:

from videosdk.agents import Agent, function_tool, AgentCard, A2AMessage, MCPServerHTTP
import asyncio, os
from typing import Dict, Any

class TravelAgent(Agent):
    def __init__(self):
        zapier_url = os.getenv("ZAPIER_MCP_SERVER", "")
        super().__init__(
            agent_id="travel_agent_1",
            instructions="Book complete trips: flights, hotels, emails.",
            mcp_servers=[MCPServerHTTP(url=zapier_url)] if zapier_url else []
        )

    @function_tool
    async def book_travel_package(self, destination: str, travel_dates: str, email: str) -> Dict[str, Any]:
        await self.session.say(f"Looking up options for {destination}…")
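        # Poll the registry a few times; specialists may still be registering.
        flights, hotels = [], []
        for _ in range(3):
            flights = self.a2a.registry.find_agents_by_domain("flight")
            hotels  = self.a2a.registry.find_agents_by_domain("hotel")
            if flights and hotels:
                break
            await asyncio.sleep(2)
        if not (flights and hotels):
            # Without this check, flights[0] / hotels[0] would raise IndexError.
            return {"status": "error", "reason": "flight or hotel agent unavailable"}
        query = {"destination": destination, "dates": travel_dates, "customer_email": email}
        await self.a2a.send_message(to_agent=flights[0], message_type="flight_search_query",
                                    content=query)
        await self.a2a.send_message(to_agent=hotels[0], message_type="hotel_search_query",
                                    content=query)
        return {"status": "processing"}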

    async def handle_flight_response(self, msg: A2AMessage):
        await self.session.say(f"Flight update: {msg.content['response']}")

    async def handle_hotel_response(self, msg: A2AMessage):
        await self.session.say(f"Hotel update: {msg.content['response']}")

    async def handle_email_confirm(self, msg: A2AMessage):
        await self.session.say("Confirmation email sent.")

    async def on_enter(self):
        await self.register_a2a(AgentCard(
            id="travel_agent_1",
            name="Travel Coordinator",
            domain="travel",
            capabilities=["travel_planning"]
        ))
        self.a2a.on_message("flight_booking_response", self.handle_flight_response)
        self.a2a.on_message("hotel_booking_response", self.handle_hotel_response)
        self.a2a.on_message("email_confirmation",     self.handle_email_confirm)
        await self.session.say("Hello! Where would you like to travel?")

    async def on_exit(self):
        await self.unregister_a2a()
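The polling loop in `book_travel_package` can also be pulled out into a reusable helper. The following is a minimal sketch under stated assumptions: `wait_for_agents` is a hypothetical name, and `find` stands in for a lookup such as `self.a2a.registry.find_agents_by_domain`, assumed to return a list of agent ids.

```python
import asyncio
from typing import Callable, Optional


async def wait_for_agents(
    find: Callable[[str], list[str]],
    domains: list[str],
    attempts: int = 3,
    delay: float = 2.0,
) -> Optional[dict[str, str]]:
    """Poll `find` until every domain has at least one agent, or give up."""
    for attempt in range(attempts):
        found = {domain: find(domain) for domain in domains}
        if all(found.values()):
            # Pick the first agent registered for each domain.
            return {domain: ids[0] for domain, ids in found.items()}
        if attempt < attempts - 1:
            await asyncio.sleep(delay)  # no point sleeping after the last try
    return None


async def demo() -> Optional[dict[str, str]]:
    polls = {"count": 0}

    def find(domain: str) -> list[str]:
        # Pretend the hotel agent only shows up from the second polling round on.
        if domain == "flight":
            polls["count"] += 1
        if domain == "hotel" and polls["count"] < 2:
            return []
        return [f"agent_{domain}_001"]

    return await wait_for_agents(find, ["flight", "hotel"], delay=0.01)


print(asyncio.run(demo()))
# {'flight': 'agent_flight_001', 'hotel': 'agent_hotel_001'}
```

Returning `None` on failure forces the caller to handle the "specialist never appeared" case explicitly, rather than indexing into an empty list.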

Wiring It All Together

session_manager.py:

import os, asyncio
from videosdk.agents import AgentSession, RealTimePipeline
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from google.genai.types import Modality
from agents.travel_agent import TravelAgent
from agents.flight_agent  import FlightAgent
from agents.hotel_agent   import HotelAgent
from agents.email_agent   import EmailAgent

def make_voice_pipeline() -> RealTimePipeline:
    return RealTimePipeline(
        model=GeminiRealtime(
            model="gemini-2.0-flash-exp",
            api_key=os.environ["GOOGLE_API_KEY"],
            config=GeminiLiveConfig(voice="Aoede", response_modalities=[Modality.AUDIO]),
        )
    )

def make_text_pipeline() -> RealTimePipeline:
    return RealTimePipeline(
        model=GeminiRealtime(
            model="gemini-2.0-flash-exp",
            api_key=os.environ["GOOGLE_API_KEY"],
            config=GeminiLiveConfig(response_modalities=[Modality.TEXT]),
        )
    )

async def create_room() -> str:
    """Create a VideoSDK room and return its ID."""
    import aiohttp
    async with aiohttp.ClientSession() as s:
        async with s.post(
            "https://api.videosdk.live/v2/rooms",
            headers={"Authorization": os.environ["VIDEOSDK_AUTH_TOKEN"], "Content-Type": "application/json"},
        ) as r:
            r.raise_for_status()  # fail fast on a bad token or an API error
            return (await r.json())["roomId"]

async def start_agents(room_id: str):
    # "playground": True in the context lets us test in the web UI.
    # Only the travel agent joins the meeting; the specialists run headless.
    travel = AgentSession(TravelAgent(), make_voice_pipeline(),
                          {"meetingId": room_id, "join_meeting": True, "playground": True})
    flight  = AgentSession(FlightAgent(),  make_text_pipeline(), {"join_meeting": False, "playground": True})
    hotel   = AgentSession(HotelAgent(),   make_text_pipeline(), {"join_meeting": False, "playground": True})
    email   = AgentSession(EmailAgent(),   make_text_pipeline(), {"join_meeting": False, "playground": True})

    await asyncio.gather(flight.start(), hotel.start(), email.start())
    await asyncio.sleep(3)     # give the specialists time to register for A2A
    await travel.start()
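The ordering in `start_agents` matters: the specialists start concurrently, and the orchestrator starts only once they are all up. Here is a self-contained sketch of that pattern, using a hypothetical `FakeSession` class in place of real sessions so it runs without any SDK:

```python
import asyncio


class FakeSession:
    """Stand-in for a session object; it only records when start() finishes."""

    def __init__(self, name: str, log: list[str], startup: float = 0.01) -> None:
        self.name = name
        self.log = log
        self.startup = startup

    async def start(self) -> None:
        await asyncio.sleep(self.startup)  # simulate connection/registration work
        self.log.append(self.name)


async def start_in_order(specialists: list[FakeSession],
                         orchestrator: FakeSession) -> None:
    # Specialists start concurrently; gather() returns once all have finished.
    await asyncio.gather(*(s.start() for s in specialists))
    # Only then does the orchestrator come up.
    await orchestrator.start()


async def demo() -> list[str]:
    log: list[str] = []
    specialists = [FakeSession(name, log) for name in ("flight", "hotel", "email")]
    await start_in_order(specialists, FakeSession("travel", log))
    return log


print(asyncio.run(demo()))
```

Whatever order the three specialists finish in, `travel` is always last in the log, which is the guarantee the orchestrator relies on when it looks up agents in the registry.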

Application Entry Point

main.py:

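#!/usr/bin/env python3
import asyncio, os
from dotenv import load_dotenv
from session_manager import create_room, start_agents

def validate_env():
    for var in ("VIDEOSDK_AUTH_TOKEN", "GOOGLE_API_KEY"):
        if not os.environ.get(var):   # also catches empty values
            raise RuntimeError(f"{var} is not set")

async def main():
    load_dotenv()                     # read the .env file (python-dotenv)
    validate_env()
    room = await create_room()
    print(f"Meeting room created: {room}")
    await start_agents(room)
    # Keep the process alive until Ctrl+C, in case start_agents() returns
    # as soon as the sessions are up.
    await asyncio.Event().wait()

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        # On Ctrl+C, asyncio.run() cancels pending tasks and re-raises here.
        print("Shutting down.")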

Try It in the VideoSDK Agents Playground

When the TravelAgent session starts (with playground: True in the context), VideoSDK prints a link like:

Agent started in playground mode
Interact with agent here at:
https://playground.videosdk.live?token=<auth_token>&meetingId=<meeting_id>

Open that link in Chrome, grant microphone access, and start talking:

You: I’d like to book a flight to Tokyo next month.
Agent: Looking up options for Tokyo…
Agent: Flight update: Flights to Tokyo on 2025-08-03…
Agent: Hotel update: Hotels in Tokyo…
Agent: Confirmation email sent.

Hear the agent respond in real time using Gemini TTS.
Watch as A2A messages coordinate between your specialist agents.
No client app needed—just use the web playground!

Tip: Playground mode is for testing and debugging. The playground link carries your auth token in the URL, so set playground to False (or remove the key from the context) in production.
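One way to avoid shipping playground mode by accident is to derive the flag from the environment instead of hard-coding it. The sketch below assumes a hypothetical `APP_ENV` variable and a hypothetical helper named `session_context`; the dictionary keys are the ones used in `session_manager.py`.

```python
import os
from typing import Any, Mapping, Optional


def session_context(room_id: Optional[str],
                    env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build a session context dict; playground is on unless APP_ENV=production."""
    playground = env.get("APP_ENV", "development").lower() != "production"
    context: dict[str, Any] = {
        "join_meeting": room_id is not None,  # headless specialists pass None
        "playground": playground,
    }
    if room_id is not None:
        context["meetingId"] = room_id
    return context


print(session_context("abc-123", {"APP_ENV": "production"}))
# {'join_meeting': True, 'playground': False, 'meetingId': 'abc-123'}
```

With this in place, the voice session would receive `session_context(room_id)` and each specialist `session_context(None)`.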

Key Takeaways

A2A enables true agent collaboration, not just single-bot workflows.
MCP opens the door to external tools and SaaS integrations.
VideoSDK Agents Playground makes it easy to iterate, test, and show off your whole multi-agent system.
With just a handful of Python files, you can stand up a working, extensible multi-agent conversational system.

Now go build your own network of specialist agents—and let them do the heavy lifting for your users and your business!