How to Integrate Picture-in-Picture (PiP) Mode in JavaScript Video Chat App?

Introduction

With Picture-in-Picture (PiP) mode integrated into your JavaScript video chat app built with VideoSDK, you don't have to choose between staying connected and checking out that fleeting bit of online content.

PiP mode allows users to minimize the video call window into a resizable and movable window, keeping the call visible while they attend to other tasks on their screen.

This functionality enhances the user experience by providing greater flexibility and multitasking capabilities during video calls. This guide goes through the steps of integrating PiP mode into your video call app using VideoSDK.

Getting Started with VideoSDK

To take advantage of picture-in-picture mode functionality, we must use the capabilities that the VideoSDK offers. Before diving into the implementation steps, ensure you complete the necessary prerequisites.

Create a VideoSDK Account

Go to your VideoSDK dashboard and sign up if you don't have an account. This account gives you access to the required Video SDK token, which acts as an authentication key that allows your application to interact with VideoSDK functionality.

Generate your Auth Token

Visit your VideoSDK dashboard and navigate to the "API Key" section to generate your auth token. This token is crucial in authorizing your application to use VideoSDK features. Consider referring to the provided tutorial for a more visual understanding of the account creation and token generation process.

Prerequisites

Before proceeding, ensure that your development environment meets the following requirements:

VideoSDK Developer Account (if you do not have one, follow VideoSDK Dashboard)
Have Node and NPM installed on your device.

⬇️ Install VideoSDK

Import VideoSDK using the <script> tag or install it using the following npm command. Make sure you are in your app directory before you run this command.

<html>
  <head>
    <!--.....-->
  </head>
  <body>
    <!--.....-->
    <script src="https://sdk.videosdk.live/js-sdk/0.0.83/videosdk.js"></script>
  </body>
</html>

npm

npm install @videosdk.live/js-sdk

Yarn

yarn add @videosdk.live/js-sdk

Structure of the project

Your project structure should look like this.

  root
   ├── index.html
   ├── config.js
   ├── index.js

You will be working on the following files:

index.html: Responsible for creating a basic UI.
config.js: Responsible for storing the token.
index.js: Responsible for rendering the meeting view and the join meeting functionality.

Essential Steps to Implement Video Call Functionality

Once you've successfully installed VideoSDK in your project, you'll have access to a range of functionalities for building your video call application. Picture-in-Picture mode is one such feature that leverages VideoSDK's capabilities. It leverages VideoSDK's capabilities to identify the user with the strongest audio signal (the one speaking)

Step 1: Design the user interface (UI)

Create an HTML file containing the screens, join-screen and grid-screen.

<!DOCTYPE html>
<html>
  <head> </head>

  <body>
    <div id="join-screen">
      <!-- Create new Meeting Button -->
      <button id="createMeetingBtn">New Meeting</button>
      OR
      <!-- Join existing Meeting -->
      <input type="text" id="meetingIdTxt" placeholder="Enter Meeting id" />
      <button id="joinBtn">Join Meeting</button>
    </div>

    <!-- for Managing meeting status -->
    <div id="textDiv"></div>

    <div id="grid-screen" style="display: none">
      <!-- To Display MeetingId -->
      <h3 id="meetingIdHeading"></h3>

      <!-- Controllers -->
      <button id="leaveBtn">Leave</button>
      <button id="toggleMicBtn">Toggle Mic</button>
      <button id="toggleWebCamBtn">Toggle WebCam</button>

      <!-- render Video -->
      <div class="row" id="videoContainer"></div>
    </div>

    <!-- Add VideoSDK script -->
    <script src="https://sdk.videosdk.live/js-sdk/0.0.83/videosdk.js"></script>
    <script src="config.js"></script>
    <script src="index.js"></script>
  </body>
</html>

Step 2: Implement Join Screen

Configure the token in the config.js file, which you can obtain from the VideoSDK Dashbord.

// Auth token will be used to generate a meeting and connect to it
TOKEN = "Your_Token_Here";

Next, retrieve all the elements from the DOM and declare the following variables in the index.js file. Then, add an event listener to the join and create meeting buttons.

// Getting Elements from DOM
const joinButton = document.getElementById("joinBtn");
const leaveButton = document.getElementById("leaveBtn");
const toggleMicButton = document.getElementById("toggleMicBtn");
const toggleWebCamButton = document.getElementById("toggleWebCamBtn");
const createButton = document.getElementById("createMeetingBtn");
const videoContainer = document.getElementById("videoContainer");
const textDiv = document.getElementById("textDiv");

// Declare Variables
let meeting = null;
let meetingId = "";
let isMicOn = false;
let isWebCamOn = false;

function initializeMeeting() {}

function createLocalParticipant() {}

function createVideoElement() {}

function createAudioElement() {}

function setTrack() {}

// Join Meeting Button Event Listener
joinButton.addEventListener("click", async () => {
  document.getElementById("join-screen").style.display = "none";
  textDiv.textContent = "Joining the meeting...";

  roomId = document.getElementById("meetingIdTxt").value;
  meetingId = roomId;

  initializeMeeting();
});

// Create Meeting Button Event Listener
createButton.addEventListener("click", async () => {
  document.getElementById("join-screen").style.display = "none";
  textDiv.textContent = "Please wait, we are joining the meeting";

  // API call to create meeting
  const url = `https://api.videosdk.live/v2/rooms`;
  const options = {
    method: "POST",
    headers: { Authorization: TOKEN, "Content-Type": "application/json" },
  };

  const { roomId } = await fetch(url, options)
    .then((response) => response.json())
    .catch((error) => alert("error", error));
  meetingId = roomId;

  initializeMeeting();
});

Step 3: Initialize Meeting

Following that, initialize the meeting using the initMeeting() function and proceed to join the meeting.

// Initialize meeting
function initializeMeeting() {
  window.VideoSDK.config(TOKEN);

  meeting = window.VideoSDK.initMeeting({
    meetingId: meetingId, // required
    name: "Thomas Edison", // required
    micEnabled: true, // optional, default: true
    webcamEnabled: true, // optional, default: true
  });

  meeting.join();

  // Creating local participant
  createLocalParticipant();

  // Setting local participant stream
  meeting.localParticipant.on("stream-enabled", (stream) => {
    setTrack(stream, null, meeting.localParticipant, true);
  });

  // meeting joined event
  meeting.on("meeting-joined", () => {
    textDiv.style.display = "none";
    document.getElementById("grid-screen").style.display = "block";
    document.getElementById(
      "meetingIdHeading"
    ).textContent = `Meeting Id: ${meetingId}`;
  });

  // meeting left event
  meeting.on("meeting-left", () => {
    videoContainer.innerHTML = "";
  });

  // Remote participants Event
  // participant joined
  meeting.on("participant-joined", (participant) => {
    //  ...
  });

  // participant left
  meeting.on("participant-left", (participant) => {
    //  ...
  });
}

Step 4: Create the Media Elements

In this step, Create a function to generate audio and video elements for displaying both local and remote participants. Set the corresponding media track based on whether it's a video or audio stream.

// creating video element
function createVideoElement(pId, name) {
  let videoFrame = document.createElement("div");
  videoFrame.setAttribute("id", `f-${pId}`);
  videoFrame.style.width = "300px";
    

  //create video
  let videoElement = document.createElement("video");
  videoElement.classList.add("video-frame");
  videoElement.setAttribute("id", `v-${pId}`);
  videoElement.setAttribute("playsinline", true);
  videoElement.setAttribute("width", "300");
  videoFrame.appendChild(videoElement);

  let displayName = document.createElement("div");
  displayName.innerHTML = `Name : ${name}`;

  videoFrame.appendChild(displayName);
  return videoFrame;
}

// creating audio element
function createAudioElement(pId) {
  let audioElement = document.createElement("audio");
  audioElement.setAttribute("autoPlay", "false");
  audioElement.setAttribute("playsInline", "true");
  audioElement.setAttribute("controls", "false");
  audioElement.setAttribute("id", `a-${pId}`);
  audioElement.style.display = "none";
  return audioElement;
}

// creating local participant
function createLocalParticipant() {
  let localParticipant = createVideoElement(
    meeting.localParticipant.id,
    meeting.localParticipant.displayName
  );
  videoContainer.appendChild(localParticipant);
}

// setting media track
function setTrack(stream, audioElement, participant, isLocal) {
  if (stream.kind == "video") {
    isWebCamOn = true;
    const mediaStream = new MediaStream();
    mediaStream.addTrack(stream.track);
    let videoElm = document.getElementById(`v-${participant.id}`);
    videoElm.srcObject = mediaStream;
    videoElm
      .play()
      .catch((error) =>
        console.error("videoElem.current.play() failed", error)
      );
  }
  if (stream.kind == "audio") {
    if (isLocal) {
      isMicOn = true;
    } else {
      const mediaStream = new MediaStream();
      mediaStream.addTrack(stream.track);
      audioElement.srcObject = mediaStream;
      audioElement
        .play()
        .catch((error) => console.error("audioElem.play() failed", error));
    }
  }
}

Step 5: Handle participant events

Thereafter, implement the events related to the participants and the stream.

The following are the events to be executed in this step:

participant-joined: When a remote participant joins, this event will trigger. In the event callback, create video and audio elements previously defined for rendering their video and audio streams.
participant-left: When a remote participant leaves, this event will trigger. In the event callback, remove the corresponding video and audio elements.
stream-enabled: This event manages the media track of a specific participant by associating it with the appropriate video or audio element.

// Initialize meeting
function initializeMeeting() {
  // ...

  // participant joined
  meeting.on("participant-joined", (participant) => {
    let videoElement = createVideoElement(
      participant.id,
      participant.displayName
    );
    let audioElement = createAudioElement(participant.id);
    // stream-enabled
    participant.on("stream-enabled", (stream) => {
      setTrack(stream, audioElement, participant, false);
    });
    videoContainer.appendChild(videoElement);
    videoContainer.appendChild(audioElement);
  });

  // participants left
  meeting.on("participant-left", (participant) => {
    let vElement = document.getElementById(`f-${participant.id}`);
    vElement.remove(vElement);

    let aElement = document.getElementById(`a-${participant.id}`);
    aElement.remove(aElement);
  });
}

Step 6: Implement Controls

Next, implement the meeting controls, such as toggleMic, toggleWebcam, and leave the meeting.

// leave Meeting Button Event Listener
leaveButton.addEventListener("click", async () => {
  meeting?.leave();
  document.getElementById("grid-screen").style.display = "none";
  document.getElementById("join-screen").style.display = "block";
});

// Toggle Mic Button Event Listener
toggleMicButton.addEventListener("click", async () => {
  if (isMicOn) {
    // Disable Mic in Meeting
    meeting?.muteMic();
  } else {
    // Enable Mic in Meeting
    meeting?.unmuteMic();
  }
  isMicOn = !isMicOn;
});

// Toggle Web Cam Button Event Listener
toggleWebCamButton.addEventListener("click", async () => {
  if (isWebCamOn) {
    // Disable Webcam in Meeting
    meeting?.disableWebcam();

    let vElement = document.getElementById(`f-${meeting.localParticipant.id}`);
    vElement.style.display = "none";
  } else {
    // Enable Webcam in Meeting
    meeting?.enableWebcam();

    let vElement = document.getElementById(`f-${meeting.localParticipant.id}`);
    vElement.style.display = "inline";
  }
  isWebCamOn = !isWebCamOn;
});

You can check out the complete.

After installing videoSDK, PiP mode becomes readily available for you to integrate into your video call app. This functionality enhances the user experience by providing flexibility and maintaining continuity.

Integrate Picture-in-Picture Mode

Picture-in-picture (PiP) is a commonly used feature in video conferencing software, enabling users to simultaneously engage in a video conference and perform other tasks on their devices.

With PiP, you can keep the video conference window open, resize it to a smaller size, and continue working on other tasks while still seeing and hearing the other participants in the conference.

This feature proves beneficial when you need to take notes, send an email, or look up information during the conference. This guide explains the steps to implement the Picture-in-Picture feature using VideoSDK.

PiP Video

All modern-day browsers support popping a video stream out from the HTMLVideoElement. You can achieve this either directly from the controls shown on the video element or by using the Browser API method requestPictureInPicture() on the video element.

Customize Video PiP with multiple video streams

Step 1: Create a button that toggles the Picture-in-Picture (PiP) mode during the meeting. This button should invoke the togglePipMode() method when clicked.

const togglePipBtn = document.getElementById("togglePipBtn");

togglePipBtn.addEventListener("click", () => {
  togglePipMode();
});

Step 2: The first step is to check if the browser supports PiP mode; if not, display a message to the user.

const togglePipMode = async () => {
  //Check if browser supports PiP mode else show a message to user
  if ("pictureInPictureEnabled" in document) {
    //
  } else {
    alert("PiP is not supported by your browser");\
    return
  }
};

Step 3: Now, if the browser supports PiP mode, create Canvas element and a Video element. Generate a Stream from the Canvas and play it in the video element. Request PiP mode for the video element once the metadata has been loaded.

const togglePipMode = async () => {
  //Check if browser supports PiP mode else show a message to user
  if ("pictureInPictureEnabled" in document) {
    //Creating a Canvas which will render our PiP Stream
    const source = document.createElement("canvas");
    const ctx = source.getContext("2d");

    //Create a Video tag which we will popout for PiP
    const pipVideo = document.createElement("video");
    pipWindowRef.current = pipVideo;
    pipVideo.autoplay = true;

    //Creating stream from canvas which we will play
    const stream = source.captureStream();
    pipVideo.srcObject = stream;

    //Do initial Canvas Paint
    drawCanvas();

    //When Video is ready we will start PiP mode
    pipVideo.onloadedmetadata = () => {
      pipVideo.requestPictureInPicture();
    };
    await pipVideo.play();
  } else {
    alert("PiP is not supported by your browser");
  }
};

Step 4: The next step is to paint the canvas with the Participant Grid, which will be visible in the PiP window.

const getRowCount = (length) => {
  return length > 2 ? 2 : length > 0 ? 1 : 0;
};
const getColCount = (length) => {
  return length < 2 ? 1 : length < 5 ? 2 : 3;
};

const togglePipMode = async () => {
  //Check if browser supports PiP mode else show a message to user
  if ("pictureInPictureEnabled" in document) {
    //Stream playing here
    //...

    //When the PiP mode starts, we will start drawing canvas with PiP view
    pipVideo.addEventListener("enterpictureinpicture", (event) => {
      drawCanvas();
    });

    //When PiP mode exits, we will dispose the track we created earlier
    pipVideo.addEventListener("leavepictureinpicture", (event) => {
      pipWindowRef.current = null;
      pipVideo.srcObject.getTracks().forEach((track) => track.stop());
    });

    //This will draw all the video elements in to the Canvas
    function drawCanvas() {
      //Getting all the video elements in the document
      const videos = document.querySelectorAll("video");
      try {
        //Perform initial black paint on the canvas
        ctx.fillStyle = "black";
        ctx.fillRect(0, 0, source.width, source.height);

        //Drawing the participant videos on the canvas in the grid format
        const rows = getRowCount(videos.length);
        const columns = getColCount(videos.length);
        for (let i = 0; i < rows; i++) {
          for (let j = 0; j < columns; j++) {
            if (j + i * columns <= videos.length || videos.length == 1) {
              ctx.drawImage(
                videos[j + i * columns],
                j < 1 ? 0 : source.width / (columns / j),
                i < 1 ? 0 : source.height / (rows / i),
                source.width / columns,
                source.height / rows
              );
            }
          }
        }
      } catch (error) {}

      //If pip mode is on, keep drawing the canvas when ever new frame is requested
      if (document.pictureInPictureElement === pipVideo) {
        requestAnimationFrame(drawCanvas);
      }
    }
  } else {
    alert("PiP is not supported by your browser");
  }
};

Step 5: Exit the PiP mode if it is already active.

const togglePipMode = async () => {
  //Check if PiP Window is active or not
  //If active we will turn it off
  if (pipWindowRef.current) {
    await document.exitPictureInPicture();
    pipWindowRef.current = null;
    return;
  }

  //Check if browser supports PiP mode else show a message to user
  if ("pictureInPictureEnabled" in document) {
    // ...
  } else {
    alert("PiP is not supported by your browser");
  }
};

✨ Want to Add More Features to JavaScript Video Calling App?

If you found this guide helpful and want to explore more features for your JavaScript video-calling app,

Check out these additional resources:

Active Speaker Indication: Link
Image Capture: Link
Screen Share Feature: Link
RTMP Livestream: Link

Wrap-up

By incorporating PiP mode into your VideoSDK video call app, you'll allow your users to navigate between applications without missing a beat during their video calls. They can stay on top of calls while checking emails, browsing the web, or even using other apps.

If you are new here and want to build an interactive JavaScript app with free resources, you can Sign up with VideoSDK and get? 10000 free minutes every month. This will help your new video-calling app go to the next level without any costs associated with initial usage, allowing you to focus on building and scaling your application effectively.