Twilio has decided to shut down its Programmable Video SDK. This decision, while understandable, may leave many developers wondering about the future of their video-based applications.

I was disappointed with this decision, because the Twilio Video SDK was one of the best products in town, especially for builders. In a recent statement, Twilio CEO Jeff Lawson explained the reason behind it:

Lastly, we've decided to end-of-life (EOL) Twilio Programmable Video as a standalone product. Given it's such a niche area and a relatively small part of our portfolio, we believe partnering with video industry leaders is the best way to ensure long-term product innovation for our customers.
Removing Programmable Video from our portfolio will also allow Communications to more effectively focus on our pillar products - Messaging, Voice, and Email.

I previously wrote about this in "Domino called exit(); for Twilio's Programmable Video". Click the link below if you haven't read it yet.

🌐 Zooming into a situation

Twilio has partnered with Zoom to help customers migrate from Programmable Video to the Zoom Video SDK:

We recommend migrating your application to the API provided by our preferred video partner, Zoom. We've prepared this migration guide to assist you in minimizing any service disruption.

The only apparent reason for choosing Zoom is to avoid handing customers to a competitor. Zoom is nowhere near a direct competitor to Twilio, although both serve similar types of customers, such as contact centers and corporate sales and marketing departments.

Either way, Twilio customers have a year to find a solution unless drastic changes are made to the WebRTC API.

📱 Future of building Real-time Video Apps

I think creating a great video app is an art. Every company has different uses and daily needs. Development flexibility is even more important because each use case requires high-end customization.

πŸ–ŒοΈ Compatibility with browsers and mobile devices

First and foremost, compatibility with major browsers and mobile devices is essential. Pre-call checks play a big role in improving the call experience.

The main cause of a poor call experience is audio or video capture failure. Pre-checking capture before the call saves a lot of time and confirms that the browser and device are supported.
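As a rough illustration of such a pre-call check, here is a minimal sketch built on the standard `navigator.mediaDevices` browser API. The helper names `evaluatePreCallCheck` and `runPreCallCheck` are hypothetical and not part of any vendor SDK, and a real check would usually also test connectivity and permissions.

```javascript
// Hypothetical helper: decides whether a call can start, given the device
// list returned by navigator.mediaDevices.enumerateDevices().
function evaluatePreCallCheck(devices, hasGetUserMedia) {
  const issues = [];
  if (!hasGetUserMedia) {
    issues.push('getUserMedia is not supported in this browser');
  }
  if (!devices.some((device) => device.kind === 'audioinput')) {
    issues.push('no microphone found');
  }
  if (!devices.some((device) => device.kind === 'videoinput')) {
    issues.push('no camera found');
  }
  return { ok: issues.length === 0, issues };
}

// Browser-only wiring (assumption: called before the user joins a call).
async function runPreCallCheck() {
  const mediaDevices =
    typeof navigator !== 'undefined' ? navigator.mediaDevices : undefined;
  const hasGetUserMedia = Boolean(mediaDevices && mediaDevices.getUserMedia);
  const devices = hasGetUserMedia ? await mediaDevices.enumerateDevices() : [];
  return evaluatePreCallCheck(devices, hasGetUserMedia);
}
```

The evaluation logic is kept separate from the browser call so it can be unit tested without a camera or microphone attached.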

🔄 End to End Customisable UI/UX Experience

Real-time experiences are about harmony between web apps and mobile apps. Each business requires different features to build.

For example, an education use case is driven by a single speaker, tele-consultancy is a continuous conversation between two people, and real-time audio broadcast requires a constant change of speakers.

Each application requires different types of user experience to implement and manage according to the user base and their behaviour.

📷 High Quality Audio / Video Experience

In an era where user experience is a key differentiator, a video application with high-quality audio and video capabilities stands out. Users expect nothing less than excellence, and an app that delivers on this front not only meets but exceeds those expectations. This, in turn, contributes to positive reviews, increased user retention, and organic growth of the app's user base.

Achieving that requires sending and receiving audio and video at high bitrates with efficient compression.

🤖 Native Integration of AI on top of Audio and Video

As the world goes through a generative AI boom, real-time audio and video will play an important role in the adoption of AI.

Background change, filters, and face tracking are much-needed features in different market segments. Apart from that, speech-to-text and transcription are essential features when it comes to video analysis.

This is just the beginning as a growing number of companies are integrating native generative AI capabilities over real-time audio and video to better assist their users.

🤝 Collaboration and Moderation at scale

Whether it is a 1:1 call, a small group call, or a large group call, collaborative features like chat, polling, Q&A, raising hands, and layout changes are essential to create a connection between participants.

When it comes to large group calls, moderation controls like mute all, spotlight, and waiting room are necessary.

πŸ” Built-in Data Privacy and Protection

Data privacy and security are the most important aspects a company should consider before making any decision. It is essential to protect customer information from threats.

A focus on privacy prevents unauthorized access, protects sensitive data, and maintains the app's reputation in an environment where user trust is paramount.

⏺️ Instant recordings with customisable templates

Most use cases around cloud recording are either post-streaming of content or post-production of content. For example, in education the use case is streaming recordings afterwards, while in virtual events it is post-production content for websites and social media.

Instant recording infrastructure plays a big role in such use cases where users do not have to wait for the recording to be processed.

😱 Nightmare of migrating to Zoom SDK

Now let's talk about the elephant in the room: migration to Zoom. Zoom is by nature built on an MCU architecture, meaning it decrypts audio/video streams on the server and mixes them into one stream.

Compared to an SFU architecture, this makes a good developer experience hard to achieve, and the migration is becoming a nightmare for developers. There are several reasons, but below are the most important ones to consider:

🌍 Not having global connected regions to solve global latency

Zoom forces you to select a region before initializing the client. That makes global latency very difficult to solve, because anyone from Europe joining a call on a US server will experience latency and massive packet loss.

client.init('en-US', 'Global', { patchJsMedia: true }).then(() => {
  client.join('sessionName', 'VIDEO_SDK_JWT', 'userName', 'sessionPasscode').then(() => {
    // Grab the media stream once the session has been joined
    stream = client.getMediaStream()
  })
})

🎨 Video Layout with Canvas Painting

Zoom doesn't give raw access to audio and video streams in the SDK, so you have to write a lot of otherwise unnecessary layout math to manage multiple layouts. Canvas rendering itself is good, but developing and maintaining the layout logic can take almost more time than building the product.

let participants = client.getAllUser()

stream.renderVideo(document.querySelector('#participant-videos-canvas'), participants[0].userId, 960, 540, 0, 540, 2)
stream.renderVideo(document.querySelector('#participant-videos-canvas'), participants[1].userId, 960, 540, 960, 540, 2)
stream.renderVideo(document.querySelector('#participant-videos-canvas'), participants[2].userId, 960, 540, 0, 0, 2)
stream.renderVideo(document.querySelector('#participant-videos-canvas'), participants[3].userId, 960, 540, 960, 0, 2)
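To give a feel for the layout math involved, here is a minimal sketch that computes the tile rectangles for an arbitrary number of participants. `computeGridLayout` is a hypothetical helper, not part of the Zoom SDK, and it assumes the canvas origin is the bottom-left corner, which is what the coordinates in the four calls above suggest.

```javascript
// Hypothetical helper: returns one rectangle per participant for a grid
// drawn on a single canvas. Assumes y = 0 is the bottom edge of the canvas.
function computeGridLayout(count, canvasWidth, canvasHeight) {
  if (count <= 0) return [];
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  const width = Math.floor(canvasWidth / cols);
  const height = Math.floor(canvasHeight / rows);
  const tiles = [];
  for (let index = 0; index < count; index++) {
    const col = index % cols;
    const row = Math.floor(index / cols); // row 0 is the top row
    tiles.push({
      index,
      width,
      height,
      x: col * width,
      y: (rows - 1 - row) * height, // flip vertically: y grows upward
    });
  }
  return tiles;
}
```

For four participants on a 1920x1080 canvas this yields the same coordinates as the hand-written calls above, and each tile can be passed to `stream.renderVideo(canvas, userId, tile.width, tile.height, tile.x, tile.y, 2)`. Every new layout (spotlight, sidebar, screen share) needs its own variant of this logic.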

πŸ” Raw media Access for Generative AI use-cases

As I said earlier, Zoom doesn't allow raw media stream access which means you can't integrate any third-party SDKs or open source models on the client or server side.

Generative AI is becoming increasingly important for every application integrating text-to-speech, face tracking, face recognition, and server-side audio/video analysis.

None of the above is possible with the Zoom SDK, due to the nature of the technology.

💽 Large Size of SDK Binary

The Zoom SDK averages 97 MB+ on mobile and can go up to 157 MB+ (I've even read about 500 MB+ in community threads), which makes it heavy for a large number of use cases.

⭐ 720p+ resolution is not supported

Zoom is not suitable if you are building apps for high-quality experiences due to resolution and bitrate restrictions at 720p.

This kills most use cases like broadcasting, high-resolution screen sharing, high-quality content sharing, etc.

πŸ–ΌοΈ Virtual Background, Gallery View with SharedBuffer Array

SharedArrayBuffer is a compatibility and cross-device support killer in the Zoom SDK. It is only supported by two browsers, as mentioned below by the Zoom team.

The bad news is that your client has to enable this because Chrome doesn't allow SharedArrayBuffer directly.

As of Chrome and Edge 92, and Firefox 79 SharedArrayBuffer is only available if your web page is Cross-Origin Isolated, or if your web page uses Credentialless headers, or if you have registered for the SharedArrayBuffer Chrome Origin Trial (works only in Chrome and Edge).
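For context, making a page cross-origin isolated means serving it with two standard response headers, and the page can then verify the result through the standard `crossOriginIsolated` global. The sketch below is only an illustration: the helper names `applyIsolationHeaders` and `canUseSharedArrayBuffer` are hypothetical, and it assumes a Node.js-style response object with a `setHeader` method.

```javascript
// Standard response headers that make a page cross-origin isolated.
const CROSS_ORIGIN_ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  // 'credentialless' is the alternative value mentioned in the quote above
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Hypothetical helper: sets the headers on a Node.js-style response object.
function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(CROSS_ORIGIN_ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Hypothetical client-side check: SharedArrayBuffer must exist and the page
// must report itself as cross-origin isolated.
function canUseSharedArrayBuffer(scope = globalThis) {
  return (
    typeof scope.SharedArrayBuffer === 'function' &&
    scope.crossOriginIsolated === true
  );
}
```

Keep in mind that `require-corp` also forces every embedded cross-origin resource (scripts, images, iframes) to opt in, which is often the hard part for an existing web app.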

❌ Not Compatible with the majority of browsers

Since the Zoom SDK does not use WebRTC technology and relies on its own MCU infrastructure, it is not able to provide good support for web calls.

πŸ” End to End Encryption of Audio / Video Streams

Zoom does not allow you to encrypt streams in the browser. This means that the data transmitted from your browser, including audio and video, is not encrypted throughout its journey to the recipient. While Zoom encrypts the data in transit between its servers, there is a potential vulnerability in the browser-to-server leg of the communication.

When selecting a video conferencing platform, it is crucial to consider security needs. If End to End Encryption is essential for your specific requirements, ensuring the platform you choose offers this functionality is critical.

🔄 Simulcast for Adaptive Bitrate Streaming

Zoom does not allow multi-layer sending and receiving of video with adaptive bitrate streaming due to the same MCU architecture.

As a result, the Zoom SDK cannot adapt audio/video bitrate and resolution as the available internet bandwidth fluctuates.
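For readers unfamiliar with simulcast, here is a minimal sketch of the idea using the standard WebRTC `addTransceiver` API with `sendEncodings`. It is generic WebRTC, not the API of Zoom or any specific vendor. The layer names, bitrates, and the `pickLayer` and `publishWithSimulcast` helpers are illustrative assumptions.

```javascript
// Illustrative simulcast layers; the rid names and bitrates are assumptions.
const SIMULCAST_ENCODINGS = [
  { rid: 'low', scaleResolutionDownBy: 4, maxBitrate: 150_000 },
  { rid: 'mid', scaleResolutionDownBy: 2, maxBitrate: 500_000 },
  { rid: 'high', scaleResolutionDownBy: 1, maxBitrate: 1_500_000 },
];

// Hypothetical helper: picks the highest layer that fits the receiver's
// estimated bandwidth (bits per second), falling back to the lowest layer.
function pickLayer(encodings, availableBitrate) {
  const sorted = [...encodings].sort((a, b) => a.maxBitrate - b.maxBitrate);
  let chosen = sorted[0];
  for (const encoding of sorted) {
    if (encoding.maxBitrate <= availableBitrate) chosen = encoding;
  }
  return chosen;
}

// Browser-only: publish one camera track as three simultaneous layers.
function publishWithSimulcast(peerConnection, videoTrack) {
  return peerConnection.addTransceiver(videoTrack, {
    direction: 'sendonly',
    sendEncodings: SIMULCAST_ENCODINGS,
  });
}
```

In an SFU setup the server forwards whichever layer suits each receiver, so a participant on a weak connection gets the low layer without degrading the call for everyone else.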

👥 Sender / Receiver media track subscription

Zoom does not let you subscribe to or unsubscribe from individual audio/video streams, which makes it difficult to implement use cases such as breakout rooms, backstage, and watch parties.

🎫 Pre-call Testing for best call experience

Zoom does not provide a pre-call test before starting a video call. Only a quality preview is available; connectivity checks and other pre-call features are not.

📈 QoS APIs and Dashboard

Quality of service (QoS) is paramount for any video conferencing platform. Zoom remains committed to providing its users with the tools and insights necessary to ensure optimal communication experiences. To this end, Zoom offers two key avenues for monitoring and managing QoS: APIs and the Dashboard.

Zoom offers a single API that grants developers access to a wealth of QoS data. This API delivers detailed information on key performance metrics. By leveraging the API, developers can integrate custom monitoring tools, automated remediation workflows, and real-time quality feedback into their applications.

🎥 Server side raw video streams for AI use-case

It is not possible to extract raw audio/video streams from the client or server-side using the Zoom Video SDK. This makes it difficult to create custom AI use cases such as transcription, speech-to-text, or any other type of intelligence.

βœ‰οΈ Missing data channel for collaboration and moderation controls

Zoom does not have a proper Data Channel feature within the SDK. That means you can't create collaborative features like polls, Q&A, layout changes, and moderation features like mute all, invite as a host, etc.
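To show what a data channel makes possible, here is a minimal sketch of a control-message format for moderation and collaboration events. The message types, the state shape, and both helper functions are hypothetical. The transport is assumed to be a standard WebRTC `RTCDataChannel`, for example one created with `peerConnection.createDataChannel('control')`.

```javascript
// Hypothetical message format for moderation and collaboration events.
function encodeControlMessage(type, payload = {}) {
  return JSON.stringify({ type, payload, sentAt: Date.now() });
}

// Applies a received message to the local participant state and returns
// the new state without mutating the old one.
function applyControlMessage(state, rawMessage) {
  const { type, payload } = JSON.parse(rawMessage);
  switch (type) {
    case 'MUTE_ALL':
      return { ...state, micEnabled: false };
    case 'RAISE_HAND':
      return {
        ...state,
        raisedHands: [...state.raisedHands, payload.participantId],
      };
    default:
      return state; // ignore unknown message types
  }
}
```

A host would send `dataChannel.send(encodeControlMessage('MUTE_ALL'))`, and every participant would run `applyControlMessage` inside the channel's `message` event handler. Polls, Q&A, and layout changes follow the same pattern with more message types.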

What should you know before moving to the next API / SDK?

  • List your requirements and prioritize SDKs based on match score
  • Create a small POC and test the platform
  • Check out their support in the community and their knowledge of the space.
  • Check out the latest releases and continuous updates for your industry.
  • Get a demo with their team and see how committed they are to your future roadmap and what they're building for the next couple of quarters.
  • Check if they have had layoffs in the last few quarters and whether they are likely to be around for long.
  • A red flag is people trying to sell demos instead of explaining what's available and not available compared to Twilio.
  • Another red flag is a vendor trying to get you onboard with migration credits and offers. Trust me, it's not worth it.
  • Make sure you invest in one vendor rather than buying from multiple because in the long run one will be able to justify the usage, business needs and relationships.
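The "match score" from the first point in the list above can be as simple as a weighted checklist. Here is a minimal sketch; the feature names, the weights, and the `matchScore` helper are hypothetical and should be replaced with your own requirements.

```javascript
// Hypothetical helper: returns the percentage of weighted requirements
// that a vendor covers. Weights: 1 = nice to have, 3 = must have.
function matchScore(requirements, vendorFeatures) {
  let total = 0;
  let matched = 0;
  for (const [feature, weight] of Object.entries(requirements)) {
    total += weight;
    if (vendorFeatures[feature]) matched += weight;
  }
  return total === 0 ? 0 : Math.round((matched / total) * 100);
}

// Example with made-up requirements and a made-up vendor.
const requirements = { rawMediaAccess: 3, dataChannel: 2, preCallCheck: 1 };
const vendorA = { rawMediaAccess: true, dataChannel: false, preCallCheck: true };
const vendorAScore = matchScore(requirements, vendorA);
```

Scoring every shortlisted SDK against the same weighted list makes the trade-offs explicit before you invest time in a POC.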

πŸ‘©β€πŸ’» Developer first approach at Video SDK

Video SDK focuses on one problem: delivering the best developer experience, reliability, and security for real-time video infrastructure. Compared to Zoom, we have a rich, highly flexible, and developer-friendly SDK. Here is the comparison:

Features                                   | Zoom SDK     | Video SDK
-------------------------------------------+--------------+-------------
Globally Connected Regions                 | No           | Yes
Video Tiles Rendering                      | Inflexible   | Flexible
Raw Media Access                           | No           | Yes
SDK Binary Size                            | 100 MB+      | 20 MB
Max Resolution                             | 720p         | 2K+
Browser Compatibility                      | 10% browsers | 98% browsers
Sender / Receiver media track subscription | No           | Yes
QoS API / Dashboard                        | No           | Yes
Pre-call Check                             | No           | Yes
Data Channel                               | No           | Yes

Follow the migration guide from Twilio to Video SDK, and at least start building a POC to see what works for you and what doesn't. Here are some references to do the same:

Check out the migration guide that we have written:

That's all for today. Feel free to reach out if you need any help navigating the options: Talk to Our Migration Expert

As I said earlier, here is the link to "Domino called exit(); for Twilio's Programmable Video"

Until next time, see you.