Open Source AI Voice Agent SDK
Integrate voice into your apps with VideoSDK's AI Agents. Connect your chosen LLMs & TTS. Build once, deploy across all platforms.
Star us on GitHub
Overview
Kokorotts is a Text-to-Speech (TTS) generator that converts text into high-quality, natural-sounding audio. It is built on the Kokoro model, which has only 82 million parameters yet offers clarity and precision rivaling much larger models. Kokorotts is fully open-source under Apache 2.0 and free to use in both commercial and personal applications. Designed for speed and minimal resource consumption, it suits deployments with limited hardware or real-time requirements. It currently supports English, including both British and American accents, with multilingual support planned. Kokorotts also converts documents (EPUB, PDF, TXT) into audio files (MP3, WAV), making it a versatile tool for content creation and accessibility.
How It Works
- Installation: Install via pip/uv for quick GPU-accelerated inference (Python 3.12+ required).
- Model Loading: Load the Kokoro model and choose among various voice packs.
- Voice Configuration: Blend multiple speakers, select from 40+ presets, shift pitch, and control speaking rate.
- Speech Generation: Input your text for 24 kHz audio output.
- Document Conversion: Transform full EPUB chapters, PDF pages, or TXT files into MP3 or WAV with chapter metadata.
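The voice-blending step can be pictured as a weighted average of per-voice style vectors. The sketch below is only an illustration under that assumption: `blend_voices`, the toy three-dimensional vectors, and the weights are hypothetical and are not taken from the Kokorotts API. Only the voice names come from this page.

```python
from typing import Dict, List


def blend_voices(voices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of the selected voice style vectors.

    Hypothetical helper: assumes each voice is a plain list of floats.
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    lengths = {len(voices[name]) for name in weights}
    if len(lengths) != 1:
        raise ValueError("all voice vectors must have the same length")

    blended = [0.0] * lengths.pop()
    for name, weight in weights.items():
        share = weight / total  # normalise so the shares sum to 1
        for i, value in enumerate(voices[name]):
            blended[i] += share * value
    return blended


# Toy vectors for illustration; real voice packs are far larger.
toy_voices = {
    "bella": [1.0, 0.0, 2.0],
    "sarah": [0.0, 1.0, 4.0],
}

# Three parts "bella" to one part "sarah".
mix = blend_voices(toy_voices, {"bella": 3, "sarah": 1})
print(mix)
```

Normalising the weights means a blend can be written as a ratio (3:1) or as fractions (0.75 and 0.25) with the same result.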
Use Cases
Content Creation
Convert books, PDFs, and raw text into high-quality audio for audiobooks, podcasts, or accessible content.
Voice-Enabled Applications
Develop local, privacy-focused, and offline voice apps, pairing TTS with ASR for conversational agents.
Educational & Commercial Solutions
Integrate superior, open-source TTS into business, educational, or multilingual platforms without proprietary restrictions.
Features & Benefits
- Superior audio clarity and precision
- Open-source under Apache 2.0 (free for commercial & personal use)
- Compact model (82M parameters) for fast processing
- Wide selection of voices—including British and American English
- Multilingual-ready architecture (optimized for English; Spanish and French support planned)
- Supports real-time applications and ONNX integration
- Converts EPUB, PDF, and TXT into 24-bit WAV and 192 kbps MP3
- Voice customization: blending, pitch shifting, speaking rate control
- Efficient batch and parallel document processing
- Trained on high-quality, permissively licensed audio datasets
- Processes text inputs of up to 510 tokens in a single pass
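Because a single pass is capped at 510 tokens, longer documents have to be split before synthesis. A minimal sketch of one way to do that is shown here, packing whole sentences greedily into chunks. It assumes whitespace-separated words as a rough stand-in for tokens. The real tokenizer will count differently, and `count_tokens` and `chunk_text` are hypothetical names, not part of Kokorotts.

```python
import re
from typing import List

MAX_TOKENS = 510  # per-pass limit quoted in the feature list


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_text(text: str, max_tokens: int = MAX_TOKENS) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than the limit is split on word boundaries.
        pieces = [
            " ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)
        ]
        for piece in pieces:
            size = count_tokens(piece)
            if current and used + size > max_tokens:
                chunks.append(" ".join(current))
                current, used = [], 0
            current.append(piece)
            used += size
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five. Six seven eight nine."
print(chunk_text(sample, max_tokens=5))
```

Splitting on sentence boundaries rather than at a fixed word count keeps each synthesized segment ending at a natural pause, which tends to matter when the audio segments are concatenated afterwards.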
Target Audience
- Developers who need robust, local, open-source TTS for integration, with no API costs.
- Businesses seeking efficient, scalable TTS for commercial use or content production.
- Content creators transforming books and texts into accessible audio formats.
- Researchers and academics looking for a high-performance open-source model for AI voice research.
- Organisations that require privacy-driven or offline TTS in sensitive or connectivity-limited environments.
Pricing
- BASE: 200 credits/month for $9.90/month
- PREMIUM: 400 credits/month for $15.90/month
- PRO: 600 credits/month for $19.90/month
Kokorotts offers a free online experience and flexible subscription plans. You may change or cancel your plan at any time by contacting support. Subscriptions auto-renew unless cancelled at least 24 hours before the end of the current period. All fees are non-refundable unless otherwise required by law or policy. Access continues for as long as the subscription is active and paid on time.
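To compare the three plans, it can help to work out the price per credit. The short sketch below does only that arithmetic on the figures listed above. The page does not say what one credit buys, so the result is a relative comparison rather than a cost per minute of audio. `PLANS` and `cost_per_credit` are names made up for this example.

```python
from decimal import Decimal

# (credits per month, price per month in USD), as listed on this page.
PLANS = {
    "BASE": (200, Decimal("9.90")),
    "PREMIUM": (400, Decimal("15.90")),
    "PRO": (600, Decimal("19.90")),
}


def cost_per_credit(plan: str) -> Decimal:
    """Return the monthly price divided by the monthly credit allowance."""
    credits, price = PLANS[plan]
    return price / Decimal(credits)


for name in PLANS:
    # Show the per-credit price rounded to four decimal places.
    print(f"{name}: ${cost_per_credit(name):.4f} per credit")

cheapest = min(PLANS, key=cost_per_credit)
print("Lowest price per credit:", cheapest)
```

`Decimal` is used instead of `float` so that prices such as 9.90 are represented exactly.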
FAQs
What is Kokoro TTS?
Kokoro TTS is a text-to-speech model with just 82 million parameters that delivers high-quality, natural-sounding audio, outperforming larger models in both efficiency and output quality.
How does Kokoro TTS compare to larger models?
Kokoro TTS surpasses models like XTTS (467M) and MetaVoice (1.2B) in clarity and precision, thanks to its efficient architecture and quality training data.
Is Kokoro TTS free to use?
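Yes. The Kokoro TTS model is open-source and licensed under Apache 2.0, so it is free for both commercial and personal use. The hosted Kokorotts service offers a free online experience alongside the paid plans listed under Pricing.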
What voice options are available?
A variety of voice packs is available, including British and American English voices such as Bella, Sarah, and Adam, so you can customize the output.
Can I use Kokoro TTS for multilingual applications?
Kokoro TTS is currently optimized for English, but its architecture supports future multilingual expansion (including Spanish and French).
What makes Kokoro TTS unique?
Kokoro TTS stands out for its small model size, open-source licensing, and unmatched performance, redefining TTS scalability with minimal computational requirements.