Open Source · Apache 2.0

VEXYL-TTS
Indian Language
Text‑to‑Speech

Self-hosted synthesis server for 22 Indian languages.
WebSocket streaming + Batch REST API. Zero API costs. Full data sovereignty.

22 Languages
44+ Speakers
Sub-200ms Inference Latency
$0 API Cost

Two modes,
one port.

A single container on port 8080 serves both real-time streaming via WebSocket and async batch synthesis via REST — designed for every workflow.

WebSocket Streaming
Real-time text-to-speech. Send JSON text requests and receive base64-encoded WAV audio chunks as soon as they are synthesized.
Real-time chunked synthesis
Sub-200ms first-byte latency
Up to 50 concurrent WebSocket connections
In-memory LRU cache for repeated phrases
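The in-memory LRU cache mentioned above could be sketched like this. This is illustrative only, not the server's actual internals — the class name, the `(text, lang, style)` cache key, and the entry limit are all assumptions:

```python
from collections import OrderedDict


class PhraseCache:
    """Tiny in-memory LRU for repeated phrases, keyed on (text, lang, style)."""

    def __init__(self, max_entries: int = 256):
        self.max_entries = max_entries
        self._store: "OrderedDict[tuple, bytes]" = OrderedDict()

    def get(self, text: str, lang: str, style: str = "default"):
        key = (text, lang, style)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        return None

    def put(self, text: str, lang: str, style: str, wav: bytes) -> None:
        key = (text, lang, style)
        self._store[key] = wav
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

On a cache hit the server can skip synthesis entirely, which is why repeated phrases return well under the usual first-byte latency.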
📦
Batch REST API
Submit text and poll for the result. Handles cloud cold starts gracefully — the job queues instantly while the model loads in the background, so clients never block.
Up to 5,000 characters per request
22 languages and 44+ pre-built voices
1,000 concurrent pending jobs
Auto-cleanup after 1-hour TTL
// WebSocket session lifecycle
SERVER {"type":"ready","model":"indic-parler-tts","sample_rate":22050}
CLIENT {"type":"synthesize","text":"നമസ്കാരം","lang":"ml-IN","style":"default","request_id":"abc123"}
SERVER {"type":"audio","request_id":"abc123","audio_b64":"...","latency_ms":2400}
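The client side of the lifecycle above can be sketched in Python. These helpers only build and decode the JSON messages shown; the WebSocket transport itself (e.g. the `websockets` package) and request-ID generation are left to the caller, and the truncated `request_id` format here is an assumption:

```python
import base64
import json
import uuid


def build_synthesize_request(text: str, lang: str, style: str = "default") -> str:
    """Build the JSON 'synthesize' message from the session lifecycle above."""
    return json.dumps({
        "type": "synthesize",
        "text": text,
        "lang": lang,
        "style": style,
        "request_id": uuid.uuid4().hex[:12],  # illustrative ID scheme
    })


def decode_audio_message(raw: str) -> bytes:
    """Extract raw WAV bytes from a server 'audio' message."""
    msg = json.loads(raw)
    if msg.get("type") != "audio":
        raise ValueError(f"unexpected message type: {msg.get('type')}")
    return base64.b64decode(msg["audio_b64"])
```

A real client would wait for the server's `ready` message, send the request, then feed each decoded chunk to an audio player as it arrives.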

Built for production,
open by design.

01 🔒
Full Data Sovereignty
Text never leaves your infrastructure. Deploy on-premise or in your own cloud account. No third-party API calls, no data sharing, no API costs.
02 🧠
ai4bharat Model
Powered by the indic-parler-tts model — fine-tuned on Indian language corpora, supporting 44+ pre-built voices and emotion control.
03 ☁️
Scale to Zero
Designed for Google Cloud Run with session affinity, CPU boost, and a batch API that absorbs cold starts. Pay $0 when idle. Scale to 250 connections across 5 instances.
04 🛡️
API Key Auth
Timing-safe shared-secret authentication on every endpoint. The /health endpoint is always exempt for Cloud Run probes. Backwards-compatible when key is unset.
05 🔌
Voice Gateway Integration
Plug-and-play with the VEXYL AI Voice Gateway. Replace or supplement cloud TTS providers (ElevenLabs, OpenAI, Deepgram) with zero code changes using the drop-in client library.
06 🐳
Docker-First
The ~6.0 GB image bakes the model at build time — no large download on every cold start. One-command Cloud Build deployment. ffmpeg included for diverse audio formats.

22 Indian languages,
one model.

From Hindi to Malayalam, Telugu to Sanskrit — a single model handles the full breadth of India's linguistic landscape.

हिं Hindi hi-IN
മല Malayalam ml-IN
தமி Tamil ta-IN
తెలు Telugu te-IN
ಕನ್ Kannada kn-IN
বাং Bengali bn-IN
ગુ Gujarati gu-IN
मरा Marathi mr-IN
ਪੰਜ Punjabi pa-IN
ଓଡ଼ Odia or-IN
অস Assamese as-IN
اردو Urdu ur-IN
संस् Sanskrit sa-IN
नेपा Nepali ne-IN
बो Bodo brx-IN
डोग Dogri doi-IN
En English en-IN
कों Konkani kok-IN
मै Maithili mai-IN
মৈ Manipuri mni-IN
ᱥᱟ Santali sat-IN
سن Sindhi sd-IN

Three commands
to production.

Local setup, Docker, or Cloud Run — pick your environment. The deploy script enables the required GCP APIs, sets up Artifact Registry, runs Cloud Build, and deploys automatically.

☁️ Cloud Run
Serverless — Scale to Zero
deploy.sh
$ export GCP_PROJECT_ID=my-project
$ export HF_TOKEN=hf_xxxx
$ ./deploy.sh

→ Enabling GCP APIs...
→ Building via Cloud Build (~15 min)...
→ Deploying to asia-south1...

✓ Service URL: https://vexyl-tts-xxx.run.app
✓ WebSocket:   wss://vexyl-tts-xxx.run.app
💻 Local / Self-Hosted
On-Premise Setup
bash
# One-command setup
$ ./setup.sh
  Downloads model, sets up venv...

# Start the server
$ ./run.sh
✓ Listening on ws://127.0.0.1:8092

# Health check
$ curl http://localhost:8092/health
{"status":"ok","model":"indic-parler-tts"}
🐳 Docker
Containerised Deployment
docker
# Build (bakes model at build time)
$ docker build \
    --build-arg HF_TOKEN=$HF_TOKEN \
    -t vexyl-tts .

# Run with API key
$ docker run -p 8080:8080 \
    -e VEXYL_TTS_API_KEY=secret \
    vexyl-tts
⚙️ Node.js Client
Voice Gateway Integration
vexyl-tts-client.js
// .env
VEXYL_TTS_URL=wss://vexyl-tts-xxx.run.app
VEXYL_TTS_API_KEY=your-secret
TTS_PROVIDER=vexyl-tts

// Usage
const tts = new VexylTTS('ml-IN');
await tts.connect();
tts.synthesize('നമസ്കാരം', audioChunk => play(audioChunk));

Batch API

Submit text and poll for results. CORS-enabled and API-key protected.

Batch Synthesis — Submit → Poll → Result
# 1. Submit a job
$ curl -X POST https://vexyl-tts-xxx.run.app/batch/synthesize \
     -H "X-API-Key: your-secret" \
     -H "Content-Type: application/json" \
     -d '{"text":"नमस्ते दुनिया","lang":"hi-IN"}'

{"job_id":"batch_a1b2c3d4e5f6","status":"queued","language":"hi-IN","text_length":14}

# 2. Poll for completion
$ curl https://vexyl-tts-xxx.run.app/batch/status/batch_a1b2c3d4e5f6 \
     -H "X-API-Key: your-secret"

{"job_id":"batch_a1b2c3d4e5f6","status":"completed","audio_b64":"...","latency_ms":2400}

# Health check (no auth required)
$ curl https://vexyl-tts-xxx.run.app/health
{"status":"ok","active_connections":0,"batch_jobs_queued":0,"uptime_seconds":42.3}
| Limit | Value | Notes |
|---|---|---|
| Max text length | 5,000 characters | HTTP 400 returned if exceeded |
| Max pending jobs | 1,000 | HTTP 429 when queue is full |
| Job result TTL | 1 hour | Cleaned up every 5 minutes |
| Voice styles | default · warm · formal | Controls the speaker selection |
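The submit-then-poll flow shown in the curl examples can be wrapped in a small helper. A sketch of the polling side only: `fetch_status` is an injected callable (e.g. a `requests.get` against `/batch/status/<job_id>` with the `X-API-Key` header) so the loop stays independent of any HTTP client, and the terminal status names are assumptions based on the responses above:

```python
import time
from typing import Callable


def poll_batch_job(job_id: str,
                   fetch_status: Callable[[str], dict],
                   interval_s: float = 1.0,
                   timeout_s: float = 120.0) -> dict:
    """Poll the batch status endpoint until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval_s)  # job still queued or processing
    raise TimeoutError(f"job {job_id} still pending after {timeout_s}s")
```

A generous timeout matters here: on a scale-to-zero deployment the first job after idle also absorbs the model's cold-start load time.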

~$0.0006 per request.
$0 when idle.

Cloud Run bills per-second. With --min-instances=0, you pay nothing when there's no traffic. The free tier covers most light usage entirely.

| Usage | Requests / Month | Estimated Cost |
|---|---|---|
| Light — Testing / Dev | ~100 | ~$0.06 |
| Medium — Internal Tool | ~1,000 | ~$0.60 |
| Heavy — Production | ~10,000 | ~$6.00 |
| Always-Warm (min-instances=1) | Any | ~$50–70 / month |
| GCP Free Tier | First 180K vCPU-sec + 360K GiB-sec | FREE |
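The per-usage figures follow from a flat per-request estimate. A sketch, assuming the ~$0.0006-per-request figure quoted above and ignoring the free tier (actual Cloud Run billing is per vCPU-second and GiB-second, not per request):

```python
COST_PER_REQUEST_USD = 0.0006  # rough estimate quoted above


def estimated_monthly_cost(requests_per_month: int) -> float:
    """Flat linear cost estimate; ignores the GCP free tier."""
    return round(requests_per_month * COST_PER_REQUEST_USD, 2)
```

In practice the free tier absorbs the light and medium tiers entirely, so this is an upper bound for low-volume use.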

Start synthesizing
in three commands.

Apache 2.0 licensed. Self-host it, fork it, integrate it into your stack. Contributions welcome.

Apache 2.0 Python 3.10+ 22 Languages WebSocket + REST Docker Ready Cloud Run Ready