Open Source · Apache 2.0

VEXYL-STT
Indian Language
Speech‑to‑Text

Self-hosted transcription server for 14 Indian languages.
WebSocket streaming + Batch REST API. Zero API costs. Full data sovereignty.

600M Parameters
14 Languages
~150ms Inference Latency
$0 API Cost

Two modes,
one port.

A single container on port 8080 serves both real-time streaming via WebSocket and async batch transcription via REST — designed for every workflow.

🎙️
WebSocket Streaming
Real-time transcription with energy-based VAD. Stream 16kHz 16-bit mono PCM audio and receive JSON transcripts the moment speech is detected — no external VAD dependency required.
Energy-based Voice Activity Detection
Sub-300ms end-to-end latency on CPU
Up to 50 concurrent WebSocket connections
Auto-flush on silence (0.6s threshold)
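The energy-based detection with a 0.6 s silence flush can be sketched in a few lines. The 20 ms frame size, the RMS threshold, and the `frameEnergy`/`shouldFlush` helper names below are illustrative assumptions, not the server's actual implementation:

```javascript
// Sketch of energy-based VAD over 16 kHz 16-bit mono PCM.
// Frame size and RMS threshold are illustrative values.
const SAMPLE_RATE = 16000;
const FRAME_MS = 20;                                      // 20 ms analysis frames
const SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS / 1000;  // 320 samples
const BYTES_PER_FRAME = SAMPLES_PER_FRAME * 2;            // 640 bytes (16-bit)
const ENERGY_THRESHOLD = 500;                             // assumed RMS threshold
const SILENCE_FLUSH_MS = 600;                             // matches the 0.6 s auto-flush

// RMS energy of one PCM frame (Buffer of little-endian int16 samples).
function frameEnergy(frame) {
  let sumSq = 0;
  for (let i = 0; i < frame.length; i += 2) {
    const s = frame.readInt16LE(i);
    sumSq += s * s;
  }
  return Math.sqrt(sumSq / (frame.length / 2));
}

// True once accumulated silent frames cross the flush threshold.
function shouldFlush(silentFrames) {
  return silentFrames * FRAME_MS >= SILENCE_FLUSH_MS;
}
```

A frame is classified as speech when `frameEnergy(frame) > ENERGY_THRESHOLD`; once enough consecutive silent frames accumulate, the buffered speech is flushed to the model.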
📦
Batch REST API
Submit an audio file and poll for results. Cold starts are handled gracefully: the job queues instantly while the model loads in the background, so clients never block waiting for startup.
WAV, MP3, FLAC, OGG, M4A support
Up to 25 MB / 5-minute audio files
1,000 concurrent pending jobs
Auto-cleanup after 1-hour TTL
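A client-side preflight that mirrors these limits lets obviously oversized or unsupported files fail fast before upload. The `preflight` helper below is hypothetical, and it assumes the 25 MB cap means binary megabytes:

```javascript
// Hypothetical client-side pre-checks mirroring the documented batch limits.
const MAX_BYTES = 25 * 1024 * 1024;  // assumed 25 MB cap (binary megabytes)
const FORMATS = new Set(['wav', 'mp3', 'flac', 'ogg', 'm4a']);

function preflight(filename, sizeBytes) {
  const ext = filename.split('.').pop().toLowerCase();
  if (!FORMATS.has(ext)) return { ok: false, reason: 'unsupported format' };
  if (sizeBytes > MAX_BYTES) return { ok: false, reason: 'file exceeds 25 MB' };
  return { ok: true };
}
```

Duration can't be checked from the filename alone, so the 5-minute cap is still enforced server-side with an HTTP 400.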
// WebSocket session lifecycle
SERVER {"type":"ready","model":"indic-conformer-600m-multilingual"}
CLIENT {"type":"start","lang":"ml-IN","session_id":"abc123"}
CLIENT [binary PCM: 16kHz · 16-bit · mono] // stream audio chunks
SERVER {"type":"final","text":"നമസ്കാരം","lang":"ml-IN","latency_ms":148}
CLIENT {"type":"stop"} // flush remaining audio
SERVER {"type":"stopped"}
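Producing the binary PCM frames in the lifecycle above amounts to slicing the capture buffer into fixed-duration chunks. The 100 ms chunk size and the `pcmChunks` helper here are illustrative choices, not a server requirement:

```javascript
// Slice a 16 kHz 16-bit mono capture buffer into fixed-duration PCM chunks
// for streaming over the WebSocket. 100 ms per chunk is an assumed value.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const CHUNK_MS = 100;
const CHUNK_BYTES = (SAMPLE_RATE * CHUNK_MS / 1000) * BYTES_PER_SAMPLE; // 3200 bytes

function* pcmChunks(buffer) {
  for (let off = 0; off < buffer.length; off += CHUNK_BYTES) {
    yield buffer.subarray(off, Math.min(off + CHUNK_BYTES, buffer.length));
  }
}
```

In a client, each chunk would be sent as a binary WebSocket frame, e.g. `for (const chunk of pcmChunks(capture)) ws.send(chunk);`.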

Built for production,
open by design.

01 🔒
Full Data Sovereignty
Audio never leaves your infrastructure. Deploy on-premise or in your own cloud account. No third-party API calls, no data sharing, no per-minute billing.
02 🧠
AI4Bharat Model
Powered by the indic-conformer-600m-multilingual model — 600M parameters fine-tuned on Indian language corpora, supporting CTC and RNNT decoding.
03 ☁️
Scale to Zero
Designed for Google Cloud Run with session affinity, CPU boost, and a batch API that absorbs cold starts. Pay $0 when idle. Scale to 250 connections across 5 instances.
04 🛡️
API Key Auth
Timing-safe shared-secret authentication on every endpoint. The /health endpoint is always exempt for Cloud Run probes. Backwards-compatible: when no key is configured, authentication is skipped.
05 🔌
Voice Gateway Integration
Plug-and-play with the VEXYL AI Voice Gateway. Replace or supplement cloud STT providers (Sarvam, Deepgram, Groq) with zero code changes using the drop-in client library.
06 🐳
Docker-First
The ~3.5 GB image bakes the model at build time — no 2.4 GB download on every cold start. One-command Cloud Build deployment. ffmpeg included for MP3/M4A support.

14 Indian languages,
one model.

From Hindi to Malayalam, Telugu to Sanskrit — a single 600M parameter model handles the full breadth of India's linguistic landscape.

हिं Hindi hi-IN
മല Malayalam ml-IN
தமி Tamil ta-IN
తెలు Telugu te-IN
ಕನ್ Kannada kn-IN
বাং Bengali bn-IN
ગુ Gujarati gu-IN
मरा Marathi mr-IN
ਪੰਜ Punjabi pa-IN
ଓଡ଼ Odia or-IN
অস Assamese as-IN
اردو Urdu ur-IN
संस् Sanskrit sa-IN
नेपा Nepali ne-IN

Three commands
to production.

Local setup, Docker, or Cloud Run: pick your environment. The deploy script handles API enablement, Artifact Registry, Cloud Build, and deployment automatically.

☁️ Cloud Run
Serverless — Scale to Zero
deploy.sh
$ export GCP_PROJECT_ID=my-project
$ export HF_TOKEN=hf_xxxx
$ ./deploy.sh

→ Enabling GCP APIs...
→ Building via Cloud Build (~15 min)...
→ Deploying to asia-south1...

✓ Service URL: https://vexyl-stt-xxx.run.app
✓ WebSocket:   wss://vexyl-stt-xxx.run.app
💻 Local / Self-Hosted
On-Premise Setup
bash
# One-command setup
$ ./setup.sh
  Downloads model, sets up venv...

# Start the server
$ ./run.sh
✓ Listening on ws://127.0.0.1:8091

# Health check
$ curl http://localhost:8091/health
{"status":"ok","model":"indic-conformer..."}
🐳 Docker
Containerised Deployment
docker
# Build (bakes model at build time)
$ docker build \
    --build-arg HF_TOKEN=$HF_TOKEN \
    -t vexyl-stt .

# Run with API key
$ docker run -p 8080:8080 \
    -e VEXYL_STT_API_KEY=secret \
    vexyl-stt
⚙️ Node.js Client
Voice Gateway Integration
vexyl-stt-client.js
// .env
VEXYL_STT_URL=wss://vexyl-stt-xxx.run.app
VEXYL_STT_API_KEY=your-secret
STT_PROVIDER=vexyl-stt

// Usage
const stt = new VexylSTT('ml-IN');
stt.onTranscript = text => console.log(text);
await stt.connect();
stt.sendAudio(pcmBuffer);

Batch API

Submit audio files and poll for results. CORS-enabled, API key protected, file-format agnostic.

Batch Transcription — Submit → Poll → Result
# 1. Submit a job
$ curl -X POST https://vexyl-stt-xxx.run.app/batch/transcribe \
     -H "X-API-Key: your-secret" \
     -F "file=@recording.wav" \
     -F "language_code=hi-IN"

{"job_id":"batch_a1b2c3d4e5f6","status":"queued","language":"hi-IN","audio_duration":4.52}

# 2. Poll for completion
$ curl https://vexyl-stt-xxx.run.app/batch/status/batch_a1b2c3d4e5f6 \
     -H "X-API-Key: your-secret"

{"job_id":"batch_a1b2c3d4e5f6","status":"completed","transcript":"नमस्ते दुनिया","latency_ms":320}

# Health check (no auth required)
$ curl https://vexyl-stt-xxx.run.app/health
{"status":"ok","active_sessions":0,"batch_jobs_queued":0,"uptime_seconds":42.3}
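The poll step above can be sketched as a loop with backoff. `pollJob` and its linear backoff schedule are hypothetical; the status fetcher is injected so the loop works with any HTTP client (in practice it would wrap `fetch()` with the X-API-Key header):

```javascript
// Poll a batch job until it reaches a terminal status, with linear backoff.
// `getStatus` is an injected async function returning the parsed status JSON.
async function pollJob(getStatus, { intervalMs = 500, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getStatus();
    if (job.status === 'completed' || job.status === 'failed') return job;
    await new Promise(r => setTimeout(r, intervalMs * (attempt + 1))); // back off
  }
  throw new Error('polling timed out');
}
```

Example wiring: `pollJob(() => fetch(statusUrl, { headers: { 'X-API-Key': key } }).then(r => r.json()))`.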
Limit | Value | Notes
Max file size | 25 MB | HTTP 413 returned if exceeded
Max audio duration | 5 minutes | HTTP 400 with duration error
Max pending jobs | 1,000 | HTTP 429 when queue is full
Job result TTL | 1 hour | Cleaned up every 5 minutes
Supported formats | WAV · MP3 · FLAC · OGG · M4A | ffmpeg fallback for MP3/M4A

~$0.0006 per request.
$0 when idle.

Cloud Run bills per-second. With --min-instances=0, you pay nothing when there's no traffic. The free tier covers most light usage entirely.

Usage | Requests / Month | Estimated Cost
Light — Testing / Dev | ~100 | ~$0.06
Medium — Internal Tool | ~1,000 | ~$0.60
Heavy — Production | ~10,000 | ~$6.00
Always-Warm (min-instances=1) | Any | ~$50–70 / month
GCP Free Tier | First 180K vCPU-sec + 360K GiB-sec | FREE

Start transcribing
in three commands.

Apache 2.0 licensed. Self-host it, fork it, integrate it into your stack. Contributions welcome.

Apache 2.0 Python 3.10+ 14 Languages WebSocket + REST Docker Ready Cloud Run Ready