Open Source · Apache 2.0

VEXYL-STT
Indian Language
Speech‑to‑Text

Self-hosted transcription server for 14 Indian languages.
WebSocket streaming + Batch REST API. Zero API costs. Full data sovereignty.

600M Parameters
14 Languages
~150ms Inference Latency
$0 API Cost

Two modes,
one port.

A single container on port 8080 serves both real-time streaming via WebSocket and async batch transcription via REST — designed for every workflow.

🎙️
WebSocket Streaming
Real-time transcription with energy-based VAD. Stream 16kHz 16-bit mono PCM audio and receive JSON transcripts the moment speech is detected — no external VAD dependency required.
Energy-based Voice Activity Detection
Sub-300ms end-to-end latency on CPU
Up to 50 concurrent WebSocket connections
Auto-flush on silence (0.6s threshold)
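The energy-based detection with a 0.6 s silence flush can be sketched in a few lines. The 20 ms frame size, the RMS threshold, and the `frameEnergy`/`shouldFlush` helper names below are illustrative assumptions, not the server's actual implementation:

```javascript
// Sketch of energy-based VAD over 16 kHz 16-bit mono PCM.
// Frame size and RMS threshold are illustrative values.
const SAMPLE_RATE = 16000;
const FRAME_MS = 20;                                      // 20 ms analysis frames
const SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS / 1000;  // 320 samples
const BYTES_PER_FRAME = SAMPLES_PER_FRAME * 2;            // 640 bytes (16-bit)
const ENERGY_THRESHOLD = 500;                             // assumed RMS threshold
const SILENCE_FLUSH_MS = 600;                             // matches the 0.6 s auto-flush

// RMS energy of one PCM frame (Buffer of little-endian int16 samples).
function frameEnergy(frame) {
  let sumSq = 0;
  for (let i = 0; i < frame.length; i += 2) {
    const s = frame.readInt16LE(i);
    sumSq += s * s;
  }
  return Math.sqrt(sumSq / (frame.length / 2));
}

// True once accumulated silent frames cross the flush threshold.
function shouldFlush(silentFrames) {
  return silentFrames * FRAME_MS >= SILENCE_FLUSH_MS;
}
```

A frame is classified as speech when `frameEnergy(frame) > ENERGY_THRESHOLD`; once enough consecutive silent frames accumulate, the buffered speech is flushed to the model.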
📦
Batch REST API
Submit an audio file and poll for results. Cold starts are handled gracefully: the job queues instantly while the model loads in the background, so clients never block waiting for startup.
WAV, MP3, FLAC, OGG, M4A support
Up to 25 MB / 5-minute audio files
1,000 concurrent pending jobs
Auto-cleanup after 1-hour TTL
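A client-side preflight that mirrors these limits lets obviously oversized or unsupported files fail fast before upload. The `preflight` helper below is hypothetical, and it assumes the 25 MB cap means binary megabytes:

```javascript
// Hypothetical client-side pre-checks mirroring the documented batch limits.
const MAX_BYTES = 25 * 1024 * 1024;  // assumed 25 MB cap (binary megabytes)
const FORMATS = new Set(['wav', 'mp3', 'flac', 'ogg', 'm4a']);

function preflight(filename, sizeBytes) {
  const ext = filename.split('.').pop().toLowerCase();
  if (!FORMATS.has(ext)) return { ok: false, reason: 'unsupported format' };
  if (sizeBytes > MAX_BYTES) return { ok: false, reason: 'file exceeds 25 MB' };
  return { ok: true };
}
```

Duration can't be checked from the filename alone, so the 5-minute cap is still enforced server-side with an HTTP 400.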
// WebSocket session lifecycle
SERVER {"type":"ready","model":"indic-conformer-600m-multilingual"}
CLIENT {"type":"start","lang":"ml-IN","session_id":"abc123"}
CLIENT [binary PCM: 16kHz · 16-bit · mono] // stream audio chunks
SERVER {"type":"final","text":"നമസ്കാരം","lang":"ml-IN","latency_ms":148}
CLIENT {"type":"stop"} // flush remaining audio
SERVER {"type":"stopped"}
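Producing the binary PCM frames in the lifecycle above amounts to slicing the capture buffer into fixed-duration chunks. The 100 ms chunk size and the `pcmChunks` helper here are illustrative choices, not a server requirement:

```javascript
// Slice a 16 kHz 16-bit mono capture buffer into fixed-duration PCM chunks
// for streaming over the WebSocket. 100 ms per chunk is an assumed value.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const CHUNK_MS = 100;
const CHUNK_BYTES = (SAMPLE_RATE * CHUNK_MS / 1000) * BYTES_PER_SAMPLE; // 3200 bytes

function* pcmChunks(buffer) {
  for (let off = 0; off < buffer.length; off += CHUNK_BYTES) {
    yield buffer.subarray(off, Math.min(off + CHUNK_BYTES, buffer.length));
  }
}
```

In a client, each chunk would be sent as a binary WebSocket frame, e.g. `for (const chunk of pcmChunks(capture)) ws.send(chunk);`.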

Built for production,
open by design.

01 🔒
Full Data Sovereignty
Audio never leaves your infrastructure. Deploy on-premise or in your own cloud account. No third-party API calls, no data sharing, no per-minute billing.
02 🧠
AI4Bharat Model
Powered by the indic-conformer-600m-multilingual model — 600M parameters fine-tuned on Indian language corpora, supporting CTC and RNNT decoding.
03 ☁️
Scale to Zero
Designed for Google Cloud Run with session affinity, CPU boost, and a batch API that absorbs cold starts. Pay $0 when idle. Scale to 250 connections across 5 instances.
04 🛡️
API Key Auth
Timing-safe shared-secret authentication on every endpoint. The /health endpoint is always exempt for Cloud Run probes. Backwards-compatible: when no key is configured, authentication is skipped.
05 🔌
Voice Gateway Integration
Plug-and-play with the VEXYL AI Voice Gateway. Replace or supplement cloud STT providers (Sarvam, Deepgram, Groq) with zero code changes using the drop-in client library.
06 🐳
Docker-First
The ~3.5 GB image bakes the model at build time — no 2.4 GB download on every cold start. One-command Cloud Build deployment. ffmpeg included for MP3/M4A support.

14 Indian languages,
one model.

From Hindi to Malayalam, Telugu to Sanskrit — a single 600M parameter model handles the full breadth of India's linguistic landscape.

हिं Hindi hi-IN
മല Malayalam ml-IN
தமி Tamil ta-IN
తెలు Telugu te-IN
ಕನ್ Kannada kn-IN
বাং Bengali bn-IN
ગુ Gujarati gu-IN
मरा Marathi mr-IN
ਪੰਜ Punjabi pa-IN
ଓଡ଼ Odia or-IN
অস Assamese as-IN
اردو Urdu ur-IN
संस् Sanskrit sa-IN
नेपा Nepali ne-IN

Three commands
to production.

Local setup, Docker, or Cloud Run: pick your environment. The deploy script handles API enablement, Artifact Registry, Cloud Build, and deployment automatically.

☁️ Cloud Run
Serverless — Scale to Zero
deploy.sh
$ export GCP_PROJECT_ID=my-project
$ export HF_TOKEN=hf_xxxx
$ ./deploy.sh

→ Enabling GCP APIs...
→ Building via Cloud Build (~15 min)...
→ Deploying to asia-south1...

✓ Service URL: https://vexyl-stt-xxx.run.app
✓ WebSocket:   wss://vexyl-stt-xxx.run.app
💻 Local / Self-Hosted
On-Premise Setup
bash
# One-command setup
$ ./setup.sh
  Downloads model, sets up venv...

# Start the server
$ ./run.sh
✓ Listening on ws://127.0.0.1:8091

# Health check
$ curl http://localhost:8091/health
{"status":"ok","model":"indic-conformer..."}
🐳 Docker
Containerised Deployment
docker
# Build (bakes model at build time)
$ docker build \
    --build-arg HF_TOKEN=$HF_TOKEN \
    -t vexyl-stt .

# Run with API key
$ docker run -p 8080:8080 \
    -e VEXYL_STT_API_KEY=secret \
    vexyl-stt
⚙️ Node.js Client
Voice Gateway Integration
vexyl-stt-client.js
// .env
VEXYL_STT_URL=wss://vexyl-stt-xxx.run.app
VEXYL_STT_API_KEY=your-secret
STT_PROVIDER=vexyl-stt

// Usage
const stt = new VexylSTT('ml-IN');
stt.onTranscript = text => console.log(text);
await stt.connect();
stt.sendAudio(pcmBuffer);

Batch API

Submit audio files and poll for results. CORS-enabled, API key protected, file-format agnostic.

Batch Transcription — Submit → Poll → Result
# 1. Submit a job
$ curl -X POST https://vexyl-stt-xxx.run.app/batch/transcribe \
     -H "X-API-Key: your-secret" \
     -F "file=@recording.wav" \
     -F "language_code=hi-IN"

{"job_id":"batch_a1b2c3d4e5f6","status":"queued","language":"hi-IN","audio_duration":4.52}

# 2. Poll for completion
$ curl https://vexyl-stt-xxx.run.app/batch/status/batch_a1b2c3d4e5f6 \
     -H "X-API-Key: your-secret"

{"job_id":"batch_a1b2c3d4e5f6","status":"completed","transcript":"नमस्ते दुनिया","latency_ms":320}

# Health check (no auth required)
$ curl https://vexyl-stt-xxx.run.app/health
{"status":"ok","active_sessions":0,"batch_jobs_queued":0,"uptime_seconds":42.3}
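The poll step above can be sketched as a loop with backoff. `pollJob` and its linear backoff schedule are hypothetical; the status fetcher is injected so the loop works with any HTTP client (in practice it would wrap `fetch()` with the X-API-Key header):

```javascript
// Poll a batch job until it reaches a terminal status, with linear backoff.
// `getStatus` is an injected async function returning the parsed status JSON.
async function pollJob(getStatus, { intervalMs = 500, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getStatus();
    if (job.status === 'completed' || job.status === 'failed') return job;
    await new Promise(r => setTimeout(r, intervalMs * (attempt + 1))); // back off
  }
  throw new Error('polling timed out');
}
```

Example wiring: `pollJob(() => fetch(statusUrl, { headers: { 'X-API-Key': key } }).then(r => r.json()))`.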
Limit | Value | Notes
Max file size | 25 MB | HTTP 413 returned if exceeded
Max audio duration | 5 minutes | HTTP 400 with duration error
Max pending jobs | 1,000 | HTTP 429 when queue is full
Job result TTL | 1 hour | Cleaned up every 5 minutes
Supported formats | WAV · MP3 · FLAC · OGG · M4A | ffmpeg fallback for MP3/M4A

~$0.0006 per request.
$0 when idle.

Cloud Run bills per-second. With --min-instances=0, you pay nothing when there's no traffic. The free tier covers most light usage entirely.

Usage | Requests / Month | Estimated Cost
Light — Testing / Dev | ~100 | ~$0.06
Medium — Internal Tool | ~1,000 | ~$0.60
Heavy — Production | ~10,000 | ~$6.00
Always-Warm (min-instances=1) | Any | ~$50–70 / month
GCP Free Tier | First 180K vCPU-sec + 360K GiB-sec | FREE

Start transcribing
in three commands.

Apache 2.0 licensed. Self-host it, fork it, integrate it into your stack. Contributions welcome.

Apache 2.0 Python 3.10+ 14 Languages WebSocket + REST Docker Ready Cloud Run Ready