Self-hosted transcription server for 14 Indian languages.
WebSocket streaming + Batch REST API. Zero API costs. Full data sovereignty.
A single container on port 8080 serves both real-time streaming over WebSocket and asynchronous batch transcription over REST.
indic-conformer-600m-multilingual model: 600M parameters fine-tuned on Indian language corpora, supporting CTC and RNNT decoding.
The /health endpoint is always exempt from API key authentication so that Cloud Run probes keep working. Backwards-compatible when the key is unset.

From Hindi to Malayalam, Telugu to Sanskrit, a single 600M parameter model handles the full breadth of India's linguistic landscape.
Local setup, Docker, or Cloud Run: pick your environment. For Cloud Run, the deploy script enables the required GCP APIs and handles Artifact Registry, Cloud Build, and deployment automatically.
    $ export GCP_PROJECT_ID=my-project
    $ export HF_TOKEN=hf_xxxx
    $ ./deploy.sh
    → Enabling GCP APIs...
    → Building via Cloud Build (~15 min)...
    → Deploying to asia-south1...
    ✓ Service URL: https://vexyl-stt-xxx.run.app
    ✓ WebSocket: wss://vexyl-stt-xxx.run.app
    # One-command setup
    $ ./setup.sh
    Downloads model, sets up venv...

    # Start the server
    $ ./run.sh
    ✓ Listening on ws://127.0.0.1:8091

    # Health check
    $ curl http://localhost:8091/health
    {"status":"ok","model":"indic-conformer..."}
    # Build (bakes model at build time)
    $ docker build \
        --build-arg HF_TOKEN=$HF_TOKEN \
        -t vexyl-stt .

    # Run with API key
    $ docker run -p 8080:8080 \
        -e VEXYL_STT_API_KEY=secret \
        vexyl-stt
    // .env
    VEXYL_STT_URL=wss://vexyl-stt-xxx.run.app
    VEXYL_STT_API_KEY=your-secret
    STT_PROVIDER=vexyl-stt

    // Usage
    const stt = new VexylSTT('ml-IN');
    stt.onTranscript = text => console.log(text);
    await stt.connect();
    stt.sendAudio(pcmBuffer);
Submit audio files and poll for results. CORS-enabled, API key protected, file-format agnostic.
    # 1. Submit a job
    $ curl -X POST https://vexyl-stt-xxx.run.app/batch/transcribe \
        -H "X-API-Key: your-secret" \
        -F "file=@recording.wav" \
        -F "language_code=hi-IN"
    {"job_id":"batch_a1b2c3d4e5f6","status":"queued","language":"hi-IN","audio_duration":4.52}

    # 2. Poll for completion
    $ curl https://vexyl-stt-xxx.run.app/batch/status/batch_a1b2c3d4e5f6 \
        -H "X-API-Key: your-secret"
    {"job_id":"batch_a1b2c3d4e5f6","status":"completed","transcript":"नमस्ते दुनिया","latency_ms":320}

    # Health check (no auth required)
    $ curl https://vexyl-stt-xxx.run.app/health
    {"status":"ok","active_sessions":0,"batch_jobs_queued":0,"uptime_seconds":42.3}
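
The polling step in the curl session above can also be scripted. The following is a minimal sketch, not the project's own client: the `/batch/status/<job_id>` path, the `X-API-Key` header, and the `queued` / `completed` statuses come from the examples above, while the function name `waitForTranscript`, the `failed` status, the polling interval, and the attempt cap are assumptions made for illustration.

```javascript
// Sketch: poll the batch status endpoint until the job completes.
// Assumptions (not confirmed by the source): a "failed" status exists,
// a 1 s polling interval is acceptable, and 60 attempts is a sensible cap.
async function waitForTranscript(baseUrl, jobId, apiKey, options = {}) {
  const {
    intervalMs = 1000,  // assumed delay between polls
    maxAttempts = 60,   // assumed upper bound on polls
    fetchImpl = fetch,  // injectable for testing; global fetch needs Node 18+
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(`${baseUrl}/batch/status/${jobId}`, {
      headers: { "X-API-Key": apiKey },
    });
    if (!res.ok) {
      throw new Error(`status request failed with HTTP ${res.status}`);
    }
    const job = await res.json();
    if (job.status === "completed") return job.transcript;
    if (job.status === "failed") throw new Error(`job ${jobId} failed`);
    // Still queued or processing: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job ${jobId} did not complete after ${maxAttempts} polls`);
}
```

A caller would pass the service URL, the `job_id` returned by the submit request, and the API key, for example `await waitForTranscript("https://vexyl-stt-xxx.run.app", "batch_a1b2c3d4e5f6", "your-secret")`.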
| Limit | Value | Notes |
|---|---|---|
| Max file size | 25 MB | HTTP 413 returned if exceeded |
| Max audio duration | 5 minutes | HTTP 400 with duration error |
| Max pending jobs | 1,000 | HTTP 429 when queue is full |
| Job result TTL | 1 hour | Cleaned up every 5 minutes |
| Supported formats | WAV · MP3 · FLAC · OGG · M4A | ffmpeg fallback for MP3/M4A |
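
The limits in the table can be checked on the client before uploading, which avoids a round trip that would end in HTTP 413 or 400. This is a rough sketch under stated assumptions: `checkUpload` and `LIMITS` are hypothetical names, "25 MB" is read as 25 × 1024 × 1024 bytes, and the format check uses the file extension as a client-side heuristic (the server may inspect the content instead).

```javascript
// Client-side mirror of the documented batch limits.
// Assumption: "25 MB" means 25 * 1024 * 1024 bytes; the source does not say.
const LIMITS = {
  maxBytes: 25 * 1024 * 1024,
  maxDurationSec: 5 * 60,
  formats: ["wav", "mp3", "flac", "ogg", "m4a"],
};

// Returns a list of human-readable problems; an empty list means the
// upload is within the documented limits.
function checkUpload(fileName, sizeBytes, durationSec) {
  const problems = [];
  const ext = fileName.includes(".")
    ? fileName.split(".").pop().toLowerCase()
    : "";
  if (!LIMITS.formats.includes(ext)) {
    problems.push(`unsupported format: ${ext || "(none)"}`);
  }
  if (sizeBytes > LIMITS.maxBytes) {
    problems.push("file exceeds 25 MB (server would return HTTP 413)");
  }
  if (durationSec > LIMITS.maxDurationSec) {
    problems.push("audio longer than 5 minutes (server would return HTTP 400)");
  }
  return problems;
}
```

For instance, `checkUpload("recording.wav", 145000, 4.52)` returns an empty array, while a 6-minute file yields a single duration problem.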
Cloud Run bills per second. With --min-instances=0, you pay nothing when there is no traffic, and the free tier covers most light usage entirely.
| Usage | Requests / Month | Estimated Cost |
|---|---|---|
| Light — Testing / Dev | ~100 | ~$0.06 |
| Medium — Internal Tool | ~1,000 | ~$0.60 |
| Heavy — Production | ~10,000 | ~$6.00 |
| Always-Warm (min-instances=1) | Any | ~$50–70 / month |
| GCP Free Tier | First 180K vCPU-sec + 360K GiB-sec | FREE |
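
The three on-demand rows in the table all work out to about $0.0006 per request ($0.06 / 100). Treating that as a flat rate, the arithmetic can be sketched as below. The rate is derived from the estimates in the table, not from an official Cloud Run price, real cost depends on audio length and instance sizing, and the free tier is ignored; `estimateOnDemandCost` and `breakEvenRequests` are hypothetical helper names.

```javascript
// Per-request rate implied by the table rows ($0.06 for 100 requests).
// Derived from the estimates above, not an official Cloud Run price.
const COST_PER_REQUEST_USD = 0.06 / 100;

// Estimated on-demand cost in USD for one month, ignoring the free tier.
function estimateOnDemandCost(requestsPerMonth) {
  return requestsPerMonth * COST_PER_REQUEST_USD;
}

// Monthly request count at which on-demand spend matches a fixed
// always-warm bill (min-instances=1).
function breakEvenRequests(alwaysWarmUsdPerMonth) {
  return Math.round(alwaysWarmUsdPerMonth / COST_PER_REQUEST_USD);
}

console.log(estimateOnDemandCost(10000).toFixed(2));
console.log(breakEvenRequests(50), breakEvenRequests(70));
```

Under these assumptions, an always-warm instance at $50 to $70 a month only becomes the cheaper option somewhere around 83,000 to 117,000 requests a month, so its main benefit below that volume is avoiding cold starts rather than saving money.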
Apache 2.0 licensed. Self-host it, fork it, integrate it into your stack. Contributions welcome.