Speech-to-Text Providers

VEXYL supports several speech-to-text (STT) providers, so you can balance accuracy against latency for your language and use case.

Provider Comparison

Provider	Mode	Latency	Best For
Sarvam	Streaming	300-800ms	Indian Languages
Deepgram	Streaming	300-500ms	English, Speed
Groq	Batch	1-3s	Accuracy (Whisper)
OpenAI	Batch	2-5s	General Purpose
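
One way to read the table is as a lookup against a latency budget. The sketch below is illustrative only: the figures are the upper bounds from the table, while the `PROVIDERS` dictionary, the `providers_within` helper, and the lowercase name `openai` are assumptions made for this example and are not part of VEXYL.

```python
# Upper latency bounds (milliseconds) and modes copied from the comparison table.
# The dictionary layout and the "openai" identifier are assumptions for this sketch.
PROVIDERS = {
    "sarvam": {"mode": "streaming", "max_latency_ms": 800},
    "deepgram": {"mode": "streaming", "max_latency_ms": 500},
    "groq": {"mode": "batch", "max_latency_ms": 3000},
    "openai": {"mode": "batch", "max_latency_ms": 5000},
}


def providers_within(budget_ms: int) -> list[str]:
    """Return providers whose worst-case latency fits the budget, fastest first."""
    fits = [
        (spec["max_latency_ms"], name)
        for name, spec in PROVIDERS.items()
        if spec["max_latency_ms"] <= budget_ms
    ]
    return [name for _, name in sorted(fits)]


if __name__ == "__main__":
    # With a one-second budget only the two streaming providers qualify.
    print(providers_within(1000))
```

Latency is only one column of the table; language coverage ("Best For") should still drive the final choice.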

Configuration

To configure a provider, set its API key and the STT_PROVIDER value in your environment variables. Some providers also accept an optional model setting, shown below.

Sarvam (Recommended for India)

SARVAM_API_KEY=your_key
STT_PROVIDER=sarvam

Deepgram (Recommended for Speed)

DEEPGRAM_API_KEY=your_key
STT_PROVIDER=deepgram
DEEPGRAM_STT_MODEL=nova-2

Groq (Recommended for Accuracy)

GROQ_API_KEY=your_key
STT_PROVIDER=groq
GROQ_MODEL=whisper-large-v3-turbo
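
A startup check can catch a missing key before the first call. The following is a minimal sketch, not VEXYL's own validation: the `missing_settings` helper is hypothetical, it covers only the three providers configured above, and it assumes that the auto mode described under Auto-Selection needs both the Sarvam and Groq keys because it routes between those two.

```python
import os

# API key variable for each provider documented in this section.
REQUIRED_KEYS = {
    "sarvam": ["SARVAM_API_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "groq": ["GROQ_API_KEY"],
    # Assumption: auto routes between Sarvam and Groq, so both keys are needed.
    "auto": ["SARVAM_API_KEY", "GROQ_API_KEY"],
}


def missing_settings(env=os.environ) -> list[str]:
    """Return the names of settings that are absent or empty for the chosen provider."""
    provider = env.get("STT_PROVIDER", "")
    if provider not in REQUIRED_KEYS:
        # Unset, or a provider this sketch does not cover.
        return ["STT_PROVIDER"]
    return [key for key in REQUIRED_KEYS[provider] if not env.get(key)]


if __name__ == "__main__":
    problems = missing_settings()
    if problems:
        print("Missing or unrecognised settings:", ", ".join(problems))
    else:
        print("STT configuration looks complete.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.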

Auto-Selection

Set STT_PROVIDER=auto to let VEXYL choose a provider from the session language: Sarvam for Indian languages, Groq for all others.
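
The routing rule can be sketched in a few lines. This is an illustration of the rule as stated, not VEXYL's implementation: the `auto_select` function is hypothetical, and the set of language codes is an assumption, since this page does not list which languages count as Indian languages.

```python
# Assumed set of ISO 639-1 codes treated as Indian languages for this sketch
# (Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia).
INDIAN_LANGUAGE_CODES = {"hi", "bn", "ta", "te", "mr", "gu", "kn", "ml", "pa", "or"}


def auto_select(language: str) -> str:
    """Pick a provider name from a language tag such as 'hi-IN' or 'en_US'."""
    # Keep only the primary language subtag, ignoring region and case.
    base = language.lower().replace("_", "-").split("-")[0]
    return "sarvam" if base in INDIAN_LANGUAGE_CODES else "groq"


if __name__ == "__main__":
    for tag in ("hi-IN", "ta", "en-US"):
        print(tag, "->", auto_select(tag))
```

Because both providers can be selected at runtime, auto mode presumably requires SARVAM_API_KEY and GROQ_API_KEY to be set; confirm this against your deployment.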