Voice AI, Benchmarks

TTS Latency Benchmark 2025: We Tested 7 Text-to-Speech APIs for Voice AI

vexyl.ai

December 10, 2025

Building a real-time voice AI agent? Latency is everything.

Users expect instant responses. A 2-second delay feels like an eternity. Yet most TTS providers market “low latency” without revealing real-world performance from different regions.

We ran comprehensive benchmarks on 7 major TTS providers from both India and US servers to find out which ones actually deliver.

Spoiler: The winner depends entirely on where your server is located.

Quick Results

From US/North America Servers:

Rank	Provider	Avg Latency
1	Deepgram	436ms
2	ElevenLabs	1,041ms
3	OpenAI TTS	2,049ms

From India/Asia Servers:

Rank	Provider	Avg Latency
1	Azure Neural	363ms
2	Sarvam AI	952ms
3	ElevenLabs	1,295ms

Bottom line: Use Deepgram in the US, Azure in Asia. Simple as that.

Why We Ran This TTS Latency Benchmark

We’re building Vexyl AI Voice Gateway – an open-source voice assistant platform for Asterisk PBX. Our users deploy servers globally, and we kept hearing:

“Why is my TTS so slow?”

The answer was always: wrong provider for your region.

Most TTS benchmarks are run from US data centers. That’s useless if you’re deploying in India, Europe, or Southeast Asia.

So we tested from both locations to give you actionable data.

Test Methodology

Test Servers

India Server (Delhi):

Intel Xeon E-2356G @ 3.20GHz
4GB RAM, Debian 13
500 Mbps symmetric, 0.74ms idle latency
ISP: Airtel

US Server (hosted with OVH in Canada):

Production VPS
Low-latency network connection

Test Parameters

Iterations: 3 per text length
Text lengths: Short (32 chars), Medium (123 chars)
Metric: Time from API call to complete audio received
Cache: Disabled (measuring raw API performance)
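
The metric above (time from API call to complete audio received) can be sketched as a small timing loop. This is an illustration only, not the benchmark script itself: `benchmark` and `fakeSynthesize` are hypothetical names, and the stand-in function simply waits about 50ms instead of calling a real provider.

```javascript
const { performance } = require('perf_hooks');

// Times `synthesize` (an async function that resolves once the complete
// audio has been received) and averages the samples.
async function benchmark(synthesize, iterations = 3) {
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await synthesize();
    samples.push(performance.now() - start);
  }
  const avgMs = samples.reduce((sum, ms) => sum + ms, 0) / samples.length;
  return { samples, avgMs };
}

// Hypothetical stand-in for a provider call: resolves with 1KB after ~50ms
const fakeSynthesize = () =>
  new Promise((resolve) => setTimeout(() => resolve(Buffer.alloc(1024)), 50));

benchmark(fakeSynthesize).then(({ avgMs }) =>
  console.log(`average: ${avgMs.toFixed(0)}ms`)
);
```

Running the iterations one after another keeps the samples from competing for bandwidth, at the cost of a longer run.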

Providers Tested

Deepgram Aura
ElevenLabs (eleven_multilingual_v2)
Azure Neural TTS
OpenAI TTS (tts-1)
Sarvam AI (bulbul:v2)
Google Cloud TTS (Chirp3-HD)
Gemini TTS

Full Results: India vs US

Side-by-Side Comparison

Provider	India Server	US Server	Difference (US vs India)
Deepgram	1,830ms	436ms	-76%
Azure (Central India)	363ms	1,876ms	+417%
ElevenLabs	1,295ms	1,041ms	-20%
Sarvam AI	952ms	2,292ms	+141%
OpenAI TTS	2,129ms	2,049ms	-4%
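
The Difference column is the relative change in average latency when moving from the India server to the US server. A minimal sketch of that arithmetic, using a hypothetical helper name:

```javascript
// Percentage change in latency, US server relative to India server
function percentDifference(indiaMs, usMs) {
  return Math.round(((usMs - indiaMs) / indiaMs) * 100);
}

console.log(percentDifference(1830, 436)); // Deepgram: -76
console.log(percentDifference(363, 1876)); // Azure (Central India): 417
```

A negative value means the provider is faster from the US server; a positive value means it is faster from India.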

What This Tells Us

Deepgram has US-only servers – 280ms network RTT from India makes it unusable for Asian deployments
Azure regional endpoints matter – Central India region delivers 363ms from India but 1,876ms from US
ElevenLabs has decent global coverage – roughly 1.0-1.3 seconds from both regions
Sarvam AI is India-optimized – Great from India (952ms), terrible from US (2,292ms)
OpenAI TTS is slow everywhere – ~2 seconds regardless of location

Detailed Results: India Server

Network RTT to TTS API Endpoints

Before looking at TTS latency, let’s see the raw network round-trip times:

Provider	API Endpoint	Network RTT
ElevenLabs	api.elevenlabs.io	14.9ms
Google Cloud	texttospeech.googleapis.com	15.6ms
Sarvam AI	api.sarvam.ai	26.1ms
Azure	centralindia.tts.speech.microsoft.com	28.3ms
Deepgram	api.deepgram.com	280.4ms

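Deepgram’s 280ms RTT from India explains most of its latency there. RTT already covers the trip to the server and back, so each request pays:

At least 280ms of network time per round trip
Additional round trips for the TCP and TLS handshakes if the connection is not reused
Processing time on Deepgram’s side

With just one handshake round trip plus the request itself, that’s 560ms of network overhead before any audio is generated.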
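
As a rough sketch of that arithmetic, the network floor can be modeled as the measured RTT multiplied by the number of round trips a request needs. The round-trip counts below are assumptions for illustration (we did not measure them), and `networkFloorMs` is a hypothetical helper.

```javascript
// Lower bound on network time, before the provider does any processing
function networkFloorMs(rttMs, roundTrips) {
  return Math.round(rttMs * roundTrips);
}

const deepgramRttFromIndia = 280.4; // from the RTT table above

// Assumed counts: 1 = request on an already-open connection,
// 2 or 3 = extra round trips for connection setup before the request
for (const roundTrips of [1, 2, 3]) {
  console.log(`${roundTrips} round trip(s): ${networkFloorMs(deepgramRttFromIndia, roundTrips)}ms`);
}
```

Reusing a connection between requests keeps the count near the low end, which matters far more at 280ms RTT than at 15ms.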

TTS Latency Results (India)

Provider	Short Text	Medium Text	Average
Azure Neural	326ms	399ms	363ms
Sarvam AI	774ms	1,130ms	952ms
ElevenLabs	1,025ms	1,564ms	1,295ms
Deepgram	1,665ms	1,994ms	1,830ms
OpenAI TTS	1,859ms	2,399ms	2,129ms

Winner from India: Azure Neural TTS (363ms)
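
The Average column is the mean of the short-text and medium-text results, rounded to the nearest millisecond. A small sketch that recomputes the averages and the ranking from the table values (the function name is ours, for illustration):

```javascript
// Short and medium latencies from the India table, in no particular order
const indiaResults = [
  { provider: 'Deepgram', shortMs: 1665, mediumMs: 1994 },
  { provider: 'OpenAI TTS', shortMs: 1859, mediumMs: 2399 },
  { provider: 'Azure Neural', shortMs: 326, mediumMs: 399 },
  { provider: 'ElevenLabs', shortMs: 1025, mediumMs: 1564 },
  { provider: 'Sarvam AI', shortMs: 774, mediumMs: 1130 },
];

// Average the two text lengths, then sort fastest first
function rankByAverage(results) {
  return results
    .map((r) => ({
      provider: r.provider,
      avgMs: Math.round((r.shortMs + r.mediumMs) / 2),
    }))
    .sort((a, b) => a.avgMs - b.avgMs);
}

for (const { provider, avgMs } of rankByAverage(indiaResults)) {
  console.log(`${provider}: ${avgMs}ms`);
}
```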

Detailed Results: US Server

TTS Latency Results (US/Canada)

Provider	Short Text	Medium Text	Average
Deepgram	397ms	474ms	436ms
ElevenLabs	796ms	1,286ms	1,041ms
Azure (Central India)	1,712ms	2,040ms	1,876ms
OpenAI TTS	1,963ms	2,134ms	2,049ms
Sarvam AI	1,867ms	2,717ms	2,292ms

Winner from US: Deepgram (436ms)

Provider Deep Dives

Deepgram Aura

Best for: US/North America deployments

Pros:

Fastest from US (436ms)
Cost-effective ($0.015/1K chars)
Good voice quality
Simple REST API

Cons:

US-only servers (280ms RTT from Asia)
Not suitable for global deployments without edge routing

Verdict: Excellent choice if your servers are in North America. Avoid for Asia/India.

Azure Neural TTS

Best for: Regional deployments (use local Azure region)

Pros:

Fastest from Asia when using regional endpoints (363ms)
Multiple regions available (East US, Central India, etc.)
Enterprise-grade reliability
SSML support for fine control

Cons:

Must provision correct region
Slightly more complex setup
Regional keys don’t work cross-region

Verdict: Best option for Asia deployments. Use the Azure region closest to your server.
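
One way to keep the region choice explicit is to derive the endpoint from a region identifier. The sketch below assumes the hostname pattern seen in our RTT table (`centralindia.tts.speech.microsoft.com`) and the `/cognitiveservices/v1` REST path; the location-to-region mapping and both names are illustrative, so check them against the regions where your Speech resource is actually provisioned.

```javascript
// Hypothetical mapping from server location to Azure region identifier
const REGION_BY_LOCATION = {
  india: 'centralindia',
  us: 'eastus',
  europe: 'westeurope',
};

// Builds the regional text-to-speech endpoint URL (assumed pattern)
function azureTtsUrl(location) {
  const region = REGION_BY_LOCATION[location];
  if (!region) throw new Error(`No Azure region configured for "${location}"`);
  return `https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`;
}

console.log(azureTtsUrl('india'));
```

Failing loudly on an unmapped location is deliberate: silently falling back to a distant region is exactly the mistake the benchmark numbers warn against.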

ElevenLabs

Best for: Voice quality priority, global deployments

Pros:

Best voice quality (subjectively)
Decent global latency (~1 second)
WebSocket streaming available
Voice cloning capabilities

Cons:

Higher latency than Deepgram/Azure
More expensive
Rate limits on concurrent requests

Verdict: Choose when voice quality matters more than speed. Good middle-ground for global deployments.

Sarvam AI

Best for: Indian language support

Pros:

Native Indian language support (Hindi, Tamil, Telugu, etc.)
India-based servers (low latency from India)
Good for regional Indian deployments

Cons:

Slow from outside India
Limited to Indian languages
Smaller voice selection

Verdict: Best choice for Indian language voice AI. Not suitable for global English deployments.

OpenAI TTS

Best for: Simplicity (if already using OpenAI)

Pros:

Simple API (same as other OpenAI services)
Decent voice quality

Cons:

Consistently slow (~2 seconds)
No regional optimization
Not suitable for real-time voice AI

Verdict: Only use if you’re already locked into the OpenAI ecosystem and latency isn’t critical.

Google Cloud TTS (Chirp3-HD)

Best for: Batch processing, quality priority

Results from our full benchmark:

Short: 799ms
Medium: 1,375ms
Long: 3,438ms

Verdict: Chirp3-HD is too slow for real-time. Use Neural2 voices instead for voice agents.

Gemini TTS

Best for: Experimental use only

Results:

Short: 3,301ms
Medium: 5,243ms
Long: 11,512ms

Verdict: Not production-ready. Avoid for voice applications.

Latency Thresholds for Voice AI

How do users perceive different latencies?

Latency	User Perception	Suitability
<200ms	Imperceptible	Excellent
200-400ms	Natural conversation	Good
400-600ms	Slightly noticeable	Acceptable
600-800ms	Feels slow	Use with caching
>800ms	Frustrating	Not recommended

Based on our tests:

Deepgram (US) and Azure (regional) are the only providers consistently under 500ms
Everything else requires TTS caching for acceptable user experience
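
The threshold table can be expressed as a simple lookup, which is handy for labeling benchmark output. This is a sketch: `classifyLatency` is a hypothetical helper, and treating each band's lower bound as inclusive is our assumption, since the table's ranges share their boundary values.

```javascript
// Bands from the threshold table; each applies to latencies below maxMs
const THRESHOLDS = [
  { maxMs: 200, perception: 'Imperceptible', suitability: 'Excellent' },
  { maxMs: 400, perception: 'Natural conversation', suitability: 'Good' },
  { maxMs: 600, perception: 'Slightly noticeable', suitability: 'Acceptable' },
  { maxMs: 800, perception: 'Feels slow', suitability: 'Use with caching' },
  { maxMs: Infinity, perception: 'Frustrating', suitability: 'Not recommended' },
];

// Returns the first band whose upper bound is above the measured latency
function classifyLatency(ms) {
  return THRESHOLDS.find((band) => ms < band.maxMs);
}

console.log(classifyLatency(363).suitability); // Azure from India
console.log(classifyLatency(436).suitability); // Deepgram from US
console.log(classifyLatency(1041).suitability); // ElevenLabs from US
```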

Recommendations by Use Case

Real-Time Voice Agents (No Caching)

US/North America:

Deepgram (436ms)
ElevenLabs (1,041ms) – acceptable with streaming

India/Asia:

Azure Neural – Central India region (363ms)
Sarvam AI (952ms) – for Indian languages

Europe:

Azure Neural – West Europe region
ElevenLabs (global CDN)

Voice Agents with TTS Caching

If you implement response caching, provider choice matters less:

Cache hit: 3-5ms
Cache miss: 1-3 seconds

For survey bots or IVR systems with repetitive responses, 90%+ cache hit rates are achievable.
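
To see why the hit rate dominates, it helps to compute the expected latency as a weighted average of the hit and miss cases. The inputs below are illustrative midpoints of the ranges above, not measurements, and `expectedLatencyMs` is a hypothetical helper.

```javascript
// Expected latency = hitRate * hit latency + (1 - hitRate) * miss latency
function expectedLatencyMs(hitRate, hitMs, missMs) {
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// 90% hit rate, 4ms per hit, 2,000ms per miss
console.log(Math.round(expectedLatencyMs(0.9, 4, 2000)));
```

With these inputs the expected latency lands at roughly 200ms, almost all of it contributed by the 10% of requests that miss.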

Batch Processing / Offline

When latency doesn’t matter:

ElevenLabs – best quality
Google Chirp3-HD – high quality
Any provider based on cost

How to Reduce TTS Latency

1. Choose the Right Provider for Your Region

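This is the biggest factor. In our tests, calling a provider from the wrong region added roughly 1.3-1.5 seconds (Deepgram from India, Azure Central India and Sarvam AI from the US).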

2. Implement TTS Caching

Cache common responses. A cache hit takes 3-5ms, versus a full API call on a miss (roughly 0.4-2.3 seconds across the providers we measured).

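// Example: simple TTS cache key (Node.js)
const crypto = require('crypto');

// Identical text, voice, and language always hash to the same key
const cacheKey = crypto
  .createHash('md5')
  .update(`${text}-${voice}-${language}`)
  .digest('hex');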
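
Building on that key, a cache lookup might wrap the provider call as follows. This is a minimal in-memory sketch, not the Voice Gateway implementation: `getAudio`, `ttsCacheKey`, and the `synthesize` callback are hypothetical names, and a production cache would likely need persistence and eviction.

```javascript
const crypto = require('crypto');

const audioCache = new Map(); // in-memory only; lost on restart

function ttsCacheKey(text, voice, language) {
  return crypto
    .createHash('md5')
    .update(`${text}-${voice}-${language}`)
    .digest('hex');
}

// Returns cached audio when available; otherwise calls the provider once
async function getAudio(text, voice, language, synthesize) {
  const key = ttsCacheKey(text, voice, language);
  if (audioCache.has(key)) return audioCache.get(key); // hit: no API call
  const audio = await synthesize(text, voice, language); // miss: full latency
  audioCache.set(key, audio);
  return audio;
}
```

MD5 is used here only to produce a compact key, not for security, so its known weaknesses are not a concern in this role.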

3. Use Streaming Where Available

ElevenLabs and Deepgram support WebSocket streaming. With streaming, playback can start as soon as the first audio chunk arrives, so time-to-first-byte matters more than the time to receive the complete audio.
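
To compare the two numbers, it can help to record the arrival time of the first chunk separately from the total. The sketch below works on any async iterable of audio chunks; `measureStream` and `fakeAudioStream` are hypothetical names, and the fake stream stands in for a real provider connection.

```javascript
const { performance } = require('perf_hooks');

// Records when the first chunk arrives and when the stream ends
async function measureStream(chunks) {
  const start = performance.now();
  let firstChunkMs = null;
  let totalBytes = 0;
  for await (const chunk of chunks) {
    if (firstChunkMs === null) firstChunkMs = performance.now() - start;
    totalBytes += chunk.length;
  }
  return { firstChunkMs, totalMs: performance.now() - start, totalBytes };
}

// Stand-in for a provider stream: three 4-byte chunks, about 20ms apart
async function* fakeAudioStream() {
  for (let i = 0; i < 3; i++) {
    await new Promise((resolve) => setTimeout(resolve, 20));
    yield Buffer.alloc(4);
  }
}

measureStream(fakeAudioStream()).then((result) => console.log(result));
```

For a voice agent, `firstChunkMs` is the figure the caller perceives, since playback can overlap with the rest of the download.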

4. Optimize Text Length

Shorter text = faster TTS. Break long responses into chunks and stream them.
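
A simple way to chunk is to split on sentence boundaries and pack sentences up to a size limit. This is a sketch under stated assumptions: `chunkText` is a hypothetical helper, the 120-character default is arbitrary, and the sentence split is naive (it will also break after abbreviations such as "Dr.").

```javascript
// Groups whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars is kept whole as its own chunk.
function chunkText(text, maxChars = 120) {
  // Split after sentence-ending punctuation followed by whitespace
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars || !current) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText('Hello there. How are you? I am fine.', 25));
// [ 'Hello there. How are you?', 'I am fine.' ]
```

Each chunk can then be sent to the TTS provider in order, so the first sentence starts playing while later ones are still being synthesized.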

5. Pre-warm Common Phrases

For IVR systems, pre-generate audio for greetings and common responses during off-peak hours.
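
Pre-warming amounts to running the known phrases through the provider ahead of time and storing the results. A minimal sketch, assuming a hypothetical `synthesize` callback and an in-memory `Map` keyed by phrase (a real deployment would likely key on voice and language as well):

```javascript
// Synthesizes each phrase once and stores the audio for later lookups.
// Runs sequentially to stay well within provider rate limits.
async function prewarm(phrases, synthesize, cache = new Map()) {
  for (const phrase of phrases) {
    if (!cache.has(phrase)) {
      cache.set(phrase, await synthesize(phrase));
    }
  }
  return cache;
}

// Hypothetical stand-in for a provider call
const fakeProvider = async (phrase) => Buffer.from(phrase);

prewarm(['Welcome to support.', 'Please hold.'], fakeProvider).then((cache) =>
  console.log(`pre-warmed ${cache.size} phrases`)
);
```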

Our Testing Tools

We’ve open-sourced our benchmark scripts:

Standalone TTS Benchmark:

# Test from any server with Node.js
node standalone-tts-benchmark.js

Tests Deepgram, ElevenLabs, Azure, Sarvam, and OpenAI with hardcoded iterations.

Full Benchmark Suite:

# Includes network tests, speedtest, traceroute
./run-tts-benchmark.sh 5 en-IN

Available in our GitHub repository.

Conclusion

There’s no single “best” TTS provider. The right choice depends on:

Server location – Use Deepgram in US, Azure in Asia
Language requirements – Sarvam for Indian languages
Quality vs speed tradeoff – ElevenLabs for quality, Deepgram/Azure for speed
Budget – Deepgram is most cost-effective

For our Vexyl AI Voice Gateway, we recommend:

Default: Azure Neural with regional endpoint matching your server
US deployments: Deepgram
Indian languages: Sarvam AI
Premium quality: ElevenLabs with caching
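
These recommendations can be written down as a small decision function. The sketch is illustrative only: `recommendProvider` and its option names are hypothetical, and the order in which the rules are checked (language first, then quality, then region) is our assumption about priority.

```javascript
// Maps deployment requirements to the recommendations listed above
function recommendProvider({ region, indianLanguage = false, premiumQuality = false }) {
  if (indianLanguage) return 'Sarvam AI';
  if (premiumQuality) return 'ElevenLabs (with caching)';
  if (region === 'us') return 'Deepgram';
  return 'Azure Neural (regional endpoint)'; // default
}

console.log(recommendProvider({ region: 'us' }));
console.log(recommendProvider({ region: 'india' }));
console.log(recommendProvider({ region: 'india', indianLanguage: true }));
```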

About Vexyl AI

Vexyl AI builds open-source voice AI infrastructure. Our Voice Gateway connects any LLM to phone systems via Asterisk, with support for multiple STT and TTS providers.

Features:

Multi-provider STT/TTS support
Real-time voice activity detection
TTS caching for low latency
Human-in-the-loop call transfer
WebSocket and AudioSocket protocols

Get Started with Vexyl AI Voice Gateway

FAQ

Which TTS provider has the lowest latency?

It depends on location. Deepgram is fastest from the US (436ms), and Azure Neural is fastest from Asia (363ms) when using regional endpoints.

Why is Deepgram slow from India?

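Deepgram only has US servers. With a 280ms network round-trip time from India, every exchange with the server costs at least 280ms, and connection setup can push the network overhead to 560ms or more per request.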

Is ElevenLabs good for voice AI?

ElevenLabs has the best voice quality but ~1 second latency. It’s acceptable for voice AI with streaming, but not the fastest option.

What’s a good TTS latency for voice agents?

Under 400ms feels like natural conversation, and up to 600ms is acceptable. 600-800ms works only with caching, and anything over 800ms feels frustratingly slow to users.

Should I use Google TTS Chirp3-HD?

No, Chirp3-HD is too slow (3+ seconds for long text). Use Google Neural2 voices instead for real-time applications.

Last updated: December 2025

Benchmark data collected using Vexyl AI Voice Gateway testing tools. All tests performed on production servers with real API calls.
