Asterisk, Voice AI

VEXYL Voice Gateway AI Asterisk/FreePBX Integration – Self-Hosted

vexyl.ai

January 18, 2026

If you’re running an Asterisk or FreePBX phone system, you’ve probably wondered how to add AI voice capabilities without tearing everything down and starting fresh. Cloud platforms like Vapi and Retell AI charge ₹4,500-₹12,000 per month for just 30,000 minutes. That’s insane when you already have working infrastructure.

Enter VEXYL AI Voice Gateway—a self-hosted solution that bridges your existing Asterisk PBX to modern AI services. Think of it as middleware that adds speech recognition, natural language processing, and text-to-speech to your phone system without replacing anything. And it runs entirely on your servers via Docker.

I’ll show you exactly how to deploy VEXYL in about 5 minutes. We’ll cover Docker installation, Asterisk integration, and real-world use cases. If you’re tired of per-minute pricing or need Indian language support, this guide is for you.

What is VEXYL AI Voice Gateway?

VEXYL is self-hosted middleware that sits between your Asterisk/FreePBX system and AI providers. It handles the complete voice AI pipeline—speech recognition (STT), language processing (LLM), and text-to-speech (TTS)—whilst your PBX continues managing call routing, transfers, and everything else you’ve already configured.

The key difference from cloud platforms? You control the infrastructure. Audio never leaves your network. You pay AI providers directly at wholesale rates. And there’s no per-minute pricing eating into your budget every month.

Why Self-Hosting Matters

Most voice AI platforms force you through their cloud infrastructure. That’s fine for simple use cases, but it creates problems:

Data sovereignty issues: Healthcare and government sectors can’t route calls through third-party servers
Unpredictable costs: Per-minute charges rise with every extra minute of usage, so monthly bills are hard to forecast
Vendor lock-in: Migrating between platforms means rebuilding everything
Limited language support: Most platforms barely support Indian regional languages

VEXYL solves these by running entirely on-premise. You bring your own API keys, choose your providers, and maintain complete control.

How Does It Compare to Cloud Platforms?

Let’s talk numbers. I’ve deployed both cloud platforms and self-hosted solutions, and the cost difference is staggering.

Platform Type	Setup Cost	Monthly (30,000 min)	Annual Total
Cloud Platforms (Vapi, Retell, Bland AI)	₹0	₹72,000	₹8,64,000
Building In-House	₹1,50,000-₹2,50,000	₹15,000	₹3,30,000
VEXYL Gateway	₹0-₹2,00,000 (one-time licence)	₹10,000 (direct API costs)	₹1,20,000 (excluding licence)
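
The annual column is just the setup cost plus twelve months of running cost. Here is a minimal JavaScript sketch of that arithmetic, using the table's figures as inputs. The function name is illustrative, and the lower-bound setup costs are an assumption on my part.

```javascript
// Annual total = one-time setup cost + 12 x monthly running cost (rupees).
function annualTotal(setupCost, monthlyCost) {
  return setupCost + 12 * monthlyCost;
}

// Monthly figures come from the table; setup costs use the lower bound (assumption).
const scenarios = {
  cloud: annualTotal(0, 72000),
  inHouse: annualTotal(150000, 15000),
  vexyl: annualTotal(0, 10000),
};

for (const [name, total] of Object.entries(scenarios)) {
  // en-IN formatting groups digits the Indian way (lakh, crore).
  console.log(name, total.toLocaleString('en-IN'));
}
```

Pass the licence fee as setupCost to see the first-year figure for a paid VEXYL licence.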

The maths becomes even more compelling when you factor in TTS caching. For survey bots or IVR systems with repetitive content, VEXYL’s cache achieves 90%+ hit rates. That ₹10,000 monthly AI cost drops to just ₹7,500.

In my experience deploying healthcare appointment reminders, we process 1,000+ monthly calls in Malayalam at roughly ₹5,000 per month. The same volume on cloud platforms would cost ₹18,000-₹24,000. That’s a 72-79% reduction.

Installation: Docker Deployment in 5 Minutes

Right, let’s get VEXYL running. You’ll need Docker installed (if you don’t have it, grab it from Docker’s official site). The entire setup takes about 5 minutes from start to finish.

Step 1: Pull the Docker Image

docker pull vexyl/vexyl-voice-gateway

This downloads the pre-built VEXYL container from Docker Hub. The image includes all dependencies—audio processing libraries, Redis session management, and the core gateway service.

Step 2: Create Configuration Directory

mkdir -p ~/vexyl/config
cd ~/vexyl

We’ll store configuration files here. Keeping them outside the container means you can update VEXYL without losing your settings.

Step 3: Configure Environment Variables

Create a file called config/.env with your API credentials:

# Basic Setup
HTTP_PORT=8080
AUDIOSOCKET_PORT=8080

# AI Provider Keys
SARVAM_API_KEY=your-sarvam-key-here
GROQ_API_KEY=your-groq-key-here
OPENAI_API_KEY=your-openai-key-here

# Speech-to-Text Provider
STT_PROVIDER=groq

# Text-to-Speech Provider  
TTS_PROVIDER=sarvam
TTS_CACHE_ENABLED=true

# LLM Configuration
LLM_PROVIDER=sarvam

# Optional: Enable Barge-in
ENABLE_BARGE_IN=true
BARGE_IN_THRESHOLD=500

You’ll need API keys from the providers you want to use. Sarvam AI is excellent for Indian languages. Groq offers fast speech recognition. OpenAI provides GPT models if you need them.
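
Before starting the container, it can save a debugging round to check that every selected provider has a real key. The sketch below parses the .env text and reports keys that are missing or still hold the your-...-here placeholder. The provider-to-variable mapping is my assumption, inferred from the variable names in the sample file, and the function names are hypothetical.

```javascript
// Parse KEY=VALUE lines, skipping blank lines and # comments.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// ASSUMPTION: which key each provider needs, guessed from the sample file's names.
const KEY_FOR_PROVIDER = {
  sarvam: 'SARVAM_API_KEY',
  groq: 'GROQ_API_KEY',
  openai: 'OPENAI_API_KEY',
};

// Return the key names that are unset or still contain a placeholder value.
function missingKeys(env) {
  const missing = [];
  for (const setting of ['STT_PROVIDER', 'TTS_PROVIDER', 'LLM_PROVIDER']) {
    const keyName = KEY_FOR_PROVIDER[env[setting]];
    if (!keyName) continue; // provider unset or not in the mapping: nothing to check
    const value = env[keyName];
    const isPlaceholder = !value || value.startsWith('your-');
    if (isPlaceholder && !missing.includes(keyName)) missing.push(keyName);
  }
  return missing;
}

const sample = [
  'STT_PROVIDER=groq',
  'TTS_PROVIDER=sarvam',
  'LLM_PROVIDER=sarvam',
  'GROQ_API_KEY=gsk_example',
  'SARVAM_API_KEY=your-sarvam-key-here',
].join('\n');
console.log(missingKeys(parseEnv(sample)));
```

To check the real file, read config/.env with fs.readFileSync and pass its contents to parseEnv.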

Step 4: Run VEXYL Container

docker run -d \
  --name vexyl-gateway \
  -p 8080:8080 \
  -v $(pwd)/config:/app/config \
  -v $(pwd)/cache:/app/cache \
  --restart unless-stopped \
  vexyl/vexyl-voice-gateway

This starts VEXYL in detached mode. The --restart unless-stopped flag brings the container back up after a server reboot, as long as the Docker service itself starts on boot. Port 8080 is where Asterisk will connect via the AudioSocket protocol.

Step 5: Verify Installation

curl http://localhost:8080/health

You should see a JSON response confirming the gateway is running:

{
  "status": "healthy",
  "uptime": 123,
  "version": "1.0.0"
}

That’s it. VEXYL is now running and ready to handle calls.
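
In a deployment script you may want to wait for that health check instead of running curl by hand. Here is a small sketch that polls the endpoint until the status field reads healthy. It assumes Node 18+ (for the built-in fetch) and the response shape shown above; the function names and retry settings are illustrative.

```javascript
// True when the parsed /health body reports a healthy gateway.
function isHealthy(body) {
  return Boolean(body) && body.status === 'healthy';
}

// Poll the health endpoint until it reports healthy or the attempts run out.
// fetchFn is injectable for testing and defaults to the global fetch.
async function waitForHealthy(url, { attempts = 10, delayMs = 1000, fetchFn = fetch } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(url);
      if (res.ok && isHealthy(await res.json())) return true;
    } catch {
      // Connection refused while the container is still starting: retry.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

Calling waitForHealthy('http://localhost:8080/health') resolves to true once the gateway answers, or false after the last attempt.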

Connecting to Asterisk via AudioSocket

Now we need to tell Asterisk to route calls through VEXYL. This uses the AudioSocket protocol—a lesser-known Asterisk feature that streams raw audio to external applications.

Edit Asterisk Dialplan

Open /etc/asterisk/extensions.conf and add this context:

[ai-assistant]
exten => 1000,1,Answer()
; AudioSocket() needs a real UUID; ${UNIQUEID} (e.g. 1700000000.12) is not one
exten => 1000,n,Set(SESSION_UUID=${UUID()})
exten => 1000,n,AudioSocket(${SESSION_UUID},127.0.0.1:8080)
exten => 1000,n,Hangup()

This creates extension 1000 that answers calls and immediately streams audio to VEXYL on localhost:8080. The SESSION_UUID variable helps track individual calls.
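
When debugging, it helps to know what travels over that TCP connection. As I understand Asterisk's AudioSocket protocol, each message is a 1-byte type, a 2-byte big-endian payload length, then the payload. Type 0x01 carries the 16-byte call UUID, 0x10 carries 16-bit 8 kHz mono PCM audio, 0x00 signals hangup and 0xff an error. Below is a minimal frame parser under those assumptions. The function names are illustrative and this is not VEXYL's own code.

```javascript
// Message types as described in the AudioSocket protocol notes (assumption, see lead-in).
const KIND_NAMES = { 0x00: 'hangup', 0x01: 'uuid', 0x10: 'audio', 0xff: 'error' };

// Split a buffer into complete frames; bytes of an incomplete frame are returned in `rest`.
function parseFrames(buffer) {
  const frames = [];
  let offset = 0;
  while (buffer.length - offset >= 3) {
    const kind = buffer.readUInt8(offset);
    const length = buffer.readUInt16BE(offset + 1);
    if (buffer.length - offset - 3 < length) break; // payload not fully received yet
    const payload = buffer.subarray(offset + 3, offset + 3 + length);
    frames.push({ kind: KIND_NAMES[kind] || 'unknown', payload });
    offset += 3 + length;
  }
  return { frames, rest: buffer.subarray(offset) };
}

// Render the 16-byte UUID payload in the usual 8-4-4-4-12 text form.
function formatUuid(payload) {
  const hex = payload.toString('hex');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join('-');
}
```

A TCP server would feed each incoming chunk (prefixed with the previous `rest`) into parseFrames and act on the frames it returns.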

Reload Asterisk Configuration

asterisk -rx "dialplan reload"

Now dial extension 1000 from any phone connected to your Asterisk system. The AI should answer and respond to your voice. If you’ve configured Sarvam AI with Malayalam, try speaking in Malayalam—it’ll understand and respond naturally.

Supporting 10+ Indian Languages

Here’s where VEXYL truly shines compared to international platforms. Most voice AI services barely support Hindi, let alone regional languages. VEXYL integrates with Sarvam AI, which specialises in Indian languages with native-level fluency.

Supported languages include Malayalam, Hindi, Tamil, Telugu, Kannada, Bengali, Marathi, Gujarati, Odia, and Punjabi. This isn’t basic support—these are production-quality models that understand regional accents, colloquialisms, and natural speech patterns.

Configuring for Malayalam

If you’re serving customers in Kerala, here’s how to optimise for Malayalam:

# In your .env file
STT_PROVIDER=sarvam
TTS_PROVIDER=sarvam
LLM_PROVIDER=flowise

Then in your Asterisk dialplan, pass the language code:

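exten => 1001,1,Answer()
; AudioSocket() needs a real UUID, so generate one instead of reusing ${UNIQUEID}
exten => 1001,n,Set(SESSION_UUID=${UUID()})
; POST the language code for this session before the audio stream starts
exten => 1001,n,Set(CURL_RESULT=${CURL(http://127.0.0.1:8080/session/${SESSION_UUID}/metadata,language_code=ml-IN)})
exten => 1001,n,AudioSocket(${SESSION_UUID},127.0.0.1:8080)
exten => 1001,n,Hangup()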

This tells VEXYL to use Malayalam for both speech recognition and synthesis. The quality is remarkable—we’ve deployed this in healthcare settings where elderly patients needed appointment reminders in their native language. 95% satisfaction rates speak for themselves.

Integration with Flowise for No-Code AI Workflows

One of VEXYL’s best features is native Flowise integration. If you’re not familiar, Flowise is a visual workflow builder for LLM applications. It lets you design conversation flows, add knowledge bases, and connect to databases—all without writing code.

Why This Matters

Traditional voice AI platforms lock you into their conversation design tools. With Flowise, your business analysts can build and modify AI conversations independently. Want to add product recommendations? Pull inventory from your database? Search through documentation? Just connect the blocks visually in Flowise.

Quick Flowise Setup

First, run Flowise using Docker:

# Flowise listens on port 3000 inside the container; publish it on host port 3001
docker run -d \
  --name flowise \
  -p 3001:3000 \
  -v $(pwd)/flowise:/root/.flowise \
  flowiseai/flowise

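Open http://localhost:3001, create your conversation flow, and grab the Flow ID from the URL. Then configure VEXYL. If VEXYL runs in its own container, replace localhost with an address that container can reach, because localhost there refers to the VEXYL container itself: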

# In your .env file
LLM_PROVIDER=flowise
FLOWISE_API_URL=http://localhost:3001
FLOWISE_FLOW_ID=your-flow-id-here

Now when calls come through VEXYL, they’ll use your Flowise workflow for conversation logic. This means you can implement RAG (retrieval-augmented generation) with your knowledge base, integrate with external APIs, or whatever complex flow you’ve designed—all working over the phone.
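
It is worth testing the flow with plain text before putting a phone call in front of it. Flowise exposes a prediction endpoint at /api/v1/prediction/<flow id> that accepts a JSON body with a question field. The sketch below builds that request and reads the text field of the reply. How VEXYL itself calls Flowise is not covered here, so treat this purely as a test helper with illustrative function names.

```javascript
// Build the URL and fetch options for a Flowise prediction call.
function buildPredictionRequest(baseUrl, flowId, question) {
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/api/v1/prediction/${encodeURIComponent(flowId)}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ question }),
    },
  };
}

// Send one question to the flow and return the reply text (Node 18+ for global fetch).
async function askFlow(baseUrl, flowId, question, fetchFn = fetch) {
  const { url, options } = buildPredictionRequest(baseUrl, flowId, question);
  const res = await fetchFn(url, options);
  if (!res.ok) throw new Error(`Flowise returned HTTP ${res.status}`);
  const data = await res.json();
  return data.text;
}
```

For example, askFlow('http://localhost:3001', 'your-flow-id-here', 'What are your opening hours?') should resolve to the same answer a caller would hear.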

Adding n8n for Workflow Automation

If Flowise handles conversation design, n8n handles what happens after the call. It’s a workflow automation platform that can create CRM tickets, send emails, update databases, trigger notifications—basically any business process.

Example Use Case

Imagine a customer support line where:

Customer calls and speaks to VEXYL AI
AI collects issue details using Flowise workflow
Call ends and conversation data flows to n8n
n8n creates a Zendesk ticket, updates Salesforce, sends Slack notification to support team, and schedules a follow-up call if needed

All without writing custom integration code.

Setting Up n8n

docker run -d \
  --name n8n \
  -p 5678:5678 \
  -v $(pwd)/n8n:/home/node/.n8n \
  n8nio/n8n

docker run -d \
  --name n8n \
  -p 5678:5678 \
  -v $(pwd)/n8n:/home/node/.n8n \
  n8nio/n8n

Create a webhook workflow in n8n (available at http://localhost:5678), then configure VEXYL to use it:

LLM_PROVIDER=n8n
N8N_WEBHOOK_URL=http://localhost:5678/webhook/your-webhook-id

Now VEXYL sends conversation transcripts to n8n, which can execute any workflow you’ve designed. The beauty is that this works alongside Flowise—use Flowise for real-time conversation, n8n for post-call automation.
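
While building the n8n workflow, you can feed the webhook a sample payload yourself rather than placing a call each time. The payload shape below (sessionId, turnCount, transcript) is purely hypothetical; check what your gateway actually sends and adjust the field names to match.

```javascript
// Build a sample call summary. Field names are hypothetical, for workflow testing only.
function buildCallSummary(sessionId, turns) {
  return {
    sessionId,
    turnCount: turns.length,
    transcript: turns.map((t) => `${t.speaker}: ${t.text}`).join('\n'),
  };
}

// POST the summary to an n8n webhook URL; resolves to true on a 2xx response.
async function postToWebhook(webhookUrl, summary, fetchFn = fetch) {
  const res = await fetchFn(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(summary),
  });
  return res.ok;
}

const sampleSummary = buildCallSummary('test-session', [
  { speaker: 'user', text: 'My order has not arrived.' },
  { speaker: 'ai', text: 'Sorry to hear that. Can I have the order number?' },
]);
console.log(sampleSummary.transcript);
```

Posting sampleSummary to http://localhost:5678/webhook/your-webhook-id lets you step through the workflow in the n8n editor.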

WebRTC Support for Browser-Based Calling

Whilst AudioSocket handles traditional phone systems beautifully, VEXYL also supports WebRTC for browser-based applications. This lets you add “Talk to AI” buttons on websites or voice features in mobile apps.

Enabling WebSocket Server

Add these to your .env file, and publish port 8082 when you start the container (for example with an extra -p 8082:8082):

WEBSOCKET_AUDIO_ENABLED=true
WEBSOCKET_AUDIO_PORT=8082
WEBSOCKET_AUDIO_ALLOWED_ORIGINS=https://yourwebsite.com

Using the JavaScript SDK

Install the VEXYL SDK from NPM:

npm install @vexyl.ai/aivg-sdk

Then integrate it into your web application:

import AIVoiceGateway from '@vexyl.ai/aivg-sdk';

const voice = new AIVoiceGateway({
    serverUrl: 'ws://localhost:8082',
    language: 'en-IN',
    onTranscript: (text, { isFinal }) => {
        console.log('User said:', text);
    },
    onResponse: (text) => {
        console.log('AI said:', text);
    }
});

await voice.connect();
await voice.startListening();

This gives you the same voice AI capabilities in web browsers that you have on phone calls. Same conversation flows, same Flowise integration, same n8n workflows—just accessible through WebRTC instead of traditional telephony.

Cost Optimisation with TTS Caching

Here’s a feature that dramatically reduces costs for specific use cases: TTS caching. VEXYL caches generated speech on disk, so repeated phrases don’t hit the TTS API again.
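
To see why static prompts cache so well and personalised ones do not, picture the cache as a lookup keyed on exactly what is sent to the TTS provider. The sketch below is my own illustration of that idea, not VEXYL's actual scheme: it hashes provider, language and normalised text into a file name under a cache directory.

```javascript
const nodeCrypto = require('node:crypto');
const nodePath = require('node:path');

// Same provider + language + text always yields the same key (illustrative scheme).
function cacheKey({ provider, language, text }) {
  const normalised = text.trim().replace(/\s+/g, ' ');
  return nodeCrypto
    .createHash('sha256')
    .update(`${provider}|${language}|${normalised}`)
    .digest('hex');
}

// Where the audio for a request would live on disk (the .wav extension is an assumption).
function cachePath(cacheDir, request) {
  return nodePath.join(cacheDir, `${cacheKey(request)}.wav`);
}

const menu = { provider: 'sarvam', language: 'ml-IN', text: 'Press 1 for appointments.' };
console.log(cachePath('/app/cache/tts', menu));
```

A fixed menu prompt maps to one file and is reused on every call, whereas a sentence containing a patient name or date produces a new key each time and always misses.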

When This Saves Massive Money

Survey bots: Same questions every call → 95%+ cache hit rate
IVR menus: Static options → 100% cache hit rate
Appointment reminders: Template messages → 80%+ cache hit rate
FAQ bots: Repeated answers → 70%+ cache hit rate

The performance improvement is staggering. First call generates speech in about 800ms. Cached calls? 2-10ms. That’s a 98-99% reduction in response time.

Enabling TTS Cache

TTS_CACHE_ENABLED=true
TTS_CACHE_DIR=/app/cache/tts
TTS_CACHE_MAX_SIZE_MB=5000
TTS_CACHE_MAX_AGE_DAYS=90

For a survey bot making 1,000 calls daily with 10 questions each, you’d normally generate 10,000 TTS responses. With caching, you generate the 10 unique responses once and serve the other 9,990 from cache. Even at a more conservative 95% hit rate, that’s ₹3,000/day reduced to ₹150/day in TTS costs.
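
The saving follows directly from the hit rate, because only cache misses reach the TTS provider. Here is a small sketch of that arithmetic. The per-response price is not quoted anywhere in this post; it is simply what ₹3,000 for 10,000 responses implies.

```javascript
// Daily TTS spend when only cache misses are billed.
function dailyTtsCost(requestsPerDay, costPerRequest, hitRate) {
  const misses = Math.round(requestsPerDay * (1 - hitRate));
  return misses * costPerRequest;
}

const requests = 1000 * 10;             // 1,000 calls x 10 questions each
const costPerRequest = 3000 / requests; // implied by Rs 3,000/day without caching

console.log(dailyTtsCost(requests, costPerRequest, 0));     // no cache
console.log(dailyTtsCost(requests, costPerRequest, 0.95));  // 95% hit rate
console.log(dailyTtsCost(requests, costPerRequest, 0.999)); // only the 10 unique prompts miss
```

The last line is the best case, where nothing but the 10 unique questions is ever synthesised.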

Real-World Use Cases

Let me share some deployments I’ve worked on to show you what’s possible with VEXYL.

Healthcare Appointment Reminders

A Kerala hospital needed automated appointment reminders in Malayalam. They were manually calling 1,000+ patients monthly—expensive and inconsistent.

We deployed VEXYL with Sarvam AI for Malayalam, Flowise for conversation logic, and n8n to pull appointment data from their HMS. The bot calls patients 24 hours before appointments, confirms attendance, and provides directions if needed.

Results: 95% patient satisfaction, 30% reduction in no-shows, ₹15,000 monthly cost vs ₹45,000 on cloud platforms. Because audio never leaves the hospital’s network, meeting HIPAA-style patient privacy requirements is also far simpler.

Customer Service Automation

An e-commerce company wanted 24/7 support without hiring night shift agents. VEXYL handles common queries (order status, returns, product info) and escalates complex issues to humans.

The escalation logic sits in Flowise. When the AI can’t help, it sets shouldEscalate: true and transfers to an agent queue. The agent receives full conversation context, so customers don’t repeat themselves.

This achieved 60% call deflection whilst maintaining customer satisfaction. That’s 18,000 of 30,000 monthly calls handled by AI, saving ₹2,70,000 annually in staffing costs.

Lead Qualification

A B2B software company receives 500+ inbound leads monthly. Their sales team wasted hours qualifying prospects who weren’t decision-makers or didn’t have budget.

VEXYL now handles initial qualification. The Flowise workflow asks about budget, timeline, decision-maker status, and company size. Hot leads (score >7/10) transfer immediately to sales. Warm leads get scheduled callbacks. Cold leads enter an email nurture sequence via n8n.
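
The routing rule at the end of that workflow is simple enough to sketch. Only the hot threshold (score above 7 out of 10) comes from this deployment; the warm/cold boundary of 4 and the route names are placeholders I have made up for illustration.

```javascript
// Map a 0-10 qualification score to a next step.
// ASSUMPTION: warmFrom = 4 is illustrative; only the hot threshold (>7) is given in the text.
function routeLead(score, { hotAbove = 7, warmFrom = 4 } = {}) {
  if (score > hotAbove) return 'transfer-to-sales';
  if (score >= warmFrom) return 'schedule-callback';
  return 'email-nurture';
}

for (const score of [9, 6, 2]) {
  console.log(score, routeLead(score));
}
```

Keeping the thresholds as parameters makes it easy to tune them once you see how the scores are distributed in practice.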

Sales team now focuses only on qualified prospects. Conversion rate improved from 12% to 28% because reps spend time on serious buyers.

Troubleshooting Common Issues

You’ll inevitably hit some issues during deployment. Here are the most common problems and solutions.

No Audio Response from AI

First, check Asterisk is actually connecting to VEXYL:

asterisk -rx "core show channels"

You should see your call listed with an AudioSocket connection. If not, verify that the dialplan is correct and that port 8080 is reachable from the Asterisk host.

Next, check VEXYL logs:

docker logs -f vexyl-gateway

Look for errors related to API keys, provider connectivity, or audio processing. Most issues stem from invalid credentials or network problems reaching AI providers.

High Latency (Slow Responses)

If the AI takes 5+ seconds to respond, try these optimisations:

Enable TTS caching for repetitive content
Switch to faster STT providers (Deepgram for streaming, Groq for batch)
Use lightweight LLM providers like Litebot instead of GPT-4
Enable barge-in so users can interrupt instead of waiting

The bottleneck is usually LLM response time. We’ve seen 2-3 second improvements just by switching from GPT-4 to Gemini Flash for simple use cases.

Indian Language Recognition Issues

If Sarvam AI isn’t recognising Malayalam or Hindi accurately, verify the language code in your dialplan:

Set(CURL_RESULT=${CURL(http://127.0.0.1:8080/session/${SESSION_UUID}/metadata,language_code=ml-IN)})

Supported codes: ml-IN (Malayalam), hi-IN (Hindi), ta-IN (Tamil), te-IN (Telugu), kn-IN (Kannada), etc.

Also ensure you’re using Sarvam for both STT and TTS—mixing providers can create inconsistent language handling.
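
A mistyped code is easy to miss in a long dialplan line, so if you maintain several language-specific extensions it can help to generate that line from a lookup table. This is a convenience sketch rather than anything VEXYL ships. The table holds only the codes named above, and the function name is hypothetical.

```javascript
// Language codes listed in this guide; extend as needed.
const LANGUAGE_CODES = {
  Malayalam: 'ml-IN',
  Hindi: 'hi-IN',
  Tamil: 'ta-IN',
  Telugu: 'te-IN',
  Kannada: 'kn-IN',
};

// Produce the dialplan line that posts the language code for a session.
function metadataLine(extension, languageName, gateway = '127.0.0.1:8080') {
  const code = LANGUAGE_CODES[languageName];
  if (!code) throw new Error(`No language code listed for ${languageName}`);
  return `exten => ${extension},n,Set(CURL_RESULT=\${CURL(http://${gateway}/session/\${SESSION_UUID}/metadata,language_code=${code})})`;
}

console.log(metadataLine('1002', 'Tamil'));
```

An unknown language name throws instead of silently emitting a bad code, which is the whole point of the helper.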

Production Deployment Checklist

Before going live with VEXYL in production, tick these boxes:

Use Redis for sessions: In-memory storage won’t survive restarts
Enable TTS caching: Massive performance and cost benefits
Set up monitoring: Health checks, log aggregation, alerts
Configure call transfer: Human escalation for edge cases
Set Docker resource limits: Prevent runaway memory usage
Enable HTTPS/reverse proxy: Secure WebRTC connections
Back up TTS cache directory: Rebuilding takes time
Configure IP whitelist: Restrict access to trusted sources

For production environments, I recommend running VEXYL behind nginx as a reverse proxy. This handles SSL termination, load balancing if you’re running multiple instances, and provides better security controls.

Getting Your API Keys

You’ll need credentials from the AI providers you want to use. Here’s where to get them:

Sarvam AI: https://www.sarvam.ai/ (Indian languages STT/TTS)
Groq: https://groq.com/ (Fast speech recognition)
OpenAI: https://platform.openai.com/ (GPT models)
Deepgram: https://deepgram.com/ (Real-time STT)
ElevenLabs: https://elevenlabs.io/ (Premium TTS)

Most offer a free tier or credits for testing. Sarvam is particularly generous with Indian language access. Groq provides fast transcription at competitive rates. You can mix and match: use Sarvam for Indian languages, Deepgram for English, or whatever combination suits your specific use case.

What is VEXYL AI Voice Gateway?

VEXYL is a self-hosted middleware platform that connects Asterisk/FreePBX phone systems to modern AI services (speech recognition, language models, text-to-speech). It runs on your own servers via Docker, giving you complete control over call data and eliminating per-minute cloud charges.

How much does VEXYL cost compared to cloud platforms?

Cloud platforms like Vapi charge ₹72,000+ monthly for 30,000 minutes. VEXYL requires a one-time licence (₹50,000-₹2,00,000) plus direct AI provider costs (~₹30,000/month). With TTS caching, monthly costs drop to ₹7,500-₹15,000. Against the ₹72,000 cloud figure, that’s roughly 79-90% lower each month.

Does VEXYL support Indian regional languages?

Yes, VEXYL integrates with Sarvam AI to support 10+ Indian languages including Malayalam, Hindi, Tamil, Telugu, Kannada, Bengali, Marathi, Gujarati, Odia, and Punjabi. The quality is production-grade with natural speech recognition and synthesis.

How difficult is VEXYL to install?

Installation takes about 5 minutes with Docker. You pull the container image, configure environment variables with your API keys, run the container, and connect it to your Asterisk PBX via AudioSocket protocol. No complex compilation or dependency management required.

Can I integrate VEXYL with my existing workflows?

Absolutely. VEXYL integrates natively with Flowise (visual LLM workflows), n8n (automation platform), and supports custom webhooks. You can connect to databases, CRMs, knowledge bases, or any external API your business needs.

Final Thoughts: Why Self-Hosting Wins

After deploying both cloud and self-hosted voice AI solutions, I’m convinced self-hosting is the right approach for most enterprises. The economics favour it overwhelmingly once you pass 20,000 minutes monthly. Data sovereignty matters increasingly as regulations tighten. And vendor lock-in is a real risk when your entire phone system depends on a single cloud provider.

VEXYL gives you the best of both worlds. You get modern AI capabilities without replacing infrastructure. You maintain complete control whilst accessing 17+ provider options. And you pay predictable licensing costs instead of variable per-minute charges that scale unpredictably.

If you’re running Asterisk or FreePBX, give VEXYL a try. The Docker deployment takes 5 minutes. You can test with free tier API credits from providers. And if it works for your use case, you’ll save lakhs of rupees annually compared to cloud alternatives.

Next Steps

Visit vexyl.ai for complete documentation
Pull the Docker image: hub.docker.com/r/vexyl/vexyl-voice-gateway
Try the WebRTC SDK: npmjs.com/package/@vexyl.ai/aivg-sdk
Join community discussions on Asterisk and FreePBX forums

Questions about deployment? Drop a comment below and I’ll help you get started.
