Select the Model and Voice

The Models & Voice section controls how your voice agent listens, thinks, and speaks during conversations.
First, choose the Pipeline Mode:
- STT + LLM + TTS pipeline: gives you more control by letting you configure speech-to-text, the language model (the agent's "brain"), and text-to-speech separately.
- Realtime model: uses a single, unified setup for faster configuration.
TTS (Text-to-Speech) determines how your agent sounds. Select a voice model, then choose a voice that matches your brand and audience.
LLM (Large Language Model) is the agent's brain: it determines response quality, response speed, and how well the agent follows instructions. Choose a model based on your requirements for speed, cost, and task complexity.
STT (Speech-to-Text) converts the caller's speech into text. For better understanding, select a high-accuracy model and set the correct language.
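Conceptually, in the STT + LLM + TTS pipeline each conversational turn passes through the three components in order: speech-to-text first, then the language model, then text-to-speech. The Python sketch below illustrates only that flow. `CascadedPipeline`, `handle_turn`, and the `fake_*` functions are hypothetical names, not part of this product, and the stubs stand in for real models so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadedPipeline:
    """Hypothetical model of one STT + LLM + TTS conversational turn."""
    stt: Callable[[bytes], str]   # speech-to-text: caller audio -> transcript
    llm: Callable[[str], str]     # language model: transcript -> reply text
    tts: Callable[[str], bytes]   # text-to-speech: reply text -> agent audio

    def handle_turn(self, caller_audio: bytes) -> bytes:
        # Each stage consumes the previous stage's output.
        transcript = self.stt(caller_audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Hypothetical stand-ins so the sketch runs without any real models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedPipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
result = pipeline.handle_turn(b"hello")
print(result)  # b'You said: hello'
```

Because each stage is a separate component, you can change one of them (for example, a different voice) without touching the others, which is the extra control this mode offers.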
Enable Noise Cancellation to remove background noise and improve call clarity, especially in noisy environments.
Configuring the pipeline mode, models, voice, and noise settings properly helps your voice agent sound natural, intelligent, and professional.

Updated on May 21, 2026