- The Models & Voice section controls how your voice agent listens, thinks, and speaks during conversations.
- First, choose the Pipeline Mode:
  - The STT + LLM + TTS pipeline gives more control: you configure speech-to-text, the language model, and text-to-speech separately.
  - The Realtime model uses a single unified setup, so configuration is faster.
- TTS (Text-to-Speech) decides how your agent sounds — select the voice model and choose a voice that matches your brand and audience.
- LLM (Large Language Model) is the brain of the agent: it controls response quality, speed, and how well instructions are followed. Choose a model by weighing response speed, cost, and the complexity of the conversations it must handle.
- STT (Speech-to-Text) converts the user's speech into text. Select a high-accuracy model and the correct language so the agent understands callers reliably.
- Enable Noise Cancellation to remove background noise and improve call clarity, especially in noisy environments.
- Properly configuring pipeline mode, models, voice, and noise settings ensures your voice agent sounds natural, intelligent, and professional.
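
The settings above can be pictured as a single configuration object. The sketch below is illustrative only: `VoiceAgentConfig`, `validate`, every field name, and the model and voice strings are hypothetical placeholders, not the product's actual API or model catalog. It shows which settings each Pipeline Mode would need, assuming the STT + LLM + TTS pipeline requires all three stages to be configured and the Realtime mode requires only one model.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical structure: field names mirror the settings described above,
# not any real product API.
@dataclass
class VoiceAgentConfig:
    pipeline_mode: str                      # "pipeline" (STT + LLM + TTS) or "realtime"
    stt_model: Optional[str] = None         # speech-to-text model
    stt_language: Optional[str] = None      # language the caller speaks
    llm_model: Optional[str] = None         # model that generates responses
    tts_model: Optional[str] = None         # text-to-speech model
    tts_voice: Optional[str] = None         # voice used by the TTS model
    realtime_model: Optional[str] = None    # single model used in realtime mode
    noise_cancellation: bool = False


def validate(config: VoiceAgentConfig) -> List[str]:
    """Return a list of missing or invalid settings (empty if none found)."""
    problems: List[str] = []
    if config.pipeline_mode == "pipeline":
        # Each stage is configured separately in this mode.
        for name in ("stt_model", "stt_language", "llm_model", "tts_model", "tts_voice"):
            if getattr(config, name) is None:
                problems.append(f"missing {name}")
    elif config.pipeline_mode == "realtime":
        if config.realtime_model is None:
            problems.append("missing realtime_model")
    else:
        problems.append(f"unknown pipeline_mode: {config.pipeline_mode!r}")
    return problems


# Placeholder values for illustration only.
full_pipeline = VoiceAgentConfig(
    pipeline_mode="pipeline",
    stt_model="example-stt",
    stt_language="en",
    llm_model="example-llm",
    tts_model="example-tts",
    tts_voice="example-voice",
    noise_cancellation=True,
)
incomplete = VoiceAgentConfig(pipeline_mode="pipeline", llm_model="example-llm")
```

Calling `validate(full_pipeline)` returns an empty list, while `validate(incomplete)` lists the STT and TTS settings that still need a value. A check like this is one way to catch a half-configured pipeline before the agent takes a call.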