The core loop
Key components
Workflows (Agents)
The conversation logic. A workflow is a graph of nodes (conversation steps) connected by edges (conditional transitions). You define what the agent says, when it moves on, and what data it collects.

Runs
Every execution of a workflow creates a run. The run record holds the transcript, recording, extracted data, and cost information.

Telephony
The phone infrastructure. Dograh connects to your telephony provider (Twilio, Vonage, etc.) to place and receive calls. The audio streams between the caller and Dograh in real time.

Transcriber (STT)
Converts the caller’s speech to text in real time. Dograh sends the audio stream to your configured speech-to-text provider and uses the transcript to drive both the LLM and the final run record.

LLM Provider
Processes the transcript and the active node’s prompt to generate the agent’s next response. It also evaluates edge conditions to decide when to move the conversation forward.

Voice Synthesizer (TTS)
Converts the LLM’s text response to audio and streams it back to the caller. The choice of TTS provider and voice is configurable per agent.

How it fits together
When you trigger a call:
- Dograh instructs your telephony provider to dial the number
- When the caller answers, a real-time audio pipeline opens
- The caller’s speech is transcribed by the STT provider
- The transcript is sent to the LLM with the active node’s prompt and conversation history
- The LLM generates a response, which the TTS provider synthesizes to audio and streams to the caller
- When an edge condition is met, Dograh transitions to the next node
- When an end node is reached, the call ends
- After the call, context is extracted, webhooks fire, and the run record is saved
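
The steps above can be pictured as a small loop over a graph of nodes and edges. The following is only a rough sketch of that idea, not Dograh's actual code or API: the names `Node`, `Edge`, `Run`, `fake_llm`, and `run_call` are hypothetical, the STT results are passed in as plain strings, the LLM is replaced by a stub, and edge conditions are ordinary Python functions standing in for the LLM's evaluation. Telephony, TTS streaming, and the post-call steps are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Edge:
    target: str                             # name of the node to move to
    condition: Callable[[List[str]], bool]  # stand-in for the LLM's edge evaluation


@dataclass
class Node:
    name: str
    prompt: str
    edges: List[Edge] = field(default_factory=list)
    is_end: bool = False


@dataclass
class Run:
    transcript: List[str] = field(default_factory=list)
    visited: List[str] = field(default_factory=list)


def fake_llm(prompt: str, history: List[str]) -> str:
    """Hypothetical stub for the LLM provider: echoes the active node's prompt."""
    return f"[{prompt}]"


def run_call(nodes: Dict[str, Node], start: str, caller_turns: List[str]) -> Run:
    run = Run()
    node = nodes[start]
    run.visited.append(node.name)
    for utterance in caller_turns:  # each item stands in for one STT result
        if node.is_end:             # reaching an end node ends the call
            break
        run.transcript.append(f"caller: {utterance}")
        reply = fake_llm(node.prompt, run.transcript)
        run.transcript.append(f"agent: {reply}")  # a real system would synthesize and stream this
        for edge in node.edges:     # take the first edge whose condition is met
            if edge.condition(run.transcript):
                node = nodes[edge.target]
                run.visited.append(node.name)
                break
    return run


# Illustrative two-node workflow: greet, then end once the caller gives a name.
nodes = {
    "greeting": Node(
        "greeting",
        "Greet the caller and ask for their name",
        edges=[Edge("goodbye", lambda t: any("my name is" in line.lower() for line in t))],
    ),
    "goodbye": Node("goodbye", "Thank the caller and hang up", is_end=True),
}

run = run_call(nodes, "greeting", ["Hello?", "My name is Ada", "Are you still there?"])
print(run.visited)
```

In this sketch the third caller turn is never processed, because the workflow has already reached the end node by then. The `Run` object here only mirrors the transcript part of a run record; recording, extracted data, and cost are omitted.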