Why use custom recordings?
- Reduced TTS cost — Pre-recorded audio is played directly, so you are not charged for TTS synthesis on those segments.
- Emotional variance — Real recordings carry natural intonation and emotion that TTS cannot fully replicate.
- Lower latency — Playing a pre-recorded clip is faster than synthesizing text at runtime.
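As a rough illustration of the cost point, the sketch below counts how many characters still go to TTS once some lines are served from recordings. The function name, the sample lines, and the per-character price are all hypothetical; check your provider's actual pricing before relying on any figure.

```python
def tts_characters(lines, recorded):
    """Count characters that still need TTS synthesis.

    lines: list of (name, text) pairs spoken during a call.
    recorded: set of names covered by pre-recorded audio.
    """
    return sum(len(text) for name, text in lines if name not in recorded)


# Hypothetical call script; names and texts are illustrative only.
call_lines = [
    ("greeting", "Hi, thanks for picking up!"),
    ("venue", "The event is at the main hall."),
    ("dynamic_1", "Sure, I can repeat that."),
]

PRICE_PER_CHAR = 0.00003  # assumed price, not a real provider rate

all_tts = tts_characters(call_lines, recorded=set())
mixed = tts_characters(call_lines, recorded={"greeting", "venue"})
print(f"TTS only: {all_tts} chars, with recordings: {mixed} chars")
print(f"Estimated saving per call: {(all_tts - mixed) * PRICE_PER_CHAR:.5f}")
```

Only the dynamic line is counted in the second case, which is the saving the first bullet describes.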
Prerequisites
- A TTS provider that supports voice cloning (e.g., Cartesia, ElevenLabs, or Deepgram).
- An API key for your chosen TTS provider, configured in Voice settings.
Step 1: Clone your voice
Clone your voice with your TTS provider so that dynamically generated speech sounds similar to your recordings. For example, with Cartesia:
- Go to Cartesia and navigate to Instant Clone.
- Record a short audio clip (up to 10 seconds) of your voice.
- Give the clone a name and select your language.
- Copy the Voice ID — you will need it in the next step.
You can use any TTS provider that supports voice cloning. The steps will vary by provider, but the key output is always a Voice ID tied to your cloned voice.
Step 2: Configure the cloned voice in Dograh
- Go to your agent’s Model Configuration in the Dograh dashboard.
- Under voice settings, select Add Voice ID manually.
- Paste the Voice ID from your cloned voice.
- Make sure the provider matches where you cloned your voice (e.g., Cartesia).
- Enter the provider’s API key if you haven’t already.
- Save the configuration.
Step 3: Upload recordings
Navigate to your agent in the workflow builder and open the Recordings panel. You can either upload pre-recorded audio files or record directly in the browser. For each recording:
- Click Record (or upload a file).
- Speak the exact phrase you want the agent to use.
- Give the recording a descriptive name (e.g., greeting, invitation, venue).
- Verify the transcription is correct, and edit it if needed.
- Click Upload.
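If you upload files instead of recording in the browser, a quick local check can catch overly long or oddly formatted clips first. The sketch below uses Python's standard `wave` module and assumes PCM WAV input; the 15-second limit and the mono preference are illustrative assumptions, not documented Dograh requirements.

```python
import wave

MAX_SECONDS = 15.0  # hypothetical limit for a "concise" clip; not a Dograh rule


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "duration_s": frames / rate,
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
        }


def check_recording(path, max_seconds=MAX_SECONDS):
    """Return a list of warnings; an empty list means nothing was flagged."""
    info = describe_wav(path)
    warnings = []
    if info["duration_s"] > max_seconds:
        warnings.append(
            f"{path}: {info['duration_s']:.1f}s is longer than {max_seconds}s"
        )
    if info["channels"] != 1:
        warnings.append(
            f"{path}: {info['channels']} channels; mono is usually enough for speech"
        )
    return warnings
```

For example, `check_recording("greeting.wav")` returns an empty list when the (hypothetical) file is a short mono clip.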
Step 4: Build the workflow
Open your agent’s workflow and write the conversation flow in natural language. To insert a recording, type @ in the prompt editor; this shows a list of all available recordings scoped to your current Voice ID.
For any user question that falls outside your recordings, the agent automatically generates a dynamic response using the LLM, which is then synthesized using your cloned voice via TTS.
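Conceptually, each agent turn ends up as a mix of pre-recorded clips and synthesized speech. The sketch below illustrates that split by scanning a line for `@name` references and separating recording segments from text that would go to TTS. It is a simplified model with hypothetical names (`plan_playback`, the sample script), not Dograh's actual runtime logic.

```python
import re

MENTION = re.compile(r"@(\w+)")


def plan_playback(script, recordings):
    """Split a script into ("recording", name) and ("tts", text) segments.

    Unknown @names are left in the text and sent to TTS unchanged.
    """
    segments = []
    pos = 0
    for match in MENTION.finditer(script):
        name = match.group(1)
        if name not in recordings:
            continue  # not a known recording; keep it as plain text
        before = script[pos:match.start()].strip()
        if before:
            segments.append(("tts", before))
        segments.append(("recording", name))
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append(("tts", tail))
    return segments


plan = plan_playback(
    "@greeting Your table is booked for 7 pm. @venue",
    recordings={"greeting", "venue"},
)
print(plan)
```

Here the greeting and venue clips would play as recorded, while the sentence between them would be synthesized in the cloned voice, which is why matching the clone to your recordings matters.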
Tips for best results
- Record in a quiet environment to improve audio quality and consistency with the cloned voice.
- Use your provider's professional cloning option (when available) and provide more sample audio for a higher-quality voice clone.
- Keep recordings concise — short, focused clips work best for specific conversation moments.
- Review call recordings after testing to identify where the transition between pre-recorded and dynamic audio can be improved.