Custom recordings allow you to build hybrid voice agents that use your own pre-recorded audio for key parts of the conversation, while falling back to LLM-generated speech (via a cloned voice) for dynamic responses. This combines the emotional depth of real human speech with the flexibility of AI-generated dialogue.

Why use custom recordings?

  • Reduced TTS cost — Pre-recorded audio is played directly, so you are not charged for TTS synthesis on those segments.
  • Emotional variance — Real recordings carry natural intonation and emotion that TTS cannot fully replicate.
  • Lower latency — Playing a pre-recorded clip is faster than synthesizing text at runtime.
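To see how the cost saving scales, here is a back-of-envelope sketch. It assumes TTS is billed per character and uses a hypothetical per-character rate and made-up agent lines; substitute your provider's actual pricing and your own script.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    text: str          # what the agent says in this turn
    prerecorded: bool  # True if a custom recording plays instead of TTS


def billable_chars(turns: list[Turn]) -> int:
    """Count characters sent to TTS; pre-recorded turns are played directly and skipped."""
    return sum(len(t.text) for t in turns if not t.prerecorded)


# Hypothetical per-character rate, for illustration only (not a real provider price).
PRICE_PER_CHAR = 0.00003

turns = [
    Turn("Hi, this is Maya from Acme. Do you have a minute to talk?", prerecorded=True),
    Turn("Our starter plan is twenty dollars a month.", prerecorded=True),
    Turn("Good question, let me check that for you.", prerecorded=False),
]

# Compare the same conversation spoken entirely via TTS versus the hybrid setup.
all_tts = billable_chars([Turn(t.text, prerecorded=False) for t in turns])
hybrid = billable_chars(turns)
print(f"All-TTS: {all_tts} chars, ~{all_tts * PRICE_PER_CHAR:.4f}")
print(f"Hybrid:  {hybrid} chars, ~{hybrid * PRICE_PER_CHAR:.4f}")
```

The saving is the share of spoken characters covered by recordings; plugging in real rates and call volumes gives a rough estimate for your own workload.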

Prerequisites

  • A TTS provider that supports voice cloning (e.g., Cartesia, ElevenLabs, or Deepgram).
  • An API key for your chosen TTS provider, configured in Voice settings.

Step 1: Clone your voice

Clone your voice with your TTS provider so that dynamically generated speech sounds similar to your recordings. For example, with Cartesia:
  1. Go to Cartesia and navigate to Instant Clone.
  2. Record a short audio clip (up to 10 seconds) of your voice.
  3. Give the clone a name and select your language.
  4. Copy the Voice ID — you will need it in the next step.
You can use any TTS provider that supports voice cloning. The steps will vary by provider, but the key output is always a Voice ID tied to your cloned voice.

Step 2: Configure the cloned voice in Dograh

  1. Go to your agent’s Model Configuration in the Dograh dashboard.
  2. Under voice settings, select Add Voice ID manually.
  3. Paste the Voice ID from your cloned voice.
  4. Make sure the provider matches where you cloned your voice (e.g., Cartesia).
  5. Enter the provider’s API key if you haven’t already.
  6. Save the configuration.
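The settings in this step reduce to three values that must agree with the clone you created in Step 1. The sketch below models them with hypothetical field names (`provider`, `voice_id`, `api_key`); it is not Dograh's configuration schema, only an illustration of the consistency checks worth doing before you save.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClonedVoice:
    provider: str  # where the clone was created, e.g. "cartesia"
    voice_id: str  # the Voice ID copied in Step 1


@dataclass(frozen=True)
class VoiceConfig:
    provider: str  # provider selected in the agent's voice settings
    voice_id: str  # Voice ID pasted via "Add Voice ID manually"
    api_key: str   # provider API key


def config_problems(cfg: VoiceConfig, clone: ClonedVoice) -> list[str]:
    """Return human-readable problems; an empty list means the config looks consistent."""
    problems = []
    if cfg.provider.lower() != clone.provider.lower():
        problems.append(
            f"provider {cfg.provider!r} does not match clone provider {clone.provider!r}"
        )
    if cfg.voice_id.strip() != clone.voice_id:
        problems.append("voice ID does not match the cloned voice")
    if not cfg.api_key.strip():
        problems.append("missing API key for the TTS provider")
    return problems


# Hypothetical IDs and keys, for illustration only.
clone = ClonedVoice("cartesia", "voice-123")
good = VoiceConfig("Cartesia", "voice-123", "sk-test")
bad = VoiceConfig("elevenlabs", "voice-123", "")
print(config_problems(good, clone))
print(config_problems(bad, clone))
```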

Step 3: Upload recordings

Navigate to the Recordings page in the Dograh dashboard. Recordings are shared across all agents in your organization. You can either upload pre-recorded audio files or record directly in the browser. For each recording:
  1. Click Upload Recording.
  2. Choose an audio file or click Record to record in the browser.
  3. Verify the transcription is correct — edit it if needed.
  4. Click Upload.
You can rename a recording’s ID at any time by clicking the edit icon next to it in the recordings list.
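Conceptually, each recording pairs an ID with its audio and a verified transcript, and the collection belongs to the organization rather than to a single agent. A minimal in-memory sketch of that model follows; the class and method names are hypothetical (not Dograh's API), and it assumes recording IDs are unique within an organization.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    recording_id: str  # the ID you reference from prompts
    audio_path: str    # uploaded file or in-browser capture
    transcript: str    # transcription, verified or edited before upload


@dataclass
class RecordingLibrary:
    """Recordings for one organization, shared by all of its agents."""
    recordings: dict[str, Recording] = field(default_factory=dict)

    def upload(self, rec: Recording) -> None:
        if rec.recording_id in self.recordings:
            raise ValueError(f"recording ID {rec.recording_id!r} already exists")
        self.recordings[rec.recording_id] = rec

    def rename(self, old_id: str, new_id: str) -> None:
        """Change a recording's ID, keeping its audio and transcript."""
        if new_id == old_id:
            return
        if new_id in self.recordings:
            raise ValueError(f"recording ID {new_id!r} already exists")
        rec = self.recordings.pop(old_id)  # raises KeyError if old_id is unknown
        rec.recording_id = new_id
        self.recordings[new_id] = rec


library = RecordingLibrary()
library.upload(Recording("greeting", "greeting.wav", "Hi, thanks for calling!"))
library.rename("greeting", "intro_greeting")
print(sorted(library.recordings))  # ['intro_greeting']
```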

Step 4: Build the workflow

Open your agent’s workflow and write the conversation flow in natural language. To insert a recording, type @ in the prompt editor; this opens a list of all recordings available in your organization. When a user asks something that none of your recordings cover, the agent automatically generates a response with the LLM and synthesizes it in your cloned voice via TTS.
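The runtime behavior amounts to a simple routing rule: play a recording when the workflow calls for one, otherwise generate text with the LLM and speak it with the cloned voice. The sketch below illustrates that rule under stated assumptions: it treats `@name` tokens in a prompt as recording references, and `generate_reply` and `synthesize` are hypothetical stand-ins for the LLM and TTS calls. How the agent decides which recording applies is out of scope, and this is not how Dograh implements it internally.

```python
import re
from typing import Callable, Optional

# Assumption: recording references look like "@some_id" (letters, digits, _ or -).
MENTION = re.compile(r"@([A-Za-z0-9_-]+)")


def referenced_recordings(prompt: str, available: set[str]) -> list[str]:
    """Return recording IDs mentioned in the prompt, in order, that exist in the library."""
    return [m for m in MENTION.findall(prompt) if m in available]


def respond(
    recording_id: Optional[str],
    user_text: str,
    library: dict[str, bytes],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Play the chosen recording if it exists; otherwise fall back to LLM + TTS."""
    if recording_id is not None and recording_id in library:
        return ("recording", library[recording_id])  # played directly, no TTS call
    reply = generate_reply(user_text)                 # dynamic LLM response
    return ("tts", synthesize(reply))                 # spoken in the cloned voice


# Hypothetical stand-ins for the LLM and the TTS provider.
def fake_llm(text: str) -> str:
    return f"Echo: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode()


library = {"greeting": b"<greeting audio>", "pricing": b"<pricing audio>"}
prompt = "Start with @greeting. If asked about cost, play @pricing. Never use @unknown."
print(referenced_recordings(prompt, set(library)))  # ['greeting', 'pricing']
print(respond("greeting", "hi", library, fake_llm, fake_tts))  # ('recording', b'<greeting audio>')
print(respond(None, "Do you ship to Canada?", library, fake_llm, fake_tts)[0])  # tts
```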

Tips for best results

  • Record in a quiet environment to improve audio quality and consistency with the cloned voice.
  • Use your provider’s professional cloning tier (when available) and provide more sample audio for a higher-quality voice clone.
  • Keep recordings concise — short, focused clips work best for specific conversation moments.
  • Review call recordings after testing to identify where the transition between pre-recorded and dynamic audio can be improved.