---
title: Audio Inputs
description: Send audio files to speech-capable models through the Agnic AI Gateway
---

# Audio Inputs

Send audio files to audio-capable models for transcription, summarization, and other analysis. Agnic accepts common audio formats and routes requests to audio-capable models automatically.

## Overview

Send audio through the `/v1/chat/completions` endpoint as a content part of type `input_audio`. Each part carries the base64-encoded audio in `data` and the file format in `format`.

<Callout type="warning">
  Audio files must be **base64-encoded**. Direct URLs are not supported for audio content.
</Callout>

---

## Sending Audio Files

### Python

```python
from openai import OpenAI
import base64

def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')

client = OpenAI(
    api_key="agnic_tok_YOUR_TOKEN",
    base_url="https://api.agnic.ai/v1"
)

# Read and encode the audio file
audio_path = "path/to/your/audio.wav"
base64_audio = encode_audio(audio_path)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please transcribe this audio file."
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": base64_audio,
                        "format": "wav"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

### JavaScript

```javascript
// Run as an ES module (for example, a .mjs file) so that top-level await works
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'agnic_tok_YOUR_TOKEN',
  baseURL: 'https://api.agnic.ai/v1'
});

// Read and encode the audio file
const audioPath = 'path/to/your/audio.wav';
const audioBuffer = fs.readFileSync(audioPath);
const base64Audio = audioBuffer.toString('base64');

const response = await client.chat.completions.create({
  model: 'google/gemini-2.0-flash',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Please transcribe this audio file.'
        },
        {
          type: 'input_audio',
          input_audio: {
            data: base64Audio,
            format: 'wav'
          }
        }
      ]
    }
  ]
});

console.log(response.choices[0].message.content);
```

### cURL

```bash
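# First, encode your audio file as a single line
# (GNU base64 wraps output at 76 characters, which would break the JSON string)
AUDIO_BASE64=$(base64 < audio.wav | tr -d '\n')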

curl https://api.agnic.ai/v1/chat/completions \
  -H "Authorization: Bearer agnic_tok_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.0-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Transcribe this audio"},
          {
            "type": "input_audio",
            "input_audio": {
              "data": "'"$AUDIO_BASE64"'",
              "format": "wav"
            }
          }
        ]
      }
    ]
  }'
```

---

## Supported Audio Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| WAV | `.wav` | Uncompressed, best quality |
| MP3 | `.mp3` | Compressed, widely supported |
| FLAC | `.flac` | Lossless compression |
| M4A | `.m4a` | Usually AAC audio in an MP4 container |
| OGG | `.ogg` | Usually Vorbis or Opus encoded |
| AAC | `.aac` | Advanced Audio Coding |

<Callout type="info">
  Check your model's documentation for specific format support. Not all models support all formats.
</Callout>
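
Since every request has to state its `format`, it can help to derive that value from the file name rather than hard-coding it. The sketch below maps the extensions in the table to format strings. It assumes the format string is simply the extension without the dot, which matches the `wav` examples above but is not confirmed for the other formats; `audio_format_for` is a hypothetical helper, not part of any SDK.

```python
from pathlib import Path

# Extensions from the table above, mapped to the value sent as "format".
# Assumption: the format string equals the extension without the dot.
SUPPORTED_FORMATS = {
    ".wav": "wav",
    ".mp3": "mp3",
    ".flac": "flac",
    ".m4a": "m4a",
    ".ogg": "ogg",
    ".aac": "aac",
}

def audio_format_for(path: str) -> str:
    """Return the format string for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio extension: {suffix or '(none)'}")
    return SUPPORTED_FORMATS[suffix]
```

Failing early on an unknown extension avoids spending a request on a file the model is unlikely to accept.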

---

## Compatible Models

Models with audio processing capabilities:

| Provider | Models | Capabilities |
|----------|--------|--------------|
| Google | Gemini 2.0 Flash, Gemini 1.5 Pro | Transcription, analysis |
| OpenAI | GPT-4o Audio | Transcription, understanding |

---

## Use Cases

- **Transcription** - Convert speech to text
- **Audio analysis** - Understand audio content
- **Meeting summaries** - Summarize recorded meetings
- **Voice commands** - Process spoken instructions
- **Podcast analysis** - Extract insights from podcasts

---

## Best Practices

1. **Use appropriate formats** - WAV for quality, MP3 for smaller files
2. **Keep files small** - Base64 encoding adds about 33% to the payload, so compress long recordings before sending
3. **Use clear audio** - Cleaner recordings generally produce more accurate results
4. **Specify the task** - Tell the model what you want (transcribe, summarize, etc.)
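
To judge whether a file is small enough to send, it helps to know how large it becomes once encoded: base64 turns every 3 bytes into 4 characters. The sketch below computes that size up front. The 20 MiB limit is a made-up placeholder for illustration, not a documented Agnic value, and both function names are hypothetical.

```python
def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 text for a payload of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)

# Hypothetical limit for illustration only; not a documented Agnic value.
MAX_PAYLOAD_BYTES = 20 * 1024 * 1024

def fits_in_request(raw_bytes: int, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    """True if the encoded audio alone stays within the assumed limit."""
    return base64_size(raw_bytes) <= limit
```

Combine this with `os.path.getsize(audio_path)` to check a file before reading and encoding it.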

---

## Audio Quality Tips

| Factor | Recommendation |
|--------|----------------|
| Sample rate | 16 kHz minimum, 44.1 kHz ideal |
| Bit depth | 16-bit or higher |
| Channels | Mono is often sufficient |
| Noise | Minimize background noise |
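
For WAV files, the first three rows of the table can be checked before upload with Python's standard `wave` module. This is a minimal sketch that only handles WAV input; `check_wav` and its default thresholds are illustrative names and values taken from the table, not an Agnic API.

```python
import wave

def check_wav(path_or_file, min_rate=16_000, min_sample_width=2):
    """Return a list of issues for a WAV file, based on the table above."""
    issues = []
    with wave.open(path_or_file, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
        channels = wav.getnchannels()
    if rate < min_rate:
        issues.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if width < min_sample_width:
        issues.append(f"bit depth {width * 8}-bit is below {min_sample_width * 8}-bit")
    if channels > 1:
        issues.append(f"{channels} channels; mono is often sufficient")
    return issues
```

An empty list means the file meets the sample rate, bit depth, and channel recommendations. Background noise cannot be judged from the header and still needs a listen.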

---

## Troubleshooting

**Audio not processing?**
- Verify the model supports audio input
- Check that the format is supported
- Ensure the file isn't corrupted

**Poor transcription?**
- Improve audio quality
- Reduce background noise
- Try a different audio format
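
One quick way to rule out a mislabeled or truncated file is to compare its leading bytes with the well-known signatures of each container. The sketch below is a heuristic, not a full validator: it does not recognize MP3 or AAC files that start directly with audio frames (no ID3 tag), and it reports any MP4-family file as `m4a`. `sniff_audio_format` is a hypothetical helper name.

```python
def sniff_audio_format(header: bytes):
    """Guess the container from the first 12 bytes of a file, or return None."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 that begins with an ID3v2 tag
    if header[4:8] == b"ftyp":
        return "m4a"  # MP4-family container; assumed here to hold audio
    return None

def sniff_file(path: str):
    """Read just enough of a file to run sniff_audio_format on it."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```

If the result differs from the `format` value in your request, or is `None` for a file you expected to be a WAV, FLAC, or OGG, the file is worth re-exporting before retrying.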

<Cards>
  <Card title="Video Inputs" href="/docs/ai-gateway/multimodal/video" />
  <Card title="Image Inputs" href="/docs/ai-gateway/multimodal/images" />
</Cards>
