Audio Inputs
Send audio files to speech-capable models through the Agnic AI Gateway
Send audio files to compatible models for transcription, analysis, and processing. Agnic supports common audio formats with automatic routing to audio-capable models.
Overview
Audio requests go through the /v1/chat/completions API, using a content part of type input_audio. Each part carries the base64-encoded audio data together with a format field (for example, wav).
Direct URLs are not supported for audio content; the file must be sent inline as base64.
Sending Audio Files
Python
```python
import base64

from openai import OpenAI


def encode_audio(audio_path):
    """Read an audio file and return its contents as a base64 string."""
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


client = OpenAI(
    api_key="agnic_tok_YOUR_TOKEN",
    base_url="https://api.agnic.ai/v1",
)

# Read and encode the audio file
audio_path = "path/to/your/audio.wav"
base64_audio = encode_audio(audio_path)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please transcribe this audio file.",
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": base64_audio,
                        "format": "wav",
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```
JavaScript
```javascript
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'agnic_tok_YOUR_TOKEN',
  baseURL: 'https://api.agnic.ai/v1'
});

// Read and encode the audio file
const audioPath = 'path/to/your/audio.wav';
const audioBuffer = fs.readFileSync(audioPath);
const base64Audio = audioBuffer.toString('base64');

const response = await client.chat.completions.create({
  model: 'google/gemini-2.0-flash',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Please transcribe this audio file.'
        },
        {
          type: 'input_audio',
          input_audio: {
            data: base64Audio,
            format: 'wav'
          }
        }
      ]
    }
  ]
});

console.log(response.choices[0].message.content);
```
cURL
```shell
# First, encode your audio file as a single line. Some base64 tools wrap
# their output, and embedded newlines would break the JSON string.
AUDIO_BASE64=$(base64 < audio.wav | tr -d '\n')

curl https://api.agnic.ai/v1/chat/completions \
  -H "Authorization: Bearer agnic_tok_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.0-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Transcribe this audio"},
          {
            "type": "input_audio",
            "input_audio": {
              "data": "'"$AUDIO_BASE64"'",
              "format": "wav"
            }
          }
        ]
      }
    ]
  }'
```
Supported Audio Formats
| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Uncompressed, best quality |
| MP3 | .mp3 | Compressed, widely supported |
| FLAC | .flac | Lossless compression |
| M4A | .m4a | AAC encoded |
| OGG | .ogg | Vorbis encoded |
| AAC | .aac | Advanced Audio Coding |
Check your model's documentation for specific format support. Not all models support all formats.
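The format field has to agree with the file being sent, so it can help to derive it from the file name instead of hard-coding it. The sketch below is illustrative only: `audio_format_from_path` and `SUPPORTED_EXTENSIONS` are hypothetical names, and it assumes the format value is the extension without the dot, which this page only confirms for wav.

```python
from pathlib import Path

# Assumption: the "format" value matches the file extension without the dot.
# This page only shows "wav" in a request; the rest is inferred from the table.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".aac"}


def audio_format_from_path(audio_path: str) -> str:
    """Return a format string for the input_audio part, or raise ValueError."""
    suffix = Path(audio_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio extension: {suffix or '(none)'}")
    return suffix.lstrip(".")
```

Rejecting unknown extensions locally avoids a round trip to the gateway, but the chosen model may still refuse a format that appears in the table.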
Compatible Models
Models with audio processing capabilities:
| Provider | Models | Capabilities |
|---|---|---|
| Google | Gemini 2.0 Flash, Gemini 1.5 Pro | Transcription, analysis |

| OpenAI | GPT-4o Audio | Transcription, understanding |
Use Cases
- Transcription - Convert speech to text
- Audio analysis - Understand audio content
- Meeting summaries - Summarize recorded meetings
- Voice commands - Process spoken instructions
- Podcast analysis - Extract insights from podcasts
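Each of these use cases uses the same request shape and differs only in the text instruction. As a rough sketch, the helper below builds the messages list for a given task; `build_audio_messages` and the prompt wording in `TASK_PROMPTS` are hypothetical and not part of the Agnic API.

```python
# Hypothetical prompt wording for each use case; adjust to your needs.
TASK_PROMPTS = {
    "transcribe": "Please transcribe this audio file.",
    "analyze": "Describe what is happening in this audio.",
    "summarize": "Summarize the key points and action items in this recording.",
    "command": "Extract the spoken instruction as a short imperative sentence.",
}


def build_audio_messages(task: str, base64_audio: str, audio_format: str) -> list:
    """Build a chat messages list with one text part and one input_audio part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": TASK_PROMPTS[task]},
                {
                    "type": "input_audio",
                    "input_audio": {"data": base64_audio, "format": audio_format},
                },
            ],
        }
    ]
```

The returned list can be passed as the messages argument of the chat completions call.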
Best Practices
- Use appropriate formats - WAV for quality, MP3 for smaller payloads
- Keep files small - Base64 encoding adds about 33% to the payload, so compress or trim long recordings
- Use clear audio - Cleaner recordings produce better results
- Specify the task - Tell the model what you want (transcribe, summarize, etc.)
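Because audio is sent inline as base64, the request body is larger than the file on disk: every 3 raw bytes become 4 encoded characters. The sketch below estimates the encoded size before a file is read into memory. The function names are illustrative, and this page does not state a size limit, so none is assumed here.

```python
import os


def base64_size(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def encoded_size_for_file(audio_path: str) -> int:
    """Estimate the base64 payload size of a file without reading it."""
    return base64_size(os.path.getsize(audio_path))
```

For instance, one minute of 16 kHz, 16-bit mono WAV is roughly 1.9 MB on disk and about 2.6 MB once encoded.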
Audio Quality Tips
| Factor | Recommendation |
|---|---|
| Sample rate | 16 kHz minimum, 44.1 kHz ideal |
| Bit depth | 16-bit or higher |
| Channels | Mono is often sufficient |
| Noise | Minimize background noise |
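For WAV files, the first three factors in the table can be checked locally before uploading. The following is a minimal sketch using Python's standard wave module, which reads uncompressed PCM WAV only; `describe_wav` and `quality_warnings` are hypothetical helper names, and the thresholds simply restate the table.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav_file.getsampwidth() * 8,
            "channels": wav_file.getnchannels(),
            "duration_s": wav_file.getnframes() / rate,
        }


def quality_warnings(info: dict) -> list:
    """Compare WAV properties against the recommendations in the table."""
    warnings = []
    if info["sample_rate_hz"] < 16000:
        warnings.append("Sample rate is below the 16 kHz minimum.")
    if info["bit_depth"] < 16:
        warnings.append("Bit depth is below 16-bit.")
    if info["channels"] > 1:
        warnings.append("Multi-channel audio; mono is often sufficient.")
    return warnings
```

Background noise cannot be judged from the header, so this check covers only the format-level recommendations.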
Troubleshooting
Audio not processing?
- Verify the model supports audio input
- Check that the format is supported
- Ensure the file isn't corrupted
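A common cause of the last two problems is a file whose contents do not match its extension. The sketch below guesses the container from the first few bytes using well-known file signatures. It is a rough heuristic, not a validator: `sniff_audio_format` is a hypothetical helper, MP3 files without an ID3 tag and raw AAC streams are not recognized, and the ftyp check matches any MP4-family container, not only M4A.

```python
from typing import Optional


def sniff_audio_format(path: str) -> Optional[str]:
    """Guess the audio container from its leading bytes, or return None."""
    with open(path, "rb") as audio_file:
        header = audio_file.read(12)

    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[4:8] == b"ftyp":  # MP4-family container, which includes M4A
        return "m4a"
    if header[:3] == b"ID3":  # MP3 with an ID3v2 tag
        return "mp3"
    return None  # unknown, truncated, or a format this sketch does not cover
```

If the result disagrees with the format value in the request, or comes back as None for a file expected to be WAV or FLAC, the file is worth re-exporting before retrying.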
Poor transcription?
- Improve audio quality
- Reduce background noise
- Try a different audio format