Audio Inputs
Send audio files to speech-capable models through the AgnicPay AI Gateway
Audio Inputs
Send audio files to compatible models for transcription, analysis, and processing. AgnicPay supports common audio formats with automatic routing to audio-capable models.
Overview
Audio requests are available via the /v1/chat/completions API with the input_audio content type. Audio files must be base64-encoded and include the format specification.
Audio files must be base64-encoded - direct URLs are not supported for audio content.
Sending Audio Files
Python
JavaScript
cURL
Supported Audio Formats
| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Uncompressed, best quality |
| MP3 | .mp3 | Compressed, widely supported |
| FLAC | .flac | Lossless compression |
| M4A | .m4a | AAC encoded |
| OGG | .ogg | Vorbis encoded |
| AAC | .aac | Advanced Audio Coding |
Check your model's documentation for specific format support. Not all models support all formats.
Compatible Models
Models with audio processing capabilities:
| Provider | Models | Capabilities |
|---|---|---|
| Gemini 2.0 Flash, Gemini 1.5 Pro | Transcription, analysis | |
| OpenAI | GPT-4o Audio | Transcription, understanding |
Use Cases
- Transcription - Convert speech to text
- Audio analysis - Understand audio content
- Meeting summaries - Summarize recorded meetings
- Voice commands - Process spoken instructions
- Podcast analysis - Extract insights from podcasts
Best Practices
- Use appropriate formats - WAV for quality, MP3 for size
- Keep files reasonable - Compress long audio files
- Clear audio - Better quality audio = better results
- Specify the task - Tell the model what you want (transcribe, summarize, etc.)
Audio Quality Tips
| Factor | Recommendation |
|---|---|
| Sample rate | 16kHz minimum, 44.1kHz ideal |
| Bit depth | 16-bit or higher |
| Channels | Mono is often sufficient |
| Noise | Minimize background noise |
Troubleshooting
Audio not processing?
- Verify the model supports audio input
- Check that the format is supported
- Ensure the file isn't corrupted
Poor transcription?
- Improve audio quality
- Reduce background noise
- Try a different audio format