AgnicPay

Audio Inputs

Send audio files to speech-capable models through the AgnicPay AI Gateway

Send audio files to compatible models for transcription, analysis, and processing. AgnicPay supports common audio formats with automatic routing to audio-capable models.

Overview

Audio requests go through the /v1/chat/completions API using the input_audio content type. Each audio file must be base64-encoded and sent together with its format; direct URLs are not supported for audio content.
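The request shape described above can be sketched without making a network call. This is a minimal illustration, not part of the AgnicPay SDK: the helper names build_audio_part and build_audio_message are hypothetical, and only the input_audio structure (data plus format) comes from this page.

```python
import base64


def build_audio_part(audio_bytes: bytes, audio_format: str) -> dict:
    """Return an input_audio content part (hypothetical helper)."""
    return {
        "type": "input_audio",
        "input_audio": {
            # The API takes base64 text, not raw bytes or a URL.
            "data": base64.b64encode(audio_bytes).decode("utf-8"),
            "format": audio_format,
        },
    }


def build_audio_message(prompt: str, audio_bytes: bytes, audio_format: str) -> dict:
    """Combine a text instruction and one audio part into a single user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            build_audio_part(audio_bytes, audio_format),
        ],
    }


# Placeholder bytes stand in for a real file's contents.
message = build_audio_message("Please transcribe this audio file.", b"RIFF", "wav")
print(message["content"][1]["input_audio"]["format"])
```

The resulting dictionary can be passed as one entry of the messages list in a chat completions request.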


Sending Audio Files

Python

from openai import OpenAI
import base64
 
def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')
 
client = OpenAI(
    api_key="agnic_tok_YOUR_TOKEN",
    base_url="https://api.agnic.ai/v1"
)
 
# Read and encode the audio file
audio_path = "path/to/your/audio.wav"
base64_audio = encode_audio(audio_path)
 
response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please transcribe this audio file."
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": base64_audio,
                        "format": "wav"
                    }
                }
            ]
        }
    ]
)
 
print(response.choices[0].message.content)

JavaScript

import OpenAI from 'openai';
import fs from 'fs';
 
const client = new OpenAI({
  apiKey: 'agnic_tok_YOUR_TOKEN',
  baseURL: 'https://api.agnic.ai/v1'
});
 
// Read and encode the audio file
const audioPath = 'path/to/your/audio.wav';
const audioBuffer = fs.readFileSync(audioPath);
const base64Audio = audioBuffer.toString('base64');
 
const response = await client.chat.completions.create({
  model: 'google/gemini-2.0-flash',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Please transcribe this audio file.'
        },
        {
          type: 'input_audio',
          input_audio: {
            data: base64Audio,
            format: 'wav'
          }
        }
      ]
    }
  ]
});
 
console.log(response.choices[0].message.content);

cURL

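# First, encode your audio file as a single line.
# GNU base64 wraps output at 76 characters, which would break the JSON body.
AUDIO_BASE64=$(base64 < audio.wav | tr -d '\n')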
 
curl https://api.agnic.ai/v1/chat/completions \
  -H "Authorization: Bearer agnic_tok_YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.0-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Transcribe this audio"},
          {
            "type": "input_audio",
            "input_audio": {
              "data": "'"$AUDIO_BASE64"'",
              "format": "wav"
            }
          }
        ]
      }
    ]
  }'

Supported Audio Formats

| Format | Extension | Notes |
| --- | --- | --- |
| WAV | .wav | Uncompressed, best quality |
| MP3 | .mp3 | Compressed, widely supported |
| FLAC | .flac | Lossless compression |
| M4A | .m4a | AAC encoded |
| OGG | .ogg | Vorbis encoded |
| AAC | .aac | Advanced Audio Coding |

Check your model's documentation for specific format support. Not all models support all formats.
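One way to avoid hardcoding the format field is to derive it from the file extension. The sketch below assumes the format value is simply the extension without the dot, which matches the "wav" examples on this page but is not confirmed for the other formats; audio_format_for is a hypothetical helper name.

```python
from pathlib import Path

# Extensions listed in the table above.
# Assumption: the "format" value equals the extension without the dot.
SUPPORTED_FORMATS = {
    ".wav": "wav",
    ".mp3": "mp3",
    ".flac": "flac",
    ".m4a": "m4a",
    ".ogg": "ogg",
    ".aac": "aac",
}


def audio_format_for(path: str) -> str:
    """Map a file path to a format string, rejecting unlisted extensions."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio extension: {suffix or '(none)'}")
    return SUPPORTED_FORMATS[suffix]


print(audio_format_for("recordings/Episode-12.MP3"))
```

Failing early on an unlisted extension keeps an unsupported file from being encoded and uploaded, though the chosen model may still reject a listed format.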


Compatible Models

Models with audio processing capabilities:

| Provider | Models | Capabilities |
| --- | --- | --- |
| Google | Gemini 2.0 Flash, Gemini 1.5 Pro | Transcription, analysis |
| OpenAI | GPT-4o Audio | Transcription, understanding |

Use Cases

  • Transcription - Convert speech to text
  • Audio analysis - Understand audio content
  • Meeting summaries - Summarize recorded meetings
  • Voice commands - Process spoken instructions
  • Podcast analysis - Extract insights from podcasts

Best Practices

  1. Use appropriate formats - WAV for quality, MP3 for size
  2. Keep files small - Compress or split long recordings; base64 encoding adds about 33% to the payload size
  3. Use clear audio - Higher-quality recordings produce better results
  4. Specify the task - Tell the model what you want (transcribe, summarize, etc.)
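Because audio travels as base64 text, the request body is larger than the file on disk: every 3 bytes of audio become 4 characters. A rough pre-flight check might look like the sketch below. The 20 MB budget is an arbitrary placeholder, since this page does not state a size limit, and both function names are hypothetical.

```python
def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 text produced from raw_bytes bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_budget(raw_bytes: int, max_payload_bytes: int) -> bool:
    """True if the encoded audio alone stays within a caller-chosen budget."""
    return base64_size(raw_bytes) <= max_payload_bytes


# Assumption: 20 MB is a placeholder budget, not a documented AgnicPay limit.
BUDGET = 20 * 1024 * 1024

wav_size = 30_000_000  # e.g. an uncompressed recording
mp3_size = 3_000_000   # the same recording after compression

print(base64_size(mp3_size), fits_budget(mp3_size, BUDGET))
print(base64_size(wav_size), fits_budget(wav_size, BUDGET))
```

In practice the raw size would come from os.path.getsize(audio_path), checked before the file is read and encoded.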

Audio Quality Tips

| Factor | Recommendation |
| --- | --- |
| Sample rate | 16 kHz minimum, 44.1 kHz ideal |
| Bit depth | 16-bit or higher |
| Channels | Mono is often sufficient |
| Noise | Minimize background noise |
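For WAV files, the first three rows of the table can be checked locally with Python's standard wave module before uploading. This is a sketch under a few assumptions: the wave module reads uncompressed PCM WAV only, the thresholds are the ones listed above, and check_wav and make_test_wav are hypothetical helper names. Noise cannot be measured this way.

```python
import io
import wave


def check_wav(source) -> list:
    """Return warnings for a PCM WAV file given as a path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()

    warnings = []
    if rate < 16000:
        warnings.append(f"sample rate {rate} Hz is below the 16 kHz minimum")
    if bits < 16:
        warnings.append(f"bit depth {bits}-bit is below 16-bit")
    if channels > 1:
        warnings.append(f"{channels} channels; mono is often sufficient")
    return warnings


def make_test_wav(rate, sample_width, channels, frames=800) -> io.BytesIO:
    """Build a short placeholder clip in memory so the check runs without a real file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(rate)
        out.writeframes(bytes(frames * sample_width * channels))
    buffer.seek(0)
    return buffer


# An 8 kHz, 8-bit, stereo clip triggers all three warnings.
print(check_wav(make_test_wav(rate=8000, sample_width=1, channels=2)))
# A 16 kHz, 16-bit, mono clip passes.
print(check_wav(make_test_wav(rate=16000, sample_width=2, channels=1)))
```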

Troubleshooting

Audio not processing?

  • Verify the model supports audio input
  • Check that the format is supported
  • Ensure the file isn't corrupted
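A quick local check for the last two points is to compare the file's leading bytes against the signatures these formats usually begin with. The sketch below is a heuristic, not a full validator: sniff_audio_format is a hypothetical helper, an "ftyp" box identifies MP4-family files in general (treated here as m4a), and a matching signature does not prove the rest of the file is intact.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the audio format from the first bytes of a file, or return None."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[4:8] == b"ftyp":
        return "m4a"  # MP4-family container; assumed to hold AAC audio
    if header[:3] == b"ID3":
        return "mp3"  # MP3 file starting with an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF:
        if (header[1] & 0xF6) == 0xF0:
            return "aac"  # ADTS sync word with layer bits 00
        if (header[1] & 0xE0) == 0xE0 and (header[1] & 0x06) != 0:
            return "mp3"  # MPEG audio frame sync with a valid layer
    return None


def format_matches(header: bytes, declared_format: str) -> bool:
    """True if the sniffed format agrees with the format sent in the request."""
    return sniff_audio_format(header) == declared_format


print(format_matches(b"RIFF\x24\x08\x00\x00WAVEfmt ", "wav"))
print(format_matches(b"\xff\xfb\x90\x00", "wav"))
```

To use it on a real file, read the first 12 bytes with open(path, "rb").read(12) and compare the result with the format value you plan to send.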

Poor transcription?

  • Improve audio quality
  • Reduce background noise
  • Try a different audio format
