audio-sniffer
(require (file "../audio-sniffer.rkt"))    package: base
This module provides functionality to detect audio file formats based on file contents (signature sniffing) and, optionally, file extensions.
The sniffer prefers binary inspection and falls back to the file extension only when content-based detection is inconclusive.
1 Overview
The detection strategy is as follows:
Read a prefix of the file (default 4096 bytes)
Match known binary signatures ("magic numbers")
Apply format-specific heuristics (e.g. MP3 frame sync, AAC ADTS)
For ISO-BMFF (MP4/M4A), scan both head and tail for codec markers
If still unknown, optionally fall back to file extension
The result is always a symbol: either the detected format or a status value.
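The first step of the strategy, reading a bounded prefix, might look like the following minimal sketch. The name read-prefix is hypothetical and not part of this module's documented API; only the 4096-byte default comes from the text above.

```racket
#lang racket/base

;; Hypothetical helper (not part of the documented API):
;; read at most `limit` bytes from the start of `path`.
(define (read-prefix path [limit 4096])
  (call-with-input-file path
    (lambda (in)
      (define bs (read-bytes limit in))
      ;; read-bytes returns eof for an empty file; normalize to empty bytes.
      (if (eof-object? bs) #"" bs))))
```

Files shorter than the limit simply yield fewer bytes, so later checks need to test lengths before indexing.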
2 Formats
Known audio formats:
'(mp3 flac ogg vorbis opus wav aiff mp4 aac alac encrypted-audio ac3 ape wavpack wma matroska)
Status values:
'(unknown file-not-found file-not-readable not-a-file)
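As an illustration of how the file-related status values could be produced before any bytes are read, here is a sketch using only racket/base file-system predicates. The function name file-status and the order of the checks are assumptions, not the module's actual implementation.

```racket
#lang racket/base

;; Hypothetical pre-check: returns a status symbol,
;; or #f when the path names a readable regular file.
(define (file-status path)
  (cond
    [(directory-exists? path) 'not-a-file]
    [(not (file-exists? path)) 'file-not-found]
    [(not (memq 'read (file-or-directory-permissions path)))
     'file-not-readable]
    [else #f]))
```

Returning #f for the "no problem" case lets a caller write (or (file-status p) (sniff p)), where sniff stands for whatever content-based detector follows.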
3 API
procedure
file : path-string?
Returns one of:
A format symbol such as 'mp3, 'flac, etc.
A status symbol such as 'file-not-found
This function does not use the file extension.
procedure
file : path-string?
This is typically the preferred entry point in user-facing code.
procedure
file : path-string?
procedure
(audio-format-matches? file formats) → boolean?
file : path-string? formats : (listof symbol?)
Detection uses audio-sniff-format/extension.
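The membership test itself is small. The sketch below shows one way it could be written, with the detector passed in as an argument so the example stays self-contained. Both format-matches? and extension->format are hypothetical names, and the extension table is an illustrative subset. In the real module, detection sniffs the file contents first and consults the extension only as a fallback.

```racket
#lang racket/base
(require racket/path)

;; Illustrative subset of an extension table (an assumption, not the module's).
(define extension-table
  (hash ".mp3" 'mp3 ".flac" 'flac ".wav" 'wav ".opus" 'opus))

;; Hypothetical extension-only lookup; returns 'unknown when nothing matches.
(define (extension->format path)
  (define ext (path-get-extension path)) ; bytes such as #".mp3", or #f
  (if ext
      (hash-ref extension-table
                (string-downcase (bytes->string/utf-8 ext #\?))
                'unknown)
      'unknown))

;; Hypothetical predicate: does the detected format appear in `formats`?
(define (format-matches? detect file formats)
  (and (memq (detect file) formats) #t))
```

For example, (format-matches? extension->format "a.flac" '(mp3 flac)) evaluates to #t under these assumptions.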
4 Architecture
The sniffer is structured as a layered pipeline:
I/O layer – reads byte ranges from the file (head and tail)
Signature layer – matches fixed binary identifiers
Heuristic layer – validates formats without fixed headers
Container layer – inspects structured containers (MP4, Ogg)
Fallback layer – maps file extensions to formats
Detection proceeds from cheap and deterministic checks to more expensive or heuristic ones.
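One way to picture this ordering is as a list of layer procedures tried in sequence, where the first non-#f answer wins. The sketch below is an assumption about the shape of such a pipeline, not the module's code; run-layers and the two toy layers are hypothetical.

```racket
#lang racket/base

;; Each layer takes the file prefix (bytes) and returns a format symbol or #f.
;; Layers are tried in order; the first non-#f result is returned.
(define (run-layers layers bs)
  (or (for/or ([layer (in-list layers)])
        (layer bs))
      'unknown))

;; Two toy layers, for demonstration only.
(define (flac-layer bs)
  (and (>= (bytes-length bs) 4)
       (bytes=? (subbytes bs 0 4) #"fLaC")
       'flac))

(define (id3-layer bs)
  (and (>= (bytes-length bs) 3)
       (bytes=? (subbytes bs 0 3) #"ID3")
       'mp3))
```

Putting the cheap fixed-signature layers at the front of the list means the heuristic layers only run for files that have no recognizable magic number.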
MP4/M4A detection is handled separately because codec identifiers may appear outside the initial header. For this reason both the beginning and the end of the file are scanned.
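Reading the end of the file can be done without loading the whole file, by seeking before reading. The following sketch assumes a hypothetical helper named read-tail and reuses the 4096-byte default from the overview; the module's real tail size is not stated here.

```racket
#lang racket/base

;; Hypothetical helper: read at most `limit` bytes from the end of `path`.
(define (read-tail path [limit 4096])
  (define size (file-size path))
  (call-with-input-file path
    (lambda (in)
      ;; Seek to `limit` bytes before the end, or to 0 for short files.
      (file-position in (max 0 (- size limit)))
      (define bs (read-bytes limit in))
      (if (eof-object? bs) #"" bs))))
```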
The sniffer is deliberately stateless; each call operates only on the given file and does not cache results.
5 Detection Details
Binary signatures are used where possible:
FLAC: fLaC
Ogg: OggS + subtype detection (Opus/Vorbis/FLAC)
WAV: RIFF/WAVE
AIFF: FORM/AIFF or AIFC
ASF/WMA: ASF header object GUID
Matroska: EBML header (0x1A45DFA3)
AC3: 0x0B77 sync word
APE: MAC followed by a space (4 bytes)
WavPack: wvpk
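The signature checks listed above can be sketched as byte comparisons at fixed offsets. The names bytes-at?, ogg-subtype and match-signature are hypothetical, the check order is an assumption, and the Ogg subtype test is simplified to a substring search for the codec identifiers in the prefix.

```racket
#lang racket/base

;; True when `bs` contains `sig` starting at byte offset `off`.
(define (bytes-at? bs off sig)
  (define end (+ off (bytes-length sig)))
  (and (<= end (bytes-length bs))
       (bytes=? (subbytes bs off end) sig)))

;; ASF header object GUID, as it appears on disk.
(define asf-guid
  (bytes #x30 #x26 #xB2 #x75 #x8E #x66 #xCF #x11
         #xA6 #xD9 #x00 #xAA #x00 #x62 #xCE #x6C))

;; Simplified Ogg subtype check: look for the codec identifier in the prefix.
(define (ogg-subtype bs)
  (cond
    [(regexp-match? #rx#"OpusHead" bs) 'opus]
    [(regexp-match? #rx#"vorbis" bs) 'vorbis]
    [(regexp-match? #rx#"FLAC" bs) 'flac]
    [else 'ogg]))

;; Returns a format symbol, or 'unknown when no fixed signature matches.
(define (match-signature bs)
  (cond
    [(bytes-at? bs 0 #"fLaC") 'flac]
    [(bytes-at? bs 0 #"OggS") (ogg-subtype bs)]
    [(and (bytes-at? bs 0 #"RIFF") (bytes-at? bs 8 #"WAVE")) 'wav]
    [(and (bytes-at? bs 0 #"FORM")
          (or (bytes-at? bs 8 #"AIFF") (bytes-at? bs 8 #"AIFC")))
     'aiff]
    [(bytes-at? bs 0 asf-guid) 'wma]
    [(bytes-at? bs 0 (bytes #x1A #x45 #xDF #xA3)) 'matroska]
    [(bytes-at? bs 0 (bytes #x0B #x77)) 'ac3]
    [(bytes-at? bs 0 #"MAC ") 'ape]
    [(bytes-at? bs 0 #"wvpk") 'wavpack]
    [else 'unknown]))
```

Because every check is bounded by bytes-at?, a truncated or empty prefix falls through to 'unknown rather than raising an index error.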
Heuristics are applied for:
MP3 (ID3 header or frame sync validation)
AAC (ADTS sync pattern)
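Neither format has a fixed magic number, so the checks inspect header bit fields instead. The sketch below illustrates the general idea with hypothetical predicates; it validates a single header only, whereas a production sniffer would likely confirm several consecutive frames to reduce false positives.

```racket
#lang racket/base

;; ID3v2 tags begin with the ASCII bytes "ID3".
(define (id3v2? bs)
  (and (>= (bytes-length bs) 3)
       (bytes=? (subbytes bs 0 3) #"ID3")))

;; MPEG audio frame header check at offset `off`.
(define (mp3-frame-header? bs [off 0])
  (and (<= (+ off 3) (bytes-length bs))
       (let ([b0 (bytes-ref bs off)]
             [b1 (bytes-ref bs (+ off 1))]
             [b2 (bytes-ref bs (+ off 2))])
         (and (= b0 #xFF)
              (= (bitwise-and b1 #xE0) #xE0)          ; 11-bit frame sync
              (not (= (bitwise-and b1 #x18) #x08))    ; reserved MPEG version
              (not (= (bitwise-and b1 #x06) #x00))    ; reserved layer value
              (not (= (bitwise-and b2 #xF0) #xF0))    ; invalid bitrate index
              (not (= (bitwise-and b2 #x0C) #x0C)))))) ; reserved sample rate

;; ADTS header check: 12-bit sync word 0xFFF with the layer bits set to 00.
(define (adts-header? bs [off 0])
  (and (<= (+ off 2) (bytes-length bs))
       (= (bytes-ref bs off) #xFF)
       (= (bitwise-and (bytes-ref bs (+ off 1)) #xF6) #xF0)))
```

The two predicates are mutually exclusive on the same bytes: ADTS requires the layer bits to be 00, which the MP3 check rejects as reserved.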
MP4/M4A detection:
Detect ISO-BMFF via ftyp
Scan for codec markers: mp4a, alac, enca
Perform additional scanning near the end of the file
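The three steps above can be sketched as follows, given the head and tail byte ranges already read from the file. The mapping of markers to result symbols (mp4a to 'aac, alac to 'alac, enca to 'encrypted-audio) and the precedence among them are assumptions inferred from the format list, and all function names are hypothetical.

```racket
#lang racket/base

;; ISO-BMFF files carry the box type "ftyp" at bytes 4..8.
(define (iso-bmff? head)
  (and (>= (bytes-length head) 8)
       (bytes=? (subbytes head 4 8) #"ftyp")))

;; Search a byte range for codec markers; returns a symbol or #f.
(define (codec-from-markers bs)
  (cond
    [(regexp-match? #rx#"enca" bs) 'encrypted-audio]
    [(regexp-match? #rx#"alac" bs) 'alac]
    [(regexp-match? #rx#"mp4a" bs) 'aac]
    [else #f]))

;; Classify using the head first, then the tail; fall back to plain 'mp4.
(define (classify-mp4 head tail)
  (if (iso-bmff? head)
      (or (codec-from-markers head)
          (codec-from-markers tail)
          'mp4)
      'unknown))
```

Checking the tail matters because the box that carries the sample descriptions may be written after the media data, depending on how the file was produced.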
6 Why not use FFmpeg?
The primary reason for implementing a custom sniffer is performance.
Format detection in this module is intentionally lightweight: it reads only small portions of the file and applies simple, deterministic checks. In most cases, detection completes after inspecting just a few kilobytes.
Using a library such as FFmpeg would significantly increase the cost of this operation:
Startup overhead – initialization of codec infrastructure
I/O overhead – more data is typically read than necessary
Processing overhead – partial parsing of streams or containers