FFmpeg Decoder Definitions
(require ffmpeg-definitions)
This module provides the direct FFmpeg-backed decoder layer used by the audio pipeline. It is deliberately small and stateful. A caller creates one decoder instance, opens one file on it, queries the selected audio stream, repeatedly asks for the next PCM block, and closes the instance again.
The module does not expose FFmpeg metadata. It only exposes the information needed for playback: stream count, sample rate, channel count, duration, bitrate, decoded PCM data, and sample positions. The output format is fixed: interleaved signed 32-bit PCM, four bytes per channel sample, using FFmpeg’s AV_SAMPLE_FMT_S32 sample format.
The FFmpeg libraries are loaded when the module is required. The module checks that the runtime FFmpeg major versions are in the supported range configured by the implementation. This binding targets the FFmpeg library major versions used by FFmpeg 6, 7, and 8: libavutil 58 to 60, libavcodec 60 to 62, libavformat 60 to 62, and libswresample 4 to 6. Unsupported runtime versions fail early, before a decoder instance is used.
On Windows, the private library loader may download the bundled sound-library set into Racket’s add-on directory before the FFI libraries are opened. On Unix-like systems, the FFmpeg libraries are expected to be installed by the operating system or platform package manager and to be reachable by Racket’s FFI library search path.
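The early version check can be pictured as a lookup in a table of inclusive ranges. The sketch below is hypothetical: supported-majors and check-major-versions are illustrative names, not exports of this module, and the runtime major versions are passed in as a hash instead of being read from the loaded libraries.

```racket
#lang racket/base
;; Hypothetical table of supported major-version ranges (inclusive),
;; mirroring the ranges listed above. Not the module's real table.
(define supported-majors
  (hash 'avutil     '(58 . 60)
        'avcodec    '(60 . 62)
        'avformat   '(60 . 62)
        'swresample '(4 . 6)))

;; runtime-majors : hash mapping each library symbol to its runtime major version.
;; Returns #t when every library is in range; otherwise raises exn:fail
;; naming an out-of-range library.
(define (check-major-versions runtime-majors)
  (for ([(lib range) (in-hash supported-majors)])
    (define major (hash-ref runtime-majors lib))
    (unless (<= (car range) major (cdr range))
      (error 'check-major-versions
             "unsupported ~a major version ~a (supported: ~a to ~a)"
             lib major (car range) (cdr range))))
  #t)
```

Failing in one place, before any struct accessor runs, is what makes the versioned layouts described below safe to use.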
1 Layering
This module is the low-level Racket FFI layer. It is normally wrapped by "ffmpeg-ffi.rkt" and then by "ffmpeg-decoder.rkt". The first wrapper adapts this module to the command protocol used by the audio decoder frontend. The second wrapper exposes the callback-oriented decoder interface used by the rest of the playback pipeline.
The distinction matters for buffer lifetime. At this level, fmpg-buffer returns the current buffer owned by the decoder instance. The adapter in "ffmpeg-ffi.rkt" copies that buffer before passing it to "ffmpeg-decoder.rkt". Code that uses this module directly must copy the buffer itself when the bytes must survive the next decoder operation.
2 Implementation strategy
This module talks directly to the FFmpeg shared libraries through Racket’s FFI. There is no C shim that hides FFmpeg’s structs or normalizes their layout. The price of that choice is that the Racket side must know enough of the relevant C struct layouts to read the fields used by the decoder. The benefit is that the binding remains a Racket module with direct access to the platform FFmpeg libraries.
2.1 Versioned C struct layouts
The module defines only partial FFmpeg structs. A partial definition includes the fields that are actually read by this decoder and enough preceding fields to compute their offsets. Fields that are not needed are represented only by their C type, or by a repetition count such as (6 _int). Tail fields after the last required member are not described.
The helper module "private/cstruct-helper.rkt" provides make-offsets and def-cstruct. The make-offsets form computes offsets for a sequence of C field types, while def-cstruct expands to a define-cstruct form whose public fields are placed at those explicit offsets. This keeps the actual accessors small while still accounting for skipped fields in the C layout.
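The offset computation behind make-offsets can be approximated with ctype-sizeof and ctype-alignof from ffi/unsafe. The sketch below is a hypothetical stand-in (field-offsets is not the helper's real name or interface): each field starts at the end of the previous field rounded up to its own alignment, and an entry (list 6 _int) plays the role of a repetition count such as (6 _int). It assumes natural C alignment with no packing pragmas.

```racket
#lang racket/base
(require ffi/unsafe racket/list)

;; Round n up to the next multiple of align.
(define (align-up n align)
  (* align (quotient (+ n align -1) align)))

;; Expand (list count type) repetition entries into count copies of type;
;; plain ctypes are kept as single fields.
(define (expand-fields specs)
  (append* (for/list ([s (in-list specs)])
             (if (pair? s) (make-list (car s) (cadr s)) (list s)))))

;; Return the byte offset of every (expanded) field in declaration order.
(define (field-offsets specs)
  (let loop ([types (expand-fields specs)] [pos 0] [acc '()])
    (if (null? types)
        (reverse acc)
        (let* ([t (car types)]
               [off (align-up pos (ctype-alignof t))])
          (loop (cdr types) (+ off (ctype-sizeof t)) (cons off acc))))))
```

A pointer that follows an int lands on the pointer's alignment boundary, which is why skipped fields must still be described by type rather than by byte count alone.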
The right layout is selected when the module is required, after the runtime FFmpeg major versions have been read from the libraries. For the supported range, AVCodecParameters uses one layout for libavcodec major version 60 and another for major versions 61 and 62. Likewise, AVFrame uses one layout for libavutil major version 58 and another for major versions 59 and 60. The other partial structs used by this module are defined with a single layout across the supported versions.
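Selecting a layout variant from the runtime major version amounts to a range dispatch. In this hypothetical sketch the returned symbols only name variants; the real module chooses between def-cstruct definitions, and the function names are illustrative.

```racket
#lang racket/base
;; Hypothetical: pick the AVFrame partial-layout variant for a libavutil major.
(define (avframe-layout avutil-major)
  (cond [(= avutil-major 58) 'avframe/avutil-58]
        [(<= 59 avutil-major 60) 'avframe/avutil-59-60]
        [else (error 'avframe-layout
                     "no AVFrame layout for libavutil ~a" avutil-major)]))

;; Hypothetical: pick the AVCodecParameters variant for a libavcodec major.
(define (avcodecparameters-layout avcodec-major)
  (cond [(= avcodec-major 60) 'codecpar/avcodec-60]
        [(<= 61 avcodec-major 62) 'codecpar/avcodec-61-62]
        [else (error 'avcodecparameters-layout
                     "no AVCodecParameters layout for libavcodec ~a" avcodec-major)]))
```

Raising for an unknown major, instead of falling back to the newest variant, matches the rule that a range is extended only after its layouts have been checked.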
This is why the version check is performed before normal decoder use. The accessors are correct only for the FFmpeg major-version ranges for which the partial layouts were written. If a future FFmpeg major release changes a layout before one of the fields used here, the version range should be extended only after the affected partial definitions have been checked.
2.2 Sequential failure handling
Most FFmpeg calls report ordinary failure through C-style return values or null pointers. The implementation treats those results as normal control flow, not as exceptional Racket failures. The let/assert form is used for this pattern. It behaves like a sequential binding form: each binding can be checked immediately, and a failed check returns the specified failure value for the whole form.
That style is used for setup paths such as opening a file, selecting stream information, allocating the codec context, and initializing the resampler. It keeps the success path linear while still giving each FFmpeg return value or pointer a local check. Predicates such as a-!nullptr?, a-nullptr?, a-true?, and a->=? express the usual FFmpeg checks directly next to the binding that produced the value.
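The shape of a let/assert-style form can be sketched with syntax-rules. The let/check macro below is hypothetical: its clause syntax [id expr check?] is an assumption, since the real let/assert syntax is not shown here, and its checks are one-argument predicates, whereas helpers such as a->=? may take extra arguments.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical stand-in for let/assert. Each clause [id expr check?] binds
;; id to expr; if (check? id) is false, the whole form returns fail-value
;; and later clauses are never evaluated.
(define-syntax let/check
  (syntax-rules ()
    [(_ fail-value () body ...)
     (let () body ...)]
    [(_ fail-value ([id expr check?] clause ...) body ...)
     (let ([id expr])
       (if (check? id)
           (let/check fail-value (clause ...) body ...)
           fail-value))]))
```

For example, (let/check 0 ([ctx (alloc-context) values] [rc (open ctx) zero?]) 1) would read as "allocate, check, open, check, succeed with 1"; alloc-context and open are placeholder names.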
For loops where decoding must stop immediately from a nested position, the module uses define/return from define-return. This gives functions such as fmpg-decode-next! and the internal resampler drain routine an explicit early-return continuation without using exceptions for normal FFmpeg outcomes. The two helpers are implementation dependencies; they are not re-exported by this module.
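An explicit early-return continuation of the kind define/return provides can be built from Racket's let/ec. In this hypothetical sketch the caller names the escape procedure explicitly; the real define/return may bind it differently. first-nonempty-frame is an illustrative function, not part of the decoder.

```racket
#lang racket/base
;; Hypothetical stand-in for define/return: calling `return` escapes
;; immediately from the whole function with the given value.
(define-syntax-rule (define/return (name arg ...) return body ...)
  (define (name arg ...)
    (let/ec return body ...)))

;; Return the first non-empty frame from a list of packets (each a list of
;; byte strings), leaving both nested loops as soon as one is found.
(define/return (first-nonempty-frame packets) return
  (for* ([packet (in-list packets)]
         [frame (in-list packet)])
    (when (positive? (bytes-length frame))
      (return frame)))
  #f)
```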
3 Decoder instances
A decoder instance is an opaque value returned by fmpg-init. Its structure type and predicate are not exported. Pass the value back to the functions in this module and do not inspect it directly. The contracts below therefore use any/c for the instance argument. Operationally, that argument must be a value returned by fmpg-init.
The instance owns native FFmpeg resources: a format context, a codec context, an audio frame, a resampler, and the Racket byte string used for the current PCM block. Finalizers are installed as a last line of defence, but callers should still call fmpg-close! explicitly when playback stops or when the file is no longer needed. Explicit close keeps the lifetime of native resources predictable.
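One way to keep native lifetimes predictable is to tie the close call to a dynamic extent. call-with-resource below is a hypothetical helper built on dynamic-wind; it runs close whether proc returns normally, raises, or escapes.

```racket
#lang racket/base
;; Hypothetical helper: create a resource with make, pass it to proc, and
;; always call close afterwards, even when proc raises or escapes.
(define (call-with-resource make close proc)
  (define resource (make))
  (dynamic-wind
    void
    (lambda () (proc resource))
    (lambda () (close resource))))
```

With this module it might be used as (call-with-resource fmpg-init fmpg-close! (lambda (dec) ...)). That assumes fmpg-close! is safe on an instance whose fmpg-open-file! call failed or never happened, which is not stated here.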
Creating the instance does not open a file. Use fmpg-open-file! before querying stream information or decoding audio.
fmpg-open-file! returns 1 on success and 0 on failure. On failure, any partially initialized native state is closed again.
An instance can only have one file open. Close it with fmpg-close! before opening another file on the same instance. A non-string, non-path filename is treated as an open failure and returns 0.
procedure
(fmpg-close! instance) → void?
instance : any/c
4 Audio stream information
The decoder selects one audio stream for playback using FFmpeg’s best-stream selection. The stream count reports how many audio streams were found in the container, but decoding is performed only for the selected stream.
The term sample in this module means a sample frame: one time step in the audio stream, across all channels. For stereo 32-bit output, one sample frame therefore occupies (* 2 4) bytes in the returned PCM buffer.
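The byte arithmetic for the fixed S32 output follows directly from this definition. The helper names are hypothetical; in practice fmpg-audio-channels, fmpg-audio-sample-rate, and fmpg-buffer-size would supply the inputs. The millisecond conversion floors, which is an assumption about rounding.

```racket
#lang racket/base
;; AV_SAMPLE_FMT_S32 stores each channel sample in 4 bytes.
(define bytes-per-channel-sample 4)

;; Bytes occupied by one sample frame (one time step across all channels).
(define (bytes-per-frame channels)
  (* channels bytes-per-channel-sample))

;; Number of whole sample frames in a PCM block of byte-count bytes.
(define (block-frames byte-count channels)
  (quotient byte-count (bytes-per-frame channels)))

;; Time in milliseconds covered by a number of frames (floored).
(define (frames->ms frames sample-rate)
  (quotient (* frames 1000) sample-rate))
```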
procedure
(fmpg-audio-stream-count instance) → exact-nonnegative-integer?
instance : any/c
procedure
(fmpg-audio-sample-rate instance) → exact-nonnegative-integer?
instance : any/c
procedure
(fmpg-audio-channels instance) → exact-nonnegative-integer?
instance : any/c
procedure
(fmpg-audio-bits-per-sample instance) → exact-positive-integer?
instance : any/c
procedure
→ exact-positive-integer?
instance : any/c
procedure
(fmpg-duration-ms instance) → exact-integer?
instance : any/c
procedure
(fmpg-duration-samples instance) → exact-integer?
instance : any/c
procedure
(fmpg-file-bitrate instance) → exact-integer?
instance : any/c
5 Decoding
Decoding is block oriented. Each call to fmpg-decode-next! clears the previous PCM block and attempts to produce the next decoded block for the selected audio stream. When the call returns 1, the block can be read with fmpg-buffer and described with the buffer query functions.
fmpg-decode-next! returns 0 both at end of stream and on a decode failure; it does not distinguish the two. The intended playback loop treats 0 as "no further PCM block is available for this decoder instance."
Internally, each call first receives any frames the codec already has available, then reads packets for the selected audio stream, sends those packets to the codec, and converts the decoded frames through libswresample. At end of stream it drains the resampler. Packets from non-selected streams are skipped.
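The playback-loop convention (call until the result is 0, copy anything that must be kept) can be exercised without FFmpeg. The sketch below uses a hypothetical fake decoder that overwrites one mutable byte string on every call, which is exactly the situation in which an uncopied buffer would change under the caller. collect-blocks and make-fake-decoder are illustrative names.

```racket
#lang racket/base
;; Hypothetical playback loop: call decode-next! until it returns 0,
;; copying each block because the decoder may reuse its buffer.
(define (collect-blocks decode-next! current-buffer)
  (let loop ([acc '()])
    (if (= (decode-next!) 1)
        (loop (cons (bytes-copy (current-buffer)) acc))
        (reverse acc))))

;; Hypothetical fake decoder: every call overwrites the same 4-byte buffer,
;; simulating a buffer that changes on the next decoder call.
(define (make-fake-decoder blocks)
  (define buf (make-bytes 4 0))
  (define remaining blocks)
  (values
   (lambda ()
     (cond [(null? remaining) 0]
           [else (bytes-copy! buf 0 (car remaining))
                 (set! remaining (cdr remaining))
                 1]))
   (lambda () buf)))
```

Against this module the same loop would be (collect-blocks (lambda () (fmpg-decode-next! dec)) (lambda () (fmpg-buffer dec))), assuming the byte string's length equals the valid block size; otherwise copy (subbytes buf 0 (fmpg-buffer-size dec)) instead.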
procedure
(fmpg-seek-ms! instance target-pos-ms)
instance : any/c
target-pos-ms : exact-nonnegative-integer?
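fmpg-seek-ms! returns 1 when the seek succeeds. It uses FFmpeg’s backward seek flag (AVSEEK_FLAG_BACKWARD), so FFmpeg positions the stream at or before the requested time. Decoded audio before the requested target sample is then discarded, so the next buffer starts at the requested position or as close to it as FFmpeg can provide.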
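The discard step can be sketched as arithmetic on sample frames. ms->sample and trim-block are hypothetical names; the module's own rounding and the way it trims across blocks are not documented, so this only illustrates the idea.

```racket
#lang racket/base
;; Hypothetical: convert a millisecond target to a sample-frame index (floored).
(define (ms->sample ms sample-rate)
  (quotient (* ms sample-rate) 1000))

;; Hypothetical: drop the frames of an interleaved S32 block that lie before
;; target-sample. Returns the remaining bytes and their new start sample.
(define (trim-block pcm start-sample target-sample channels)
  (define frame-bytes (* channels 4))            ; 4 bytes per channel sample
  (define frames (quotient (bytes-length pcm) frame-bytes))
  (define drop (max 0 (min frames (- target-sample start-sample))))
  (values (subbytes pcm (* drop frame-bytes))
          (+ start-sample drop)))
```

A block that lies entirely before the target comes back empty, and a decoder following this scheme would simply request the next block.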
6 Decoded buffers
The PCM buffer belongs to the decoder instance. It is replaced by the next call to fmpg-decode-next!, fmpg-seek-ms!, or fmpg-close!. Treat the returned byte string as read-only. Copy it if it must outlive the next decoder operation or if another component may mutate it.
procedure
(fmpg-buffer instance) → (or/c bytes? #f)
instance : any/c
The byte string contains interleaved signed 32-bit samples. Its logical frame count is available as the difference between fmpg-buffer-end-sample and fmpg-buffer-start-sample. Its byte size is also available through fmpg-buffer-size.
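Reading one value out of the interleaved block only needs the frame index, the channel index, and the channel count. pcm-sample-ref is a hypothetical helper; it relies on AV_SAMPLE_FMT_S32 storing samples in native byte order, so it passes (system-big-endian?) to integer-bytes->integer.

```racket
#lang racket/base
;; Hypothetical: read one signed 32-bit channel sample from an interleaved
;; PCM block. frame and channel are zero-based.
(define (pcm-sample-ref pcm channels frame channel)
  (define offset (* 4 (+ (* frame channels) channel)))
  (integer-bytes->integer (subbytes pcm offset (+ offset 4))
                          #t                      ; signed
                          (system-big-endian?)))  ; native byte order
```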
procedure
(fmpg-buffer-size instance) → exact-nonnegative-integer?
instance : any/c
procedure
(fmpg-buffer-start-sample instance) → exact-nonnegative-integer?
instance : any/c
procedure
(fmpg-buffer-end-sample instance) → exact-nonnegative-integer?
instance : any/c
procedure
(fmpg-sample-position instance) → exact-nonnegative-integer?
instance : any/c
7 FFmpeg version information
procedure
(ffmpeg-version lib) → (list/c exact-nonnegative-integer? exact-nonnegative-integer? exact-nonnegative-integer?)
lib : (or/c 'avutil 'avcodec 'avformat 'swr 'swresample)
The function raises an exception for an unknown library symbol.
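FFmpeg's avutil_version, avcodec_version, avformat_version, and swresample_version functions each return one packed integer built with AV_VERSION_INT. Whether ffmpeg-version decodes its result exactly this way is an assumption; unpack-ffmpeg-version is a hypothetical name.

```racket
#lang racket/base
;; Hypothetical: split a packed FFmpeg version integer, built as
;; AV_VERSION_INT(major, minor, micro) = (major << 16) | (minor << 8) | micro,
;; into (list major minor micro).
(define (unpack-ffmpeg-version v)
  (list (arithmetic-shift v -16)
        (bitwise-and (arithmetic-shift v -8) #xff)
        (bitwise-and v #xff)))
```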
8 Use through the decoder frontend
The direct API above is normally wrapped by "ffmpeg-ffi.rkt" and by "ffmpeg-decoder.rkt". The frontend function ffmpeg-open returns a handle or #f when the file does not exist. Its stream-info callback receives a mutable hash with at least these playback keys:
(list 'sample-rate 'channels 'bits-per-sample 'bytes-per-sample 'total-samples 'duration)
The audio callback receives the same hash extended for the current buffer with these keys:
(list 'sample 'current-time)
The hash is followed by a copied byte string and its valid byte count. The copy is made by "ffmpeg-ffi.rkt", not by the low-level buffer function itself.
The frontend’s seek function accepts a percentage of the stream and translates that percentage to a sample position. The adapter then translates the sample position to milliseconds and calls fmpg-seek-ms!. This is why the low-level module exposes millisecond seeking while the frontend exposes percentage seeking.
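The two translation steps can be sketched as follows. percent->sample and sample->ms are hypothetical names and both floor; the actual rounding in the frontend and the adapter is not documented here.

```racket
#lang racket/base
;; Hypothetical: percentage of the stream (0..100) to a sample-frame index.
(define (percent->sample percent total-samples)
  (inexact->exact (floor (* total-samples (/ percent 100)))))

;; Hypothetical: sample-frame index to the millisecond value passed to fmpg-seek-ms!.
(define (sample->ms sample sample-rate)
  (quotient (* sample 1000) sample-rate))
```

For a 10-second stream at 44100 Hz, seeking to 50% gives sample 220500 and then 5000 ms.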
9 Example
The following example opens a file, decodes all PCM blocks, and prints the byte size and sample range of each block. A real playback loop would pass each buffer to the audio output layer before requesting the next block.
```racket
(require ffmpeg-definitions)

(define dec (fmpg-init))
(when (and dec (= (fmpg-open-file! dec "track.ogg") 1))
  (printf "~a Hz, ~a channels, ~a ms\n"
          (fmpg-audio-sample-rate dec)
          (fmpg-audio-channels dec)
          (fmpg-duration-ms dec))
  (let loop ()
    (when (= (fmpg-decode-next! dec) 1)
      ;; pcm is owned by dec and replaced by the next decoder call;
      ;; use (bytes-copy pcm) if it must be kept.
      (define pcm (fmpg-buffer dec))
      (define size (fmpg-buffer-size dec))
      (define start (fmpg-buffer-start-sample dec))
      (define end (fmpg-buffer-end-sample dec))
      (printf "decoded ~a bytes, samples [~a, ~a)\n" size start end)
      (loop)))
  (fmpg-close! dec))
```
After a successful seek, decoding continues exactly as before. The following code must run on a decoder that still has a file open (for example, before the fmpg-close! call in the previous example). It seeks to 30 seconds and then requests the next decoded buffer.
```racket
;; dec must still have a file open here.
(when (= (fmpg-seek-ms! dec 30000) 1)            ; 30 s = 30000 ms
  (when (= (fmpg-decode-next! dec) 1)
    (define start (fmpg-buffer-start-sample dec))
    (printf "first buffer after seek starts at sample ~a\n" start)))
```