APNG Video over APRS with Compression

I built and evaluated multiple compression algorithms to transmit animated PNG (APNG) videos over a very low-bandwidth radio link (APRS/AX.25 via Direwolf on a Raspberry Pi), then reconstruct and decompress them server-side. I explored both lossless and lossy methods (LZW, DEFLATE/LZ77 + Huffman, JPEG 2000-style wavelets, and SPIHT), and primarily focused on implementing SPIHT-based wavelet compression for color video frames.

Python · Raspberry Pi · Digital Signal Processing

Overview

This was a final project for EE 123 and an end-to-end DSP challenge: send a video through a channel so bandwidth-limited that we get only about 60 APRS packets total (~10 KB of payload), and still reconstruct a video with the same dimensions on the receiver (minimum PSNR of 25 dB).

We used animated PNG (APNG) as the video container (instead of GIF) because APNG supports full-color frames without the 256-color palette limitation.
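
Loading the APNG into per-frame arrays is straightforward with Pillow, which can read APNG; a minimal sketch (the filename is a placeholder):

```python
import numpy as np
from PIL import Image, ImageSequence

# Decode every APNG frame into an HxWx3 uint8 array.
with Image.open("input.png") as im:
    frames = [np.asarray(f.convert("RGB")) for f in ImageSequence.Iterator(im)]
```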

I explored several compression approaches, from classic lossless compressors like LZW and DEFLATE to transform codecs like JPEG 2000, as well as machine learning approaches. The algorithm I ultimately settled on was SPIHT, a wavelet-based embedded codec similar in spirit to JPEG 2000.

End-to-End System

  1. Sender (Raspberry Pi):
    • Load the APNG into frames.
    • Compress the frames into a compact bytestream (our best results used SPIHT).
    • Packetize the bytestream into APRS/AX.25 packets (see the sketch after this list).
    • Transmit packets using the radio interface (TNC behavior via Direwolf).
  2. Receiver (Server-side reconstruction):
    • Collect packets and reassemble the payload.
    • Read metadata (dimensions, frame count, etc.) and reconstruct the APNG frames.
    • Decompress the bytestream into pixel data.
    • Output a video with the exact same frame dimensions as the original (quality may be lower).
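
As a sketch of the packetization and reassembly steps above (the 170-byte payload size and two-byte header are assumptions, not our exact format; real APRS limits depend on the packet type):

```python
import math

PAYLOAD_BYTES = 170  # assumed usable bytes per APRS packet

def packetize(stream: bytes) -> list[bytes]:
    """Split a compressed bytestream into sequence-numbered payloads."""
    total = math.ceil(len(stream) / PAYLOAD_BYTES)
    # One-byte sequence number and count is plenty for our ~60 packets.
    return [bytes([seq, total]) + stream[seq * PAYLOAD_BYTES:(seq + 1) * PAYLOAD_BYTES]
            for seq in range(total)]

def reassemble(packets: list[bytes]) -> bytes:
    """Order received packets by sequence number and concatenate payloads."""
    return b"".join(p[2:] for p in sorted(packets, key=lambda p: p[0]))
```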

Note: the server was set up to pull our decoding code from GitHub. One header packet included our repository name, and the server cloned it and ran a specific reconstruction entrypoint to decode our submission.

Challenges

In normal video systems, we might have megabits per second. Here we had roughly 10 KB of payload to deliver the entire video. This makes it necessary to decide what to preserve: spatial detail, motion smoothness, color, etc. We also need to spend bits efficiently: headers, metadata, and redundancy all matter. Ideally the algorithm also degrades gracefully, so that if we can only afford part of the bitstream, the receiver still produces something recognizable.

This is one reason I favored embedded codecs like SPIHT: they naturally produce a progressive bitstream where earlier bits give coarse quality and later bits refine it.

Compression Approaches

Here are the main families of approaches I tried:

1) LZW (dictionary compression)

LZW (Lempel–Ziv–Welch) is a lossless compressor that builds a dictionary of repeated sequences. It’s great when the data has recurring patterns.

How it works:

  • Start with a dictionary of all single-byte sequences.
  • Repeatedly find the longest prefix of the remaining input that is already in the dictionary and emit its code.
  • Add that prefix extended by the next symbol as a new dictionary entry, then continue.

It is a bit limited here because raw video frames (especially full-color) often look “random” to a dictionary coder unless we first transform/predict the data. Without a transform (like wavelets/DCT) or a predictor (frame differencing / motion compensation), LZW often won’t shrink video enough to meet the ~10KB constraint.
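
For reference, a minimal LZW encoder in the classic textbook form (not our project code):

```python
def lzw_compress(data: bytes) -> list[int]:
    # Start with a dictionary of every single-byte sequence (codes 0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                        # keep extending the current match
        else:
            codes.append(dictionary[w])   # emit code for the longest match
            dictionary[wc] = next_code    # learn the new sequence
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes
```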

2) LZ77 + Huffman = DEFLATE (what ZIP/PNG use)

DEFLATE combines:

  • LZ77: a sliding window that replaces repeated byte sequences with (length, distance) back-references.
  • Huffman coding: shorter codes for more frequent literals and back-references.

It is also a lossless algorithm and gives strong general-purpose compression. PNG uses DEFLATE internally, so it was a natural target when working with PNG/APNG.

How it works:

  • The LZ77 stage scans a sliding window for earlier occurrences of the current bytes and replaces matches with back-references.
  • The Huffman stage then entropy-codes the resulting literals and back-references based on their frequencies.

While the lossless nature was great, the extreme bandwidth limit meant we needed lossy compression to preserve perceptual quality within the byte budget. DEFLATE can't choose to discard less important details; it must preserve everything. However, when DEFLATE does achieve high compression, PSNR is effectively infinite because the reconstruction is identical (lossless).
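
Since Python's zlib module implements DEFLATE, it is easy to see how much a simple predictor helps it. A sketch with made-up frames (sizes and the one-pixel change are arbitrary):

```python
import zlib
import numpy as np

frame0 = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
frame1 = frame0.copy()
frame1[40:80, 50:110] += 1  # small change between consecutive frames

raw = zlib.compress(frame1.tobytes(), level=9)            # noisy pixels barely shrink
diff = zlib.compress((frame1 - frame0).tobytes(), level=9) # mostly-zero residual collapses
print(len(raw), len(diff))
```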

3) JPEG (DCT-based)

JPEG transforms image blocks with the DCT and then quantizes coefficients (lossy). It’s extremely effective for natural images.

The key idea:

  • The DCT concentrates a block's energy into a few low-frequency coefficients.
  • Quantization discards the high-frequency detail the eye is least sensitive to.
  • The surviving coefficients are entropy-coded.

This fits the idea of a video under a byte limit: it spends bits on what matters. However, block artifacts and inefficiency at very low bitrates, especially on sharp edges and text-like content, made JPEG struggle for this project.
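
To make the key idea concrete, a toy version of the per-block pipeline (using scipy for the 2D DCT; real JPEG adds a perceptual quantization matrix, zig-zag scanning, and entropy coding):

```python
import numpy as np
from scipy.fft import dctn, idctn

def jpeg_like_block(block: np.ndarray, q: float) -> np.ndarray:
    """Toy JPEG step on one 8x8 block: DCT -> uniform quantize -> inverse DCT."""
    coeffs = dctn(block - 128.0, norm="ortho")  # center pixels, 2D DCT
    coeffs = np.round(coeffs / q) * q           # quantization is the lossy step
    return idctn(coeffs, norm="ortho") + 128.0  # reconstruct the block

block = np.random.randint(0, 256, (8, 8)).astype(float)
print(np.abs(block - jpeg_like_block(block, q=32.0)).max())  # distortion grows with q
```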

4) JPEG 2000 (wavelet-based)

JPEG 2000 replaces block-DCT with a wavelet transform. Instead of block artifacts, wavelets give multi-resolution structure.

Core concept:

  • A multi-level wavelet transform splits the image into a coarse approximation plus detail subbands at several scales.
  • Most of the energy lands in a few coefficients, which can be coded first; there are no blocks, so there are no block artifacts.

JPEG 2000 is known for good quality at low bitrates and supports progressive transmission.
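
A small illustration of the wavelet side using PyWavelets (an assumption for this sketch; JPEG 2000 itself specifies the CDF 9/7 and 5/3 filters plus EBCOT entropy coding, which this skips):

```python
import numpy as np
import pywt

img = np.outer(np.linspace(0.0, 1.0, 128), np.linspace(0.0, 1.0, 128))  # smooth placeholder frame
coeffs = pywt.wavedec2(img, "bior4.4", level=3)  # multi-level 2D wavelet transform
arr, slices = pywt.coeffs_to_array(coeffs)

# Energy compaction: zero all but the largest 5% of coefficients...
arr[np.abs(arr) < np.quantile(np.abs(arr), 0.95)] = 0.0

# ...and the reconstruction is still a close approximation for smooth content.
recon = pywt.waverec2(pywt.array_to_coeffs(arr, slices, output_format="wavedec2"),
                      "bior4.4")
```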

5) SPIHT (Set Partitioning in Hierarchical Trees)

SPIHT is a wavelet-based codec that exploits two facts:

  1. After a wavelet transform, most coefficients are small (a few carry most energy).
  2. Coefficients are correlated across scales (a large coefficient at coarse scale often implies large coefficients in related positions at finer scales).

SPIHT organizes coefficients into spatial orientation trees and encodes significance progressively.

SPIHT outputs an embedded bitstream:

  • Cut the stream off at any point and the decoder still reconstructs the full image, just at lower quality.
  • The earliest bits carry the most information, so every additional bit received refines the result.

How SPIHT works

After wavelet decomposition, SPIHT iterates over bit-planes from most significant to least:

  • Sorting pass: test coefficients (and whole trees of coefficients) against the current threshold 2^n, emitting a significance bit, plus a sign bit for each coefficient that just became significant.
  • Refinement pass: emit the next magnitude bit of every coefficient found significant in an earlier pass.
  • Halve the threshold and repeat.

This sort-then-refine loop is why the bitstream is progressive: early passes locate the big coefficients, while later passes refine them.
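
To illustrate the loop, here is a toy bit-plane coder. It deliberately omits SPIHT's set-partitioning trees (which are what make the significance bits cheap to send), so it is a sketch of the progressive structure, not of SPIHT itself:

```python
import numpy as np

def bitplane_encode(coeffs: np.ndarray, budget: int) -> list[int]:
    """Toy embedded coder: significance + refinement passes over bit-planes."""
    c = coeffs.ravel()
    n = int(np.floor(np.log2(np.abs(c).max())))   # start at the top bit-plane
    significant = np.zeros(c.size, dtype=bool)
    bits: list[int] = []
    while n >= 0 and len(bits) < budget:
        threshold = 2.0 ** n
        previously = significant.copy()
        for i in np.flatnonzero(~significant):     # sorting pass
            is_sig = abs(c[i]) >= threshold
            bits.append(int(is_sig))
            if is_sig:
                significant[i] = True
                bits.append(int(c[i] < 0))         # sign of a new coefficient
        for i in np.flatnonzero(previously):       # refinement pass
            bits.append((int(abs(c[i])) >> n) & 1)
        n -= 1
    return bits  # truncate anywhere: early bits are coarse, later bits refine
```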

RGB handling

A practical way to handle color is:

  • Convert RGB to YCbCr so that most of the perceptually important information lands in the luma (Y) channel.
  • Run SPIHT on each channel independently, giving Y the largest share of the bit budget.

Our implementation treated color channels separately for simplicity. This can work, but it may not exploit cross-channel correlation as well as YCbCr.
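
For reference, the standard BT.601 conversion such an approach could use (a sketch, not our project code):

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """BT.601 full-range RGB -> YCbCr; Y would get most of the bit budget."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)
```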

6) Neural approach (D-NERV)

Neural representation methods learn a compact model of a video such that the network weights become the "compressed" form. This could be strong because it exploits temporal redundancy naturally. However, it was a poor fit here: we would have had to transmit not only the data but also the model parameters, and it didn't suit our timeline and constraints.

Transmission + Reconstruction

Other than compression, we also need:

  • Packetization: split the bytestream into APRS-sized payloads with sequence numbers.
  • A header carrying metadata (dimensions, frame count, payload length) so the receiver can reconstruct.
  • Reassembly logic that tolerates out-of-order packets.
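
For instance, a fixed-layout header keeps metadata overhead to a few bytes. The field layout below is hypothetical, not our exact header:

```python
import struct

# Hypothetical 9-byte header: big-endian width, height, frame count, payload length.
def make_header(width: int, height: int, n_frames: int, n_bytes: int) -> bytes:
    return struct.pack(">HHBI", width, height, n_frames, n_bytes)

width, height, n_frames, n_bytes = struct.unpack(">HHBI", make_header(160, 120, 10, 9800))
```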

Implementation notes