by Friedrich M. Tschirpke
A project exploring multiple optimizations for performing PNG encoding efficiently on a GPU.
- working PNG encoder for the GPU implemented in CUDA
- CPU version of the same algorithms, serving as a reference for what the GPU version is meant to achieve
- simple benchmarking scripts that compare the runtime of the GPU version, the CPU version, and the libpng reference implementation
For details about the specific PNG variant that we implement, see the config.h file or the caveats below.
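The caveats below fix the main parameters of the variant: 8-bit RGBA and no interlacing. As an illustration only (config.h is not reproduced here, and the function names are hypothetical rather than taken from this repository), the following sketch builds the PNG IHDR chunk for such an image and computes the size of the filtered, uncompressed image data that the encoder has to hold in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32 as used for PNG chunks (reflected polynomial 0xEDB88320).
   Computed bit by bit for brevity; real encoders typically use a table.
   Pass crc = 0 to start, or a previous result to continue a running CRC. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Stores v as a 4-byte big-endian integer, the byte order PNG uses. */
static void put_be32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Writes the 25-byte IHDR chunk for an 8-bit RGBA, non-interlaced image:
   4-byte length, "IHDR", 13 data bytes, 4-byte CRC. */
static void write_ihdr_rgba8(uint8_t out[25], uint32_t width, uint32_t height) {
    /* PNG allows 1 .. 2^31-1 pixels per dimension. */
    assert(width >= 1 && width <= 0x7FFFFFFFu);
    assert(height >= 1 && height <= 0x7FFFFFFFu);
    put_be32(out, 13);
    out[4] = 'I'; out[5] = 'H'; out[6] = 'D'; out[7] = 'R';
    put_be32(out + 8, width);
    put_be32(out + 12, height);
    out[16] = 8; /* bit depth: 8 bits per channel */
    out[17] = 6; /* color type 6: truecolor with alpha (RGBA) */
    out[18] = 0; /* compression method 0: zlib/deflate */
    out[19] = 0; /* filter method 0: the five standard filter types */
    out[20] = 0; /* interlace method 0: no interlacing */
    /* The CRC covers the chunk type and data, but not the length field. */
    put_be32(out + 21, crc32_update(0, out + 4, 17));
}

/* Bytes of filtered, not-yet-compressed image data: every scanline is
   one filter-type byte followed by width * 4 RGBA bytes. */
static uint64_t filtered_size_rgba8(uint32_t width, uint32_t height) {
    return (uint64_t)height * (1u + (uint64_t)width * 4u);
}
```

For example, a 1920×1080 image yields 1080 × (1 + 1920 × 4) = 8,295,480 bytes of filtered data before compression, which gives a sense of the memory the RAM and GPU RAM caveat below refers to.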
- May not work for very large images
  - This implementation supports multiple PNG IDAT chunks, but has not been tested with images large enough to require them (>2GB).
  - This implementation assumes that the (uncompressed) image data fits into both system RAM and GPU RAM.
- Limited image support
  - This prototype requires RGBA images with 8 bits per channel.
- Limited compression ratio
  - This prototype uses fixed Huffman encoding, which often does not achieve optimal compression ratios.
  - Some optimizations applied in the GPU version trade compression ratio for compression speed.
- Not feature complete
  - This prototype does not implement interlacing.
  - This prototype does not support decoding.
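The fixed Huffman encoding mentioned above corresponds to DEFLATE blocks of type BTYPE=01, whose codes are predefined in RFC 1951, section 3.2.6. No code table has to be computed per block or stored in the stream, usually at some cost in compression ratio compared with dynamic (BTYPE=10) codes built from the actual symbol frequencies. The sketch below shows the literal/length code assignment; the function names are hypothetical and not taken from this repository. Distance codes 0 to 29 use fixed 5-bit codes equal to their value.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed Huffman code for a DEFLATE literal/length symbol (0..287),
   per RFC 1951, section 3.2.6. The code is returned MSB-first:
     0..143   -> 8 bits, 0x30  .. 0xBF
     144..255 -> 9 bits, 0x190 .. 0x1FF
     256..279 -> 7 bits, 0x00  .. 0x17  (256 = end of block)
     280..287 -> 8 bits, 0xC0  .. 0xC7 */
static void fixed_litlen_code(int sym, uint32_t *code, int *nbits) {
    assert(sym >= 0 && sym <= 287);
    if (sym <= 143)      { *code = 0x30u + (uint32_t)sym;          *nbits = 8; }
    else if (sym <= 255) { *code = 0x190u + (uint32_t)(sym - 144); *nbits = 9; }
    else if (sym <= 279) { *code = (uint32_t)(sym - 256);          *nbits = 7; }
    else                 { *code = 0xC0u + (uint32_t)(sym - 280);  *nbits = 8; }
}

/* DEFLATE fills bytes starting at the least significant bit, but Huffman
   codes are defined MSB-first, so a code must be bit-reversed before it
   is appended to an LSB-first bit buffer. */
static uint32_t reverse_bits(uint32_t code, int nbits) {
    uint32_t r = 0;
    for (int i = 0; i < nbits; i++) {
        r = (r << 1) | (code & 1u);
        code >>= 1;
    }
    return r;
}
```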
We only cover the most interesting make targets here; a full list can be found at the top of the Makefile.
Before building, ensure that the variables at the top of the Makefile are set correctly; in particular, CUDA_HOME
must point to your CUDA installation.
make or make build
- builds all four executables (two example main scripts and two benchmarking scripts)
- the two example main scripts show how the code is used
- the two benchmarking scripts allow benchmarking of all implemented variations

make run_cpu
- builds and runs the example main script for the CPU version
- reads the PNG file provided on the command line
- re-encodes the file using the libpng reference implementation and hexdumps the result to standard output
- re-encodes the file using our CPU implementation and hexdumps the result to standard output

make run_gpu
- builds and runs the example main script for the GPU version
- reads the PNG file provided on the command line
- re-encodes the file using our GPU implementation and hexdumps the result to standard output

make bench_all
- builds and runs all benchmarks

make bench_cpu_all
- builds and runs all CPU-related benchmarks

make bench_gpu_all
- builds and runs all GPU-related benchmarks
We perform simple, system-clock-based benchmarking of all the functions we implement for
- the filtering preprocessing step of the PNG encoding
- the zlib compression step of the PNG encoding
- the end-to-end PNG encoding (filtering + zlib compression)
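The README describes these benchmarks only as simple and system-clock-based. The following is a hypothetical harness in that spirit, not the project's code: bench_mean_ms, sum_bytes, and the single warm-up call are illustrative choices, and the system clock is read through C11 timespec_get.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Reads the system (wall-clock) time with C11 timespec_get. */
static struct timespec clock_now(void) {
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC);
    (void)base; /* silence unused-variable warnings when NDEBUG is set */
    return ts;
}

/* Milliseconds between two timestamps, computed without losing precision. */
static double elapsed_ms(struct timespec t0, struct timespec t1) {
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) * 1e-6;
}

/* Hypothetical signature for a step under test (filtering, compression, ...). */
typedef void (*bench_fn)(void *ctx);

/* Calls fn once as a warm-up, then reps more times, and returns the mean
   wall-clock time per call in milliseconds. */
static double bench_mean_ms(bench_fn fn, void *ctx, int reps) {
    assert(fn != NULL && reps > 0);
    fn(ctx);
    struct timespec start = clock_now();
    for (int i = 0; i < reps; i++)
        fn(ctx);
    struct timespec end = clock_now();
    return elapsed_ms(start, end) / reps;
}

/* Example workload: sums a byte buffer so that each call does real work. */
struct sum_ctx {
    const unsigned char *buf;
    size_t len;
    unsigned long sum;
};

static void sum_bytes(void *p) {
    struct sum_ctx *c = p;
    unsigned long s = 0;
    for (size_t i = 0; i < c->len; i++)
        s += c->buf[i];
    c->sum = s;
}
```

Two caveats apply to timing like this. The system clock can be adjusted while a benchmark runs, which a monotonic clock such as POSIX CLOCK_MONOTONIC avoids. And CUDA kernel launches return before the kernel has finished, so a GPU step must be synchronized (for example with cudaDeviceSynchronize()) before the end time is taken.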
Additionally, we benchmark the end-to-end encoding of libpng as a
reference for our implementations.
For the CPU version, we implemented only one function for each step. The CPU-related benchmarks are intended as a reference for the GPU version.
For the GPU version, however, we implemented multiple functions for both filtering and compression, each with different optimizations applied. The GPU-related benchmarks compare the runtimes of these functions and thereby show how effective the applied optimizations are.
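The README does not say which of the five standard PNG filter types the implementations use. As a hedged illustration of the filtering step itself, the sketch below applies each standard type to one RGBA scanline (4 bytes per pixel), following the PNG specification; filter_row and paeth are hypothetical names, not functions from this repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The five filter types defined by the PNG specification (filter method 0). */
enum { FILTER_NONE = 0, FILTER_SUB = 1, FILTER_UP = 2, FILTER_AVG = 3, FILTER_PAETH = 4 };

/* Paeth predictor: picks whichever of left (a), up (b), upper-left (c)
   is closest to a + b - c, breaking ties in the order a, b, c. */
static int paeth(int a, int b, int c) {
    int p = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* Filters one scanline of len bytes with bpp bytes per pixel (4 for RGBA8).
   prev holds the previous scanline's *unfiltered* bytes, or is NULL for the
   first row (treated as zeros). out receives the filter-type byte plus len bytes. */
static void filter_row(int type, const uint8_t *cur, const uint8_t *prev,
                       size_t len, size_t bpp, uint8_t *out) {
    assert(bpp > 0 && type >= FILTER_NONE && type <= FILTER_PAETH);
    out[0] = (uint8_t)type;
    for (size_t i = 0; i < len; i++) {
        int a = i >= bpp ? cur[i - bpp] : 0;            /* left */
        int b = prev ? prev[i] : 0;                     /* up */
        int c = (prev && i >= bpp) ? prev[i - bpp] : 0; /* upper-left */
        int pred = 0;                                   /* FILTER_NONE */
        switch (type) {
        case FILTER_SUB:   pred = a; break;
        case FILTER_UP:    pred = b; break;
        case FILTER_AVG:   pred = (a + b) / 2; break;
        case FILTER_PAETH: pred = paeth(a, b, c); break;
        }
        /* Differences wrap modulo 256, as the specification requires. */
        out[1 + i] = (uint8_t)(cur[i] - pred);
    }
}
```

Because every row is computed from the raw bytes of itself and the row above, never from already-filtered output, all rows can be filtered independently of one another. This makes the step a natural candidate for GPU parallelism, although the README does not describe how its kernels divide the work.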
The file benchmark_results.txt contains the results of running make bench_all on a personal machine
equipped with:
- CPU: Intel(R) Core(TM) i5-10600KF CPU
- GPU: MSI(R) GeForce RTX(TM) 4060 Ti VENTUS 2X BLACK 16G OC
- 32GB RAM