Releases: szcompressor/cuSZp
V3.0.0
This major release consists of the updates from cuSZp2 (SC'24 paper) to cuSZp3 (SC'25 paper).
cuSZp V3.x has the following designs:
- Support for multi-dimensional data (1D, 2D, and 3D) with dimension-aware delta encoding.
- Each dimension has three algorithms: fixed, plain, and outlier.
  - fixed: Fixed-length encoding only; suitable for non-smooth scientific data or machine learning weights/tokens.
  - plain: Delta encoding + fixed-length encoding; similar to the previous plain design, but with multi-dimensional support.
  - outlier: Delta encoding + fixed-length encoding + outlier preservation; similar to the previous outlier design, but with multi-dimensional support.
- Versatility support (including memory-efficient compression and selective decompression) will be included in other branches later.
- Still a pure-GPU design with a single kernel function.
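The difference between the three algorithms can be illustrated with a small CPU-side sketch. This is not the cuSZp kernel: the function names, the quantization step of `2 * error_bound`, the one-sign-bit-per-value accounting, and the 32-bit outlier slot are all assumptions made for illustration, and only the 1D case is shown.

```python
def quantize(values, error_bound):
    # Map each float to an integer bin of width 2 * error_bound (assumed scheme).
    return [round(v / (2.0 * error_bound)) for v in values]

def bits_needed(ints):
    # Fixed-length width: bits required by the largest magnitude in the block.
    return max((abs(i).bit_length() for i in ints), default=0)

def delta(ints):
    # 1D delta encoding: keep the first element, then store differences.
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]

def block_cost_bits(values, error_bound, mode):
    # Estimated encoded size of one block, in bits, for a given mode.
    q = quantize(values, error_bound)
    n = len(q)
    if mode == "fixed":
        return n * (bits_needed(q) + 1)  # +1 sign bit per value
    d = delta(q)
    if mode == "plain":
        return n * (bits_needed(d) + 1)
    if mode == "outlier":
        # Assumption: the first (usually largest) delta is stored separately
        # in 32 bits, so the remaining deltas can share a narrower width.
        return 32 + (n - 1) * (bits_needed(d[1:]) + 1)
    raise ValueError(f"unknown mode: {mode}")

smooth_block = [1000.0 + i for i in range(8)]
for mode in ("fixed", "plain", "outlier"):
    print(mode, block_cost_bits(smooth_block, 0.5, mode))
```

Under these assumptions, the smooth block above costs 88 bits in fixed and plain modes and 46 bits in outlier mode, because the large first value no longer dictates the width for the whole block.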
cuSZp V3.x has the following features:
- Ultra-fast end-to-end throughput (even higher than cuSZp V2.x).
- High compression ratios across various data patterns (always higher than cuSZp V2.x).
- F64 and F32 data types are supported.
- Executable binary, C/C++ API, and Python API are supported.
V2.0.1
This patch release includes the implementation of the SC'24 compression tutorial. Specifically:
- Updates cuSZp.h and cuSZp.cpp to increase compatibility.
- Updates the float32 plain-mode compression kernel with a partial re-execution design (reducing register usage).
V2.0.0
This major release consists of the updates from cuSZp1 (SC'23 paper) to cuSZp2 (SC'24 paper).
cuSZp V2.x has the following designs:
- One kernel function for compression/decompression.
- Outlier and plain fixed-length encoding modes.
- Using optimized memory access patterns in compression and decompression.
- Using latency control in global synchronization.
cuSZp V2.x has the following features:
- Ultra-fast end-to-end throughput (2x to 3x that of cuSZp V1.x).
- High compression ratio in various data patterns.
- F64 and F32 data types are supported.
- Executable binary, C/C++ API, and Python API are supported.
V1.1.0
This release moves padding into the cuSZp kernel functions, along with various kernel updates. Users can directly call the following APIs to perform compression and decompression on device pointers. In other words, the extra cudaMalloc() and cudaMemcpy() calls for padding are no longer required, making cuSZp easier to deploy in inline compression tasks.
SZp_compress_deviceptr_f32();
SZp_decompress_deviceptr_f32();
This release can be seen and evaluated as the final implementation of cuSZp V1.x (i.e., the [SC'23] paper).
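For context, the caller-side padding that this release makes unnecessary can be sketched as follows. This is a host-only illustration, not cuSZp code: the helper names, the block size of 32, and the zero pad value are hypothetical, and the list copy merely stands in for the extra cudaMalloc() and cudaMemcpy() calls.

```python
def padded_length(num_elements, block_size=32):
    # Round the element count up to the next multiple of block_size.
    return ((num_elements + block_size - 1) // block_size) * block_size

def pad_on_host(data, block_size=32):
    # Extra allocation and copy a caller would otherwise perform before
    # compression; the pad value 0.0 is an assumption.
    extra = padded_length(len(data), block_size) - len(data)
    return list(data) + [0.0] * extra

padded = pad_on_host([1.5] * 100)
print(len(padded))  # 128
```

With padding handled inside the kernels, a caller can pass the original device pointer and element count directly.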
V1.0.0
This release includes the design described in the [SC'23] paper.
cuSZp V1.x has the following designs:
- One kernel function for compression/decompression.
- Using fixed-length encoding as the core compression algorithm.
- Global synchronization is performed via a serial chain scan.
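The role of the chain scan can be sketched on the CPU: each block's compressed size is known only after that block is encoded, so its write offset in the output depends on the sizes of all preceding blocks. The sequential loop below is an assumed emulation of that dependency, with a hypothetical function name. In a GPU kernel, each thread block would instead wait on its predecessor's published offset.

```python
def chain_scan_offsets(block_sizes):
    # Exclusive prefix sum computed as a serial chain:
    # block i can only resolve its offset after block i-1 has.
    offsets = [0] * len(block_sizes)
    for i in range(1, len(block_sizes)):
        offsets[i] = offsets[i - 1] + block_sizes[i - 1]
    return offsets

sizes = [5, 3, 8, 2]  # compressed bytes per block (made-up values)
print(chain_scan_offsets(sizes))  # [0, 5, 8, 16]
```

The serial dependency is what makes this step a global synchronization point.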
cuSZp V1.x has the following features:
- Fast end-to-end throughput.
- High compression ratio for sparse and non-smooth datasets.
- F64 and F32 data types are supported.