Validated Results

Benchmark Results

Platform: Jetson AGX Orin 64GB | JetPack 6.2.2 | CUDA 12.6 | MAXN mode | All results roundtrip verified. Canonical source: hammerio.dev/benchmark.

8,537 MB/s: GPU decompress (in-memory peak)
4,258 MB/s: GPU decompress (10 GB roundtrip)
4.3x: GPU in-memory decompress vs. CPU zstd-1
5.8x: GPU roundtrip decompress vs. CPU zstd-1

In-Memory Performance (Raw Throughput)

Compression and decompression of payloads held entirely in unified memory — no disk I/O in the critical path.

Method        | Processor | Compress   | Decompress | Integrity
nvCOMP LZ4    | GPU       | 705 MB/s   | 8,537 MB/s | PASS
nvCOMP Snappy | GPU       | 1,615 MB/s | 5,756 MB/s | PASS
zstd-1        | CPU       | 1,747 MB/s | 2,001 MB/s | PASS

Roundtrip Results (10 GB with Disk I/O)

End-to-end measurements covering the full sequence: disk read, compress, disk write, disk read, decompress. These figures represent operational throughput for field-node recovery and audit-trail access.

Method     | Processor | Compress   | Decompress | Ratio | Integrity
nvCOMP LZ4 | GPU       | 517 MB/s   | 4,258 MB/s | 1.98x | PASS
zstd-1     | CPU       | 1,094 MB/s | 733 MB/s   | 2.00x | PASS
zstd-3     | CPU       | 1,014 MB/s | 741 MB/s   | 2.00x | PASS
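The roundtrip sequence described above can be sketched as a small timing harness. This is a minimal illustration, not the benchmark's actual code: Python's built-in zlib stands in for the nvCOMP and zstd engines, the `roundtrip` and `mb_per_s` helpers are hypothetical names, and it assumes throughput is measured against the uncompressed size with 1 MB = 10^6 bytes. The payload is far smaller than 10 GB, so the operating system's page cache will likely serve the reads.

```python
import os
import tempfile
import time
import zlib


def mb_per_s(num_bytes: int, seconds: float) -> float:
    """Throughput in MB/s, assuming 1 MB = 10**6 bytes."""
    return num_bytes / 1e6 / seconds


def roundtrip(payload: bytes, workdir: str) -> dict:
    """Time disk read -> compress -> disk write, then disk read -> decompress."""
    src = os.path.join(workdir, "payload.bin")
    dst = os.path.join(workdir, "payload.bin.z")
    with open(src, "wb") as f:
        f.write(payload)

    # Compress leg: disk read, compress, disk write.
    t0 = time.perf_counter()
    with open(src, "rb") as f:
        raw = f.read()
    packed = zlib.compress(raw, 1)  # zlib is a stand-in for the real engines
    with open(dst, "wb") as f:
        f.write(packed)
    compress_s = time.perf_counter() - t0

    # Decompress leg: disk read, decompress.
    t0 = time.perf_counter()
    with open(dst, "rb") as f:
        restored = zlib.decompress(f.read())
    decompress_s = time.perf_counter() - t0

    return {
        "compress_mb_s": mb_per_s(len(raw), compress_s),
        "decompress_mb_s": mb_per_s(len(raw), decompress_s),
        "ratio": len(raw) / len(packed),  # uncompressed size / compressed size
        "intact": restored == payload,
    }


with tempfile.TemporaryDirectory() as tmp:
    result = roundtrip(b"hammerio benchmark payload " * 40_000, tmp)
    print(result)
```

Timing each leg separately mirrors the separate Compress and Decompress columns in the table; the absolute numbers from this sketch are not comparable to the published figures.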

Real-World Compression

A backup of an actual project, compressing mixed source files (Python, HTML, config, JSON). This validates ratios on real data rather than synthetic payloads.

Engine | Source                                 | Ratio | Integrity
zstd   | Project backup (Python, HTML, config)  | 5.09x | PASS

Analysis

GPU Decompression Advantage

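GPU nvCOMP LZ4 decompresses at 8,537 MB/s in-memory, 4.3x faster than CPU zstd-1 (2,001 MB/s). In the 10 GB roundtrip with disk I/O, the advantage grows to 5.8x (4,258 MB/s vs. 733 MB/s), because CPU throughput falls further than GPU throughput once disk I/O is included.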
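The speedup figures follow directly from the decompression throughputs in the tables. A short check of that arithmetic, using only numbers reported on this page (the `speedup` helper name is illustrative):

```python
# Decompression throughput in MB/s, copied from the benchmark tables.
in_memory = {"gpu_lz4": 8537, "cpu_zstd1": 2001}
roundtrip_10gb = {"gpu_lz4": 4258, "cpu_zstd1": 733}


def speedup(gpu_mb_s: float, cpu_mb_s: float) -> float:
    """Ratio of GPU to CPU throughput, rounded to one decimal place."""
    return round(gpu_mb_s / cpu_mb_s, 1)


print(speedup(in_memory["gpu_lz4"], in_memory["cpu_zstd1"]))            # 4.3
print(speedup(roundtrip_10gb["gpu_lz4"], roundtrip_10gb["cpu_zstd1"]))  # 5.8
```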

GPU Crossover Point

Below ~10 MB, kernel launch overhead dominates and CPU compression is faster. HammerIO's smart routing sends small payloads to CPU automatically.
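A size-based routing rule of this kind might look like the sketch below. It is not HammerIO's implementation: the function name, the exact threshold constant, and the `gpu_available` flag are assumptions made for illustration, with the cutoff set to the approximate 10 MB crossover stated above.

```python
# Approximate crossover from the text; the exact value used in practice is assumed.
GPU_CROSSOVER_BYTES = 10 * 1000 * 1000


def choose_processor(payload_size: int, gpu_available: bool = True) -> str:
    """Return "gpu" or "cpu" for a payload of the given size in bytes."""
    # Small payloads stay on the CPU, where kernel launch overhead does not apply.
    if not gpu_available or payload_size < GPU_CROSSOVER_BYTES:
        return "cpu"
    return "gpu"


for size in (64 * 1024, 5_000_000, 10_000_000, 2_000_000_000):
    print(f"{size:>13,} bytes -> {choose_processor(size)}")
```

A fixed threshold is the simplest possible policy; a real router could also weigh the codec, current GPU load, or whether the data already resides in GPU-accessible memory.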

Real-World Ratios

nvCOMP LZ4 achieves 1.98x on the synthetic uniform test data. Real project source (Python, HTML, configs) compresses at 5.09x with zstd. ML datasets and log files typically land at 1.8x–2.5x.

Data Integrity

All engines pass SHA-256 roundtrip verification. Zero data loss across LZ4, Snappy, and zstd levels 1–9.
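A SHA-256 roundtrip check compares the digest of the original payload with the digest of the data after compression and decompression. The sketch below shows the idea using only the Python standard library; zlib is a stand-in for the LZ4, Snappy, and zstd engines, and the helper names are hypothetical.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_roundtrip(payload: bytes, level: int = 1) -> bool:
    """Compress, decompress, and compare digests of input and output."""
    expected = sha256_hex(payload)
    restored = zlib.decompress(zlib.compress(payload, level))  # stand-in codec
    return sha256_hex(restored) == expected


payload = bytes(range(256)) * 4096  # 1 MiB of repeating byte values
print("PASS" if verify_roundtrip(payload) else "FAIL")
```

Comparing digests rather than the raw buffers means the original payload does not have to stay in memory while the restored copy is checked, which matters for multi-gigabyte inputs.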

Canonical benchmark data is published at hammerio.dev/benchmark. Synthetic benchmarks use uniform test payloads; the real-world benchmark uses actual project source files. All throughput figures include memory transfer overhead. Integrity is verified via SHA-256 roundtrip comparison.