Install Guide
ModelSafe requires Python 3.10+ and runs on Linux (x86_64 and aarch64). For GPU-accelerated compression, HammerIO and CUDA 12.x are required.
# Clone the repository
git clone https://github.com/ResilientMindAI/ModelSafe.git
cd ModelSafe
# Install dependencies
pip install -e .
# Verify installation
modelsafe --version
For Jetson AGX Orin, ensure JetPack 6.x is installed and nvCOMP libraries are available via HammerIO.
CLI Commands
ModelSafe provides a straightforward CLI for all checkpoint operations.
# Store a model checkpoint
modelsafe store --model ./checkpoints/llama-7b.bin
# List stored checkpoints
modelsafe list
# Verify integrity of a stored checkpoint
modelsafe verify --id ckpt_abc123
# Restore a checkpoint
modelsafe restore --id ckpt_abc123 --output ./restored/
# Show vault status
modelsafe status
# Export manifest
modelsafe manifest --id ckpt_abc123 --format json
Vault Configuration
The vault is the local directory where compressed checkpoints are stored. Configure it in modelsafe.yaml:
vault:
  path: /data/modelsafe/vault
  max_size: 500GB
compression:
  gpu_threshold: 500MB   # Files above this use nvCOMP GPU LZ4
  cpu_algorithm: zstd    # Files below threshold use CPU zstd
  gpu_algorithm: lz4     # GPU compression via HammerIO nvCOMP
integrity:
  algorithm: sha256
  verify_on_store: true
  verify_on_restore: true
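The size values in the configuration (500GB, 500MB) are human-readable strings that have to be converted to byte counts before they can be compared with file sizes. The following is a minimal sketch of such a conversion, not ModelSafe's actual parser. The function name parse_size is hypothetical, and the choice of binary multiples (1 MB = 1024**2 bytes) is an assumption; ModelSafe may interpret the units as decimal.

```python
import re

# Assumption: binary multiples (1 MB = 1024**2 bytes). ModelSafe may use
# decimal units instead; the source does not say.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as '500MB' into a number of bytes."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


if __name__ == "__main__":
    print(parse_size("500MB"))  # gpu_threshold from the example config
    print(parse_size("500GB"))  # max_size from the example config
```

Rejecting unrecognized strings with an exception, rather than falling back to a default, keeps a typo in modelsafe.yaml from silently changing the routing threshold.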
Manifest Format
Every stored checkpoint creates a manifest entry in JSON format:
{
  "id": "ckpt_abc123",
  "model_name": "llama-7b",
  "original_size": 1073741824,
  "compressed_size": 917504819,
  "compression_ratio": 1.17,
  "hash_original": "sha256:a1b2c3d4...",
  "hash_compressed": "sha256:e5f6g7h8...",
  "compression_method": "nvcomp_lz4_gpu",
  "stored_at": "2026-04-07T12:00:00Z",
  "vault_path": "/data/modelsafe/vault/ckpt_abc123.msvault",
  "match": true
}
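The fields in a manifest are related to one another: compression_ratio should equal original_size divided by compressed_size, and both hashes carry a sha256: prefix. As an illustration, the sketch below checks an exported manifest for internal consistency. The function name check_manifest and the choice of required fields are assumptions made for this example, not part of ModelSafe.

```python
import json

# The example manifest from this guide, with its elided hashes kept as-is.
MANIFEST_JSON = """
{
  "id": "ckpt_abc123",
  "model_name": "llama-7b",
  "original_size": 1073741824,
  "compressed_size": 917504819,
  "compression_ratio": 1.17,
  "hash_original": "sha256:a1b2c3d4...",
  "hash_compressed": "sha256:e5f6g7h8...",
  "compression_method": "nvcomp_lz4_gpu",
  "stored_at": "2026-04-07T12:00:00Z",
  "vault_path": "/data/modelsafe/vault/ckpt_abc123.msvault",
  "match": true
}
"""

# Hypothetical choice of fields to require; adjust to the real schema.
REQUIRED = (
    "id", "original_size", "compressed_size",
    "compression_ratio", "hash_original", "hash_compressed",
)


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in manifest]
    if problems:
        return problems
    # The recorded ratio should match original/compressed to two decimals.
    expected = manifest["original_size"] / manifest["compressed_size"]
    if abs(expected - manifest["compression_ratio"]) > 0.01:
        problems.append(
            f"compression_ratio {manifest['compression_ratio']} "
            f"does not match sizes ({expected:.2f})"
        )
    for key in ("hash_original", "hash_compressed"):
        if not str(manifest[key]).startswith("sha256:"):
            problems.append(f"{key} is not a sha256 digest")
    return problems


if __name__ == "__main__":
    print(check_manifest(json.loads(MANIFEST_JSON)))
```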
Integrity Verification
ModelSafe uses SHA-256 hashing at every stage of the checkpoint lifecycle:
- Before compression: Original file is hashed and recorded in the manifest
- After compression: Compressed file is hashed for storage verification
- On restore: Decompressed file hash is compared against the original hash
- Match requirement: match: true is required for a successful restore
- On failure: Alert is raised and restore is aborted, leaving no partial or corrupt checkpoints
# Manually verify a checkpoint
modelsafe verify --id ckpt_abc123
# Output:
# Checkpoint: ckpt_abc123
# Original SHA-256: a1b2c3d4...
# Restored SHA-256: a1b2c3d4...
# Match: True
# Status: PASS
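The restore-time comparison described in this section can be sketched with Python's standard hashlib module. This is an illustrative sketch and not ModelSafe's implementation: the function names sha256_file and restore_matches are hypothetical, and the sha256: prefix simply mirrors the manifest format. The file is hashed in chunks so that multi-gigabyte checkpoints do not have to fit in memory.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks and return 'sha256:<hex digest>'."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def restore_matches(restored_path: str, hash_original: str) -> bool:
    """Compare the restored file's hash against the manifest's original hash."""
    return hmac.compare_digest(sha256_file(restored_path), hash_original)
```

A caller would treat a False result the same way the list above describes a failed match: raise an alert and discard the restored file rather than keep it.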
HammerIO Integration
ModelSafe uses HammerIO as its compression backend for GPU-accelerated operations. HammerIO wraps NVIDIA nvCOMP to provide high-throughput LZ4 compression on CUDA-capable hardware.
- Files larger than 500 MB (the gpu_threshold value in modelsafe.yaml) are automatically routed to GPU LZ4 via HammerIO
- Files smaller than 500 MB use CPU zstd for efficiency
- HammerIO handles memory management, chunking, and GPU kernel scheduling
- Peak throughput: 391 MB/s decompression on Jetson AGX Orin
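The size-based routing rule above amounts to a single comparison against the threshold. The sketch below illustrates it under stated assumptions: the function name choose_compression is hypothetical, the label nvcomp_lz4_gpu is taken from the manifest example while zstd_cpu is invented for this example, the threshold is assumed to mean 500 * 1024**2 bytes, and a file exactly at the threshold is assumed to go to the CPU path because the source only specifies "larger" and "smaller".

```python
# Assumption: 500 MB is interpreted as 500 * 1024**2 bytes.
DEFAULT_GPU_THRESHOLD = 500 * 1024**2


def choose_compression(
    size_bytes: int, gpu_threshold_bytes: int = DEFAULT_GPU_THRESHOLD
) -> str:
    """Pick a compression path from the file size.

    Files strictly larger than the threshold go to GPU LZ4; everything
    else, including a file exactly at the threshold (an assumption),
    goes to CPU zstd.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > gpu_threshold_bytes:
        return "nvcomp_lz4_gpu"  # label as seen in the manifest example
    return "zstd_cpu"  # hypothetical label for the CPU path


if __name__ == "__main__":
    # The 1 GiB checkpoint from the manifest example takes the GPU path.
    print(choose_compression(1073741824))
```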
AriaOS Audit Integration
When running within the CommandRoomAI platform, ModelSafe integrates with AriaOS governance. Every store and restore operation is logged to the AriaOS audit trail.
- Store operations log: model name, original hash, compressed hash, vault path, timestamp
- Restore operations log: checkpoint ID, restored hash, match status, restore time, timestamp
- Failed integrity checks are flagged as security events in AriaOS
- All audit logs are immutable and can be exported for compliance reporting
# Enable AriaOS audit logging
modelsafe config set ariaos.audit_enabled true
modelsafe config set ariaos.endpoint http://localhost:9090/audit