ModelSafe Docs

Everything you need to install, configure, and operate ModelSafe.

Install Guide

ModelSafe requires Python 3.10+ and runs on Linux (x86_64 and aarch64). For GPU-accelerated compression, HammerIO and CUDA 12.x are required.

# Clone the repository
git clone https://github.com/ResilientMindAI/ModelSafe.git
cd ModelSafe

# Install dependencies
pip install -e .

# Verify installation
modelsafe --version

For Jetson AGX Orin, ensure JetPack 6.x is installed and nvCOMP libraries are available via HammerIO.

CLI Commands

ModelSafe provides a straightforward CLI for all checkpoint operations.

# Store a model checkpoint
modelsafe store --model ./checkpoints/llama-7b.bin

# List stored checkpoints
modelsafe list

# Verify integrity of a stored checkpoint
modelsafe verify --id ckpt_abc123

# Restore a checkpoint
modelsafe restore --id ckpt_abc123 --output ./restored/

# Show vault status
modelsafe status

# Export manifest
modelsafe manifest --id ckpt_abc123 --format json

Vault Configuration

The vault is the local directory where compressed checkpoints are stored. Configure it in modelsafe.yaml:

vault:
  path: /data/modelsafe/vault
  max_size: 500GB
  compression:
    gpu_threshold: 500MB    # Files above this use nvCOMP GPU LZ4
    cpu_algorithm: zstd     # Files below threshold use CPU zstd
    gpu_algorithm: lz4      # GPU compression via HammerIO nvCOMP
  integrity:
    algorithm: sha256
    verify_on_store: true
    verify_on_restore: true
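
The size-based routing implied by gpu_threshold can be sketched in a few lines. This is an illustrative model of the config above, not ModelSafe's actual internals: parse_size, select_method, and the CONFIG dict are hypothetical names introduced here.

```python
# Hypothetical sketch of the vault's compression routing, mirroring the
# gpu_threshold / cpu_algorithm / gpu_algorithm keys in modelsafe.yaml.

def parse_size(text: str) -> int:
    """Convert a size string like '500MB' or '500GB' to bytes."""
    units = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}
    for suffix, factor in units.items():
        if text.upper().endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)

CONFIG = {
    "gpu_threshold": "500MB",
    "cpu_algorithm": "zstd",
    "gpu_algorithm": "lz4",
}

def select_method(file_size: int, config: dict = CONFIG) -> str:
    """Files above the threshold go to GPU LZ4; the rest use CPU zstd."""
    if file_size > parse_size(config["gpu_threshold"]):
        return f"nvcomp_{config['gpu_algorithm']}_gpu"
    return f"cpu_{config['cpu_algorithm']}"

print(select_method(7 * 1024**3))   # nvcomp_lz4_gpu  (7 GB checkpoint)
print(select_method(10 * 1024**2))  # cpu_zstd        (10 MB tokenizer)
```

The method string for a large file matches the compression_method field recorded in the manifest.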

Manifest Format

Every stored checkpoint creates a manifest entry in JSON format:

{
  "id": "ckpt_abc123",
  "model_name": "llama-7b",
  "original_size": 1073741824,
  "compressed_size": 917504819,
  "compression_ratio": 1.17,
  "hash_original": "sha256:a1b2c3d4...",
  "hash_compressed": "sha256:e5f6g7h8...",
  "compression_method": "nvcomp_lz4_gpu",
  "stored_at": "2026-04-07T12:00:00Z",
  "vault_path": "/data/modelsafe/vault/ckpt_abc123.msvault",
  "match": true
}
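
The size fields are internally consistent: compression_ratio is original_size divided by compressed_size. A quick check against the example manifest above (illustrative only):

```python
import json

# Verify the manifest arithmetic: compression_ratio = original / compressed,
# shown here to two decimal places as in the example above.
manifest = json.loads("""
{
  "original_size": 1073741824,
  "compressed_size": 917504819,
  "compression_ratio": 1.17
}
""")

ratio = manifest["original_size"] / manifest["compressed_size"]
assert round(ratio, 2) == manifest["compression_ratio"]
print(f"ratio: {ratio:.2f}")  # ratio: 1.17
```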

Integrity Verification

ModelSafe uses SHA-256 hashing at every stage of the checkpoint lifecycle:

  • Before compression: Original file is hashed and recorded in the manifest
  • After compression: Compressed file is hashed for storage verification
  • On restore: Decompressed file hash is compared against the original hash
  • Match requirement: a restore succeeds only when the manifest records match: true
  • On failure: Alert is raised and restore is aborted — no partial or corrupt checkpoints

# Manually verify a checkpoint
modelsafe verify --id ckpt_abc123

# Output:
# Checkpoint: ckpt_abc123
# Original SHA-256:   a1b2c3d4...
# Restored SHA-256:   a1b2c3d4...
# Match: True
# Status: PASS
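
The core of the check can be sketched with the standard library: hash the original, hash the restored copy, and require an exact SHA-256 match. sha256_file and verify_restore are hypothetical helpers written for illustration, not ModelSafe's implementation.

```python
import hashlib
import os
import tempfile

# Minimal sketch of the restore-time integrity check, assuming a plain
# byte-for-byte SHA-256 comparison between original and restored files.

def sha256_file(path: str) -> str:
    """Stream the file in 1 MiB chunks so large checkpoints fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(original: str, restored: str) -> bool:
    """Return False on any mismatch -- the caller must then abort the restore."""
    return sha256_file(original) == sha256_file(restored)

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "ckpt.bin")
    dst = os.path.join(d, "restored.bin")
    data = os.urandom(1024)
    for p in (src, dst):
        with open(p, "wb") as f:
            f.write(data)
    print("Match:", verify_restore(src, dst))  # Match: True
```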

HammerIO Integration

ModelSafe uses HammerIO as its compression backend for GPU-accelerated operations. HammerIO wraps NVIDIA nvCOMP to provide high-throughput LZ4 compression on CUDA-capable hardware.

  • Files larger than 500 MB are automatically routed to GPU LZ4 via HammerIO
  • Files smaller than 500 MB use CPU zstd for efficiency
  • HammerIO handles memory management, chunking, and GPU kernel scheduling
  • Peak throughput: 391 MB/s decompression on Jetson AGX Orin
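
The quoted 391 MB/s peak gives a useful back-of-envelope estimate for restore times. This helper is illustrative only; real throughput varies with chunk size, PCIe/memory pressure, and checkpoint layout, so treat the result as a lower bound.

```python
# Rough lower-bound estimate of decompression time at HammerIO's quoted
# 391 MB/s peak on Jetson AGX Orin (figure from the section above).

PEAK_DECOMPRESS_MBS = 391  # MB/s

def est_decompress_seconds(size_bytes: int,
                           mbs: float = PEAK_DECOMPRESS_MBS) -> float:
    return size_bytes / (mbs * 1024**2)

seven_gb = 7 * 1024**3
print(f"{est_decompress_seconds(seven_gb):.1f} s")  # 18.3 s
```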

AriaOS Audit Integration

When running within the CommandRoomAI platform, ModelSafe integrates with AriaOS governance. Every store and restore operation is logged to the AriaOS audit trail.

  • Store operations log: model name, original hash, compressed hash, vault path, timestamp
  • Restore operations log: checkpoint ID, restored hash, match status, restore time, timestamp
  • Failed integrity checks are flagged as security events in AriaOS
  • All audit logs are immutable and can be exported for compliance reporting

# Enable AriaOS audit logging
modelsafe config set ariaos.audit_enabled true
modelsafe config set ariaos.endpoint http://localhost:9090/audit
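
The shape of a store-operation audit entry might look like the sketch below. The field names come from the list above, but audit_entry and the hash-chained digest are assumptions for illustration (chaining each entry to the previous digest is one common way to make an append-only log tamper-evident); the real AriaOS schema may differ.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical AriaOS-style audit entry for a store operation. Each entry
# embeds the previous entry's digest, so any later edit breaks the chain.

def audit_entry(prev_digest: str, **fields) -> dict:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev": prev_digest,
        **fields,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["digest"] = hashlib.sha256(payload).hexdigest()
    return entry

e = audit_entry(
    "0" * 64,  # genesis entry chains from an all-zero digest
    event="store",
    model_name="llama-7b",
    hash_original="sha256:a1b2c3d4...",
    vault_path="/data/modelsafe/vault/ckpt_abc123.msvault",
)
print(e["event"], e["digest"][:8])
```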