# Benchmarks
Real performance data from the April 2026 AriaOS: Forge training run on NVIDIA Jetson AGX Orin (64 GB).
## Training Run Summary
End-to-end pipeline execution details.
| Metric | Value |
|---|---|
| Training samples | 623 |
| Full pipeline duration | ~10 h |
| Fine-tuned model parameters | 3B |
| Training method | LoRA |
## A/B Model Comparison
Head-to-head evaluation of the fine-tuned model against the base model.
| Model | Quality | Tok/s | TTFT | Result |
|---|---|---|---|---|
| ariaos-forge:latest | 80/100 | 19.7 | 4,467 ms | Winner |
| qwen2.5-coder:7b | 60/100 | 10.3 | 29,853 ms | — |
Quality scores come from PraetorianMind Model A/B Compare, a heuristic evaluation. The speed gap largely reflects the 3B vs. 7B parameter difference.
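Throughput figures like those above can be derived from the timing fields Ollama returns with a generation response (`eval_count` tokens over `eval_duration` nanoseconds in the final response object). A minimal sketch, using an illustrative sample response rather than data captured from this run:

```python
# Derive tokens/sec from an Ollama /api/generate final response object.
# Field names follow Ollama's API (eval_count = generated tokens,
# eval_duration = generation time in nanoseconds); the sample values
# below are illustrative, not measurements from the benchmark run.

def tokens_per_second(resp: dict) -> float:
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

sample = {"eval_count": 197, "eval_duration": 10_000_000_000}  # 197 tok in 10 s
print(f"{tokens_per_second(sample):.1f} tok/s")  # → 19.7 tok/s
```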
## Inference Speed
Token generation performance on Jetson AGX Orin.
| Metric | ariaos-forge:latest | qwen2.5-coder:7b |
|---|---|---|
| Tokens per second | 19.7 tok/s | 10.3 tok/s |
| Time to first token | 4,467 ms | 29,853 ms |
| Speedup (TTFT) | 6.7x faster | — |
| Speedup (throughput) | 1.9x faster | — |
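The speedup rows follow directly from the raw measurements in the table, as a quick sanity check shows:

```python
# Recompute the speedup rows from the table's raw measurements.
forge = {"tok_s": 19.7, "ttft_ms": 4467}   # ariaos-forge:latest
base  = {"tok_s": 10.3, "ttft_ms": 29853}  # qwen2.5-coder:7b

ttft_speedup = base["ttft_ms"] / forge["ttft_ms"]        # lower TTFT is better
throughput_speedup = forge["tok_s"] / base["tok_s"]      # higher tok/s is better
print(f"TTFT: {ttft_speedup:.1f}x, throughput: {throughput_speedup:.1f}x")
# → TTFT: 6.7x, throughput: 1.9x
```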
## Memory Usage
Resource consumption during training and inference.
| Phase | GPU Memory | System RAM | Notes |
|---|---|---|---|
| LoRA Training | ~28 GB | ~16 GB | fp16 with gradient checkpointing |
| Inference (3B) | ~6 GB | ~4 GB | Ollama runtime |
| Inference (7B base) | ~14 GB | ~8 GB | Ollama runtime |
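On Jetson AGX Orin the 64 GB is unified, so GPU and system memory draw from one pool and `/proc/meminfo` reflects total pressure from both. A small sketch of parsing meminfo-style text to track headroom during a run (the sample text below is illustrative, not captured from the training run):

```python
# Parse system RAM figures from /proc/meminfo-style text. On Jetson,
# unified memory means this captures GPU pressure too. Sample values
# are illustrative, not measurements from the benchmark run.

def meminfo_kb(text: str) -> dict:
    """Map each meminfo key to its value in kB."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        out[key.strip()] = int(rest.split()[0])
    return out

sample = "MemTotal: 65536000 kB\nMemAvailable: 40960000 kB"
info = meminfo_kb(sample)
used_gb = (info["MemTotal"] - info["MemAvailable"]) / 1024**2
print(f"Used: {used_gb:.1f} GB")  # → Used: 23.4 GB
```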
## Hardware Configuration

| Component | Specification |
|---|---|
| Platform | NVIDIA Jetson AGX Orin |
| GPU memory | 64 GB unified |
| CUDA cores | 2048 |
| Network | Air-gapped (zero cloud) |