Training Run Summary

End-to-end pipeline execution details.

Training Samples:             623
Full Pipeline Duration:       ~10h
Fine-tuned Model Parameters:  3B
Training Method:              LoRA
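
For context, a minimal sketch of how a LoRA fine-tune of a 3B model is typically set up with Hugging Face PEFT. The base checkpoint name, rank, alpha, and target modules below are illustrative assumptions, not values recorded from this run; only the fp16-plus-gradient-checkpointing combination comes from the memory figures later in this summary.

```python
# Illustrative LoRA setup with Hugging Face PEFT. Hyperparameters and the
# base checkpoint are assumptions, not the values used in this run.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-Coder-3B-Instruct"  # assumed base checkpoint
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
model.gradient_checkpointing_enable()  # trades compute for activation memory

lora = LoraConfig(
    r=16,             # adapter rank (assumption)
    lora_alpha=32,    # scaling factor (assumption)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # common choice
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights train
```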

A/B Model Comparison

Head-to-head evaluation of the fine-tuned model against the base model.

Model                Quality  Tok/s  TTFT       Result
ariaos-forge:latest  80/100   19.7   4,467 ms   Winner
qwen2.5-coder:7b     60/100   10.3   29,853 ms  -

Quality scores come from the PraetorianMind Model A/B Compare tool, a heuristic evaluation. The speed gap largely reflects the 3B vs. 7B parameter difference.
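
PraetorianMind's scoring internals are not documented in this summary; purely as an illustration of what a heuristic A/B compare can look like, the sketch below scores two answers against a crude rubric and picks a winner. Every check in it is an assumption, not the tool's actual method.

```python
# Purely illustrative heuristic A/B scorer -- NOT PraetorianMind's actual
# rubric. Scores two model outputs out of 100 using crude proxy checks.
def heuristic_score(answer: str) -> int:
    score = 0
    if "```" in answer:                            # contains a code block
        score += 40
    score += min(len(answer.split()), 300) // 10   # rewards completeness, capped
    if "i don't know" not in answer.lower():       # no explicit refusal
        score += 30
    return min(score, 100)

def ab_compare(answer_a: str, answer_b: str) -> str:
    """Return which answer wins under the heuristic rubric."""
    a, b = heuristic_score(answer_a), heuristic_score(answer_b)
    return "A" if a >= b else "B"
```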

Inference Speed

Token generation performance on the NVIDIA Jetson AGX Orin.

Metric               ariaos-forge:latest  qwen2.5-coder:7b
Tokens per second    19.7 tok/s           10.3 tok/s
Time to first token  4,467 ms             29,853 ms

Speedup (TTFT):       6.7x faster time to first token (29,853 ms / 4,467 ms)
Speedup (throughput): 1.9x faster token generation (19.7 / 10.3 tok/s)
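
These figures can be reproduced against a local Ollama server with a short streaming benchmark: time the first streamed token for TTFT, then read the eval_count and eval_duration fields Ollama reports on the final chunk for throughput. The endpoint and fields are the stock Ollama API; the prompt is an arbitrary placeholder.

```python
# Minimal benchmark against a local Ollama server (default port 11434).
# TTFT = time until the first streamed token; tok/s comes from the
# eval_count / eval_duration fields Ollama reports on the final chunk.
import json
import time
import urllib.request

def benchmark(model: str, prompt: str = "Write a Python quicksort.") -> None:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": True}).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    ttft = None
    with urllib.request.urlopen(req) as resp:
        for line in resp:                          # NDJSON stream, one chunk per line
            chunk = json.loads(line)
            if ttft is None and chunk.get("response"):
                ttft = time.monotonic() - start    # first token arrived
            if chunk.get("done"):
                toks = chunk["eval_count"]
                secs = chunk["eval_duration"] / 1e9  # nanoseconds -> seconds
                print(f"{model}: TTFT {ttft * 1000:,.0f} ms, {toks / secs:.1f} tok/s")

for m in ("ariaos-forge:latest", "qwen2.5-coder:7b"):
    benchmark(m)
```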

Memory Usage

Resource consumption during training and inference.

Phase                GPU Memory  System RAM  Notes
LoRA Training        ~28 GB      ~16 GB      fp16 with gradient checkpointing
Inference (3B)       ~6 GB       ~4 GB       Ollama runtime
Inference (7B base)  ~14 GB      ~8 GB       Ollama runtime
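
On Jetson the GPU and CPU share one unified memory pool, so numbers like these are usually read from tegrastats rather than nvidia-smi. A sketch of tracking peak usage during a run, assuming tegrastats' usual "RAM used/totalMB" field format:

```python
# Sample memory usage during training/inference by parsing tegrastats output.
# Assumes the usual "RAM <used>/<total>MB" field in each line; adjust the
# regex if your JetPack release formats it differently.
import re
import subprocess

proc = subprocess.Popen(
    ["tegrastats", "--interval", "1000"],  # one sample per second
    stdout=subprocess.PIPE, text=True,
)
peak = 0
try:
    for line in proc.stdout:
        m = re.search(r"RAM (\d+)/(\d+)MB", line)
        if m:
            used = int(m.group(1))
            peak = max(peak, used)
            print(f"RAM {used} MB used (peak {peak} MB of {m.group(2)} MB)")
except KeyboardInterrupt:
    proc.terminate()
```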

Hardware Configuration

Platform:    NVIDIA Jetson AGX Orin
GPU Memory:  64 GB unified
CUDA Cores:  2048
Network:     Air-gapped (zero cloud)
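
Air-gapped operation implies every checkpoint and tool is already on disk. With Hugging Face tooling that is typically enforced through the offline environment variables shown below; the cache path and model name are placeholders.

```python
# Enforce fully offline operation for an air-gapped box: Hugging Face
# libraries will error out instead of attempting any network fetch.
import os

os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: never hit the network
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: local files only
os.environ["HF_HOME"] = "/data/hf-cache"  # placeholder local cache path

from transformers import AutoTokenizer

# Loads only if the checkpoint is already cached locally (name is illustrative).
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-3B-Instruct")
```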