- Send the same prompt to two models simultaneously
- Measure tok/s, TTFT, and total generation time for each
- Compare response quality with structured evaluation criteria
- Track win rates over time with persistent history
- Export results to CSV for further analysis
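The timing side of this loop can be sketched in a few lines. The helper below is a hypothetical illustration (not PraetorianMind's actual implementation): it consumes a token stream from any backend and records the three latency metrics listed above.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class RunMetrics:
    ttft_ms: float    # time from submission to first token
    tok_per_s: float  # sustained generation throughput
    total_s: float    # total generation time

def measure_stream(tokens: Iterable[str]) -> RunMetrics:
    """Consume a token stream and record TTFT, tok/s, and total time."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in tokens:
        if first is None:
            first = time.perf_counter()  # first token arrived
        count += 1
    end = time.perf_counter()
    ttft_ms = (first - start) * 1000.0 if first is not None else float("inf")
    total_s = end - start
    tok_per_s = count / total_s if total_s > 0 else 0.0
    return RunMetrics(ttft_ms, tok_per_s, total_s)

def simulated_model(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Stand-in for a streaming model response, for demonstration."""
    for _ in range(n_tokens):
        time.sleep(delay_s)
        yield "tok"
```

Running `measure_stream` once per model on the same prompt, and comparing the two `RunMetrics`, is the essence of an A/B pass; a real harness would stream tokens from the two local inference endpoints instead of `simulated_model`.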
Inference Benchmark History
All benchmarks run on NVIDIA Jetson AGX Orin 64GB
| Date | Model | Avg Tok/s | TTFT (ms) |
|---|---|---|---|
| 2026-04-06 | ariaos-forge:latest | 21.5 | 1192 |
| 2026-03-18 | aya:8b | 21.0 | 2071 |
| 2026-03-18 | deepseek-r1:1.5b | 41.0 | 1389 |
| 2026-03-18 | phi3.5:3.8b | 39.6 | 112 |
| 2026-03-18 | llama3.1:8b | 6.7 | 17649 |
Model A/B Comparison Results
April 2026: ariaos-forge:latest vs. qwen2.5-coder:7b — winner: ariaos-forge:latest.
Head-to-head inference comparison on NVIDIA Jetson AGX Orin 64GB. The A/B Compare module evaluates response quality, latency, and token throughput side by side.
How A/B Compare Works
Benchmark Methodology
All benchmarks are run locally in PraetorianMind's Inference Bench module. The hardware platform is an NVIDIA Jetson AGX Orin 64GB running JetPack with CUDA acceleration. Each benchmark run captures:
- Average Tokens per Second (tok/s) — sustained generation throughput
- Time to First Token (TTFT) — latency from prompt submission to first token
- CUDA Memory Bandwidth — GPU memory utilization during inference
- Efficiency Score — composite metric of throughput vs. model size
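The exact formula behind the Efficiency Score is not spelled out here; one plausible form of a "throughput vs. model size" composite is throughput normalized by parameter count, sketched below as an assumption, using figures from the benchmark table above.

```python
def efficiency_score(tok_per_s: float, params_billions: float) -> float:
    """Hypothetical composite: sustained tok/s per billion parameters.

    Higher is better; a small model must be proportionally faster
    than a large one to score the same.
    """
    if params_billions <= 0:
        raise ValueError("model size must be positive")
    return tok_per_s / params_billions

# From the history table: deepseek-r1:1.5b at 41.0 tok/s vs llama3.1:8b at 6.7 tok/s.
small = efficiency_score(41.0, 1.5)  # ~27.3 tok/s per B params
large = efficiency_score(6.7, 8.0)   # ~0.84 tok/s per B params
```

Under this definition the 1.5B model dominates on efficiency even before its raw throughput lead is considered.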
Results are stored in PraetorianMind's local database and can be exported as CSV. No data leaves the device.
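A minimal sketch of the store-then-export path, assuming the local database is SQLite and a hypothetical `benchmarks` table with the columns shown in the history table above (the real schema and table name may differ):

```python
import csv
import sqlite3

def export_benchmarks(db_path: str, csv_path: str) -> int:
    """Dump the assumed benchmarks table to a CSV file; return row count."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT date, model, avg_tok_s, ttft_ms FROM benchmarks ORDER BY date"
        ).fetchall()
    finally:
        conn.close()
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "model", "avg_tok_s", "ttft_ms"])  # header
        writer.writerows(rows)
    return len(rows)
```

Because both the database file and the CSV live on the device's filesystem, nothing in this path requires a network connection, which is consistent with the no-data-leaves-the-device guarantee.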