Benchmarks — ARIA Universal Protocol

Methodology

Reproducible by design

Every parameter is fixed and documented. Raw JSON results live in the repo and the report is generated from those files — no manual editing.

Tokens / run

256

Runs / model

5 (median)

Threads

8 (Zen 4 symmetric)

Total runs

170+ (Zen 4) · 270+ (Zen 5)

Inference runtime

bitnet.cpp · Clang 20.1.8

Vector path

AVX-512 VNNI + VBMI

Energy figures are estimated as CPU_time × TDP / active_threads, not measured with a hardware power meter. Treat them as order-of-magnitude estimates only. For ≥7B models, Zen 5 results use a cold-burst protocol to avoid laptop thermal throttling.

Results

Throughput · Zen 4 × Zen 5

All values in tokens per second, median of 5 runs.

Model	Params	Type	Zen 4 (t/s)	Zen 5 (t/s)	Δ
BitNet b1.58 Large	0.7B	Post-quantized	118.25	—	—
Falcon-E 1B Instruct	1.0B	Native 1-bit	80.19	103.59	+29%
Falcon3 1B 1.58bit	1.0B	Post-quantized	56.31	78.16	+39%
BitNet b1.58 2B-4T	2.4B	Native 1-bit	37.76	51.82	+37%
Falcon-E 3B Instruct	3.0B	Native 1-bit	49.80	65.19	+31%
Falcon3 3B 1.58bit	3.0B	Post-quantized	33.21	46.77	+41%
Falcon3 7B 1.58bit	7.0B	Post-quantized	19.89	28.45	+43%
Falcon3 10B 1.58bit	10.0B	Post-quantized	15.12	19.39	+28%

Average cross-generation improvement: +35% (range +28% to +43%).

Key finding

Native 1-bit beats post-quantized

Direct comparison at matching parameter counts. Models trained natively in ternary weights (Falcon-E) outperform post-quantized equivalents (Falcon3 1.58bit) consistently.

Size	Native (Falcon-E)	Post-quant (Falcon3)	Advantage
1B (Zen 4)	80.19 t/s	56.31 t/s	+42%
1B (Zen 5)	103.59 t/s	78.16 t/s	+33%
3B (Zen 4)	49.80 t/s	33.21 t/s	+50%
3B (Zen 5)	65.19 t/s	46.77 t/s	+39%

Why: native ternary kernels replace multiply-accumulate with simple add/subtract operations. The advantage grows with model size as memory bandwidth becomes the dominant bottleneck.

Test platform

Hardware

	Zen 4	Zen 5
CPU	AMD Ryzen 9 7845HX	AMD Ryzen AI 9 HX 370
Cores / threads	12C / 24T	12C / 24T (4P + 8E)
Architecture	Zen 4, single CCD	Zen 5, big.LITTLE
AVX-512	VNNI + VBMI (double-pumped 256-bit)	Native 512-bit
RAM	64 GB DDR5	64 GB DDR5
OS	Windows 11	Windows 11

Zen 5 system: ASUS ProArt P16 (laptop). Models ≥7B use cold-burst protocol to avoid thermal throttling.

Honest reporting

Limitations

1. Energy figures are estimated from CPU time × TDP, not direct power-meter measurement.
2. Single hardware per generation. Variance across SKUs, BIOS versions, and memory configurations is not characterized.
3. Decode throughput only — prompt processing (prefill) is not reported here.
4. Inference speed only — this report does not measure model output quality.
5. Thread affinity left to the OS scheduler on Zen 5 big.LITTLE; explicit pinning may shift results.

Full methodology and raw JSON results: benchmarks/README.md ↗ · benchmark report v2 ↗