v0.5.5 · Zen 4 + Zen 5

Benchmarks

9 models. 170+ runs. Native 1-bit beats post-quantized by 42–50%.

Cross-generation throughput on AMD Ryzen 9 7845HX (Zen 4) and Ryzen AI 9 HX 370 (Zen 5). Median of 5 runs, 256 tokens, 8 threads, cold-burst protocol for ≥7B models.

Methodology

Reproducible by design

Every parameter is fixed and documented. Raw JSON results live in the repo and the report is generated from those files — no manual editing.

Tokens / run
256
Runs / model
5 (median)
Threads
8 (Zen 4 symmetric)
Total runs
170+ (Zen 4) · 270+ (Zen 5)
Inference runtime
bitnet.cpp · Clang 20.1.8
Vector path
AVX-512 VNNI + VBMI
Energy figures are estimated as CPU_time × TDP / active_threads, not measured with a hardware power meter. Treat them as order-of-magnitude estimates only. For ≥7B models, Zen 5 results use a cold-burst protocol to avoid laptop thermal throttling.
Results

Throughput · Zen 4 × Zen 5

All values in tokens per second, median of 5 runs.

Model Params Type Zen 4 (t/s) Zen 5 (t/s) Δ
BitNet b1.58 Large 0.7B Post-quantized 118.25
Falcon-E 1B Instruct 1.0B Native 1-bit 80.19 103.59 +29%
Falcon3 1B 1.58bit 1.0B Post-quantized 56.31 78.16 +39%
BitNet b1.58 2B-4T 2.4B Native 1-bit 37.76 51.82 +37%
Falcon-E 3B Instruct 3.0B Native 1-bit 49.80 65.19 +31%
Falcon3 3B 1.58bit 3.0B Post-quantized 33.21 46.77 +41%
Falcon3 7B 1.58bit 7.0B Post-quantized 19.89 28.45 +43%
Falcon3 10B 1.58bit 10.0B Post-quantized 15.12 19.39 +28%

Average cross-generation improvement: +35% (range +28% to +43%).

Key finding

Native 1-bit beats post-quantized

Direct comparison at matching parameter counts. Models trained natively in ternary weights (Falcon-E) outperform post-quantized equivalents (Falcon3 1.58bit) consistently.

Size Native (Falcon-E) Post-quant (Falcon3) Advantage
1B (Zen 4) 80.19 t/s 56.31 t/s +42%
1B (Zen 5) 103.59 t/s 78.16 t/s +33%
3B (Zen 4) 49.80 t/s 33.21 t/s +50%
3B (Zen 5) 65.19 t/s 46.77 t/s +39%

Why: native ternary kernels replace multiply-accumulate with simple add/subtract operations. The advantage grows with model size as memory bandwidth becomes the dominant bottleneck.

Test platform

Hardware

Zen 4 Zen 5
CPU AMD Ryzen 9 7845HX AMD Ryzen AI 9 HX 370
Cores / threads 12C / 24T 12C / 24T (4P + 8E)
Architecture Zen 4, single CCD Zen 5, big.LITTLE
AVX-512 VNNI + VBMI (double-pumped 256-bit) Native 512-bit
RAM 64 GB DDR5 64 GB DDR5
OS Windows 11 Windows 11

Zen 5 system: ASUS ProArt P16 (laptop). Models ≥7B use cold-burst protocol to avoid thermal throttling.

Honest reporting

Limitations

Full methodology and raw JSON results: benchmarks/README.md ↗ · benchmark report v2 ↗