9 models. 170+ runs. Native 1-bit beats post-quantized by 42–50% on Zen 4 and 33–39% on Zen 5.
Cross-generation throughput on AMD Ryzen 9 7845HX (Zen 4) and Ryzen AI 9 HX 370 (Zen 5). Median of 5 runs, 256 tokens, 8 threads, cold-burst protocol for ≥7B models.
Every parameter is fixed and documented. Raw JSON results live in the repo, and the report is generated directly from those files with no manual editing.
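Generating the report from raw files instead of editing it by hand can be sketched as below. The JSON layout, the field names (`model`, `cpu`, `runs`, `tokens_per_second`) and the helpers `summarize` and `markdown_row` are hypothetical, not the repo's actual schema or tooling. The sketch only shows the median-of-5 reduction and the rendering of a table row.

```python
import json
import statistics

# Hypothetical raw-result layout (an assumption, not the repo's actual schema):
# {"model": "...", "cpu": "...", "runs": [{"tokens_per_second": 12.3}, ...]}
EXPECTED_RUNS = 5  # the report uses the median of 5 runs


def summarize(raw: str) -> dict:
    """Reduce one raw JSON result file to its median tokens/s."""
    data = json.loads(raw)
    tps = [run["tokens_per_second"] for run in data["runs"]]
    if len(tps) != EXPECTED_RUNS:
        raise ValueError(f"expected {EXPECTED_RUNS} runs, got {len(tps)}")
    # The median is robust to a single outlier run (e.g. a background task spike).
    return {
        "model": data["model"],
        "cpu": data["cpu"],
        "median_tps": round(statistics.median(tps), 2),
    }


def markdown_row(summary: dict) -> str:
    """Render one summary as a Markdown table row."""
    return f"| {summary['model']} | {summary['cpu']} | {summary['median_tps']:.2f} |"


if __name__ == "__main__":
    # Synthetic example values, not benchmark results.
    runs = [{"tokens_per_second": v} for v in (5.0, 1.0, 3.0, 2.0, 4.0)]
    raw = json.dumps({"model": "example-model", "cpu": "example-cpu", "runs": runs})
    print(markdown_row(summarize(raw)))  # | example-model | example-cpu | 3.00 |
```

Because the table is a pure function of the raw files, rerunning the generator after adding a run is the only way the published numbers can change.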
All values in tokens per second, median of 5 runs.
| Model | Params | Type | Zen 4 (t/s) | Zen 5 (t/s) | Zen 5 vs Zen 4 |
|---|---|---|---|---|---|
| BitNet b1.58 Large | 0.7B | Post-quantized | 118.25 | — | — |
| Falcon-E 1B Instruct | 1.0B | Native 1-bit | 80.19 | 103.59 | +29% |
| Falcon3 1B 1.58bit | 1.0B | Post-quantized | 56.31 | 78.16 | +39% |
| BitNet b1.58 2B-4T | 2.4B | Native 1-bit | 37.76 | 51.82 | +37% |
| Falcon-E 3B Instruct | 3.0B | Native 1-bit | 49.80 | 65.19 | +31% |
| Falcon3 3B 1.58bit | 3.0B | Post-quantized | 33.21 | 46.77 | +41% |
| Falcon3 7B 1.58bit | 7.0B | Post-quantized | 19.89 | 28.45 | +43% |
| Falcon3 10B 1.58bit | 10.0B | Post-quantized | 15.12 | 19.39 | +28% |
Average cross-generation improvement across the seven models measured on both CPUs: +35% (range +28% to +43%).
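The per-model change in the last column and the average can be rechecked from the table values alone. A minimal sketch, assuming "average" means the arithmetic mean of the per-model percentage changes (the report does not state which mean it uses):

```python
# Median tokens/s copied from the table above, as (Zen 4, Zen 5).
# BitNet b1.58 Large is left out: it has no Zen 5 measurement.
RESULTS = {
    "Falcon-E 1B Instruct": (80.19, 103.59),
    "Falcon3 1B 1.58bit": (56.31, 78.16),
    "BitNet b1.58 2B-4T": (37.76, 51.82),
    "Falcon-E 3B Instruct": (49.80, 65.19),
    "Falcon3 3B 1.58bit": (33.21, 46.77),
    "Falcon3 7B 1.58bit": (19.89, 28.45),
    "Falcon3 10B 1.58bit": (15.12, 19.39),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0


deltas = {model: pct_change(z4, z5) for model, (z4, z5) in RESULTS.items()}
# Assumption: "average" = arithmetic mean of the per-model percentages.
average = sum(deltas.values()) / len(deltas)

if __name__ == "__main__":
    for model, delta in deltas.items():
        print(f"{model}: {delta:+.0f}%")
    lo, hi = min(deltas.values()), max(deltas.values())
    print(f"Average {average:+.0f}% (range {lo:+.0f}% to {hi:+.0f}%)")
```

Run as-is, it reproduces the +35% average and the +28% to +43% range.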
Direct comparison at matching parameter counts: models trained natively with ternary weights (Falcon-E) outperform their post-quantized equivalents (Falcon3 1.58bit) in all four pairings, on both CPUs.
| Size | Native (Falcon-E) | Post-quant (Falcon3) | Native advantage |
|---|---|---|---|
| 1B (Zen 4) | 80.19 t/s | 56.31 t/s | +42% |
| 1B (Zen 5) | 103.59 t/s | 78.16 t/s | +33% |
| 3B (Zen 4) | 49.80 t/s | 33.21 t/s | +50% |
| 3B (Zen 5) | 65.19 t/s | 46.77 t/s | +39% |
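The percentages in the last column, and their growth from 1B to 3B on each CPU, follow directly from the throughput values. A small sketch that uses only the numbers in the table above:

```python
# (native Falcon-E, post-quantized Falcon3 1.58bit) median tokens/s from the table above.
PAIRS = {
    ("1B", "Zen 4"): (80.19, 56.31),
    ("1B", "Zen 5"): (103.59, 78.16),
    ("3B", "Zen 4"): (49.80, 33.21),
    ("3B", "Zen 5"): (65.19, 46.77),
}


def advantage(native: float, post_quant: float) -> float:
    """Percent by which native throughput exceeds post-quantized throughput."""
    return (native / post_quant - 1.0) * 100.0


if __name__ == "__main__":
    for (size, cpu), (native, post_quant) in PAIRS.items():
        print(f"{size} ({cpu}): {advantage(native, post_quant):+.0f}%")
    # Compare the 1B and 3B advantage on the same CPU.
    for cpu in ("Zen 4", "Zen 5"):
        small = advantage(*PAIRS[("1B", cpu)])
        large = advantage(*PAIRS[("3B", cpu)])
        print(f"{cpu}: 1B {small:+.0f}% -> 3B {large:+.0f}%")
```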
Why: native ternary kernels replace multiply-accumulate with simple add/subtract operations. The advantage grows from 1B to 3B on both CPUs (+42% to +50% on Zen 4, +33% to +39% on Zen 5) as memory bandwidth becomes the dominant bottleneck.
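The add/subtract idea can be illustrated with a minimal scalar sketch. It assumes weights already quantized to {-1, 0, +1} with a single per-tensor scale; both are illustrative assumptions, and `ternary_dot` is a hypothetical helper, not the kernel used in these benchmarks. Real kernels pack the weights and vectorize; this loop only shows why no per-weight multiplication is needed.

```python
from typing import Sequence


def ternary_dot(weights: Sequence[int], activations: Sequence[float], scale: float = 1.0) -> float:
    """Dot product where every weight is -1, 0 or +1: no per-weight multiplies."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x  # +1 weight: add the activation
        elif w == -1:
            acc -= x  # -1 weight: subtract the activation
        elif w != 0:
            raise ValueError(f"weight {w!r} is not ternary")
        # 0 weight: skipped entirely
    # One multiplication per output for the (assumed) per-tensor scale.
    return acc * scale


if __name__ == "__main__":
    w = [1, 0, -1, 1]
    x = [0.5, 2.0, 1.5, -1.0]
    reference = sum(wi * xi for wi, xi in zip(w, x))  # ordinary multiply-accumulate
    print(ternary_dot(w, x), reference)  # -2.0 -2.0
```

The result matches an ordinary multiply-accumulate, while the inner loop only adds, subtracts, or skips.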
| | Zen 4 | Zen 5 |
|---|---|---|
| CPU | AMD Ryzen 9 7845HX | AMD Ryzen AI 9 HX 370 |
| Cores / threads | 12C / 24T | 12C / 24T (4× Zen 5 + 8× Zen 5c) |
| Architecture | Zen 4, two CCDs | Zen 5 + Zen 5c, hybrid |
| AVX-512 | VNNI + VBMI (double-pumped 256-bit) | Full AVX-512 ISA (256-bit datapath on mobile Zen 5) |
| RAM | 64 GB DDR5 | 64 GB DDR5 |
| OS | Windows 11 | Windows 11 |
Zen 5 system: ASUS ProArt P16 (laptop). Models ≥7B use the cold-burst protocol to avoid thermal throttling.
Full methodology and raw JSON results: benchmarks/README.md · benchmark report v2