Runs on every CPU. Scaffolding for NPU acceleration.
ARIA targets the silicon already in your machine. Inference runs on the CPU today; NPU detection lands in v0.9.0 so the network can prepare for accelerated inference in v1.0.
Native 1-bit kernels target AVX-512 on x86_64 and dotprod on ARM64. The bitnet.cpp backend ships with optimised paths for both.
| Family | Examples | Vector path | Notes |
|---|---|---|---|
| AMD Zen 5 | Ryzen 9000 series, Ryzen AI 300/400 | AVX-512 (512-bit datapath on Ryzen 9000 desktop; 256-bit on Ryzen AI mobile) | Reference platform: full-width vector units on desktop parts and the best 1-bit throughput per core. |
| AMD Zen 4 | Ryzen 7000, Threadripper 7000 | AVX-512 (double-pumped 256-bit) | VNNI + VBMI extensions present; throughput slightly lower than Zen 5 for the same model. |
| Intel Core / Xeon | 11th gen Core (Tiger Lake, Rocket Lake), Xeon Scalable | AVX-512 (native) | AVX-512 is disabled on Alder Lake and Raptor Lake and is not available on current Core Ultra client chips; those run the non-AVX-512 path. |
| Apple Silicon | M1 / M2 / M3 / M4 | ARM Neon + dotprod | Competitive day-1 throughput via the bitnet.cpp ARM kernels. |
| Snapdragon X (ARM64) | X Elite, X Plus laptops | ARM Neon + dotprod | Same ARM64 path as Apple Silicon; NPU acceleration follows in a v1.x release. |
| Generic x86_64 / ARM64 | Any 64-bit CPU | Scalar + SSE/Neon fallback | Slower but functional: the catalog still loads, and throughput depends on the cache and memory subsystem. |
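A minimal Python sketch of how a client might map CPU feature flags to the vector paths in the table above. It assumes a Linux `/proc/cpuinfo`, where x86_64 kernels list `avx512f` on `flags` lines and ARM64 kernels list `asimddp` (the dotprod feature) on `Features` lines. The function names and the returned path labels are hypothetical, not ARIA's or bitnet.cpp's API.

```python
import platform


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Collect CPU feature tokens from a Linux /proc/cpuinfo file.

    x86_64 kernels list them on 'flags' lines, ARM64 kernels on 'Features' lines.
    Returns an empty set when the file is missing (macOS, Windows) or unreadable.
    """
    flags: set[str] = set()
    try:
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(":")
                if key.strip() in ("flags", "Features"):
                    flags.update(value.split())
    except OSError:
        pass
    return flags


def select_vector_path(machine: str, flags: set[str], system: str = "Linux") -> str:
    """Map an architecture plus feature flags to a (hypothetical) kernel path label."""
    machine = machine.lower()
    if machine in ("x86_64", "amd64"):
        # Zen 4, Zen 5 and AVX-512-capable Intel parts all report avx512f.
        return "avx512" if "avx512f" in flags else "generic"
    if machine in ("aarch64", "arm64"):
        # Every Apple Silicon chip (M1 onward) implements dotprod, but macOS has
        # no /proc/cpuinfo, so assume it there instead of checking a flag.
        if system == "Darwin" or "asimddp" in flags:
            return "neon-dotprod"
        return "generic"
    return "generic"


if __name__ == "__main__":
    print(select_vector_path(platform.machine(), read_cpu_flags(), platform.system()))
```

On Windows the sketch has no flag source, so a Snapdragon X laptop would fall through to `"generic"`; a real detector would need an OS-specific feature query there.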
v0.9.0 detects whether an NPU is present and broadcasts the resulting hardware snapshot in the v2 peer hello, but inference still runs on the CPU. The detector is silent, sub-second, and best-effort.
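The three detector properties (silent, sub-second, best-effort) can be illustrated with a short Python sketch. It assumes a Linux host where NPUs appear under the kernel's compute-accelerator class (`/sys/class/accel`), as devices bound to the `amdxdna` and `intel_vpu` drivers do. The probe, the one-second budget, and the hello-message fields are all hypothetical, not ARIA's actual v2 wire format.

```python
import os
import threading

ACCEL_CLASS = "/sys/class/accel"
# PCI vendor IDs for the two NPU vendors whose Linux drivers use the accel class.
PCI_VENDORS = {"0x1022": "AMD", "0x8086": "Intel"}


def probe_sysfs_npu():
    """Return a snapshot dict for the first accel device, or None if there is none."""
    if not os.path.isdir(ACCEL_CLASS):
        return None
    for name in sorted(os.listdir(ACCEL_CLASS)):
        try:
            with open(os.path.join(ACCEL_CLASS, name, "device", "vendor")) as f:
                vendor_id = f.read().strip()
        except OSError:
            continue
        return {"device": name, "vendor": PCI_VENDORS.get(vendor_id, "unknown")}
    return None


def detect_npu(probe=probe_sysfs_npu, timeout_s=1.0):
    """Run the probe on a daemon thread; never raise, never block past timeout_s."""
    result = {}

    def run():
        try:
            result["npu"] = probe()
        except Exception:
            result["npu"] = None  # best-effort: a failing probe means "no NPU"

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return result.get("npu")  # None on timeout, error, or no device


def build_hello(npu_snapshot):
    """Hypothetical v2 hello payload: detection is advertised, inference stays on CPU."""
    return {"protocol": 2, "npu": npu_snapshot, "inference_backend": "cpu"}
```

Running the probe on a daemon thread keeps a hung driver query from delaying startup or process exit; the caller simply sees `None` and carries on with CPU inference.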
First in line. XDNA2 ships broadly on Ryzen AI 300/400 laptops and has the highest TOPS-per-dollar among consumer NPUs.
Lunar Lake (Core Ultra 200V) and Arrow Lake-H share the same intel_vpu driver. OpenVINO is the most mature NPU plugin for v1.0.
Snapdragon X Elite / Plus. Detection alone unlocks routing decisions today; inference acceleration follows once Snapdragon laptop adoption justifies the integration cost.
Core ML is the only sanctioned ANE entry point. Day-1 throughput on Apple Silicon is competitive even without the ANE thanks to ARM dotprod kernels.
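The vendor notes above amount to an ordered list of placeholders, one per NPU family. One way to model that scaffolding, assuming a simple in-process registry: each stub records its vendor and the runtime the docs name for it, and `infer()` deliberately raises until real acceleration lands. Only the `amd_xdna2` name and its ~50 TOPS estimate come from the `aria hardware info` sample output; the other stub names, the class, and the lookup function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NpuStub:
    name: str
    vendor: str
    runtime: Optional[str]     # planned entry point, where the docs name one
    est_tops: Optional[float]  # only filled in where the docs give a figure

    def infer(self, *args, **kwargs):
        # Scaffolding only: inference still runs on the CPU in v0.9.0.
        raise NotImplementedError(f"{self.name}: NPU inference is not implemented yet")


# Listed in the order of the vendor notes; AMD XDNA2 is first in line.
STUBS = (
    NpuStub("amd_xdna2", "AMD", runtime=None, est_tops=50.0),
    NpuStub("intel_npu", "Intel", runtime="OpenVINO", est_tops=None),     # hypothetical name
    NpuStub("qualcomm_npu", "Qualcomm", runtime=None, est_tops=None),     # hypothetical name
    NpuStub("apple_ane", "Apple", runtime="Core ML", est_tops=None),      # hypothetical name
)


def stub_for(vendor: str) -> Optional[NpuStub]:
    """Return the first registered stub for a detected vendor, if any."""
    return next((s for s in STUBS if s.vendor.lower() == vendor.lower()), None)
```

Keeping detection and the stub lookup separate lets a peer advertise its NPU today while every actual inference call still goes through the CPU path.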
The desktop client picks a profile automatically based on detected RAM. You can override it manually.
| Profile | Minimum RAM | Tier coverage | Use case |
|---|---|---|---|
| Minimal | 4 GB | Smallest 1-bit models only | Any x86_64 or ARM64 CPU. Single-tier efficiency, sub-1B parameters. |
| Efficient (default) | 8 GB | Efficiency tier (full) | Mainstream laptops. Runs the full BitNet + Falcon-E line up to 3B comfortably. |
| Balanced | 16 GB | Efficiency + Quality | Adds Apache 2.0 quality models. Good for daily multilingual / multimodal use. |
| Full | 32 GB+ | All three tiers | Workstations. Loads 7B–10B and the specialist trio with headroom for KV-cache. |
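A minimal sketch of the automatic selection, assuming the client takes the largest profile whose minimum RAM the machine meets and lets a manual override win. The function name, the `None` result below 4 GB, and the override validation are hypothetical, not the desktop client's actual logic.

```python
from typing import Optional

# (profile name, minimum RAM in GB), from largest to smallest requirement.
PROFILES = (
    ("Full", 32),
    ("Balanced", 16),
    ("Efficient", 8),
    ("Minimal", 4),
)


def pick_profile(total_ram_gb: float, override: Optional[str] = None) -> Optional[str]:
    """Pick the largest profile whose minimum RAM fits; a manual override wins."""
    if override is not None:
        names = {name for name, _ in PROFILES}
        if override not in names:
            raise ValueError(f"unknown profile {override!r}; expected one of {sorted(names)}")
        return override
    for name, min_gb in PROFILES:
        if total_ram_gb >= min_gb:
            return name
    return None  # below the 4 GB floor: no profile fits
```

With the 64 GB reported in the `aria hardware info` sample, `pick_profile(64.0)` returns `"Full"`. Because the sketch compares the raw value, a laptop sold as 16 GB that reports 15.6 GB usable would land in Efficient; a real client might round or allow some tolerance.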
```console
$ aria hardware info
ARIA Hardware Profile (v0.9.0)
============================================================
CPU
------------------------------------------------------------
Model : AMD Ryzen AI 9 HX 370 w/ Radeon 890M
Vendor : AMD
Cores : 12
Threads : 24
AVX-512 : yes
Memory
------------------------------------------------------------
Total RAM : 64.0 GB
NPU
------------------------------------------------------------
Vendor : AMD
Performance : 50.0 TOPS (estimated)
Available stubs:
- amd_xdna2 AMD ~50 TOPS
Note: NPU inference is scaffolding only in v0.9.0; real acceleration in v1.0.
```
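Scripts that consume this report could parse the `Key : value` lines into per-section dictionaries. A minimal sketch, assuming the layout in the sample (a section title followed by a dashed rule); the output format is not documented as stable, and the parser name is hypothetical. Lines without ` : `, such as the stub list and the closing note, are skipped.

```python
def parse_hardware_info(text: str) -> dict[str, dict[str, str]]:
    """Split `aria hardware info` output into {section: {key: value}}."""
    lines = [ln.strip() for ln in text.splitlines()]
    sections: dict[str, dict[str, str]] = {}
    current = None
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A section title is any line followed by a rule made only of dashes.
        if nxt and set(nxt) == {"-"}:
            current = sections.setdefault(line, {})
            continue
        # Inside a section, keep "Key : value" pairs; split on the first separator.
        if current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return sections
```

Fed the captured stdout of `aria hardware info` (for example via `subprocess.run([...], capture_output=True, text=True)`), the sample above would yield `info["CPU"]["AVX-512"] == "yes"` and `info["Memory"]["Total RAM"] == "64.0 GB"`.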
Full reference: docs/NPU_SUPPORT.md