v0.9.0 hardware

Hardware compatibility

Runs on any 64-bit x86_64 or ARM64 CPU. Scaffolding for NPU acceleration.

ARIA targets the silicon already in your machine. Inference runs on the CPU today; NPU detection lands in v0.9.0 so the network can prepare for accelerated inference in v1.0.

CPU support

Inference runs on the CPU you already have

Native 1-bit kernels target AVX-512 on x86_64 and dotprod on ARM64. The bitnet.cpp backend ships with optimised paths for both.

| Family | Examples | Vector path | Notes |
| --- | --- | --- | --- |
| AMD Zen 5 | Ryzen 9000 series, Ryzen AI 300/400 | AVX-512 (native 512-bit on desktop parts; 256-bit datapaths on most mobile Ryzen AI 300/400 parts) | Reference platform. Desktop parts have full-width vector units and the best 1-bit throughput per core. |
| AMD Zen 4 | Ryzen 7000, Threadripper 7000 | AVX-512 (double-pumped 256-bit) | VNNI + VBMI extensions present; throughput slightly lower than Zen 5 for the same model. |
| Intel Core (Tiger Lake) and Xeon | 11th gen mobile, Xeon Scalable | AVX-512 (native, where present) | AVX-512 was disabled on Alder Lake and Raptor Lake, and current Core Ultra client parts do not expose it either; Xeon Scalable still does. |
| Apple Silicon | M1 / M2 / M3 / M4 | ARM Neon + dotprod | Competitive day-1 throughput via the bitnet.cpp ARM kernels. |
| Snapdragon X (ARM64) | X Elite, X Plus laptops | ARM Neon + dotprod | Same ARM64 path as Apple Silicon; NPU acceleration follows in a v1.x release. |
| Generic x86_64 / ARM64 | Any 64-bit CPU | Scalar + SSE/Neon fallback | Slower but functional. Catalog still loads; throughput depends on cache and memory subsystem. |

About AVX-512: Desktop Zen 5 and Intel Tiger Lake implement native 512-bit AVX-512 datapaths. Zen 4, like most mobile Zen 5 parts, executes AVX-512 on double-pumped 256-bit datapaths: functionally identical, but with lower throughput per cycle. All of these expose VNNI and VBMI, which the bitnet.cpp kernels rely on.
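To make the vector-path column concrete, here is a minimal sketch of how a client could map CPU feature flags to a path. It is not ARIA's actual detector: the function names and the returned labels are hypothetical, and the flag names are the ones Linux prints in /proc/cpuinfo (`avx512f`, `avx512_vnni`, `avx512vbmi` on x86_64, `asimddp` for dotprod on ARM64).

```python
def parse_cpuinfo_flags(text: str) -> set[str]:
    """Extract the feature-flag set from /proc/cpuinfo-style text.

    x86_64 kernels label the line "flags"; ARM64 kernels label it "Features".
    """
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def pick_vector_path(arch: str, flags: set[str]) -> str:
    """Return a hypothetical label for the kernel path a CPU could use."""
    arch = arch.lower()
    if arch in ("x86_64", "amd64"):
        # Assumption: the AVX-512 path needs the base set plus VNNI and VBMI.
        needed = {"avx512f", "avx512_vnni", "avx512vbmi"}
        return "avx512" if needed <= flags else "x86_64-fallback"
    if arch in ("arm64", "aarch64"):
        # "asimddp" is the Linux name for the ARM dot-product extension.
        return "neon-dotprod" if "asimddp" in flags else "arm64-fallback"
    return "unsupported"
```

On Linux this could be called as `pick_vector_path(platform.machine(), parse_cpuinfo_flags(open("/proc/cpuinfo").read()))`; other operating systems would need their own flag source.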

NPU roadmap

NPU detection in v0.9.0, acceleration in v1.0

v0.9.0 detects whether an NPU is present and broadcasts the resulting snapshot in the v2 peer hello, but inference still runs on the CPU. The detector is silent, sub-second, and best-effort.

AMD XDNA2 v1.0 target
amd_xdna2 — ~50 TOPS
Detected via amdxdna.sys (Windows) / amdxdna module (Linux)
Acceleration: OpenVINO path (v1.0)

First in line. XDNA2 ships broadly on Ryzen AI 300/400 laptops and has the highest TOPS-per-dollar among consumer NPUs.

Intel NPU (Series 2+) v1.0 target
intel_npu — Lunar Lake ~48 TOPS
Detected via intel_vpu driver / OpenVINO probe
Acceleration: OpenVINO (v1.0)

Lunar Lake (Core Ultra 200V) and Arrow Lake-H share the same intel_vpu driver. OpenVINO is the most mature NPU plugin for v1.0.

Qualcomm Hexagon post-v1.0
qualcomm_hexagon — ~45 TOPS
Detected via qnn-net-run on PATH
Acceleration: QNN SDK (post-v1.0)

Snapdragon X Elite / Plus. Detection alone unlocks routing decisions today; inference acceleration follows once Snapdragon laptop adoption justifies the integration cost.

Apple ANE post-v1.0
apple_ane — M4 ~38 TOPS
Detected via FEAT_DotProd sysctl
Acceleration: Core ML (post-v1.0)

Core ML is the only sanctioned ANE entry point. Day-1 throughput on Apple Silicon is competitive even without the ANE thanks to ARM dotprod kernels.
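The detection methods listed on the cards above can be sketched as a chain of cheap, best-effort probes. This is an illustration under stated assumptions, not ARIA's detector: it covers only the Linux module checks and the PATH lookup, it assumes a loaded kernel module appears under /sys/module/, and the function name and injectable parameters are hypothetical. The Windows driver check and the Apple sysctl probe are left out.

```python
import os
import shutil


def detect_npu_stub(which=shutil.which, path_exists=os.path.exists):
    """Return the id of the first NPU stub whose probe succeeds, else None.

    Best-effort: a probe that fails with OSError is skipped silently.
    `which` and `path_exists` are injectable so the logic can be tested
    without real hardware.
    """
    probes = [
        # Assumption: a loaded Linux kernel module shows up under /sys/module.
        ("amd_xdna2", lambda: path_exists("/sys/module/amdxdna")),
        ("intel_npu", lambda: path_exists("/sys/module/intel_vpu")),
        # The Qualcomm card names qnn-net-run on PATH as the signal.
        ("qualcomm_hexagon", lambda: which("qnn-net-run") is not None),
    ]
    for stub_id, probe in probes:
        try:
            if probe():
                return stub_id
        except OSError:
            continue  # stay silent, try the next vendor
    return None
```

Returning `None` rather than raising keeps the probe consistent with the "silent, best-effort" behaviour described for v0.9.0: a machine without an NPU simply reports nothing.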

NPU inference is scaffolding only in v0.9.0. Calling load_model on an NPU stub raises NotImplementedError, by design, with a message pointing at the v1.0 milestone. Real acceleration arrives in v1.0 (Intel NPU and AMD XDNA2 first, via OpenVINO).
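The stub behaviour can be pictured with a short sketch. Only `load_model` and `NotImplementedError` come from the text above; the `NpuStub` class, its fields, the error message, and the `load_with_fallback` helper are hypothetical and may not match ARIA's real interfaces.

```python
class NpuStub:
    """Hypothetical placeholder for an NPU backend that is detected but unused."""

    def __init__(self, stub_id: str, approx_tops: float):
        self.stub_id = stub_id
        self.approx_tops = approx_tops

    def load_model(self, model_path: str):
        # By design: the stub advertises the hardware but cannot run models yet.
        raise NotImplementedError(
            f"{self.stub_id}: NPU inference is scaffolding only in v0.9.0; "
            "acceleration is planned for the v1.0 milestone."
        )


def load_with_fallback(stub: NpuStub, model_path: str, cpu_loader):
    """Try the NPU stub first, then fall back to a caller-supplied CPU loader."""
    try:
        return stub.load_model(model_path)
    except NotImplementedError:
        return cpu_loader(model_path)
```

Raising a specific exception type, instead of returning `None`, lets callers distinguish "not implemented yet" from a genuine load failure and route to the CPU path explicitly.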

Minimum requirements

Four hardware profiles

The desktop client picks a profile automatically based on detected RAM. You can override it manually.

| Profile | Minimum RAM | Tier coverage | Use case |
| --- | --- | --- | --- |
| Minimal | 4 GB | Smallest 1-bit models only | Any x86_64 or ARM64 CPU. Single-tier efficiency, sub-1B parameters. |
| Efficient (default) | 8 GB | Efficiency tier (full) | Mainstream laptops. Runs the full BitNet + Falcon-E line up to 3B comfortably. |
| Balanced | 16 GB | Efficiency + Quality | Adds Apache 2.0 quality models. Good for daily multilingual / multimodal use. |
| Full | 32 GB+ | All three tiers | Workstations. Loads 7B–10B and the specialist trio with headroom for KV-cache. |

Verify detection

Check what ARIA sees on your machine

$ aria hardware info

ARIA Hardware Profile (v0.9.0)
============================================================

CPU
------------------------------------------------------------
  Model       : AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  Vendor      : AMD
  Cores       : 12
  Threads     : 24
  AVX-512     : yes

Memory
------------------------------------------------------------
  Total RAM   : 64.0 GB

NPU
------------------------------------------------------------
  Vendor      : AMD
  Performance : 50.0 TOPS (estimated)

  Available stubs:
    - amd_xdna2       AMD        ~50 TOPS

Note: NPU inference is scaffolding only in v0.9.0; real acceleration in v1.0.

Full reference: docs/NPU_SUPPORT.md