Hardware compatibility — ARIA Universal Protocol

CPU support

Inference runs on the CPU you already have

Native 1-bit kernels target AVX-512 on x86_64 and dotprod on ARM64. The bitnet.cpp backend ships with optimised paths for both.

Family	Examples	Vector path	Notes
AMD Zen 5	Ryzen 9000 series, Ryzen AI 300/400	AVX-512 (native 512-bit)	Reference platform — full-width vector units, best 1-bit throughput per core.
AMD Zen 4	Ryzen 7000, Threadripper 7000	AVX-512 (double-pumped 256-bit)	VNNI + VBMI extensions present; throughput slightly lower than Zen 5 for the same model.
Intel Core (Tiger Lake+)	11th gen and later mobile, Xeon Scalable	AVX-512 (native, where present)	AVX-512 was disabled on Alder/Raptor Lake desktop SKUs; modern Core Ultra and Xeon expose it again.
Apple Silicon	M1 / M2 / M3 / M4	ARM Neon + dotprod	Competitive day-1 throughput via the bitnet.cpp ARM kernels.
Snapdragon X (ARM64)	X Elite, X Plus laptops	ARM Neon + dotprod	Same ARM64 path as Apple Silicon; NPU acceleration follows in a v1.x release.
Generic x86_64 / ARM64	Any 64-bit CPU	Scalar + SSE/Neon fallback	Slower but functional. Catalog still loads, throughput depends on cache and memory subsystem.

About AVX-512: Zen 5 and Intel Tiger Lake+ implement native 512-bit AVX-512 datapaths. Zen 4 implements AVX-512 via double-pumped 256-bit datapaths — functionally identical but lower throughput per cycle. Both expose VNNI and VBMI, which the bitnet.cpp kernels rely on.

NPU roadmap

NPU detection in v0.9.0, acceleration in v1.0

v0.9.0 detects the presence of an NPU and broadcasts the snapshot in the v2 peer hello — but inference still runs on the CPU. The detector is silent, sub-second, and best-effort.

AMD XDNA2 v1.0 target

amd_xdna2 — ~50 TOPS

Detected via amdxdna.sys (Win) / amdxdna module (Linux) Acceleration: OpenVINO path (v1.0)

First in line. XDNA2 ships broadly on Ryzen AI 300/400 laptops and has the highest TOPS-per-dollar among consumer NPUs.

Intel NPU (Series 2+) v1.0 target

intel_npu — Lunar Lake ~48 TOPS

Detected via intel_vpu driver / OpenVINO probe Acceleration: OpenVINO (v1.0)

Lunar Lake (Core Ultra 200V) and Arrow Lake-H share the same intel_vpu driver. OpenVINO is the most mature NPU plugin for v1.0.

Qualcomm Hexagon post-v1.0

qualcomm_hexagon — ~45 TOPS

Detected via qnn-net-run on PATH Acceleration: QNN SDK (post-v1.0)

Snapdragon X Elite / Plus. Detection alone unlocks routing decisions today; inference acceleration follows once Snapdragon laptop adoption justifies the integration cost.

Apple ANE post-v1.0

apple_ane — M4 ~38 TOPS

Detected via FEAT_DotProd sysctl Acceleration: Core ML (post-v1.0)

Core ML is the only sanctioned ANE entry point. Day-1 throughput on Apple Silicon is competitive even without the ANE thanks to ARM dotprod kernels.

NPU inference is scaffolding only in v0.9.0. Calling load_model on an NPU stub raises NotImplementedError pointing at the v1.0 milestone — by design. Real acceleration arrives in v1.0 (Intel NPU and AMD XDNA2 first, via OpenVINO).

Minimum requirements

Four hardware profiles

The desktop client picks a profile automatically based on detected RAM. You can override it manually.

Profile	Minimum RAM	Tier coverage	Use case
Minimal	4 GB	Smallest 1-bit models only	Any x86_64 or ARM64 CPU. Single-tier efficiency, sub-1B parameters.
Efficient (default)	8 GB	Efficiency tier (full)	Mainstream laptops. Runs the full BitNet + Falcon-E line up to 3B comfortably.
Balanced	16 GB	Efficiency + Quality	Adds Apache 2.0 quality models. Good for daily multilingual / multimodal use.
Full	32 GB+	All three tiers	Workstations. Loads 7B–10B and the specialist trio with headroom for KV-cache.

Verify detection

Check what ARIA sees on your machine

$ aria hardware info

ARIA Hardware Profile (v0.9.0)
============================================================

CPU
------------------------------------------------------------
  Model       : AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  Vendor      : AMD
  Cores       : 12
  Threads     : 24
  AVX-512     : yes

Memory
------------------------------------------------------------
  Total RAM   : 64.0 GB

NPU
------------------------------------------------------------
  Vendor      : AMD
  Performance : 50.0 TOPS (estimated)

  Available stubs:
    - amd_xdna2       AMD        ~50 TOPS

Note: NPU inference is scaffolding only in v0.9.0; real acceleration in v1.0.

Full reference: docs/NPU_SUPPORT.md ↗