v0.9.0 — ARIA Universal

ARIA Universal Protocol

One protocol. Every CPU. Every NPU. Every model.

A peer-to-peer distributed inference network that routes any prompt to the right model across a three-tier catalog — efficiency, quality, specialist — on the hardware you already own.

The catalog

Three tiers, one router

ARIA routes every query to the model that fits the task — without forcing a choice between speed, quality, and capability.

Efficiency

Native 1-bit speed

Native 1-bit models. 15–118 tokens/sec on a consumer CPU across the models benchmarked below.

  • BitNet b1.58 Large — 0.7B
  • Falcon-E 3B Instruct — 3B
  • Falcon3 10B 1.58bit — 10B

Quality

Apache-grade output

Quality models under permissive licenses such as Apache 2.0. Multilingual, multimodal, instruction-tuned.

  • Gemma 4 E4B Instruct — 4.5B
  • Qwen 3.5 4B Instruct — 4B
  • Phi-4 Mini Instruct — 3.8B

Specialist

Best in domain

Code, reasoning, and vision specialists. Best-in-class per domain.

  • Qwen2.5 Coder 3B — code
  • DeepSeek R1 Distill 7B — reasoning
  • MiniCPM-V 2.6 — vision

See the full catalog →
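
To make the three-tier idea concrete, here is a minimal sketch of how a router might pick a model. It assumes a simple keyword heuristic, which is not how ARIA's router is documented to work; `Model`, `CATALOG`, `DOMAIN_HINTS`, and `route` are hypothetical names, and only the model names come from the catalog above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: str         # "efficiency", "quality", or "specialist"
    domain: str = ""  # set only for specialist models


# A few entries copied from the catalog lists above.
CATALOG = [
    Model("BitNet b1.58 Large", "efficiency"),
    Model("Gemma 4 E4B Instruct", "quality"),
    Model("Qwen2.5 Coder 3B", "specialist", "code"),
    Model("DeepSeek R1 Distill 7B", "specialist", "reasoning"),
    Model("MiniCPM-V 2.6", "specialist", "vision"),
]

# Hypothetical keyword hints (an assumption for illustration only).
DOMAIN_HINTS = {
    "code": ("def ", "function", "compile", "bug"),
    "reasoning": ("prove", "step by step", "solve"),
    "vision": ("image", "photo", "screenshot"),
}


def route(prompt: str, prefer_speed: bool = False) -> Model:
    """Pick a specialist if a domain hint matches, else a general tier."""
    text = prompt.lower()
    for domain, hints in DOMAIN_HINTS.items():
        if any(hint in text for hint in hints):
            return next(m for m in CATALOG if m.domain == domain)
    tier = "efficiency" if prefer_speed else "quality"
    return next(m for m in CATALOG if m.tier == tier)
```

Calling `route("Fix the bug in this function")` selects the code specialist, while a general prompt falls back to the quality tier, or to the efficiency tier when `prefer_speed=True`.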

Why ARIA

Built for hardware you already own

Open by default. Local-first. No GPU required.

CPU-first

No GPU required. Inference runs on the silicon already in your laptop or workstation.

Privacy

Inference runs on your hardware. Prompts and outputs never leave the local node by default.

Open

MIT license, fully auditable codebase. Every model in the catalog passes a strict permissive license gate.

Peer-to-peer

Distributed network with no central server. Pipeline parallelism shards models across willing peers.
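
As a rough illustration of pipeline parallelism, the sketch below splits a model's layers into contiguous blocks, one per peer, and passes activations through the blocks in order. It assumes an even split by layer count and runs everything in one process; `shard_layers` and `run_pipeline` are hypothetical names, not ARIA functions, and real sharding would also weigh memory and network latency.

```python
def shard_layers(num_layers: int, peers: list[str]) -> dict[str, range]:
    """Assign each peer a contiguous block of layers, as evenly as possible."""
    if not peers:
        raise ValueError("need at least one peer")
    base, extra = divmod(num_layers, len(peers))
    plan, start = {}, 0
    for i, peer in enumerate(peers):
        # The first `extra` peers take one additional layer each.
        size = base + (1 if i < extra else 0)
        plan[peer] = range(start, start + size)
        start += size
    return plan


def run_pipeline(x, layers, plan):
    """Apply each peer's block in stage order (simulated locally here)."""
    for _peer, block in plan.items():
        for idx in block:
            x = layers[idx](x)
    return x
```

With 30 layers and four peers, the first two peers hold 8 layers each and the last two hold 7. In a real network each stage would send its output activations to the next peer instead of looping locally.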

Benchmarks

Real numbers on real silicon

Measured on AMD Ryzen 9 7845HX (Zen 4), 8 threads, 256 tokens, median of 5 runs.

  • BitNet b1.58 Large (0.7B): 118.25 tokens/sec
  • BitNet b1.58 2B-4T (2.4B): 37.76 tokens/sec
  • Falcon3 7B 1.58bit (7B): 19.89 tokens/sec
  • Falcon3 10B 1.58bit (10B): 15.12 tokens/sec
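
The measurement method stated above (256 tokens, median of 5 runs) can be sketched as follows. This is an illustrative harness, not the project's benchmark script; `generate` is a hypothetical callable standing in for whatever produces the tokens, and `median_throughput` and `measure` are hypothetical names.

```python
import statistics
import time


def median_throughput(durations, n_tokens=256):
    """Median tokens/sec, given the wall-clock seconds of each run."""
    return statistics.median(n_tokens / d for d in durations)


def measure(generate, n_tokens=256, runs=5):
    """Time `generate(n_tokens)` several times and report the median rate."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(n_tokens)  # hypothetical: must produce exactly n_tokens
        durations.append(time.perf_counter() - start)
    return median_throughput(durations, n_tokens)
```

Using the median instead of the mean keeps one unusually slow or fast run from skewing the reported figure.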

See the full benchmark report →

Runs on every CPU. Scaffolding for NPU acceleration.

Supported CPUs: x86_64 (AMD Zen 4/5, Intel Core) and ARM64 (Apple Silicon, Snapdragon X). v0.9.0 detects NPUs from AMD (XDNA2), Intel (NPU), Qualcomm (Hexagon), and Apple (ANE); full acceleration arrives in v1.0.
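
For readers who want to check which CPU family a machine reports, here is a small sketch using Python's standard `platform` module. It covers only the two CPU families named above and does not attempt NPU detection; `cpu_family` and `_ARCH_ALIASES` are hypothetical names, not part of ARIA.

```python
import platform

# Common platform.machine() values mapped to the two families named above.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",   # reported on Windows
    "arm64": "ARM64",    # reported on macOS and Windows on ARM
    "aarch64": "ARM64",  # reported on Linux
}


def cpu_family(machine=None):
    """Return "x86_64", "ARM64", or None for any other architecture."""
    raw = platform.machine() if machine is None else machine
    return _ARCH_ALIASES.get(raw.lower())
```

Calling `cpu_family()` with no argument inspects the current machine; passing a string makes the mapping easy to test.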

See compatibility →

Community

Get involved

ARIA is open, MIT-licensed, and shaped by its contributors. The contribution score recognises useful work — uptime, served inferences, validated models — without any monetary mechanism.

Build with us

Open issues, suggest models for the catalog, or contribute backends. The full development history, roadmap, and protocol spec live on GitHub.

GitHub repository →

Discuss

Questions, design proposals, and benchmark reports go in GitHub Discussions. Bug reports and protocol changes go in Issues.

Open discussions →

Contribution scores are an on-network reputation signal, non-monetary by design. Peers use them only to pick reliable nodes.
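
As an illustration of how such a score could be used for peer selection, the sketch below combines the three inputs named earlier (uptime, served inferences, validated models) into one number and ranks peers by it. The weights and the formula are assumptions made up for this example, not ARIA's scoring rules; `PeerStats`, `contribution_score`, and `pick_peers` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class PeerStats:
    peer_id: str
    uptime_ratio: float      # fraction of time online, 0.0 to 1.0
    served_inferences: int
    validated_models: int


# Hypothetical weights, chosen only for illustration.
W_UPTIME, W_SERVED, W_VALIDATED = 0.5, 0.3, 0.2


def contribution_score(p: PeerStats) -> float:
    """Weighted sum; log1p dampens raw counts so volume alone cannot dominate."""
    return (
        W_UPTIME * p.uptime_ratio
        + W_SERVED * math.log1p(p.served_inferences)
        + W_VALIDATED * math.log1p(p.validated_models)
    )


def pick_peers(peers, k=2):
    """Return the k peers with the highest score, best first."""
    return sorted(peers, key=contribution_score, reverse=True)[:k]
```

The score is only a ranking key here: nothing is transferred or paid out, which matches the non-monetary intent described above.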