v0.5.5

Decentralized AI Inference
for Everyone

A peer-to-peer protocol for efficient, ethical, and decentralized AI inference. Run 1-bit quantized models on any CPU with 99.6% energy savings.

9 Models Validated
196 Tests Passing
MIT License
+35% Zen 5 vs Zen 4

Why ARIA?

AI inference without expensive hardware, excessive energy, or centralized control.

CPU-Efficient

118 t/s

1-bit ternary weights (-1, 0, +1) replace expensive multiplications with simple additions. Runs on any consumer CPU — no GPU required.
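The multiplication-free claim can be seen in a few lines. This is an illustrative sketch, not ARIA's kernel: with weights restricted to {-1, 0, +1}, each dot product reduces to additions and subtractions of selected inputs. Production 1-bit kernels pack weights into 2-bit lanes and use SIMD (e.g. AVX-512 VNNI), but the arithmetic is the same.

```python
def ternary_matvec(W, x):
    """Matrix-vector product where W contains only -1, 0, +1.

    No multiplications: weight +1 adds the input, -1 subtracts it,
    and 0 skips it entirely (the source of 1-bit sparsity wins).
    """
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # weight +1: add
            elif w == -1:
                acc -= xi      # weight -1: subtract
        out.append(acc)        # weight 0: contributed nothing
    return out

W = [[1, 0, -1],
     [0, 1,  1]]
x = [2.0, 3.0, 4.0]
# ternary_matvec(W, x) -> [-2.0, 7.0], identical to the full matmul
```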

Energy-Conscious

15 mJ/token

99.6% energy reduction compared to cloud APIs. A single node uses ~241 kWh/year vs 25,550 kWh for cloud solutions.
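A back-of-envelope conversion puts these figures in context. The always-on assumption (8,760 hours/year) is ours, not stated in the benchmark report:

```python
# Unit conversions for the headline figures above.
# Assumption: the node runs continuously, 8,760 hours per year.
kwh_per_year = 241
avg_draw_w = kwh_per_year * 1000 / 8760   # ~27.5 W sustained draw

mj_per_token = 15                          # millijoules per token
joules_per_kwh = 3.6e6
tokens_per_kwh = joules_per_kwh / (mj_per_token * 1e-3)
# ~240 million tokens per kWh of inference energy
```

In other words, the annual figure corresponds to roughly the idle-to-light-load draw of a single consumer CPU.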

Truly Decentralized

P2P

WebSocket-based peer-to-peer networking with pipeline parallelism. No central server, no single point of failure. Your data stays yours.
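The pipeline-parallel idea can be sketched conceptually: each peer owns a contiguous slice of the model's layers and forwards activations to the next peer. Plain function calls stand in here for the WebSocket hop; the names are illustrative, not ARIA's API.

```python
def make_stage(layers):
    """Bundle a slice of layers into one pipeline stage (one peer)."""
    def stage(x):
        for layer in layers:
            x = layer(x)
        return x
    return stage

# A toy 6-"layer" model split across three peers.
model = [lambda x, i=i: x + i for i in range(6)]
peers = [make_stage(model[0:2]),
         make_stage(model[2:4]),
         make_stage(model[4:6])]

def infer(x):
    for peer in peers:
        x = peer(x)   # in the real protocol, this hop is a WebSocket message
    return x

# infer(0) -> 15: the full model's output, no peer ever holding all layers
```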

Extensible Ecosystem

8+ orgs

At least 8 independent organizations produce 1-bit models, so there is no single-vendor dependency. Falcon-Edge outperforms Microsoft BitNet (53.17% vs 51.54% average benchmark score).

5/5 Cognitive Memory

5 types

Full coverage of all five human memory types — including prospective memory for deferred intentions. The system remembers what to bring up next time, not just what happened before. Grounded in Einstein & McDaniel’s Multiprocess Framework from cognitive science.

Real-World Benchmarks

9 models, 3 vendors, 270+ test runs — validated across CPU generations on Zen 4 (Ryzen 9 7845HX) and Zen 5 (Ryzen AI 9 HX 370) with AVX-512 VNNI+VBMI; results are reproducible.

Model                 Params  Type            Zen 4 (t/s)  Zen 5 (t/s)  Delta
BitNet-b1.58-large    0.7B    Post-quantized  118.25       —            —
Falcon-E-1B-Instruct  1.0B    Native 1-bit    80.19        103.59       +29%
Falcon3-1B-Instruct   1.0B    Post-quantized  56.31        78.16        +39%
BitNet-b1.58-2B-4T    2.4B    Native 1-bit    37.76        51.82        +37%
Falcon-E-3B-Instruct  3.0B    Native 1-bit    49.80        65.19        +31%
Falcon3-3B-Instruct   3.0B    Post-quantized  33.21        46.77        +41%
Falcon3-7B-Instruct   7.0B    Post-quantized  19.89        28.45        +43%
Llama3-8B-1.58        8.0B    Post-quantized  16.97        —            —
Falcon3-10B-Instruct  10.0B   Post-quantized  15.12        19.39        +28%

Key finding: models natively trained in 1-bit (Falcon-E) outperform post-training-quantized models by +42% at 1B and +50% at 3B, validating native ternary training over post-hoc quantization. Zen 5's native 512-bit AVX-512 datapath delivers a +28% to +43% improvement over Zen 4's double-pumped implementation across all tested models.

Zen 5 results measured on AMD Ryzen AI 9 HX 370 (ASUS ProArt P16, native 512-bit AVX-512). Models ≥7B use cold-burst protocol for laptop thermal management. See benchmark report for full methodology.

3-Year Total Cost of Ownership

Solution               Hardware  Running Costs  Total     vs ARIA
Cloud APIs (frontier)  $0        $164,250       $164,250  2,161x
Llama API              $0        $32,850        $32,850   432x
RTX 4090 (local)       $2,000    $6,533         $8,533    112x
ARIA Protocol          $0        $76            $76       1x

Assumptions: 10M tokens/day, existing CPU hardware, electricity at $0.25/kWh.
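The "vs ARIA" multiples follow directly from the table's totals; a minimal sketch reproducing the column:

```python
# (hardware, running costs) over 3 years, in USD, from the TCO table.
solutions = {
    "Cloud APIs (frontier)": (0, 164_250),
    "Llama API":             (0, 32_850),
    "RTX 4090 (local)":      (2_000, 6_533),
    "ARIA Protocol":         (0, 76),
}

aria_total = sum(solutions["ARIA Protocol"])
multiples = {
    name: round((hw + running) / aria_total)
    for name, (hw, running) in solutions.items()
}
# multiples == {"Cloud APIs (frontier)": 2161, "Llama API": 432,
#               "RTX 4090 (local)": 112, "ARIA Protocol": 1}
```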

🧬
Native > Post-Quantized

Falcon-E (native 1-bit) outperforms Falcon3 (post-quantized) by +42% at 1B and +50% at 3B.

🧵
Thread Scaling

Zen 4 (symmetric): all models peak at 6–8 threads. Zen 5 (big.LITTLE): 1B peaks at 6 threads, 2.4B at 8, 7B at 20. Model-size-aware tuning required on heterogeneous architectures.
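The size-aware tuning rule above can be distilled into a heuristic. The thresholds are illustrative, read off the benchmark peaks, not a tuned policy shipped by ARIA:

```python
def pick_threads(params_b: float, arch: str) -> int:
    """Thread-count heuristic per the scaling results above.

    params_b: model size in billions of parameters.
    arch: "zen4" (symmetric CCDs) or "zen5" (big.LITTLE).
    """
    if arch == "zen4":
        # Symmetric CCDs: everything peaks in the 6-8 thread band.
        return 6 if params_b <= 1.0 else 8
    # Zen 5 big.LITTLE: larger models tolerate (and reward) more threads.
    if params_b <= 1.0:
        return 6
    if params_b <= 3.0:
        return 8
    return 20
```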

💻
CCD & big.LITTLE

Zen 4: smaller models benefit from single-CCD pinning. Zen 5: 24-thread penalty is only -2% for 7B (vs -38% on Zen 4 dual-CCD).

🚀
10B on CPU

Falcon3-10B at 15 t/s demonstrates viable interactive inference on consumer hardware.

🧠
Consensus Inference

Multiple 7B models with orchestrated debate reach 92.85% accuracy (Nature 2025, SLM-MATRIX).

💾
Extended Context

KV-Cache NVMe paging targets 500K+ tokens on 8GB RAM via sparse attention + 2-bit quantization.
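The paging idea behind extended context can be sketched in miniature: keep the most recent KV blocks in RAM and spill older ones to disk, reloading on access. Block granularity, eviction policy, and all names here are assumptions for illustration, not ARIA's design; a real implementation would memory-map NVMe-backed tensors rather than pickle lists.

```python
import os
import pickle
import tempfile

class PagedKVCache:
    """Toy two-tier KV cache: hot blocks in RAM, cold blocks on disk."""

    def __init__(self, ram_blocks=4):
        self.ram_blocks = ram_blocks
        self.ram = {}                      # block_id -> values (hot tier)
        self.disk = {}                     # block_id -> file path (cold tier)
        self.dir = tempfile.mkdtemp()

    def put(self, block_id, kv):
        self.ram[block_id] = kv
        while len(self.ram) > self.ram_blocks:
            oldest = min(self.ram)         # evict the oldest block id
            path = os.path.join(self.dir, f"{oldest}.kv")
            with open(path, "wb") as f:
                pickle.dump(self.ram.pop(oldest), f)
            self.disk[oldest] = path

    def get(self, block_id):
        if block_id in self.ram:
            return self.ram[block_id]
        with open(self.disk[block_id], "rb") as f:
            return pickle.load(f)          # page the cold block back in

cache = PagedKVCache(ram_blocks=4)
for i in range(6):                         # 6 blocks, RAM holds only 4
    cache.put(i, [float(i)] * 8)
# blocks 0-1 now live on disk, 2-5 in RAM; all remain readable
```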

🌐
1-Bit Ecosystem

8+ independent organizations. Falcon-Edge outperforms Microsoft BitNet: 53.17% vs 51.54% avg benchmark.

🧠
Prospective Memory

One of the first open-source AI protocols with dedicated prospective memory — time-based, semantic, and condition-based triggers for autonomous intention management.
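The three trigger kinds named above (time-based, semantic, condition-based) can be sketched as a single intention record. Class and field names are assumptions for illustration, not ARIA's memory API:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Intention:
    """A deferred intention with three independent firing conditions."""
    action: str
    fire_at: Optional[float] = None                  # time-based: epoch seconds
    keywords: set = field(default_factory=set)       # semantic: cue words
    condition: Optional[Callable[[], bool]] = None   # condition-based predicate

    def triggered(self, utterance: str = "", now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.fire_at is not None and now >= self.fire_at:
            return True                              # the scheduled moment arrived
        if self.keywords & set(utterance.lower().split()):
            return True                              # a semantic cue surfaced
        return bool(self.condition and self.condition())

backup = Intention("suggest a backup", keywords={"deploy", "release"})
# backup.triggered("ready to deploy tonight") -> True (semantic cue fires)
```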

Architecture

A 3-layer distributed system designed for resilience and efficiency.

Layer 3: Service
OpenAI-Compatible API · Web Dashboard · CLI Interface · Desktop App (Tauri 2.0 + Electron)
Layer 2: Consensus
Provenance Ledger · Proof of Useful Work · Proof of Sobriety · Consent Contracts
Layer 1: Compute
P2P Network (→ Kademlia DHT in v0.6.0) · 1-bit Inference Engine · Model Sharding · Consent-based Routing

Security & Trust

Five independent defense layers protect every inference. No single point of failure.

Layer 5: Privacy & Consent
Consent contracts · Local-first inference · Data minimization
Status: Implemented

Layer 4: Reputation Security
Contribution scoring · Reputation penalties · Quality thresholds · Temporal decay
Status: Designed

Layer 3: Consensus Security
Proof of Useful Work · Proof of Sobriety · Provenance Ledger
Status: Implemented

Layer 2: Protocol Security
Message authentication · Replay protection · Anti-downgrade
Status: Implemented

Layer 1: Transport Security
TLS 1.3 · Certificate validation · Perfect forward secrecy
Status: Implemented
Provenance Ledger — Every inference recorded immutably
🛡️

Anti-Sybil

Proof of Useful Work requires real computation. Reputation requirements create contribution cost. Rate limiting caps fake node creation.

Anti-Fraud

Output hashes + timing analysis detect falsified results. Energy claims cross-referenced with hardware TDP profiles.

🔒

Privacy-First

Inference runs locally. Only cryptographic hashes transit the network. Consent contracts enforce resource limits.

📋

Full Audit Trail

Every inference recorded on provenance ledger: timestamp, I/O hashes, nodes, energy consumed. Fully auditable.

"Nodes do not trust each other — they verify."

Desktop Application

A beautiful, native desktop experience for ARIA Protocol.

Dashboard

Real-time node monitoring and network stats with live updates.

Model Manager

Download and manage BitNet models directly from HuggingFace.

AI Chat

Local AI chat interface with typewriter effects and streaming.

Energy Dashboard

Track energy savings, CO2 avoided, and unlock achievements.

Settings

12 languages, consent controls, and system preferences.

Cross-platform: Windows, macOS, Linux
Lightweight: Tauri 2.0 (~15 MB) + Electron fallback (~150 MB)
12 Languages: EN, FR, ES, DE, PT, IT, JA, KO, ZH, RU, AR, HI
Premium design: Dark mode with glassmorphism effects
One-click setup: Perfect for non-developers

Coming in v0.6.0+: Infinite Context Mode, Conversation Memory Manager, Consensus Inference Panel, Knowledge Network Browser

Get Started in 3 Commands

Terminal
# Install ARIA Protocol
$ pip install aria-protocol

# Start a node
$ aria node start --port 8765 --model aria-2b-1bit

# Start the API server
$ aria api start --port 3000

Roadmap v3.0

9 Versions · From testnet to production

v0.1–v0.5.2 Genesis → Desktop
v0.6.0 Testnet Alpha
v0.7.0 Smart Layer
v0.7.5 R&D + Docs
v0.8.0 Extended Context
v0.9.0 ARIA-LM
v1.0.0 Production
v1.1.0+ Beyond
54 Tasks
44 New
9 Versions
7 Corrections
View Full Roadmap →

Join the Decentralized AI Movement

ARIA is open-source, MIT licensed, and ready for contributors.