v0.5.5
A peer-to-peer protocol for efficient, ethical, and decentralized AI inference. Run 1-bit quantized models on any CPU with 99.6% energy savings.
AI inference without expensive hardware, excessive energy, or centralized control.
1-bit ternary weights (-1, 0, +1) replace expensive multiplications with simple additions. Runs on any consumer CPU — no GPU required.
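The arithmetic trick can be sketched in a few lines of Python — a toy dot product for illustration, not ARIA's actual kernel: because every weight is −1, 0, or +1, the inner loop only adds, subtracts, or skips.

```python
def ternary_matvec(W, x):
    """Multiply a ternary weight matrix W (entries in {-1, 0, +1})
    by a dense activation vector x using only additions/subtractions."""
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:        # +1 weight: add the activation
                acc += xi
            elif w == -1:     # -1 weight: subtract it
                acc -= xi
            # 0 weight: skip entirely (sparsity for free)
        out.append(acc)
    return out

# Example: 2x3 ternary weight matrix applied to a 3-vector
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2.0, 3.0, 5.0]
print(ternary_matvec(W, x))  # [-3.0, 6.0]
```

Real 1-bit runtimes pack these ternary values into bit-plane representations and vectorize the adds, but the absence of multiplications is the same property shown here.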
99.6% energy reduction compared to cloud APIs. A single node uses ~241 kWh/year, versus ~25,550 kWh/year for comparable cloud-based serving.
WebSocket-based peer-to-peer networking with pipeline parallelism. No central server, no single point of failure. Your data stays yours.
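One way pipeline parallelism can split work across peers is to give each node a contiguous slice of the model's layers. The helper below is a hypothetical sketch of such a partitioning step, not the protocol's actual scheduler:

```python
def partition_layers(num_layers, peers):
    """Assign a contiguous slice of transformer layers to each peer
    (pipeline parallelism): peer i runs layers [start, end)."""
    base, extra = divmod(num_layers, len(peers))
    plan, start = {}, 0
    for i, peer in enumerate(peers):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        plan[peer] = (start, start + size)
        start += size
    return plan

# 26 layers over 3 peers -> contiguous slices of 9 / 9 / 8 layers
print(partition_layers(26, ["node-a", "node-b", "node-c"]))
```

Each peer then only needs the weights for its own slice in memory, and activations flow peer-to-peer along the pipeline.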
At least 8 independent organizations produce 1-bit models. No single vendor dependency. Falcon-Edge outperforms Microsoft BitNet (53.17% vs 51.54%).
Full coverage of all five human memory types — including prospective memory for deferred intentions. The system remembers what to bring up next time, not just what happened before. Grounded in Einstein & McDaniel’s Multiprocess Framework from cognitive science.
9 models, 3 vendors, 170 test runs — AMD Ryzen 9 7845HX, 8 threads, AVX-512 VNNI+VBMI, reproducible results.
| Model | Params | Source | Type | tok/s | Energy* |
|---|---|---|---|---|---|
| BitNet-b1.58-large | 0.7B | Microsoft | Post-quantized | 118.25 | ~15 mJ/tok |
| Falcon-E-1B-Instruct | 1.0B | TII | Native 1-bit | 80.19 | ~23 mJ/tok |
| Falcon3-1B-Instruct | 1.0B | TII | Post-quantized | 56.31 | ~33 mJ/tok |
| BitNet-b1.58-2B-4T | 2.4B | Microsoft | Native 1-bit | 37.76 | ~49 mJ/tok |
| Falcon-E-3B-Instruct | 3.0B | TII | Native 1-bit | 49.80 | ~37 mJ/tok |
| Falcon3-3B-Instruct | 3.0B | TII | Post-quantized | 33.21 | ~55 mJ/tok |
| Falcon3-7B-Instruct | 7.0B | TII | Post-quantized | 19.89 | ~92 mJ/tok |
| Llama3-8B-1.58 | 8.0B | Microsoft | Post-quantized | 16.97 | ~108 mJ/tok |
| Falcon3-10B-Instruct | 10.0B | TII | Post-quantized | 15.12 | ~121 mJ/tok |
Key finding: Models natively trained in 1-bit (Falcon-E) outperform post-training quantized models by +42% at 1B and +50% at 3B. This validates native ternary training over post-hoc quantization.
*Energy is estimated as CPU time × (TDP / threads); see the benchmark report for the full methodology.
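As a rough illustration of that estimate, the sketch below divides a per-thread power budget by throughput. The 45 W TDP and 24 hardware threads for the Ryzen 9 7845HX are assumptions, so this lands near — not exactly on — the table's figures:

```python
def energy_mj_per_token(tok_per_s, tdp_w=45.0, total_threads=24):
    """Approximate mJ/token: per-thread power (TDP / hardware threads)
    divided by throughput. TDP and thread count are assumed values for
    a Ryzen 9 7845HX; the report's exact inputs may differ."""
    watts = tdp_w / total_threads          # ~1.875 W effective per thread
    joules_per_token = watts / tok_per_s
    return joules_per_token * 1000.0       # J -> mJ

# Falcon-E-1B at 80.19 tok/s -> ~23 mJ/token, close to the table
print(round(energy_mj_per_token(80.19), 1))
```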
| Solution | Hardware | Running Costs | Total | vs ARIA |
|---|---|---|---|---|
| Cloud APIs (frontier) | $0 | $164,250 | $164,250 | 2,161x |
| Llama API | $0 | $32,850 | $32,850 | 432x |
| RTX 4090 (local) | $2,000 | $6,533 | $8,533 | 112x |
| ARIA Protocol | $0 | $76 | $76 | 1x |
Assumptions: 10M tokens/day, existing CPU hardware, electricity at $0.25/kWh.
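The two cloud rows follow directly from the assumed token volume. The per-million-token prices below are back-solved from the table and are assumptions, not figures given in the source:

```python
TOKENS_PER_DAY = 10_000_000
tokens_per_year_m = TOKENS_PER_DAY * 365 / 1e6   # 3,650 M tokens/year

# Implied $/M-token prices (assumptions back-solved from the table)
frontier_price = 45.0
llama_price = 9.0

print(tokens_per_year_m * frontier_price)  # 164250.0 -> "Cloud APIs" row
print(tokens_per_year_m * llama_price)     # 32850.0  -> "Llama API" row
```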
- Falcon-E (native 1-bit) outperforms Falcon3 (post-quantized) by +42% at 1B and +50% at 3B.
- All 9 models peak at 6–8 threads: 1-bit inference is memory-bound, not compute-bound.
- Smaller models benefit from single-CCD pinning; 7B+ models show minimal CCD sensitivity.
- Falcon3-10B at 15 tok/s demonstrates viable interactive inference on consumer hardware.
- Multiple 7B models with orchestrated debate reach 92.85% accuracy (Nature 2025, SLM-MATRIX).
- KV-cache NVMe paging targets 500K+ tokens on 8 GB RAM via sparse attention + 2-bit quantization.
- 8+ independent organizations ship 1-bit models; Falcon-Edge outperforms Microsoft BitNet (53.17% vs 51.54% average benchmark score).
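The 500K-token target can be sanity-checked with a rough size estimate. With Llama-style dimensions (all assumed below), a 2-bit KV cache for 500K tokens is already around 8 GB — which is why it must page to NVMe rather than sit in RAM alongside the weights:

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bits=2):
    """Approximate KV-cache size: K and V tensors per layer, per KV head.
    Layer/head/dim defaults are assumed Llama-style values."""
    per_token = 2 * layers * kv_heads * head_dim * bits / 8  # bytes/token
    return tokens * per_token

gb = kv_cache_bytes(500_000) / 1e9
print(f"{gb:.1f} GB")  # ~8.2 GB at 2-bit -> hence NVMe paging + sparsity
```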
One of the first open-source AI protocols with dedicated prospective memory — time-based, semantic, and condition-based triggers for autonomous intention management.
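A minimal sketch of the three trigger kinds — class and field names here are illustrative, not ARIA's real API:

```python
import time

class Intention:
    """A deferred intention with one of three trigger types
    (time-based, semantic, or condition-based)."""
    def __init__(self, note, trigger_type, payload):
        self.note = note
        self.trigger_type = trigger_type  # "time" | "semantic" | "condition"
        self.payload = payload            # deadline, keyword, or predicate

    def is_due(self, now=None, topic=None, state=None):
        if self.trigger_type == "time":       # fires after a deadline
            return (now if now is not None else time.time()) >= self.payload
        if self.trigger_type == "semantic":   # fires when the topic resurfaces
            return topic is not None and self.payload in topic.lower()
        if self.trigger_type == "condition":  # fires on a predicate over state
            return state is not None and self.payload(state)
        return False

reminder = Intention("mention the benchmark rerun", "semantic", "benchmark")
print(reminder.is_due(topic="Let's discuss benchmark results"))  # True
```

On each conversation turn the system would scan pending intentions and surface the ones whose triggers fire — "what to bring up next time" rather than "what happened before."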
A 3-layer distributed system designed for resilience and efficiency.
Five independent defense layers protect every inference. No single point of failure.
- Consent contracts · Local-first inference · Data minimization
- Contribution scoring · Reputation penalties · Quality thresholds · Temporal decay
- Proof of Useful Work · Proof of Sobriety · Provenance Ledger
- Message authentication · Replay protection · Anti-downgrade
- TLS 1.3 · Certificate validation · Perfect forward secrecy
Proof of Useful Work requires real computation. Reputation requirements create contribution cost. Rate limiting caps fake node creation.
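The rate-limiting piece can be illustrated with a sliding-window cap on node registrations per address — a generic sketch, not the protocol's actual limiter:

```python
import time
from collections import deque

class JoinRateLimiter:
    """Caps new node registrations per address per time window
    (illustrative sliding-window limiter)."""
    def __init__(self, max_joins=3, window_s=3600.0):
        self.max_joins = max_joins
        self.window_s = window_s
        self.history = {}  # address -> deque of join timestamps

    def allow(self, address, now=None):
        now = time.time() if now is None else now
        q = self.history.setdefault(address, deque())
        while q and now - q[0] > self.window_s:
            q.popleft()                  # drop joins outside the window
        if len(q) >= self.max_joins:
            return False                 # over the cap: likely Sybil attempt
        q.append(now)
        return True

rl = JoinRateLimiter(max_joins=2, window_s=60)
print([rl.allow("1.2.3.4", now=t) for t in (0, 1, 2)])  # [True, True, False]
```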
Output hashes + timing analysis detect falsified results. Energy claims cross-referenced with hardware TDP profiles.
Inference runs locally. Only cryptographic hashes transit the network. Consent contracts enforce resource limits.
Every inference recorded on provenance ledger: timestamp, I/O hashes, nodes, energy consumed. Fully auditable.
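A ledger record along those lines might look like the sketch below — an illustrative schema, hash-chained so tampering with history is detectable; only hashes of the prompt and output are stored, never the raw text:

```python
import hashlib, json, time

def ledger_entry(prev_hash, prompt, output, node_ids, energy_mj):
    """One provenance-ledger record (illustrative schema)."""
    entry = {
        "timestamp": time.time(),
        "input_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_hash": hashlib.sha256(output.encode()).hexdigest(),
        "nodes": node_ids,
        "energy_mj": energy_mj,
        "prev": prev_hash,  # chain entries so history is tamper-evident
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()  # canonical serialization
    ).hexdigest()
    return entry

genesis = ledger_entry("0" * 64, "What is 1-bit inference?", "...", ["node-a"], 23.4)
```

Any auditor holding the chain can recompute each record's hash and verify it matches the `prev` field of its successor.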
"Nodes do not trust each other — they verify."
A beautiful, native desktop experience for ARIA Protocol.
- Real-time node monitoring and network stats with live updates.
- Download and manage BitNet models directly from HuggingFace.
- Local AI chat interface with typewriter effects and streaming.
- Track energy savings and CO2 avoided, and unlock achievements.
- 12 languages, consent controls, and system preferences.
Coming in v0.6.0+: Infinite Context Mode, Conversation Memory Manager, Consensus Inference Panel, Knowledge Network Browser
```shell
# Install ARIA Protocol
$ pip install aria-protocol

# Start a node
$ aria node start --port 8765 --model aria-2b-1bit

# Start the API server
$ aria api start --port 3000
```
9 Versions · From testnet to production
ARIA is open-source, MIT licensed, and ready for contributors.