v0.9.0 complete — v0.9.5 next

ARIA Protocol — Roadmap v4.0

Universal Protocol. Three tiers of inference — Efficiency, Quality, Specialist — on every CPU and NPU.

Twelve milestones, eighty-six identified tasks. From the v0.1 whitepaper to the v0.9.0 Universal Protocol pivot, with v0.9.5 polish and a v1.0 production network on the horizon.

12 Versions
86 Tasks
v0.9.0 Current
Timeline

Version history & next milestones

From genesis to ARIA Universal Protocol — every shipped version, every planned step.

v0.1 → v0.5.2 Genesis → Desktop Complete

Whitepaper, P2P, CLI, API, Dashboard, BitNet, Benchmarks, Desktop App

  • v0.1.0 Genesis — Whitepaper published, reference implementation
  • v0.2.0 Full Stack — P2P WebSocket + TLS, CLI, API, Dashboard, BitNet engine
  • v0.2.5 Hardening — Threat model, protocol spec, TLS support
  • v0.3.0 Benchmarks — Performance validation (up to 120 t/s on 0.7B, multi-architecture)
  • v0.4.0 Native BitNet — Python ctypes bindings to bitnet.cpp
  • v0.5.0 Desktop App — Tauri 2.0 + Electron, 12 languages, system tray
  • v0.5.1 Build Fix — Desktop build corrections
  • v0.5.2 Subprocess Backend — llama-cli bridge, multi-backend inference, CI/CD cross-platform builds
v0.5.5 Housekeeping & Foundations Complete

CI/CD fixes, cleanup, test stabilization, Falcon3 prep, Zen 5 benchmarks (+35% vs Zen 4), documentation — 7 tasks

# | Task | Type | Detail
1 | Fix CI/CD workflows across all platforms | Infra | Resolve build failures on Windows, macOS, Linux. Cross-platform test matrix.
2 | Code cleanup and linting | Code | Remove dead code, fix lint warnings, standardize imports.
3 | Test stabilization (196 tests passing) | Code | Fix flaky tests, ensure deterministic results, improve coverage.
4 | Prepare Falcon3 1.58-bit integration | R&D | Verify GGUF availability, test bitnet.cpp compatibility, plan integration.
5 | Documentation update (post-brainstorming Feb 2026) | Doc | Update README, roadmap, GitHub Pages with research-validated content.
6 | GitHub Release v0.5.5 | Infra | Tag release, update version badges, sync pyproject.toml.
7 | Zen 5 cross-generation benchmarks | R&D | 270+ runs on AMD Ryzen AI 9 HX 370 (Zen 5). +28-43% vs Zen 4. big.LITTLE thread scaling analysis. Thermal throttling diagnostic.
v0.6.0 Testnet Alpha Complete

Kademlia DHT, NAT traversal, desktop bridge, Falcon3/Edge models, 50+ community nodes — 10 tasks

# | Task | Type | Detail
7 | Kademlia DHT peer discovery | Code | Replace bootstrap servers with decentralized peer discovery via Kademlia DHT.
8 | NAT traversal (STUN/TURN) | Code | Enable nodes behind routers to participate. STUN for discovery, TURN for relay.
9 | Decentralized bootstrap mechanism | Infra | No central server dependency. Seed nodes + DHT for full decentralization.
10 | Desktop ↔ Python backend bridge (live inference) | Code | Real-time bidirectional communication between desktop app and Python backend. Live inference metrics.
11 | Real-time P2P network visualization | Web | Visual node map in desktop app showing connected peers, latency, throughput.
12 | Integrate Falcon3 1.58-bit models (1B, 3B, 7B, 10B) | Code | Add TII instruction-tuned models. GGUF format, bitnet.cpp compatible.
13 | Integrate Falcon-Edge (1B, 3B) + benchmark vs Microsoft BitNet | Code | Natively-trained 1-bit models from TII. Benchmark: 53.17% vs 51.54% avg.
14 | Update bitnet.cpp to January 2026 parallel kernels | Code | Integrate ELUT parallel kernels for +1.15–2.1x additional CPU speedup.
15 | Settings persistence + consent management | Code | Save user preferences, consent settings, and node configuration across sessions.
16 | Public testnet with 50+ community nodes target | Com | Launch campaign: Show HN, r/LocalLLaMA. Target 50+ active nodes.
Validation: 50-node simulation — 100% shard discovery, 82.2% routing completeness, 0 errors.
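
To make the Kademlia task (7) concrete, the sketch below shows the XOR distance metric that Kademlia uses to rank peers and pick a routing bucket. It is an illustration only, not ARIA's implementation: the function names and the SHA-1-of-a-name ID derivation are assumptions made for this example.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name (hypothetical; a real node would
    more likely hash a public key)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """k-bucket index: position of the highest bit where the IDs differ.
    Returns -1 when both IDs are identical."""
    return xor_distance(self_id, other_id).bit_length() - 1


def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k known peers closest to the target ID."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]
```

A lookup repeatedly asks the closest known peers for peers that are closer still, which is why no central bootstrap server is needed once a few seed nodes are known.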
v0.7.0 Smart Layer Complete

Reputation system, Smart Router, MoE System Routing, Frontier API bridge, Thinking Pipeline, contribution snapshot — 531 tests

# | Task | Type | Detail
17 | Consensus Inference: confidence-based routing (SLM-MUX style) | Code | Route queries based on model confidence scores. Avoids groupthink via orchestrated protocols.
18 | Consensus Inference: cross-verification protocol | Code | Multiple models verify each other’s outputs. Majority voting + synthesis.
19 | Consensus Inference: user-facing protocol selection | Code | Speed/Quality/Verify mode selection in UI. Energy budget controls.
20 | Consensus Inference: provenance logging | Code | Record which nodes, which protocol, energy used. Full audit trail on ledger.
21 | Conversation Memory: 3-tier architecture (Hot/Warm/Cold) | Code | Hot = current session, Warm = recent context, Cold = long-term storage.
22 | Conversation Memory: Fact Store with heuristic + LLM extraction | Code | Extract and store facts from conversations. Heuristic + LLM-based extraction pipeline.
23 | Conversation Memory: semantic recall via embeddings | Code | Retrieve relevant memories using semantic similarity. Local embedding model.
24 | Node reputation system (reliability scoring) | Code | Score nodes on uptime, quality, energy efficiency. Published on ledger.
25 | Anti-Sybil protections (IP + hardware fingerprint) | Code | Prevent fake nodes. IP diversity + hardware attestation.
26 | SAPO integration prototype (reasoning rollout sharing) | R&D | Decentralized experience sharing for reasoning improvement. Based on Gensyn research.
27 | Prospective Memory: Intention node schema in Profile Graph | Code | New Intention node type (14th) in Kuzu graph with trigger_type, trigger_condition, priority, status lifecycle, and trigger_embedding fields. Relations: HAS_INTENTION, RELATES_TO, DEPENDS_ON.
28 | Prospective Memory: dual-pathway trigger system | Code | Strategic monitoring (time-based polling, ~3ms) + spontaneous retrieval (semantic embedding match, ~2ms). Adaptive monitoring intensity per Dynamic Multiprocess Framework. Compound triggers (AND/OR/SEQUENCE).
29 | Prospective Memory: intention extraction pipeline | Code | Heuristic pattern matching (~1ms sync) + LLM extraction via Qwen2.5-1.5B (1-3s async). Token-efficient context injection with ACT-R inspired activation scoring. 300-500 token budget.
30 | Prospective Memory: Intentions Panel in desktop app | Code | UI for managing active intentions: list view with trigger type icons, on/off toggle, manual creation, fire history, “3 active reminders” indicator. Integrated in Memory Manager.
31 | MoE System Routing: local multi-model dispatch | Code | BitNet 0.7B (118 t/s) classifies tasks, dispatches to optimal model: Falcon-E-3B (coding), Falcon3-7B (complex), Falcon3-10B (analysis). Auto-switch via InferenceServer (~1.5s).
32 | Thinking Pipeline: Chain-of-Thought multi-pass | Code | THINK → RETRIEVE → GENERATE. Structured reasoning plan before generation. Targeted RAG search based on plan. Two inference passes. Toggleable per request.
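
The THINK → RETRIEVE → GENERATE flow of task 32 can be sketched as follows. This is a minimal illustration under stated assumptions: `thinking_pipeline`, the `infer` and `retrieve` callables, and the prompt wording are all hypothetical stand-ins, not ARIA's actual API.

```python
from typing import Callable


def thinking_pipeline(
    query: str,
    infer: Callable[[str], str],           # hypothetical: one model call
    retrieve: Callable[[str], list[str]],  # hypothetical: RAG search
    thinking_enabled: bool = True,
) -> dict:
    """Run a two-pass answer: plan first, then generate with retrieved context."""
    if not thinking_enabled:
        # Toggle off: single direct inference pass, no phases recorded.
        return {"answer": infer(query), "phases": []}

    # THINK: first inference pass produces a structured plan.
    plan = infer(f"Outline the steps needed to answer: {query}")

    # RETRIEVE: search is targeted by the plan, not by the raw query.
    context = "\n".join(retrieve(plan))

    # GENERATE: second inference pass uses the plan and retrieved context.
    answer = infer(f"Context:\n{context}\n\nPlan:\n{plan}\n\nQuestion: {query}")
    return {"answer": answer, "phases": ["THINK", "RETRIEVE", "GENERATE"]}
```

Passing the model and retriever in as callables keeps the sketch independent of any particular backend, and makes the two inference passes easy to count in a test.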
v0.7.5 R&D + Documentation Complete

Whitepaper v2, PT-BitNet R&D, KV-Cache proof of concept, benchmarks — 6 tasks

# | Task | Type | Detail
33 | Whitepaper v2 | Doc | Complete WP v2: Consensus Inference, KV-Cache, ARIA-LM, reputation system, competitor analysis.
34 | PT-BitNet R&D: ternarize Qwen3-14B | R&D | Post-training ternarization of Qwen3-14B (28 GB → ~3 GB). Quality benchmarks.
35 | KV-Cache NVMe: proof of concept (save/restore between sessions) | R&D | Prototype SSD-based KV-cache persistence. Save and restore context across sessions.
36 | Benchmark Falcon-Edge vs Microsoft BitNet (publication content) | R&D | Rigorous comparison. Publishable results for community credibility.
37 | Reproduce SLM-MATRIX results on ARIA infrastructure | R&D | Validate multi-agent debate accuracy claims on ARIA’s consensus system.
38 | Update threat model for distributed features | Doc | Address new attack vectors from DHT, consensus, memory, and reputation systems.
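
The "28 GB → ~3 GB" figure in task 34 follows from simple bits-per-weight arithmetic. The check below assumes 16-bit source weights and 1.58 bits per ternary weight, and ignores any layers (such as embeddings) that a real ternarization might keep at higher precision.

```python
PARAMS = 14e9        # Qwen3-14B parameter count, rounded
FP16_BITS = 16       # assumed precision of the source checkpoint
TERNARY_BITS = 1.58  # about log2(3) bits for a weight in {-1, 0, +1}


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


fp16_gb = weights_gb(PARAMS, FP16_BITS)
ternary_gb = weights_gb(PARAMS, TERNARY_BITS)
print(f"{fp16_gb:.1f} GB -> {ternary_gb:.1f} GB")
```

Under these assumptions the ternary weights come to a little under 3 GB, consistent with the roadmap's rounded figure.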
v0.7.6 Integration Sprint Complete

SmartRouter desktop integration, Auto model selection, thinking toggle, source indicator, frontier settings, reputation card — 8 features, 7 fixes

# | Task | Type | Detail
A1 | SmartRouter integration in api.py | Code | All /v1/chat/completions routed through SmartRouter (MoE classification → model selection → inference → confidence → reputation).
A2 | Auto (Smart Router) mode in model selector | Code | New default: automatic model selection via MoE routing.
A3 | Thinking toggle | Code | Brain icon in chat input, per-conversation toggle, thinking_enabled passed through IPC.
A4 | Source indicator badge | Code | Badge under assistant messages: Local/Frontier, model, confidence %, energy, cost.
A5 | Frontier API key settings panel | Code | FrontierSettings.tsx: 5 providers, encrypted KeyStore, frontier escalation toggle.
A6 | Reputation dashboard card | Code | ReputationCard.tsx: SVG gauge, inference stats, local/frontier split bar, 10s refresh.
A7 | KV-cache slot persistence | Code | --slots + --slot-save-path in llama-server startup.
A8 | New API endpoints | Code | GET/POST/DELETE /v1/settings/api-keys, GET /v1/node/reputation, GET /v1/router/stats.
F1 | Chat template fix | Fix | Added --chat-template chatml to llama-server (GGUF lacks embedded templates).
F2 | Stale process cleanup | Fix | Port cleanup before each llama-server startup prevents orphan processes.
F3 | Model display fix | Fix | Badge shows SmartRouter-selected model instead of request model.
F4 | Message passthrough fix | Fix | Original chat messages passed to llama-server instead of flattened prompt.
F5 | Manual model bypass | Fix | User-selected model bypasses MoE routing.
F6 | Routing matrix fix | Fix | chat/simple changed from base model to Falcon-E-1B-Instruct.
F7 | TypeScript build errors | Fix | Unused React imports, type comparison in useBackend.ts.
v0.7.7 Wire Everything Complete

ThinkingPipeline wired end-to-end, “Thinking...” animation, expandable phase details, frontier retry button, routing fix — 3 commits, 588 tests

# | Task | Type | Detail
1 | Wire ThinkingPipeline into SmartRouter | Code | SmartRouter.route() gains thinking_enabled param. When enabled + complexity ≥ medium, runs THINK → RETRIEVE → GENERATE pipeline.
2 | ThinkingPipeline integration tests | Test | 8 new tests covering all paths: disabled, enabled, simple bypass, phases in response, force_frontier.
3 | “Thinking...” animation | Code | TypingIndicator shows spinning brain + animated text when thinking is active instead of bouncing dots.
4 | Expandable thinking phases | Code | Clickable badge under messages shows per-phase detail: phase name, latency (ms), tokens used.
5 | “Retry with frontier API” button | Code | Appears on local messages with confidence < 60% when frontier keys are configured. Sends same prompt with force_frontier=true.
6 | force_frontier IPC plumbing | Code | Full pipeline: preload.js → main.js → API body → SmartRouter.route(force_frontier=True).
7 | Fix query classification | Fix | Classifier now uses user message only (not system prompt), preventing length inflation from “simple” to “medium”.
8 | Fix fallback model resolution | Fix | Fallback chain uses default_model_id instead of “auto” which couldn’t resolve to any GGUF path.
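
The two gating rules described in tasks 1 and 5 (thinking runs only when enabled and complexity is at least medium; the retry button appears only for low-confidence local answers when frontier keys exist) can be expressed as two small predicates. This is a sketch: the `Complexity` enum, its `COMPLEX` level, and both function names are assumptions made for illustration.

```python
from enum import IntEnum


class Complexity(IntEnum):
    """Ordered levels; SIMPLE and MEDIUM appear in the roadmap, COMPLEX is assumed."""
    SIMPLE = 0
    MEDIUM = 1
    COMPLEX = 2


def should_think(thinking_enabled: bool, complexity: Complexity) -> bool:
    """Run THINK -> RETRIEVE -> GENERATE only when enabled and complexity >= medium."""
    return thinking_enabled and complexity >= Complexity.MEDIUM


def show_frontier_retry(
    source: str, confidence: float, frontier_keys_configured: bool
) -> bool:
    """Offer the retry button on local answers below 60% confidence,
    and only when at least one frontier API key is configured."""
    return source == "local" and confidence < 0.60 and frontier_keys_configured
```

Using an ordered enum keeps the "at least medium" rule a plain comparison, so simple queries bypass the pipeline even when the toggle is on.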
v0.8.0 Agentic Foundation Complete

Tool Framework, ARIA Code, MCP Client, Local RAG, session management — 8 tasks

# | Task | Type | Detail
39 | Tool Framework | Code | Central tool registry with manifest system (JSON schema), execution pipeline, permission model (ask/allow/deny), streaming results. Foundation for all agentic features.
40 | ARIA Code - local sandbox | Code | Coding assistant using Tool Framework. BashTool in isolated sandbox. FileRead/Write/Edit tools. Automatic error iteration loop.
41 | MCP Client | Code | MCP protocol client. Stdio transport for local servers. Tool bridging: MCP server tools registered in ARIA Tool Framework. Compatible with all existing MCP servers.
42 | Local RAG Engine (LanceDB) | Code | Local vector store for document embeddings. Chunk docs, embed locally, semantic search top-k. Registered as RAGSearchTool. All data in ~/.aria/knowledge/.
43 | Dynamic Web Fetch + Auto-Index | Code | On-demand documentation fetching. Registered as WebFetchTool. Smart TTL cache. Offline-first with web fallback. Respects robots.txt.
44 | Session management + compaction | Code | Long conversation persistence with auto-compaction for context window management. Session export/import. History indexing.
45 | Desktop: Code mode + MCP UI | Code | New “Code” mode in ModeSwitch. Terminal output panel. MCP server management in Settings. Tool permission prompts.
46 | Agentic documentation | Doc | docs/tool-framework.md, docs/mcp-integration.md, docs/code-mode.md. Update threat-model.md with tool execution attack surfaces.
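
The ask/allow/deny permission model from task 39 might look roughly like the registry below. This is a hedged sketch, not the Tool Framework's real interface: `Permission`, `Tool`, `ToolRegistry`, and the `confirm` callback are hypothetical names, and manifests and streaming are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Permission(Enum):
    ASK = "ask"      # prompt the user before each run
    ALLOW = "allow"  # run without prompting
    DENY = "deny"    # never run


@dataclass
class Tool:
    name: str
    run: Callable[..., Any]
    permission: Permission = Permission.ASK


class ToolRegistry:
    def __init__(self, confirm: Callable[[str], bool]) -> None:
        # confirm(tool_name) stands in for the desktop permission prompt.
        self._tools: dict[str, Tool] = {}
        self._confirm = confirm

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def execute(self, name: str, **kwargs: Any) -> Any:
        tool = self._tools[name]
        if tool.permission is Permission.DENY:
            raise PermissionError(f"tool {name!r} is denied")
        if tool.permission is Permission.ASK and not self._confirm(name):
            raise PermissionError(f"user declined tool {name!r}")
        return tool.run(**kwargs)
```

Defaulting new tools to `ASK` means a tool bridged in from an external server never runs silently unless someone has explicitly allowed it.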
v0.8.1 Integration Sprint Complete

Desktop wired end-to-end: 14 API endpoints, 12 IPC handlers, 3 React hooks. Zero mocks — 1117 tests

# | Task | Type | Detail
1 | API endpoints (CodeAgent, MCP, Sessions) | Code | 14 REST endpoints in aria/api.py: /v1/code/, /v1/mcp/, /v1/sessions/*
2 | Electron IPC handlers | Code | 12 IPC handlers in main.js + preload.js exposure
3 | React hooks + wiring | Code | useCodeAgent, useMCP, useSessions hooks. CodeView and MCPSettings wired to real backend. Zero setTimeout mocks.
v0.8.5 Collective Intelligence Complete

Consensus Inference (local + P2P), hooks, skills, vision, doc loader — 6 sprints (CC-22 to CC-27), +171 tests, 1288 total

# | Task | Type | Detail
47 | Consensus Inference Protocol | Code | 3-phase pipeline: brainstorm (N parallel agents), debate (1–3 rounds, convergence detection), synthesis. Local + P2P distributed. 55 tests.
48 | Consensus UI + Desktop wiring | Code | ConsensusView with 3-section layout (config, live debate, result). ConsensusSettings: agent count, rounds, mode, energy budget. 5 IPC handlers.
49 | Hooks system (pre/post tool) | Code | HookManager with 3 built-in hooks: ConsentEnforcement, AuditLog, RateLimit. Priority system (SYSTEM → AUDIT). 37 tests.
50 | Skills discovery + loading | Code | SKILL.md/ARIA.md auto-discover, SkillsRegistry, trigger-based context injection into CodeAgent. 30 tests.
51 | Vision pipeline (OCR) | Code | VisionTool: OCR via pytesseract, graceful fallback, multiple output formats. Image paste/drop in Chat. 29 tests.
52 | Document Loader UI | Code | DocLoader: drag-and-drop import, SearchPreview with debounced semantic search, RAG toggle, cache indicators. 20 tests.
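
The three phases of task 47 (brainstorm, debate with convergence detection, synthesis) can be outlined as below. This sketch makes several simplifying assumptions: agents are plain callables run sequentially rather than in parallel, convergence means all answers are identical, and synthesis is reduced to a majority vote. None of these names come from ARIA's code.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical agent signature: (query, peer answers from last round) -> answer
Agent = Callable[[str, Sequence[str]], str]


def consensus(query: str, agents: Sequence[Agent], max_rounds: int = 3) -> dict:
    """Brainstorm, debate for 1..max_rounds rounds, then pick the majority answer."""
    # Phase 1, brainstorm: each agent answers independently.
    answers = [agent(query, []) for agent in agents]

    # Phase 2, debate: agents revise after seeing the previous round's answers.
    rounds = 0
    for _ in range(max_rounds):
        answers = [agent(query, answers) for agent in agents]
        rounds += 1
        if len(set(answers)) == 1:  # convergence: everyone agrees
            break

    # Phase 3, synthesis: simplified here to a majority vote.
    winner, votes = Counter(answers).most_common(1)[0]
    return {"answer": winner, "rounds": rounds, "agreement": votes / len(answers)}
```

Stopping as soon as the answers converge is what keeps the energy cost of a debate bounded when agents already agree after the first round.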
v0.9.0 ARIA Universal Complete

Universal Protocol pivot: dual-backend (bitnet.cpp + llama.cpp), 3-tier model catalog (Efficiency/Quality/Specialist), SmartRouter v2, P2P v2, NPU scaffolding, desktop 3-tier UI, docs and website refresh — 14 tasks, 7 sprints, +12,211 lines, 170 new tests

# | Task | Type | Detail
53 | InferenceBackend abstraction | Code | ABC interface for pluggable backends. Frozen dataclasses: ModelHandle, GenerationParams, BackendHealth. Shared _LlamaServerProcess helper.
54 | Mainline llama.cpp backend | Code | LlamacppBackend on port 8082 alongside BitnetBackend on 8081. Chat template per model (gemma, qwen, chatml, llama3). Install scripts for Windows and Linux.
55 | Unified model catalog | Code | 16 models across 3 tiers. License gate (Apache 2.0/MIT/TII only). EXCLUDED_MODELS with 9 entries. Llama3-8B-1.58 removed (Meta license).
56 | Model manager refactor | Code | Catalog-driven downloads. Per-tier CLI: aria model list --tier, aria model info. Backward-compat with legacy SUPPORTED_MODELS.
57 | SmartRouter v2 | Code | 3-layer classification: regex pre-filter (<1ms), embedding similarity (~15ms), heuristic fallback. 7 query categories. Tier-aware routing with fallback chains.
58 | Query classifier | Code | QueryCategory enum (CHAT_SIMPLE, CHAT_COMPLEX, CODE, MATH_REASONING, VISION, MULTILINGUAL, AUDIO). 105 training examples.
59 | P2P Protocol v2 | Code | HELLO v2 with tiers_enabled, available_models, backend_versions, hardware_profile. v1 backwards compat for Tier 1. Peer matching with exact-model priority.
60 | Hardware detection scaffolding | Code | HardwareProfile: CPU, RAM, AVX-512, NPU presence (AMD XDNA, Intel, Qualcomm, Apple). NPU stubs raise NotImplementedError with v1.0 deferral. aria hardware info CLI.
61 | Desktop: 3-tier ModelSelector | Code | 3 horizontal tabs (Efficiency/Quality/Specialist). Catalog cards with download/select. Disabled tiers show “Enable in Settings”.
62 | Desktop: TierBadge component | Code | Blue/emerald/purple badges with Lucide icons. Used in ChatInput, ModelCard, NodeStatus, Settings.
63 | Desktop: NodeProfile presets + HardwareSettings | Code | 5 presets (Minimal/Efficient/Balanced/Full/Specialist-only). Hardware panel with NPU banner. Profile persisted to ~/.aria/node_profile.json.
64 | i18n update (12 languages) | Code | tier.*, profile.*, hardware.*, npu.* keys added to en/fr/es/de/pt/it/ja/ko/zh/ru/ar/hi.
65 | Public documentation refresh | Doc | README rewritten. 7 docs updated/created: architecture, getting-started, MODELS.md, NPU_SUPPORT.md, protocol-spec v2, MIGRATION, threat-model.
66 | Website refresh | Web | Dark mode landing page, models.html (16 models in 3 tabs), hardware.html (CPU/NPU matrix), benchmarks.html (v0.5.5 + Zen 5 data).
v0.9.5 Universal Polish Next

Extended model catalog, MoE inference, browser agent, tier benchmarks, contribution scoring — 6 tasks

# | Task | Type | Detail
67 | Extended catalog: Qwen 3.5 (0.8B/9B), Phi-4-mini, SmolVLM, Qwen2.5-Coder-7B, GLM-4-9B, OLMo 2 7B | Code | Complete the Quality and Specialist tiers with P3/P4 models.
68 | Gemma 4 26B-A4B MoE support | Code | MoE inference path in llama.cpp backend. ~4B active params per forward pass from 25B total.
69 | Browser agent (ARIA Browse) | Code | Headless browser driven by Tier 2/3 models via Tool Framework. Web search, data extraction.
70 | Tier-aware benchmarks | R&D | Extend v0.5.5 benchmark harness to all 3 tiers with reproducible methodology.
71 | Contribution score tier weighting | Code | Nodes earn score proportionate to tier demand served.
72 | Desktop chat tier polish | Code | Tier indicator in chat header, one-click tier override per message.
v1.0.0 Production Network Planned

Stable production network, reputation system, hardened infrastructure — 6 tasks

# | Task | Type | Detail
61 | Distributed KV-Cache across P2P network | Code | Share KV-cache across nodes. Collaborative context extension.
62 | Context Relay: map-reduce for long documents | Code | Split long documents across nodes. Map-reduce style processing.
63 | Smart Inference Routing (local vs P2P vs cloud fallback) | Code | Intelligent routing: try local first, then P2P network, then cloud API fallback.
64 | Universal API Bridge (OpenAI-compatible from any app) | Code | Drop-in replacement for OpenAI API. Works with any OpenAI-compatible application.
65 | Production security hardening | Infra | P2P network audit. Bug bounty program. Third-party security review.
66 | Production launch | Com | Stable production network for public use. Launch campaign: HN, Reddit, Twitter, Product Hunt.
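
The "local first, then P2P, then cloud" order planned for Smart Inference Routing (task 63) amounts to an ordered fallback chain. The sketch below is one possible shape for it, assuming each backend is a callable that returns text, returns `None` when it cannot serve the request, or raises on failure. The `route` function and the backend names are hypothetical.

```python
from typing import Callable, Optional, Sequence

# Hypothetical backend entry: (name, callable returning text or None)
Backend = tuple[str, Callable[[str], Optional[str]]]


def route(prompt: str, backends: Sequence[Backend]) -> tuple[str, str]:
    """Try each backend in order and return (backend name, result) from the
    first one that succeeds. Raise RuntimeError if every backend fails."""
    errors = []
    for name, call in backends:
        try:
            result = call(prompt)
        except Exception as exc:  # a failing backend must not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        errors.append(f"{name}: no result")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the result lets the caller record where each answer came from, which matters when a cloud fallback carries a cost.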
v1.1.0+ Beyond — Long-term Vision Vision

Mobile, mesh networking, automation, gamification, MoE+1-bit research

  • Mobile companion app — iOS/Android with on-device inference (1B–3B models)
  • ARIA Mesh Mode — Offline local inference via Bluetooth/WiFi Direct
  • Live Network Globe — 3D visualization of the global ARIA network
  • Automation Studio — No-code AI workflows and task automation
  • Team/Family sharing — LAN-based node sharing for groups
  • Gamification & achievements — Recognize engagement and contribution milestones
  • MoE + 1-bit research — 100B+ params, ~1 GB memory, frontier-class on laptops
At a glance

By the numbers

Twelve versions tracked, eighty-six tasks identified, and one universal protocol shipped.

12
Planned Versions
86
Identified Tasks
v0.9.0
Current Version