From genesis to ARIA Universal Protocol — every shipped version, every planned step.
Whitepaper, P2P, CLI, API, Dashboard, BitNet, Benchmarks, Desktop App
- v0.1.0 Genesis — Whitepaper published, reference implementation
- v0.2.0 Full Stack — P2P WebSocket + TLS, CLI, API, Dashboard, BitNet engine
- v0.2.5 Hardening — Threat model, protocol spec, TLS support
- v0.3.0 Benchmarks — Performance validation (up to 120 t/s on 0.7B, multi-architecture)
- v0.4.0 Native BitNet — Python ctypes bindings to bitnet.cpp
- v0.5.0 Desktop App — Tauri 2.0 + Electron, 12 languages, system tray
- v0.5.1 Build Fix — Desktop build corrections
- v0.5.2 Subprocess Backend — llama-cli bridge, multi-backend inference, CI/CD cross-platform builds
CI/CD fixes, cleanup, test stabilization, Falcon3 prep, Zen 5 benchmarks (+35% vs Zen 4), documentation — 7 tasks
| # | Task | Type | Detail |
|---|------|------|--------|
| 1 | Fix CI/CD workflows across all platforms | Infra | Resolve build failures on Windows, macOS, Linux. Cross-platform test matrix. |
| 2 | Code cleanup and linting | Code | Remove dead code, fix lint warnings, standardize imports. |
| 3 | Test stabilization (196 tests passing) | Code | Fix flaky tests, ensure deterministic results, improve coverage. |
| 4 | Prepare Falcon3 1.58-bit integration | R&D | Verify GGUF availability, test bitnet.cpp compatibility, plan integration. |
| 5 | Documentation update (post-brainstorming Feb 2026) | Doc | Update README, roadmap, GitHub Pages with research-validated content. |
| 6 | GitHub Release v0.5.5 | Infra | Tag release, update version badges, sync pyproject.toml. |
| 7 | Zen 5 cross-generation benchmarks | R&D | 270+ runs on AMD Ryzen AI 9 HX 370 (Zen 5). +28-43% vs Zen 4. big.LITTLE thread scaling analysis. Thermal throttling diagnostic. |
Kademlia DHT, NAT traversal, desktop bridge, Falcon3/Edge models, 50+ community nodes — 10 tasks
| # | Task | Type | Detail |
|---|------|------|--------|
| 7 | Kademlia DHT peer discovery | Code | Replace bootstrap servers with decentralized peer discovery via Kademlia DHT. |
| 8 | NAT traversal (STUN/TURN) | Code | Enable nodes behind routers to participate. STUN for discovery, TURN for relay. |
| 9 | Decentralized bootstrap mechanism | Infra | No central server dependency. Seed nodes + DHT for full decentralization. |
| 10 | Desktop ↔ Python backend bridge (live inference) | Code | Real-time bidirectional communication between desktop app and Python backend. Live inference metrics. |
| 11 | Real-time P2P network visualization | Web | Visual node map in desktop app showing connected peers, latency, throughput. |
| 12 | Integrate Falcon3 1.58-bit models (1B, 3B, 7B, 10B) | Code | Add TII instruction-tuned models. GGUF format, bitnet.cpp compatible. |
| 13 | Integrate Falcon-Edge (1B, 3B) + benchmark vs Microsoft BitNet | Code | Natively-trained 1-bit models from TII. Benchmark: 53.17% vs 51.54% avg. |
| 14 | Update bitnet.cpp to January 2026 parallel kernels | Code | Integrate ELUT parallel kernels for an additional 1.15–2.1x CPU speedup. |
| 15 | Settings persistence + consent management | Code | Save user preferences, consent settings, and node configuration across sessions. |
| 16 | Public testnet with 50+ community nodes target | Com | Launch campaign: Show HN, r/LocalLLaMA. Target 50+ active nodes. |
Validation: 50-node simulation — 100% shard discovery, 82.2% routing completeness, 0 errors.
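
The Kademlia peer discovery in task 7 rests on an XOR distance between node IDs. The following is a minimal sketch of that metric, assuming 160-bit IDs derived with SHA-1 as in the original Kademlia design. The function names (`node_id`, `xor_distance`, `bucket_index`, `closest_peers`) are hypothetical and are not taken from ARIA's code.

```python
import hashlib

ID_BITS = 160  # assumption: SHA-1-sized IDs, as in the original Kademlia design


def node_id(name: str) -> int:
    """Derive an illustrative 160-bit ID from an arbitrary node name."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def bucket_index(own: int, other: int) -> int:
    """k-bucket index for `other`: position of the highest differing bit."""
    d = xor_distance(own, other)
    if d == 0:
        raise ValueError("a node does not store itself in a bucket")
    return d.bit_length() - 1


def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k known peers closest to `target` under XOR distance."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]
```

A lookup would repeatedly ask the closest known peers for peers that are closer still, which is how a node can find others without a central bootstrap server once it knows at least one seed.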
Reputation system, Smart Router, MoE System Routing, Frontier API bridge, Thinking Pipeline, contribution snapshot — 531 tests
| # | Task | Type | Detail |
|---|------|------|--------|
| 17 | Consensus Inference: confidence-based routing (SLM-MUX style) | Code | Route queries based on model confidence scores. Avoids groupthink via orchestrated protocols. |
| 18 | Consensus Inference: cross-verification protocol | Code | Multiple models verify each other’s outputs. Majority voting + synthesis. |
| 19 | Consensus Inference: user-facing protocol selection | Code | Speed/Quality/Verify mode selection in UI. Energy budget controls. |
| 20 | Consensus Inference: provenance logging | Code | Record which nodes, which protocol, energy used. Full audit trail on ledger. |
| 21 | Conversation Memory: 3-tier architecture (Hot/Warm/Cold) | Code | Hot = current session, Warm = recent context, Cold = long-term storage. |
| 22 | Conversation Memory: Fact Store with heuristic + LLM extraction | Code | Extract and store facts from conversations. Heuristic + LLM-based extraction pipeline. |
| 23 | Conversation Memory: semantic recall via embeddings | Code | Retrieve relevant memories using semantic similarity. Local embedding model. |
| 24 | Node reputation system (reliability scoring) | Code | Score nodes on uptime, quality, energy efficiency. Published on ledger. |
| 25 | Anti-Sybil protections (IP + hardware fingerprint) | Code | Prevent fake nodes. IP diversity + hardware attestation. |
| 26 | SAPO integration prototype (reasoning rollout sharing) | R&D | Decentralized experience sharing for reasoning improvement. Based on Gensyn research. |
| 27 | Prospective Memory: Intention node schema in Profile Graph | Code | New Intention node type (14th) in Kuzu graph with trigger_type, trigger_condition, priority, status lifecycle, and trigger_embedding fields. Relations: HAS_INTENTION, RELATES_TO, DEPENDS_ON. |
| 28 | Prospective Memory: dual-pathway trigger system | Code | Strategic monitoring (time-based polling, ~3ms) + spontaneous retrieval (semantic embedding match, ~2ms). Adaptive monitoring intensity per Dynamic Multiprocess Framework. Compound triggers (AND/OR/SEQUENCE). |
| 29 | Prospective Memory: intention extraction pipeline | Code | Heuristic pattern matching (~1ms sync) + LLM extraction via Qwen2.5-1.5B (1-3s async). Token-efficient context injection with ACT-R inspired activation scoring. 300-500 token budget. |
| 30 | Prospective Memory: Intentions Panel in desktop app | Code | UI for managing active intentions: list view with trigger type icons, on/off toggle, manual creation, fire history, “3 active reminders” indicator. Integrated in Memory Manager. |
| 31 | MoE System Routing: local multi-model dispatch | Code | BitNet 0.7B (118 t/s) classifies tasks, dispatches to optimal model: Falcon-E-3B (coding), Falcon3-7B (complex), Falcon3-10B (analysis). Auto-switch via InferenceServer (~1.5s). |
| 32 | Thinking Pipeline: Chain-of-Thought multi-pass | Code | THINK → RETRIEVE → GENERATE. Structured reasoning plan before generation. Targeted RAG search based on plan. Two inference passes. Toggleable per request. |
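
The majority-voting step of the cross-verification protocol (task 18) could look roughly like the sketch below. It assumes that answers are short strings that can be compared after trimming and lowercasing, which is a simplification. `majority_vote` and its `quorum` parameter are hypothetical names, and the synthesis step is not shown.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[str], quorum: float = 0.5) -> Optional[str]:
    """Return the answer given by more than `quorum` of the models, else None.

    None signals that no answer has a majority, so the caller would need
    to fall back to another step such as synthesis.
    """
    if not answers:
        return None
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner if count / len(normalized) > quorum else None
```

For example, `majority_vote(["Paris", "paris ", "Lyon"])` returns `"paris"`, while two models that disagree produce `None`.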
Whitepaper v2, PT-BitNet R&D, KV-Cache proof of concept, benchmarks — 6 tasks
| # | Task | Type | Detail |
|---|------|------|--------|
| 33 | Whitepaper v2 | Doc | Complete WP v2: Consensus Inference, KV-Cache, ARIA-LM, reputation system, competitor analysis. |
| 34 | PT-BitNet R&D: ternarize Qwen3-14B | R&D | Post-training ternarization of Qwen3-14B (28 GB → ~3 GB). Quality benchmarks. |
| 35 | KV-Cache NVMe: proof of concept (save/restore between sessions) | R&D | Prototype SSD-based KV-cache persistence. Save and restore context across sessions. |
| 36 | Benchmark Falcon-Edge vs Microsoft BitNet (publication content) | R&D | Rigorous comparison. Publishable results for community credibility. |
| 37 | Reproduce SLM-MATRIX results on ARIA infrastructure | R&D | Validate multi-agent debate accuracy claims on ARIA’s consensus system. |
| 38 | Update threat model for distributed features | Doc | Address new attack vectors from DHT, consensus, memory, and reputation systems. |
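
The "28 GB → ~3 GB" figure in task 34 follows from simple bits-per-weight arithmetic. The sketch below reproduces it under the assumption of roughly 14 billion parameters, stored at 16 bits each before ternarization and about 1.58 bits each after. The helper `size_gb` is hypothetical.

```python
PARAMS = 14e9  # assumption: parameter count rounded to 14 billion


def size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


fp16_gb = size_gb(PARAMS, 16)       # 16-bit weights
ternary_gb = size_gb(PARAMS, 1.58)  # ternary weights, log2(3) ≈ 1.58 bits

print(f"16-bit: {fp16_gb:.1f} GB, ternary: {ternary_gb:.2f} GB")
```

The ternary estimate comes out a little under 3 GB, which leaves some room for any tensors kept at higher precision.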
SmartRouter desktop integration, Auto model selection, thinking toggle, source indicator, frontier settings, reputation card — 8 features, 7 fixes
| # | Task | Type | Detail |
|---|------|------|--------|
| A1 | SmartRouter integration in api.py | Code | All /v1/chat/completions routed through SmartRouter (MoE classification → model selection → inference → confidence → reputation). |
| A2 | Auto (Smart Router) mode in model selector | Code | New default: automatic model selection via MoE routing. |
| A3 | Thinking toggle | Code | Brain icon in chat input, per-conversation toggle, thinking_enabled passed through IPC. |
| A4 | Source indicator badge | Code | Badge under assistant messages: Local/Frontier, model, confidence %, energy, cost. |
| A5 | Frontier API key settings panel | Code | FrontierSettings.tsx: 5 providers, encrypted KeyStore, frontier escalation toggle. |
| A6 | Reputation dashboard card | Code | ReputationCard.tsx: SVG gauge, inference stats, local/frontier split bar, 10s refresh. |
| A7 | KV-cache slot persistence | Code | --slots + --slot-save-path in llama-server startup. |
| A8 | New API endpoints | Code | GET/POST/DELETE /v1/settings/api-keys, GET /v1/node/reputation, GET /v1/router/stats. |
| F1 | Chat template fix | Fix | Added --chat-template chatml to llama-server (GGUF lacks embedded templates). |
| F2 | Stale process cleanup | Fix | Port cleanup before each llama-server startup prevents orphan processes. |
| F3 | Model display fix | Fix | Badge shows SmartRouter-selected model instead of request model. |
| F4 | Message passthrough fix | Fix | Original chat messages passed to llama-server instead of flattened prompt. |
| F5 | Manual model bypass | Fix | User-selected model bypasses MoE routing. |
| F6 | Routing matrix fix | Fix | chat/simple changed from base model to Falcon-E-1B-Instruct. |
| F7 | TypeScript build errors | Fix | Unused React imports, type comparison in useBackend.ts. |
ThinkingPipeline wired end-to-end, “Thinking...” animation, expandable phase details, frontier retry button, routing fix — 3 commits, 588 tests
| # | Task | Type | Detail |
|---|------|------|--------|
| 1 | Wire ThinkingPipeline into SmartRouter | Code | SmartRouter.route() gains thinking_enabled param. When enabled + complexity ≥ medium, runs THINK → RETRIEVE → GENERATE pipeline. |
| 2 | ThinkingPipeline integration tests | Test | 8 new tests covering all paths: disabled, enabled, simple bypass, phases in response, force_frontier. |
| 3 | “Thinking...” animation | Code | TypingIndicator shows spinning brain + animated text when thinking is active instead of bouncing dots. |
| 4 | Expandable thinking phases | Code | Clickable badge under messages shows per-phase detail: phase name, latency (ms), tokens used. |
| 5 | “Retry with frontier API” button | Code | Appears on local messages with confidence < 60% when frontier keys are configured. Sends same prompt with force_frontier=true. |
| 6 | force_frontier IPC plumbing | Code | Full pipeline: preload.js → main.js → API body → SmartRouter.route(force_frontier=True). |
| 7 | Fix query classification | Fix | Classifier now uses user message only (not system prompt), preventing length inflation from “simple” to “medium”. |
| 8 | Fix fallback model resolution | Fix | Fallback chain uses default_model_id instead of “auto” which couldn’t resolve to any GGUF path. |
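
Two of the conditions in this table are simple predicates: when the thinking pipeline runs (task 1) and when the "Retry with frontier API" button appears (task 5). The sketch below spells them out. The function names and the `"complex"` label are assumptions, and the table does not say how confidence is represented, so the sketch uses a value between 0 and 1.

```python
# Assumed ordering of complexity labels; "simple" and "medium" appear in the
# roadmap, "complex" is a hypothetical third level.
COMPLEXITY_ORDER = {"simple": 0, "medium": 1, "complex": 2}


def should_think(thinking_enabled: bool, complexity: str) -> bool:
    """Run THINK → RETRIEVE → GENERATE only when enabled and complexity ≥ medium."""
    return thinking_enabled and COMPLEXITY_ORDER[complexity] >= COMPLEXITY_ORDER["medium"]


def show_frontier_retry(source: str, confidence: float, frontier_keys_configured: bool) -> bool:
    """Offer a frontier retry on local answers below 60% confidence."""
    return source == "local" and confidence < 0.60 and frontier_keys_configured
```

With these definitions, a simple query bypasses the pipeline even when thinking is enabled, which matches the "simple bypass" path listed in the integration tests.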
Tool Framework, ARIA Code, MCP Client, Local RAG, session management — 8 tasks
| # | Task | Type | Detail |
|---|------|------|--------|
| 39 | Tool Framework | Code | Central tool registry with manifest system (JSON schema), execution pipeline, permission model (ask/allow/deny), streaming results. Foundation for all agentic features. |
| 40 | ARIA Code — local sandbox | Code | Coding assistant using Tool Framework. BashTool in isolated sandbox. FileRead/Write/Edit tools. Automatic error iteration loop. |
| 41 | MCP Client | Code | MCP protocol client. Stdio transport for local servers. Tool bridging: MCP server tools registered in ARIA Tool Framework. Compatible with all existing MCP servers. |
| 42 | Local RAG Engine (LanceDB) | Code | Local vector store for document embeddings. Chunk docs, embed locally, semantic search top-k. Registered as RAGSearchTool. All data in ~/.aria/knowledge/. |
| 43 | Dynamic Web Fetch + Auto-Index | Code | On-demand documentation fetching. Registered as WebFetchTool. Smart TTL cache. Offline-first with web fallback. Respects robots.txt. |
| 44 | Session management + compaction | Code | Long conversation persistence with auto-compaction for context window management. Session export/import. History indexing. |
| 45 | Desktop: Code mode + MCP UI | Code | New “Code” mode in ModeSwitch. Terminal output panel. MCP server management in Settings. Tool permission prompts. |
| 46 | Agentic documentation | Doc | docs/tool-framework.md, docs/mcp-integration.md, docs/code-mode.md. Update threat-model.md with tool execution attack surfaces. |
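
The permission model of the Tool Framework (ask/allow/deny, task 39) can be illustrated with a small registry. This is a sketch under stated assumptions, not the project's implementation. `Tool`, `ToolRegistry`, `Permission`, and the `confirm` callback are hypothetical names, and manifests and streaming are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Permission(Enum):
    ASK = "ask"
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[..., Any]
    permission: Permission = Permission.ASK  # assumed default: ask first


class ToolRegistry:
    def __init__(self, confirm: Callable[[str], bool]):
        self._tools: dict[str, Tool] = {}
        self._confirm = confirm  # called for ASK tools; returns the user's decision

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def execute(self, name: str, **kwargs: Any) -> Any:
        tool = self._tools[name]
        if tool.permission is Permission.DENY:
            raise PermissionError(f"tool {name!r} is denied")
        if tool.permission is Permission.ASK and not self._confirm(name):
            raise PermissionError(f"user declined tool {name!r}")
        return tool.run(**kwargs)
```

In a desktop setting, `confirm` would be the place to raise the tool permission prompt mentioned in task 45.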
Desktop wired end-to-end: 14 API endpoints, 12 IPC handlers, 3 React hooks. Zero mocks — 1117 tests
| # | Task | Type | Detail |
|---|------|------|--------|
| 1 | API endpoints (CodeAgent, MCP, Sessions) | Code | 14 REST endpoints in aria/api.py: /v1/code/, /v1/mcp/, /v1/sessions/* |
| 2 | Electron IPC handlers | Code | 12 IPC handlers in main.js + preload.js exposure |
| 3 | React hooks + wiring | Code | useCodeAgent, useMCP, useSessions hooks. CodeView and MCPSettings wired to real backend. Zero setTimeout mocks. |
Consensus Inference (local + P2P), hooks, skills, vision, doc loader — 6 sprints (CC-22 to CC-27), +171 tests, 1288 total
| # | Task | Type | Detail |
|---|------|------|--------|
| 47 | Consensus Inference Protocol | Code | 3-phase pipeline: brainstorm (N parallel agents), debate (1–3 rounds, convergence detection), synthesis. Local + P2P distributed. 55 tests. |
| 48 | Consensus UI + Desktop wiring | Code | ConsensusView with 3-section layout (config, live debate, result). ConsensusSettings: agent count, rounds, mode, energy budget. 5 IPC handlers. |
| 49 | Hooks system (pre/post tool) | Code | HookManager with 3 built-in hooks: ConsentEnforcement, AuditLog, RateLimit. Priority system (SYSTEM → AUDIT). 37 tests. |
| 50 | Skills discovery + loading | Code | SKILL.md/ARIA.md auto-discovery, SkillsRegistry, trigger-based context injection into CodeAgent. 30 tests. |
| 51 | Vision pipeline (OCR) | Code | VisionTool: OCR via pytesseract, graceful fallback, multiple output formats. Image paste/drop in Chat. 29 tests. |
| 52 | Document Loader UI | Code | DocLoader: drag-and-drop import, SearchPreview with debounced semantic search, RAG toggle, cache indicators. 20 tests. |
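
The three phases of the Consensus Inference Protocol (task 47) can be sketched as a short loop. Several simplifications are assumed here: agents are plain callables and run sequentially rather than in parallel, convergence means all answers are identical, synthesis just picks the most common answer, and debate is skipped when the brainstorm answers already agree. `run_consensus` and the `Agent` signature are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# (question, answers from the previous round) -> this agent's answer
Agent = Callable[[str, Sequence[str]], str]


def run_consensus(question: str, agents: Sequence[Agent], max_rounds: int = 3):
    """Return (final_answer, debate_rounds_used)."""
    # Phase 1: brainstorm, each agent answers without seeing the others
    answers = [agent(question, []) for agent in agents]

    # Phase 2: debate until the answers converge or the round budget runs out
    rounds = 0
    while rounds < max_rounds and len(set(answers)) > 1:
        answers = [agent(question, answers) for agent in agents]
        rounds += 1

    # Phase 3: synthesis, reduced here to picking the most common answer
    final = Counter(answers).most_common(1)[0][0]
    return final, rounds
```

Capping the loop with `max_rounds` is also a natural place to enforce an energy budget, since each extra round costs one more inference per agent.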
Universal Protocol pivot: dual-backend (bitnet.cpp + llama.cpp), 3-tier model catalog (Efficiency/Quality/Specialist), SmartRouter v2, P2P v2, NPU scaffolding, desktop 3-tier UI, docs and website refresh — 14 tasks, 7 sprints, +12,211 lines, 170 new tests
| # | Task | Type | Detail |
|---|------|------|--------|
| 53 | InferenceBackend abstraction | Code | ABC interface for pluggable backends. Frozen dataclasses: ModelHandle, GenerationParams, BackendHealth. Shared _LlamaServerProcess helper. |
| 54 | Mainline llama.cpp backend | Code | LlamacppBackend on port 8082 alongside BitnetBackend on 8081. Chat template per model (gemma, qwen, chatml, llama3). Install scripts for Windows and Linux. |
| 55 | Unified model catalog | Code | 16 models across 3 tiers. License gate (Apache 2.0/MIT/TII only). EXCLUDED_MODELS with 9 entries. Llama3-8B-1.58 removed (Meta license). |
| 56 | Model manager refactor | Code | Catalog-driven downloads. Per-tier CLI: aria model list --tier, aria model info. Backward-compat with legacy SUPPORTED_MODELS. |
| 57 | SmartRouter v2 | Code | 3-layer classification: regex pre-filter (<1ms), embedding similarity (~15ms), heuristic fallback. 7 query categories. Tier-aware routing with fallback chains. |
| 58 | Query classifier | Code | QueryCategory enum (CHAT_SIMPLE, CHAT_COMPLEX, CODE, MATH_REASONING, VISION, MULTILINGUAL, AUDIO). 105 training examples. |
| 59 | P2P Protocol v2 | Code | HELLO v2 with tiers_enabled, available_models, backend_versions, hardware_profile. v1 backwards compat for Tier 1. Peer matching with exact-model priority. |
| 60 | Hardware detection scaffolding | Code | HardwareProfile: CPU, RAM, AVX-512, NPU presence (AMD XDNA, Intel, Qualcomm, Apple). NPU stubs raise NotImplementedError with v1.0 deferral. aria hardware info CLI. |
| 61 | Desktop: 3-tier ModelSelector | Code | 3 horizontal tabs (Efficiency/Quality/Specialist). Catalog cards with download/select. Disabled tiers show “Enable in Settings”. |
| 62 | Desktop: TierBadge component | Code | Blue/emerald/purple badges with Lucide icons. Used in ChatInput, ModelCard, NodeStatus, Settings. |
| 63 | Desktop: NodeProfile presets + HardwareSettings | Code | 5 presets (Minimal/Efficient/Balanced/Full/Specialist-only). Hardware panel with NPU banner. Profile persisted to ~/.aria/node_profile.json. |
| 64 | i18n update (12 languages) | Code | tier.*, profile.*, hardware.*, npu.* keys added to en/fr/es/de/pt/it/ja/ko/zh/ru/ar/hi. |
| 65 | Public documentation refresh | Doc | README rewritten. 7 docs updated/created: architecture, getting-started, MODELS.md, NPU_SUPPORT.md, protocol-spec v2, MIGRATION, threat-model. |
| 66 | Website refresh | Web | Dark mode landing page, models.html (16 models in 3 tabs), hardware.html (CPU/NPU matrix), benchmarks.html (v0.5.5 + Zen 5 data). |
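
The InferenceBackend abstraction in task 53 pairs an abstract base class with frozen dataclasses. The sketch below shows the general shape of that pattern. The class names come from the table, but every field, method name, and default value is an assumption, and `EchoBackend` is a hypothetical stand-in that starts no server.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelHandle:
    model_id: str  # hypothetical field
    port: int      # hypothetical field


@dataclass(frozen=True)
class GenerationParams:
    max_tokens: int = 256     # hypothetical default
    temperature: float = 0.7  # hypothetical default


@dataclass(frozen=True)
class BackendHealth:
    ok: bool
    detail: str = ""


class InferenceBackend(ABC):
    """Interface every pluggable backend would implement."""

    @abstractmethod
    def load(self, model_id: str) -> ModelHandle: ...

    @abstractmethod
    def generate(self, handle: ModelHandle, prompt: str, params: GenerationParams) -> str: ...

    @abstractmethod
    def health(self) -> BackendHealth: ...


class EchoBackend(InferenceBackend):
    """Stand-in backend that echoes the prompt instead of running a model."""

    def __init__(self, port: int):
        self.port = port

    def load(self, model_id: str) -> ModelHandle:
        return ModelHandle(model_id=model_id, port=self.port)

    def generate(self, handle: ModelHandle, prompt: str, params: GenerationParams) -> str:
        return f"[{handle.model_id}] {prompt}"

    def health(self) -> BackendHealth:
        return BackendHealth(ok=True, detail=f"echo on port {self.port}")
```

Freezing the dataclasses means a handle or parameter set cannot be mutated after it is passed between the router and a backend, which keeps the two backends interchangeable behind the same interface.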
Extended model catalog, MoE inference, browser agent, tier benchmarks, contribution scoring — 6 tasks
| # | Task | Type | Detail |
|---|------|------|--------|
| 67 | Extended catalog: Qwen 3.5 (0.8B/9B), Phi-4-mini, SmolVLM, Qwen2.5-Coder-7B, GLM-4-9B, OLMo 2 7B | Code | Complete the Quality and Specialist tiers with P3/P4 models. |
| 68 | Gemma 4 26B-A4B MoE support | Code | MoE inference path in llama.cpp backend. ~4B active params per forward pass from 25B total. |
| 69 | Browser agent (ARIA Browse) | Code | Headless browser driven by Tier 2/3 models via Tool Framework. Web search, data extraction. |
| 70 | Tier-aware benchmarks | R&D | Extend v0.5.5 benchmark harness to all 3 tiers with reproducible methodology. |
| 71 | Contribution score tier weighting | Code | Nodes earn score in proportion to the tier demand they serve. |
| 72 | Desktop chat tier polish | Code | Tier indicator in chat header, one-click tier override per message. |
Stable production network, reputation system, hardened infrastructure — 6 tasks
| # | Task | Type | Detail |
|---|------|------|--------|
| 61 | Distributed KV-Cache across P2P network | Code | Share KV-cache across nodes. Collaborative context extension. |
| 62 | Context Relay: map-reduce for long documents | Code | Split long documents across nodes. Map-reduce style processing. |
| 63 | Smart Inference Routing (local vs P2P vs cloud fallback) | Code | Intelligent routing: try local first, then P2P network, then cloud API fallback. |
| 64 | Universal API Bridge (OpenAI-compatible from any app) | Code | Drop-in replacement for OpenAI API. Works with any OpenAI-compatible application. |
| 65 | Production security hardening | Infra | P2P network audit. Bug bounty program. Third-party security review. |
| 66 | Production launch | Com | Stable production network for public use. Launch campaign: HN, Reddit, Twitter, Product Hunt. |
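
The "local first, then P2P, then cloud" order of Smart Inference Routing can be expressed as an ordered fallback chain. The following is a minimal sketch assuming each backend is a callable that returns text, returns `None`, or raises on failure. The `route` function and the backend names used with it are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# (backend name, inference callable)
Backend = Tuple[str, Callable[[str], Optional[str]]]


def route(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order and return (backend_name, result) from the first success."""
    errors = []
    for name, infer in backends:
        try:
            result = infer(prompt)
        except Exception as exc:  # a failing backend must not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        errors.append(f"{name}: no result")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Passing the chain as `[("local", ...), ("p2p", ...), ("cloud", ...)]` keeps the priority explicit, and returning the backend name lets the caller show where an answer came from.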
Mobile, mesh networking, automation, gamification, MoE+1-bit research
- Mobile companion app — iOS/Android with on-device inference (1B–3B models)
- ARIA Mesh Mode — Offline local inference via Bluetooth/WiFi Direct
- Live Network Globe — 3D visualization of the global ARIA network
- Automation Studio — No-code AI workflows and task automation
- Team/Family sharing — LAN-based node sharing for groups
- Gamification & achievements — Recognize engagement and contribution milestones
- MoE + 1-bit research — 100B+ params, ~1 GB memory, frontier-class on laptops