Pre-release · v0.1 closed alpha

Your AI, on your hardware.

Selfweave is a privacy-first personal AI. Local inference, a private memory of your work, and — when you want it — distributed compute: trusted LAN devices first, then a cooperative WAN you help build by contributing. Weave yourself into the network.

No telemetry · Encrypted at rest · Source-available
Three pillars

Personal AI, built for sovereignty.

Selfweave separates what must stay private from what can be shared. You decide where the line sits — and how much of yourself you weave into the network.

01

On-device personal AI

Local LLM inference via llama.cpp. Private indexing of your code, documents, and photos. A personal adapter trained from your patterns, never leaving the device.
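
For the curious, a minimal sketch of what the local path looks like through the llama-cpp-python bindings; the model path, context size, and parameters here are illustrative examples, not Selfweave's actual configuration:

```python
# Minimal local-inference sketch via llama-cpp-python (illustrative;
# model path and parameters are examples, not Selfweave defaults).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.1-8b-instruct-q4_k_m.gguf",  # any GGUF quant
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to the local GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise this repo's build steps."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```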

02

Cooperative compute

Run larger models by pooling VRAM across devices. Selfweave's roadmap extends this from LAN peers you choose to a cooperative pool of strangers — governed by attestation tiering, Sybil defense, and contribution-based incentives. The trust-and-economics layer is the differentiator, not the splitting itself.

03

Federated intelligence

An optional shared behavioural layer, trained via SPARTA + DiLoCo. Your gradients contribute to a collective intelligence without ever exposing your raw data.

Architecture

A hard boundary between you and the network.

Selfweave is built in four layers. Personal data lives above the encryption boundary and never crosses it in the clear. Distributed operations happen below it — with cryptographic guarantees, not promises.

  • Event-sourced state — every change is reversible, auditable, portable.
  • AES-256-GCM at rest, with keys derived from your OS-managed key material.
  • Backend subprocesses sandboxed with Job Objects + AppContainer on Windows.
  • Cryptographic erasure — destroy the key, destroy the data. Both encryption properties are sketched below.
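
A minimal sketch of that at-rest pattern using Python's `cryptography` package; the key-derivation labels and the stand-in for OS-managed key material are hypothetical, not Selfweave's actual scheme:

```python
# Sketch: AES-256-GCM at rest with a derived key, and cryptographic
# erasure by destroying that key. The derivation inputs are placeholders
# for OS-managed key material (e.g. DPAPI/keyring output).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

os_key_material = os.urandom(32)  # stand-in for the OS-managed secret

key = HKDF(
    algorithm=hashes.SHA256(), length=32,
    salt=b"selfweave-corpus-v1", info=b"at-rest",  # hypothetical labels
).derive(os_key_material)

aes = AESGCM(key)
nonce = os.urandom(12)
ciphertext = aes.encrypt(nonce, b"event-log entry", associated_data=None)

# Cryptographic erasure: once the key (and the material it derives from)
# is destroyed, the event log is irrecoverable ciphertext.
del key, aes
```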
  • L4 · Your data: corpus, indices, LoRA adapter
  • L3 · Local runtime: llama.cpp, Tauri, Python sidecar
— encryption boundary —
  • L2 · Network primitives: pairing, split inference, relay
  • L1 · Federation: SPARTA gossip, DiLoCo rounds
Behavior Intelligence Layer

A shared brain that never sees you.

The BIL is a small LoRA adapter — about 8–50 MB — that encodes aggregate human behaviour patterns: when people focus, how communication tone shifts across contexts, what people typically need at different times of day. It sits between the frozen base model and your private personal adapter. You get the benefit of the collective without the collective ever seeing your data.

The technique is borrowed, not invented. OpenFedLLM showed that federated LoRA exchanging 0.06% of parameters per round can outperform a frontier model within its domain (Ye et al., 2024). Selfweave uses 0.1% via SPARTA sparse aggregation, with DiLoCo-style outer-loop synchronisation from DeepMind's distributed-training work and noise-budgeting from (ε, δ)-differential privacy (Dwork & Roth, 2014). Status: roadmap — cryptographic primitives ship in v0.2; federation activates when the peer-defense stack lands in v0.3.

  • SPARTA sparsity. Roughly 4,200 noised floats (~17 KB) leave your device per round. The full adapter is non-reconstructible from any subset of fragments.
  • (ε, δ)-differential privacy. ε = 1.0, δ = 1/(2n). The same standard as Apple iOS analytics and academic federated-healthcare research.
  • Defense in depth. Gradient clipping bounds any single example's influence; Gaussian noise applied on-device before transmission; a moments accountant tracks cumulative privacy budget; central DP added at the coordinator. A sketch of this per-round pipeline follows the list.
  • Open parameters. ε, δ, σ, clipping norm and aggregation code are all source-available. Independent researchers can verify the math — not take our word for it.
  • What an attacker can't do with the shared weights: determine whether you participated, recover any training example, identify any personal detail, or attribute behaviour to an individual.
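
A minimal numpy sketch of that per-round pipeline: clip, noise on-device, transmit only a sparse fragment. The constants are illustrative; the real clipping norm and noise multiplier come from the published (ε, δ) budget:

```python
# Sketch of the defense pipeline above: clip each update, add Gaussian
# noise on-device, send only a sparse fragment. Constants are
# illustrative, not Selfweave's published budget.
import numpy as np

CLIP_NORM = 1.0       # bounds any single example's influence
SIGMA = 1.0           # noise multiplier chosen from the (ε, δ) target
FRAGMENT_SIZE = 4200  # ~17 KB of float32 per round, per the text

def private_fragment(update: np.ndarray, rng: np.random.Generator):
    # 1. Gradient clipping: rescale so ||update|| <= CLIP_NORM.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, CLIP_NORM / (norm + 1e-12))
    # 2. Gaussian noise, applied on-device before anything leaves.
    noised = clipped + rng.normal(0.0, SIGMA * CLIP_NORM, size=update.shape)
    # 3. SPARTA-style sparsity: only a small index subset is transmitted;
    #    the full adapter cannot be rebuilt from any one fragment.
    idx = rng.choice(update.size, size=FRAGMENT_SIZE, replace=False)
    return idx, noised.flat[idx].astype(np.float32)

rng = np.random.default_rng(0)
idx, frag = private_fragment(rng.normal(size=2_000_000), rng)
print(frag.nbytes)  # 16800 bytes, i.e. ~17 KB on the wire
```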
  • Top · Personal adapter: private · ~8–50 MB · never shared
  • Mid · Behavior Intelligence Layer: federated · ~8–50 MB · DP-noised · SPARTA
  • Base · Foundation model: frozen · 2–4 GB INT4 · identical for everyone
— stacked at inference time —
W + (α/r)·B_shared·A_shared + (α/r)·B_personal·A_personal
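
A toy-sized numpy sketch of that stacking formula: one frozen base weight plus the shared and personal LoRA deltas, applied at inference time. Dimensions and α/r are illustrative:

```python
# Toy numpy sketch of the stacking formula above. Shapes and the
# alpha/r scale are illustrative, not Selfweave's real dimensions.
import numpy as np

d, r, alpha = 64, 8, 16.0
rng = np.random.default_rng(1)

W = rng.normal(size=(d, d))  # frozen base weight, identical for everyone
A_shared, B_shared = rng.normal(size=(r, d)), rng.normal(size=(d, r))
A_pers, B_pers = rng.normal(size=(r, d)), rng.normal(size=(d, r))

scale = alpha / r
W_eff = W + scale * (B_shared @ A_shared) + scale * (B_pers @ A_pers)

x = rng.normal(size=d)
y = W_eff @ x  # one linear layer with both adapters stacked
```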
Parity

From one device to a network. Honest about today, ambitious about tomorrow.

Selfweave's capacity story, made concrete: how per-user experience evolves as the network grows, and what one peer's contribution covers. Privacy and zero marginal cost hold at every scale.

Per-user experience

| Config | Model | Decode tok/s | Context | Cost / Mtok | Privacy | Phase |
|---|---|---|---|---|---|---|
| 1-node RTX 3080 Ti | Llama 3.1 8B Q4 | 89.7 | 128k | $0 | Local | shipped |
| 1-node + speculative | 8B Q4 + 1B draft | ~160–225 (proj.) | 128k | $0 | Local | v0.1.x |
| 3-node LAN | 14B Q4 split | ~12 (proj.) | 32k | $0 | LAN-trusted | v0.2.5 |
| 10-node cooperative pool | 70B Q4 sharded | ~10–30 (proj.) | 128k | $0 + contribute | Attestation-tiered | v0.3+ |
| 100-node mature pool | DeepSeek V3 (671B / 37B-MoE) | similar per-user | 128k+ | $0 + contribute | Attestation-tiered | v0.3+ mature |
| 1k-node mature pool | 405B+ dense | usable, large aggregate | 128k+ | $0 + contribute | Attestation-tiered | v0.3+ mature |
| ChatGPT (GPT-5) | proprietary | 135.8 | 128k | $1.25 / $10.00 | Provider-side | now |
| Claude Sonnet 4.6 | proprietary | 45.3 | 1M | $3.75 / $15.00 | Provider-side | now |
| DeepSeek V3 (hosted) | 671B / 37B-MoE | ~34 (provider median) | 128k | $0.40 / $0.89 | Provider-side | now |

Agentic-loop performance — stacked, not chosen between

Agentic loops compound WAN RTT across both LLM steps and tool I/O, so naïve split inference suits them poorly; the five techniques below stack on that baseline rather than substitute for one another. Numbers are per-user tok/s on a 70B-class model over a Petals-class swarm (tech-spec §4.8). A sketch of the layer-2 technique follows the table.

| Stack level | Technique | Effective per-user tok/s | Status |
|---|---|---|---|
| 0 — baseline | Naïve split inference | ~5 | — |
| 1 — affinity | KV-cache session affinity | ~8–10 | design (RT-008) |
| 2 — token spec | Token-level speculative decoding | ~15–20 | wire-up shipped |
| 3 — step spec | Step-level lookahead reasoning | ~30–50 | design |
| 4 — batching | Cross-loop batching at swarm peers | same per-user, 5–10× per peer | v0.2.5 |
| 5 — overlap | Tool-I/O ⇄ next-step prefill overlap | hides tool latency | design |
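
A sketch of layer 2 in its simplest (greedy) form: a cheap draft model proposes k tokens, the large target model verifies them in one batched forward pass, and the output is identical to running the target alone. The model callables are stand-ins, not Selfweave APIs:

```python
# Greedy token-level speculative decoding. target_logits(seq) is assumed
# to return, for each proposal position i, the target's next-token logits
# given ctx + proposal[:i], in one batched forward pass instead of k
# sequential ones.
from typing import Callable, List

def speculative_step(
    ctx: List[int],
    draft_next: Callable[[List[int]], int],
    target_logits: Callable[[List[int]], List[List[float]]],
    k: int = 4,
) -> List[int]:
    # The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft_next(ctx + proposal))
    # The expensive target model scores all k positions at once.
    logits = target_logits(ctx + proposal)
    accepted: List[int] = []
    for i, tok in enumerate(proposal):
        best = max(range(len(logits[i])), key=logits[i].__getitem__)
        if best == tok:
            accepted.append(tok)   # draft agreed with target: a near-free token
        else:
            accepted.append(best)  # first mismatch: take the target's token
            break                  # and stop; output matches target-only decoding
    return accepted                # 1..k tokens per target forward pass
```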

Network capacity at scale — by parameter tier and phase

Capacity grows along three axes: peer count; model hidden-state size (smaller is better — egress, not compute, is the bottleneck per tech-spec §4.6); and the RT-009 bandwidth-efficiency lever stack (INT4 activation quantisation, adaptive compression at slow links, cross-loop batching at swarm peers, tier stratification). Cells show users servable at 5% concurrency with the §4.7 30–60% jitter discount baked in — a contributing peer still covers more user-pipeline work than its own household consumes. The 405B+ dense row is honest about being structurally bandwidth-heavy: the same swarm serves an order of magnitude more users on MoE-class models. Numbers are projections, not measurements; a back-of-envelope sketch of how the axes combine follows the table.

| Model class (hidden state) | Phase 3 launch (v0.3 · ~100 peers · no levers) | Phase 3 + RT-009 initial (v0.3.x · ~1k peers · ~5× stack) | Phase 4 mature, full RT-009 (v0.4+ · ~10k peers · ~10× stack) |
|---|---|---|---|
| 8B (8 KB) | ~120–240 | ~6k–12k | ~120k–240k |
| 14B (10 KB) | ~95–190 | ~4.7k–9.5k | ~95k–190k |
| 70B (16 KB, baseline) | ~60–120 | ~3k–6k | ~60k–120k |
| DeepSeek V3 (14 KB, 671B / 37B-MoE) | ~70–140 | ~3.5k–7k | ~70k–140k |
| 405B+ dense (32 KB) | ~30–60 | ~1.5k–3k | ~30k–60k |
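
The cells scale linearly with peer count and the lever multiplier, and inversely with hidden-state size, so the table can be reproduced from its 70B baseline row. The sketch below is a reading aid under that linear assumption, not the §4.6 capacity model:

```python
# Reproduces the table's ranges from the 70B baseline row under the
# stated linear scaling. A reading aid, not the tech-spec §4.6 model.
BASELINE = (60, 120)    # 70B row: users servable at 100 peers, no levers
BASE_HIDDEN_KB = 16.0   # 70B hidden state per token

def users_servable(hidden_kb: float, peers: int, lever: float):
    scale = (BASE_HIDDEN_KB / hidden_kb) * (peers / 100) * lever
    return (round(BASELINE[0] * scale), round(BASELINE[1] * scale))

print(users_servable(8.0, 100, 1))       # 8B at launch      -> (120, 240)
print(users_servable(32.0, 10_000, 10))  # 405B+ dense, mature -> (30000, 60000)
```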

Sources:
  • 1-node decode from a community llama.cpp benchmark on RTX 3080 Ti running Llama 3.1 8B Q4_K_M (localscore.ai). Speculative-decoding 1.8–2.5× from inference-speedups design §3.1 (projected).
  • Multi-node figures derived from _config/tech-spec.md §4.6–4.8: per-peer egress ceiling, swarm capacity model, agentic-loop workaround stack. Effective tok/s discounts theoretical peak by 30–60% per the §4.7 jitter / churn / tail-latency caveat — Petals in production runs at ~6 tok/s/user, so this discount is empirical, not pessimistic.
  • Phase-3 lever multipliers (~5× initial, ~10× full stack) reflect cross-loop batching at swarm peers (§4.8 layer 4: same per-user latency, ~5–10× per-peer throughput) compounded with INT4 activation quantisation (~2× egress per token-pass) and AdaTopK adaptive compression at slow links — all tracked under RT-009; benchmarks pending, ranges are conservative midpoints.
  • Cloud numbers from artificialanalysis.ai and vendor pricing pages, May 2026. Projections labelled (proj.) are design targets, not measurements.

Headline feature

A coding assistant that knows your project — not your future.

Selfweave exposes an OpenAI-compatible API and a local editor proxy with project-scoped RAG. Your codebase is indexed on-device; completions, fill-in-the-middle, and chat all run against the same private context.

No cloud round-trip. No prompts mined for training. No telemetry of what you type.
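
Because the proxy speaks the OpenAI wire format, any compatible client can point at it. A minimal sketch; the base URL, port, and model name are placeholders, not Selfweave defaults:

```python
# Any OpenAI-compatible client works against the local proxy. The base
# URL and model name here are placeholders; check your build's settings.
# The API key is unused locally but required by the client library.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="local")

resp = client.chat.completions.create(
    model="local",  # routed to the on-device model
    messages=[{"role": "user", "content": "Explain this function's edge cases."}],
)
print(resp.choices[0].message.content)
```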

  • VS Code
  • Continue
  • Cline
  • Zed
  • Neovim
  • JetBrains
Features

Quietly capable. Explicitly honest.

Selfweave is in active development. Shipped features work today on Windows; roadmap items are being built in the open.

Coding assistance Shipped

Completions, fill-in-the-middle, embeddings. Project-scoped RAG with a debounced file watcher. OpenAI-compatible, editor-agnostic.

Research mode Shipped

Query your own document library with citation-grounded responses. Widened search parameters and a structured five-element prompt.

Semantic search Shipped

Dual FAISS indices — CLIP for photos, MiniLM for text. Search your life in natural language, offline.
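
A minimal sketch of the text half of that design: MiniLM embeddings in a FAISS inner-product index, queried in natural language. The corpus is a stand-in; only the model name is the standard MiniLM checkpoint:

```python
# MiniLM embeddings in a FAISS inner-product index: with normalized
# vectors, inner product equals cosine similarity. Corpus is a stand-in.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["invoice from March", "trip photos readme", "thesis draft notes"]

emb = model.encode(docs, normalize_embeddings=True)  # unit vectors
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

q = model.encode(["tax paperwork"], normalize_embeddings=True)
scores, ids = index.search(q, 2)  # top-2 nearest documents
print([docs[i] for i in ids[0]])
```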

Anti-sycophancy Shipped

Honesty is measurable. Lexical overlap, agreement markers, and a benchmark harness gate every release. No yes-men.
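
A toy sketch of the two cheap signals named above, agreement-marker rate and lexical overlap; the marker list and scoring are illustrative, not the shipped harness:

```python
# Two cheap sycophancy signals: agreement-marker count and lexical
# overlap between the user's claim and the reply. Illustrative only.
AGREEMENT_MARKERS = {"you're right", "great point", "absolutely", "i agree"}

def sycophancy_signals(user_claim: str, reply: str) -> dict:
    reply_l = reply.lower()
    markers = sum(m in reply_l for m in AGREEMENT_MARKERS)
    claim_words = set(user_claim.lower().split())
    overlap = len(claim_words & set(reply_l.split())) / max(len(claim_words), 1)
    return {"agreement_markers": markers, "lexical_overlap": overlap}

print(sycophancy_signals(
    "My code is perfect and the test suite is wrong.",
    "You're right, the tests are probably wrong.",
))
```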

Backend sandboxing Shipped

Three profiles — Strict, Basic, Permissive. Windows Job Objects + AppContainer + restricted tokens. Consent modal before elevation.

Function calling Shipped

Tag-based tool use with a registry + executor. Extend the assistant with local tools — no remote webhook required.
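
A minimal sketch of the registry + executor pattern: local functions register by name, and the executor dispatches on a tag parsed from model output. The tag syntax here is illustrative, not Selfweave's wire format:

```python
# Registry + executor for tag-based tool use. The <tool .../> syntax is
# an illustrative stand-in for the actual wire format.
import json, re
from typing import Callable, Dict

REGISTRY: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    REGISTRY[fn.__name__] = fn  # register under the function's own name
    return fn

@tool
def word_count(text: str) -> int:
    return len(text.split())

TAG = re.compile(r'<tool name="(\w+)">(.*?)</tool>', re.S)

def execute(model_output: str) -> str:
    m = TAG.search(model_output)
    if not m:
        return model_output  # plain text, no tool call
    name, args = m.group(1), json.loads(m.group(2))
    return str(REGISTRY[name](**args))

print(execute('<tool name="word_count">{"text": "hello private world"}</tool>'))  # 3
```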

LAN split inference Roadmap

Pool VRAM across household devices via QR-paired peers. Pipeline-parallel split over libp2p Noise transport — the same primitive that extends to the cooperative pool when the trust layer ships.

Cooperative compute Roadmap

Contribute idle GPU to the network, earn priority routing. Contribution-based incentive, no tokens, no speculation.

Federated BIL Roadmap

Shared behavioural intelligence trained across users via SPARTA + DiLoCo. You get the benefit of the collective without sharing the data.

Privacy, by architecture

What stays on your device stays on your device.

These are not promises — they are properties of the system.

  • Zero telemetry. No analytics SDK, no crash beacons, no anonymised usage pings. If it's not in the event log, it doesn't exist.
  • No cloud model calls. Every inference runs on your hardware. No OpenAI, Anthropic, or Google in the path. LAN peer pairing is opt-in and on the roadmap.
  • Right-to-erasure by design. Every corpus encrypts with a key you can destroy, turning the event log into ciphertext garbage. The "Delete all my data" UX ships before first public release (GDPR Art. 17).
  • No microphone, no camera. Selfweave rejects ambient sensors by policy. Your screen and your files are enough context; your bedroom isn't.
  • Source-available core. The library behind Selfweave is inspectable. Audit the code paths, verify the sandbox, run your own build.
  • Post-quantum ready. At-rest encryption is already PQ-safe. Asymmetric primitives will migrate to hybrid ML-KEM / ML-DSA as the network layer activates.
Research foundations

Built on published work, not wishful thinking.

Selfweave is an integration of peer-reviewed research and proven open-source projects, not a stack of hopes. Each capability below traces back to published work — verifiable, attributable, replicable.

Federated LoRA

OpenFedLLM (Ye et al., 2024) — federated LoRA exchanging 0.06% of parameters per round outperformed GPT-4 in financial-domain benchmarks. Selfweave uses 0.1% via SPARTA sparse aggregation.

Distributed training

DiLoCo (DeepMind) and INTELLECT-1 / INTELLECT-2 (Prime Intellect, arXiv 2505.07291, 2025) — 32B-parameter distributed RL across continents on consumer-class infrastructure. Validates Selfweave's post-v1.0 pre-training direction.

Differential privacy

(ε, δ)-DP (Dwork & Roth, 2014). ε = 1.0 matches Apple iOS analytics and academic federated-healthcare research — strong enough to be a meaningful guarantee, loose enough for the BIL to actually learn.

Speculative decoding

EAGLE-3 (Li et al., arXiv 2503.01840, 2025) and Speculative Streaming (Bhendawade et al., arXiv 2402.11131, 2024). Selfweave wires llama.cpp's --model-draft mechanism for lossless decode speedup.

Step-level lookahead

Lookahead Reasoning (ICLR 2026) — orthogonal to token-level speculation; both stack. Aimed at agentic loops where reasoning steps, not tokens, are the speculation unit.

Split inference at scale

Petals (Borzunov et al., arXiv 2312.08361, 2023) and Parallax (Gradient Network, arXiv 2509.26182, 2025). Pipeline-parallel inference over commodity links — Selfweave inherits the pattern, adds attestation tiering and Sybil defense.

Behavioural ontology

HBCP / BCIO (Mac Aonghusa & Michie, 2020) — the Human Behaviour-Change Project's ontology. Selfweave's 30–50-concept behavioural map mirrors its role: a unified pipeline-friendly representation of diverse natural-language behaviours.

Behaviour intelligence

Artificial Behaviour Intelligence (Jo et al., arXiv 2505.03315, 2025) — formalises behaviour understanding in cultural and situational context, with calibrated uncertainty. Direct inspiration for the BIL's context modelling.

Sycophancy gating

ELEPHANT benchmark (Cheng et al., arXiv 2505.13995, 2025) — measures social sycophancy in LLM responses. Selfweave runs a derivative harness before every adapter release and base-model upgrade.

Pricing

The local AI is free. The convenience is optional.

Selfweave's core features run forever on your own hardware at no cost. Paid tiers cover managed conveniences we can't run on your laptop.

Free
$0 / forever

The complete local AI. Your hardware, your rules.

  • Coding assistance + editor proxy
  • Local RAG + research mode
  • LAN pairing across your devices (Phase 2)
  • Bring-your-own web search key (Phase 2)
  • Contribute compute, earn priority (Phase 3)
Start free
Plus
$8 / month

End-to-end sync + managed conveniences. Privacy intact.

  • Everything in Free
  • 5 GB E2E-encrypted sync (corpus, LoRA, personas)
  • 500 managed web searches / mo
  • Curated prompt & persona library
  • Priority support
Join waitlist
Pro
$12 / month

Everything in Plus, at professional scale.

  • Everything in Plus
  • 50 GB sync + 2,000 searches / mo
  • WAN compute priority — no contribution required (Phase 3)
  • Persona Studio + experimental features
  • Early access to new protocols
Join waitlist
Honest roadmap

Built in the open, one phase at a time.

Selfweave is a long-term solo project. Progress is steady, not flashy. Here's what's real and what's next.

Phase 1 · v0.1

Shipping now
  • Local LLM inference via llama.cpp
  • Coding assistance + editor proxy
  • Project-scoped RAG
  • Photo + document indexing
  • Research mode with citations
  • Anti-sycophancy benchmark

Phase 2 · v0.2

In progress
  • Backend sandboxing (Windows shipped)
  • Backend binary hash-pinning (shipped)
  • CUDA-compatible Strict sandbox (shipped)
  • LAN pairing with QR codes
  • TLS certificate pinning
  • Cross-device E2E sync (Plus)

Phase 3 · v0.3+

On the horizon
  • Cooperative compute pool (attestation + Sybil defense)
  • WAN distributed inference
  • Federated BIL via SPARTA + DiLoCo
  • Peer-defense stack (Byzantine, integrity audit)
  • Post-quantum handshakes
  • Personal-LoRA pipeline over your corpus
Early access

Own your intelligence.

Selfweave is in closed pre-release. Join the waitlist for a build, a changelog entry per week, and a direct line to the developer. Then, when you're ready, weave yourself into the network.