Building AI that thinks deeply and acts responsibly.
Technical Report: On-Device Agile Reasoning via WebGPU-Accelerated Hebbian Fast Weights
N.O.V.A. (Native Object Vector Architecture) is a browser-resident, fast-weight dialogue stack executing entirely on the WebGPU compute substrate. Unlike static Transformer models, N.O.V.A. performs token-level Hebbian adaptation in real time, using a symplectic manifold for context retention and "Origami-style" symbolic scoping for structural logic. This architecture eliminates the need for server-side inference and offers a fully transparent, inspection-ready reference implementation for client-side agile reasoning. This report details the system's mathematical formulation, memory dynamics, and deployment methodology.
N.O.V.A. is engineered to provide a reproducible, modifiable reference for edge-based learning, moving beyond the paradigm of "black-box" turnkey assistants.
- Device.js: A low-level wrapper around the WebGPU Adapter, Device, and Queue.
- NovaTensor: Manages raw GPU buffers with specialized initialization kernels (Xavier, Residual, Unit-Spinor) and implements safe memory lifecycles (read/dispose), ensuring all mathematical operations remain strictly in-browser with zero server dependency.
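The initialization kernels can be illustrated on the CPU. Below is a minimal sketch of Xavier (Glorot) uniform initialization; `xavierInit` is a hypothetical helper name, and in the actual stack this math would run as a WGSL compute shader filling a GPU buffer:

```javascript
// Hypothetical CPU-side reference for NovaTensor's Xavier init kernel.
// The real kernel runs on the GPU; this only illustrates the math:
// values drawn uniformly from [-limit, limit], limit = sqrt(6 / (fanIn + fanOut)).
function xavierInit(fanIn, fanOut) {
  const limit = Math.sqrt(6 / (fanIn + fanOut));
  const data = new Float32Array(fanIn * fanOut);
  for (let i = 0; i < data.length; i++) {
    data[i] = (Math.random() * 2 - 1) * limit; // uniform in [-limit, limit)
  }
  return data;
}
```

The Xavier limit keeps the variance of activations roughly constant across layers, which matters here because the fast weights are updated continuously rather than trained offline.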
The core reasoning unit iterates per token, forming a "Thinking Step" loop:
- Projection: Input embeddings pass through RMSNorm and project into rotational components $(W_r, W_v, W_g)$.
- Symplectic Flow: Updates a complex-valued manifold using energy-preserving rotations, preventing gradient decay.
- Gated FFN: A Swish-gated Feed-Forward Network ($W_{up1}, W_{up2}, W_{down}$) processes the manifold state.
- Active Inference: The system predicts its own next state via $W_{predict}$ and adjusts the manifold based on prediction error.
- Hebbian Consolidation: Fast weights capture transient token-to-token associations dynamically, governed by variable decay and learning rates.
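The Hebbian consolidation step can be sketched on the CPU with plain arrays. This is an illustrative reference for the update rule $M \leftarrow \rho M + \eta\,(h \otimes e)$, not the project's API; on-device it runs as a WebGPU compute pass:

```javascript
// Minimal CPU sketch of the Hebbian fast-weight update:
//   M <- rho * M + eta * (h ⊗ e)
// M is a flat Float32Array of shape [h.length, e.length] (row-major).
// rho is the decay rate, eta the learning rate; names are illustrative.
function hebbianUpdate(M, h, e, rho, eta) {
  for (let i = 0; i < h.length; i++) {
    for (let j = 0; j < e.length; j++) {
      const idx = i * e.length + j;
      // Decay the old association, then bind hidden state h to embedding e.
      M[idx] = rho * M[idx] + eta * h[i] * e[j];
    }
  }
  return M;
}
```

Because the decay $\rho < 1$ is applied every token, associations that are not reinforced fade automatically, which is what makes the fast weights "transient".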
- HormoneSystem: Tiles affective scalar values across the bias vector, allowing for "emotional" modulation of generation probability.
- OrigamiMemory: A symbolic stack that pushes/pops hidden states upon detecting structural delimiters (e.g., `{`, `}`), enabling robust handling of nested logic in code generation.
- Entropic Compression: A byte-level BPE compressor learns a compact ~16k vocabulary "gene" table from the corpus, optimizing for high-density concepts.
- Sampling Strategy: Inference is stabilized via `src/main.js` using n-gram anchoring (bigram/trigram), repetition penalties, anchor bonuses, and mode-specific tags (Mode: chat/task/code).
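The delimiter-driven scoping of OrigamiMemory can be sketched as a plain stack. The class and method names below are illustrative (the real implementation stores GPU-resident hidden states, represented here by arbitrary values):

```javascript
// Illustrative sketch of OrigamiMemory's push/pop behavior.
// On an opening delimiter the current hidden state is saved; on the
// matching close, the enclosing scope's state is restored.
class OrigamiStack {
  constructor() {
    this.frames = [];
  }
  observe(token, currentState) {
    if (token === "{") {
      this.frames.push(currentState); // enter a nested scope
      return currentState;
    }
    if (token === "}" && this.frames.length > 0) {
      return this.frames.pop(); // restore the enclosing scope's state
    }
    return currentState; // ordinary token: state flows through unchanged
  }
  get depth() {
    return this.frames.length;
  }
}
```

This is what lets the model "remember" the context it entered a block with, even after many tokens of nested content.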
Let the tokenizer map a text stream into a sequence of tokens $x_1, \dots, x_T$, with $e_t$ denoting the embedding of token $x_t$.
The fast-weight cell computes the rotational, value, and gate projections by applying $(W_r, W_v, W_g)$ to the RMS-normalized embedding.
State retention is governed by a complex rotation applied to the manifold state; because rotation preserves the state's norm, the symplectic flow retains context without gradient decay.
The manifold output is processed via a Swish-gated block:
$$
u_t = \mathrm{swish}(W_{up1} m_t) \odot (W_{up2} m_t),\quad c_t = W_{down} u_t
$$
$$
h_t = \mathrm{RMSNorm}(m_t + c_t)
$$
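The two equations above can be checked numerically with a small CPU reference. The helpers `matVec`, `rmsNorm`, and `gatedFFN` are illustrative names under the stated equations, not the project's API:

```javascript
// CPU reference for the Swish-gated block:
//   u = swish(Wup1 m) ⊙ (Wup2 m),  c = Wdown u,  h = RMSNorm(m + c)
const swish = (x) => x / (1 + Math.exp(-x)); // x * sigmoid(x)

// W is an array of rows; returns W @ v.
function matVec(W, v) {
  return W.map((row) => row.reduce((s, w, j) => s + w * v[j], 0));
}

function rmsNorm(v, eps = 1e-6) {
  const rms = Math.sqrt(v.reduce((s, x) => s + x * x, 0) / v.length + eps);
  return v.map((x) => x / rms);
}

function gatedFFN(m, Wup1, Wup2, Wdown) {
  const a = matVec(Wup1, m);
  const b = matVec(Wup2, m);
  const u = a.map((x, i) => swish(x) * b[i]); // Swish gate, elementwise product
  const c = matVec(Wdown, u);
  return rmsNorm(m.map((x, i) => x + c[i])); // residual add, then normalize
}
```

Note the residual connection: the block output is normalized $m_t + c_t$, so the FFN only has to learn a correction to the manifold state, not replace it.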
The system performs Active Inference, adjusting the manifold towards the embeddings predicted via $W_{predict}$, with the prediction error serving as the correction signal.
The memory manifold and projections are updated in real time during the forward pass:
$$
M \leftarrow \rho M + \eta \, (h_t \otimes e_{t+1})
$$
where $\rho$ is the decay factor, $\eta$ the Hebbian learning rate, and $h_t \otimes e_{t+1}$ the outer product binding the current hidden state to the next-token embedding.
Final logits are computed by projecting back to the vocabulary space, modulated by n-gram priors and bias vectors:

$$
\ell_t = E^\top h_t + b_{\text{ngram}} + b_{\text{bias}}
$$
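As a sketch, the logit assembly is a matrix-vector product through the (tied) embedding matrix plus the two additive terms. The function name and argument layout below are assumptions for illustration:

```javascript
// Sketch of final logit assembly: l = E^T h + b_ngram + b_bias.
// E is vocab x dModel (array of rows), h is the dModel-sized hidden state;
// ngramBonus and bias are vocab-sized vectors. Names are illustrative.
function computeLogits(E, h, ngramBonus, bias) {
  return E.map(
    (row, v) => row.reduce((s, w, j) => s + w * h[j], 0) + ngramBonus[v] + bias[v]
  );
}
```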
- Hardware: WebGPU-capable GPU (Integrated or Discrete).
- Software: Recent Chromium-based browser (Chrome/Edge).
- Serve Static Files:
No backend logic is required. Serve the root directory via any HTTP server.
```sh
# Node.js
npx http-server .

# Python
python -m http.server 8000
```
- Initialize Runtime:
Navigate to `http://localhost:8000` and click "Pulse" to initialize the compute shaders.
- Training & Inference:
  - Auto-Load: If `nova.config.js` enables `snapshot.autoLoad`, the system restores `model.snapshot`.
  - In-Browser Training: Otherwise, `scripts/train_browser.js` executes token-by-token Hebbian updates on `data/training_data.txt`.
- Configuration: Adjust hyperparameters (`dModel`, `layers`, `learningRate`) in `nova.config.js`.
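For orientation, a config file with these knobs might look as follows. Only the field names mentioned in this report (`dModel`, `layers`, `learningRate`, `snapshot.autoLoad`) come from the text; the surrounding structure and the example values are assumptions:

```javascript
// Hypothetical shape of nova.config.js -- illustrative, not the shipped file.
export default {
  dModel: 256,         // hidden/manifold dimension
  layers: 4,           // number of fast-weight cells
  learningRate: 0.01,  // Hebbian eta for fast-weight updates
  snapshot: {
    autoLoad: true,    // restore model.snapshot on startup if present
  },
};
```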
- Experimental Nature: N.O.V.A. is a research prototype focusing on architectural novelty (Fast Weights/WebGPU) rather than scale.
- Data Sensitivity: The model is susceptible to dataset bias and assumes ASCII-formatted chat data.
- Persistence: Model state is transient unless explicitly exported via Snapshots.
- Safety: Minimal guardrails are implemented. Not suitable for safety-critical applications without further review.
Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). Copyright © 2026 Protoethik Co., Ltd. Use, modification, and networked deployment must comply with AGPL-3.0, including source disclosure requirements.
If you use this work in your research, please cite:
Wu, Y. (2026). Context Is Geometry (1.0). Zenodo. https://doi.org/10.5281/zenodo.18366793