Recursive Folding Machine (RFM): Latent Reasoning Without RLHF

A latent thinking architecture designed to reuse the "frozen" knowledge of established models while training a real-time Latent Space Controller.

Rather than relying on human preference data (RLHF), the RFM trains its policy engine through the process of reasoning itself, using a Policy Curriculum to learn how to sample and evolve its own latent space for deep thinking.

🚀 Core Architecture: Latent Thinking

The RFM manages a persistent Reasoning State (Z) that acts as an internal canvas; a minimal sketch of the recursive update follows the list below.

  • Weight Reuse: Reuses the dense world-model contained in frozen weights.
  • Latent Control: A learnable Thinking Adapter and Objective Router manage how thoughts evolve in the latent space.
  • Policy Curriculum: The model is trained on-the-fly to choose cognitive strategies (Explore, Converge, etc.) that lead to coherent, deep answers.
  • Zero RLHF: All "intelligence" is derived from signal-processing feedback and from scoring the model's own self-generated reasoning trajectories, with no human preference labels.
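
To make the recursion concrete, here is a minimal sketch of one latent update, assuming the Thinking Adapter is a small residual MLP over the backbone's hidden size. The class name matches the component list below, but the dimensions and update rule are illustrative, not the repo's exact implementation:

import torch
import torch.nn as nn

class ThinkingAdapter(nn.Module):
    # Illustrative stand-in: a small residual MLP that evolves the
    # Reasoning State (Z) while the backbone weights stay frozen.
    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z + self.net(z)  # one recursive "fold" of the latent state

adapter = ThinkingAdapter(hidden_size=2048).float()  # controller kept in FP32
z = torch.randn(1, 16, 2048)   # stand-in for Z, seeded from token embeddings in the RFM
for _ in range(4):             # a few recursive folds
    z = adapter(z)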

🧬 The "Thinking Cap" Mechanism

  1. Reality Anchoring: Input text embeddings remain constant during recursion, anchoring the model to the prompt.
  2. Latent Evolution (Z): The Reasoning State (Z) is initialized from valid token embeddings and evolved via a recursive Thinking Adapter.
  3. Sampling Strategy: The Objective Router decides the cognitive "goal" for each rollout (a toy sketch follows this list), sampling the latent space based on:
    • Explore ($Se$): Seeking novel interpretations.
    • Converge ($Si$): Stabilizing around a likely conclusion.
    • Diversify ($Ti$): Expanding the internal reasoning landscape.
    • Smooth ($Ni$): Optimizing for semantic flow and narrative consistency.
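
A toy sketch of how a router might pick one of these four goals and perturb Z accordingly. Only the four goal names come from the repo; the routing head and the strategy update rules below are invented for illustration:

import torch
import torch.nn as nn

STRATEGIES = ["explore", "converge", "diversify", "smooth"]  # Se, Si, Ti, Ni

class ObjectiveRouter(nn.Module):
    # Illustrative sketch: pools the Reasoning State and samples a
    # cognitive goal for the next rollout.
    def __init__(self, hidden_size: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, len(STRATEGIES))

    def forward(self, z: torch.Tensor) -> str:
        logits = self.head(z.mean(dim=1))                  # pool over positions
        idx = torch.multinomial(logits.softmax(-1), 1).item()
        return STRATEGIES[idx]

def apply_strategy(z: torch.Tensor, strategy: str) -> torch.Tensor:
    # Toy update rules; the prompt embeddings stay untouched (Reality
    # Anchoring) while only the Reasoning State (Z) evolves.
    mean = z.mean(dim=1, keepdim=True)
    if strategy == "explore":      # Se: inject noise to seek novel readings
        return z + 0.1 * torch.randn_like(z)
    if strategy == "converge":     # Si: contract toward a stable conclusion
        return 0.9 * z + 0.1 * mean
    if strategy == "diversify":    # Ti: push positions apart
        return z + 0.1 * (z - mean)
    return 0.5 * (z + torch.roll(z, 1, dims=1))  # Ni: smooth neighbouring states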

Key Features

  • Sequence-Level GRPO: A delayed-gratification training loop where $K$ full drafts (rollouts) are evaluated as complete logical units.
  • Forced Reasoning (Thinking Caps): EOS-masking ensures the model can't quit until it has achieved a minimum "Depth of Thought" (see the sketch after this list).
  • Complexity Rewards: Explicitly incentivizes vocabulary richness and narrative elaboration over clichés.
  • Hybrid Precision Stability: Critical controller parameters (Router, Adapter) are maintained in FP32 to ensure the reasoning engine remains numerically stable during deep recursion.
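
Two of these features are easy to show in miniature. Below is a hedged sketch of EOS-masking and of a group-normalized sequence reward in the spirit of GRPO; the function names and the 1e-8 epsilon are assumptions, not the repo's code:

import torch

def mask_eos(logits: torch.Tensor, step: int, min_think: int, eos_token_id: int) -> torch.Tensor:
    # Thinking Cap: until min_think steps have passed, EOS is unreachable,
    # so the model cannot end its draft before reaching the minimum depth.
    if step < min_think:
        logits = logits.clone()
        logits[..., eos_token_id] = float("-inf")
    return logits

def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Sequence-level GRPO: each of the K full drafts gets one scalar reward;
    # advantages are normalized within the group of rollouts.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)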

🛠️ Usage

1. Training the Thinking Controller

Initialize the recursive wrapper and perform latent reasoning:

from rfm import RecursiveModel, inference

# Initialize with a frozen backbone
rfm = RecursiveModel("meta-llama/Llama-3.2-1B-Instruct")

# Perform real-time latent reasoning
# max_len=100 allows for paragraph-level depth
inference(rfm, episodes=100, max_len=100, K=4, min_think=30)
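
Here K=4 is the number of rollouts scored together as one GRPO group, and min_think=30 is the minimum "Depth of Thought" enforced by EOS-masking; episodes presumably counts the on-the-fly training episodes the controller runs against the frozen backbone.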

📦 Components

  • RecursiveModel: The wrapper managing the frozen weights, the Thinking Adapter, and the Z state.
  • ObjectiveRouter: The conductor trained on the policy curriculum to sample the latent reasoning space.
  • score_trajectories: A sequence-level engine that rewards vocabulary richness, halting logic, and cognitive coherence (a toy richness signal is sketched below).
  • inference: The real-time reasoning engine implementing rollout-based latent optimization.
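
As one example of what a vocabulary-richness signal can look like, a type-token ratio is the simplest possible form. This toy function is illustrative only, not the repo's actual scoring:

def vocabulary_richness(token_ids: list[int]) -> float:
    # Toy complexity reward: fraction of unique tokens in a draft.
    # score_trajectories combines richer signals like this with halting
    # logic and coherence terms at the sequence level.
    return len(set(token_ids)) / max(len(token_ids), 1)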
