A latent thinking architecture designed to reuse the "frozen" knowledge of established models while training a real-time Latent Space Controller.
Rather than relying on human preference data (RLHF), the RFM trains its policy engine through the process of reasoning itself, using a Policy Curriculum to learn how to sample and evolve its own latent space for deep thinking.
The RFM manages a persistent Reasoning State (Z) that acts as an internal canvas.
- Weight Reuse: Reuses the dense world-model contained in frozen weights.
- Latent Control: A learnable Thinking Adapter and Objective Router manage how thoughts evolve in the latent space.
- Policy Curriculum: The model is trained on-the-fly to choose cognitive strategies (Explore, Converge, etc.) that lead to coherent, deep answers.
- Zero RLHF: All "intelligence" is derived from signal-processing feedback and signal-scoring of its own self-generated reasoning trajectories.
- Reality Anchoring: Input text embeddings remain constant during recursion, anchoring the model to the prompt.
- Latent Evolution (Z): The Reasoning State (Z) is initialized from valid token embeddings and evolved via a recursive Thinking Adapter.
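To make Reality Anchoring and Latent Evolution concrete, here is a minimal sketch assuming a PyTorch / Hugging Face-style frozen backbone. `ThinkingAdapter` and `evolve_latent` are illustrative names for this sketch, not the actual rfm internals.

```python
# Sketch only: ThinkingAdapter and evolve_latent are illustrative assumptions.
import torch
import torch.nn as nn

class ThinkingAdapter(nn.Module):
    """Small trainable residual block that evolves the Reasoning State Z."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.update = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Residual update keeps Z near the backbone's embedding manifold.
        return z + self.update(z)

def evolve_latent(prompt_emb, backbone, adapter, steps: int):
    anchor = prompt_emb.detach()  # Reality Anchoring: constant across recursion
    z = prompt_emb.clone()        # Z initialized from valid token embeddings
    for _ in range(steps):
        # The frozen backbone reads [anchor; Z]; only the adapter is trained.
        hidden = backbone(
            inputs_embeds=torch.cat([anchor, z], dim=1)
        ).last_hidden_state
        z = adapter(hidden[:, -z.size(1):])  # recursive Thinking Adapter step
    return z
```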
- Sampling Strategy: The Objective Router decides the cognitive "goal" for each rollout (a sketch follows this list), sampling the latent space based on:
  - Explore ($Se$): Seeking novel interpretations.
  - Converge ($Si$): Stabilizing around a likely conclusion.
  - Diversify ($Ti$): Expanding the internal reasoning landscape.
  - Smooth ($Ni$): Optimizing for semantic flow and narrative consistency.
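A hedged sketch of how such a router might look, assuming a small categorical policy head over the four objectives. `ObjectiveRouter`, `apply_objective`, and the per-objective latent transforms are assumptions made for illustration, not rfm's actual routing code.

```python
# Sketch only: the objective transforms below are illustrative stand-ins.
import torch
import torch.nn as nn

OBJECTIVES = ("explore", "converge", "diversify", "smooth")

class ObjectiveRouter(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.policy = nn.Linear(hidden_size, len(OBJECTIVES))

    def forward(self, z: torch.Tensor):
        # Sample a cognitive goal from a policy over the pooled state Z.
        logits = self.policy(z.mean(dim=1))
        dist = torch.distributions.Categorical(logits=logits)
        choice = dist.sample()
        return choice, dist.log_prob(choice)  # log-prob feeds the policy update

def apply_objective(z: torch.Tensor, choice) -> torch.Tensor:
    name = OBJECTIVES[int(choice)]
    if name == "explore":    # Se: inject noise to seek novel interpretations
        return z + 0.1 * torch.randn_like(z)
    if name == "converge":   # Si: pull every position toward the consensus
        return 0.5 * (z + z.mean(dim=1, keepdim=True))
    if name == "diversify":  # Ti: push positions apart to widen the landscape
        return z + 0.1 * (z - z.mean(dim=1, keepdim=True))
    # smooth (Ni): locally average neighbors for narrative consistency
    return 0.5 * (z + torch.roll(z, shifts=1, dims=1))
```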
- Sequence-Level GRPO: A delayed-gratification training loop in which $K$ complete drafts (rollouts) are evaluated as whole logical units (see the sketch after this list).
- Forced Reasoning (Thinking Caps): EOS-masking ensures the model can't quit until it has reached a minimum "Depth of Thought."
- Complexity Rewards: Explicitly incentivizes vocabulary richness and narrative elaboration over clichés.
- Hybrid Precision Stability: Critical controller parameters (Router, Adapter) are maintained in FP32 to ensure the reasoning engine remains numerically stable during deep recursion.
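The training step can be pictured with the following sketch. `complexity_reward`, `mask_eos`, and `grpo_advantages` are hypothetical helpers standing in for rfm's `score_trajectories`; they show the reward shaping, EOS-masking, and group-relative advantage steps under those assumptions.

```python
# Sketch only: helper names and reward terms are illustrative assumptions.
import torch

def complexity_reward(token_ids: list[int]) -> float:
    # Reward vocabulary richness: distinct-token ratio over the whole draft.
    return len(set(token_ids)) / max(len(token_ids), 1)

def mask_eos(logits: torch.Tensor, step: int, min_think: int, eos_id: int):
    # Forced Reasoning ("Thinking Caps"): forbid EOS before min_think steps.
    if step < min_think:
        logits[..., eos_id] = float("-inf")
    return logits

def grpo_advantages(scores: torch.Tensor) -> torch.Tensor:
    # Delayed gratification: each of the K complete drafts is scored as one
    # logical unit, then normalized against its own rollout group.
    return (scores - scores.mean()) / (scores.std() + 1e-6)

# Example: K = 4 rollout scores -> group-relative advantages that weight
# the router's log-probs in the policy update.
scores = torch.tensor([0.8, 0.3, 0.5, 0.9])
advantages = grpo_advantages(scores)
```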
Initialize the recursive wrapper and perform latent reasoning:
```python
from rfm import RecursiveModel, inference

# Initialize with a frozen backbone
rfm = RecursiveModel("meta-llama/Llama-3.2-1B-Instruct")

# Perform real-time latent reasoning
# max_len=100 allows for paragraph-level depth
inference(rfm, episodes=100, max_len=100, K=4, min_think=30)
```

- RecursiveModel: The wrapper managing the frozen weights, the Thinking Adapter, and the Z state.
- ObjectiveRouter: The conductor trained on the policy curriculum to sample the latent reasoning space.
- score_trajectories: A sequence-level scoring engine that rewards vocabulary richness, halting logic, and cognitive coherence.
- inference: The real-time reasoning engine implementing rollout-based latent optimization.
