- Robustness (no due date)

DeepNext’s reasoning power can scale with more computation, but that comes at a dollar cost. This milestone focuses on building a budget-aware execution model where the system can:

- Adapt its “thinking time” to predefined resource limits
- Extend reasoning loops when the budget allows
- Stop or summarize early when hitting cost or time ceilings

Achieving this will require research and architectural adjustments, particularly around:

- Loop control mechanisms within agent workflows
- Cost estimation and tracking at runtime
- Budget configuration APIs or settings

The goal is to make DeepNext intelligently budget-aware, giving users control over cost vs. capability trade-offs without needing to intervene manually.
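The sketch below illustrates one way the loop-control and budget-tracking pieces could fit together. It is a minimal, self-contained Python example; `Budget`, `run_reasoning_step`, and `summarize_partial_progress` are hypothetical stand-ins for DeepNext internals, not existing APIs.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    """Hypothetical budget config: hard caps on dollars spent and wall-clock time."""
    max_cost_usd: float = 1.00
    max_seconds: float = 120.0
    spent_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd

    def exhausted(self) -> bool:
        return (self.spent_usd >= self.max_cost_usd
                or time.monotonic() - self.started_at >= self.max_seconds)

def run_reasoning_step(task: str, step: int) -> tuple[str | None, float]:
    """Stand-in for one LLM call: returns (answer or None, estimated cost in USD)."""
    answer = f"draft answer for {task!r}" if step == 2 else None
    return answer, 0.05

def summarize_partial_progress(task: str) -> str:
    """Stand-in for an early-exit summary produced when the budget runs out."""
    return f"[budget hit] partial notes on {task!r}"

def reasoning_loop(task: str, budget: Budget, max_steps: int = 10) -> str:
    """Keep extending the reasoning loop while budget remains; exit early otherwise."""
    for step in range(max_steps):
        if budget.exhausted():
            return summarize_partial_progress(task)  # hit a ceiling: stop gracefully
        answer, cost = run_reasoning_step(task, step)
        budget.charge(cost)
        if answer is not None:
            return answer
    return summarize_partial_progress(task)

# A tight budget forces an early exit after two charged steps (2 x $0.05 >= $0.08).
print(reasoning_loop("triage a flaky test", Budget(max_cost_usd=0.08)))
```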
- Evaluation framework (no due date • 0/2 issues closed)

To improve DeepNext, we first need a reliable way to evaluate it. Existing benchmarks like SWE-bench fall short: they are often biased, too artificial, or do not reflect real-world, human-in-the-loop workflows. This milestone focuses on researching and designing a custom evaluation framework tailored to how DeepNext actually works. That may include:

- Defining what “good” looks like for our multi-agent, human-in-the-loop system
- Creating or curating a custom dataset that better reflects real-world developer workflows
- Developing repeatable checks and regression tests to measure progress over time (see the sketch after this list)

The goal is to move beyond generic benchmarks and establish practical, meaningful evaluation criteria that truly reflect DeepNext’s capabilities and value.
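As one possible shape for those repeatable checks, here is a small, self-contained harness. `EvalCase`, `run_regression`, and the toy cases are illustrative assumptions, not part of an existing framework; a real dataset would come from curated developer workflows.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One curated task with a programmatic pass/fail check on the agent's output."""
    name: str
    task: str
    check: Callable[[str], bool]

def run_regression(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case against the agent and return the pass rate."""
    passed = 0
    for case in cases:
        ok = case.check(agent(case.task))
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}  {case.name}")
    return passed / len(cases)

# Toy cases; real ones would encode richer expectations (diff applies, tests pass, ...).
cases = [
    EvalCase("writes-a-test", "add a failing test for the bug",
             check=lambda out: "test" in out.lower()),
    EvalCase("requests-review", "plan a refactor and ask for human sign-off",
             check=lambda out: "review" in out.lower()),
]

def fake_agent(task: str) -> str:
    """Stand-in for DeepNext so the harness runs offline."""
    return f"Plan: write a regression test, then request review for: {task}"

print(f"pass rate: {run_regression(fake_agent, cases):.0%}")
```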
- Multi-language support (no due date • 0/1 issues closed)

Today, DeepNext’s tooling, prompts, and functions are Python-specific, which limits its potential to solve problems in other programming languages, even though modern LLMs are capable of working with many. This milestone focuses on removing language-specific assumptions and generalizing agent capabilities to support any programming language, including JavaScript, Go, Rust, Java, and more. The goal is to design language-agnostic tooling, prompts, and interfaces that allow DeepNext to adapt to any tech stack, making it a true multi-language coding partner for a wide range of teams and projects.
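One way to isolate those language-specific assumptions is a per-language adapter behind a shared interface, so the agent core never hard-codes Python. The `LanguageAdapter` protocol and both adapters below are hypothetical sketches; the test commands are ordinary `pytest` and `go test` invocations used as examples.

```python
from typing import Protocol

class LanguageAdapter(Protocol):
    """Hypothetical interface: everything language-specific lives behind it."""
    name: str
    file_extensions: tuple[str, ...]

    def test_command(self) -> list[str]:
        """Command the agent runs to execute the project's test suite."""
        ...

    def prompt_hints(self) -> str:
        """Language-specific context injected into otherwise generic prompts."""
        ...

class PythonAdapter:
    name = "python"
    file_extensions = (".py",)
    def test_command(self) -> list[str]:
        return ["pytest", "-q"]
    def prompt_hints(self) -> str:
        return "Follow PEP 8; prefer pytest-style tests."

class GoAdapter:
    name = "go"
    file_extensions = (".go",)
    def test_command(self) -> list[str]:
        return ["go", "test", "./..."]
    def prompt_hints(self) -> str:
        return "Run gofmt; table-driven tests are idiomatic."

ADAPTERS: dict[str, LanguageAdapter] = {a.name: a for a in (PythonAdapter(), GoAdapter())}

def adapter_for(filename: str) -> LanguageAdapter:
    """Pick the adapter for a file, so the agent core stays language-agnostic."""
    for adapter in ADAPTERS.values():
        if filename.endswith(adapter.file_extensions):
            return adapter
    raise ValueError(f"no adapter registered for {filename!r}")

print(adapter_for("cmd/main.go").test_command())  # ['go', 'test', './...']
```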
- Pluggable observability (no due date • 0/2 issues closed)

Today, DeepNext relies on LangSmith for inspecting and debugging LLM interactions, but our goal is to remove this dependency and give users the freedom to choose their preferred observability tools. This milestone focuses on making observability pluggable and tool-agnostic, allowing seamless integration with open standards like OpenTelemetry, as well as other custom or third-party inspection tools. Whether users prefer LangSmith, OpenTelemetry, or their own monitoring stack, DeepNext should provide consistent, extensible observability hooks without forcing any particular vendor or ecosystem.
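A minimal sketch of such a hook, assuming a hypothetical two-method `ObservabilityHook` interface: the pipeline calls it around each step, and an OpenTelemetry or LangSmith adapter would implement the same two methods and forward to its own SDK. Only a stdout sink is shown so the example has no dependencies.

```python
import time
from contextlib import contextmanager
from typing import Protocol

class ObservabilityHook(Protocol):
    """Hypothetical vendor-neutral hook invoked around every traced operation."""
    def on_span_start(self, name: str, attrs: dict) -> None: ...
    def on_span_end(self, name: str, attrs: dict, elapsed_s: float) -> None: ...

class StdoutHook:
    """Built-in sink; OTel/LangSmith adapters would implement the same methods."""
    def on_span_start(self, name: str, attrs: dict) -> None:
        print(f"start {name} {attrs}")
    def on_span_end(self, name: str, attrs: dict, elapsed_s: float) -> None:
        print(f"end   {name} ({elapsed_s:.3f}s)")

@contextmanager
def traced(hook: ObservabilityHook, name: str, **attrs):
    """Wrap any pipeline step so timing and attributes flow to the chosen sink."""
    hook.on_span_start(name, attrs)
    t0 = time.monotonic()
    try:
        yield
    finally:
        hook.on_span_end(name, attrs, time.monotonic() - t0)

with traced(StdoutHook(), "llm.call", model="example-model"):
    pass  # the real pipeline would invoke the model here
```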
- Model-agnostic pipeline (no due date • 1/1 issues closed)

DeepNext should never be tied to a single LLM provider or API. This milestone focuses on making our agent pipeline model-agnostic, enabling support for multiple LLM backends like LiteLLM, Pydantic AI, or LangChain-compatible providers. The goal is to build a plug-and-play architecture where adding or switching models requires minimal effort, freeing users to choose what works best for their needs, budget, or deployment context.
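To sketch the plug-and-play idea, the pipeline can depend only on a small protocol and resolve concrete backends from a registry. `LLMBackend`, `EchoBackend`, and `get_backend` are illustrative assumptions; an adapter wrapping LiteLLM or a LangChain chat model would register itself the same way. The offline `EchoBackend` lets the example run without an API key.

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Hypothetical minimal surface the agent pipeline is allowed to depend on."""
    def complete(self, prompt: str, **params) -> str: ...

class EchoBackend:
    """Offline stand-in backend, used so this sketch runs with no credentials."""
    def complete(self, prompt: str, **params) -> str:
        return f"[echo] {prompt}"

# Adding a provider means registering one more adapter class here (or via config).
BACKENDS: dict[str, type] = {"echo": EchoBackend}

def get_backend(name: str) -> LLMBackend:
    """Resolve a backend by name, so switching models is a one-line config change."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; registered: {list(BACKENDS)}")

backend = get_backend("echo")
print(backend.complete("summarize the open issues"))
```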
- Human-in-the-loop collaboration (no due date • 1/2 issues closed)

Our goal is to make interacting with DeepNext feel as natural as working with a fellow developer. This milestone focuses on building the core collaboration experience, where DeepNext doesn't just generate code but works with you through:

- Human-in-the-loop interaction for planning actions (see the sketch after this list)
- Regular code reviews that feel like peer feedback

This is a critical step toward making DeepNext a true teammate, not just a tool.
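As a toy illustration of the planning interaction, the loop below lets a human approve, drop, or edit each proposed step before anything runs. `PlanStep` and `review_plan` are hypothetical names, and the answers are scripted so the example is reproducible without a terminal.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PlanStep:
    """One proposed action in the agent's plan."""
    description: str
    approved: bool = False

def review_plan(steps: list[PlanStep],
                ask: Callable[[str], str] = input) -> list[PlanStep]:
    """Present each step; the human approves (y), drops (n), or types a replacement."""
    reviewed = []
    for step in steps:
        answer = ask(f"{step.description}  [y]es / [n]o / or type an edit: ").strip()
        if answer.lower() in ("", "y", "yes"):
            step.approved = True
            reviewed.append(step)
        elif answer.lower() in ("n", "no"):
            continue  # human vetoed this step entirely
        else:
            reviewed.append(PlanStep(description=answer, approved=True))
    return reviewed

# Scripted answers stand in for a live terminal session.
answers = iter(["y", "n", "split this into two PRs"])
plan = [PlanStep("Add a failing test"), PlanStep("Rewrite the module"), PlanStep("Open a PR")]
for step in review_plan(plan, ask=lambda prompt: next(answers)):
    print("approved:", step.description)
```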