- Robustness (no due date)

DeepNext’s reasoning power can scale with more computation, but that comes at a dollar cost. This milestone focuses on building a budget-aware execution model where the system can:

- Adapt its “thinking time” to predefined resource limits
- Extend reasoning loops when the budget allows
- Stop or summarize early when hitting cost or time ceilings

Achieving this will require research and architectural adjustments, particularly around:

- Loop control mechanisms within agent workflows
- Cost estimation and tracking at runtime
- Budget configuration APIs or settings

The goal is to make DeepNext intelligently budget-aware, giving users control over cost vs. capability trade-offs without needing to intervene manually.
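The sketch below illustrates one way the loop-control and budget-tracking pieces could fit together. It is a minimal, self-contained Python example; `Budget`, `run_reasoning_step`, and `summarize_partial_progress` are hypothetical stand-ins for DeepNext internals, not existing APIs.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    """Hypothetical budget config: hard caps on dollars spent and wall-clock time."""
    max_cost_usd: float = 1.00
    max_seconds: float = 120.0
    spent_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd

    def exhausted(self) -> bool:
        return (self.spent_usd >= self.max_cost_usd
                or time.monotonic() - self.started_at >= self.max_seconds)

def run_reasoning_step(task: str, step: int) -> tuple[str | None, float]:
    """Stand-in for one LLM call: returns (answer or None, estimated cost in USD)."""
    answer = f"draft answer for {task!r}" if step == 2 else None
    return answer, 0.05

def summarize_partial_progress(task: str) -> str:
    """Stand-in for an early-exit summary produced when the budget runs out."""
    return f"[budget hit] partial notes on {task!r}"

def reasoning_loop(task: str, budget: Budget, max_steps: int = 10) -> str:
    """Keep extending the reasoning loop while budget remains; exit early otherwise."""
    for step in range(max_steps):
        if budget.exhausted():
            return summarize_partial_progress(task)  # hit a ceiling: stop gracefully
        answer, cost = run_reasoning_step(task, step)
        budget.charge(cost)
        if answer is not None:
            return answer
    return summarize_partial_progress(task)

# A tight budget forces an early exit after two charged steps (2 x $0.05 >= $0.08).
print(reasoning_loop("triage a flaky test", Budget(max_cost_usd=0.08)))
```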
- Evaluation framework (no due date • 0/2 issues closed)

To improve DeepNext, we first need a reliable way to evaluate it. Existing benchmarks like SWE-bench fall short: they are often biased, too artificial, or do not reflect real-world, human-in-the-loop workflows. This milestone focuses on researching and designing a custom evaluation framework tailored to how DeepNext actually works. That may include:

- Defining what “good” looks like for our multi-agent, human-in-the-loop system
- Creating or curating a custom dataset that better reflects real-world developer workflows
- Developing repeatable checks and regression tests to measure progress over time (see the sketch after this list)

The goal is to move beyond generic benchmarks and establish practical, meaningful evaluation criteria that truly reflect DeepNext’s capabilities and value.
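As one possible shape for those repeatable checks, here is a small, self-contained harness. `EvalCase`, `run_regression`, and the toy cases are illustrative assumptions, not part of an existing framework; a real dataset would come from curated developer workflows.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One curated task with a programmatic pass/fail check on the agent's output."""
    name: str
    task: str
    check: Callable[[str], bool]

def run_regression(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case against the agent and return the pass rate."""
    passed = 0
    for case in cases:
        ok = case.check(agent(case.task))
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}  {case.name}")
    return passed / len(cases)

# Toy cases; real ones would encode richer expectations (diff applies, tests pass, ...).
cases = [
    EvalCase("writes-a-test", "add a failing test for the bug",
             check=lambda out: "test" in out.lower()),
    EvalCase("requests-review", "plan a refactor and ask for human sign-off",
             check=lambda out: "review" in out.lower()),
]

def fake_agent(task: str) -> str:
    """Stand-in for DeepNext so the harness runs offline."""
    return f"Plan: write a regression test, then request review for: {task}"

print(f"pass rate: {run_regression(fake_agent, cases):.0%}")
```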
- Multi-language support (no due date • 0/1 issues closed)

Today, DeepNext’s tooling, prompts, and functions are Python-specific, which limits its potential to solve problems in other programming languages, even though modern LLMs are capable of working with many. This milestone focuses on removing language-specific assumptions and generalizing agent capabilities to support any programming language, including JavaScript, Go, Rust, Java, and more. The goal is to design language-agnostic tooling, prompts, and interfaces that allow DeepNext to adapt to any tech stack, making it a true multi-language coding partner for a wide range of teams and projects.
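One way to isolate those language-specific assumptions is a per-language adapter behind a shared interface, so the agent core never hard-codes Python. The `LanguageAdapter` protocol and both adapters below are hypothetical sketches; the test commands are ordinary `pytest` and `go test` invocations used as examples.

```python
from typing import Protocol

class LanguageAdapter(Protocol):
    """Hypothetical interface: everything language-specific lives behind it."""
    name: str
    file_extensions: tuple[str, ...]

    def test_command(self) -> list[str]:
        """Command the agent runs to execute the project's test suite."""
        ...

    def prompt_hints(self) -> str:
        """Language-specific context injected into otherwise generic prompts."""
        ...

class PythonAdapter:
    name = "python"
    file_extensions = (".py",)
    def test_command(self) -> list[str]:
        return ["pytest", "-q"]
    def prompt_hints(self) -> str:
        return "Follow PEP 8; prefer pytest-style tests."

class GoAdapter:
    name = "go"
    file_extensions = (".go",)
    def test_command(self) -> list[str]:
        return ["go", "test", "./..."]
    def prompt_hints(self) -> str:
        return "Run gofmt; table-driven tests are idiomatic."

ADAPTERS: dict[str, LanguageAdapter] = {a.name: a for a in (PythonAdapter(), GoAdapter())}

def adapter_for(filename: str) -> LanguageAdapter:
    """Pick the adapter for a file, so the agent core stays language-agnostic."""
    for adapter in ADAPTERS.values():
        if filename.endswith(adapter.file_extensions):
            return adapter
    raise ValueError(f"no adapter registered for {filename!r}")

print(adapter_for("cmd/main.go").test_command())  # ['go', 'test', './...']
```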
- Pluggable observability (no due date • 0/2 issues closed)

Today, DeepNext relies on LangSmith for inspecting and debugging LLM interactions, but our goal is to remove this dependency and give users the freedom to choose their preferred observability tools. This milestone focuses on making observability pluggable and tool-agnostic, allowing seamless integration with open standards like OpenTelemetry, as well as other custom or third-party inspection tools. Whether users prefer LangSmith, OpenTelemetry, or their own monitoring stack, DeepNext should provide consistent, extensible observability hooks without forcing any particular vendor or ecosystem.
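A minimal sketch of such a hook, assuming a hypothetical two-method `ObservabilityHook` interface: the pipeline calls it around each step, and an OpenTelemetry or LangSmith adapter would implement the same two methods and forward to its own SDK. Only a stdout sink is shown so the example has no dependencies.

```python
import time
from contextlib import contextmanager
from typing import Protocol

class ObservabilityHook(Protocol):
    """Hypothetical vendor-neutral hook invoked around every traced operation."""
    def on_span_start(self, name: str, attrs: dict) -> None: ...
    def on_span_end(self, name: str, attrs: dict, elapsed_s: float) -> None: ...

class StdoutHook:
    """Built-in sink; OTel/LangSmith adapters would implement the same methods."""
    def on_span_start(self, name: str, attrs: dict) -> None:
        print(f"start {name} {attrs}")
    def on_span_end(self, name: str, attrs: dict, elapsed_s: float) -> None:
        print(f"end   {name} ({elapsed_s:.3f}s)")

@contextmanager
def traced(hook: ObservabilityHook, name: str, **attrs):
    """Wrap any pipeline step so timing and attributes flow to the chosen sink."""
    hook.on_span_start(name, attrs)
    t0 = time.monotonic()
    try:
        yield
    finally:
        hook.on_span_end(name, attrs, time.monotonic() - t0)

with traced(StdoutHook(), "llm.call", model="example-model"):
    pass  # the real pipeline would invoke the model here
```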
- Model-agnostic pipeline (no due date • 1/1 issues closed)

DeepNext should never be tied to a single LLM provider or API. This milestone focuses on making our agent pipeline model-agnostic, enabling support for multiple LLM backends like LiteLLM, Pydantic AI, or LangChain-compatible providers. The goal is to build a plug-and-play architecture where adding or switching models requires minimal effort, freeing users to choose what works best for their needs, budget, or deployment context.
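To sketch the plug-and-play idea, the pipeline can depend only on a small protocol and resolve concrete backends from a registry. `LLMBackend`, `EchoBackend`, and `get_backend` are illustrative assumptions; an adapter wrapping LiteLLM or a LangChain chat model would register itself the same way. The offline `EchoBackend` lets the example run without an API key.

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Hypothetical minimal surface the agent pipeline is allowed to depend on."""
    def complete(self, prompt: str, **params) -> str: ...

class EchoBackend:
    """Offline stand-in backend, used so this sketch runs with no credentials."""
    def complete(self, prompt: str, **params) -> str:
        return f"[echo] {prompt}"

# Adding a provider means registering one more adapter class here (or via config).
BACKENDS: dict[str, type] = {"echo": EchoBackend}

def get_backend(name: str) -> LLMBackend:
    """Resolve a backend by name, so switching models is a one-line config change."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; registered: {list(BACKENDS)}")

backend = get_backend("echo")
print(backend.complete("summarize the open issues"))
```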
- Human-in-the-loop collaboration (no due date • 1/2 issues closed)

Our goal is to make interacting with DeepNext feel as natural as working with a fellow developer. This milestone focuses on building the core collaboration experience, where DeepNext doesn't just generate code but works with you through:

- Human-in-the-loop interaction for planning actions (see the sketch after this list)
- Regular code reviews that feel like peer feedback

This is a critical step toward making DeepNext a true teammate, not just a tool.
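As a toy illustration of the planning interaction, the loop below lets a human approve, drop, or edit each proposed step before anything runs. `PlanStep` and `review_plan` are hypothetical names, and the answers are scripted so the example is reproducible without a terminal.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PlanStep:
    """One proposed action in the agent's plan."""
    description: str
    approved: bool = False

def review_plan(steps: list[PlanStep],
                ask: Callable[[str], str] = input) -> list[PlanStep]:
    """Present each step; the human approves (y), drops (n), or types a replacement."""
    reviewed = []
    for step in steps:
        answer = ask(f"{step.description}  [y]es / [n]o / or type an edit: ").strip()
        if answer.lower() in ("", "y", "yes"):
            step.approved = True
            reviewed.append(step)
        elif answer.lower() in ("n", "no"):
            continue  # human vetoed this step entirely
        else:
            reviewed.append(PlanStep(description=answer, approved=True))
    return reviewed

# Scripted answers stand in for a live terminal session.
answers = iter(["y", "n", "split this into two PRs"])
plan = [PlanStep("Add a failing test"), PlanStep("Rewrite the module"), PlanStep("Open a PR")]
for step in review_plan(plan, ask=lambda prompt: next(answers)):
    print("approved:", step.description)
```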