grandsmile/UniCode

UniCode: A Framework for Generating High-Quality Competitive Coding Problems

UniCode Dataset

UniCode is a novel framework that addresses the limitations of static, human-authored problem sets by automatically generating high-quality algorithmic problems and robust, contamination-resistant test cases. Inspired by biological evolution, our framework creates diverse and challenging programming problems through systematic generation strategies.


🚀 Quick Start

To evaluate models with our benchmark, use the following command:

python run_benchmark.py \
    --models gpt-4o gpt-4.1 gpt-4.1-mini o3-mini gpt-4.5-preview \
    --max-workers 4

  • --models: space-separated list of model names to evaluate
  • --max-workers: number of parallel workers used during evaluation
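As a rough sketch of how these two flags could be wired up, here is a hypothetical `argparse` definition; the actual interface of `run_benchmark.py` may differ in names and defaults:

```python
# Hypothetical sketch of run_benchmark.py's CLI flags; illustrative only,
# the real script's argument names and defaults may differ.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Evaluate models on the UniCode benchmark"
    )
    # --models accepts one or more model names, e.g. "gpt-4o o3-mini"
    parser.add_argument("--models", nargs="+", required=True,
                        help="List of models to evaluate")
    # --max-workers bounds the number of parallel evaluation workers
    parser.add_argument("--max-workers", type=int, default=4,
                        help="Number of parallel workers for evaluation")
    return parser

args = build_parser().parse_args(
    ["--models", "gpt-4o", "o3-mini", "--max-workers", "4"]
)
print(args.models)       # ['gpt-4o', 'o3-mini']
print(args.max_workers)  # 4
```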

To generate new problems and test cases:

# Generate new problems
python gen_new_questions.py

# Generate test cases for new problems
python generate_test_cases_by_brute.py
python generate_test_cases_by_opt.py
python filter.py
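The filtering step at the end of this pipeline can be pictured as follows. This is a minimal sketch under the assumption that `filter.py` keeps a test input only when the optimized solutions agree with the trusted brute-force output; the function and field names here are hypothetical, not the script's actual API:

```python
# Illustrative filtering sketch (hypothetical, not filter.py's actual code):
# keep a test case only when every optimized-solution output matches the
# trusted brute-force reference output.
def filter_cases(cases):
    kept = []
    for case in cases:
        brute = case["brute_output"]       # trusted reference (small inputs)
        opt_outputs = case["opt_outputs"]  # outputs from optimized solvers
        if all(out == brute for out in opt_outputs):
            kept.append(case)
    return kept

cases = [
    {"input": "1 2", "brute_output": "3", "opt_outputs": ["3", "3"]},
    {"input": "5 7", "brute_output": "12", "opt_outputs": ["12", "13"]},  # mismatch
]
print([c["input"] for c in filter_cases(cases)])  # ['1 2']
```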

๐Ÿ† Leaderboard

Leaderboard


🧠 Problem Generation Strategies

UniCode employs three biologically inspired strategies to create novel algorithmic challenges: (1) Single-Problem Extension, (2) Same-Type Fusion, and (3) Cross-Type Fusion.

Problem Generation Strategies


🧪 Test Case Generation

Our stress-driven pipeline ensures high-quality test suites without requiring ground-truth solutions:

Input Generation

  • Random Generation: Broad sampling from valid input space
  • Adversarial Generation: Targets boundary conditions and worst-case scenarios
  • LLM-based Synthesis: Creates small-scale challenging inputs
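The first two modes can be sketched for a toy problem with a single integer input bounded by 1 ≤ n ≤ 10^9. The bounds and input shape here are illustrative assumptions, not the framework's actual generators:

```python
# Minimal sketch of random and adversarial input generation for a
# hypothetical problem constrained to 1 <= n <= 10**9.
import random

def random_input(rng: random.Random, lo: int = 1, hi: int = 10**9) -> int:
    # Random Generation: broad uniform sampling from the valid input space
    return rng.randint(lo, hi)

def adversarial_inputs(lo: int = 1, hi: int = 10**9) -> list[int]:
    # Adversarial Generation: boundary values and worst-case scales
    return [lo, lo + 1, hi - 1, hi]

rng = random.Random(0)
print(random_input(rng))     # some value in [1, 10**9]
print(adversarial_inputs())  # [1, 2, 999999999, 1000000000]
```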

Output Validation Pipeline

  1. Brute-Force Validation: Establishes trusted outputs for small-scale inputs
  2. Solver Filtration: Filters optimized solutions using stress tests
  3. Consensus Validation: Uses majority voting for large-scale inputs
  4. LLM Adjudication: Resolves conflicts with powerful LLMs

Test Case Generation Pipeline
