
Conversation


@dzorlu dzorlu commented Jan 25, 2026

Summary

  • Remove <done> from vLLM stop tokens to fix premature generation termination
  • Qwen3 thinking models output <done> inside <think> sections when describing their plan
  • This caused vLLM to stop generation before the model could actually call tools

Problem

<think>
...I'll inform the user and include the <done>   ← vLLM STOPS HERE
</think>

Model was planning to say <done>, not actually signaling completion.

Solution

  • Only stop on </tool_call> for tool execution (see the sketch below)
  • Let model naturally finish with <|im_end|> (EOS token)
  • env.step() already detects <done> in the full response to mark episode completion
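
For illustration only, a minimal sketch of what the stop-string change amounts to, written against vLLM's Python `SamplingParams` API; the PR itself makes the equivalent change in the task YAMLs listed under Files Changed, so the surrounding configuration here is an assumption:

```python
from vllm import SamplingParams

# Sketch of the stop-token change (illustrative; the real change lives in the task YAMLs).
# Before: stop=["</tool_call>", "<done>"] halted generation as soon as the model
# mentioned <done> inside its <think> block.
# After: only </tool_call> stops generation; the model otherwise ends at its EOS
# token (<|im_end|>), and env.step() scans the full response for <done>.
sampling_params = SamplingParams(
    max_tokens=4096,
    stop=["</tool_call>"],
    include_stop_str_in_output=True,  # keep the closing tag visible to the tool-call parser
)
```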

Files Changed

  • skyrl-train/tasks/openenv-fleet-grpo-qwen3-8b.yaml
  • skyrl-train/tasks/openenv-fleet-grpo.yaml

🤖 Generated with Claude Code

dzorlu and others added 30 commits January 20, 2026 18:35
* feat: Add OpenEnv Fleet training CI with SkyPilot

- Add SkyPilot task YAML for GRPO training on neoclouds (Lambda, RunPod, Vast)
- Add GitHub Actions workflow for PR-triggered training runs
- Update .gitignore for SkyPilot venv

The workflow:
1. Triggers on PRs to main (paths: skyrl-train/integrations/openenv/**)
2. Configures neocloud credentials (Lambda, RunPod, Vast)
3. Launches training job via `sky jobs launch`
4. Posts job info as PR comment

Required secrets:
- LAMBDA_API_KEY
- RUNPOD_API_KEY
- VAST_API_KEY
- FLEET_API_KEY
- WANDB_API_KEY_TOOL_USE
- WANDB_API_KEY_COMPUTER_USE

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Only trigger workflow on skyrl-train/tasks/** changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Add permissions for PR comments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Remove auto-trigger, manual dispatch only

Training is expensive - only trigger manually via Actions tab.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Simplify WandB key + add Slack notifications

- Use single WANDB_API_KEY instead of per-modality keys
- Add Slack notification to #fleet-training channel

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Slack channel to #fleet-training-runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Use modality-specific WandB keys

- WANDB_API_KEY_TOOL_USE for tool_use modality
- WANDB_API_KEY_COMPUTER_USE for computer_use modality

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Proper WandB key selection + separate Slack success/failure notifications

- Use job-level env vars for bash runtime selection
- Add validation that WandB key is set
- Separate Slack notifications for success vs failure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: Add WandB dashboard link to Slack notification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: Sync launch - wait for job completion before Slack notification

- Removed --async flag, workflow now waits for training to complete
- Slack notification shows completion status (success/failure)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Add Fleet task integration for SkyRL training

This PR adds a new integration for training on Fleet-hosted environments:

- `integrations/fleet/env.py`: FleetTaskEnv that wraps OpenEnv's FleetTaskEnv
  and adapts it to SkyRL's BaseTextEnv interface
- `integrations/fleet/prepare_dataset.py`: Converts Fleet task JSON files to
  SkyRL parquet dataset format
- `integrations/fleet/entrypoints/main_fleet.py`: Training entrypoint that
  registers the fleet_task environment
- Updated `tasks/openenv-fleet-grpo.yaml` to use Fleet integration

The integration supports:
- Loading tasks from Fleet API or JSON files
- MCP tool execution via OpenEnv's FleetTaskEnv
- Verifier-based rewards on episode completion
- Filtering by modality (tool_use/computer_use) and env_key

Usage:
```bash
sky jobs launch tasks/openenv-fleet-grpo.yaml \
  --env FLEET_API_KEY=sk_... \
  --env WANDB_API_KEY=wandb_... \
  --env MODALITY=tool_use
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Simplify FleetTaskEnv and add unit tests

- Simplified parse_tool_call: removed complex regex patterns, now only
  supports tag-based formats (<tool_call>, <function_call>); see the sketch after this list
- Removed unused _verified attribute
- Extracted _run_async helper for cleaner async handling
- Removed tools_cache (not needed after init)
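
A rough sketch of the tag-based parsing described in the first bullet above; the actual helper in integrations/fleet/env.py may accept other shapes and return a different structure:

```python
import json
import re
from typing import Optional


def parse_tool_call(text: str) -> Optional[dict]:
    """Extract a JSON tool call from <tool_call>...</tool_call> or
    <function_call>...</function_call> tags; return None if no valid call is found.

    Sketch only: the real implementation may differ.
    """
    for tag in ("tool_call", "function_call"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if not match:
            continue
        try:
            call = json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and "name" in call:
            return {"name": call["name"], "arguments": call.get("arguments", {})}
    return None
```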

Added comprehensive unit tests:
- load_tasks_from_json: array/object formats, caching, errors
- parse_tool_call: various formats, edge cases
- FleetTaskEnv.__init__: validation, config priority
- FleetTaskEnv.init: environment creation, tools info
- FleetTaskEnv.step: tool calls, done signal, max turns
- FleetTaskEnv.close: cleanup, error handling
- FleetTaskEnv.get_metrics/aggregate_metrics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update GHA workflow for Fleet task integration

- Replace env_name with env_key input for Fleet environment filtering
- Add max_tasks input for limiting tasks during testing
- Update Slack notifications to say "Fleet Task" instead of "OpenEnv"
- Update WandB project link to fleet-task-grpo
- Build launch command conditionally (only pass ENV_KEY/MAX_TASKS if set)
- Add max_tasks to job summary

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix SkyPilot task for OpenEnv namespace package

- Clone OpenEnv repo for namespace package access (envs/ has no __init__.py)
- Add OpenEnv/src to PYTHONPATH so 'from envs.fleet_env import ...' works
- Remove redundant --with openenv since we use PYTHONPATH
- Use fleet-integration branch (update to main after PR merge)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Use main branch for SkyPilot workdir

The workflow should use main branch - merge PR first before running.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix ruff linting errors

- Remove unused variable 'loop' in _run_async()
- Remove unused variable 'call_args' in test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix linting: remove unused imports, apply black formatting

- Remove unused imports (tempfile, AsyncMock)
- Apply black formatting to all fleet integration files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Fix YAML parsing error and improve Slack notifications

1. Move inline Python script to separate file (export_tasks.py)
   - Heredoc in YAML caused parsing errors

2. Fix Job ID extraction
   - Use portable grep patterns instead of Perl-style \K
   - Try multiple patterns for different SkyPilot output formats

3. Clarify Slack notifications
   - "Job Launched" instead of "Training Completed"
   - This workflow only launches the job, doesn't wait for completion
   - Actual training status comes from WandB or sky jobs logs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix: Move API keys from secrets to envs section

SkyPilot secrets require --secret flag, but GHA passes via --env.
Move WANDB_API_KEY and FLEET_API_KEY to envs section with empty
defaults that get overridden by --env flags.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add disk_size: 40 (RunPod max is 40GB, default was 50GB causing failures)
- Remove aws (not configured in GHA workflow)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
The managed jobs controller has its own default disk_size=50,
independent of our task YAML. Configure it via ~/.sky/config.yaml
to use disk_size=30 (RunPod limit is 40GB).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Split workflow into separate steps: Submit → Slack Submitted → Stream Logs → Slack Done
- Stream training logs directly to GHA using `sky jobs logs --follow`
- Add job status tracking and report final status in Slack
- Switch from A100:4 to H100:2 (easier to find availability)
- Remove memory requirement (H100 has sufficient memory)
- Add PR_NUMBER env var to training job

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
The Configure SkyPilot step was setting disk_size for the controller
but this might be causing issues. Let SkyPilot auto-select controller
resources from available clouds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Use `sky launch` instead of `sky jobs launch` for direct GPU provisioning
- No more CPU controller - GPUs are provisioned directly
- Use `sky logs --follow` to stream training logs
- Auto-terminate cluster after training completes
- Add cleanup step on failure

This removes the managed jobs controller abstraction and directly
provisions H100:2 GPU instances on Lambda/RunPod/Vast.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
H100:2 is more readily available on Lambda/RunPod/Vast.
Removed memory requirement (H100 has sufficient memory).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Remove OpenEnv dependency, use Fleet SDK directly

Changes:
- Rewrite env.py to use Fleet SDK directly (fleet.load_tasks, task.make, task.verify)
- Remove all OpenEnv clone/install steps from task YAML
- Update __init__.py with lazy import to handle missing dependencies gracefully
- Comprehensive tests for error cases (file not found, task not found, API errors, etc.)

This fixes the setup failure where OpenEnv (private repo) couldn't be cloned.

Error cases now covered:
- Missing tasks file
- Invalid JSON format
- Empty tasks array
- Task not found in file
- Task not found in Fleet API
- task.make() failure
- Tool execution errors
- Verifier errors
- Close errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Move Fleet tests to tests/cpu/ for CI pickup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix black formatting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Mock fleet and skyrl_gym in tests to allow running without deps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add noqa for E402 (module imports after sys.modules mock)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix BaseTextEnvStepOutput mock to return actual objects

The MagicMock was causing assertions to fail because attributes like result.done
were returning MagicMock objects instead of actual values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Skip Fleet tests if dependencies unavailable instead of mocking sys.modules

The previous approach of mocking sys.modules at module level polluted
global state and broke other tests (skyrl_gym.envs became MagicMock).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix test isolation by using importlib instead of sys.modules mocking

The previous approach of mocking sys.modules polluted global state and
broke other tests (generator tests). This change uses importlib.util.find_spec
to check for dependencies and pytest.mark.skipif to skip tests when
dependencies aren't available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Use OpenEnv's FleetTaskEnv instead of Fleet SDK directly

This commit rewrites the Fleet integration to use OpenEnv's FleetTaskEnv
as the abstraction layer instead of calling Fleet SDK directly.

Changes:
- integrations/fleet/env.py: Rewrite to use envs.fleet_env.FleetTaskEnv
- tasks/openenv-fleet-grpo.yaml: Install openenv package
- tests/cpu/test_fleet_env.py: Update mocks for OpenEnv
- CLAUDE.md: Add instructions about integration patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This fixes the error:
  Key 'fleet_task' is not in struct
  full_key: environment.skyrl_gym.fleet_task

Changes:
- Add fleet_task section to skyrl_gym_config/default.yaml with:
  - tasks_file: Path to exported Fleet tasks JSON
  - api_key: Fleet API key (defaults to FLEET_API_KEY env var)
  - ttl_seconds: TTL for Fleet environment instances (default: 600)
- Add tests for config validation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Change H100:2 to H100:1 due to capacity constraints on cloud
providers. The config parameters using $TOTAL_GPUS will
automatically adjust to 1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add 100 booking tasks sample (fleet_booking_sample.json)
- Update YAML to use committed sample instead of Fleet API export
- Simplifies initial testing before full dataset

Environment: booking
Tasks: 100
Modality: tool_use

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Change `from envs.fleet_env import FleetTaskEnv` to
`from openenv import FleetTaskEnv` - the correct import
path when openenv is installed via pip.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Change `fleet-ai` to `thefleet` in WandB URLs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
FleetTaskEnv is not in PyPI release - install from git branch.
- Change import to `from envs.fleet_env import FleetTaskEnv`
- Install from git+https://github.com/fleet-ai/OpenEnv.git@deniz/fleet_client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Change terminology from "Cluster" to "Run" in user-facing messages
- Update GPU count from H100:2 to H100:1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
The --with "openenv" was installing from PyPI, overriding
the git branch install. Now uses git URL directly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive guide covering:
- How to trigger training via GitHub Actions
- Required secrets and configuration
- Training hyperparameters
- Monitoring and troubleshooting
- Architecture overview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: Add Fleet task integration to skyrl-agent

Implements multi-turn agentic training on Fleet-hosted environments:

- Add FleetTask class implementing BaseTask interface
- Add skyrl_fleet.yaml config for training
- Add run_fleet.sh launch script
- Add sample dataset (100 booking tasks)
- Add unit tests for Fleet task
- Move docs from skyrl-train to skyrl-agent

The Fleet task:
- Creates Fleet environments via FleetTaskEnv (from OpenEnv)
- Provides task prompts as agent instructions
- Exposes MCP tools for agent interaction
- Evaluates results using task verifiers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: Format fleet_task.py with black

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: Remove Fleet integration from skyrl-train

Fleet integration has been moved to skyrl-agent.
Deleting old files to avoid confusion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Fix GitHub Actions workflow path for Fleet training

- Create SkyPilot YAML in skyrl-agent/tasks/fleet-task-training.yaml
- Update workflow to reference new YAML path instead of deleted skyrl-train path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update docs to reference correct SkyPilot YAML path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Change Slack channel to #fleet-training-runs-test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix YAML syntax error and add validation tests

- Move inline Python code to separate script (scripts/prepare_fleet_dataset.py)
- Add TestYAMLValidation class to catch YAML syntax errors in CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: Format with black

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Restore skyrl-train/integrations/fleet/ from before PR #21
- Restore skyrl-train/tasks/openenv-fleet-grpo.yaml
- Remove skyrl-agent fleet files (tasks/fleet, tests, examples)
- Move data and docs back to skyrl-train
- Update workflow to use skyrl-train path

This reverts the approach from PR #21 which moved Fleet integration
to skyrl-agent. The skyrl-train approach using BaseTextEnv is simpler
and better suited for Fleet's MCP tool interface.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: Add S3 checkpoint upload to prevent disk exhaustion

- Add S3CheckpointUploader module for async checkpoint upload (see the sketch below)
- Wrap trainer to upload checkpoints to S3 after each save
- Delete local checkpoints after successful upload to save disk
- Increase disk_size from 40GB to 100GB as fallback
- Add AWS credentials support in workflow and SkyPilot YAML

GitHub Secrets required:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY

S3 bucket: skyrl-checkpoints (configurable via S3_CHECKPOINT_BUCKET)
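
A hedged sketch of the upload-then-delete flow described above, using boto3; the class name S3CheckpointUploader comes from the commit message, but the method names and threading approach here are assumptions, not the actual s3_checkpoints.py API:

```python
import shutil
import threading
from pathlib import Path

import boto3


class S3CheckpointUploader:
    """Sketch of an async checkpoint uploader (names are illustrative)."""

    def __init__(self, bucket: str = "skyrl-checkpoints", prefix: str = "checkpoints"):
        self.bucket = bucket
        self.prefix = prefix
        self.s3 = boto3.client("s3")

    def upload_async(self, checkpoint_dir: str, delete_local: bool = True) -> threading.Thread:
        # Run the upload in a background thread so training is not blocked.
        thread = threading.Thread(target=self._upload, args=(checkpoint_dir, delete_local), daemon=True)
        thread.start()
        return thread

    def _upload(self, checkpoint_dir: str, delete_local: bool) -> None:
        root = Path(checkpoint_dir)
        for path in root.rglob("*"):
            if path.is_file():
                key = f"{self.prefix}/{root.name}/{path.relative_to(root)}"
                self.s3.upload_file(str(path), self.bucket, key)
        if delete_local:
            shutil.rmtree(root)  # free local disk after a successful upload
```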

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Include model name in S3 checkpoint path

* refactor: Use Path library for model name extraction

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
* docs: Simplify fleet-training.md

* fix: Make checkpoint cleanup synchronous, always clean old checkpoints before save

- Upload to S3 is now synchronous (blocking) to ensure disk space freed
- Always clean up old checkpoints BEFORE saving new one (keeps only 1 local)
- Works with or without AWS credentials (local cleanup still happens)
- Simplified entrypoint - always wraps trainer for checkpoint management

* fix: Revert to async upload, key fix is cleanup BEFORE save

The actual issue was:
1. AWS credentials not set → no S3 upload happening
2. Old checkpoints not cleaned up BEFORE saving → disk full

Fix:
- Keep async upload (non-blocking, better performance)
- Clean up old checkpoints BEFORE saving new one (key fix)
- Keep 2 local checkpoints for safety margin (~10GB)
- S3 upload deletes local after successful upload

* fix: Increase disk to 200GB and limit Ray object store memory

The first checkpoint save was failing because the disk was already 95% full
from Ray temp files, vLLM cache, and model/optimizer states before any
checkpoint was saved.

Changes:
- Increase disk_size from 100GB to 200GB
- Limit Ray object store memory to 10GB (prevents unbounded growth)
- Add --object-store-memory flag to ray start command

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: Add task_file input parameter to workflow

Allows selecting which SkyPilot YAML file to use, enabling users to test
different configurations by pushing custom YAML files.

Default: skyrl-train/tasks/openenv-fleet-grpo.yaml

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: Change task_file to task_name input parameter

Use just the task name (e.g., 'openenv-fleet-grpo') instead of full path.
Path is constructed as: skyrl-train/tasks/{task_name}.yaml

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: Fix black formatting in s3_checkpoints.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: Update fleet-training.md with task_name param and Slack channel

- Add task_name parameter mention in Quick Start
- Change Slack channel to #fleet-training-runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add --with boto3 to uv run --isolated (required for S3 uploads)
- Change Slack channel from #fleet-training-runs-test to #fleet-training-runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: Use S3 datasets instead of sample data

Downloads real task datasets from S3:
- tool_use: s3://fleet-internal-datasets/v0.1/openenv/all_tool_use.json (3,603 tasks)
- computer_use: s3://fleet-internal-datasets/v0.1/openenv/all_computer_use.json (1,278 tasks)

Changes:
- Add S3_DATASET_BUCKET env var
- Download dataset from S3 based on MODALITY
- Add validation for required env vars (FLEET_API_KEY, AWS credentials, MODALITY)
- Make AWS credentials required in workflow
- Update docs to reflect AWS is now required

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: Remove troubleshooting and creating PRs sections

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
dzorlu and others added 16 commits January 22, 2026 22:51
- Add id to "Training Run Started" step to capture message ts
- Use thread_ts in Completed/Failed messages to reply in thread

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix: Handle Fleet env reset failures gracefully

Instead of crashing on reset timeout, log error and continue:
- Add logging to fleet env.py
- On reset failure: log error, mark _init_failed=True
- In step(): if init failed, return done=True with reward=0

This allows training to continue even if some environments fail to reset; a condensed sketch of the pattern follows below.
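
A self-contained toy illustration of that failure-handling pattern (not the real FleetTaskEnv; names and return types are simplified):

```python
import logging

logger = logging.getLogger(__name__)


class FleetTaskEnvSketch:
    """Toy sketch of the reset-failure handling described above."""

    def __init__(self, task_key: str, openenv_task_env):
        self.task_key = task_key
        self.openenv_task_env = openenv_task_env
        self._init_failed = False

    def init(self):
        try:
            self.openenv_task_env.reset()
        except Exception as e:
            # Log and continue instead of crashing the rollout worker.
            logger.error(f"Failed to reset Fleet environment for task {self.task_key}: {e}")
            self._init_failed = True

    def step(self, action: str) -> dict:
        if self._init_failed:
            # Environment never came up: end the episode with zero reward so
            # training can continue with the other environments in the batch.
            return {"observations": [], "reward": 0.0, "done": True, "metadata": {"done_reason": "init_failed"}}
        # ... normal tool-call handling would go here ...
        return {"observations": [], "reward": 0.0, "done": False, "metadata": {}}
```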

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: Clean up verbose import blocks

* fix: Move logger after imports to fix ruff E402

---------

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…re (#33)

The model can still generate actions even if env reset failed.
Errors are handled in the tool execution try/except blocks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
SkyPilot will try H100 first on all clouds (Lambda, RunPod, Vast),
then fall back to B200 if no H100 capacity available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: Add per-environment metrics breakdown in WandB

- Add data_source field to Fleet dataset records (using env_key)
- Add calculate_per_source_reward_metrics() for training metrics
- Update postprocess_generator_output to compute per-env metrics
- Track data_source through GeneratedOutputGroup in async trainer

This enables WandB to show performance broken down by Fleet environment
(github, booking, reddit, etc.) for both training and eval metrics.

Metrics format (aggregation sketched below):
- reward/{env_key}/avg_score (training)
- reward/{env_key}/pass_at_N (training)
- eval/{env_key}/avg_score (evaluation)
- eval/{env_key}/pass_at_N (evaluation)
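
A small sketch of how the per-source aggregation could work; the real calculate_per_source_reward_metrics() in trainer_utils.py may differ in signature and in how pass@N is defined:

```python
from collections import defaultdict
from typing import Dict, List


def per_source_reward_metrics(
    rewards: List[float], data_sources: List[str], n_samples_per_prompt: int
) -> Dict[str, float]:
    """Group trajectory rewards by data_source (env_key) and compute avg score and pass@N.

    Sketch only: assumes trajectories for one prompt are contiguous and share a data_source.
    """
    by_source: Dict[str, List[float]] = defaultdict(list)
    for reward, source in zip(rewards, data_sources):
        by_source[source].append(reward)

    metrics: Dict[str, float] = {}
    for source, values in by_source.items():
        metrics[f"reward/{source}/avg_score"] = sum(values) / len(values)
        groups = [values[i : i + n_samples_per_prompt] for i in range(0, len(values), n_samples_per_prompt)]
        metrics[f"reward/{source}/pass_at_{n_samples_per_prompt}"] = sum(
            1 for g in groups if max(g) > 0
        ) / len(groups)
    return metrics
```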

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: Add pre-commit formatting rule to CLAUDE.md

Always run pre-commit before creating PRs to ensure code is properly formatted.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* style: Fix black formatting in trainer_utils.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add primeintellect to SkyPilot task config for H100 and B200
- Configure Prime Intellect credentials in GitHub workflow
- Add PRIME_INTELLECT_API_KEY to required secrets in docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: stratified eval split with held-out test environments

- Stratified split by environment (each env maintains train/eval ratio)
- Hash-based deterministic assignment (same task always goes to same split)
- Minimum 10 eval samples per env (otherwise all go to train)
- Held-out test envs: outlook (tool_use), instacart (computer_use)
- Document split strategy in fleet-training.md
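
A sketch of the hash-based assignment; hash_to_split is named after the helper mentioned in the follow-up test commit, and the ratio and thresholds shown are the values from this commit (later commits change them), so treat the constants as illustrative:

```python
import hashlib

MIN_EVAL_SAMPLES = 10                      # per-env threshold from this commit (later lowered to 5)
HELD_OUT_ENVS = {"outlook", "instacart"}   # held-out test environments


def hash_to_split(task_key: str, eval_ratio: float = 0.1) -> str:
    """Deterministically assign a task to 'train' or 'eval' from a hash of its key,
    so the same task always lands in the same split across runs. Sketch only."""
    digest = hashlib.sha256(task_key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "eval" if bucket < eval_ratio else "train"
```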

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: add tests for prepare_dataset stratified split

- 23 tests covering hash_to_split, _task_to_record, load_tasks_from_json
- Integration tests for held-out envs, stratified split, modality/env filters
- Test deterministic split reproducibility across runs
- Add per-environment breakdown table to summary output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: simplify to train/eval splits, fix WandB Slack links

- Remove test split - held-out envs (outlook, instacart) now go to eval
- Update Slack messages to link to specific WandB run (not just project)
- Update docs and tests to reflect train/eval only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: simplify WandB run name to fleet_{modality}_{random}

- Remove PR number and env_key from run name
- Use random 8-char hex suffix for uniqueness
- Fallback to random if RUN_ID not set (for local runs)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
WandB search URLs don't work reliably. Instead:
- Show the run name as text (e.g., fleet_tool_use_a3f2b1)
- Link to project dashboard (user can search by name)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Change eval_ratio from 10% to 2% (default in prepare_dataset.py)
- Lower MIN_EVAL_SAMPLES from 10 to 5 (threshold for creating eval split)
- Add eval_n_samples_per_prompt=3 to YAML (vs 4 for train)
- Update tests and documentation

This reduces eval time from ~50 min to ~5 min per epoch:
- Before: 366 samples × 4 trajectories = 1464 total
- After: ~88 samples × 3 trajectories = ~264 total

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Change cleanup step condition from `failure()` to `failure() || cancelled()`
- Add Slack notification for cancelled runs

Previously, cancelling via GitHub UI left GPU pods running.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Mac.localdomain>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Add new task config `openenv-fleet-grpo-qwen3-8b.yaml`:
- Model: Qwen/Qwen3-8B (8B params, instruct-tuned)
- GPUs: B200:2 (preferred) or H100:4 (fallback)

Original config unchanged (Qwen2.5-1.5B-Instruct, H100:1).

Usage:
  sky launch tasks/openenv-fleet-grpo-qwen3-8b.yaml ...

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Replace hardcoded "H100:1" with dynamic task_name in Slack
notifications and workflow summary. Each task config defines
its own GPU requirements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
4 → 8 prompts per batch (32 trajectories with n_samples=4)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
The model was not using tools - instead giving suggestions/instructions.

Root cause: No system message was being sent, and only tool names (not full
schemas) were added to the user prompt.

Fix:
- Add a system prompt that clearly explains the model is an agent that must
  execute tools
- Include full tool definitions as JSON (not just names), as sketched below
- Separate system and user messages properly
- Use concise prompt style similar to theseus orchestrator
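
A hedged sketch of the message construction after this fix; the actual system prompt wording used in the integration is not reproduced here:

```python
import json


def build_messages(task_prompt: str, tools: list) -> list:
    """Build separate system and user messages with full tool schemas embedded as JSON.

    Sketch only: the real system prompt text differs.
    """
    system_prompt = (
        "You are an agent that completes tasks by executing tools. "
        'Respond with <tool_call>{"name": "...", "arguments": {...}}</tool_call> to call a tool, '
        "and emit <done> only after the required tool calls have been executed.\n\n"
        "Available tools:\n" + json.dumps(tools, indent=2)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task_prompt},
    ]
```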

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Upload eval trajectories to S3 for persistence

Eval results (JSONL files with full multi-turn traces) were only saved locally
and lost when the cluster terminated.

Changes:
- Add upload_eval_results_to_s3() function to s3_checkpoints.py
- Call S3 upload after local dump in evaluate.py (both regular and step_wise)
- Use separate bucket: S3_TRAJECTORY_BUCKET (default: skyrl-trajectories)
- Add S3_TRAJECTORY_BUCKET env var to SkyPilot task YAMLs

S3 path: s3://skyrl-trajectories/evals/{run_name}/global_step_{N}/

The JSONL files contain full multi-turn traces with model reasoning,
tool calls, and tool results (all decoded from response_ids).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* learnings

* Fix premature <done> by improving system prompt

Model was saying <done> after thinking without calling tools.
Updated prompt to explicitly require tool calls before <done>.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add max_input_length for multi-turn context growth

- MAX_INPUT_LENGTH=24000 (fits in Qwen3-8B's 32K context)
- MAX_GENERATE_LENGTH=4096
- Prevents context truncation during long tool-use conversations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Make training fully on-policy

Set policy_mini_batch_size = train_batch_size for single optimizer step per batch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add thinking tokens documentation to learnings

Explains when/why thinking gets stripped and how to preserve it for on-policy training.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Deniz <deniz@Denizs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Qwen3 thinking models may output <done> inside <think> sections when
describing their plan, causing premature generation termination.

Now vLLM only stops on </tool_call>. The env.step() already detects
<done> in the full response to mark episode completion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

vercel bot commented Jan 25, 2026

Someone is attempting to deploy a commit to the Tyler's projects Team on Vercel.

A member of the Team first needs to authorize it.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a major new feature: integration with Fleet-hosted environments for reinforcement learning training. It adds a comprehensive set of new modules, including a custom environment, dataset preparation scripts, S3 checkpointing, and extensive documentation and tests. The change to remove <done> from vLLM stop tokens, as mentioned in the title, is a small but important part of this larger integration. My review focuses on the new Fleet integration code. I've identified a critical issue with asynchronous code handling that could lead to runtime errors, as well as some high-severity issues related to external dependencies and error handling. Additionally, there are opportunities to improve maintainability by reducing code duplication and to fix inaccuracies in documentation. Overall, this is a substantial and valuable contribution that will be ready for merging after addressing the feedback.

Comment on lines +223 to +318
    def step(self, action: str) -> BaseTextEnvStepOutput:
        """
        Execute one step in the Fleet environment.

        Parses the action for tool calls, executes via OpenEnv's FleetTaskEnv,
        and returns observation. Reward is computed by the verifier on completion.
        """
        self.turns += 1
        self.chat_history.append({"role": "assistant", "content": action})

        max_turns_reached = self.turns >= self.max_turns

        # Check if agent signals completion
        agent_done = "<done>" in action.lower() or "[done]" in action.lower()

        # Parse tool call from LLM response
        tool_call = parse_tool_call(action)

        tool_result = None
        error = None
        reward = 0.0

        # Execute tool call if present via OpenEnv
        if tool_call and self.openenv_task_env:
            # Build action dict for OpenEnv
            openenv_action = {
                "tool": tool_call["name"],
                "params": tool_call.get("arguments", {}),
                "done": agent_done,
            }

            try:
                # Use async step method
                obs, reward, done, info = asyncio.get_event_loop().run_until_complete(
                    self.openenv_task_env.step_async(openenv_action)
                )
                tool_result = obs.get("observation")
                if "tool_error" in info:
                    error = info["tool_error"]
            except Exception as e:
                error = str(e)
        elif agent_done and self.openenv_task_env:
            # Agent signaled done without tool call
            openenv_action = {"done": True}
            try:
                obs, reward, done, info = asyncio.get_event_loop().run_until_complete(
                    self.openenv_task_env.step_async(openenv_action)
                )
            except Exception as e:
                error = str(e)

        # Check if episode is done
        episode_done = agent_done or max_turns_reached

        # Build observation message
        if max_turns_reached:
            return BaseTextEnvStepOutput(
                observations=[],
                reward=reward,
                done=True,
                metadata={"done_reason": "max_turns", "task_key": self.task_key},
            )

        # Build response observation
        if error:
            obs_content = f"Error: {error}"
        elif tool_result:
            if isinstance(tool_result, dict):
                obs_content = f"Tool result:\n{json.dumps(tool_result, indent=2)}"
            else:
                obs_content = f"Tool result:\n{tool_result}"
        elif agent_done:
            obs_content = "Task marked as complete."
        elif not tool_call:
            obs_content = 'No tool call found. Use <tool_call>{"name": "...", "arguments": {...}}</tool_call> format.'
        else:
            obs_content = "Action executed."

        new_obs = {"role": "user", "content": obs_content}
        self.chat_history.append(new_obs)

        metadata = {
            "task_key": self.task_key,
            "turn": self.turns,
            "tool_call": tool_call,
            "tool_result": tool_result,
            "error": error,
            "done_reason": "agent_done" if agent_done else None,
        }

        return BaseTextEnvStepOutput(
            observations=[new_obs],
            reward=reward,
            done=episode_done,
            metadata=metadata,
        )

critical

The step method is synchronous but calls asyncio.get_event_loop().run_until_complete() to execute an async function. Since the training loop is initiated with asyncio.run(), an event loop is already running. This will cause a RuntimeError: This event loop is already running.

To fix this, the step method should be defined as async def and use await to call self.openenv_task_env.step_async(). This change might require updating the BaseTextEnv class, but it's the correct approach for handling asynchronous operations within an async application. A similar issue exists on lines 268-270.

    async def step(self, action: str) -> BaseTextEnvStepOutput:
        """
        Execute one step in the Fleet environment.

        Parses the action for tool calls, executes via OpenEnv's FleetTaskEnv,
        and returns observation. Reward is computed by the verifier on completion.
        """
        self.turns += 1
        self.chat_history.append({"role": "assistant", "content": action})

        max_turns_reached = self.turns >= self.max_turns

        # Check if agent signals completion
        agent_done = "<done>" in action.lower() or "[done]" in action.lower()

        # Parse tool call from LLM response
        tool_call = parse_tool_call(action)

        tool_result = None
        error = None
        reward = 0.0

        # Execute tool call if present via OpenEnv
        if tool_call and self.openenv_task_env:
            # Build action dict for OpenEnv
            openenv_action = {
                "tool": tool_call["name"],
                "params": tool_call.get("arguments", {}),
                "done": agent_done,
            }

            try:
                # Use async step method
                obs, reward, done, info = await self.openenv_task_env.step_async(openenv_action)
                tool_result = obs.get("observation")
                if "tool_error" in info:
                    error = info["tool_error"]
            except Exception as e:
                error = str(e)
        elif agent_done and self.openenv_task_env:
            # Agent signaled done without tool call
            openenv_action = {"done": True}
            try:
                obs, reward, done, info = await self.openenv_task_env.step_async(openenv_action)
            except Exception as e:
                error = str(e)

        # Check if episode is done
        episode_done = agent_done or max_turns_reached

        # Build observation message
        if max_turns_reached:
            return BaseTextEnvStepOutput(
                observations=[],
                reward=reward,
                done=True,
                metadata={"done_reason": "max_turns", "task_key": self.task_key},
            )

        # Build response observation
        if error:
            obs_content = f"Error: {error}"
        elif tool_result:
            if isinstance(tool_result, dict):
                obs_content = f"Tool result:\n{json.dumps(tool_result, indent=2)}"
            else:
                obs_content = f"Tool result:\n{tool_result}"
        elif agent_done:
            obs_content = "Task marked as complete."
        elif not tool_call:
            obs_content = 'No tool call found. Use <tool_call>{"name": "...", "arguments": {...}}</tool_call> format.'
        else:
            obs_content = "Action executed."

        new_obs = {"role": "user", "content": obs_content}
        self.chat_history.append(new_obs)

        metadata = {
            "task_key": self.task_key,
            "turn": self.turns,
            "tool_call": tool_call,
            "tool_result": tool_result,
            "error": error,
            "done_reason": "agent_done" if agent_done else None,
        }

        return BaseTextEnvStepOutput(
            observations=[new_obs],
            reward=reward,
            done=episode_done,
            metadata=metadata,
        )

Comment on lines +177 to +180
        except Exception as e:
            logger.error(f"Failed to reset Fleet environment for task {self.task_key}: {e}")
            self._init_failed = True
            obs = {}

high

The init method catches any exception during self.openenv_task_env.reset(), logs an error, and then continues. This can mask critical environment setup failures, leading to silent errors or unpredictable behavior later in the training process. The test test_init_reset_fails correctly expects a RuntimeError in this scenario. It would be better to re-raise the exception to make the failure explicit and prevent the system from continuing in a bad state.

Suggested change
        except Exception as e:
            logger.error(f"Failed to reset Fleet environment for task {self.task_key}: {e}")
            self._init_failed = True
            obs = {}
        except Exception as e:
            logger.error(f"Failed to reset Fleet environment for task {self.task_key}: {e}")
            raise RuntimeError(f"Failed to reset Fleet environment for task {self.task_key}: {e}") from e

uv pip install wandb boto3 awscli
# Install OpenEnv for Fleet environment access (from branch with FleetTaskEnv)
uv pip install "git+https://github.com/fleet-ai/OpenEnv.git@deniz/fleet_client" fleet-python

high

The setup script installs OpenEnv from a personal git branch (deniz/fleet_client). This introduces a dependency on a branch that may not be stable or could be deleted, making the build process fragile and difficult to reproduce. It is highly recommended to use a released version from PyPI or a stable branch from the main repository. This same issue is present on line 148.

  uv pip install openenv-client fleet-python

Comment on lines +21 to +22
- Fleet SDK repo: `/Users/deniz/repos/fleet-sdk`
- OpenEnv repo: `/Users/deniz/repos/OpenEnv`

medium

The repository paths for Fleet SDK and OpenEnv are hardcoded to a local user directory. For this documentation to be useful to all contributors, these should be replaced with the public repository URLs.

Suggested change
- Fleet SDK repo: `/Users/deniz/repos/fleet-sdk`
- OpenEnv repo: `/Users/deniz/repos/OpenEnv`
- Fleet SDK repo: https://github.com/fleet-ai/fleet-sdk
- OpenEnv repo: https://github.com/fleet-ai/OpenEnv

# Fleet Task Environment Integration for SkyRL
#
# This module provides a SkyRL-compatible environment wrapper for Fleet-hosted tasks.
# It uses the Fleet SDK directly (no OpenEnv dependency).

medium

The comment here states that the module uses the Fleet SDK directly with no OpenEnv dependency. This is incorrect, as the implementation in integrations/fleet/env.py uses OpenEnvFleetTaskEnv. The comment should be updated to accurately reflect the dependency on OpenEnv.

Suggested change
# It uses the Fleet SDK directly (no OpenEnv dependency).
# It uses OpenEnv's FleetTaskEnv as an abstraction layer.

Comment on lines +123 to +138
    # Upload to S3 if credentials are available
    try:
        from integrations.fleet.s3_checkpoints import upload_eval_results_to_s3

        run_name = getattr(cfg.trainer, "run_name", None)
        if run_name:
            upload_eval_results_to_s3(
                local_dir=str(data_save_dir),
                run_name=run_name,
                global_step=global_step,
                delete_local=False,  # Keep local copy
            )
    except ImportError:
        pass  # S3 upload not available
    except Exception as e:
        logger.warning(f"Failed to upload eval results to S3: {e}")

medium

The logic for uploading evaluation results to S3 is duplicated in both the evaluate and evaluate_step_wise functions (lines 254-269). To improve maintainability and avoid redundancy, this logic should be refactored into a separate helper function.
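
One possible shape for the shared helper the reviewer suggests (a sketch; the function name and placement are assumptions, and the body mirrors the quoted block above):

```python
def _maybe_upload_eval_results_to_s3(cfg, data_save_dir, global_step, logger) -> None:
    """Shared helper for the S3-upload block duplicated in evaluate() and evaluate_step_wise()."""
    try:
        from integrations.fleet.s3_checkpoints import upload_eval_results_to_s3

        run_name = getattr(cfg.trainer, "run_name", None)
        if run_name:
            upload_eval_results_to_s3(
                local_dir=str(data_save_dir),
                run_name=run_name,
                global_step=global_step,
                delete_local=False,  # keep the local copy
            )
    except ImportError:
        pass  # S3 upload not available
    except Exception as e:
        logger.warning(f"Failed to upload eval results to S3: {e}")
```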

@dzorlu dzorlu closed this Jan 25, 2026
@dzorlu dzorlu deleted the fix/remove-done-stop-token branch January 25, 2026 18:16