
@tyler-griggs

Summary

  • Redirects vLLM and worker infrastructure logs to a log file (/tmp/skyrl-logs/{run_name}/infra.log)
  • Keeps training progress (config, steps, metrics, rewards) on stdout for visibility
  • Adds SKYRL_LOG_DIR and SKYRL_LOG_LEVEL environment variables

How it works

  • vLLM engines and workers call redirect_actor_output_to_file() in their __init__
  • Training entrypoint does NOT redirect, so training progress reaches stdout
  • SKYRL_LOG_LEVEL=DEBUG shows all logs on stdout for debugging
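The environment-variable logic above can be sketched as a small pure function that decides where actor output should go. This is a minimal sketch, not the PR's code. The names `resolve_logging_config` and `LoggingConfig` are hypothetical, the `/tmp/skyrl-logs` default comes from the Summary, and the `INFO` default for `SKYRL_LOG_LEVEL` is an assumption. Taking the environment as a mapping keeps the function testable without touching `os.environ`.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class LoggingConfig:
    """Hypothetical container for the resolved logging settings."""
    level: str
    log_file: Optional[Path]  # None means "leave actor output on stdout"


def resolve_logging_config(run_name: str, env: Mapping[str, str] = os.environ) -> LoggingConfig:
    # Default directory taken from the PR summary; the INFO default is an assumption.
    log_dir = Path(env.get("SKYRL_LOG_DIR", "/tmp/skyrl-logs"))
    level = env.get("SKYRL_LOG_LEVEL", "INFO").upper()

    # In DEBUG mode nothing is redirected, so all logs stay on stdout.
    if level == "DEBUG":
        return LoggingConfig(level=level, log_file=None)

    # Infrastructure output goes to {SKYRL_LOG_DIR}/{run_name}/infra.log.
    return LoggingConfig(level=level, log_file=log_dir / run_name / "infra.log")
```

For example, `resolve_logging_config("gsm8k_test", env={})` yields a `log_file` of `/tmp/skyrl-logs/gsm8k_test/infra.log` (on POSIX), matching the path in the Test plan, while `env={"SKYRL_LOG_LEVEL": "DEBUG"}` yields `log_file=None`.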

Test plan

  • Run bash examples/gsm8k/run_gsm8k.sh and verify training progress on stdout
  • Verify vLLM logs go to /tmp/skyrl-logs/gsm8k_test/infra.log
  • Test SKYRL_LOG_LEVEL=DEBUG shows all logs

Known limitations

  • Ray (raylet) system logs still appear on stdout (minimal, not the noisy vLLM output)

🤖 Generated with Claude Code

- Add SKYRL_LOG_DIR and SKYRL_LOG_LEVEL environment variables
- Create ray_logging.py with redirect_actor_output_to_file() helper
- Redirect vLLM engine and worker output to log file
- Keep training progress (from skyrl_entrypoint) on stdout
- Add documentation in docs/LOGGING_IMPROVEMENTS.md

Infrastructure logs (vLLM model loading, KV cache, etc.) now go to
/tmp/skyrl-logs/{run_name}/infra.log while training progress remains
on stdout for visibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vercel

vercel bot commented Jan 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
|---|---|---|---|
| skyrl-docs | Ready | Preview, Comment | Jan 25, 2026 8:09am |

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a mechanism that separates infrastructure logs from training progress logs by redirecting the former to a file, which improves log clarity. However, the log file path construction in initialize_ray is vulnerable to path traversal: the run_name configuration value controls where the log file is created, so an attacker who controls it can create or append to files in sensitive system directories. Because log content can be influenced by model outputs, this poses a significant security risk, potentially including remote code execution. Additionally, the log redirection does not respect the SKYRL_LOG_LEVEL=DEBUG setting, and the log file setup in initialize_ray should be conditional on the log level so that debug mode does not produce an empty log file and a misleading message.

Comment on lines +663 to +665
```python
log_dir = Path(SKYRL_LOG_DIR) / cfg.trainer.run_name
log_dir.mkdir(parents=True, exist_ok=True)
log_file = str(log_dir / "infra.log")
```


Priority: high (security)

The initialize_ray function builds the log_dir and log_file paths from SKYRL_LOG_DIR and cfg.trainer.run_name without validation. This is a path traversal vulnerability: a malicious run_name such as ../../etc/cron.daily resolves outside SKYRL_LOG_DIR and could lead to arbitrary file creation in sensitive system directories. Since log content can be influenced by model outputs, this poses a significant risk, including potential arbitrary code execution if the file lands in an executable location such as cron.daily. To remediate this, resolve the constructed path and verify that it stays within the intended base directory.

Separately, the current implementation sets up log file redirection regardless of SKYRL_LOG_LEVEL. When SKYRL_LOG_LEVEL is DEBUG, an empty log file is still created and a potentially misleading message is printed. The log file setup should only run when SKYRL_LOG_LEVEL is not DEBUG.
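The remediation the reviewer describes (resolve the path, then check that it stays under the base directory) can be sketched as follows. `safe_run_log_dir` is a hypothetical helper, not part of the PR, and `Path.is_relative_to` requires Python 3.9 or later. Comparing path components rather than string prefixes avoids treating a sibling such as `/tmp/skyrl-logs-evil` as if it were inside `/tmp/skyrl-logs`.

```python
from pathlib import Path


def safe_run_log_dir(base_dir: str, run_name: str) -> Path:
    """Return base_dir/run_name, refusing run names that escape base_dir.

    Hypothetical helper: resolve first (collapsing "..", absolute paths and
    existing symlinks), then check containment on the resolved paths.
    """
    base = Path(base_dir).resolve()
    candidate = (base / run_name).resolve()
    # is_relative_to compares path components, not string prefixes.
    # Rejecting candidate == base (e.g. run_name ".") is a choice of this
    # sketch, so every run gets its own subdirectory.
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError(f"run_name {run_name!r} resolves outside {base}")
    return candidate
```

In initialize_ray, the returned path would stand in for the unchecked `Path(SKYRL_LOG_DIR) / cfg.trainer.run_name` before `mkdir(parents=True, exist_ok=True)` is called, so a value like `../../etc/cron.daily` fails fast with a `ValueError` instead of creating a directory.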

Comment on lines +12 to +27
```python
def redirect_actor_output_to_file():
    """
    Redirect stdout and stderr to log file to prevent Ray from forwarding to driver.

    Call this at the very start of any Ray actor/remote function where you want
    to suppress output from appearing on the driver's stdout. The output will
    instead be written to the log file specified by SKYRL_LOG_FILE.

    Note: Do NOT call this in skyrl_entrypoint() - training progress should
    go to stdout.
    """
    log_file = os.getenv("SKYRL_LOG_FILE")
    if log_file:
        log_fd = open(log_file, "a", buffering=1)  # noqa: SIM115
        os.dup2(log_fd.fileno(), sys.stdout.fileno())
        os.dup2(log_fd.fileno(), sys.stderr.fileno())
```


Priority: high

The current implementation unconditionally redirects stdout/stderr if SKYRL_LOG_FILE is set. However, the documentation and PR description state that setting SKYRL_LOG_LEVEL=DEBUG should show all logs on stdout. To align with this behavior, you should avoid redirection when the log level is 'DEBUG'.

```python
def redirect_actor_output_to_file():
    """
    Redirect stdout and stderr to log file to prevent Ray from forwarding to driver.

    Call this at the very start of any Ray actor/remote function where you want
    to suppress output from appearing on the driver's stdout. The output will
    instead be written to the log file specified by SKYRL_LOG_FILE.

    Note: Do NOT call this in skyrl_entrypoint() - training progress should
    go to stdout.
    """
    from skyrl_train.env_vars import SKYRL_LOG_LEVEL

    if SKYRL_LOG_LEVEL == "DEBUG":
        return

    log_file = os.getenv("SKYRL_LOG_FILE")
    if log_file:
        log_fd = open(log_file, "a", buffering=1)  # noqa: SIM115
        os.dup2(log_fd.fileno(), sys.stdout.fileno())
        os.dup2(log_fd.fileno(), sys.stderr.fileno())
```
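Because `os.dup2` rewires file descriptors 1 and 2 themselves rather than rebinding `sys.stdout`, output written directly to those descriptors (for example, by C/C++ extensions that bypass Python's `sys.stdout`) is captured too. The sketch below is a hypothetical, test-friendly variant that restores the original descriptors on exit; the name `redirect_fds_to_file` is an assumption, not part of the PR, and unlike the suggested helper it does not check `SKYRL_LOG_LEVEL`.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_fds_to_file(path: str):
    """Temporarily point fds 1 and 2 at `path`, then restore them.

    Hypothetical variant of redirect_actor_output_to_file() for tests.
    """
    # Flush Python-level buffers so earlier output reaches the original target.
    sys.stdout.flush()
    sys.stderr.flush()
    saved_out, saved_err = os.dup(1), os.dup(2)
    try:
        with open(path, "a") as log:
            # dup2 copies the descriptor, so fds 1 and 2 stay pointed at the
            # file even after this `with` block closes `log`.
            os.dup2(log.fileno(), 1)
            os.dup2(log.fileno(), 2)
        yield
    finally:
        # Flush anything buffered during the block into the log file first.
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        os.close(saved_out)
        os.close(saved_err)
```

In a long-lived Ray actor the permanent, non-restoring form from the review is the one that matches the PR's intent; the restoring form is mainly convenient for unit tests that must not lose the test runner's own stdout.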
