@kouroshHakha
Contributor

Summary: Integrates the HTTP-based inference layer with the training code behind a private feature flag, _SKYRL_USE_HTTP_INFERENCE. When the flag is enabled, training uses RemoteInferenceClient + ServerGroup + InferenceRouter instead of the legacy Ray actor-based inference. Both code paths remain fully functional, so the flag allows gradual rollout and validation.

Key Changes:

  1. Feature Flag (env_vars.py):
    • _SKYRL_USE_HTTP_INFERENCE env var (default: 0 = legacy path)

  2. New Config Options (ppo_base_config.yaml):
    • generator.external_proxy_url - External data plane URL (optional)
    • generator.external_server_urls - External control plane URLs (optional)
    • generator.router_port - Port for managed router (default: 8080)

  3. Config Validation (utils.py):
    • Colocated + external endpoints → Error (must use driver-managed servers)
    • Non-colocated routing logic for various external/internal combinations

  4. Updated get_inference_client() (main_base.py):
    • When flag enabled: Build VLLMServerGroup + InferenceRouter + RemoteInferenceClient
    • When flag disabled: Use legacy InferenceEngineClient (existing behavior)
    • Proper teardown of server group and router

  5. Weight Sync Integration (worker.py, broadcast_strategy.py, transfer_strategy.py, cuda_ipc_strategy.py):
    • worker.py fetches inference_world_size from client.get_world_size() for HTTP path
    • create_init_info() accepts optional inference_world_size parameter
    • HTTP path never assumes parallelism from config - always queries servers

  6. API Compatibility (remote_inference_client.py):
    • Renamed init_weight_transfer → init_weight_update_communicator
    • Renamed update_weights → update_named_weights
    • Added tags parameter to sleep()/wake_up() for colocation
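
A minimal sketch of how items 1 and 3 might fit together, not the code from this PR. The flag name, the config keys, and the "colocated + external endpoints → Error" rule come from the description above. The GeneratorCfg class, the accepted truthy values, the colocate_all parameter, and the returned dictionary are assumptions made for illustration; in particular, the actual non-colocated routing logic is not spelled out here.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Optional

FLAG_NAME = "_SKYRL_USE_HTTP_INFERENCE"  # name taken from the PR description


def use_http_inference(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when the flag is set. Default "0" selects the legacy path.

    Assumption: "1" and "true" are treated as truthy.
    """
    return environ.get(FLAG_NAME, "0").strip().lower() in ("1", "true")


@dataclass
class GeneratorCfg:
    """Hypothetical stand-in for the generator section of ppo_base_config.yaml."""
    external_proxy_url: Optional[str] = None
    external_server_urls: List[str] = field(default_factory=list)
    router_port: int = 8080


def validate_http_inference_cfg(gen: GeneratorCfg, colocate_all: bool) -> Dict[str, bool]:
    """Reject colocated + external endpoints; otherwise report what to start.

    Assumption: the driver starts whichever component is not supplied externally.
    """
    has_external = bool(gen.external_proxy_url or gen.external_server_urls)
    if colocate_all and has_external:
        raise ValueError(
            "Colocated training must use driver-managed servers; "
            "remove external_proxy_url / external_server_urls."
        )
    return {
        "start_servers": not gen.external_server_urls,
        "start_router": gen.external_proxy_url is None,
    }
```

Returning a plain dictionary keeps the sketch easy to inspect; a real validator could just as well raise on bad input and return nothing.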

Files Changed:

| File | Change |
|------|--------|
| env_vars.py | Add _SKYRL_USE_HTTP_INFERENCE feature flag |
| ppo_base_config.yaml | Add external_proxy_url, external_server_urls, router_port |
| utils.py | Add _validate_http_inference_cfg() with routing logic |
| main_base.py | Update get_inference_client() to use HTTP path when flag enabled |
| remote_inference_client.py | Rename methods for API compatibility |
| worker.py | Fetch inference_world_size from client for HTTP path |
| broadcast_strategy.py | Accept inference_world_size parameter, validate for HTTP path |
| transfer_strategy.py | Update base class signature |
| cuda_ipc_strategy.py | Accept (and ignore) inference_world_size parameter |
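
The weight-sync rows above can be illustrated with a rough sketch. Only get_world_size() and the optional inference_world_size parameter are named in this PR; the FakeHTTPClient class, the function names, the synchronous call style, and the returned structures are hypothetical and may differ from the real strategies.

```python
from dataclasses import dataclass
from typing import Optional


class FakeHTTPClient:
    """Hypothetical stand-in for a client that reports the servers' world size."""

    def __init__(self, world_size: int) -> None:
        self._world_size = world_size

    def get_world_size(self) -> int:
        # Assumption: synchronous here; the real client call may be async.
        return self._world_size


@dataclass
class BroadcastInitInfo:
    inference_world_size: int


def create_broadcast_init_info(
    config_world_size: int,
    use_http: bool,
    inference_world_size: Optional[int] = None,
) -> BroadcastInitInfo:
    """Broadcast-style strategy: on the HTTP path the queried value is required."""
    if use_http:
        if inference_world_size is None:
            raise ValueError("HTTP path requires inference_world_size from the servers")
        return BroadcastInitInfo(inference_world_size)
    # Legacy path: fall back to the parallelism derived from config.
    return BroadcastInitInfo(config_world_size)


def create_cuda_ipc_init_info(inference_world_size: Optional[int] = None) -> dict:
    """CUDA-IPC-style strategy: accepts the parameter for a uniform signature, ignores it."""
    return {}


# Worker side: query the servers instead of trusting the config.
client = FakeHTTPClient(world_size=4)
info = create_broadcast_init_info(
    config_world_size=2, use_http=True, inference_world_size=client.get_world_size()
)
```

In this sketch the config value of 2 is deliberately different from the queried value of 4, to show that the HTTP path uses only what the servers report.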

Testing:

# Legacy path (default)
pytest tests/

# New HTTP path
_SKYRL_USE_HTTP_INFERENCE=1 pytest tests/
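
To exercise both paths from a single test process, one option is a small context manager that sets and restores the flag. This is a sketch only: http_inference_flag is a hypothetical helper, and it has an effect only if the flag is read lazily at call time rather than once at import time, which this description does not confirm.

```python
import os
from contextlib import contextmanager

FLAG_NAME = "_SKYRL_USE_HTTP_INFERENCE"  # name taken from the PR description


@contextmanager
def http_inference_flag(enabled: bool):
    """Temporarily set the feature flag, restoring the previous value on exit."""
    previous = os.environ.get(FLAG_NAME)
    os.environ[FLAG_NAME] = "1" if enabled else "0"
    try:
        yield
    finally:
        if previous is None:
            # The flag was unset before; remove it again.
            os.environ.pop(FLAG_NAME, None)
        else:
            os.environ[FLAG_NAME] = previous
```

A test could then wrap client construction in `with http_inference_flag(True):` for the HTTP path and `with http_inference_flag(False):` for the legacy path.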

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Phase 2b-1: Refactors inference client creation into an overridable hook.

Changes:
- Add get_inference_client() -> InferenceEngineInterface hook in BasePPOExp
- Update _setup_trainer() to use the new hook
- Refactor DAPOExp to override get_inference_client() instead of duplicating _setup_trainer()
- Update EvalOnlyEntrypoint.run() to use the hook
- Update TerminalBenchGenerateExp._setup_generator() to use the hook
- Move strategy validation for FlashRL to main() for early failure
- Fix bug: add missing tokenizer arg in DAPOExp remote engines path

This refactor eliminates code duplication and prepares for future
RemoteInferenceClient integration (Phase 2b-2).
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
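
The hook refactor described in this commit might look roughly like the sketch below. The class and method names BasePPOExp, DAPOExp, get_inference_client(), and _setup_trainer() come from the commit message; the method bodies, LegacyClient, CustomClient, and the tokenizer value are placeholders, not the repository's code.

```python
class InferenceEngineInterface:
    """Placeholder for the interface named in the commit message."""


class LegacyClient(InferenceEngineInterface):
    """Hypothetical default client."""


class CustomClient(InferenceEngineInterface):
    """Hypothetical client that needs a tokenizer."""

    def __init__(self, tokenizer: str) -> None:
        self.tokenizer = tokenizer


class BasePPOExp:
    def get_inference_client(self) -> InferenceEngineInterface:
        # Overridable hook: subclasses change only this method.
        return LegacyClient()

    def _setup_trainer(self) -> dict:
        # Shared setup goes through the hook instead of building the client inline.
        return {"inference_client": self.get_inference_client()}


class DAPOExp(BasePPOExp):
    def get_inference_client(self) -> InferenceEngineInterface:
        # Override the hook rather than duplicating _setup_trainer().
        return CustomClient(tokenizer="placeholder-tokenizer")


base_trainer = BasePPOExp()._setup_trainer()
dapo_trainer = DAPOExp()._setup_trainer()
```

With this shape, a later change such as swapping in a different client for Phase 2b-2 would touch only get_inference_client().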