[skyrl-train][inference] HTTP Inference Integration (Feature-Flagged) 4/N #931

kouroshHakha · 2026-01-24T02:39:06Z

Summary: Integrates the HTTP-based inference layer with the training code behind a private feature flag, _SKYRL_USE_HTTP_INFERENCE. When the flag is enabled, training uses RemoteInferenceClient + ServerGroup + InferenceRouter instead of the legacy Ray actor-based inference. Both code paths remain fully functional, so the flag allows gradual rollout and validation.

Key Changes:

Feature Flag (env_vars.py):
- _SKYRL_USE_HTTP_INFERENCE env var (default: 0 = legacy path)
New Config Options (ppo_base_config.yaml):
- generator.external_proxy_url - External data plane URL (optional)
- generator.external_server_urls - External control plane URLs (optional)
- generator.router_port - Port for managed router (default: 8080)
Config Validation (utils.py):
- Colocated + external endpoints → Error (must use driver-managed servers)
- Non-colocated routing logic for various external/internal combinations
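
A minimal sketch of the validation rule above, not the actual `_validate_http_inference_cfg()` body. The `GeneratorCfg` dataclass, the `colocate_all` argument, the function name, and the returned mode labels are all hypothetical. The PR text only states that colocated + external endpoints is an error; the four-way mapping for the non-colocated case is one plausible reading, not a confirmed behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneratorCfg:
    """Hypothetical stand-in for the `generator` config section."""
    external_proxy_url: Optional[str] = None
    external_server_urls: Optional[List[str]] = None
    router_port: int = 8080


def sketch_validate_http_inference_cfg(colocate_all: bool, gen: GeneratorCfg) -> str:
    """Reject invalid combinations and return an (assumed) routing mode label."""
    has_proxy = bool(gen.external_proxy_url)
    has_servers = bool(gen.external_server_urls)

    # Stated in the PR: colocation requires driver-managed servers.
    if colocate_all and (has_proxy or has_servers):
        raise ValueError(
            "Colocated training cannot use external endpoints; "
            "unset external_proxy_url and external_server_urls."
        )

    # Assumed sanity check, not mentioned in the PR.
    if not 1 <= gen.router_port <= 65535:
        raise ValueError(f"router_port out of range: {gen.router_port}")

    # Assumed mapping for the non-colocated combinations.
    if has_proxy and has_servers:
        return "fully_external"
    if has_servers:
        return "external_servers_managed_router"
    if has_proxy:
        return "external_proxy_only"
    return "fully_managed"
```

Failing at config-validation time, rather than when the servers are first contacted, surfaces a misconfigured run before any GPU resources are allocated.
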
Updated get_inference_client() (main_base.py):
- When flag enabled: Build VLLMServerGroup + InferenceRouter + RemoteInferenceClient
- When flag disabled: Use legacy InferenceEngineClient (existing behavior)
- Proper teardown of server group and router
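
As a rough illustration of the flag-based dispatch described above, the sketch below reads the environment variable and picks a path. Only the variable name and its default of 0 come from this PR; the helper names and the choice to treat exactly "1" as enabled are assumptions.

```python
import os
from typing import Mapping, Optional

FLAG_NAME = "_SKYRL_USE_HTTP_INFERENCE"


def http_inference_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the flag is set to "1" (assumed truthy value)."""
    env = os.environ if environ is None else environ
    return env.get(FLAG_NAME, "0").strip() == "1"


def select_inference_path(environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical dispatcher: name the path get_inference_client() would take."""
    if http_inference_enabled(environ):
        # HTTP path: server group + router + remote client.
        return "http"
    # Legacy path: Ray actor-based InferenceEngineClient.
    return "legacy"
```

Accepting an explicit mapping keeps the helper testable without mutating the process environment.
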
Weight Sync Integration (worker.py, broadcast_strategy.py, transfer_strategy.py, cuda_ipc_strategy.py):
- worker.py fetches inference_world_size from client.get_world_size() for HTTP path
- create_init_info() accepts optional inference_world_size parameter
- HTTP path never assumes parallelism from config - always queries servers
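
The world-size handling above could look roughly like the following. This is a sketch under assumptions: `FakeClient`, `resolve_inference_world_size`, and the dictionary returned by `create_init_info` are illustrative stand-ins, and `get_world_size()` is shown as a synchronous call even though the real client method may be async.

```python
from typing import Optional


class FakeClient:
    """Hypothetical stand-in for RemoteInferenceClient."""

    def __init__(self, world_size: int) -> None:
        self._world_size = world_size

    def get_world_size(self) -> int:
        # The real client would query the inference servers.
        return self._world_size


def resolve_inference_world_size(use_http: bool, client: FakeClient) -> Optional[int]:
    """HTTP path: always ask the servers. Legacy path: defer to config (None)."""
    return client.get_world_size() if use_http else None


def create_init_info(config_world_size: int,
                     inference_world_size: Optional[int] = None) -> dict:
    """Prefer the queried world size; fall back to the config-derived value."""
    world_size = config_world_size if inference_world_size is None else inference_world_size
    if world_size <= 0:
        raise ValueError(f"inference world size must be positive, got {world_size}")
    return {"inference_world_size": world_size}
```

Keeping the parameter optional lets the legacy path call `create_init_info()` unchanged while the HTTP path passes the queried value explicitly.
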
API Compatibility (remote_inference_client.py):
- Renamed init_weight_transfer → init_weight_update_communicator
- Renamed update_weights → update_named_weights
- Added tags parameter to sleep()/wake_up() for colocation

Files Changed:

| File | Change |
| --- | --- |
| `env_vars.py` | Add `_SKYRL_USE_HTTP_INFERENCE` feature flag |
| `ppo_base_config.yaml` | Add `external_proxy_url`, `external_server_urls`, `router_port` |
| `utils.py` | Add `_validate_http_inference_cfg()` with routing logic |
| `main_base.py` | Update `get_inference_client()` to use HTTP path when flag enabled |
| `remote_inference_client.py` | Rename methods for API compatibility |
| `worker.py` | Fetch `inference_world_size` from client for HTTP path |
| `broadcast_strategy.py` | Accept `inference_world_size` parameter, validate for HTTP path |
| `transfer_strategy.py` | Update base class signature |
| `cuda_ipc_strategy.py` | Accept (and ignore) `inference_world_size` parameter |

Testing:

# Legacy path (default)
pytest tests/

# New HTTP path
_SKYRL_USE_HTTP_INFERENCE=1 pytest tests/

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Phase 2b-1: Refactors inference client creation into an overridable hook.

Changes:
- Add get_inference_client() -> InferenceEngineInterface hook in BasePPOExp
- Update _setup_trainer() to use the new hook
- Refactor DAPOExp to override get_inference_client() instead of duplicating _setup_trainer()
- Update EvalOnlyEntrypoint.run() to use the hook
- Update TerminalBenchGenerateExp._setup_generator() to use the hook
- Move strategy validation for FlashRL to main() for early failure
- Fix bug: add missing tokenizer arg in DAPOExp remote engines path

This refactor eliminates code duplication and prepares for future RemoteInferenceClient integration (Phase 2b-2).
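
The hook pattern described in this commit can be sketched as follows. The class names `BasePPOExp`, `DAPOExp`, and `InferenceEngineInterface` come from the commit message; the bodies, the `LegacyClient` and `CustomClient` classes, and the dictionary returned by `_setup_trainer()` are simplified assumptions for illustration only.

```python
class InferenceEngineInterface:
    """Minimal stand-in for the real interface."""


class LegacyClient(InferenceEngineInterface):
    """Hypothetical default client."""


class CustomClient(InferenceEngineInterface):
    """Hypothetical client used by a subclass."""


class BasePPOExp:
    def __init__(self, cfg: dict) -> None:
        self.cfg = cfg

    def get_inference_client(self) -> InferenceEngineInterface:
        # Overridable hook: subclasses change only client creation.
        return LegacyClient()

    def _setup_trainer(self) -> dict:
        # Shared setup logic calls the hook instead of building the client inline.
        client = self.get_inference_client()
        return {"cfg": self.cfg, "inference_client": client}


class DAPOExp(BasePPOExp):
    def get_inference_client(self) -> InferenceEngineInterface:
        # Override the hook rather than duplicating _setup_trainer().
        return CustomClient()
```

With this shape, a later change such as returning a RemoteInferenceClient touches only the hook, and every entrypoint that calls it picks up the new client.
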

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>


kouroshHakha added 25 commits January 18, 2026 21:23

v0

40e538b

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

common

a52b0dc

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

vllm_server_actor

6d68e2f

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

pool

d0d2990

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

d20b4bd

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

group

07f3d9f

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

1a48e61

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

e290f4b

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

tests

509538f

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Wip

555082b

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

afcc8de

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Wip

058cb95

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

7c8fc0b

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

lint

68dc4ed

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

lint

dce17d2

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

gemini fback

22c12ad

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

05bfc92

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

eca0e3d

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

bdd1d8a

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

wip

9bf4173

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Wip

058d358

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

Merge origin/kh/inference-2 (PR 904) - Add inference_servers module

da8106b

Merge remote-tracking branch 'origin/kh/inference-2b2' into kh/infere…

46c11d8

…nce-3

wip

5e93b49

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
