@dongwang218 dongwang218 commented Mar 28, 2025

What does this PR do? Please describe:
Allow fairseq2 to run inside a Ray cluster.

Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.

Check list:

  • [N] Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • [Y] Did you read the contributor guideline?
  • [Y] Did you make sure that your PR does only one thing instead of bundling different changes together?
  • [Y] Did you make sure to update the documentation with your changes? (if necessary)
  • [Y] Did you write any new necessary tests?
  • [Y] Did you verify new and existing tests pass locally with your changes?
  • [N] Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@dongwang218 dongwang218 requested a review from cbalioglu as a code owner March 28, 2025 02:22
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 28, 2025
@dongwang218 dongwang218 requested a review from uralik March 28, 2025 02:23
Contributor

artemru commented Mar 28, 2025

Nice contribution!
Can you add a simple integration test and some docs so everyone can start using this Ray integration?

setup.py Outdated
# listed as optional in tiktoken's pyproject.toml
# (https://github.com/openai/tiktoken/blob/main/pyproject.toml#L9)
"blobfile~=3.0.0",
"ray~=2.40",
Contributor

I believe this should be optional. In fact, instead of making it optional, we can just do a quick import check in the cluster implementation and raise an error advising the user to install Ray themselves. For the majority of use cases, people won't be using Ray. See this as an example.
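
A minimal sketch of the import check suggested above, assuming a hypothetical helper `require_module`, a hypothetical error type `MissingOptionalDependencyError`, and a hypothetical entry point `create_ray_cluster_handler`. None of these names come from fairseq2 or Ray; only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


class MissingOptionalDependencyError(RuntimeError):
    """Hypothetical error type; not part of fairseq2 or Ray."""


def require_module(name: str, install_hint: str) -> None:
    """Raise a descriptive error if the top-level module `name` cannot be found."""
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(name) is None:
        raise MissingOptionalDependencyError(
            f"'{name}' is required for this feature but is not installed. "
            f"Install it with: {install_hint}"
        )


def create_ray_cluster_handler() -> str:
    """Hypothetical stand-in for the place where the cluster code first needs Ray."""
    require_module("ray", "pip install ray")
    import ray  # deferred so that importing this file never requires Ray

    return ray.__version__
```

With this shape, users who never touch the Ray code path pay nothing, and those who do get an actionable message instead of a bare `ModuleNotFoundError`.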

Author

Made ray an extra. Also made sure the tests and existing code work without installing Ray.
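
For readers unfamiliar with extras, moving a pinned requirement into one usually looks roughly like the excerpt below. Only the `ray~=2.40` and `blobfile~=3.0.0` pins come from the diff in this thread; the extra name `ray` and the variable layout are assumptions, not the PR's actual change.

```python
# Hypothetical setup.py excerpt; the extra name "ray" is an assumption.
install_requires = [
    "blobfile~=3.0.0",
]

extras_require = {
    # Installed only when the user explicitly asks for the extra.
    "ray": ["ray~=2.40"],
}

# These lists would be passed on as
# setuptools.setup(install_requires=..., extras_require=...).
```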

def set_torch_distributed_variables(self) -> None:
    env = self._env

    rank_str = env.get("RANK")
Contributor

You can (ideally should) use the get_rank, get_local_rank, get_world_size, and get_local_world_size utility functions defined here: https://github.com/facebookresearch/fairseq2/blob/main/src/fairseq2/utils/env.py#L85

They also raise an appropriate exception that causes the process to terminate gracefully.
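
To illustrate why such helpers are preferable to raw `env.get("RANK")` calls, here is a self-contained sketch of the pattern. It is not fairseq2's implementation: `InvalidEnvVarError`, `read_int_env`, `read_rank`, and `read_world_size` are hypothetical names, and the defaults (rank 0, world size 1) are assumptions.

```python
from collections.abc import Mapping


class InvalidEnvVarError(Exception):
    """Hypothetical error raised when a variable holds an unusable value."""


def read_int_env(
    env: Mapping[str, str], name: str, default: int, *, minimum: int = 0
) -> int:
    """Parse `name` from `env` as an integer, validating it in one place."""
    value = env.get(name)
    if value is None:
        return default

    try:
        number = int(value)
    except ValueError:
        raise InvalidEnvVarError(
            f"{name} must be an integer, but is '{value}'."
        ) from None

    if number < minimum:
        raise InvalidEnvVarError(
            f"{name} must be at least {minimum}, but is {number}."
        )

    return number


def read_rank(env: Mapping[str, str]) -> int:
    # Assumed default: a single-process run is rank 0.
    return read_int_env(env, "RANK", 0)


def read_world_size(env: Mapping[str, str]) -> int:
    # Assumed default: a single-process run has world size 1.
    return read_int_env(env, "WORLD_SIZE", 1, minimum=1)
```

Centralizing the parsing means every caller gets the same validation and the same error type, which a top-level handler can catch to shut the process down cleanly.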

Author

Thanks, changed to use the utility functions.

@cbalioglu cbalioglu force-pushed the dong/launch_via_ray_main branch from 23b04af to e20c5cf Compare April 16, 2025 15:36
Contributor

artemru commented Dec 26, 2025

I would be interested in reanimating this work!

@dongwang218
Author

> I would be interested in reanimating this work!

Sure, I will merge master and push the latest changes.
