-
Notifications
You must be signed in to change notification settings - Fork 133
Add RayClusterHandler to enable fairseq2 run on Ray #1096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
nice contribution ! |
setup.py
Outdated
| # listed as optional in tiktoken's pyproject.toml | ||
| # (https://github.com/openai/tiktoken/blob/main/pyproject.toml#L9) | ||
| "blobfile~=3.0.0", | ||
| "ray~=2.40", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be optional. In fact, instead of making it optional, we can just do a quick import check in the cluster implementation, and raise an error advising the use to install Ray themselves. For majority of use cases, people won't be using Ray. See this as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made ray as extra. also make sure the test and existing code works without install ray.
src/fairseq2/cluster.py
Outdated
| def set_torch_distributed_variables(self) -> None: | ||
| env = self._env | ||
|
|
||
| rank_str = env.get("RANK") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can (ideally should) use the get_rank, get_local_rank, get_world_size, and get_local_world_size utility functions defined here: https://github.com/facebookresearch/fairseq2/blob/main/src/fairseq2/utils/env.py#L85
They also raise an appropriate exception that causes the process to terminate gracefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, changed to use utility functions.
23b04af to
e20c5cf
Compare
|
I would be interested in reanimating this work! |
sure, will merge master and push the latest. |
What does this PR do? Please describe:
Allow fairseq2 to run inside ray cluster.
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: