Building an On-Policy RL Algorithm to play Yahtzee
To install the dependencies, run:
pip install -r requirements.txt
Experiments are tracked with Weights & Biases. To set it up, follow the quickstart: https://docs.wandb.ai/quickstart/
To train a model, run:
python trainer.py
To run the interactive Yahtzee app, use:
python app.py
To train with the optimal configuration, run:
python trainer.py \
--batch_size 8192 \
--num_steps 10000 \
--policy_loss_coefficient 100.0 \
--value_loss_coefficient 0.01 \
--entropy_loss_coefficient 1.0 \
--use_learned_value
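The three loss coefficients above suggest a weighted actor-critic style objective. The sketch below shows one common way such terms are combined for a single decision. It is not taken from trainer.py: the function names, the signature, and the interpretation of `--use_learned_value` as "use the critic's estimate as a baseline" are all assumptions made for illustration.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def combined_loss(
    action_probs,
    action_index,
    episode_return,
    value_estimate,
    policy_loss_coefficient=100.0,
    value_loss_coefficient=0.01,
    entropy_loss_coefficient=1.0,
    use_learned_value=True,
):
    """Weighted loss for one decision (hypothetical helper, not the repo's API)."""
    # Assumption: with use_learned_value the critic's estimate acts as the
    # baseline; without it the raw return is used (plain REINFORCE).
    baseline = value_estimate if use_learned_value else 0.0
    advantage = episode_return - baseline

    # Policy term: increase the log-probability of actions with positive advantage.
    policy_loss = -math.log(action_probs[action_index]) * advantage
    # Value term: squared error between the critic's estimate and the observed return.
    value_loss = (value_estimate - episode_return) ** 2
    # Entropy term: subtracted, so a more exploratory policy lowers the loss.
    entropy = categorical_entropy(action_probs)

    return (
        policy_loss_coefficient * policy_loss
        + value_loss_coefficient * value_loss
        - entropy_loss_coefficient * entropy
    )


loss = combined_loss(
    action_probs=[0.5, 0.5],
    action_index=0,
    episode_return=10.0,
    value_estimate=8.0,
)
print(f"combined loss: {loss:.3f}")
```

In an autograd framework the advantage would normally be treated as a constant when differentiating the policy term. The relative size of the coefficients (100.0, 0.01, 1.0) then controls how strongly each term pulls on shared network parameters.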
Here are example runs using the optimal configuration:
- Run 1 - Example training run with optimal parameters
- Run 2 - Alternative training run with optimal parameters
All experiments can be compared on the Weights & Biases project page.
Planned improvements:
- Clean up the State class to group features into dicts
- Implement UI for calculation mode
- Model saving / loading
- Model store via Hugging Face
- Experiment Management / Comparison Improvements
Related resources:
- Reinforcement Learning for Yahtzee - Explores using Deep Q-Learning and Policy Gradient methods to train an AI agent to play Yahtzee
- Optimal Play in Yahtzee - Mathematical analysis of optimal Yahtzee strategies and expected values
- Yahtzee Q-Learning Implementation - Example implementation of Q-learning applied to Yahtzee