XGBoost-Ray provides a drop-in replacement for XGBoost's train function. To pass data, use an xgboost_ray.RayDMatrix instead of an xgb.DMatrix.
Distributed training parameters are configured with an xgboost_ray.RayParams object. For instance, you can set num_actors to specify how many distributed actors to use.
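As a minimal sketch of the drop-in usage (the dataset and parameter values are purely illustrative, and a running Ray cluster or local Ray instance is assumed):

```python
from sklearn.datasets import load_breast_cancer
from xgboost_ray import RayDMatrix, RayParams, train

data, labels = load_breast_cancer(return_X_y=True)

# Use a RayDMatrix where you would normally use an xgb.DMatrix.
train_set = RayDMatrix(data, labels)

# RayParams configures the distributed setup, e.g. two actors with one CPU each.
ray_params = RayParams(num_actors=2, cpus_per_actor=1)

# xgboost_ray.train is called like xgb.train, plus the ray_params argument.
bst = train(
    {"objective": "binary:logistic", "eval_metric": ["logloss", "error"]},
    train_set,
    num_boost_round=10,
    ray_params=ray_params,
)
```

The returned booster behaves like a regular xgboost.Booster, so it can be saved and used for prediction as usual.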
XGBoost-Ray also features a scikit-learn API that fully mirrors the pure XGBoost scikit-learn API, providing a complete drop-in replacement. Available estimators include RayXGBClassifier, RayXGBRegressor, and RayXGBRanker.
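A short sketch of the scikit-learn style usage (the dataset and hyperparameters are illustrative, and passing ray_params to fit is assumed here as the way to control distribution):

```python
from sklearn.datasets import load_breast_cancer
from xgboost_ray import RayParams, RayXGBClassifier

X, y = load_breast_cancer(return_X_y=True)

clf = RayXGBClassifier(n_estimators=50, objective="binary:logistic")

# fit mirrors XGBClassifier.fit; ray_params configures the distributed actors.
clf.fit(X, y, ray_params=RayParams(num_actors=2))

preds = clf.predict(X)
```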
The RayDMatrix lazily loads data and stores it in sharded form in the Ray object store. The Ray XGBoost actors then access these shards to run their training.
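For example, a RayDMatrix can point at data files that are only read once training starts; the file paths and label column below are hypothetical:

```python
from xgboost_ray import RayDMatrix

# Nothing is read here yet: the files are loaded lazily, sharded into the
# Ray object store, and each training actor pulls only its assigned shards.
train_set = RayDMatrix(
    ["data/part-0.parquet", "data/part-1.parquet"],  # hypothetical file paths
    label="target",  # hypothetical label column name
)
```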
XGBoost-Ray integrates with Ray Tune to provide distributed hyperparameter tuning for your distributed XGBoost models. You can run multiple XGBoost-Ray training runs in parallel, each with a different hyperparameter configuration and each itself parallelized. All you have to do is move your training code into a function and pass that function to tune.run. Internally, train detects whether Tune is being used and automatically reports results to Tune.
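A minimal sketch of this pattern (the dataset, search space, and trial count are illustrative, and RayParams.get_tune_resources() is assumed to be available for reserving per-trial resources):

```python
from ray import tune
from sklearn.datasets import load_breast_cancer
from xgboost_ray import RayDMatrix, RayParams, train

ray_params = RayParams(num_actors=2, cpus_per_actor=1)

def train_model(config):
    data, labels = load_breast_cancer(return_X_y=True)
    train_set = RayDMatrix(data, labels)
    # train() detects that it runs inside a Tune trial and reports results to Tune.
    train(
        config,
        train_set,
        evals=[(train_set, "train")],
        num_boost_round=10,
        ray_params=ray_params,
    )

analysis = tune.run(
    train_model,
    config={
        "objective": "binary:logistic",
        "eta": tune.loguniform(1e-4, 1e-1),
        "max_depth": tune.randint(1, 9),
    },
    # Reserve enough resources for the distributed actors of each trial.
    resources_per_trial=ray_params.get_tune_resources(),
    num_samples=4,
)
```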
XGBoost-Ray leverages the stateful Ray actor model to enable fault-tolerant training. Two modes are currently implemented.
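As an illustration, assuming the two modes correspond to the non-elastic and elastic fault-tolerance settings exposed on RayParams, the configuration might look like this (parameter values are illustrative):

```python
from xgboost_ray import RayParams

# Non-elastic mode (warm restart): if an actor dies, the actors are restarted
# and training resumes from the latest checkpoint with the full actor count.
non_elastic = RayParams(num_actors=4, max_actor_restarts=2)

# Elastic mode: training continues on the surviving actors while the failed
# actors are brought back in the background.
elastic = RayParams(
    num_actors=4,
    elastic_training=True,
    max_failed_actors=1,
    max_actor_restarts=2,
)
```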