[R] Use same RNG algorithm as in other interfaces #10029
Merged
ref #9810
This PR removes the custom RNG option for the R package that uses R's own RNG in place of XGBoost's default RNG.
To still mimic the old behavior, it draws a random seed through R's RNG system when the `seed` parameter is not manually specified. In either case, the seed (whether manually passed or automatically drawn) is then used with the default random engine.
With this change, running the same call to `xgb.train` with the same data and parameters should produce the same results across interfaces.
By the way, I see that XGBoost is using a Mersenne Twister generator. This family of RNG algorithms has been found not to satisfy all the statistical properties that one expects from an RNG, and many large projects (e.g. NumPy, DuckDB, Highway, among many others) have consequently moved towards newer RNGs such as the Xoshiro and PCG families.
It might not matter much given the limited amount of random numbers that are drawn in XGBoost, but could be a potential improvement to look after.
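For illustration, the seed handling described above could be sketched roughly as follows. This is not the code from this PR: it is a Python stand-in in which `resolve_seed` and `sample_rows` are hypothetical names, `host_rng` plays the role of R's RNG, and Python's `random.Random` (itself a Mersenne Twister) plays the role of the default random engine. The 31-bit seed range is also an assumption.

```python
import random


def resolve_seed(seed=None, host_rng=None):
    """Return the explicit seed, or draw one from the host RNG if none is given.

    `host_rng` stands in for the calling language's own RNG (R's, in this PR).
    """
    if seed is not None:
        return seed
    if host_rng is None:
        host_rng = random.Random()
    # Assumption for this sketch: seeds are non-negative 31-bit integers.
    return host_rng.randrange(2**31)


def sample_rows(n_rows, n_pick, seed=None, host_rng=None):
    """Hypothetical seeded routine, e.g. picking rows for subsampling."""
    # The engine is always the same one; only the origin of the seed differs.
    engine = random.Random(resolve_seed(seed, host_rng))
    return sorted(engine.sample(range(n_rows), n_pick))


if __name__ == "__main__":
    # Explicit seed: identical output on every call.
    print(sample_rows(100, 5, seed=42) == sample_rows(100, 5, seed=42))
    # No explicit seed: reproducible as long as the host RNG state is the same.
    print(sample_rows(100, 5, host_rng=random.Random(1))
          == sample_rows(100, 5, host_rng=random.Random(1)))
```

The point of the design is that the host RNG only ever supplies a seed, so fixing the host RNG state (as `set.seed` does in R) still makes runs reproducible, while an explicit seed gives the same random stream no matter which interface passes it in.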