
Fix merge conflicts [skip ci] #3892

Merged
merged 9 commits on May 24, 2021

Commits on May 19, 2021

  1. Remove Base.enable_rmm_pool method as it is no longer needed (#3875)

    This should resolve the confusion reported in rapidsai/raft#228. Tagging @dantegd for review.
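    
    With `Base.enable_rmm_pool` removed, a memory pool is presumably configured through RMM directly before constructing estimators. A minimal sketch, assuming standard RMM usage (the pool size below is an illustrative value, not taken from this PR):
    
    ```python
    # Hedged sketch: configure an RMM pool allocator directly instead of the
    # removed Base.enable_rmm_pool helper. The pool size is an arbitrary example.
    import rmm
    
    rmm.reinitialize(
        pool_allocator=True,        # use a pooled device memory resource
        initial_pool_size=2 << 30,  # e.g. start with a 2 GiB pool
    )
    ```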
    
    Authors:
      - Thejaswi. N. S (https://github.com/teju85)
    
    Approvers:
      - Dante Gama Dessavre (https://github.com/dantegd)
    
    URL: #3875
    teju85 authored May 19, 2021
    Commit: 3be1545

Commits on May 20, 2021

  1. Deterministic UMAP with floating point rounding. (#3848)

    Use floating point rounding to make the UMAP optimization deterministic. This is a breaking change, as the batch size parameter is removed.
    
    * Add procedure for rounding the gradient updates.
    * Add buffer for gradient updates.
    * Add an internal parameter `deterministic`, which should be set to `true` when `random_state` is set.
    
    The test file is removed due to #3849.
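    
    A minimal usage sketch, assuming only the public behaviour described above (passing `random_state` routes through the new deterministic path), so repeated fits produce identical embeddings:
    
    ```python
    # Hedged sketch: with random_state fixed, repeated UMAP fits are expected
    # to produce identical embeddings after this change.
    import numpy as np
    import cuml
    
    X = np.random.RandomState(0).rand(1000, 16).astype(np.float32)
    
    emb1 = cuml.UMAP(random_state=42).fit_transform(X)
    emb2 = cuml.UMAP(random_state=42).fit_transform(X)
    assert np.array_equal(emb1, emb2)
    ```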
    
    Authors:
      - Jiaming Yuan (https://github.com/trivialfis)
    
    Approvers:
      - Corey J. Nolet (https://github.com/cjnolet)
    
    URL: #3848
    trivialfis authored May 20, 2021
    Commit: 99a80c8
  2. Update CHANGELOG.md links for calver (#3883)

    This PR updates the `0.20` references in `CHANGELOG.md` to be `21.06`.
    
    Authors:
      - AJ Schmidt (https://github.com/ajschmidt8)
    
    Approvers:
      - Dillon Cullinan (https://github.com/dillon-cullinan)
    
    URL: #3883
    ajschmidt8 authored May 20, 2021
    Commit: 0b33f9d
  3. Make sure __init__ is called in graph callback. (#3881)

    I made this mistake myself and got a segmentation fault; raising a ValueError would be a nicer failure mode.
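    
    A minimal sketch of the pitfall; the base-class name and import path (`cuml.internals.GraphBasedDimRedCallback`) are assumptions here, not taken from this PR:
    
    ```python
    # Hedged sketch: a callback subclass that forgets super().__init__()
    # previously segfaulted; after this change the expectation is a clear
    # Python-level error instead.
    from cuml.internals import GraphBasedDimRedCallback  # assumed import path
    
    class BrokenCallback(GraphBasedDimRedCallback):
        def __init__(self):
            # Bug: missing super().__init__(), so the underlying callback
            # state is never initialized.
            self.embeddings = []
    
    class FixedCallback(GraphBasedDimRedCallback):
        def __init__(self):
            super().__init__()  # initialize the underlying callback first
            self.embeddings = []
    ```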
    
    Authors:
      - Jiaming Yuan (https://github.com/trivialfis)
    
    Approvers:
      - Corey J. Nolet (https://github.com/cjnolet)
    
    URL: #3881
    trivialfis authored May 20, 2021
    Commit: b7a634a
  4. AgglomerativeClustering: support single cluster and ignore only zero distances from self-loops (#3824)
    
    Closes #3801 
    Closes #3802 
    
    Corresponding RAFT PR: rapidsai/raft#217
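    
    A minimal usage sketch of the newly supported single-cluster case (the data and parameter values are illustrative, not from this PR):
    
    ```python
    # Hedged sketch: n_clusters=1 is now accepted by AgglomerativeClustering.
    import numpy as np
    from cuml.cluster import AgglomerativeClustering
    
    X = np.random.RandomState(0).rand(50, 4).astype(np.float32)
    
    model = AgglomerativeClustering(n_clusters=1).fit(X)
    # Every sample should end up in the same (single) cluster.
    assert len(np.unique(np.asarray(model.labels_))) == 1
    ```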
    
    Authors:
      - Corey J. Nolet (https://github.com/cjnolet)
    
    Approvers:
      - Dante Gama Dessavre (https://github.com/dantegd)
    
    URL: #3824
    cjnolet authored May 20, 2021
    Commit: 05124c4

Commits on May 21, 2021

  1. Fix RF regression performance (#3845)

    This PR rewrites the mean squared error objective. The mean squared error is much easier to compute when factored algebraically into a slightly different form (a short sketch is given below). This should bring regression performance in line with classification.
    
    I've also removed the MAE objective, as it is not correct: leaf predictions with MAE use the mean, whereas the correct minimiser is the median. See also scikit-learn's implementation, where streaming median calculations are required: https://github.com/scikit-learn/scikit-learn/blob/de1262c35e2aa4ee062d050281ee576ce9e35c94/sklearn/tree/_criterion.pyx#L976.
    
    Implementing this correctly for GPU would be very challenging.
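    
    For intuition, the factorization this likely relies on (the exact kernel-level form is an assumption) is the standard identity that a node's sum of squared errors can be accumulated from the running sums of `y` and `y*y`, avoiding a second pass over the residuals:
    
    ```python
    # Hedged sketch: sum((y - mean)^2) == sum(y^2) - n * mean^2, so the MSE of
    # a candidate split can be computed from per-sample sums in a single pass.
    import numpy as np
    
    y = np.random.RandomState(0).rand(1000)
    n, s, s2 = len(y), y.sum(), (y * y).sum()
    
    sse_direct = ((y - y.mean()) ** 2).sum()
    sse_factored = s2 - s * s / n
    
    assert np.isclose(sse_direct, sse_factored)
    ```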
    
    Performance before:
    ![rf_regression_perf](https://user-images.githubusercontent.com/7307640/117608125-8c884280-b1b1-11eb-8cb4-e92f39dad0f3.png)
    After:
    ![rf_regression_perf_fix](https://user-images.githubusercontent.com/7307640/117608145-94e07d80-b1b1-11eb-939f-b96cafbd3e35.png)
    
    Script:
    ```python
    from cuml import RandomForestRegressor as cuRF
    from sklearn.ensemble import RandomForestRegressor as sklRF
    from sklearn.datasets import make_regression
    from sklearn.metrics import mean_squared_error
    import numpy as np
    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt
    import seaborn as sns
    import time
    
    matplotlib.use("Agg")
    sns.set()
    
    X, y = make_regression(n_samples=100000, random_state=0)
    X = X.astype(np.float32)
    y = y.astype(np.float32)
    rs = np.random.RandomState(92)
    df = pd.DataFrame(columns=["algorithm", "Time(s)", "MSE"])
    d = 10
    n_repeats = 5
    bootstrap = False
    max_samples = 1.0
    max_features = 0.5
    n_estimators = 10
    n_bins = min(X.shape[0], 128)
    for _ in range(n_repeats):
        clf = sklRF(
            n_estimators=n_estimators,
            max_depth=d,
            random_state=rs,
            max_features=max_features,
            bootstrap=bootstrap,
            max_samples=max_samples if max_samples < 1.0 else None,
        )
    
        start = time.perf_counter()
        clf.fit(X, y)
        skl_time = time.perf_counter() - start
        pred = clf.predict(X)
        cu_clf = cuRF(
            n_estimators=n_estimators,
            max_depth=d,
            random_state=rs.randint(0, 1 << 32),
            n_bins=n_bins,
            max_features=max_features,
            bootstrap=bootstrap,
            max_samples=max_samples,
            use_experimental_backend=True,
        )
    
        start = time.perf_counter()
        cu_clf.fit(X, y)
        cu_time = time.perf_counter() - start
        cu_pred = cu_clf.predict(X, predict_model="CPU")
        df = df.append(
            {
                "algorithm": "cuml",
                "Time(s)": cu_time,
                "MSE": mean_squared_error(y, cu_pred),
            },
            ignore_index=True,
        )
        df = df.append(
            {
                "algorithm": "sklearn",
                "Time(s)": skl_time,
                "MSE": mean_squared_error(y, pred),
            },
            ignore_index=True,
        )
    print(df)
    fig, ax = plt.subplots(1, 2)
    sns.barplot(data=df, x="algorithm", y="Time(s)", ax=ax[0])
    sns.barplot(data=df, x="algorithm", y="MSE", ax=ax[1])
    plt.savefig("rf_regression_perf_fix.png")
    ```
    
    Authors:
      - Rory Mitchell (https://github.com/RAMitchell)
    
    Approvers:
      - Philip Hyunsu Cho (https://github.com/hcho3)
      - Thejaswi. N. S (https://github.com/teju85)
      - John Zedlewski (https://github.com/JohnZed)
    
    URL: #3845
    RAMitchell authored May 21, 2021
    Commit: cb6ef52

Commits on May 22, 2021

  1. Commit: ea662e8
  2. Fix MNMG test test_rf_regression_dask_fil (#3830)

    See #3820
    
    Authors:
      - Philip Hyunsu Cho (https://github.com/hcho3)
    
    Approvers:
      - Victor Lafargue (https://github.com/viclafargue)
      - Dante Gama Dessavre (https://github.com/dantegd)
    
    URL: #3830
    hcho3 authored May 22, 2021
    Commit: 3e89f04

Commits on May 24, 2021

  1. Commit: b52db0b