[Review] Random Forest & Decision Tree Regression + major updates to Classification #635
Conversation
vishalmehta1991 commented May 27, 2019 (edited)
- Fixing DT classification number-of-features limitation; major fix for RF classifier against branch-0.7
- Adding regression decision tree and random forest
- Adding entropy for classification; MSE/MAE for regression
- Base classes for RF and DT, derived classes for regression and classification
- Various bug fixes
- Added rfRegressor class to random forest, along with relevant stateless API functions. Implementation currently commented out as DecisionTreeRegressor is not yet implemented. - Updated RF_metrics struct to include regression metrics too. Metrics supported are mean absolute error, mean squared error and median absolute error as per SKL-rf.
- Added a base dt class, similar to what we had in rf. - Added a DecisionTreeRegressor class. API only, no implementation for now.
- Further templated TreeNode to work for both regression and classification. - Moved all predict methods to the base dt class.
…ion and metric info
- Further templated TemporaryMemory (for regression labels) + other minor fixes. - Moved split_branch to base dt class. - Placeholder methods for DecisionTreeRegressor.
- RFRegressor flat API signature fixes. - Minor tempmem fixes. - Temp addition to rf_test for regression testing. Very simple test. - Debugging ongoing.
- DecisionTreeRegressor: supports MSE (mean squared error) or MAE (mean absolute error) as split criterion. - DecisionTreeClassifier: default and only option is GINI.
… for classification
- Removed useless memory allocation from memory.cuh - Added temporary testing folder in randomforest w/ testing script for both regression and classification. This directory will be removed prior to any merge request.
- Preprocess quantiles in batches of batch_cols if there isn't enough device memory to process all ncols at once. - The number of columns per batch is dynamically determined based on the available device memory.
- Updated all our cudaMemcpy calls to updateDevice, updateHost, or updateAsync as appropriate. - Copied recent iota -> permute change from rfRegressor to rfClassifier fit too.
- Also minor update to minmax prim code. - minmax prim use in dt commented out.
- Calling find_best_fruit_all is unnecessary when a node will be considered a leaf due to depth (or max leaves) constraints. - Added stream to cub call in memory.cuh
- Do not update # bins for GLOBAL_QUANTILE when # rows per node < # bins.
Can one of the admins verify this patch?
- Added batching support to minmaxKernel to enable datasets with a large number of features. - Added an extra testcase where ncols wouldn't previously fit in the available shared memory.
…hta1991/cuml into fea-ext-randomforest_regression
…ndomforest_regression - Also run clang-format on files.
- Also more clang formatting changes that the previous commit missed.
- Examples run but accuracy is low. TODO Debug - Corrected doxygen comments in randomforest.cu
…ndomforest_regression - Updates to minmax prim.
…pmem optimization
…ndomforest_regression - Also, updated python wrapper w/ quantile_per_tree.
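One of the commits above batches the quantile preprocessing over columns when the full dataset does not fit in device memory, with the batch size derived from available memory. A minimal sketch of how such a per-batch column count might be computed (hypothetical names and a simplified memory model, not the actual cuML code):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: choose how many columns fit in one preprocessing
// batch given the free device memory. The real cuML code may also account
// for scratch buffers and alignment; this only shows the batching idea.
std::size_t batch_cols(std::size_t free_bytes, std::size_t n_rows,
                       std::size_t elem_size, std::size_t n_cols) {
  std::size_t per_col = n_rows * elem_size;  // bytes needed per column
  std::size_t fit = free_bytes / per_col;    // columns that fit at once
  fit = std::max<std::size_t>(fit, 1);       // always make some progress
  return std::min(fit, n_cols);              // never exceed total columns
}
```

Quantile preprocessing would then loop over `ncols` in chunks of `batch_cols(...)` instead of requiring all columns resident at once.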
It's a huge PR and hard to go over every line of code, but overall LGTM. Please check the two comments I left.
//Cleanup
selected_rows.release(stream);
trees[i].fit(user_handle, input, n_cols, n_rows, labels,
I think decision tree building should be parallelized to get better speedup. It should be relatively straightforward using OpenMP with multiple streams. It can be done in a separate PR.
Agreed, but as per the cuML developer guide we want to keep the algorithms single-threaded. OpenMP can be compiler-specific.
@vishalmehta1991, some of us had a discussion about this on #694. When I get some time, I'm going to submit a PR so that we can make the language in the developer guide clearer.
This is a case that, I believe, would warrant the use of threading, since building trees serially has such a massive impact on the performance of our application. This should be made possible by synchronizing the main stream before your for loop, creating new streams in the threads, and synchronizing them while still in the loop. I have also verified with the CUDA team that the CUDA API is, in fact, thread-safe, so you should not have a problem creating and submitting CUDA kernels from separate threads.
So long as you synchronize the main stream before the for loop, you should not have any problems maintaining thread-safety.
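The pattern described here could be sketched roughly as below. This is an illustration under the assumptions above, not the PR's actual implementation; `fit_forest` and the commented `trees[i].fit(...)` call are stand-ins for the real per-tree build code.

```cuda
#include <cuda_runtime.h>
#include <omp.h>

// Hedged sketch: synchronize the main stream first, then let each OpenMP
// thread build one tree on its own CUDA stream.
void fit_forest(cudaStream_t main_stream, int n_trees) {
  // 1. Ensure all prior work on the main stream (e.g. data transfers)
  //    is complete before worker threads launch kernels on other streams.
  cudaStreamSynchronize(main_stream);

  // 2. Trees are independent, so each iteration can run on its own thread
  //    and stream; the CUDA API itself is thread-safe.
  #pragma omp parallel for
  for (int i = 0; i < n_trees; ++i) {
    cudaStream_t s;
    cudaStreamCreate(&s);
    // trees[i].fit(..., s);      // submit this tree's kernels on stream s
    cudaStreamSynchronize(s);     // wait for this tree before cleaning up
    cudaStreamDestroy(s);
  }
}
```

The per-thread `cudaStreamSynchronize` keeps each tree's work contained within its loop iteration, so no stream outlives the parallel region.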
Hi @cjnolet, I am completely aware of the CUDA API and its thread support. That is not the concern. My concern is how OpenMP is treated across a range of compilers. To be compliant with all the compilers that support OpenMP, we would need to test them.
But again, from a developer's point of view, adding OpenMP is easy and simple. I would be up for it if the RAPIDS developer guide allows me to.
Tagging @jirikraus & @teju85 for additional feedback / suggestions of ways we can design this to build trees in parallel.
@vishalmehta1991, I would like to learn more about the problems related to OpenMP compiler support. Would you mind sharing some references on this problem?
I'm concerned that we're seeing signs that the current implementation's performance is bounded by the number of available CPU cores, as a result of the kernels being submitted in a single stream.
Seriously, what is the real issue here with OpenMP or any other multi-threading library? OpenMP was used in KNN and I don't recall any problem. It's obvious that we need to build the trees in parallel. What do you suggest if you don't want to use OpenMP or a multi-threading library? The only other option I can think of is to redesign and rewrite the whole algorithm from scratch to build the trees in parallel, which would require a significant amount of work. Any other suggestions?
Personally, I don't mind threads; it's embarrassingly parallel. If you want, we can do OpenMP in a separate PR (this PR is too big) and also open a PR to change the cuML developer guide, so that we adhere to cuML guidelines.
Let's merge this PR and try OpenMP in a separate PR. We can change the developer guideline if the OpenMP PR gives us good results.
The wording that is being discussed for our changes to the developer guide:
With the exception of the cumlHandle, cuML algorithms should maintain thread-safety and are, in general, assumed to be single threaded. Exceptions are made for algorithms that can take advantage of multiple CUDA streams in order to oversubscribe or increase occupancy on a single GPU. In these cases, multiple threads should be used only to maintain concurrency of the underlying CUDA streams. Multiple threads should be used sparingly, be bounded, and should not perform CPU-intensive computations.
LGTM.
Agree with @oyilmaz-nvidia. More perf updates can be dealt with in subsequent PRs after thorough rounds of profiling, once @cjnolet provides the Python script he's been using to benchmark the RF code.
Code at the current state looks good for merging.
…ntropy, added missing rf quantile check