
[Review] Random Forest & Decision Tree Regression + major updates to Classification #635

Merged: 58 commits into rapidsai:branch-0.9 from vishalmehta1991:fea-ext-randomforest_regression, Jul 1, 2019

Conversation

@vishalmehta1991 (Contributor) commented May 27, 2019

  • Fixed the DT classification limitation on the number of features; major fix for the RF classifier against branch-0.7
  • Added regression Decision Tree and Random Forest
  • Added entropy as a split criterion for classification, and MSE/MAE for regression
  • Base classes for RF and DT, with derived classes for regression and classification (a rough sketch of this layout follows the list)
  • Various bug fixes
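
A minimal sketch of how such a base/derived layout could look. All class names, signatures, and the Criterion enum below are illustrative assumptions, not the actual cuML headers:

```cpp
// Hedged sketch of the base/derived split described in this PR.
// Names and signatures are illustrative, not taken from cuML.
enum class Criterion { GINI, ENTROPY, MSE, MAE };

template <typename T, typename L>  // T: feature type, L: label type
class DecisionTreeBase {
 public:
  virtual ~DecisionTreeBase() = default;
  // Training is criterion-specific, so derived classes implement it.
  virtual void fit(const T* input, int n_rows, int n_cols, const L* labels) = 0;
  // Shared machinery (prediction, branch splitting) lives in the base class.
  void predict(const T* input, int n_rows, int n_cols, L* preds) const {
    (void)input; (void)n_rows; (void)n_cols; (void)preds;  // sketch only
  }
};

template <typename T>
class DecisionTreeClassifier : public DecisionTreeBase<T, int> {
 public:
  Criterion criterion = Criterion::GINI;  // GINI (default) or ENTROPY
  void fit(const T* input, int n_rows, int n_cols, const int* labels) override {
    // criterion-specific training would go here
  }
};

template <typename T>
class DecisionTreeRegressor : public DecisionTreeBase<T, T> {
 public:
  Criterion criterion = Criterion::MSE;  // MSE or MAE
  void fit(const T* input, int n_rows, int n_cols, const T* labels) override {
    // criterion-specific training would go here
  }
};

int main() {
  DecisionTreeClassifier<float> clf;
  DecisionTreeRegressor<float> reg;
  clf.fit(nullptr, 0, 0, nullptr);  // placeholders; a real call passes data
  reg.fit(nullptr, 0, 0, nullptr);
  return 0;
}
```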

myrtw and others added 24 commits May 7, 2019 06:19
- Added rfRegressor class to random forest, along with relevant stateless API functions.
  Implementation currently commented out as DecisionTreeRegressor is not yet implemented.
- Updated RF_metrics struct to include regression metrics too. Metrics supported
  are mean absolute error, mean squared error and median absolute error as per SKL-rf.
- Added a base dt class, similar to what we had in rf.
- Added a DecisionTreeRegressor class. API only, no implementation for now.
- Further templated TreeNode to work for both regression and classification.
- Moved all predict methods to the base dt class.
- Further templated TemporaryMemory (for regression labels) + other minor fixes.
- Moved split_branch to base dt class.
- Placeholder methods for DecisionTreeRegressor.
- RFRegressor flat API signature fixes.
- Minor tempmem fixes.
- Temp addition to rf_test for regression testing. Very simple test.
- Debugging ongoing.
- DecisionTreeRegressor: supports MSE (mean squared error) or MAE
(mean absolute error) as split criterion.
- DecisionTreeClassifier: default and only option is GINI.
- Removed useless memory allocation from memory.cuh
- Added temporary testing folder in randomforest w/ testing script
for both regression and classification. This directory will be removed prior to any merge request.
- Preprocess quantiles in batches of batch_cols if there isn't
enough device memory to process all ncols at once.
- The number of columns per batch is dynamically determined based
on the available device memory (a hedged sketch of this calculation follows this commit list).
- Updated all our cudaMemcpy calls to updateDevice, updateHost, or updateAsync as appropriate.
- Copied recent iota -> permute change from rfRegressor to rfClassifier fit too.
- Also minor update to minmax prim code.
- minmax prim use in dt commented out.
- Calling find_best_fruit_all is unnecessary when a node
will be considered a leaf due to depth (or max leaves) constraints.
- Added stream to cub call in memory.cuh
- Do not update # bins for GLOBAL_QUANTILE when # rows per node < # bins.
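
A hedged sketch of how the per-batch column count could be derived from free device memory, as the quantile-batching commit above describes. The function name, per-column cost model, and 80% headroom factor are illustrative assumptions, not the actual cuML heuristic:

```cpp
// Hedged sketch: pick how many columns to preprocess per batch based on
// the device memory currently free. The cost model is an assumption.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>

int compute_batch_cols(int n_rows, int n_cols, int n_bins) {
  size_t free_bytes = 0, total_bytes = 0;
  cudaMemGetInfo(&free_bytes, &total_bytes);

  // Rough cost of preprocessing one column: the column data itself
  // plus the quantile boundaries computed for it.
  size_t bytes_per_col =
      static_cast<size_t>(n_rows) * sizeof(float) +
      static_cast<size_t>(n_bins) * sizeof(float);

  // Leave headroom for buffers this sketch doesn't account for.
  size_t budget = static_cast<size_t>(0.8 * free_bytes);

  size_t fit_cols = std::max<size_t>(budget / bytes_per_col, 1);
  return static_cast<int>(std::min<size_t>(fit_cols, n_cols));
}
```

The caller would then loop over the feature columns in chunks of this size, computing quantiles for one chunk at a time.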
@GPUtester (Contributor):
Can one of the admins verify this patch?

@cjnolet (Member) commented May 27, 2019:

add to whitelist

myrtw and others added 9 commits June 14, 2019 03:22
- Added batching support to minmaxKernel to enable datasets with a large
number of features (a generic sketch of column batching follows this commit list).
- Added an extra testcase where ncols wouldn't previously fit
in the available shared memory.
…hta1991/cuml into fea-ext-randomforest_regression
…ndomforest_regression

- Also run clang-format on files.
- Also more clang formatting changes that the previous commit missed.
- Examples run but accuracy is low. TODO Debug
- Corrected doxygen comments in randomforest.cu
…ndomforest_regression

- Updates to minmax prim.
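
A generic, hedged sketch of what column batching for a min/max pass can look like. This is not the cuML minmaxKernel; it assumes column-major float data and assigns one block per column in the batch:

```cpp
// Hedged sketch of a column-batched min/max reduction (not cuML's kernel).
// Each block reduces one column of column-major data via shared memory.
#include <cuda_runtime.h>
#include <cfloat>

__global__ void minmax_batch_kernel(const float* data, int n_rows,
                                    float* mins, float* maxs) {
  extern __shared__ float smem[];  // blockDim.x floats for min, same for max
  float* smin = smem;
  float* smax = smem + blockDim.x;
  const float* col = data + static_cast<size_t>(blockIdx.x) * n_rows;

  float lo = FLT_MAX, hi = -FLT_MAX;
  for (int r = threadIdx.x; r < n_rows; r += blockDim.x) {
    lo = fminf(lo, col[r]);
    hi = fmaxf(hi, col[r]);
  }
  smin[threadIdx.x] = lo;
  smax[threadIdx.x] = hi;
  __syncthreads();

  for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction
    if (threadIdx.x < s) {
      smin[threadIdx.x] = fminf(smin[threadIdx.x], smin[threadIdx.x + s]);
      smax[threadIdx.x] = fmaxf(smax[threadIdx.x], smax[threadIdx.x + s]);
    }
    __syncthreads();
  }
  if (threadIdx.x == 0) {
    mins[blockIdx.x] = smin[0];
    maxs[blockIdx.x] = smax[0];
  }
}

// Host loop: cap the grid at batch_cols columns per launch so resource
// limits hold no matter how many features the dataset has.
void minmax_batched(const float* d_data, int n_rows, int n_cols,
                    float* d_mins, float* d_maxs, int batch_cols,
                    cudaStream_t stream) {
  const int tpb = 256;
  const size_t smem_bytes = 2 * tpb * sizeof(float);
  for (int c0 = 0; c0 < n_cols; c0 += batch_cols) {
    int cols_now = (n_cols - c0 < batch_cols) ? (n_cols - c0) : batch_cols;
    minmax_batch_kernel<<<cols_now, tpb, smem_bytes, stream>>>(
        d_data + static_cast<size_t>(c0) * n_rows, n_rows,
        d_mins + c0, d_maxs + c0);
  }
}
```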
@vishalmehta1991 changed the title from [Review] Random Forest & Decision Tree Regression to [Review] Random Forest & Decision Tree Regression + major updates to Classification on Jun 19, 2019
@vishalmehta1991 changed the base branch from branch-0.8 to branch-0.9 on June 21, 2019 07:43
@oyilmaz-nvidia (Contributor) left a comment:
It's a huge PR and hard to go over all of the code, but overall LGTM. Please check the two comments I left.


```cpp
//Cleanup
selected_rows.release(stream);
trees[i].fit(user_handle, input, n_cols, n_rows, labels,
```
Contributor:

I think decision tree building should be parallelized to get a better speedup. It should be relatively straightforward using OpenMP with multiple streams. It can be done in a separate PR.

Contributor Author:
Agreed, but as per the cuML developer guide, we want to keep the algorithms single-threaded. OpenMP can be compiler-specific.

@cjnolet (Member) commented Jun 27, 2019:
@vishalmehta1991, some of us had a discussion about this on #694. When I get some time, I'm going to submit a PR so that the language in the developer guide is clearer.

This is a case that, I believe, would warrant the use of threading, since building trees serially has such a massive impact on the performance of our application. This should be possible by synchronizing the main stream before your for loop, creating new streams in the threads, and synchronizing those streams while still inside the loop. I have also verified with the CUDA team that the CUDA API is, in fact, thread-safe, so you should not have a problem creating and submitting CUDA kernels from separate threads.

So long as you synchronize the main stream before the for loop, you should not have any problems maintaining thread safety.
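
A minimal, self-contained sketch of the pattern described above. The Tree type, its fit signature, and the dummy kernel are placeholders standing in for the real per-tree training work; none of these names come from cuML:

```cpp
// Hedged sketch: one OpenMP thread and one CUDA stream per tree.
// Tree, dummy_fit_kernel, and the buffer layout are placeholders.
#include <cuda_runtime.h>
#include <omp.h>
#include <vector>

__global__ void dummy_fit_kernel(float* out) { out[0] = 1.0f; }

struct Tree {
  // A real fit() would launch this tree's training kernels on `stream`.
  void fit(float* d_scratch, cudaStream_t stream) {
    dummy_fit_kernel<<<1, 1, 0, stream>>>(d_scratch);
  }
};

int main() {
  std::vector<Tree> trees(8);
  float* d_scratch;
  cudaMalloc(&d_scratch, trees.size() * sizeof(float));

  // Synchronize the main (default) stream before the loop so that any
  // prior work is visible to the per-tree streams.
  cudaStreamSynchronize(0);

#pragma omp parallel for
  for (int i = 0; i < static_cast<int>(trees.size()); ++i) {
    cudaStream_t s;
    cudaStreamCreate(&s);        // a fresh stream for this tree's thread
    trees[i].fit(d_scratch + i, s);
    cudaStreamSynchronize(s);    // synchronize while still inside the loop
    cudaStreamDestroy(s);
  }

  cudaFree(d_scratch);
  return 0;
}
```

The thread bodies do no CPU-intensive work here; they exist only to keep the per-tree streams concurrent, which is the usage the developer-guide wording quoted later in this thread would permit.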

@vishalmehta1991 (Contributor, Author) commented Jun 27, 2019:
Hi @cjnolet, I am completely aware of the CUDA API and its thread support. That is not the concern. My concern is the way OpenMP is handled across the range of compilers; to be compliant with every compiler that supports OpenMP, we would need to test them all.

But again, from a developer's point of view, adding OpenMP is easy and simple. I would be up for it if the RAPIDS developer guide allows it.

@cjnolet (Member) commented Jun 27, 2019:
Tagging @jirikraus & @teju85 for additional feedback / suggestions of ways we can design this to build trees in parallel.

@cjnolet (Member) commented Jun 27, 2019:
@vishalmehta1991, I would like to learn more about the problems related to OpenMP compiler support. Would you mind sharing some references on this problem?

I'm concerned that we're seeing signs that the current implementation's performance is bounded by the number of available CPU cores, as a result of the kernels being submitted on a single stream.

Contributor:
Seriously, what is the real issue here with OpenMP or any other multi-threading library? OpenMP was used in KNN and I don't recall any problems. It's obvious that we need to build the trees in parallel. What do you suggest if you don't want to use OpenMP or another multi-threading library? The only other option I can think of is to redesign and rewrite the whole algorithm from scratch to build the trees in parallel, which would require a significant amount of work. Any other suggestions?

Contributor Author:
Personally, I don't mind threads; it's embarrassingly parallel. If you want, we can do OpenMP in a separate PR (this PR is too big), and also open a PR to change the cuML developer guide so that we adhere to the cuML guidelines.

Contributor:
Let's merge this PR and try OpenMP in a separate PR. We can change the developer guideline if the OpenMP PR gives us good results.

@cjnolet (Member) commented Jun 28, 2019:
The wording that is being discussed for our changes to the developer guide:

With the exception of the cumlHandle, cuML algorithms should maintain thread-safety and are, in general, assumed to be single threaded. Exceptions are made for algorithms that can take advantage of multiple CUDA streams in order to oversubscribe or increase occupancy on a single GPU. In these cases, multiple threads should be used only to maintain concurrency of the underlying CUDA streams. Multiple threads should be used sparingly, be bounded, and should not perform CPU-intensive computations.

(Resolved review thread on cpp/src/randomforest/randomforest.cu)
@oyilmaz-nvidia (Contributor) left a comment:
LGTM.

@teju85 (Member) left a comment:
Agree with @oyilmaz-nvidia. More perf updates can be dealt with in subsequent PRs after thorough rounds of profiling, once @cjnolet provides the Python script he's been using to benchmark the RF code.

The code in its current state looks good for merging.

@dantegd merged commit e9191d4 into rapidsai:branch-0.9 on Jul 1, 2019
@vishalmehta1991 deleted the fea-ext-randomforest_regression branch on July 1, 2019 20:30
Labels: 3 - Ready for Review (ready for review by team), CUDA / C++ (CUDA issue), New Algorithm (for tracking new algorithms that will be added to our existing collection)