
Use DFS to get a valid but smaller new node order #570

Merged · 5 commits · Jul 13, 2024
Conversation

@tmct (Contributor) commented Jun 24, 2024

Please see #571 for details

@tmct (Contributor, Author) commented Jun 24, 2024

Resolves #571

@hcho3 (Collaborator) commented Jun 24, 2024

Hi, thanks for submitting a pull request. The fix looks good to me.

Can you add a unit test that uses a LightGBM tree with depth > 32?

@tmct (Contributor, Author) commented Jun 24, 2024

Good idea. I've synthesised a model that reproduces the issue; I will try to get it into a unit test.

'tree\nversion=v4\nnum_class=1\nnum_tree_per_iteration=1\nlabel_index=0\nmax_feature_idx=0\nobjective=regression\nfeature_names=this\nfeature_infos=[0:100]\ntree_sizes=1119\n\nTree=0\nnum_leaves=32\nnum_cat=0\nsplit_feature=0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\nsplit_gain=0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1\nthreshold=1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\ndecision_type=2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2\nleft_child=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 -31\nright_child=-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -27 -28 -29 -30 -32\nleaf_value=31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 0 1\nleaf_weight=1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\nleaf_count=1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\ninternal_value=0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\ninternal_weight=1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\ninternal_count=1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\nis_linear=0\nshrinkage=1\n\n\nend of trees\n\nfeature_importances:\nthis=31\n\npandas_categorical:null\n'
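To illustrate what the hand-made chain-shaped model above is exercising, here is a minimal sketch (not Treelite's actual implementation) of why a BFS/heap-style node numbering overflows on deep trees while a DFS numbering does not. In a heap layout, node `i`'s children get ids `2*i+1` and `2*i+2`, so a node at depth `d` can carry an id as large as `2**(d+1) - 2`, which exceeds the signed 32-bit range once `d > 30`; a DFS numbering assigns ids in visitation order, so the largest id is simply `num_nodes - 1` regardless of depth. The function names below are hypothetical, chosen for this demonstration only.

```python
INT32_MAX = 2**31 - 1


def max_bfs_id(depth: int) -> int:
    """Largest heap-style (BFS) id any node at the given depth can have."""
    return 2 ** (depth + 1) - 2


def dfs_number(num_nodes: int) -> list[int]:
    """DFS numbering of any tree shape: consecutive ids 0..num_nodes-1."""
    return list(range(num_nodes))


# A chain-shaped tree like the model above has only one or two nodes per
# level, yet its deepest node's BFS id still grows as roughly 2**depth.
print(max_bfs_id(31) > INT32_MAX)  # True: the BFS id no longer fits in int32
print(max(dfs_number(63)))         # 62: a 63-node chain stays well within int32
```

This is the essence of the fix: numbering nodes in DFS order keeps ids bounded by the actual node count instead of by the tree's depth.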

codecov bot commented Jun 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.66%. Comparing base (27b81f7) to head (8175573).
Report is 5 commits behind head on mainline.

Additional details and impacted files
@@             Coverage Diff              @@
##           mainline     #570      +/-   ##
============================================
+ Coverage     84.58%   84.66%   +0.08%     
============================================
  Files            75       75              
  Lines          6547     6549       +2     
  Branches        528      528              
============================================
+ Hits           5538     5545       +7     
+ Misses         1009     1004       -5     


@hcho3 (Collaborator) commented Jun 25, 2024

@tmct Can you post the code that generated the model? It would be nice to not add the model file itself to the git repo.

@tmct (Contributor, Author) commented Jun 25, 2024

The model is effectively handmade outside of LightGBM using the LightGBM model format, and I can't share the specific construction code for it, sorry. I suspect it would be very difficult to produce such a small model with this depth property through the natural training dynamics of LightGBM, though you might be able to achieve it through clever use of features such as sample weights and custom objectives.

The model file above is a string of less than 2 kB; I made it as small as I could while still having it fail the (pre-fix) test.

@hcho3 (Collaborator) commented Jun 25, 2024

I see. Let's include the model file in the repo then. If you need help setting up the test, ping me.

@tmct (Contributor, Author) commented Jun 25, 2024

Thank you, will do.

@hcho3 (Collaborator) commented Jul 10, 2024

@tmct Do you mind if I take over this pull request? I'd like to include this as part of the next release of Treelite.

@tmct (Contributor, Author) commented Jul 11, 2024

@hcho3 I would be delighted if you could finish the tests for me, please. Sorry that I have not yet found the time.

@hcho3 merged commit f1b910e into dmlc:mainline on Jul 13, 2024 (19 checks passed)
@hcho3 (Collaborator) commented Jul 13, 2024

Done!

@tmct (Contributor, Author) commented Jul 13, 2024

Many thanks!

@tmct deleted the patch-1 branch on July 19, 2024.
rapids-bot pushed a commit to rapidsai/cuml that referenced this pull request on Jul 24, 2024:
Treelite 4.3.0 contains the following improvements:

* Support XGBoost 2.1.0, including the UBJSON format (dmlc/treelite#572, dmlc/treelite#578)
* [GTIL] Allow inferencing with FP32 input + FP64 model (dmlc/treelite#574). Related: triton-inference-server/fil_backend#391
* Prevent integer overflow for deep LightGBM trees by using DFS order (dmlc/treelite#570).
* Support building with latest RapidJSON (dmlc/treelite#567)

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #5968