Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Treelite format needs to be updated to accommodate 32-bit and 64-bit floats #1084

Closed
hcho3 opened this issue Sep 10, 2019 · 7 comments
Closed
Labels
? - Needs Triage Need team to review and classify bug Something isn't working

Comments

@hcho3
Copy link
Contributor

hcho3 commented Sep 10, 2019

Describe the bug
#837 added Treelite as a submodule. Currently, Treelite uses single-precision floating-point (float) to store test thresholds and leaf outputs in decision trees. This is good for XGBoost models, but however, some other packages such as LightGBM use double-precision floating-point (double).

Steps/Code to reproduce bug
See dmlc/treelite#98, dmlc/treelite#95, where Treelite is not able to match prediction results for LightGBM models. Due to semantics of floating-point numbers, there is always truncation
whenever we convert between float and double.

As part of 1.0 release of Treelite (dmlc/treelite#111), I'd like to propose an update to the exchange format to support multiple data types. This way, the Treelite exchange format will represent LightGBM models faithfully. I'll post an update when a concrete proposal is ready.

@canonizer

@hcho3 hcho3 added ? - Needs Triage Need team to review and classify bug Something isn't working labels Sep 10, 2019
@canonizer
Copy link
Contributor

What about also supporting float16 for models?

@hcho3
Copy link
Contributor Author

hcho3 commented Sep 10, 2019

@canonizer Good idea. I also have a client who wants to use integer types (floating-point math is slow on microcontrollers)

@hcho3
Copy link
Contributor Author

hcho3 commented Sep 10, 2019

@canonizer Can you point me to a reference implementation of float16? The standard C does not provide 16-bit floating-point type.

Update. this seems promising.

@hcho3
Copy link
Contributor Author

hcho3 commented Oct 8, 2019

@canonizer I'm preparing a pull request to Treelite to enable multiple types (float32, float64, float16, int) for thresholds. Stay tuned.

@hcho3
Copy link
Contributor Author

hcho3 commented Oct 29, 2019

@canonizer @teju85 dmlc/treelite#130 performs refactoring of Treelite codebase to support multiple data types (for now, int32, float32, float64). Finally, LightGBM and scikit-learn models can be represented exactly. Note: I'm breaking compatibility for the Protobuf spec (src/tree.proto).

@teju85
Copy link
Member

teju85 commented Oct 29, 2019

Thanks @hcho3 for pointing us to this. I have filed an issue in our repo to make sure this is not forgotten.

@hcho3
Copy link
Contributor Author

hcho3 commented Jun 19, 2020

Consolidating to #2160.

@hcho3 hcho3 closed this as completed Jun 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
? - Needs Triage Need team to review and classify bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants