Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hotfix for serializer #273

Merged
merged 5 commits into from
May 11, 2021
Merged

Hotfix for serializer #273

merged 5 commits into from
May 11, 2021

Conversation

hcho3
Copy link
Collaborator

@hcho3 hcho3 commented May 6, 2021

Failing tests in rapidsai/cuml#3829 revealed a subtle bug in #270. It appears that the testing suite in Treelite was not sufficiently robust to let this bug through 😞

The bug causes multiple trees to deserialize from the same set of frames.

I added a few test cases that are affected by the bug. The current Treelite version will fail these tests.

Once this PR is merged, I will publish a patch release (1.2.1).

@@ -44,7 +44,7 @@ inline void TestRoundTrip(treelite::Model* model) {
std::unique_ptr<treelite::Model> received_model = treelite::Model::DeserializeFromFile(fp);
std::fclose(fp);

ASSERT_EQ(TreeliteToBytes(model), TreeliteToBytes(received_model.get()));
ASSERT_TRUE(TreeliteToBytes(model) == TreeliteToBytes(received_model.get()));
Copy link
Collaborator Author

@hcho3 hcho3 May 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASSERT_EQ will dump all the raw bytes into a string, leading to OOM error for this test case. So use ASSERT_TRUE instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to this effect so that someone doesn't naively reintroduce this problem down the line?

Comment on lines +192 to +195
tree->SetLeafNode(3, frontend::Value::Create<LeafOutputType>(tree_id + 3));
tree->SetLeafNode(4, frontend::Value::Create<LeafOutputType>(tree_id + 1));
tree->SetLeafNode(5, frontend::Value::Create<LeafOutputType>(tree_id + 4));
tree->SetLeafNode(6, frontend::Value::Create<LeafOutputType>(tree_id + 2));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I crafted the test case so that all the trees will have different leaf outputs. The bug being addressed causes all trees to be identical.

@codecov-commenter
Copy link

codecov-commenter commented May 6, 2021

Codecov Report

Merging #273 (561cb0e) into mainline (fdf09b7) will increase coverage by 0.36%.
The diff coverage is 100.00%.

Impacted file tree graph

@@              Coverage Diff               @@
##             mainline     #273      +/-   ##
==============================================
+ Coverage       84.31%   84.68%   +0.36%     
  Complexity         46       46              
==============================================
  Files              97       97              
  Lines            7442     7618     +176     
  Branches           50       50              
==============================================
+ Hits             6275     6451     +176     
  Misses           1142     1142              
  Partials           25       25              
Impacted Files Coverage Δ Complexity Δ
include/treelite/tree_impl.h 90.08% <100.00%> (+0.02%) 0.00 <0.00> (ø)
tests/cpp/test_serializer.cc 100.00% <100.00%> (ø) 0.00 <0.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fdf09b7...561cb0e. Read the comment docs.

@@ -925,6 +925,7 @@ ModelImpl<ThresholdType, LeafOutputType>::InitFromPyBuffer(

auto tree_hanlder = [&begin](Tree<ThresholdType, LeafOutputType>& tree) {
tree.InitFromPyBuffer(begin, begin + kNumFramePerTree);
begin += kNumFramePerTree;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're capturing begin by reference, won't this iterate begin forward as a side effect of executing the lambda? It's not immediately clear to me why that is the correct thing to do with this handler. Maybe add a comment or refactor such that the lambda returns the next iterator to be processed rather than directly changing begin?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're capturing begin by reference, won't this iterate begin forward as a side effect of executing the lambda?

Yes, this is intended behavior.

refactor such that the lambda returns the next iterator to be processed

That would not be feasible, since the tree handler for the file stream does not involve iterators.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Can we get a comment just to make it clear to future developers that the side-effect is intentional?

@@ -44,7 +44,7 @@ inline void TestRoundTrip(treelite::Model* model) {
std::unique_ptr<treelite::Model> received_model = treelite::Model::DeserializeFromFile(fp);
std::fclose(fp);

ASSERT_EQ(TreeliteToBytes(model), TreeliteToBytes(received_model.get()));
ASSERT_TRUE(TreeliteToBytes(model) == TreeliteToBytes(received_model.get()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to this effect so that someone doesn't naively reintroduce this problem down the line?

@hcho3 hcho3 merged commit 19fcbc7 into dmlc:mainline May 11, 2021
@hcho3 hcho3 deleted the hotfix_binary_serializer branch May 11, 2021 19:09
This was referenced May 12, 2021
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request May 14, 2021
Upgrade to Treelite 1.3.0 to take advantage of the following new features:

* Faster model import for scikit-learn tree models (dmlc/treelite#264). Fixes #3768
* Binary serializer to a file stream (dmlc/treelite#270, dmlc/treelite#273)
* [EXPERIMENTAL] Add GTIL, reference inference backend (dmlc/treelite#274)

Make progress towards #3853

Depends on rapidsai/integration#270

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3855
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Upgrade to Treelite 1.3.0 to take advantage of the following new features:

* Faster model import for scikit-learn tree models (dmlc/treelite#264). Fixes rapidsai#3768
* Binary serializer to a file stream (dmlc/treelite#270, dmlc/treelite#273)
* [EXPERIMENTAL] Add GTIL, reference inference backend (dmlc/treelite#274)

Make progress towards rapidsai#3853

Depends on rapidsai/integration#270

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3855
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants