docs: add snippet for creating boosted tree model #1142

rey-esp · 2024-11-11T15:42:14Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea.
Ensure the tests and linter pass.
Code coverage does not decrease (if any source code was changed).
Appropriate docs were updated (if necessary).

Fixes #<issue_number_goes_here> 🦕

tswast · 2024-11-15T18:25:47Z

samples/snippets/classification_boosted_tree_model_test.py

+    import bigframes.ml.linear_model
+
+    # input_data is defined in an earlier step.
+    training_data = input_data[input_data["dataframe"] == "training"]


No action needed, but something to consider for the future: it would be nice to update the prepare section above so it works without referencing an index (e.g., when ordering_mode = "partial").

We have a few options, but the easiest is to start with a string column and add (True, "training") as the last entry in the list of cases.

Aside: we have an issue open (349926559) to allow starting from any column in the DataFrame (such as functional_weight, which would be a natural choice in this example), even if its type differs from the case values, so long as a True (default) case is provided.

tswast · 2024-11-15T18:29:39Z

samples/snippets/classification_boosted_tree_model_test.py

+    # input_data is defined in an earlier step.
+    training_data = input_data[input_data["dataframe"] == "training"]
+    X = training_data.drop(columns=["income_bracket", "dataframe"])
+    y = training_data["income_bracket"]


Presumably you ran this code sample and it worked OK? I remember we had some bugs in the past where y had to be a DataFrame, not a Series, so I'm just double-checking.

rey-esp replied:

The code sample seems to run! I'm not sure if I did it right, so here's the Colab: https://colab.sandbox.google.com/drive/10jA6zSRiptXWrTkCcmyCT_sYBjLqGJx0?resourcekey=0-0TrIkmDzAJw_F6ONFikwaA#scrollTo=wU367u1SAj3Y
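On the DataFrame-versus-Series question in this thread: in pandas, and in BigQuery DataFrames, which follows the pandas API here, single brackets select the label column as a Series, while double brackets keep it as a one-column DataFrame. A minimal local sketch, using a hypothetical three-row frame in place of the real `training_data`:

```python
import pandas as pd

# Hypothetical rows standing in for the "training" slice of input_data.
training_data = pd.DataFrame(
    {
        "age": [39, 50, 38],
        "hours_per_week": [40, 13, 40],
        "income_bracket": ["<=50K", "<=50K", ">50K"],
        "dataframe": ["training", "training", "training"],
    }
)

# Features: everything except the label and the split-marker column.
X = training_data.drop(columns=["income_bracket", "dataframe"])

# Single brackets return the label as a Series...
y_series = training_data["income_bracket"]
# ...while double brackets return a one-column DataFrame, for an estimator
# that (like the older bug mentioned above) requires y to be a DataFrame.
y_frame = training_data[["income_bracket"]]

print(type(y_series).__name__, type(y_frame).__name__, list(X.columns))
```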

tswast · 2024-11-15T18:34:15Z

samples/snippets/classification_boosted_tree_model_test.py

+    census_model = bigframes.ml.linear_model.LogisticRegression(
+        # model_type="BOOSTED_TREE_CLASSIFIER",
+        # booster_type="gbtree",
+        max_iterations=50,
+    )


I don't think we should be doing LogisticRegression here. In the SQL we do use model_type='BOOSTED_TREE_CLASSIFIER', but in BigQuery DataFrames we normally use separate Python classes to represent the different model types.

A few ways to discover which class to use:

Search our code for BOOSTED_TREE_CLASSIFIER

Google search for boosted trees BigFrames

These should give you some strong hints as to which class to use instead.

Copying this from an internal comment I made for visibility:

Just like scikit-learn, it's one of the "ensemble" methods: https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.ensemble

Normally we try to use the scikit-learn class names too, but I think we may have added this class before GradientBoostingClassifier was in scikit-learn.
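As a quick reference for the mapping described above, here is a small lookup from BigQuery ML `model_type` values to the `bigframes.ml` classes that represent them. The dictionary and helper are hypothetical illustrations, not part of the library, and cover only a few model types; the `bigframes.ml` reference has the full list.

```python
# Hypothetical lookup table (not part of bigframes): BigQuery ML model_type
# values mapped to the dotted path of the bigframes.ml class for each one.
BQML_MODEL_TYPE_TO_CLASS = {
    "LINEAR_REG": "bigframes.ml.linear_model.LinearRegression",
    "LOGISTIC_REG": "bigframes.ml.linear_model.LogisticRegression",
    "BOOSTED_TREE_CLASSIFIER": "bigframes.ml.ensemble.XGBClassifier",
    "BOOSTED_TREE_REGRESSOR": "bigframes.ml.ensemble.XGBRegressor",
    "RANDOM_FOREST_CLASSIFIER": "bigframes.ml.ensemble.RandomForestClassifier",
    "RANDOM_FOREST_REGRESSOR": "bigframes.ml.ensemble.RandomForestRegressor",
    "KMEANS": "bigframes.ml.cluster.KMeans",
}


def bigframes_class_for(model_type: str) -> str:
    """Return the bigframes.ml class path for a BigQuery ML model_type."""
    # SQL option values are case-insensitive strings, so normalize first.
    key = model_type.strip().upper()
    try:
        return BQML_MODEL_TYPE_TO_CLASS[key]
    except KeyError:
        raise ValueError(f"No mapping recorded for model_type={model_type!r}") from None


print(bigframes_class_for("boosted_tree_classifier"))
```

Assuming bigframes is installed and a BigQuery session is available, the snippet in the diff would then construct something like `bigframes.ml.ensemble.XGBClassifier(booster="gbtree", max_iterations=50)` instead of `LogisticRegression`; confirm the exact keyword arguments against the `XGBClassifier` reference.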

snippet-bot · 2024-11-19T20:46:36Z

Here is the summary of changes.

You are about to add 1 region tag.

samples/snippets/classification_boosted_tree_model_test.py:42, tag bigquery_dataframes_bqml_boosted_tree_create

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.

docs: create boosted tree model

b67616f

product-auto-label bot added the size: m, api: bigquery, and samples labels Nov 11, 2024

merge main

1010bee

product-auto-label bot added the size: s label and removed the size: m label Nov 11, 2024

rey-esp added 4 commits November 12, 2024 13:41

Merge branch 'main' into b338872698-bigframes-v1

b734076

Merge branch 'main' into b338872698-bigframes-v1

016f21b

Merge branch 'main' into b338872698-bigframes-v1

922002e

merge main

669bc74

tswast reviewed Nov 15, 2024

tswast requested changes Nov 15, 2024

tswast mentioned this pull request Nov 15, 2024

docs: add snippet for predicting classifications using a boosted tree model #1156

Merged

4 tasks

GarrettWu self-requested a review November 18, 2024 21:49

rey-esp added 2 commits November 19, 2024 17:12

Merge branch 'main' into b338872698-bigframes-v1

a9c2fdd

update model

b113a57

tswast approved these changes Nov 19, 2024

update test

a8ac72d

tswast marked this pull request as ready for review November 19, 2024 20:42

tswast requested review from a team as code owners November 19, 2024 20:42

tswast requested a review from m-strzelczyk November 19, 2024 20:42

blunderbuss-gcf bot assigned orrbradford Nov 19, 2024

rey-esp added the owlbot:run label Nov 19, 2024

gcf-owl-bot bot removed the owlbot:run label Nov 19, 2024

rey-esp merged commit a972668 into main Nov 19, 2024
23 checks passed

rey-esp deleted the b338872698-bigframes-v1 branch November 19, 2024 22:32

release-please bot mentioned this pull request Nov 19, 2024

chore(main): release 1.28.0 #1159

Merged
