docs: add snippet for creating boosted tree model #1142
Changes from all commits
b67616f
1010bee
b734076
016f21b
922002e
669bc74
a9c2fdd
b113a57
a8ac72d
@@ -14,7 +14,7 @@


 def test_boosted_tree_model(random_model_id: str) -> None:
-    # your_model_id = random_model_id
+    your_model_id = random_model_id
     # [START bigquery_dataframes_bqml_boosted_tree_prepare]
     import bigframes.pandas as bpd

@@ -39,4 +39,28 @@ def test_boosted_tree_model(random_model_id: str) -> None:
     )
     del input_data["functional_weight"]
     # [END bigquery_dataframes_bqml_boosted_tree_prepare]
+    # [START bigquery_dataframes_bqml_boosted_tree_create]
+    from bigframes.ml import ensemble
+
+    # input_data is defined in an earlier step.
+    training_data = input_data[input_data["dataframe"] == "training"]
+    X = training_data.drop(columns=["income_bracket", "dataframe"])
+    y = training_data["income_bracket"]

Presumably you ran this code sample and it worked OK? I remember we had some bugs where

The code sample seems to run! Not sure if I did it right, so here's the colab: https://colab.sandbox.google.com/drive/10jA6zSRiptXWrTkCcmyCT_sYBjLqGJx0?resourcekey=0-0TrIkmDzAJw_F6ONFikwaA#scrollTo=wU367u1SAj3Y

+
+    # create and train the model
+    census_model = ensemble.XGBClassifier(
+        n_estimators=1,
+        booster="gbtree",
+        tree_method="hist",
+        max_iterations=1,  # For a more accurate model, try 50 iterations.
+        subsample=0.85,
+    )
+    census_model.fit(X, y)
+
+    census_model.to_gbq(
+        your_model_id,  # For example: "your-project.census.census_model"
+        replace=True,
+    )
+    # [END bigquery_dataframes_bqml_boosted_tree_create]
     assert input_data is not None
+    assert census_model is not None

No action needed, but something to consider for the future: it would be nice to update the `prepare` section above to work without referencing an index (e.g. when ordering mode = "partial").

We have a few options, but the easiest will be to start with a string column and add `(True, "training")` as the last in the list of cases.

Aside: we have an issue open (349926559) to allow selecting any column in the dataframe (such as `functional_weight`, which would be a natural choice in this example) even if it's a different type, so long as a `True` (default) case is provided.
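
To make the suggestion concrete, here is a rough plain-Python sketch of the "default case last" idea. It does not use the bigframes API: `case_when` below is a hypothetical stand-in helper, the rows are made-up values, and keying the split on `functional_weight % 10` is only an assumption about how the index-free conditions might look.

```python
from typing import Callable, Sequence, Union

Row = dict
Condition = Union[bool, Callable[[Row], bool]]


def case_when(row: Row, cases: Sequence[tuple[Condition, str]]) -> str:
    """Return the replacement of the first case whose condition holds for row."""
    for condition, replacement in cases:
        matched = condition(row) if callable(condition) else condition
        if matched:
            return replacement
    raise ValueError("no case matched; add a (True, ...) default as the last case")


# Hypothetical rows; only functional_weight matters for the split.
rows = [{"functional_weight": w} for w in (77516, 83318, 215649, 234721, 338409)]

# Conditions depend on a column value, not on the row index.
cases = [
    (lambda r: r["functional_weight"] % 10 == 8, "evaluation"),
    (lambda r: r["functional_weight"] % 10 == 9, "prediction"),
    (True, "training"),  # default case, listed last
]

labels = [case_when(r, cases) for r in rows]
print(labels)
```

Because the first matching case wins, the `(True, "training")` entry only applies to rows that are neither "evaluation" nor "prediction", and no index is needed to build the split.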