docs: add predict sample to samples/snippets/bqml_getting_started_test.py #388

Merged
tswast merged 27 commits into main from bqml_predict1 on Mar 8, 2024

DevStephanie
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@DevStephanie DevStephanie requested review from a team as code owners February 22, 2024 19:04

snippet-bot bot commented Feb 22, 2024

Here is the summary of changes.

You are about to add 2 region tags.
You are about to delete 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 22, 2024
@DevStephanie DevStephanie changed the title Bqml predict1 docs: add predict sample to samples/snippets/bqml_getting_started_test.py Feb 23, 2024
@product-auto-label product-auto-label bot added size: s Pull request size is small. samples Issues that are directly related to samples. and removed size: m Pull request size is medium. labels Feb 23, 2024
}
)
# Use Logistic Regression predict method to, find more information here in
# [BigFrames](/bigframes/latest/bigframes.ml.linear_model.LogisticRegression#bigframes_ml_linear_model_LogisticRegression_predict)
Contributor

Would this result in a clickable link to the docs.google.com documentation? Asking because in the other place (line 157) we use an absolute https://... path.

Collaborator

It will not. We should use an absolute path here in comments.

Contributor Author

yes, corrected.

Collaborator

FYI: If this has been corrected, your change hasn't been pushed to GitHub yet.
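
For anyone checking links in sample comments, here is a minimal sketch of the distinction discussed in this thread. The helper name `is_absolute_url` and the `https://example.com` host are illustrative assumptions, not part of the sample or of any BigQuery DataFrames API; only the relative path is taken from the diff above.

```python
from urllib.parse import urlparse


def is_absolute_url(link: str) -> bool:
    """Return True when the link carries its own http(s) scheme and host."""
    parts = urlparse(link)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The relative path from the code comment under review.
relative_link = (
    "/bigframes/latest/bigframes.ml.linear_model.LogisticRegression"
    "#bigframes_ml_linear_model_LogisticRegression_predict"
)

# Hypothetical host, used only to show what an absolute link looks like.
absolute_link = "https://example.com" + relative_link

print(is_absolute_url(relative_link))
print(is_absolute_url(absolute_link))
```

A relative path only resolves when something supplies a base URL, which a plain code comment does not do.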


predictions = model.predict(features)

visitor_id = predictions.groupby(["country"])[["predicted_transactions"]].sum()
Contributor

Not sure why we call it visitor_id here; it looks the same as countries a few lines above.

Contributor Author

There is another query that asks for predictions for visitors, so the query looks almost identical except for that one change.

Collaborator

It's called SUM(predicted_label) as total_predicted_purchases in SQL: https://cloud.google.com/bigquery/docs/create-machine-learning-model#run_the_mlpredict_query. Let's use the same name as in the SQL.

Contributor Author

yes, will do that.

Contributor Author

Yes, will correct that.
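
A minimal sketch of the naming suggested in this thread, assuming plain pandas as a stand-in for BigQuery DataFrames so it runs without a BigQuery project. The `predictions` frame and its values are made up; only the `groupby` expression and the `total_predicted_purchases` name come from the discussion above. Renaming the column as well is an optional extra, not something the review asked for.

```python
import pandas as pd

# Stand-in for the output of model.predict(features); the values are invented.
predictions = pd.DataFrame(
    {
        "country": ["US", "US", "CA", "CA", "MX"],
        "predicted_transactions": [1, 0, 1, 1, 0],
    }
)

# Sum the predicted label per country and name the result after the SQL alias.
total_predicted_purchases = (
    predictions.groupby(["country"])[["predicted_transactions"]]
    .sum()
    # Optional: mirror the SQL alias in the column name too.
    .rename(columns={"predicted_transactions": "total_predicted_purchases"})
    .sort_values("total_predicted_purchases", ascending=False)
)

print(total_predicted_purchases)
```

The per-visitor query mentioned earlier in the thread would presumably differ only in the grouping column.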


operatingSystem = df["device"].struct.field("operatingSystem")
operatingSystem = operatingSystem.fillna("")
isMobile = df["device"].struct.field("isMobile")
Collaborator

Let's use snake_case for this variable.

Contributor Author

yes, will correct that.

Contributor Author

Do you want me to change that variable name for the earlier part of the code?

Collaborator

Yes, please.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Feb 28, 2024
- "os": operatingSystem,
- "is_mobile": isMobile,
+ "os": operating_system,
+ "isMobile": is_mobile,
Collaborator

The "is_mobile" string didn't need to be changed, but if you do you must change it everywhere.
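
One way to avoid this kind of drift is to build the feature columns in a single place, so the training and prediction inputs cannot end up with different keys. The sketch below uses plain Python dictionaries rather than BigQuery DataFrames, and the `build_features` helper and the sample rows are hypothetical; the actual sample builds these columns with `struct.field` on the `device` column.

```python
def build_features(rows):
    """Return feature columns keyed by the same names for every caller."""
    return {
        # Missing operating systems become "", like fillna("") in the sample.
        "os": [row["device"]["operatingSystem"] or "" for row in rows],
        # Local variables are snake_case; the string key stays "is_mobile".
        "is_mobile": [row["device"]["isMobile"] for row in rows],
    }


# Hypothetical rows; in the sample these come from a BigQuery table.
training_rows = [
    {"device": {"operatingSystem": "Android", "isMobile": True}},
    {"device": {"operatingSystem": None, "isMobile": False}},
]
prediction_rows = [
    {"device": {"operatingSystem": "iOS", "isMobile": True}},
]

training_features = build_features(training_rows)
prediction_features = build_features(prediction_rows)

# Both calls go through the same helper, so the keys cannot drift apart.
print(sorted(training_features) == sorted(prediction_features))
```

Renaming the Python variables to snake_case is independent of the string keys: the keys are what the trained model sees as column names.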

@tswast tswast mentioned this pull request Mar 6, 2024
4 tasks
@@ -151,7 +143,7 @@ def test_bqml_getting_started(random_model_id):
# - log_loss — The loss function used in a logistic regression. This is the measure of how far the
# model's predictions are from the correct labels.

- # - roc_auc — The area under the ROC curve. This is the probability that a classifier is more confident that
+ # - roc_auc — The area under the ROC curve. This is the probability that a classifier is morepy confident that
Collaborator

Typo: morepy

Suggested change
- # - roc_auc — The area under the ROC curve. This is the probability that a classifier is morepy confident that
+ # - roc_auc — The area under the ROC curve. This is the probability that a classifier is more confident that
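
For readers of the metric comments in this hunk, here is a rough pure-Python sketch of the two quantities. The roc_auc sentence is cut off in the diff, so the version below assumes the usual reading: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are invented, and this is not how BigQuery ML computes its evaluation metrics internally.

```python
import math


def log_loss(labels, probabilities):
    """Mean negative log-likelihood of the true labels (lower is better).

    Probabilities must lie strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(labels, probabilities):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def roc_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: predicted probability of the positive class per row.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]

print(roc_auc(labels, scores))
print(round(log_loss(labels, scores), 4))
```

Three of the four positive/negative pairs are ordered correctly here, so the pairwise estimate is 0.75.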

"pageviews": pageviews,
}
)
# Use Logistic Regression predict method to, find more information here in
Collaborator

Incomplete sentence.

@tswast tswast added the automerge Merge the pull request once unit tests and other checks pass. label Mar 6, 2024

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 7, 2024
@tswast tswast added the automerge Merge the pull request once unit tests and other checks pass. label Mar 7, 2024
@tswast tswast requested a review from shobsi March 7, 2024 16:48

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 7, 2024
@tswast tswast added the automerge Merge the pull request once unit tests and other checks pass. label Mar 7, 2024

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 8, 2024
@tswast tswast merged commit 6a3b0cc into main Mar 8, 2024
10 of 13 checks passed
@tswast tswast deleted the bqml_predict1 branch March 8, 2024 17:19
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. samples Issues that are directly related to samples. size: m Pull request size is medium.
3 participants