feat: update plot sample to 1000 rows #458

Merged
merged 7 commits into from
Mar 21, 2024

Collaborator

@tswast tswast commented Mar 18, 2024

While making a line plot sample with Salem, I noticed that sampling 100 rows loses some important shape information. Most screens are more than 1000 pixels wide, so 1000 rows seems a reasonable default.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@tswast requested a review from chelsea-lin on Mar 18, 2024, 20:58
@tswast requested review from a team as code owners on Mar 18, 2024, 20:58
@product-auto-label (bot) added the "size: xs", "api: bigquery", and "samples" labels on Mar 18, 2024
@@ -45,7 +45,7 @@ def generate(self) -> None:

     def _compute_plot_data(self, data):
         # TODO: Cache the sampling data in the PlotAccessor.
-        sampling_n = self.kwargs.pop("sampling_n", 100)
+        sampling_n = self.kwargs.pop("sampling_n", 1000)
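
For readers unfamiliar with the pattern in this hunk, here is a minimal sketch of how a `kwargs.pop("sampling_n", default)` call behaves. The class `SamplingPlotSketch`, its constructor, and the in-memory sampling are all hypothetical and are not the bigframes implementation; only the `pop` call and the default of 1000 come from the diff.

```python
import random


class SamplingPlotSketch:
    """Hypothetical stand-in for a plot class; not the bigframes code."""

    DEFAULT_SAMPLING_N = 1000  # default value introduced by this PR

    def __init__(self, data, **kwargs):
        self.data = list(data)
        self.kwargs = kwargs

    def _compute_plot_data(self):
        # pop() removes sampling_n so it is not forwarded to the plotting
        # backend, and falls back to the default when the caller omits it.
        sampling_n = self.kwargs.pop("sampling_n", self.DEFAULT_SAMPLING_N)
        if len(self.data) <= sampling_n:
            return list(self.data)
        # Assumption: sample row positions at random, then restore order.
        positions = sorted(random.sample(range(len(self.data)), sampling_n))
        return [self.data[i] for i in positions]


default_plot = SamplingPlotSketch(range(5000))
custom_plot = SamplingPlotSketch(range(5000), sampling_n=100, color="red")
default_rows = default_plot._compute_plot_data()
custom_rows = custom_plot._compute_plot_data()
```

In this sketch a caller who wants the old behavior can still pass `sampling_n=100` explicitly; the change only affects callers who rely on the default.
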
Collaborator Author

Maybe 500 or something would be better? 640x480 was a very common resolution in the 1990s.

Contributor

I investigated how sample size affects the shape of a dataset. My findings (see document: https://docs.google.com/document/d/1KaIF7zX-7seXsb-rohl56jYjhjFdLc-HlNOYfTM1Zfw/edit?tab=t.0) indicate that:

  • Samples of size 500 (sampling_n=500) broadly reflect the same shape as a sample of 1000.
  • A sample size of 1000 yields a closer approximation to the true underlying distribution.

Maybe we can proceed with a sample size of 1000? Plotting the denser sample should give a more informative graph.
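
The trade-off discussed above can be illustrated with a small synthetic experiment. This is a sketch under stated assumptions, not the analysis from the linked document: the `signal` curve (50 sine cycles), the 100,000-row table size, and the midpoint error metric are all made up for illustration. It samples rows at random, connects them with straight segments as a line plot would, and measures how far the segments stray from the true curve.

```python
import math
import random


def signal(x):
    # Hypothetical "true" curve: 50 full cycles over x in [0, 1].
    return math.sin(2 * math.pi * 50 * x)


def mean_abs_error(sampling_n, total_rows=100_000, seed=0):
    """Sample rows at random, join them with straight lines, and return
    the average deviation from the true curve at segment midpoints."""
    rng = random.Random(seed)
    rows = sorted(rng.sample(range(total_rows), sampling_n))
    xs = [i / total_rows for i in rows]
    ys = [signal(x) for x in xs]

    total = 0.0
    for i in range(len(xs) - 1):
        mid_x = (xs[i] + xs[i + 1]) / 2
        # Value a line plot would draw at the midpoint of this segment.
        drawn_y = (ys[i] + ys[i + 1]) / 2
        total += abs(drawn_y - signal(mid_x))
    return total / (len(xs) - 1)


errors = {n: mean_abs_error(n) for n in (100, 500, 1000)}
for n, err in errors.items():
    print(f"sampling_n={n}: mean abs midpoint error = {err:.3f}")
```

On a curve like this the error should shrink as `sampling_n` grows, though how much 1000 improves on 500 depends heavily on how quickly the real data varies.
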

@tswast added the automerge label (merge the pull request once unit tests and other checks pass) on Mar 19, 2024

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green (bot) removed the automerge label on Mar 20, 2024
@product-auto-label (bot) added the "size: s" label and removed the "size: xs" label on Mar 20, 2024
@tswast added the automerge label on Mar 20, 2024
@gcf-merge-on-green (bot) removed the automerge label on Mar 20, 2024
@tswast added the automerge label on Mar 21, 2024
@gcf-merge-on-green (bot) merged commit 60d4a7b into main on Mar 21, 2024
14 of 15 checks passed
@gcf-merge-on-green (bot) deleted the tswast-patch-1 branch on March 21, 2024, 23:02
@gcf-merge-on-green (bot) removed the automerge label on Mar 21, 2024
Labels: "api: bigquery", "samples", "size: s"