feat: update plot sample to 1000 rows #458
While making a sample line plot with Salem, I noticed that sampling only 100 rows loses important shape information. Most screens are more than 1000 pixels wide, so 1000 rows (roughly one per horizontal pixel) seems a reasonable default.
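To illustrate the kind of shape loss described above, here is a minimal Python sketch. It assumes evenly spaced (strided) downsampling, which may not be how the plot code actually picks rows; the diff only shows the `sampling_n` default. The spiky series and the helpers `evenly_spaced_indices` and `make_spiky_series` are hypothetical.

```python
def evenly_spaced_indices(num_rows: int, n: int) -> list[int]:
    """Return n row indices spread evenly over [0, num_rows) (assumed sampling scheme)."""
    if num_rows <= n:
        return list(range(num_rows))
    return [i * num_rows // n for i in range(n)]


def make_spiky_series(num_rows: int = 100_000, spike_start: int = 50_300,
                      spike_width: int = 50) -> list[float]:
    """Hypothetical flat series with one narrow spike standing in for 'important shape'."""
    values = [0.0] * num_rows
    for i in range(spike_start, spike_start + spike_width):
        values[i] = 1.0
    return values


series = make_spiky_series()
for sampling_n in (100, 1000):
    sample = [series[i] for i in evenly_spaced_indices(len(series), sampling_n)]
    # The spike is 50 rows wide: a 1000-row stride (n=100) jumps over it,
    # while a 100-row stride (n=1000) lands on row 50_300 inside it.
    print(f"sampling_n={sampling_n}: {len(sample)} rows, peak={max(sample)}")
# sampling_n=100: 100 rows, peak=0.0
# sampling_n=1000: 1000 rows, peak=1.0
```

With strided sampling, any feature narrower than `num_rows / sampling_n` rows can fall between sampled rows, so raising the default from 100 to 1000 shrinks that blind spot tenfold.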
@@ -45,7 +45,7 @@ def generate(self) -> None:

     def _compute_plot_data(self, data):
         # TODO: Cache the sampling data in the PlotAccessor.
-        sampling_n = self.kwargs.pop("sampling_n", 100)
+        sampling_n = self.kwargs.pop("sampling_n", 1000)
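Because the default comes from `self.kwargs.pop`, callers can still override it per plot. A minimal sketch of that pattern, using a hypothetical `PlotSketch` class that mirrors only the kwarg handling in the diff (it is not the real plot class); the assumption that leftover kwargs are consumed elsewhere is ours, since the diff does not show it.

```python
DEFAULT_SAMPLING_N = 1000  # the new default from this PR (previously 100)


class PlotSketch:
    """Hypothetical stand-in for the plot class; only the kwarg handling is mirrored."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def sampling_n(self) -> int:
        # pop() reads the option and removes it, so it is no longer in
        # self.kwargs for whatever consumes the remaining keyword arguments.
        return self.kwargs.pop("sampling_n", DEFAULT_SAMPLING_N)


default_plot = PlotSketch(color="red")
print(default_plot.sampling_n(), default_plot.kwargs)  # 1000 {'color': 'red'}

custom_plot = PlotSketch(sampling_n=500, color="red")
print(custom_plot.sampling_n(), custom_plot.kwargs)  # 500 {'color': 'red'}
```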
Maybe 500 or something would be better? 640x480 was a very common resolution in the 1990s.
I investigated how sample size affects the shape of a dataset. My findings (see document: https://docs.google.com/document/d/1KaIF7zX-7seXsb-rohl56jYjhjFdLc-HlNOYfTM1Zfw/edit?tab=t.0) indicate that:
- Samples of size 500 (sampling_n=500) broadly reflect the same shape as a sample of 1000.
- A sample size of 1000 yields a closer approximation to the true underlying distribution.
Could we proceed with a sample size of 1000? Plotting the denser sample gives a more informative graph.
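The linked document's methodology isn't reproduced here. As one hypothetical way to quantify "closer approximation", the sketch below measures how far a straight-line plot through `sampling_n` evenly spaced rows strays from the full series. The metric (`interpolation_error`), the synthetic sine-wave signal, and the strided sampling are all assumptions.

```python
import math


def evenly_spaced_indices(num_rows: int, n: int) -> list[int]:
    """n evenly spaced row indices, plus the last row so the line reaches the end."""
    if num_rows <= n:
        return list(range(num_rows))
    idx = [i * num_rows // n for i in range(n)]
    if idx[-1] != num_rows - 1:
        idx.append(num_rows - 1)
    return idx


def interpolation_error(values: list[float], n: int) -> float:
    """Mean absolute gap between the full series and a line drawn through n sampled rows."""
    idx = evenly_spaced_indices(len(values), n)
    total = 0.0
    for a, b in zip(idx, idx[1:]):
        for j in range(a, b):
            # Linear interpolation between sampled rows a and b, as a line plot draws it.
            approx = values[a] + (values[b] - values[a]) * (j - a) / (b - a)
            total += abs(values[j] - approx)
    # The final row is itself sampled, so it contributes zero error.
    return total / len(values)


num_rows = 100_000
period = 20_000  # rows per cycle of the hypothetical smooth signal
series = [math.sin(2 * math.pi * i / period) for i in range(num_rows)]

errors = {n: interpolation_error(series, n) for n in (100, 500, 1000)}
for n, err in errors.items():
    print(f"sampling_n={n}: mean abs error {err:.2e}")
```

For a smooth signal, linear-interpolation error falls roughly with the square of the gap between sampled rows, so each increase in `sampling_n` gives a closer line, though the visible improvement shrinks once the error is already small.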
Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a "do not merge" label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.