-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Common operations tutorial #395
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #395 +/- ##
=======================================
Coverage 95.53% 95.53%
=======================================
Files 25 25
Lines 1702 1702
=======================================
Hits 1626 1626
Misses 76 76 ☔ View full report in Codecov by Sentry. |
257efa0
to
05dd6f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
A handful of small nits to be consistent in styling (which I probably am inconsistent about myself).
One final nit is to edit the PR title to remove to remove the "WIP"
"In this notebook, we'll highlight a handful of common dataframe operations that can be performed within `TAPE`. \n", | ||
"\n", | ||
"> **_Note:_**\n", | ||
"TAPE extends the `Pandas`/`Dask` API, and so users familiar with those APIs can expect many operations to be near-identical when working with `TAPE`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we use TAPE and TAPE
in the same line.
Do we want to always have it within ``?
"source": [ | ||
"### Filtering by Number of Observations\n", | ||
"\n", | ||
"Filters based on number of observations are more directly supported within the TAPE API. First, using a dedicated function to calculate the number of observations per lightcurve, `Ensemble.calc_nobs()`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another s/TAPE/TAPE
"### Using `Compute()` to view the data\n", | ||
"\n", | ||
"When an `EnsembleFrame` contents are small enough to fit into memory, you can use `compute()` to view the actual data.\n", | ||
"\n", | ||
"> **_Note:_**\n", | ||
"`Compute()` also involves actual computation of the in-memory data, working on any loading/filtering/analysis needed to produce the result, as such this can take a long time! " | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: would prefer being consistent about keeping compute()
lowercase
"source": [ | ||
"### Applying Functions with Batch\n", | ||
"\n", | ||
"The `Ensemble` provides a powerful batching interface, `Ensemble.batch`, with in-built parallelization (provided the input data is in multiple partitions)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could probably be consistent in using Ensemble.batch()
vs Ensemble.batch
Personally I prefer the former
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Using `Persist()` to Save Computation Time\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: In this cell we can be consistent about using persist()
instead of Persist()
and compute()
instead of Compute()
"### Sampling\n", | ||
"\n", | ||
"\n", | ||
"In addition to filtering by specific constraints, it's possible to select a subset of your data to work with. `Ensemble.sample` will randomly select a fraction of objects from the full object list. This will return a new\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: to be consistent we should probably use Ensemble.sample()
Change Description
Partially addresses Break up the "Working with the Ensemble Tutorial" #392 by adding a dedicated notebook for showing a set of dataframe-like operations that we expect users will commonly use with TAPE workflows.
Solution Description
Code Quality
Project-Specific Pull Request Checklists
Bug Fix Checklist
New Feature Checklist
Documentation Change Checklist
Build/CI Change Checklist
Other Change Checklist