Common operations tutorial #395

Merged: dougbrn merged 8 commits into main from common_operations on Mar 15, 2024

@dougbrn dougbrn commented Mar 14, 2024

Change Description

Solution Description

Code Quality

  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

  • I have added a function that requires a sync_tables command, and have added the necessary sync_tables call

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Documentation Change Checklist

Build/CI Change Checklist

  • If required or optional dependencies have changed (including version numbers), I have updated the README to reflect this
  • If this is a new CI setup, I have added the associated badge to the README

Other Change Checklist

  • Any new or updated docstrings use the NumPy docstring format.
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover any changes
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)


github-actions bot commented Mar 14, 2024

| Before [db9490e] | After [4a26242] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 45.3±0.5ms | 45.6±0.4ms | 1 | benchmarks.time_batch |
| 48.5±0.9ms | 47.5±0.6ms | 0.98 | benchmarks.time_prune_sync_workflow |

codecov bot commented Mar 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.53%. Comparing base (b1be5c9) to head (0ce7932).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #395   +/-   ##
=======================================
  Coverage   95.53%   95.53%           
=======================================
  Files          25       25           
  Lines        1702     1702           
=======================================
  Hits         1626     1626           
  Misses         76       76           

@dougbrn dougbrn force-pushed the common_operations branch from 257efa0 to 05dd6f1 Compare March 14, 2024 21:44
@dougbrn dougbrn requested a review from wilsonbb March 14, 2024 22:15
@dougbrn dougbrn marked this pull request as ready for review March 14, 2024 22:15
@dougbrn dougbrn changed the title from "WIP: common operations tutorial" to "Common operations tutorial" Mar 14, 2024

@wilsonbb wilsonbb left a comment

LGTM!

A handful of small nits to be consistent in styling (which I probably am inconsistent about myself).

One final nit: edit the PR title to remove the "WIP".

"In this notebook, we'll highlight a handful of common dataframe operations that can be performed within `TAPE`. \n",
"\n",
"> **_Note:_**\n",
"TAPE extends the `Pandas`/`Dask` API, and so users familiar with those APIs can expect many operations to be near-identical when working with `TAPE`."

Here we use both TAPE and `TAPE` in the same line.

Do we want to always wrap it in backticks?
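
If the answer turns out to be yes, a small regex pass could wrap the bare occurrences. This is only a sketch: `wrap_bare_tape` is a hypothetical helper, not part of the TAPE package, and it assumes `TAPE` never appears inside a longer backticked span (for example `` `TAPE API` ``), which it would handle incorrectly.

```python
import re

# Match the word TAPE only when it is not directly wrapped in backticks.
BARE_TAPE = re.compile(r"(?<!`)\bTAPE\b(?!`)")


def wrap_bare_tape(text: str) -> str:
    """Wrap bare occurrences of TAPE in backticks (hypothetical helper)."""
    return BARE_TAPE.sub("`TAPE`", text)


line = "TAPE extends the `Pandas`/`Dask` API for users working with `TAPE`."
print(wrap_bare_tape(line))
```

Occurrences that are already backticked are left alone, so running the helper twice gives the same result as running it once.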

"source": [
"### Filtering by Number of Observations\n",
"\n",
"Filters based on number of observations are more directly supported within the TAPE API. First, using a dedicated function to calculate the number of observations per lightcurve, `Ensemble.calc_nobs()`"

Another s/TAPE/`TAPE`

Comment on lines 94 to 100
"### Using `Compute()` to view the data\n",
"\n",
"When an `EnsembleFrame` contents are small enough to fit into memory, you can use `compute()` to view the actual data.\n",
"\n",
"> **_Note:_**\n",
"`Compute()` also involves actual computation of the in-memory data, working on any loading/filtering/analysis needed to produce the result, as such this can take a long time! "
]

Nit: would prefer being consistent about keeping `compute()` lowercase

"source": [
"### Applying Functions with Batch\n",
"\n",
"The `Ensemble` provides a powerful batching interface, `Ensemble.batch`, with in-built parallelization (provided the input data is in multiple partitions)."

Nit: we could probably be consistent in using `Ensemble.batch()` vs `Ensemble.batch`

Personally I prefer the former.

"cell_type": "markdown",
"metadata": {},
"source": [
"### Using `Persist()` to Save Computation Time\n",

Nit: In this cell we can be consistent about using `persist()` instead of `Persist()` and `compute()` instead of `Compute()`
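
One way to apply this nit across the whole notebook, sketched under a few assumptions: the notebook is already loaded as a dict in the standard `.ipynb` JSON layout, each cell's `source` is a list of strings (it can also be a single string, which this sketch does not handle), and only `Compute()` and `Persist()` need fixing. The function names and the `ens` variable are hypothetical.

```python
import re

# Backticked, capitalized spellings of the two methods named in the review.
CAPITALIZED_CALL = re.compile(r"`(Compute|Persist)\(\)`")


def lowercase_method_calls(text: str) -> str:
    """Rewrite `Compute()` / `Persist()` as `compute()` / `persist()`."""
    return CAPITALIZED_CALL.sub(lambda m: "`" + m.group(1).lower() + "()`", text)


def fix_markdown_cells(notebook: dict) -> int:
    """Fix markdown cells in place and return the number of changed lines."""
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # leave code cells untouched
        new_source = [lowercase_method_calls(line) for line in cell["source"]]
        changed += sum(old != new for old, new in zip(cell["source"], new_source))
        cell["source"] = new_source
    return changed


notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["### Using `Persist()` to Save Computation Time\n"]},
        {"cell_type": "code", "metadata": {},
         "source": ["ens.persist()\n"]},  # `ens` is a placeholder name
    ]
}
print(fix_markdown_cells(notebook))
print(notebook["cells"][0]["source"][0])
```

Restricting the pattern to the two named methods avoids lowercasing legitimate capitalized names such as class constructors.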

"### Sampling\n",
"\n",
"\n",
"In addition to filtering by specific constraints, it's possible to select a subset of your data to work with. `Ensemble.sample` will randomly select a fraction of objects from the full object list. This will return a new\n",

Nit: to be consistent we should probably use `Ensemble.sample()`
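
For the parentheses nits, a similar sketch could normalize method references in the markdown text. It assumes the only affected names are `batch` and `sample`, the two flagged in this review, and `add_call_parens` is a hypothetical helper rather than anything in the repository.

```python
import re

# Method names flagged in this review; extend the tuple as needed.
METHODS = ("batch", "sample")

# A backticked `Ensemble.<method>` reference with no trailing parentheses.
NO_PARENS = re.compile(r"`Ensemble\.(" + "|".join(METHODS) + r")`")


def add_call_parens(text: str) -> str:
    """Rewrite `Ensemble.batch` as `Ensemble.batch()`, and likewise for sample."""
    return NO_PARENS.sub(r"`Ensemble.\1()`", text)


print(add_call_parens("`Ensemble.sample` will randomly select a fraction of objects"))
print(add_call_parens("a powerful batching interface, `Ensemble.batch`, with"))
```

References that already carry parentheses do not match the pattern, so they pass through unchanged.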

@dougbrn dougbrn merged commit 05400ae into main Mar 15, 2024
10 checks passed
@dougbrn dougbrn deleted the common_operations branch April 4, 2024 17:16