
Make keying of examples explicit. #21777

Merged: 9 commits, Jun 10, 2022

@robertwb (Contributor) commented Jun 9, 2022

This decouples the keying logic from the DoFn and helps with type inference.
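A minimal, framework-free sketch of the idea, assuming a simplified handler interface with only a run_inference(batch) method. UnkeyedHandler, DoublingHandler and KeyedHandlerSketch are hypothetical names, not Beam's actual ModelHandler/KeyedModelHandler API (whose methods take more arguments, such as the loaded model). The point is where the keying lives: the wrapper splits keys off, the inner handler only sees bare examples, and the keys are zipped back onto the predictions.

```python
from typing import Generic, List, Sequence, Tuple, TypeVar

KeyT = TypeVar('KeyT')
ExampleT = TypeVar('ExampleT')
PredictionT = TypeVar('PredictionT')


class UnkeyedHandler(Generic[ExampleT, PredictionT]):
  """Hypothetical base class: runs inference on bare (unkeyed) examples."""

  def run_inference(self, batch: Sequence[ExampleT]) -> List[PredictionT]:
    raise NotImplementedError


class DoublingHandler(UnkeyedHandler[float, float]):
  """Hypothetical toy 'model': each prediction is twice its example."""

  def run_inference(self, batch: Sequence[float]) -> List[float]:
    return [2 * x for x in batch]


class KeyedHandlerSketch(Generic[KeyT, ExampleT, PredictionT]):
  """Hypothetical wrapper that owns all of the keying logic.

  The wrapped handler never sees keys, so neither it nor the calling DoFn
  has to inspect elements to decide whether they are keyed.
  """

  def __init__(self, unkeyed: UnkeyedHandler[ExampleT, PredictionT]):
    self._unkeyed = unkeyed

  def run_inference(
      self, batch: Sequence[Tuple[KeyT, ExampleT]]
  ) -> List[Tuple[KeyT, PredictionT]]:
    keys = [key for key, _ in batch]
    examples = [example for _, example in batch]
    predictions = self._unkeyed.run_inference(examples)
    # Predictions come back in input order, so zip restores the pairing.
    return list(zip(keys, predictions))


handler = KeyedHandlerSketch(DoublingHandler())
print(handler.run_inference([('a', 1.0), ('b', 2.5)]))  # [('a', 2.0), ('b', 5.0)]
```

Because the key type flows through the wrapper's own type parameters, a type checker can derive the Tuple[KeyT, PredictionT] output type from the wrapped handler, which is one plausible reading of "helps with type inference".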


A MaybeKeyedModelLoader could be added to make this decision dynamically if desired.
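A sketch of what such a dynamic decision could look like. It assumes a simple rule (an element counts as keyed when it is a 2-tuple) and a simplified handler interface; the rule and the names SquaringHandler and MaybeKeyedHandlerSketch are hypothetical illustrations, not a description of Beam's implementation.

```python
from typing import Any, List, Sequence


class SquaringHandler:
  """Hypothetical toy unkeyed handler: each prediction is the example squared."""

  def run_inference(self, batch: Sequence[float]) -> List[float]:
    return [x * x for x in batch]


class MaybeKeyedHandlerSketch:
  """Hypothetical wrapper that decides per batch whether inputs are keyed.

  Assumed rule: an element is keyed if it is a 2-tuple (key, example).
  A batch must be uniformly keyed or uniformly unkeyed.
  """

  def __init__(self, unkeyed: SquaringHandler):
    self._unkeyed = unkeyed

  def run_inference(self, batch: Sequence[Any]) -> List[Any]:
    keyed = [isinstance(x, tuple) and len(x) == 2 for x in batch]
    if all(keyed):
      # Keyed path (also taken for an empty batch): strip, infer, re-attach.
      keys = [key for key, _ in batch]
      predictions = self._unkeyed.run_inference([ex for _, ex in batch])
      return list(zip(keys, predictions))
    if not any(keyed):
      # Unkeyed path: pass examples straight through.
      return self._unkeyed.run_inference(list(batch))
    raise ValueError('Batch mixes keyed and unkeyed examples.')


handler = MaybeKeyedHandlerSketch(SquaringHandler())
print(handler.run_inference([('x', 3.0)]))  # [('x', 9.0)]
print(handler.run_inference([2.0, 4.0]))    # [4.0, 16.0]
```

The weakness of any such rule is ambiguity: an unkeyed example that happens to be a 2-tuple would be misread as keyed, which is one argument for making keying explicit when the input shape is known.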
@github-actions github-actions bot added the python label Jun 9, 2022
@robertwb (Contributor Author) commented Jun 9, 2022

R: @yeandy

@codecov bot commented Jun 9, 2022

Codecov Report

Merging #21777 (31c7788) into master (ef7cd0c) will increase coverage by 0.00%.
The diff coverage is 93.54%.

❗ Current head 31c7788 differs from pull request most recent head 1dc8dcc. Consider uploading reports for the commit 1dc8dcc to get more accurate results

@@           Coverage Diff           @@
##           master   #21777   +/-   ##
=======================================
  Coverage   74.01%   74.02%           
=======================================
  Files         698      698           
  Lines       92224    92229    +5     
=======================================
+ Hits        68263    68270    +7     
+ Misses      22710    22708    -2     
  Partials     1251     1251           
Flag | Coverage Δ
--- | ---
python | 83.59% <93.54%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files | Coverage Δ
--- | ---
...thon/apache_beam/ml/inference/pytorch_inference.py | 0.00% <0.00%> (ø)
sdks/python/apache_beam/ml/inference/base.py | 94.40% <100.00%> (+0.70%) ⬆️
.../python/apache_beam/testing/test_stream_service.py | 88.09% <0.00%> (-4.77%) ⬇️
sdks/python/apache_beam/utils/interactive_utils.py | 95.12% <0.00%> (-2.44%) ⬇️
.../python/apache_beam/transforms/periodicsequence.py | 96.72% <0.00%> (-1.64%) ⬇️
...python/apache_beam/runners/worker/worker_status.py | 78.26% <0.00%> (-1.45%) ⬇️
...che_beam/runners/interactive/interactive_runner.py | 90.06% <0.00%> (-1.33%) ⬇️
...hon/apache_beam/runners/direct/test_stream_impl.py | 93.28% <0.00%> (-0.75%) ⬇️
...eam/runners/portability/fn_api_runner/execution.py | 92.44% <0.00%> (-0.65%) ⬇️
sdks/go/pkg/beam/runners/dataflow/dataflow.go | 58.42% <0.00%> (-0.24%) ⬇️

... and 6 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ef7cd0c...1dc8dcc.

@robertwb (Contributor Author):

Added a maybe-keyed class and rebased atop the ModelLoader merge. PTAL.

@robertwb (Contributor Author):

R: @ryanthompson591

@@ -119,7 +119,7 @@ def run_inference(
predictions = model(batched_tensors, **prediction_params)
return [PredictionResult(x, y) for x, y in zip(batch, predictions)]

def get_num_bytes(self, batch: List[torch.Tensor]) -> int:
Contributor:

Do the same Sequence change in sklearn_inference.py, lines 78, 89, 97, 116, and 121.

Contributor Author:

Done.
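The thread does not state the motivation for the Sequence change. As general Python typing background only: Sequence is a read-only, covariant type while List is invariant, so a Sequence-typed parameter lets a static checker such as mypy accept tuples and other sequences as well as lists. num_bytes below is a hypothetical helper in the spirit of get_num_bytes, not Beam code, and the annotation has no effect at runtime.

```python
from typing import List, Sequence


def num_bytes(batch: Sequence[bytes]) -> int:
  """Hypothetical helper: total size of a batch of byte strings.

  Typed as Sequence[bytes], mypy accepts a list, a tuple, or any other
  sequence of bytes here. Typed as List[bytes], passing a tuple would be
  reported as a type error, even though the body runs fine either way.
  """
  return sum(len(item) for item in batch)


as_list: List[bytes] = [b'ab', b'cde']
as_tuple = (b'ab', b'cde')
print(num_bytes(as_list), num_bytes(as_tuple))  # 5 5
```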

@@ -93,6 +95,100 @@ def batch_elements_kwargs(self) -> Mapping[str, Any]:
return {}


class KeyedModelHandler(Generic[KeyT, ExampleT, PredictionT, ModelT],
Contributor:

I wonder if there's any way to make this more readable or simple. All these nested lists are making my eyes a little buggy.

Can we perhaps use constants here?
BASIC_MODEL_HANDLER = ModelHandler[Tuple[KeyT, ExampleT]
KEYED_PREDICTION = Tuple[KeyT, PredictionT]

Or are there any other ways to make some of this templating go away?

Contributor Author:

I played around with this a bit and don't see a way to make things much simpler. It is possible to delete the generic, but then it becomes harder to reason about the order of the nested arguments.
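Both the reviewer's alias idea and the ordering concern can be seen in a small typing sketch. HandlerSketch, KeyedExample and KeyedPrediction are hypothetical names; only the behavior of the standard typing module is real. Aliases expand to exactly the same type as the nested spelling, and the free TypeVars of the result are ordered by first appearance, which is the ordering a reader has to track when specializing it.

```python
from typing import Generic, Tuple, TypeVar, get_args

KeyT = TypeVar('KeyT')
ExampleT = TypeVar('ExampleT')
PredictionT = TypeVar('PredictionT')
ModelT = TypeVar('ModelT')


class HandlerSketch(Generic[ExampleT, PredictionT, ModelT]):
  """Hypothetical stand-in for a handler base with three type parameters."""


# Reviewer-style aliases; each leaves KeyT and one value TypeVar free.
KeyedExample = Tuple[KeyT, ExampleT]
KeyedPrediction = Tuple[KeyT, PredictionT]

# The same keyed base type, spelled with aliases and with nested subscripts.
ViaAliases = HandlerSketch[KeyedExample, KeyedPrediction, ModelT]
Nested = HandlerSketch[Tuple[KeyT, ExampleT], Tuple[KeyT, PredictionT], ModelT]
print(ViaAliases == Nested)  # True: aliases are expanded, not new types

# Free TypeVars are collected in order of first appearance (KeyT only once).
print(ViaAliases.__parameters__)  # (~KeyT, ~ExampleT, ~PredictionT, ~ModelT)

# Specializing therefore takes (key, example, prediction, model) in that order.
Concrete = ViaAliases[str, int, float, object]
print(get_args(Concrete))
```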

@robertwb robertwb merged commit b8e2e85 into apache:master Jun 10, 2022
AnandInguva added a commit to AnandInguva/beam that referenced this pull request Jun 13, 2022
tvalentyn pushed a commit that referenced this pull request Jun 13, 2022
* refactor code from api to base

* delete api.py

* modify imports

* Add todo to mypy github issue

* Refactor code to reflect changes of  #21777

* Refactor example with KeyedModelHandler

* remove explicit type hints from RunInference class

* Fixup : Lint

* remove TODO to github issue for mypy error

* Add mypy github issue as TODO
bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
This decouples the keying logic from the DoFn and helps with type inference.

There are both a KeyedModelHandler that expects keys and a MaybeKeyedModelHandler that preserves the old behavior.
bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
* refactor code from api to base

* delete api.py

* modify imports

* Add todo to mypy github issue

* Refactor code to reflect changes of  apache#21777

* Refactor example with KeyedModelHandler

* remove explicit type hints from RunInference class

* Fixup : Lint

* remove TODO to github issue for mypy error

* Add mypy github issue as TODO