This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Onboard text classification inputs to new object #1022

Merged
merged 19 commits into master from feature/text_classification_inputs on Dec 6, 2021

Conversation

@ethanwharris (Collaborator) commented on Dec 3, 2021

What does this PR do?

Part of #964

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@tchaton (Contributor) left a comment:

Looks good, some comments.

flash/text/classification/data.py (outdated)
```python
element[DataKeys.TARGET] = targets

def _resolve_target(target_keys: Union[str, List[str]], element: Dict[str, Any]) -> Dict[str, Any]:
    if not isinstance(target_keys, List):
        element[DataKeys.TARGET] = element.pop(target_keys)
```
Contributor:

Would this work as expected?

@ethanwharris (Collaborator, Author):

This is just a preprocessing step that runs before our own target handling mechanism, so it doesn't matter exactly what comes out here; it will be handled properly later by our classification utilities.
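For readers following along, here is a small, self-contained sketch of what a preprocessing step like `_resolve_target` does. The names (`resolve_target`, the plain-string `TARGET` key) and the multi-key branch are illustrative assumptions, not necessarily the exact code in this PR:

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-in for flash.core.data.io.input.DataKeys.TARGET.
TARGET = "target"


def resolve_target(target_keys: Union[str, List[str]], element: Dict[str, Any]) -> Dict[str, Any]:
    """Move the raw target column(s) under a single TARGET key.

    The raw value is left untouched; downstream classification utilities
    decide how to interpret it (single label, multi-label, one-hot, ...).
    """
    if not isinstance(target_keys, list):
        # A single target column: pop it and store the raw value.
        element[TARGET] = element.pop(target_keys)
    else:
        # Several (e.g. binary) target columns: collect the raw values in order.
        element[TARGET] = [element.pop(key) for key in target_keys]
    return element


print(resolve_target("label", {"text": "great film", "label": "positive"}))
# {'text': 'great film', 'target': 'positive'}
print(resolve_target(["pos", "neg"], {"text": "great film", "pos": 1, "neg": 0}))
# {'text': 'great film', 'target': [1, 0]}
```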

```python
self.load_target_metadata(targets)

# If we had binary multi-class targets then we also know the labels (column names)
if self.target_mode is TargetMode.MULTI_BINARY and isinstance(target_keys, List):
```
Contributor:

Why do you override the state there? If the state is shared among train and val, this would override the train one.

@ethanwharris (Collaborator, Author):

The classification state is only ever shared from the train data (because labels are always inferred from the train data rather than val or test). In this case the labels are binary, so we assume that a one in a column corresponds to having that label. The labels inferred by our classification utils are therefore None, and we replace them with the target keys (the column names) from the dataset.
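A minimal sketch of the fallback described above. The `resolve_labels` helper is hypothetical and only illustrates the idea; the real PR does this through Flash's classification utilities and shared state:

```python
from typing import List, Optional, Union


def resolve_labels(
    inferred_labels: Optional[List[str]],
    target_keys: Union[str, List[str]],
) -> Optional[List[str]]:
    """Fall back to the target column names when no labels could be inferred."""
    if inferred_labels is None and isinstance(target_keys, list):
        return list(target_keys)
    return inferred_labels


# Binary targets such as [1, 0, 1] carry no label names, so the column names win:
print(resolve_labels(None, ["sports", "politics", "tech"]))
# ['sports', 'politics', 'tech']

# A single categorical target column already yields labels, so nothing changes:
print(resolve_labels(["positive", "negative"], "label"))
# ['positive', 'negative']
```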

flash/text/classification/data.py (outdated)
```python
class TextClassificationBackboneState(ProcessState):
    """The ``TextClassificationBackboneState`` records the ``backbone`` in use by the ``TextClassifier``."""

    backbone: str
```
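For context, here is a minimal, self-contained sketch of the general pattern a `ProcessState` subclass like this follows: a small immutable record that one component registers so that others can look it up later (e.g. to rebuild the matching tokenizer from the backbone name). The `StateRegistry` and `BackboneState` names are toy stand-ins, not Flash's actual state-sharing machinery:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class BackboneState:
    """Records the backbone in use so other components can query it."""

    backbone: str


class StateRegistry:
    """Toy stand-in for the mechanism that shares state objects between components."""

    def __init__(self) -> None:
        self._states: Dict[Type, object] = {}

    def set_state(self, state: object) -> None:
        # The latest state of each type wins.
        self._states[type(state)] = state

    def get_state(self, state_type: Type[S]) -> Optional[S]:
        return self._states.get(state_type)


registry = StateRegistry()
registry.set_state(BackboneState(backbone="prajjwal1/bert-tiny"))
print(registry.get_state(BackboneState).backbone)  # prajjwal1/bert-tiny
```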
Contributor:

Do we want to expand the state to anything being passed along?

flash/text/seq2seq/core/data.py
@ethanwharris changed the title from "[WIP] Onboard text classification inputs to new object" to "Onboard text classification inputs to new object" on Dec 3, 2021
@codecov bot commented on Dec 3, 2021

Codecov Report

Merging #1022 (c0c8789) into master (5dd695f) will decrease coverage by 5.85%.
The diff coverage is 96.15%.

Impacted file tree graph

```diff
@@            Coverage Diff             @@
##           master    #1022      +/-   ##
==========================================
- Coverage   87.09%   81.24%   -5.86%
==========================================
  Files         254      254
  Lines       13778    13735      -43
==========================================
- Hits        12000    11159     -841
- Misses       1778     2576     +798
```

| Flag | Coverage Δ |
|------|------------|
| unittests | 81.24% <96.15%> (-5.86%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| flash/image/classification/data.py | 89.39% <ø> (-4.05%) ⬇️ |
| flash/text/classification/cli.py | 100.00% <ø> (ø) |
| flash/text/seq2seq/core/data.py | 88.35% <75.00%> (-1.65%) ⬇️ |
| flash/core/data/io/input.py | 51.46% <100.00%> (ø) |
| flash/core/data/splits.py | 97.14% <100.00%> (+0.47%) ⬆️ |
| flash/core/integrations/labelstudio/input.py | 88.57% <100.00%> (-0.32%) ⬇️ |
| flash/text/classification/data.py | 98.44% <100.00%> (+1.43%) ⬆️ |
| flash/text/classification/model.py | 93.44% <100.00%> (+1.60%) ⬆️ |
| flash/core/integrations/icevision/transforms.py | 12.50% <0.00%> (-77.78%) ⬇️ |
| ...lash/image/embedding/vissl/transforms/multicrop.py | 27.27% <0.00%> (-69.70%) ⬇️ |
| ... and 40 more | |

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@review-notebook-app bot: Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

@ethanwharris merged commit 159cd98 into master on Dec 6, 2021
@ethanwharris deleted the feature/text_classification_inputs branch on Dec 6, 2021 at 23:45
2 participants