This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Onboard text classification inputs to new object #1022

Merged
merged 19 commits into master from feature/text_classification_inputs on Dec 6, 2021

Conversation

@ethanwharris (Collaborator) commented on Dec 3, 2021

What does this PR do?

Part of #964

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@tchaton (Contributor) left a comment:

Looks good, some comments.

flash/text/classification/data.py (outdated)
```python
element[DataKeys.TARGET] = targets

def _resolve_target(target_keys: Union[str, List[str]], element: Dict[str, Any]) -> Dict[str, Any]:
    if not isinstance(target_keys, List):
        element[DataKeys.TARGET] = element.pop(target_keys)
```
Contributor:

Would this work as expected?

@ethanwharris (Collaborator, Author):

This is just a preprocessing step that runs before our own target handling mechanism, so it doesn't matter exactly what comes out here; it will be handled properly later by our classification utilities.
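For readers following along, here is a small, self-contained sketch of what a preprocessing step like `_resolve_target` does. The names (`resolve_target`, the plain-string `TARGET` key) and the multi-key branch are illustrative assumptions, not necessarily the exact code in this PR:

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-in for flash.core.data.io.input.DataKeys.TARGET.
TARGET = "target"


def resolve_target(target_keys: Union[str, List[str]], element: Dict[str, Any]) -> Dict[str, Any]:
    """Move the raw target column(s) under a single TARGET key.

    The raw value is left untouched; downstream classification utilities
    decide how to interpret it (single label, multi-label, one-hot, ...).
    """
    if not isinstance(target_keys, list):
        # A single target column: pop it and store the raw value.
        element[TARGET] = element.pop(target_keys)
    else:
        # Several (e.g. binary) target columns: collect the raw values in order.
        element[TARGET] = [element.pop(key) for key in target_keys]
    return element


print(resolve_target("label", {"text": "great film", "label": "positive"}))
# {'text': 'great film', 'target': 'positive'}
print(resolve_target(["pos", "neg"], {"text": "great film", "pos": 1, "neg": 0}))
# {'text': 'great film', 'target': [1, 0]}
```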

```python
self.load_target_metadata(targets)

# If we had binary multi-class targets then we also know the labels (column names)
if self.target_mode is TargetMode.MULTI_BINARY and isinstance(target_keys, List):
```
Contributor:

Why do you override the state there? If the state is shared among train and val, this would override the train one.

@ethanwharris (Collaborator, Author):

The classification state is only ever shared from the train data (because labels are always inferred from the train data rather than val or test). In this case the labels are binary, so we assume that a one in a column corresponds to having that label. The labels inferred by our classification utils are therefore None, and we replace them with the target keys (the column names) from the dataset.
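A minimal sketch of the fallback described above. The `resolve_labels` helper is hypothetical and only illustrates the idea; the real PR does this through Flash's classification utilities and shared state:

```python
from typing import List, Optional, Union


def resolve_labels(
    inferred_labels: Optional[List[str]],
    target_keys: Union[str, List[str]],
) -> Optional[List[str]]:
    """Fall back to the target column names when no labels could be inferred."""
    if inferred_labels is None and isinstance(target_keys, list):
        return list(target_keys)
    return inferred_labels


# Binary targets such as [1, 0, 1] carry no label names, so the column names win:
print(resolve_labels(None, ["sports", "politics", "tech"]))
# ['sports', 'politics', 'tech']

# A single categorical target column already yields labels, so nothing changes:
print(resolve_labels(["positive", "negative"], "label"))
# ['positive', 'negative']
```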

flash/text/classification/data.py (outdated)
```python
class TextClassificationBackboneState(ProcessState):
    """The ``TextClassificationBackboneState`` records the ``backbone`` in use by the ``TextClassifier``."""

    backbone: str
```
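For context, here is a minimal, self-contained sketch of the general pattern a `ProcessState` subclass like this follows: a small immutable record that one component registers so that others can look it up later (e.g. to rebuild the matching tokenizer from the backbone name). The `StateRegistry` and `BackboneState` names are toy stand-ins, not Flash's actual state-sharing machinery:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class BackboneState:
    """Records the backbone in use so other components can query it."""

    backbone: str


class StateRegistry:
    """Toy stand-in for the mechanism that shares state objects between components."""

    def __init__(self) -> None:
        self._states: Dict[Type, object] = {}

    def set_state(self, state: object) -> None:
        # The latest state of each type wins.
        self._states[type(state)] = state

    def get_state(self, state_type: Type[S]) -> Optional[S]:
        return self._states.get(state_type)


registry = StateRegistry()
registry.set_state(BackboneState(backbone="prajjwal1/bert-tiny"))
print(registry.get_state(BackboneState).backbone)  # prajjwal1/bert-tiny
```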
Contributor:

Do we want to expand the state to anything being passed along?

flash/text/seq2seq/core/data.py
@ethanwharris changed the title from "[WIP] Onboard text classification inputs to new object" to "Onboard text classification inputs to new object" on Dec 3, 2021
@codecov bot commented on Dec 3, 2021

Codecov Report

Merging #1022 (c0c8789) into master (5dd695f) will decrease coverage by 5.85%.
The diff coverage is 96.15%.

Impacted file tree graph

```diff
@@            Coverage Diff             @@
##           master    #1022      +/-   ##
==========================================
- Coverage   87.09%   81.24%   -5.86%
==========================================
  Files         254      254
  Lines       13778    13735      -43
==========================================
- Hits        12000    11159     -841
- Misses       1778     2576     +798
```

| Flag | Coverage Δ |
|------|------------|
| unittests | 81.24% <96.15%> (-5.86%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| flash/image/classification/data.py | 89.39% <ø> (-4.05%) ⬇️ |
| flash/text/classification/cli.py | 100.00% <ø> (ø) |
| flash/text/seq2seq/core/data.py | 88.35% <75.00%> (-1.65%) ⬇️ |
| flash/core/data/io/input.py | 51.46% <100.00%> (ø) |
| flash/core/data/splits.py | 97.14% <100.00%> (+0.47%) ⬆️ |
| flash/core/integrations/labelstudio/input.py | 88.57% <100.00%> (-0.32%) ⬇️ |
| flash/text/classification/data.py | 98.44% <100.00%> (+1.43%) ⬆️ |
| flash/text/classification/model.py | 93.44% <100.00%> (+1.60%) ⬆️ |
| flash/core/integrations/icevision/transforms.py | 12.50% <0.00%> (-77.78%) ⬇️ |
| ...lash/image/embedding/vissl/transforms/multicrop.py | 27.27% <0.00%> (-69.70%) ⬇️ |
| ... and 40 more | |

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@review-notebook-app bot: Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

@ethanwharris merged commit 159cd98 into master on Dec 6, 2021
@ethanwharris deleted the feature/text_classification_inputs branch on Dec 6, 2021 at 23:45
2 participants