Add multi-label classification dataset and metric #1572

Merged
laggui merged 11 commits into main from feat/multilabel-classification on Apr 5, 2024

laggui (Member) commented Apr 3, 2024

Pull Request Template

Checklist

  • Confirmed that run-checks all script has been executed.

Related Issues/PRs

Progress towards #1526.
Fine-tuning example is almost complete (locally).

Changes

Added ImageFolderDataset::new_multilabel_classification_with_items and HammingScore multi-label accuracy metric

  • Added Annotation::MultiLabel(Vec<usize>) for multi-label classification
  • Added AnnotationRaw enum to de/serialize different supported annotation types with bincode
  • Refactored ImageFolderDataset new methods to use with_items
  • Added HammingScore metric and MultiLabelClassificationOutput to handle multi-label outputs
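
For readers unfamiliar with the metric named in the last bullet, here is a minimal plain-Rust sketch of a Hamming score. It is not the burn-train implementation (which operates on tensors); the function name `hamming_score`, the nested-`Vec` inputs, and the explicit `threshold` parameter are assumptions made for illustration. The score is the fraction of (sample, label) pairs where the thresholded prediction agrees with the multi-hot target.

```rust
/// Hamming score for multi-label predictions (illustrative sketch only).
///
/// `scores` holds per-label probabilities and `targets` holds the multi-hot
/// ground truth, both laid out as [batch][num_labels]. A label counts as
/// predicted when its score is >= `threshold`.
///
/// Returns `None` when there is nothing to score or the shapes disagree.
pub fn hamming_score(
    scores: &[Vec<f32>],
    targets: &[Vec<bool>],
    threshold: f32,
) -> Option<f32> {
    if scores.len() != targets.len() {
        return None;
    }

    let mut correct = 0usize;
    let mut total = 0usize;

    for (row_scores, row_targets) in scores.iter().zip(targets) {
        if row_scores.len() != row_targets.len() {
            return None;
        }
        for (&score, &target) in row_scores.iter().zip(row_targets) {
            // Binarize the score, then compare with the ground truth.
            let predicted = score >= threshold;
            if predicted == target {
                correct += 1;
            }
            total += 1;
        }
    }

    if total == 0 {
        None
    } else {
        Some(correct as f32 / total as f32)
    }
}
```

With two samples and three labels, scores `[[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]]`, targets `[[1, 0, 0], [0, 1, 1]]`, and a threshold of 0.5, four of the six label decisions match, so the score is 4/6.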

Testing

New unit tests for dataset methods and hamming score metric.

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 92.15686%, with 20 lines in your changes missing coverage. Please review.

Project coverage is 86.34%. Comparing base (0978c8a) to head (a63fa6e).
Report is 3 commits behind head on main.

Files                                              Patch %   Lines
crates/burn-train/src/metric/hamming.rs            87.87%    12 Missing ⚠️
crates/burn-train/src/learner/classification.rs    0.00%     7 Missing ⚠️
crates/burn-dataset/src/vision/image_folder.rs     99.32%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1572      +/-   ##
==========================================
- Coverage   86.53%   86.34%   -0.19%     
==========================================
  Files         684      687       +3     
  Lines       78248    78685     +437     
==========================================
+ Hits        67713    67943     +230     
- Misses      10535    10742     +207     

Comment on lines 111 to 123
fn bin_config() -> bincode::config::Configuration {
    bincode::config::standard()
}

fn encode(&self) -> Vec<u8> {
    bincode::serde::encode_to_vec(self, Self::bin_config()).unwrap()
}

fn decode(annotation: &[u8]) -> Self {
    let (annotation, _): (AnnotationRaw, usize) =
        bincode::serde::decode_from_slice(annotation, Self::bin_config()).unwrap();
    annotation
}
Member

We use the serialization for what exactly?

Member Author

We decided on having annotations as bytes

struct ImageDatasetItemRaw {
    /// Image path.
    image_path: PathBuf,

    /// Image annotation.
    /// The annotation bytes can represent a string (category name) or path to annotation file.
    annotation: Vec<u8>,
}

But now that you mention it... I don't see any need for serialization just to have bytes 😅 We could simply change the annotation type in ImageDatasetItemRaw to the AnnotationRaw enum and scrap the encode/decode.

Probably needed another coffee when I went over this part ☕
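
The refactor suggested in this reply (storing the `AnnotationRaw` enum directly instead of encoded bytes) could look roughly like the sketch below. The variant names and payloads of `AnnotationRaw`, the `Annotation::Label` variant, the `parse_annotation` helper, and the class-name lookup map are all assumptions for illustration; only `ImageDatasetItemRaw`, `AnnotationRaw`, and `Annotation::MultiLabel(Vec<usize>)` are named in this PR, and the merged code may differ.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Raw annotation as read from disk (variant shapes are assumed here).
#[derive(Debug, Clone, PartialEq)]
pub enum AnnotationRaw {
    /// A single category name.
    Label(String),
    /// Several category names for one image.
    MultiLabel(Vec<String>),
}

/// Parsed annotation with class indices instead of names.
#[derive(Debug, Clone, PartialEq)]
pub enum Annotation {
    Label(usize),
    MultiLabel(Vec<usize>),
}

/// Raw dataset item holding the enum directly, so no byte
/// encode/decode round trip is needed.
#[derive(Debug, Clone)]
pub struct ImageDatasetItemRaw {
    /// Image path.
    pub image_path: PathBuf,
    /// Image annotation.
    pub annotation: AnnotationRaw,
}

/// Hypothetical helper: map category names to class indices.
/// Returns `None` if any name is missing from `classes`.
pub fn parse_annotation(
    raw: &AnnotationRaw,
    classes: &HashMap<String, usize>,
) -> Option<Annotation> {
    match raw {
        AnnotationRaw::Label(name) => {
            classes.get(name).copied().map(Annotation::Label)
        }
        AnnotationRaw::MultiLabel(names) => names
            .iter()
            .map(|name| classes.get(name).copied())
            .collect::<Option<Vec<usize>>>()
            .map(Annotation::MultiLabel),
    }
}
```

Matching on the enum keeps the annotation kind visible in the type, whereas a `Vec<u8>` field needs a decode step before the kind is known.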

@laggui laggui requested a review from nathanielsimard April 4, 2024 13:53
Collaborator

@antimora antimora left a comment

Looks good. Just one change.

@laggui laggui requested a review from antimora April 4, 2024 16:57
@laggui laggui merged commit f3e0aa6 into main Apr 5, 2024
15 checks passed
@laggui laggui deleted the feat/multilabel-classification branch April 5, 2024 17:16
3 participants