-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kate/splitter #68
Kate/splitter #68
Conversation
6d08cd7
to
e9290f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for a great PR!
I'd suggest fixing code formatting (spaces in arguments) and string formatting to keep the project code in a single style.
datumaro/plugins/splitter.py
Outdated
def __init__(self, dataset, seed): | ||
super().__init__(dataset) | ||
|
||
np.random.seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, this should be done closer to the actual use, because a lot can happen after constructor is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random shuffling occurs several times.
Do you think I really need to set seed whenever I use the shuffling function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make sure the results are reproducible. I suggest moving seed setting as much close to the point of use, as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, I'll add test codes to check the results are reproducible.
I need help regarding complexity. $ python -m mccabe datumaro/plugins/splitter.py --min 10
188:4: 'SplitforMatchingReID.__init__' 16
307:4: 'SplitforMatchingReID._rebalancing' 12
370:4: 'SplitforDetection.__init__' 19 But Codacy gives 35 for SplitforMatchingReID.init and 20 for SplitforDetection.init |
Complexity from these tools is not something we look at. Don't bother with this. |
2b673b6
to
cbe6dfe
Compare
for idx, item in enumerate(self._extractor): | ||
for subset, split in by_groups.items(): | ||
if idx in split: | ||
group_id = self._group_map[subset] | ||
item.annotations[0].group = group_id | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I use 'group' property to split 'test' subset into 'query' and 'gallery' set.
It seems that the default subset doesn't support a hierarchy.
In the original implementation, we add "Auxiliary" attribute to the annotation.
Do you think I have to change the current implementation w.r.t the original one?
Or do I have to implement the subset hierarchy?
Please share your opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there in no exact support for split hierarchy yet. To do what you want (as a workaround?), I'd create 2 extra subsets - test_gallery
and test_query
, this would look quite natural (at least, in Datumaro). Technically, subsets are extractors themselves, so it is possible to extend the implementation with some nesting. But it would require lots of changes in the importing/exporting code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the gallery and query sets are not subsets. They're a kind of tag.
I need more time to resolve this, so let's leave this as it is at the moment.
I'll resolve this issue while applying the command line feature.
489e5eb
to
b712db3
Compare
b712db3
to
7c74aa7
Compare
I've rebased and squashed some commits. |
…er for classification
Summary
Added task-specific annotation splitters.
How to test
Checklist
develop
branchLicense
Feel free to contact the maintainers if that's a concern.