Speedup loading large yolo datasets. #26

Merged
merged 7 commits into from
Apr 20, 2022

Contributor

@satheeshkatipomu satheeshkatipomu commented Apr 15, 2022

Currently, loading a large YOLO dataset (~150k) takes prohibitively long compared to loading the same dataset in COCO format.

I think this is mostly because the YOLO annotation format does not store the actual image width and height. So we read the width and height of every image with imagesize.get, since they are required when converting to other formats.
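To illustrate why the image size is needed, here is a minimal sketch of converting one normalized YOLO box to an absolute COCO box. The function name `yolo_to_coco_bbox` is hypothetical and is not taken from this PR; it only assumes the standard YOLO layout (x_center, y_center, width, height as fractions of the image size) and the standard COCO layout ([x_min, y_min, width, height] in pixels).

```python
from typing import List, Tuple


def yolo_to_coco_bbox(
    yolo_box: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> List[float]:
    """Convert a normalized YOLO box to an absolute COCO box (hypothetical helper)."""
    x_center, y_center, box_w, box_h = yolo_box
    # Scale the normalized width and height to pixels.
    abs_w = box_w * image_width
    abs_h = box_h * image_height
    # Move from the box center to its top-left corner.
    x_min = x_center * image_width - abs_w / 2
    y_min = y_center * image_height - abs_h / 2
    return [x_min, y_min, abs_w, abs_h]


# A box centered in a 640x480 image, covering half of each dimension.
print(yolo_to_coco_bbox((0.5, 0.5, 0.5, 0.5), 640, 480))
# -> [160.0, 120.0, 320.0, 240.0]
```

Without `image_width` and `image_height` the pixel coordinates cannot be computed, which is why every image has to be inspected once while loading.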

The change itself is small: the for loop is parallelized with joblib.Parallel, using half the available threads by default. I also updated the copyright year in the docs.
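The idea can be sketched roughly as follows. This is not the code from the PR: `get_image_sizes` and `read_size` are hypothetical names, and the choice of the threading backend is an assumption made for this sketch. Only `joblib.Parallel`, `joblib.delayed` and `imagesize.get` are real library calls.

```python
import os
from typing import Callable, List, Optional, Sequence, Tuple

from joblib import Parallel, delayed

Size = Tuple[int, int]  # (width, height)


def read_size(image_path: str) -> Size:
    """Read (width, height) from the image header with imagesize.get."""
    import imagesize  # third-party; imported lazily so the helper below works with any reader

    return imagesize.get(image_path)


def get_image_sizes(
    image_paths: Sequence[str],
    n_jobs: Optional[int] = None,
    reader: Callable[[str], Size] = read_size,
) -> List[Size]:
    """Hypothetical helper: look up image sizes in parallel, preserving input order."""
    if n_jobs is None:
        # Half the available CPUs by default, but always at least one worker.
        n_jobs = max(1, (os.cpu_count() or 2) // 2)
    # Assumption: threads are used here because the work is mostly file I/O.
    return Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(reader)(path) for path in image_paths
    )
```

Calling `get_image_sizes(paths)` returns one (width, height) tuple per path, in the same order as `paths`. The `reader` parameter exists only so the sketch can be exercised without real image files.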

Speed test on the ~150k dataset:
Before joblib.Parallel: >180 mins
After joblib.Parallel: ~12 mins


codecov-commenter commented Apr 15, 2022

Codecov Report

Merging #26 (d105b39) into main (1ed11c4) will increase coverage by 0.29%.
The diff coverage is 95.34%.

@@            Coverage Diff             @@
##             main      #26      +/-   ##
==========================================
+ Coverage   74.98%   75.28%   +0.29%     
==========================================
  Files          18       18              
  Lines        1659     1679      +20     
==========================================
+ Hits         1244     1264      +20     
  Misses        415      415              
Impacted Files               Coverage Δ
optical/converter/yolo.py    93.33% <95.23%> (+1.78%) ⬆️
optical/converter/utils.py   88.98% <100.00%> (+0.04%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1ed11c4...d105b39.

Contributor Author

satheeshkatipomu commented Apr 17, 2022

The docs build is failing because of a jinja2 issue, which goes away after updating Sphinx to version 4.5.0. I have committed these changes. The docs build works fine locally, but the docs build workflow still installs the old version (maybe from a cache).

Update: Fixed it by updating the versions in docs/requirements.txt in addition to pyproject.toml.

@satheeshkatipomu satheeshkatipomu marked this pull request as ready for review April 17, 2022 08:44
Contributor

@bharatkumarreddy bharatkumarreddy left a comment

Thanks for the PR. Looks good to me.

optical/converter/yolo.py (outdated, resolved)
Contributor

@bishwarup307 bishwarup307 left a comment

Thanks for the PR.

@satheeshkatipomu satheeshkatipomu merged commit b999711 into main Apr 20, 2022
@bishwarup307 bishwarup307 deleted the speedup_yolo branch May 14, 2022 14:06
4 participants