[MRG] check for JSON in first byte of LCA database file #1495

Merged on May 4, 2021 (1 commit)

@ctb ctb commented May 3, 2021

Change LCA_Database.load(...) to look at the first byte of the LCA database file and check whether it is plausibly a JSON file.

This avoids the costly loading of very large files that turn out not to be JSON by the rather dumb JSON loader we're using ;). See #1483.

Other, better fixes could include:

  • changing over to a different JSON loader
  • doing more sophisticated checks

but this fix works fine for now!
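
For readers who want to see the idea in isolation, here is a minimal sketch of a first-byte check. It is not the code from this PR: the helper names `looks_like_json` and `load_if_json` are hypothetical, and it assumes the file is opened in binary mode and that a valid database is a JSON object whose first byte is `{`.

```python
import io
import json


def looks_like_json(fp):
    """Return True if the first byte of fp could start a JSON object.

    Hypothetical helper: assumes fp is a seekable binary file object.
    """
    first = fp.read(1)   # b"" for an empty file
    fp.seek(0)           # rewind so a later loader sees the whole file
    return first == b"{"


def load_if_json(fp):
    """Parse fp as JSON, but reject implausible files before parsing."""
    if not looks_like_json(fp):
        raise ValueError("file does not look like JSON")
    return json.load(fp)


# Usage with in-memory stand-ins for real files.
good = io.BytesIO(b'{"version": "2.1"}')
bad = io.BytesIO(b"\x1f\x8b\x08\x00 not json at all")

db = load_if_json(good)
rejected = not looks_like_json(bad)
```

The check reads a single byte, so a large non-JSON file is rejected without being handed to the parser. It is only a plausibility test: a file that starts with `{` can still fail to parse.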

codecov bot commented May 3, 2021

Codecov Report

Merging #1495 (f6fdee3) into latest (9b5c08f) will increase coverage by 5.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #1495      +/-   ##
==========================================
+ Coverage   89.77%   94.87%   +5.10%     
==========================================
  Files         123       96      -27     
  Lines       19565    15956    -3609     
  Branches     1497     1498       +1     
==========================================
- Hits        17564    15139    -2425     
+ Misses       1775      590    -1185     
- Partials      226      227       +1     
Flag Coverage Δ
python 94.87% <100.00%> (-0.03%) ⬇️
rust ?

Impacted Files Coverage Δ
src/sourmash/lca/lca_db.py 91.22% <100.00%> (-1.09%) ⬇️
tests/test_lca.py 99.87% <100.00%> (+<0.01%) ⬆️
src/core/src/sketch/hyperloglog/estimators.rs
src/core/src/encodings.rs
src/core/src/wasm.rs
src/core/src/index/storage.rs
src/core/src/index/mod.rs
src/core/tests/test.rs
src/core/src/index/bigsi.rs
... and 20 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ctb commented May 3, 2021

Ready for review and merge @keyabarve @bluegenes @erikyoung85

@bluegenes bluegenes left a comment

lgtm!

neat that fp.seek(0) resets the file position at 0!
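
As a small illustration of the file-position behaviour mentioned here, the sketch below uses an in-memory `io.BytesIO` as a stand-in for a real file (an assumption for the sake of a self-contained example) and tracks the position with `tell()`.

```python
import io

fp = io.BytesIO(b'{"k": 1}')

first = fp.read(1)            # reading one byte advances the position
pos_after_read = fp.tell()

fp.seek(0)                    # absolute seek back to the start
pos_after_seek = fp.tell()

rest = fp.read()              # a full read now sees every byte again
```

Without the `seek(0)`, the final `read()` would start at offset 1 and miss the opening `{`.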

ctb commented May 4, 2021

thanks!

@ctb ctb merged commit 5b698a6 into latest May 4, 2021
@ctb ctb deleted the update/lca_db_load branch May 4, 2021 14:54