Add Multi-threaded Batch Document #1857

Merged on May 2, 2022 (3 commits)

Conversation

@HAKSOAT HAKSOAT commented Apr 26, 2022

This addresses #1778

HAKSOAT commented Apr 26, 2022

Hi @lintool, I have a PR in progress and would like your thoughts on two things:

  1. The original document methods (L663 and L680) take an integer and a string, respectively. I'm not sure how to support both without creating two batchDocument methods with different names, so I'd like to know what the best practice is for Anserini. The PR currently handles batching only for internal Lucene docs, and I'd like to add something for collection docids too.

  2. After compiling, I saw a bunch of modified markdown files. Should I push all of them as well? I held them back because I didn't want them to clutter the PR review.
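
One plausible reason the two batch variants would need different names is Java's type erasure: `int` and `String` are distinct parameter types, but `List<Integer>` and `List<String>` both erase to `List`, so two same-named methods taking them would clash at compile time. The sketch below only illustrates that constraint; the class name, method names, and return values are hypothetical placeholders, not Anserini's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names and return values are placeholders.
class OverloadSketch {
  // Single-document lookups can share a name: int and String differ after erasure.
  static String document(int ldocid) { return "lucene:" + ldocid; }
  static String document(String docid) { return "collection:" + docid; }

  // Batch variants need distinct names: List<Integer> and List<String>
  // have the same erasure, so overloading one name would not compile.
  static List<String> batchDocumentByLuceneId(List<Integer> ldocids) {
    List<String> docs = new ArrayList<>();
    for (int ldocid : ldocids) {
      docs.add(document(ldocid));   // resolves to the int overload
    }
    return docs;
  }

  static List<String> batchDocumentByCollectionId(List<String> docids) {
    List<String> docs = new ArrayList<>();
    for (String docid : docids) {
      docs.add(document(docid));    // resolves to the String overload
    }
    return docs;
  }
}
```

Picking a single id type for the batch method sidesteps the naming question entirely.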

codecov-commenter commented May 1, 2022

Codecov Report

Merging #1857 (54e5baa) into master (1842eef) will increase coverage by 0.01%.
The diff coverage is 66.66%.

@@             Coverage Diff              @@
##             master    #1857      +/-   ##
============================================
+ Coverage     57.70%   57.71%   +0.01%     
- Complexity     1055     1058       +3     
============================================
  Files           179      179              
  Lines         10232    10250      +18     
  Branches       1407     1409       +2     
============================================
+ Hits           5904     5916      +12     
- Misses         3836     3841       +5     
- Partials        492      493       +1     
| Impacted Files | Coverage Δ |
|---|---|
| ...c/main/java/io/anserini/search/SimpleSearcher.java | 64.15% <66.66%> (+0.17%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

lintool commented May 1, 2022

hi @HAKSOAT - thanks for this PR.

Let's go with a single method for now, called batchGetDocument. It should take a list of (external) collection ids, which is the more common case.

If you update and merge in current master, it should get rid of the new markdown edits?

Thanks!
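
For readers following along, here is a minimal sketch of what a multi-threaded batchGetDocument over collection ids could look like. It is not the code merged in this PR: the class name, the `lookup` function (standing in for a single-document fetch by collection docid), the thread-count parameter, and the one-minute timeout are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch; not Anserini's actual SimpleSearcher code.
class BatchDocumentSketch {
  // 'lookup' is an assumed stand-in for a single-document fetch;
  // it is expected to return null when the docid is unknown.
  static Map<String, String> batchGetDocument(List<String> docids, int threads,
      Function<String, String> lookup) throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    Map<String, String> results = new ConcurrentHashMap<>();

    // Submit one lookup task per docid; tasks run concurrently on the pool.
    for (String docid : docids) {
      executor.execute(() -> {
        String doc = lookup.apply(docid);
        if (doc != null) {  // ConcurrentHashMap does not accept null values
          results.put(docid, doc);
        }
      });
    }

    // Stop accepting new tasks and wait for the submitted ones to finish.
    executor.shutdown();
    if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
      executor.shutdownNow();
      throw new IllegalStateException("batch lookup timed out");
    }
    return results;
  }
}
```

Returning a map keyed by docid means callers do not depend on task completion order, and missing documents simply have no entry.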

lintool commented May 1, 2022

lg, but can you please build a fatjar, copy over to pyserini, and make sure it connects up on the pyserini end? i.e., let's build the bindings on the pyserini end just to confirm everything works e2e.

HAKSOAT commented May 1, 2022

> lg, but can you please build a fatjar, copy over to pyserini, and make sure it connects up on the pyserini end? i.e., let's build the bindings on the pyserini end just to confirm everything works e2e.

Noted.

HAKSOAT commented May 1, 2022

> lg, but can you please build a fatjar, copy over to pyserini, and make sure it connects up on the pyserini end? i.e., let's build the bindings on the pyserini end just to confirm everything works e2e.

It works. I'll put a PR through for this on Pyserini.

HAKSOAT changed the title from "Add Multi-threaded Batch Document (WIP)" to "Add Multi-threaded Batch Document" on May 1, 2022