Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi-thread support for scripts/dpr/convert_trec_run_to_retrieval_json.py #1178

Closed
wants to merge 0 commits into from

Conversation

manveertamber
Copy link
Member

For #370

Controlling for everything else it seems using searcher.batch_doc is slower than using searcher.doc. That is to say, I have found using more than one thread leads to this script running slower. However, increasing the number of threads beyond 2 does seem to help.

@lintool
Copy link
Member

lintool commented May 18, 2022

@manveertamber can you run some concrete performance measurements? E.g.,

  • previous impl
  • new impl with 1, 2, 4, 8, ... threads

non-linear scaling is to be expected, I think, since there is inevitably contention for the underlying data structures...

@manveertamber
Copy link
Member Author

manveertamber commented May 18, 2022

@manveertamber can you run some concrete performance measurements? E.g.,

* previous impl

* new impl with 1, 2, 4, 8, ... threads

non-linear scaling is to be expected, I think, since there is inevitably contention for the underlying data structures...

@lintool
Running this on orca:

nohup python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
    --index wikipedia-dpr \
    --topics dpr-nq-test \
    --input runs/run.dpr.nq-test.bm25.trec \
    --output runs/run.dpr.nq-test.bm25.json \
  • previous impl:
    19 minutes 13 seconds

  • new impl:
    19:16 (1 thread, uses searcher.doc)
    52:11 (2 threads)
    43:54 (4 threads)
    37:42 (8 threads)

@lintool
Copy link
Member

lintool commented May 23, 2022

Ack. Let's put this issue on pause.

@HAKSOAT has confirmed that the slowdown happens on the Java end also. We're trying to diagnose why this is the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants