Investigate use of the BM25 algorithm to search image titles (original #288) #751
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
🟩 priority: low
Low priority and doesn't need to be rushed
🧱 stack: ingestion server
Related to the ingestion/data refresh server
⛔ status: blocked
Blocked & therefore, not ready for work
This issue has been migrated from the CC Search API repository
The similarity algorithm used to search titles was switched from BM25 to boolean in cc-archive/cccatalog-api#281 to avoid ranking repeated words in titles higher.
We should investigate switching back to BM25 and setting the `k1` tuning parameter to a low value just for the title field. See cc-archive/cccatalog-api#281 (review) and the BM25 algorithm docs for more info.
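
To make the proposal concrete, here is a rough sketch, not the project's actual configuration. It assumes Elasticsearch's per-field custom similarity settings and a typeless mapping (newer Elasticsearch versions). The similarity name `title_bm25`, the value `k1 = 0.1`, and both helper functions are hypothetical and chosen only for illustration. The scoring helper uses the textbook BM25 term-frequency factor to show why a low `k1` reduces the reward for repeated words.

```python
import json


def bm25_tf_component(tf, k1, b=0.75, doc_len=1.0, avg_doc_len=1.0):
    """Textbook BM25 term-frequency factor for one term in one field.

    A lower k1 makes the factor saturate sooner, so extra repetitions
    of a word in a title add very little to the score.
    """
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + k1 * length_norm)


def title_similarity_settings(k1=0.1, b=0.75):
    """Build an index body that applies a low-k1 BM25 to `title` only.

    `title_bm25` and the default k1 are illustrative assumptions;
    other fields keep the index's default similarity.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    "title_bm25": {"type": "BM25", "k1": k1, "b": b}
                }
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "similarity": "title_bm25"}
            }
        },
    }


if __name__ == "__main__":
    # Compare how much a word repeated 3 times outranks a single occurrence.
    for k1 in (1.2, 0.1):
        ratio = bm25_tf_component(3, k1) / bm25_tf_component(1, k1)
        print(f"k1={k1}: tf=3 scores {ratio:.2f}x a single occurrence")
    print(json.dumps(title_similarity_settings(), indent=2))
```

For titles of average length, a word repeated three times scores about 1.57 times a single occurrence at Elasticsearch's default `k1 = 1.2`, but only about 1.06 times at `k1 = 0.1`. That is close to the boolean behaviour while still keeping inverse document frequency and length normalization in the ranking.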
Original Comments:
annatuma commented on Thu Jan 23 2020: