[opt](inverted index) Optimize the compression of inverted index position information #242

Merged: 1 commit, Oct 17, 2024

Conversation

zzzxl1993
Collaborator

@zzzxl1993 zzzxl1993 commented Oct 12, 2024

  1. Optimize position information in the inverted index

@zzzxl1993
Collaborator Author

run buildall


size_t P4DEC(unsigned char *__restrict in, size_t n, uint32_t *__restrict out);
size_t P4NZDEC(unsigned char *__restrict in, size_t n, uint32_t *__restrict out);
size_t P4ENC(uint32_t *__restrict in, size_t n, unsigned char *__restrict out);
size_t P4NZENC(uint32_t *__restrict in, size_t n, unsigned char *__restrict out);

class PforUtil {
public:
static constexpr size_t blockSize = 128;
Member

Why change the block size from 512 to 128?

Collaborator Author

blockSize is used only in position compression.
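To make the role of the block size concrete, here is a minimal sketch of how a position list might be split into fixed-size blocks before being handed to a PFOR-style encoder. It is not the code from this PR: `kBlockSize`, `delta_encode`, `BlockPlan`, and `plan_blocks` are hypothetical names, and the assumption that positions are delta-encoded and that a short tail is handled separately is mine, not confirmed by the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant mirroring PforUtil::blockSize from the diff.
constexpr std::size_t kBlockSize = 128;

// Turn ascending positions into gaps; the first gap is measured from 0.
std::vector<uint32_t> delta_encode(const std::vector<uint32_t>& positions) {
    std::vector<uint32_t> deltas;
    deltas.reserve(positions.size());
    uint32_t prev = 0;
    for (uint32_t p : positions) {
        deltas.push_back(p - prev);
        prev = p;
    }
    return deltas;
}

// How n values split into full blocks plus a leftover tail.
struct BlockPlan {
    std::size_t full_blocks;  // blocks of exactly kBlockSize values
    std::size_t tail_values;  // values left over after the last full block
};

BlockPlan plan_blocks(std::size_t n) {
    return BlockPlan{n / kBlockSize, n % kBlockSize};
}
```

Under these assumptions, a smaller block size means fewer values fall into the tail for short position lists, at the cost of more per-block headers for long ones.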

{
if (!readers.empty()) {
auto release_readers = [this]() {
Member

Why do we release the readers here?

Collaborator Author

This releases memory in the try/catch/finally code path.
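C++ has no `finally`, so one common way to get the same effect is to put the cleanup in a lambda and call it on both the error path and the success path. The sketch below illustrates that pattern only; `Reader` and `run_and_release` are hypothetical stand-ins, and the actual cleanup in this PR may be structured differently.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a reader owned through a raw pointer.
struct Reader {
    static int live;  // number of Reader objects currently alive
    Reader() { ++live; }
    ~Reader() { --live; }
};
int Reader::live = 0;

// Runs `work` and frees every reader whether or not `work` throws.
template <typename Work>
void run_and_release(std::vector<Reader*>& readers, Work work) {
    auto release_readers = [&readers]() {
        for (Reader* r : readers) delete r;
        readers.clear();
    };
    try {
        work();
    } catch (...) {
        release_readers();  // cleanup on the error path
        throw;              // propagate the original exception
    }
    release_readers();      // cleanup on the success path
}
```

An RAII guard object whose destructor runs the cleanup would achieve the same result with less repetition; the explicit form is shown because it maps directly onto try/catch/finally.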

@@ -86,6 +87,62 @@ class TermDocsBuffer {
IndexVersion indexVersion_ = IndexVersion::kV0;
};

class TermPostingsBuffer {
Member

Why do we need this buffer? The index input already has an internal buffer.

@zzzxl1993 zzzxl1993 force-pushed the 202410121538 branch 3 times, most recently from 7a42b2c to dd08cae Compare October 14, 2024 08:43
@zzzxl1993
Collaborator Author

run buildall

Member

@airborne12 airborne12 left a comment

This needs a CLucene unit test, for example one that sets IndexVersion to v1 and v2.

@zzzxl1993
Collaborator Author

run buildall

Member

@airborne12 airborne12 left a comment

lgtm

@airborne12 airborne12 merged commit 9882657 into apache:clucene Oct 17, 2024
3 of 4 checks passed
2 participants