Specialize serialization for ArrayVectors #105893

Merged 4 commits on Mar 8, 2024


dnhatn
Member

@dnhatn dnhatn commented Mar 4, 2024

Currently, we serialize blocks and vectors value by value, employing a simple yet effective approach. However, there are specific cases where we can enhance performance by serializing the underlying structure instead:

  1. Serializing BytesRefArray of a BytesRefArrayVector.
  2. Serializing the firstValueIndexes, nullsMask, and the underlying vector of an ArrayBlock instead of rebuilding the block from values.
  3. Serializing BigArrayBlock.

This PR addresses the first bullet point and lays the groundwork for implementing the second.
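A minimal sketch of the difference between the two approaches, assuming a packed layout made of an offsets array plus one concatenated byte buffer. `PackedBytesSketch` and all of its methods are hypothetical and rely only on `java.io`; they are not the Elasticsearch `BytesRefArray`, `StreamInput`, or `StreamOutput` APIs, and the wire layout shown is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a packed array of byte strings (not Elasticsearch's BytesRefArray).
final class PackedBytesSketch {
    final int[] offsets; // value i occupies bytes[offsets[i] .. offsets[i + 1])
    final byte[] bytes;  // all values concatenated

    PackedBytesSketch(int[] offsets, byte[] bytes) {
        this.offsets = offsets;
        this.bytes = bytes;
    }

    static PackedBytesSketch of(String... values) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            byte[] b = values[i].getBytes(StandardCharsets.UTF_8);
            all.write(b, 0, b.length);
            offsets[i + 1] = offsets[i] + b.length;
        }
        return new PackedBytesSketch(offsets, all.toByteArray());
    }

    int size() {
        return offsets.length - 1;
    }

    String get(int i) {
        return new String(bytes, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
    }

    // Value by value: one length prefix and one copy per value.
    static byte[] writeValueByValue(PackedBytesSketch p) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(p.size());
            for (int i = 0; i < p.size(); i++) {
                int len = p.offsets[i + 1] - p.offsets[i];
                out.writeInt(len);
                out.write(p.bytes, p.offsets[i], len);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader has to rebuild the packed layout by appending each value in turn.
    static PackedBytesSketch readValueByValue(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readInt();
            int[] offsets = new int[count + 1];
            ByteArrayOutputStream all = new ByteArrayOutputStream();
            for (int i = 0; i < count; i++) {
                byte[] b = new byte[in.readInt()];
                in.readFully(b);
                all.write(b, 0, b.length);
                offsets[i + 1] = offsets[i] + b.length;
            }
            return new PackedBytesSketch(offsets, all.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Structural: the end offsets, then the whole byte buffer in a single bulk write.
    static byte[] writeStructure(PackedBytesSketch p) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(p.size());
            for (int i = 1; i < p.offsets.length; i++) {
                out.writeInt(p.offsets[i]);
            }
            out.write(p.bytes, 0, p.bytes.length);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader restores both backing arrays directly, with one bulk read for the bytes.
    static PackedBytesSketch readStructure(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readInt();
            int[] offsets = new int[count + 1];
            for (int i = 1; i <= count; i++) {
                offsets[i] = in.readInt();
            }
            byte[] bytes = new byte[offsets[count]];
            in.readFully(bytes);
            return new PackedBytesSketch(offsets, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch both forms carry the same information. The structural form differs in that the reader restores the backing arrays as they were, rather than re-appending every value to a fresh container.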

@dnhatn dnhatn added the WIP label Mar 4, 2024
@dnhatn dnhatn force-pushed the serialize-vectors branch 3 times, most recently from e1f9b7a to 5e18be5 Compare March 7, 2024 18:38
@dnhatn dnhatn force-pushed the serialize-vectors branch from 5e18be5 to 3ff58b7 Compare March 7, 2024 19:11
@@ -30,6 +34,25 @@ final class BytesRefArrayVector extends AbstractVector implements BytesRefVector
        this.values = values;
    }

    static BytesRefArrayVector readArrayVector(int positions, StreamInput in, BlockFactory blockFactory) throws IOException {
        final BytesRefArray values = new BytesRefArray(in, blockFactory.bigArrays());
Member Author

This is the main improvement in this PR. The rest is for a follow-up where we will implement specialized serialization for ArrayBlock.

@dnhatn dnhatn requested review from nik9000 and ChrisHegarty March 7, 2024 20:58
@dnhatn dnhatn marked this pull request as ready for review March 7, 2024 20:58
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team: ESQL/Aggs/Geo) Mar 7, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn merged commit 20f5bac into elastic:main Mar 8, 2024
14 checks passed
@dnhatn dnhatn deleted the serialize-vectors branch March 8, 2024 00:55
@dnhatn
Member Author

dnhatn commented Mar 8, 2024

Thanks Nik!

Contributor

@ChrisHegarty ChrisHegarty left a comment

I love it. And how we can evolve the serial form in a BWC way through the initial Boolean/byte. This was always my hope - nice to see it work in reality. Belated LGTM
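
One possible reading of the "initial Boolean/byte" remark, sketched under stated assumptions: if an older writer emitted a boolean as the first byte (0 or 1), a newer reader can treat that byte as a tag and assign further values to new serial forms while still accepting the old ones. `SerialFormTagSketch`, the tag constants, and the payload layouts below are all hypothetical and are not taken from the Elasticsearch code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration of a leading tag byte; not the Elasticsearch wire format.
final class SerialFormTagSketch {
    static final byte LEGACY_FALSE = 0; // what writeBoolean(false) emits
    static final byte LEGACY_TRUE = 1;  // what writeBoolean(true) emits
    static final byte BULK_ARRAY = 2;   // assumed new tag for a bulk form

    // Older writer: a boolean flag, then the values one at a time.
    static byte[] writeLegacy(boolean flag, int[] values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeBoolean(flag);
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Newer writer: a tag byte outside the boolean range, then one bulk payload.
    static byte[] writeBulk(int[] values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(BULK_ARRAY);
            out.writeInt(values.length);
            ByteBuffer raw = ByteBuffer.allocate(values.length * Integer.BYTES);
            raw.asIntBuffer().put(values);
            out.write(raw.array());
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Newer reader: dispatches on the first byte, so legacy streams still decode.
    static int[] read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte tag = in.readByte();
            int count = in.readInt();
            int[] values = new int[count];
            switch (tag) {
                case LEGACY_FALSE:
                case LEGACY_TRUE:
                    for (int i = 0; i < count; i++) {
                        values[i] = in.readInt();
                    }
                    return values;
                case BULK_ARRAY:
                    byte[] raw = new byte[count * Integer.BYTES];
                    in.readFully(raw);
                    ByteBuffer.wrap(raw).asIntBuffer().get(values);
                    return values;
                default:
                    throw new IllegalStateException("unknown serial form tag: " + tag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch covers only the reading side of compatibility. A newer writer would still have to avoid sending the new tag to a node that only understands the boolean, for example by checking the peer's version first.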

dnhatn added a commit that referenced this pull request Mar 9, 2024
A follow-up of #105893

Currently, we serialize blocks value by value, which is simple but 
effective. However, it would be more efficient to serialize the
underlying structures of array blocks instead.
Labels
:Analytics/ES|QL (AKA ESQL), >enhancement, Team:Analytics (Meta label for analytical engine team: ESQL/Aggs/Geo), v8.14.0
4 participants