Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify merging enrich output #107018

Merged
merged 1 commit into from
Apr 9, 2024
Merged

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 2, 2024

The merge logic in MergePositionsOperator is excessively complex and lacks flexibility. It relies on the source operator emitting pages with ascending positions. Additionally, this merge logic introduced an unusual method, appendAllValuesToCurrentPosition, to the Block.Builder. We should replace this with a simpler and more flexible approach. This PR uses a mechanism similar to the grouping aggregation. In fact, it is very close to the values aggregation. Initially, I considered using the GroupingState from ValuesAggregator. However, unlike in the values aggregation, we don't expect many multi-values in enrich. Hence, I introduced the new EnrichResultBuilders instead.

@dnhatn dnhatn force-pushed the enrich-append-values branch 5 times, most recently from 32c385d to 5a012b6 Compare April 2, 2024 21:03
if (cell.length > 1) {
builder.beginPositionEntry();
}
// TODO: sort and dedup
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for a follow-up.

@dnhatn dnhatn changed the title WIP Simplify merging enrich output Apr 2, 2024
@dnhatn dnhatn requested a review from nik9000 April 2, 2024 21:48
@dnhatn dnhatn marked this pull request as ready for review April 2, 2024 21:49
@dnhatn dnhatn requested a review from a team as a code owner April 2, 2024 21:49
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 2, 2024
private final int[] mergingChannels;
private final ElementType[] mergingTypes;
private PositionBuilder positionBuilder = null;
private final EnrichResultBuilder[] builders;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class becomes much simpler.

@dnhatn dnhatn force-pushed the enrich-append-values branch from 5a012b6 to 0794d05 Compare April 6, 2024 20:18
@dnhatn
Copy link
Member Author

dnhatn commented Apr 6, 2024

@nik9000 I have benchmarked this change, along with two incoming improvements: one for reading values in batch, and another for leveraging the bytesref ordinal block. With these three changes, enrich in our benchmark is 11 times faster: from 330ms to 30ms.

|  50th percentile latency |    esql_stats_enrich_rates_fares | 332.198  |  31.7257  | -300.473 |  ms | -90.45% |
|  90th percentile latency |    esql_stats_enrich_rates_fares | 335.589  |  32.3748  | -303.214 |  ms | -90.35% |
| 100th percentile latency |    esql_stats_enrich_rates_fares | 340.448  |  35.3605  | -305.087 |  ms | -89.61% |
|  50th percentile latency |       esql_stats_enrich_payments | 330.927  |  29.0027  | -301.924 |  ms | -91.24% |
|  90th percentile latency |       esql_stats_enrich_payments | 335.693  |  29.5329  | -306.16  |  ms | -91.20% |
| 100th percentile latency |       esql_stats_enrich_payments | 349.575  |  31.7998  | -317.775 |  ms | -90.90% |
|  50th percentile latency | esql_stats_enrich_payments_fares | 362.12   |  31.1805  | -330.939 |  ms | -91.39% |
|  90th percentile latency | esql_stats_enrich_payments_fares | 364.841  |  31.6378  | -333.204 |  ms | -91.33% |
| 100th percentile latency | esql_stats_enrich_payments_fares | 368.572  |  32.9891  | -335.583 |  ms | -91.05% |

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that is nice!

@dnhatn
Copy link
Member Author

dnhatn commented Apr 9, 2024

@nik9000 Thanks for reviewing :)

@dnhatn dnhatn merged commit 24aed5c into elastic:main Apr 9, 2024
14 checks passed
@dnhatn dnhatn deleted the enrich-append-values branch April 9, 2024 16:24
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 10, 2024
Landing elastic#107018 broke running ESQL unit tests in IntelliJ. It has
*something* to do with turning on the stringtemplate plugin in the esql
project but I don't really know what. After that PR we'd often get
errors about trying to regenerate evaluators twice. I dunno. This fixes
it. But I don't really know why.

The way this fixes it is by making the `esql` project more like the
`copmute` project. It makes sense that that would help - they both have
the same code generation configuration. Anyway, the operative change is
landing the generated files in the same place as the `compute` project.
Thus all of the file moves.

Again, I have no idea why this works. It's build black magic and I just
shook it until it worked. Most of the credit goes to git-bisect for
finding the commit that broke this.
nik9000 added a commit that referenced this pull request Apr 11, 2024
Landing #107018 broke running ESQL unit tests in IntelliJ. It has
*something* to do with turning on the stringtemplate plugin in the esql
project but I don't really know what. After that PR we'd often get
errors about trying to regenerate evaluators twice. I dunno. This fixes
it. But I don't really know why.

The way this fixes it is by making the `esql` project more like the
`copmute` project. It makes sense that that would help - they both have
the same code generation configuration. Anyway, the operative change is
landing the generated files in the same place as the `compute` project.
Thus all of the file moves.

Again, I have no idea why this works. It's build black magic and I just
shook it until it worked. Most of the credit goes to git-bisect for
finding the commit that broke this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants