Improve tx_search queries performance #4003

Open
romac opened this issue May 28, 2024 — with Slack · 0 comments · May be fixed by #4191
Labels
O: performance Objective: cause to improve performance
romac commented May 28, 2024

From Jesse via Slack:

Currently, when Hermes launches, it tries to find all packets that haven't been cleared yet. To do this, it runs roughly the following query using tx_search:

pub fn packet_query(request: &QueryPacketEventDataRequest, seq: Sequence) -> Query {
    Query::eq(
        format!("{}.packet_src_channel", request.event_id.as_str()),
        request.source_channel_id.to_string(),
    )
    .and_eq(
        format!("{}.packet_src_port", request.event_id.as_str()),
        request.source_port_id.to_string(),
    )
    .and_eq(
        format!("{}.packet_dst_channel", request.event_id.as_str()),
        request.destination_channel_id.to_string(),
    )
    .and_eq(
        format!("{}.packet_dst_port", request.event_id.as_str()),
        request.destination_port_id.to_string(),
    )
    .and_eq(
        format!("{}.packet_sequence", request.event_id.as_str()),
        seq.to_string(),
    )
}

What we found is that

  • it's actually much faster to query by packet_sequence only, then filter by channel/ports on the Hermes side
  • this is because tx_search has no cardinality information for each index filter, so the TxSearch subroutine cannot choose which filter to scan first in order to minimize the search space; with the current logic, TxSearch might scan through all index entries that match source_channel/port + dst_channel/port, of which there might be millions
  • by contrast, the sequence tends to be much more selective across channels; it will only give you N results, where N is the number of open src/dst combinations
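
The approach described in the list above could be sketched roughly as follows. This is only an illustration, not Hermes code: `PacketRequest`, `EventAttrs`, `sequence_only_query`, `matches_request`, and `filter_events` are hypothetical names, the request type is a simplified stand-in that uses plain strings, and the query string format is an assumption modeled loosely on the node's query syntax. The idea is to send a query that constrains only packet_sequence, then discard events whose channel/port attributes do not match on the client side.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the request type used in the issue.
#[derive(Debug, Clone)]
pub struct PacketRequest {
    pub event_id: String, // e.g. "send_packet"
    pub source_channel_id: String,
    pub source_port_id: String,
    pub destination_channel_id: String,
    pub destination_port_id: String,
}

/// Attributes of one returned event, keyed by bare attribute name
/// (an assumed representation, chosen for illustration only).
pub type EventAttrs = HashMap<String, String>;

/// Build a query that constrains only the packet sequence.
/// The exact string syntax here is an assumption.
pub fn sequence_only_query(event_id: &str, seq: u64) -> String {
    format!("{event_id}.packet_sequence = '{seq}'")
}

/// Check the channel/port attributes that are no longer part of the query.
pub fn matches_request(attrs: &EventAttrs, req: &PacketRequest) -> bool {
    let has = |key: &str, expected: &str| attrs.get(key).map(String::as_str) == Some(expected);
    has("packet_src_channel", &req.source_channel_id)
        && has("packet_src_port", &req.source_port_id)
        && has("packet_dst_channel", &req.destination_channel_id)
        && has("packet_dst_port", &req.destination_port_id)
}

/// Keep only the events that belong to the requested channel/port pair.
pub fn filter_events<'a>(events: &'a [EventAttrs], req: &PacketRequest) -> Vec<&'a EventAttrs> {
    events.iter().filter(|e| matches_request(e, req)).collect()
}
```

Since a sequence-only query is expected to return about one event per open src/dst combination, the client-side filter only has to inspect a small number of candidates.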
@github-project-automation github-project-automation bot moved this to 🩹 Triage in Hermes May 28, 2024
@romac romac changed the title Improve tx_search queries performance Improve tx_search queries performance May 28, 2024
@romac romac added this to the v1.10 milestone May 28, 2024
@romac romac added the O: performance Objective: cause to improve performance label May 28, 2024
@ljoss17 ljoss17 modified the milestones: v1.10, v1.11 Jun 21, 2024
Projects
Status: 🩹 Triage
2 participants