Use NumericRangeQuery in ES queries when rollover is used #1361

Closed
pavolloffay opened this issue Feb 19, 2019 · 6 comments

Comments

@pavolloffay
Member

When rollover is used, the reader reads from a single read alias pointing to many indices. To support --es.max-span-age, our esRollover.py supports an action to remove old indices from the read alias. Instead of requiring users to run the script to remove indices from the read alias, we could change our queries to support time ranges.

See #1242 (comment)
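For context, the alias clean-up described above could look roughly like the sketch below. It only builds the request body for the Elasticsearch `_aliases` endpoint and does not send it. The index names, the alias name `jaeger-span-read`, the 72h age, and the helper `build_alias_removals` are assumptions for illustration, not the actual esRollover.py logic.

```python
from datetime import datetime, timedelta, timezone


def build_alias_removals(index_created, read_alias, max_span_age, now):
    """Build an _aliases request body that detaches indices older than max_span_age.

    index_created maps index name -> creation time (hypothetical input;
    a real script would have to look this up from the cluster).
    """
    cutoff = now - max_span_age
    actions = [
        {"remove": {"index": name, "alias": read_alias}}
        for name, created in sorted(index_created.items())
        if created < cutoff
    ]
    return {"actions": actions}


now = datetime(2019, 2, 19, tzinfo=timezone.utc)
indices = {
    "jaeger-span-000001": datetime(2019, 2, 10, tzinfo=timezone.utc),
    "jaeger-span-000002": datetime(2019, 2, 15, tzinfo=timezone.utc),
    "jaeger-span-000003": datetime(2019, 2, 18, tzinfo=timezone.utc),
}
body = build_alias_removals(indices, "jaeger-span-read", timedelta(hours=72), now)
```

With time ranges in the queries themselves, this periodic step would no longer be needed to keep old spans out of the results.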

@pavolloffay
Member Author

There are a couple of inconsistencies in our read implementation regarding the time range of queries.

  • GetTrace, GetServices and GetOperations use --es.max-span-age to compose the list of historical indices for the query.
  • FindTraceIDs and FindTraces use the times from the query parameters to compose the list of historical indices.
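A minimal sketch of how the two approaches can disagree, assuming one index per UTC day named `jaeger-span-YYYY-MM-DD` and a max span age of 72h. The helper `daily_indices` and the concrete dates are hypothetical and only illustrate the mismatch.

```python
from datetime import datetime, timedelta, timezone


def daily_indices(prefix, start, end):
    """Return one index name per UTC day touched by the interval [start, end]."""
    day = start.date()
    names = []
    while day <= end.date():
        names.append(f"{prefix}{day.isoformat()}")
        day += timedelta(days=1)
    return names


now = datetime(2019, 2, 19, 12, 0, tzinfo=timezone.utc)

# GetTrace-style lookup: fixed lookback derived from the max span age (assumed 72h).
by_max_age = daily_indices("jaeger-span-", now - timedelta(hours=72), now)

# FindTraces-style lookup: range taken from the query parameters (here, 7 days).
by_query = daily_indices("jaeger-span-", now - timedelta(days=7), now)

# Indices that a search can reach but a get-by-ID lookup cannot.
only_searchable = [name for name in by_query if name not in by_max_age]
```

A trace stored in one of the `only_searchable` indices would show up in search results but not when fetched by ID.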

In the past I have seen people complaining that get trace cannot find a trace even though it is searchable on the main screen.

My idea here is to use a wildcard * in the index name instead of composing a list of indices. This has multiple implications:

  • get services/operations would return all operations in the storage - the GetServices interface does not pass a lookback time range.
  • find traces would have to use span timestamps in queries (instead of indices)
  • a single index name with a wildcard could have performance issues - although Kibana uses the same approach, so I would guess it is well optimized.
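The wildcard idea could be sketched as follows: one index pattern plus a range filter on the span timestamp. This only builds the search request body using the standard Elasticsearch `bool`/`term`/`range` query syntax. The field names `startTime` (microseconds since epoch) and `process.serviceName`, the pattern `jaeger-span-*`, and the helper names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts):
    """Convert an aware datetime to integer microseconds since the Unix epoch."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def find_traces_request(service, start, end, index_pattern="jaeger-span-*"):
    """Return (index pattern, search body) limiting results by span start time."""
    query = {
        "bool": {
            "filter": [
                {"term": {"process.serviceName": service}},
                {"range": {"startTime": {"gte": to_micros(start), "lte": to_micros(end)}}},
            ]
        }
    }
    return index_pattern, {"query": query}


end = datetime(2019, 2, 19, 12, 0, tzinfo=timezone.utc)
index, body = find_traces_request("frontend", end - timedelta(hours=1), end)
```

Here the time restriction lives in the query rather than in the list of index names, so every index matching the pattern is a candidate for the search.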

@pavolloffay
Member Author

@yurishkuro does the Cassandra reader return all services and operations in the storage, or is it somehow controlled via a flag?

@yurishkuro
Member

Cassandra returns all names, but they are written with the same TTL as the spans, so generally there are no mismatches as far as storing the data. However, sometimes people do get confused when they see OpName in the dropdown but no traces are found because the default time range is just 1hr and the endpoint may not have seen any activity in that period.

@pavolloffay
Member Author

> Cassandra returns all names, but they are written with the same TTL as the spans, so generally there are no mismatches as far as storing the data.

This would be the same with Elasticsearch. Service names from the same time period will be removed with spans.

> However, sometimes people do get confused when they see OpName in the dropdown but no traces are found because the default time range is just 1hr and the endpoint may not have seen any activity in that period.

Should we start thinking about changing the reader/writer interface to include timestamp?

@yurishkuro
Member

You mean changing the contents of Services drop-down depending on the selected time range? Feels a bit overkill.

Could we just make the error message a bit more informative? I.e., suggest that the user try a longer time range.

@pavolloffay
Member Author

Perf results in #1969 showed that using a wildcard in the index name with a time range on the timestamp significantly increases query time. Therefore I am closing this.
