Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

M3Query improvements #1590

Open
8 of 25 tasks
arnikola opened this issue Apr 26, 2019 · 7 comments
Open
8 of 25 tasks

M3Query improvements #1590

arnikola opened this issue Apr 26, 2019 · 7 comments
Assignees
Labels
area:query All issues pertaining to query

Comments

@arnikola
Copy link
Collaborator

arnikola commented Apr 26, 2019

General

  • Remove hand-rolled json writers in favor of using a common library (Consider using [easyjson](https://github.com/mailru/easyjson) for query API JSON serialization #1144)
  • Remove use of global Zap logger usage in favor of src/x/instrument logger which uses Zap and use src/x/test/NewLogger in tests
  • Potentially refactor the physical/execution plan and controller to simplify it and be able to more easily add tracing in a single place (rather than adding it on a per-function level)
  • Refactor executor.Engine to be an interface to allow us to mock it in tests.

Add benchmarker and correctness tester

  • Look at Prombench again, if possible make use of its query generation engine to build a tool to generate valid queries and hit them against both Prom and m3q to verify correctness
  • Run prombench against m3query to identify areas to improve performance
  • Potentially add hooks into our CI process that would run diffs vs prombench to ensure correctness and no perf degradation

Introduce pooling to intermediary blocks

  • Ensure block closing logic will not cause blocks to be double-closed or to hang around for too long
  • Add pools to ColumnBlockBuilder
  • investigate adding a RowBlockBuilder which would be optimized for series-wise traversal vs row-based traversal, possibly with some of the downstream

Series consolidation

  • Improve options for series consolidation after fetches (functionality here may change based on how indexing will look - may have different paths for dealing with a centralized index compared with index per namespace)

Function improvements

  • Translate function arguments to parameters in the parsing step rather than sending interface{} and making functions
  • Remove NaNs from output during processing
  • Make OffsetBlock into a more generic DeferredBlock (something like: take in a row transform, a column transform, and a time transform) ([query] Update offset to accept opts #1638)
  • Refactor remaining lazy functions to use this DeferredBlock
  • Enable multi-target querying (potentially re-use fetch results)

Block splitting logic

  • Enable block splitting logic
  • Ensure that nowhere matches series by order across block boundaries
  • Remove forced padding for block splitting logic
  • Refactor series consolidation at the end of processing to be a separate step (maybe as a function)

Future work

  • Dynamic function optimization (less essential with DeferredBlock)
  • Reuse fetches (potentially do something clever about combining into a single fetch? not sure it's useful though)
  • Optimized remote function execution (e.g. functions like sum run on multiple DCs can be run remotely first, then have post-aggregated results sent rather than just performing the fetch remotely)
  • Centralized index support
  • Support M3QL
@arnikola arnikola added the area:query All issues pertaining to query label Apr 26, 2019
@arnikola arnikola self-assigned this Apr 26, 2019
@robskillington
Copy link
Collaborator

One thing interesting about the JSON writers, the major reason in the past we did that is because the default JSON marshaller allocates a lot, however if we have JSON write methods for bytes/strings/whatever other concrete types we can avoid allocations when returning responses (which is what we have today with the current common library streamjson writer).

@arnikola
Copy link
Collaborator Author

Sounds good; was going to add a few benchmarks around the handrolled JSON writer since figured that we were using it on our legacy stack for a reason :p

@andrewmains12
Copy link
Contributor

One thing interesting about the JSON writers, the major reason in the past we did that is because the default JSON marshaller allocates a lot, however if we have JSON write methods for bytes/strings/whatever other concrete types we can avoid allocations when returning responses (which is what we have today with the current common library streamjson writer).

For sure; I think the ideal is to use a codegenned marshaller (or other speedy JSON implementation). #1144 has some suggestions from another contributor.

@benraskin92
Copy link
Collaborator

Do we want to add tracing to more paths?

@benraskin92
Copy link
Collaborator

Also, as an aspirational goal, how about removing time from blocks and just using step size?

@asafm
Copy link
Contributor

asafm commented Oct 11, 2020

@gibbscullen Is this issue tracked in another issue?

@gibbscullen
Copy link
Collaborator

Closed as part of effort to clean up stale issues. Re-opening as still interest in tracking issue.

@gibbscullen gibbscullen reopened this Oct 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:query All issues pertaining to query
Projects
None yet
Development

No branches or pull requests

6 participants