scale-up #103

Open
wants to merge 7 commits into base: master

Conversation


@galsalomon66 galsalomon66 commented May 15, 2022

Adding the capability to execute the same query across different input streams (CSV in this case), merge the results of each of the streams, and return them to the caller as a single result. The different execution flows can run in parallel to each other.
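Below is a minimal sketch, not the s3select code added by this PR, of the producer/consumer pattern described above and in items 2 and 4 of the list below: one producer thread per input file pushes partial results into a shared queue, and a single consumer merges them into one result. All names (shared_result_queue, run_query_on_stream, scale_up) and the std::string chunk type are illustrative assumptions.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Shared queue: multiple producers (one per input stream) push partial
// results; a single consumer pops and merges them.
class shared_result_queue {
  std::queue<std::string> q_;
  std::mutex m_;
  std::condition_variable cv_;
  size_t open_producers_;
public:
  explicit shared_result_queue(size_t producers) : open_producers_(producers) {}

  void push(std::string chunk) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(chunk)); }
    cv_.notify_one();
  }

  void producer_done() {
    { std::lock_guard<std::mutex> lk(m_); --open_producers_; }
    cv_.notify_one();
  }

  // Blocks until a chunk is available; returns false once all producers have
  // finished and the queue is drained.
  bool pop(std::string& out) {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !q_.empty() || open_producers_ == 0; });
    if (q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop();
    return true;
  }
};

// Stand-in for running the (non-aggregate) query over one CSV stream; the
// real flow would push the rows produced by s3select for that stream.
void run_query_on_stream(const std::string& csv_file, const std::string& query,
                         shared_result_queue& out) {
  out.push("<result rows of '" + query + "' over " + csv_file + ">\n");
}

// One producer thread per input file, a single consumer merging the chunks
// into the single result returned to the caller.
std::string scale_up(const std::vector<std::string>& files,
                     const std::string& query) {
  shared_result_queue results(files.size());
  std::vector<std::thread> producers;
  for (const auto& f : files) {
    producers.emplace_back([&results, &query, f] {
      run_query_on_stream(f, query, results);
      results.producer_done();
    });
  }

  std::string merged;
  for (std::string chunk; results.pop(chunk); ) {
    merged += chunk;
  }

  for (auto& t : producers) t.join();
  return merged;
}
```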

  1. Adding s3select_result (replacing std::string) to handle more options for result production.
  2. Adding a shared queue to handle the results of multiple parallel execution flows (the sketch above shows the producer/consumer idea).
  3. There are 2 main flows in query execution: 1) the non-aggregate flow and 2) the aggregate flow.
    3a. The non-aggregate flow is mainly about merging the results of the different execution flows.
    3b. The aggregate flow handles the complexities of aggregation queries (sum, min, count, ...). It splits the execution into 2 phases: the first processes the query, the second merges the results of all processes. For an aggregation query this means that the AST nodes behave differently depending on their phase (a two-phase sketch follows this list).
  4. s3select_scaleup simulates multi-threaded execution of a single query. The application defines a list of files as a single input data set for the query; each input file is executed on a dedicated thread, and a single consumer merges the results of all producers. (This flow exists for the sake of simulation and measurements.)
  5. As for the RGW execution, multiple requests will process a single query; from the user's perspective it is a single request.
  6. As for splitting the input (to improve scalability in some use cases), each data source (CSV, Parquet, JSON) needs a different flow of input splitting (a CSV splitting sketch follows this list).
  7. TODO: long result rows should be split into several entries in the shared queue.
  8. TODO: the result should be handled by a callback or returned to the caller (as a value).
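To illustrate item 3b: a minimal sketch, not the PR's AST-node code, of the two-phase aggregation. Each execution flow first folds its own stream into partial aggregates, and a second phase merges the partials into the final answer; note that COUNT is merged by summing the partial counts, which is why the aggregate nodes must know which phase they are running in. The struct and function names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Partial aggregates produced by one execution flow during the first
// (processing) phase.
struct partial_agg {
  double   sum   = 0.0;
  double   min   = std::numeric_limits<double>::max();
  uint64_t count = 0;

  // First phase: each flow folds the rows of its own stream.
  void process(double value) {
    sum += value;
    min = std::min(min, value);
    ++count;
  }
};

// Second phase: merge the partial results of all flows. The merge rules
// differ from per-row processing (COUNT is merged by summing the partial
// counts), which is why the aggregate AST nodes behave differently
// depending on the phase they run in.
partial_agg merge(const std::vector<partial_agg>& parts) {
  partial_agg final_agg;
  for (const auto& p : parts) {
    final_agg.sum   += p.sum;
    final_agg.min    = std::min(final_agg.min, p.min);
    final_agg.count += p.count;   // not ++count: sum the partial counts
  }
  return final_agg;
}
```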
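For item 6, a minimal sketch of one possible CSV splitting flow, assuming the input is available as a single byte buffer: each chunk boundary is advanced to the next newline so that no row is split between two execution flows. split_csv_range is a hypothetical helper, not part of this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits `data` into up to `num_chunks` (assumed >= 1) half-open byte ranges
// [begin, end), each ending on a row (newline) boundary.
std::vector<std::pair<size_t, size_t>>
split_csv_range(const std::string& data, size_t num_chunks) {
  std::vector<std::pair<size_t, size_t>> ranges;
  const size_t approx = data.size() / num_chunks + 1;
  size_t begin = 0;
  while (begin < data.size()) {
    size_t end = std::min(begin + approx, data.size());
    // Advance to the next newline so the chunk ends on a complete row.
    while (end < data.size() && data[end] != '\n') ++end;
    if (end < data.size()) ++end;   // include the newline itself
    ranges.emplace_back(begin, end);
    begin = end;
  }
  return ranges;
}
```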

Signed-off-by: gal Salomon [email protected]

@galsalomon66 galsalomon66 changed the title adding capability to execute the same query across different input-st… scale-up May 15, 2022
…ream(CSV in this case), and to merge results of each of the streams and return it to the caller as a single one. the different executions can run in parallel to each other

Signed-off-by: gal salomon <[email protected]>
Signed-off-by: gal salomon <[email protected]>
… node (sum, max ...) has 2 states to handle: the first phase (processing the query), the second phase (aggregating the results of all participants).

Signed-off-by: gal salomon <[email protected]>
…tiple execution flows for the aggregation and non-aggregation flows. bug fixes.

Signed-off-by: gal salomon <[email protected]>
…ts to be processed simultaneously

Signed-off-by: gal salomon <[email protected]>
Signed-off-by: galsalomon66 <[email protected]>