accelerate one time big data set query #299

zhouqingqing · 2020-12-19T17:57:46Z

The reason we need a foreign table scan is to accelerate one-off debug queries against a big data set. Currently we have to load the whole data set into memory, collect stats, and then run the query. This is slow when the data set is big, although the up-front cost pays off when a batch of queries runs against the same loaded data.

To solve this problem, we need the following:

1. DDL to persist and read back stats. The basic functionality is already there; see statis.cs.
2. Support foreign tables with syntax like this:

CREATE FOREIGN TABLE A(i int)
        OPTIONS ( filename 'data/data1.csv', format 'csv' );

Note that PhysicScanFile can already read from CSV files.
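A foreign-table scan only needs to stream rows from the file named in OPTIONS instead of materializing the whole data set first. The Python sketch below illustrates that idea. It is not qpmodel's PhysicScanFile (which is C#); the function name `scan_foreign_csv` and the type map are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical mapping from declared column types to parsers; "int" matches A(i int).
TYPE_PARSERS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "float": float,
    "text": str,
}


def scan_foreign_csv(source: Iterable[str], columns: List[Tuple[str, str]]) -> Iterator[tuple]:
    """Yield typed rows from CSV text one record at a time.

    `source` is any iterable of lines, e.g. a file opened with newline="".
    Because this is a generator, the file is never loaded into memory as a whole.
    """
    parsers = [TYPE_PARSERS[col_type] for _, col_type in columns]
    for record in csv.reader(source):
        if not record:
            continue  # csv.reader yields [] for blank lines
        if len(record) != len(parsers):
            raise ValueError(f"expected {len(parsers)} fields, got {len(record)}")
        yield tuple(parse(field) for parse, field in zip(parsers, record))


if __name__ == "__main__":
    # With a real file: with open("data/data1.csv", newline="") as source: ...
    demo = io.StringIO("1\n2\n3\n")
    print(sum(row[0] for row in scan_foreign_csv(demo, [("i", "int")])))  # prints 6
```

Because rows are produced lazily, a query that stops early (for example, one with a LIMIT) never touches the rest of the file.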

With the above in place, we can:

1. Load the data set once, collect stats, and persist them.
2. Whenever a query uses the foreign table, load the persisted stats and read the CSV directly.
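The two-phase workflow can be sketched as follows, assuming the stats are a row count plus per-column min, max, and distinct-value count (NDV), persisted as JSON. The function names and the JSON layout are hypothetical and do not reflect the format used by statis.cs.

```python
import csv
import io
import json
from typing import Dict, List, TextIO


def collect_stats(source: TextIO, column_names: List[str]) -> Dict[str, object]:
    """Phase 1 (run once): one full pass computing row count and per-column min/max/NDV."""
    row_count = 0
    mins: Dict[str, int] = {}
    maxs: Dict[str, int] = {}
    # Exact NDV via a set; a real system would more likely use an approximate counter.
    distinct: Dict[str, set] = {name: set() for name in column_names}
    for record in csv.reader(source):
        if not record:
            continue  # skip blank lines
        row_count += 1
        for name, field in zip(column_names, record):
            value = int(field)  # assumption: every column is int, like A(i int)
            mins[name] = min(mins.get(name, value), value)
            maxs[name] = max(maxs.get(name, value), value)
            distinct[name].add(value)
    return {
        "row_count": row_count,
        "columns": {
            name: {"min": mins.get(name), "max": maxs.get(name), "ndv": len(distinct[name])}
            for name in column_names
        },
    }


def save_stats(stats: Dict[str, object], out: TextIO) -> None:
    """Persist stats so later sessions can skip the full load."""
    json.dump(stats, out)


def load_stats(inp: TextIO) -> Dict[str, object]:
    """Phase 2 (every query): read the small stats file instead of the data set."""
    return json.load(inp)


if __name__ == "__main__":
    stats = collect_stats(io.StringIO("3\n1\n3\n"), ["i"])
    buf = io.StringIO()  # stands in for a stats file on disk
    save_stats(stats, buf)
    buf.seek(0)
    print(load_stats(buf))
```

Phase 1 pays the full-scan cost once. In phase 2 the optimizer only reads the small stats file to estimate cardinalities, and the query itself streams the CSV.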
