Accelerate one-time queries on a big data set #299

Open
zhouqingqing opened this issue Dec 19, 2020 · 0 comments

We need a foreign table scan to accelerate debug queries against a big data set. Currently we have to load the whole data set into memory, collect stats, and only then run the query. This is slow when the data set is big, although the cost pays off when a batch of queries runs against the same loaded data.

To solve this problem, we need the following:

  1. DDL to persist and read back stats: the basic functionality is already there; see statis.cs.
  2. Support foreign tables with syntax like this:

         CREATE FOREIGN TABLE A(i int)
             OPTIONS ( filename 'data/data1.csv', format 'csv' );

Note that PhysicScanFile can already read from CSV.
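What a foreign-table scan needs above all is to stream rows from the file rather than materialize the whole data set in memory first. The following is a minimal Python sketch of that idea, assuming comma-delimited data with no header row. `ForeignTable`, `scan`, and `TYPE_CONVERTERS` are hypothetical names for illustration only, not the project's PhysicScanFile.

```python
import csv
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical mapping from declared column types to value converters.
TYPE_CONVERTERS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "double": float,
    "varchar": str,
}

@dataclass
class ForeignTable:
    """Hypothetical description of a table created with CREATE FOREIGN TABLE."""
    name: str
    columns: List[Tuple[str, str]]  # (column name, declared type)
    options: Dict[str, str] = field(default_factory=dict)  # e.g. filename, format

    def scan(self) -> Iterator[tuple]:
        """Yield typed rows straight from the backing file, one at a time."""
        if self.options.get("format", "csv") != "csv":
            raise ValueError("only format 'csv' is handled in this sketch")
        converters = [TYPE_CONVERTERS[t] for _, t in self.columns]
        # Assumes comma-delimited data with no header row.
        with open(self.options["filename"], newline="") as f:
            for record in csv.reader(f):
                if not record:  # skip blank lines
                    continue
                if len(record) != len(converters):
                    raise ValueError(
                        f"{self.name}: expected {len(converters)} fields, got {len(record)}"
                    )
                yield tuple(conv(v) for conv, v in zip(converters, record))
```

For the DDL above this would be `ForeignTable("A", [("i", "int")], {"filename": "data/data1.csv", "format": "csv"})`. Because `scan` is a generator, its memory use does not grow with the size of the file.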

With the above, we can:

  1. Load the data set once, collect stats, and persist the stats.
  2. Whenever a query uses a foreign table, load the persisted stats and read the CSV directly instead of first loading the whole data set into memory.
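The two-phase workflow above can be sketched in Python, assuming stats are kept as a small JSON file per table. `collect_stats`, `persist_stats`, `load_stats`, `get_or_collect_stats`, and the JSON layout are all hypothetical. The project's actual stats code lives in statis.cs and may look quite different.

```python
import json
import os
from typing import Callable, Dict, Iterable, List

def collect_stats(rows: Iterable[tuple], column_names: List[str]) -> Dict:
    """Single pass over the rows: row count plus per-column min, max, distinct count."""
    n_rows = 0
    mins: Dict[str, object] = {}
    maxs: Dict[str, object] = {}
    distinct: Dict[str, set] = {c: set() for c in column_names}
    for row in rows:
        n_rows += 1
        for c, v in zip(column_names, row):
            distinct[c].add(v)  # exact distinct count; memory grows with cardinality
            mins[c] = v if c not in mins else min(mins[c], v)
            maxs[c] = v if c not in maxs else max(maxs[c], v)
    return {
        "n_rows": n_rows,
        "columns": {
            c: {"min": mins.get(c), "max": maxs.get(c), "n_distinct": len(distinct[c])}
            for c in column_names
        },
    }

def persist_stats(stats: Dict, path: str) -> None:
    """Phase 1: write stats to disk so later sessions can reuse them."""
    with open(path, "w") as f:
        json.dump(stats, f)

def load_stats(path: str) -> Dict:
    """Phase 2: read back previously persisted stats without touching the data."""
    with open(path) as f:
        return json.load(f)

def get_or_collect_stats(path: str, rows_factory: Callable[[], Iterable[tuple]],
                         column_names: List[str]) -> Dict:
    """Scan the data only if no persisted stats exist yet; otherwise just load them."""
    if os.path.exists(path):
        return load_stats(path)
    stats = collect_stats(rows_factory(), column_names)
    persist_stats(stats, path)
    return stats
```

Passing a row factory instead of the rows themselves means the expensive scan runs only in the first phase. The exact distinct counts keep a set per column. That is fine for a sketch, but it is memory-hungry on a big data set, so a real implementation might sample or use an approximate counter instead.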