[WIP] Arrow (file) datasource #1858

Closed
wants to merge 16 commits into from

Dandandan
Contributor

@Dandandan Dandandan commented Feb 17, 2022

Which issue does this PR close?

Closes #1857

Rationale for this change

Support reading arrow files.

❯ CREATE EXTERNAL TABLE t STORED AS ARROW LOCATION 'file.arrow';
0 rows in set. Query took 0.004 seconds.
❯ select * from t;
+----+-----+-------+
| f0 | f1  | f2    |
+----+-----+-------+
| 1  | foo | true  |
| 2  | bar |       |
| 3  | baz | false |
| 4  |     | true  |
+----+-----+-------+
4 rows in set. Query took 0.006 seconds.
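
As a side note on what "arrow files" means here: the Arrow IPC file format (as opposed to the streaming format) is framed by the magic string `ARROW1` at both the start and the end of the file. A minimal sketch of a pre-check along those lines is shown below. The helper name `looks_like_arrow_file` is hypothetical and not part of this PR or of DataFusion; it only inspects the framing bytes and does not parse the footer or any record batches.

```python
import os

# Magic string that frames an Arrow IPC *file* (not the streaming format).
ARROW_MAGIC = b"ARROW1"

# Smallest plausible file: leading magic + 2 padding bytes
# + 4-byte footer length + trailing magic.
MIN_SIZE = len(ARROW_MAGIC) + 2 + 4 + len(ARROW_MAGIC)


def looks_like_arrow_file(path: str) -> bool:
    """Hypothetical helper: True if `path` starts and ends with ARROW1."""
    if os.path.getsize(path) < MIN_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(len(ARROW_MAGIC))
        # Jump to the last 6 bytes of the file to read the trailing magic.
        f.seek(-len(ARROW_MAGIC), os.SEEK_END)
        tail = f.read(len(ARROW_MAGIC))
    return head == ARROW_MAGIC and tail == ARROW_MAGIC
```

A check like this can only reject files that are clearly not in the file format (for example, a file written with the streaming format); a file that passes may still be corrupt.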

What changes are included in this PR?

Are there any user-facing changes?

@Dandandan Dandandan changed the title from "[WIP] arrow datasource" to "[WIP] Arrow datasource" Feb 17, 2022
@github-actions github-actions bot added the ballista, datafusion (Changes in the datafusion crate), and sql (SQL Planner) labels Feb 17, 2022
@Dandandan Dandandan changed the title from "[WIP] Arrow datasource" to "[WIP] Arrow (file) datasource" Feb 18, 2022
@matthewmturner
Contributor

matthewmturner commented Mar 15, 2022

Excited for this! I have a PR for writing Arrow in #1893. I'll pause on that until this is done so that I can better test the write functionality, and I'll align with your file names, etc.

@alamb
Contributor

alamb commented Mar 21, 2022

#2048 has the arrow upgrade

@matthewmturner
Contributor

Hi @Dandandan - do you have an idea how close this PR is to being ready?

@Dandandan Dandandan marked this pull request as ready for review April 1, 2022 19:55
@Dandandan
Contributor Author

> Hi @Dandandan - do you have an idea how close this PR is to being ready?

Hey @matthewmturner, I haven't put in the work yet to finalize the PR.

I don't think much work is needed: we should update it for the changes in DataFusion, use the change in the new Arrow version, and add some more tests.
I probably won't have time in the coming days, so feel free to pick it up if you want to have it merged soon.

@alamb alamb marked this pull request as draft April 15, 2022 14:50
@alamb
Contributor

alamb commented Apr 15, 2022

Marking as draft (so it is easier to see which PRs are waiting for review).

@matthewmturner
Contributor

Apologies - I have been busy working on some other things, so I didn't follow up. I do hope to help get this over the line soon if @Dandandan doesn't get the chance.

@alamb
Contributor

alamb commented Apr 15, 2022

No worries -- nothing to apologize for -- I am just trying to keep the reviews flowing :)

@andygrove andygrove removed the datafusion (Changes in the datafusion crate) label Jun 3, 2022
@alamb
Contributor

alamb commented Jan 14, 2023

This PR is more than 6 months old, so I am closing it for now to clean up the PR list. Please reopen it if this is a mistake and you plan to work on it more.

Labels
sql SQL Planner
Development

Successfully merging this pull request may close these issues.

Read Arrow Files
4 participants