Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add http(s) support to the command line #8753

Merged
merged 7 commits into from
Feb 4, 2024
Merged

Conversation

kcolford
Copy link
Contributor

@kcolford kcolford commented Jan 4, 2024

Which issue does this PR close?

Closes #8752

Rationale for this change

See the linked issue.

What changes are included in this PR?

See the commit log.

Are these changes tested?

This functionality is trivial enough that I don't think it needs to be covered by unit tests and I found nowhere where we could perform full integration test for this repo.

Are there any user-facing changes?

Users will now be able to use the http and https schemes when using create external table. Documentation hasn't been provided in this PR because it doesn't appear that this repo hosts that documentation.

No changes to public APIs.

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition 👍

Regarding docs, you can update them under here:

https://github.com/apache/arrow-datafusion/blob/e5036d0e760b637724e8ac59c32924f126311d39/docs/source/user-guide/cli.md?plain=1#L194

Regarding testing, I see in the file there are some unit tests, e.g.

https://github.com/apache/arrow-datafusion/blob/e5036d0e760b637724e8ac59c32924f126311d39/datafusion-cli/src/exec.rs#L335-L357

Is it possible to create a version of this but for http/https? Failing that, maybe can comment an example usage to demonstrate this new feature will function properly.

Comment on lines 282 to 284
"http" | "https" => {
Arc::new(HttpBuilder::new().with_url(url.origin().ascii_serialization()).build()?) as Arc<dyn ObjectStore>
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"http" | "https" => {
Arc::new(HttpBuilder::new().with_url(url.origin().ascii_serialization()).build()?) as Arc<dyn ObjectStore>
}
"http" | "https" => {
let builder = HttpBuilder::new().with_url(url.origin().ascii_serialization());
Arc::new(builder.build()?) as Arc<dyn ObjectStore>
}

small nit, just to make it easier on the eyes

@alamb
Copy link
Contributor

alamb commented Jan 5, 2024

Thank you @kcolford and @Jefffrey for the review. It would be great to address @Jefffrey 's comments

I tried this out on the ClickBench file fetched via http and it worked like a charm

❯ create external table hits stored as parquet location 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 rows in set. Query took 0.178 seconds.

❯ describe hits;
+-----------------------+-----------+-------------+
| column_name           | data_type | is_nullable |
+-----------------------+-----------+-------------+
| WatchID               | Int64     | YES         |
| JavaEnable            | Int16     | YES         |
| Title                 | Binary    | YES         |
| GoodEvent             | Int16     | YES         |
| EventTime             | Int64     | YES         |
| EventDate             | UInt16    | YES         |
...
| RefererHash           | Int64     | YES         |
| URLHash               | Int64     | YES         |
| CLID                  | Int32     | YES         |
+-----------------------+-----------+-------------+
105 rows in set. Query took 0.003 seconds.

This is a really nice improvement. Let us know if you need help updating the docs or tests

@alamb
Copy link
Contributor

alamb commented Jan 21, 2024

Hi @kcolford -- I was wondering if you think you might have a chance to work on this PR soon? If not perhaps I can help find someone to finish it up.

Thanks again

@alamb
Copy link
Contributor

alamb commented Jan 28, 2024

I think this feature is important, so if no one else gets a chance to get it over the line I will try to do so sometime this week

@alamb
Copy link
Contributor

alamb commented Feb 4, 2024

I took the liberty of merging this PR up from main and adding documentation and tests. Thanks again @kcolford and @Jefffrey

@alamb alamb merged commit 840499f into apache:main Feb 4, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add http(s) support to the cli
3 participants