docs: added section on duckdb reading gcs files #8651

Merged
merged 5 commits into from
Mar 17, 2024

Conversation

pieter-factful
Contributor

Description of changes

On Zulip I got help reading a JSONL file stored on GCS with DuckDB. Since more people are likely to encounter the same issue, it was suggested that I add it to the docs as a new page in the 'how-to' section. The 'input/output' section seemed a good fit.
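
For readers without access to the Zulip thread, here is a minimal sketch of the kind of snippet such a page might cover. It is not taken from the merged page. It assumes `ibis` (with the DuckDB backend), `fsspec`, and `gcsfs` are installed and that GCS credentials are already configured in the environment. The function name `read_gcs_jsonl` is hypothetical.

```python
def read_gcs_jsonl(url: str):
    """Return an Ibis table backed by a JSONL file on GCS (sketch only).

    Assumptions: `ibis`, `fsspec`, and `gcsfs` are installed, and the
    environment already holds credentials that can read `url`.
    """
    # Imports are local so that defining the function needs no extra packages.
    import fsspec
    import ibis

    # Build an fsspec filesystem for GCS; this requires the gcsfs package.
    gcs = fsspec.filesystem("gcs")

    # Connect to an in-memory DuckDB database and register the filesystem
    # so that DuckDB can resolve GCS URLs through fsspec.
    con = ibis.duckdb.connect()
    con.register_filesystem(gcs)

    # Read the newline-delimited JSON file into a table expression.
    return con.read_json(url)
```

Calling `read_gcs_jsonl("<URL>")` with the URL of your own file would then return a table expression you can query as usual.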

@kszucs
Member

kszucs commented Mar 13, 2024

To fix the linting error, pre-commit should be installed; you can find a guide at https://ibis-project.org/contribute/03_style

After running pre-commit, the linting errors should be fixed automatically.

@ncclementi
Contributor

@pieter-factful looks like you are missing the pre-commit hooks; they will take care of the linting if you run them.

Here are some instructions: https://ibis-project.org/contribute/03_style

In short, you can run the following in your conda environment. It makes sure the hooks run before you commit; in most cases they will fix the files, and then you can add them again.

$ pip install pre-commit
$ pre-commit install 

@ncclementi
Contributor

Whoops, @kszucs beat me by a few seconds; I didn't see the comment.

docs/how-to/input-output/gcs_duckdb.qmd (outdated, resolved)
Comment on lines 22 to 24
Where `<URL>` is the url provided by GCS. Note that:
- For private files use the URL with prefix `gs://`
- For public files you can use the URL with prefix `gcs://`
Member

Suggested change
Where `<URL>` is the url provided by GCS. Note that:
- For private files use the URL with prefix `gs://`
- For public files you can use the URL with prefix `gcs://`
Where `<URL>` is the url provided by GCS.

Both gs:// and gcs:// should work regardless of permissions. Is that not the case for you?

Contributor Author

I decided to leave out the note altogether. I didn't want to include the specific URL, and figured it was unnecessary to specify the prefix.

As for the access, I checked and yes, it looks like all three (gs, gcs, and storage.googleapis.com) work. Odd, I ran into issues the other day. Maybe I mixed up the files: I had both a public and a private version of the file.
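
To make the three forms concrete, here is a small sketch showing that they name the same bucket and object. It uses only the Python standard library. The helper name `split_gcs_url` and the bucket and path in the example are made up for illustration. It assumes the path-style `https://storage.googleapis.com/<bucket>/<object>` layout.

```python
from urllib.parse import urlparse


def split_gcs_url(url: str) -> tuple[str, str]:
    """Split a GCS URL into (bucket, object path).

    Handles gs://, gcs://, and path-style https://storage.googleapis.com/ URLs.
    """
    parsed = urlparse(url)
    if parsed.scheme in ("gs", "gcs"):
        # gs://bucket/key and gcs://bucket/key: the bucket is the netloc.
        return parsed.netloc, parsed.path.lstrip("/")
    if parsed.scheme == "https" and parsed.netloc == "storage.googleapis.com":
        # https://storage.googleapis.com/bucket/key: the bucket is the
        # first path segment, the rest is the object path.
        bucket, _, key = parsed.path.lstrip("/").partition("/")
        return bucket, key
    raise ValueError(f"not a recognised GCS URL: {url!r}")


# Hypothetical bucket and object, used only to compare the three forms.
urls = [
    "gs://example-bucket/data/file.jsonl",
    "gcs://example-bucket/data/file.jsonl",
    "https://storage.googleapis.com/example-bucket/data/file.jsonl",
]
parts = {split_gcs_url(u) for u in urls}
```

All three URLs resolve to the same `(bucket, object)` pair, so `parts` ends up holding a single element.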

@cpcloud added the docs (Documentation related issues or PRs) and duckdb (The DuckDB backend) labels Mar 14, 2024
@pieter-factful
Contributor Author

@cpcloud @ncclementi @kszucs Looks like all checks passed. Thanks for your help!

docs/how-to/input-output/gcs_duckdb.qmd (outdated, resolved)
docs/how-to/input-output/gcs_duckdb.qmd (outdated, resolved)
@cpcloud added this to the 9.0 milestone Mar 17, 2024
@cpcloud enabled auto-merge (squash) March 17, 2024 11:30
@cpcloud merged commit c2b06f6 into ibis-project:main Mar 17, 2024
16 checks passed
4 participants