Ballista context should get file metadata from scheduler, not from local disk #22
Comments
@andygrove as the client only handles the logical plan, I don't think it needs to know about the list of files or the statistics; it only needs the schema.
As Flight already has an endpoint to query the schema, this would avoid creating and maintaining a new one 😃
Hi @andygrove, we have integrated Ballista with HDFS support. Our workaround is to make the file path self-describing. For example, a local file path should be file://tmp/... and an HDFS file path should be hdfs://localhost:xxx:/tmp/... To make this work, we also changed the object store API a bit. I'll create a PR for this later.
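To illustrate the self-describing path idea from the comment above, here is a minimal sketch of dispatching on the URI scheme. It is not the actual object store change referred to in that comment: `StoreKind` and `resolve_store` are hypothetical names, and treating a path with no scheme as local is an assumption made only for this example.

```rust
/// Storage backends distinguished by the URI scheme of a path.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum StoreKind {
    LocalFile,
    Hdfs,
}

/// Splits a self-describing path into its backend and the remainder
/// after "scheme://". Paths without a scheme are treated as local,
/// which is an assumption of this sketch.
pub fn resolve_store(path: &str) -> Result<(StoreKind, &str), String> {
    match path.split_once("://") {
        Some(("file", rest)) => Ok((StoreKind::LocalFile, rest)),
        Some(("hdfs", rest)) => Ok((StoreKind::Hdfs, rest)),
        Some((other, _)) => Err(format!("unsupported scheme: {other}")),
        None => Ok((StoreKind::LocalFile, path)),
    }
}
```

With this shape, the planner never has to guess where a file lives; an unknown scheme fails early with an explicit error instead of falling back to the local disk.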
@yahoNanJing this intersects with work I'm currently doing, so anything you could share would be helpful!
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I have a Ballista cluster running, and each scheduler and executor has access to TPC-H data locally.
I am running the benchmark client on my desktop, and I do not have access to the data locally.
Query planning fails with "file not found" because BallistaContext::read_parquet is looking for the file on the local file system when it should be getting the file metadata from a scheduler in the cluster.
Describe the solution you'd like
The context should send a gRPC request to the scheduler to get the necessary metadata.
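A rough sketch of the shape this could take, assuming the context is made generic over where metadata comes from. Every name here (`MetadataSource`, `FakeScheduler`, `ClientContext`, `TableSchema`) is hypothetical and not part of Ballista's API, and the in-memory map only stands in for the gRPC call to the scheduler.

```rust
use std::collections::HashMap;

/// Minimal stand-in for table metadata: (column name, type name) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct TableSchema {
    pub columns: Vec<(String, String)>,
}

/// Hypothetical abstraction over where file metadata comes from.
pub trait MetadataSource {
    fn get_schema(&self, path: &str) -> Result<TableSchema, String>;
}

/// Stand-in for a scheduler-backed source. A real client would issue a
/// gRPC request here; this sketch looks the path up in an in-memory map.
pub struct FakeScheduler {
    tables: HashMap<String, TableSchema>,
}

impl FakeScheduler {
    pub fn with_table(path: &str, schema: TableSchema) -> Self {
        let mut tables = HashMap::new();
        tables.insert(path.to_string(), schema);
        Self { tables }
    }
}

impl MetadataSource for FakeScheduler {
    fn get_schema(&self, path: &str) -> Result<TableSchema, String> {
        self.tables
            .get(path)
            .cloned()
            .ok_or_else(|| format!("file not found on scheduler: {path}"))
    }
}

/// Hypothetical client context that never touches the local file system.
pub struct ClientContext<S: MetadataSource> {
    source: S,
}

impl<S: MetadataSource> ClientContext<S> {
    pub fn new(source: S) -> Self {
        Self { source }
    }

    /// Resolves the schema through the metadata source, so the path only
    /// needs to exist where the scheduler can see it.
    pub fn read_parquet(&self, path: &str) -> Result<TableSchema, String> {
        self.source.get_schema(path)
    }
}
```

Keeping the lookup behind a trait means the same context code could be tested against a fake, as here, and pointed at a real scheduler client later.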
Describe alternatives you've considered
None
Additional context
None