
This issue was moved to a discussion.



Data source name format #1041

Closed
abhineet13 opened this issue Mar 14, 2022 · 1 comment


@abhineet13

Background

Hi, the data source tab lists each data source under the last segment of its storage path. For example,

```scala
spark.read.parquet("s3://bucket/folder/partition=2020-01-01")
```

gets the data source name "partition=2020-01-01", whereas

```scala
val df = spark.read.parquet("s3://bucket/folder")
df.createOrReplaceTempView("df")
spark.sql("select * from df where partition = '2020-01-01'")
```

gets the data source name "folder" in the Spline UI.
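To make the naming rule described above concrete, here is a minimal Scala sketch that takes the last non-empty path segment of a URI as the short name. The `ShortName.of` helper is hypothetical and only approximates the behavior observed in the UI; it is not Spline's actual code.

```scala
import java.net.URI

// Hypothetical helper: approximates the UI behavior described in this issue,
// where a data source is labeled with the last segment of its URI.
// This is NOT Spline's actual implementation.
object ShortName {
  def of(uri: String): String = {
    // Drop a trailing slash so "s3://bucket/folder/" still yields "folder"
    val path = Option(new URI(uri).getPath).getOrElse("").stripSuffix("/")
    val lastSlash = path.lastIndexOf('/')
    // Everything after the last '/' is treated as the short name
    if (lastSlash >= 0) path.substring(lastSlash + 1) else path
  }
}
```

With this rule, `ShortName.of("s3://bucket/folder/partition=2020-01-01")` yields `partition=2020-01-01`, while `ShortName.of("s3://bucket/folder")` yields `folder`, which is why the two reads appear as different data sources.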

Question

Is it possible to keep data source naming consistent at the folder level? Otherwise, for daily jobs, the Spline UI will show a different data source name for each partition.

Thanks

@wajda
Contributor

wajda commented Mar 15, 2022

Unfortunately, it all depends on what the agent provides. The UI representation is only as good as the metadata that the Spline server receives.
In your example, Spark actually sees two different data source URIs, depending on the path passed to the Read operation. The UI simply shows the last portion of the URI as a short name, because it doesn't have any more precise information.
We expect this problem to be at least partially solved by the data source management feature that we plan to implement in the future (see #689).
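Until something like #689 is available, a consumer of the lineage data could group partition paths itself. The Scala sketch below is a hypothetical client-side helper, not a Spline feature or API: it assumes Hive-style `key=value` partition directories and strips them from the end of a URI so that every daily partition maps to one folder-level URI.

```scala
import java.net.URI

// Hypothetical client-side helper (not part of Spline): maps a partition path
// such as s3://bucket/folder/partition=2020-01-01 to its folder-level URI.
object PartitionNormalizer {
  // Assumption: partition directories follow the Hive-style "key=value" layout
  private def isPartitionDir(segment: String): Boolean = segment.indexOf('=') > 0

  def folderUri(uri: String): String = {
    val u = new URI(uri)
    val segments = Option(u.getPath).getOrElse("").split("/").toList.filter(_.nonEmpty)
    // Strip partition directories from the end of the path only
    val kept = segments.reverse.dropWhile(isPartitionDir).reverse
    // Rebuild the URI from its scheme, authority and the shortened path
    new URI(u.getScheme, u.getAuthority, kept.mkString("/", "/", ""), null, null).toString
  }
}
```

For example, `s3://bucket/folder/partition=2020-01-01` and `s3://bucket/folder/year=2020/month=01` both normalize to `s3://bucket/folder`. Only trailing segments are stripped, so a `key=value` directory in the middle of a path is left alone.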

@AbsaOSS AbsaOSS locked and limited conversation to collaborators Mar 15, 2022
@wajda wajda converted this issue into discussion #1042 Mar 15, 2022

