To fully assess the dataset factories solution, we need a somewhat functioning prototype. It will give insight into the complexity of the solution, the risks involved, and potential drawbacks we haven't yet considered in discussions.
To keep in mind/try-out while prototyping
Responsibility for creating default datasets and for pattern matching should lie with the `DataCatalog`, not the `Runner`. Currently, default dataset creation happens in the `Runner`, but this was always odd and was supposed to be a temporary solution.
Factory definition + syntax should ideally go into the catalog so you'd have:
```python
def create_spark_dataset(dataset_name: str, *chunks):
    # e.g. here chunks=["root_namespace", "something-instead-the-*", "spark"]
    return SparkDataSet(filepath=f"data/{chunks[0]}/{chunks[1]}.parquet", file_format="parquet")
```

```yaml
"{root_namespace}.{*}@{spark}":
  type: spark.SparkDataSet
  filepath: data/{chunks[0]}/{chunks[1]}.parquet
  file_format: parquet
```
Ideally config loaders shouldn't know about/deal with this special syntax.
Catalog validation should happen lazily somehow, or only on explicit catalog entries.
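To make the points above more concrete, here is a minimal, self-contained sketch of how catalog-side pattern matching might look. It is not Kedro's implementation. It assumes that `{*}` is a wildcard, that any other `{name}` chunk matches that literal text, and that `chunks` holds the matched pieces in order. The helper names `compile_pattern`, `resolve_factory` and `resolve_entry` are hypothetical. Explicit catalog entries win, and a pattern is only resolved when a dataset name is actually requested, which is one possible reading of "lazy".

```python
import re
from typing import Any, Dict, Optional

# Matches one brace-delimited chunk such as "{*}" or "{spark}".
_CHUNK = re.compile(r"\{([^{}]*)\}")


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Build a regex with one capture group per brace chunk (assumed syntax)."""
    parts = []
    pos = 0
    for m in _CHUNK.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal separator
        token = m.group(1)
        # "{*}" is treated as a wildcard; any other chunk must match literally.
        parts.append("(.+?)" if token == "*" else f"({re.escape(token)})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def resolve_factory(
    name: str, factories: Dict[str, Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return the filled-in config of the first matching pattern, else None."""
    for pattern, template in factories.items():
        match = compile_pattern(pattern).fullmatch(name)
        if match is None:
            continue
        chunks = match.groups()
        # Only string values are templated; "{chunks[0]}" uses str.format indexing.
        return {
            key: value.format(chunks=chunks) if isinstance(value, str) else value
            for key, value in template.items()
        }
    return None


def resolve_entry(
    name: str,
    explicit: Dict[str, Dict[str, Any]],
    factories: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Explicit entries take priority; patterns are only tried on demand."""
    if name in explicit:
        return explicit[name]
    return resolve_factory(name, factories)


factories = {
    "{root_namespace}.{*}@{spark}": {
        "type": "spark.SparkDataSet",
        "filepath": "data/{chunks[0]}/{chunks[1]}.parquet",
        "file_format": "parquet",
    }
}
explicit = {"cars": {"type": "pandas.CSVDataSet", "filepath": "data/cars.csv"}}

resolved = resolve_entry("root_namespace.orders@spark", explicit, factories)
```

In this sketch the config loader only ever sees plain strings and dictionaries. All knowledge of the pattern syntax lives in the resolving functions, which would sit in the catalog.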
Question to answer
Can this be implemented in a non-breaking way?
Description
Subtask of #2423