Track lineage in Waimak #76

Open
alexjbush opened this issue Apr 30, 2019 · 1 comment

Comments

@alexjbush
Member

alexjbush commented Apr 30, 2019

It might be nice to track label lineage in Waimak and attach it to some kind of label metadata structure. This would record the input labels and dependencies that were used to produce each label.

This could then be written out as metadata (table comment) when producing Hive tables. This would produce a simple data lineage for each table.
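To make the idea concrete, here is a minimal sketch in plain Scala of what such a metadata structure could look like. Everything in it is an assumption for illustration: `LabelLineage`, `upstream` and `toTableComment` are hypothetical names and are not part of the Waimak API.

```scala
object LineageSketch {
  // Hypothetical metadata structure: a label plus the labels it was produced from.
  final case class LabelLineage(label: String, inputLabels: Seq[String])

  // Collect every upstream label reachable from `label` through the recorded lineage.
  // Labels with no recorded lineage (for example raw inputs) are treated as roots.
  def upstream(label: String, lineage: Map[String, LabelLineage]): Set[String] = {
    @annotation.tailrec
    def loop(pending: List[String], seen: Set[String]): Set[String] = pending match {
      case Nil => seen
      case head :: tail =>
        val parents = lineage.get(head)
          .map(_.inputLabels)
          .getOrElse(Seq.empty)
          .filterNot(seen.contains) // skip labels already visited, so cycles terminate
        loop(parents.toList ++ tail, seen ++ parents)
    }
    loop(List(label), Set.empty)
  }

  // Render the direct inputs of a label as a one-line string that could be
  // stored as a table comment.
  def toTableComment(l: LabelLineage): String =
    s"lineage: ${l.label} <- ${l.inputLabels.sorted.mkString(", ")}"
}
```

The resulting string could then be attached to the Hive table when it is written, for example through the table's `comment` property. How that would hook into Waimak's existing Hive actions is left open here.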

We could also think of an approach for tracking the source files that the records in a label came from. This would be tough though, as "'input_file_name' does not support more than one sources":
https://issues.apache.org/jira/browse/SPARK-18667
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala#L611

@timpharo

timpharo commented Nov 10, 2020

I would like to second this. It would be nice to be able to track lineage through the labels, both to better understand where the data came from and to allow graphs to be generated showing it. An example:

[Image: Graph]
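As a rough illustration of the graph generation, the sketch below turns recorded lineage into Graphviz DOT text. It assumes the lineage is available as a map from each label to the labels it was produced from; `LineageGraphSketch` and `toDot` are hypothetical names, not existing Waimak functionality.

```scala
object LineageGraphSketch {
  // Hypothetical input: each label mapped to the labels it was produced from.
  // Emits one DOT edge per (input -> label) pair, in a stable sorted order.
  def toDot(inputsByLabel: Map[String, Seq[String]]): String = {
    val edges = for {
      (label, inputs) <- inputsByLabel.toSeq.sortBy(_._1)
      input <- inputs.sorted
    } yield s"""  "$input" -> "$label";"""
    (Seq("digraph lineage {") ++ edges ++ Seq("}")).mkString("\n")
  }
}
```

The output can be rendered with the Graphviz `dot` tool, with edges pointing from each input label to the label produced from it.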
