Track lineage in Waimak #76

alexjbush · 2019-04-30T10:31:46Z

It might be nice to track label lineage in Waimak and attach it to some kind of label metadata structure. It would contain the input labels and dependencies that were used to produce this label.
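As a rough illustration of what such a label metadata structure could hold, here is a minimal sketch in plain Python. It is not Waimak's API: `LabelLineage`, `upstream_labels`, and the example label names are all hypothetical, and it assumes each label records only the labels it reads directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelLineage:
    """Hypothetical metadata record for one label (not part of Waimak)."""
    label: str
    inputs: tuple = ()  # labels read directly to produce this label


def upstream_labels(label, registry):
    """Return every label that `label` depends on, directly or transitively."""
    seen = set()
    stack = list(registry[label].inputs)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Labels with no registry entry are treated as raw sources.
        if current in registry:
            stack.extend(registry[current].inputs)
    return seen


# Hypothetical flow: two raw inputs feeding an enriched label.
registry = {
    "customers_clean": LabelLineage("customers_clean", ("customers_raw",)),
    "orders_enriched": LabelLineage(
        "orders_enriched", ("orders_raw", "customers_clean")
    ),
}
print(sorted(upstream_labels("orders_enriched", registry)))
```

Storing only direct inputs keeps each record small; the full dependency set can be derived by walking the registry when it is needed.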

This could then be written out as metadata (table comment) when producing Hive tables. This would produce a simple data lineage for each table.
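One possible way to render that lineage as a table comment is sketched below in plain Python. The function names and the JSON layout are assumptions made for illustration, not anything Waimak provides, and the `ALTER TABLE ... SET TBLPROPERTIES ('comment' = ...)` form should be checked against the Hive version in use.

```python
import json


def lineage_comment(label, inputs):
    """Serialise a label's direct inputs into a compact JSON string."""
    payload = {"label": label, "inputs": sorted(inputs)}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def comment_ddl(table, comment):
    """Build a Hive statement that sets the table comment (hypothetical helper)."""
    # Escape backslashes and single quotes so the comment stays one SQL literal.
    escaped = comment.replace("\\", "\\\\").replace("'", "\\'")
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('comment' = '{escaped}')"


comment = lineage_comment("orders_enriched", ["orders_raw", "customers_clean"])
print(comment_ddl("analytics.orders_enriched", comment))
```

A machine-readable comment such as JSON would let downstream tools parse the lineage back out of the table metadata, at the cost of being less pleasant to read by eye.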

We could also think of an approach for tracking which source files the records in a label came from. This would be tough, though, because "'input_file_name' does not support more than one sources":
https://issues.apache.org/jira/browse/SPARK-18667
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala#L611

timpharo · 2020-11-10T11:03:07Z

I would like to second this. It would be nice to be able to track lineage through the labels, both to make it easier to understand where data came from and to allow graphs of it to be generated. An example:
