Optimised Query to fetch attribute level dependency for all target attributes for a given execution plan for 0.6.x Spline model #937
Replies: 4 comments 1 reply
-
Beta Was this translation helpful? Give feedback.
-
@wajda I have update the question also. I am looking for metadata info related details only(column level lineage of all attributes in final table) How can I backtrack lineage for all attributes in final node (Write type) to source tables (Read type) using AQL. We don't need intermediate nodes info like joins and projections? Back to my example of Employee (Read type), Department (Read type) and Final (Write type). "column_relationship":[
{"input_entity_name":"Employee", "output_entity_name":"Final", "source_column_name": "empId", "target_column_name": "empId"},
{"input_entity_name":"Employee", "output_entity_name":"Final", "source_column_name": "empName", "target_column_name": "empName"},
{"input_entity_name":"Employee", "output_entity_name":"Final", "source_column_name": "deptId", "target_column_name": "deptId"},
{"input_entity_name":"Employee", "output_entity_name":"Final", "source_column_name": "bonus", "target_column_name": "bonus"},
{"input_entity_name":"Department", "output_entity_name":"Final", "source_column_name": "deptId", "target_column_name": "deptId"},
{"input_entity_name":"Department", "output_entity_name":"Final", "source_column_name": "deptName", "target_column_name": "deptName"},
{"input_entity_name":"Department", "output_entity_name":"Final", "source_column_name": "bonusMultiplier", "target_column_name": "bonusMultiplier"},
{"input_entity_name":"Department", "output_entity_name":"Final", "source_column_name": "bonusMultiplier", "target_column_name": "effectiveBonus"},
{"input_entity_name":"Employee", "output_entity_name":"Final", "source_column_name": "bonus", "target_column_name": "effectiveBonus”}
] Trying to come up with AQL across Operations, attributes etc to come up with expected result and share once ready. Just want to ensure that its optimized enough |
Beta Was this translation helpful? Give feedback.
-
@wajda I am using below logic for getting the desired result.
|
Beta Was this translation helpful? Give feedback.
-
Grouping of Attribute for an Operation makes it more meaningful. We are working for a way to combine 1 and 2 , wherein we are able to retrieve attributes grouped based on operation and the attributes for write operation type having derived from information. |
Beta Was this translation helpful? Give feedback.
-
Background [Optional]
Currently Spline 0.6.x has capability to track attribute level lineage (backward) for target data source. Spline UI has the capability which involves two clicks. First at the target attribute and then click on details.
Question
@wajda
We are looking for AQL to fetch attribute level dependency for all attributres of target datasource in an optimal way for a given eventID
For example, we have two datasets Employee and Department and going through below transformation where we are deriving effectiveBonus by deriving it from Employee.bonus and Department.bonusMultiplier. What will be the optimized AQL for the same.
Employee:
empId
empName
deptId
bonus
Department:
deptId
deptName
bonusMultiplier
Code Logic:
Dataset empDS = // Read Employee
Dataset deptDS = // Read Department
Dataset bonus = empDS.join(deptDS, "deptId").withColumn("effectiveBonus”, col(“bonus”).multiply(col("bonusMultiplier”)));
salDS.write().save(“Final”) // Final Table
_Expected Result _
This will be pushed to Meta Integration Tool [MITI] (http://www.metaintegration.net/). We are trying to build more of an adaptor from Spline to MITI
Trying to come up with AQL across Operations, attributes etc to comeup with expected result and share once ready. Just want to ensure that its optimized enougbh
Beta Was this translation helpful? Give feedback.
All reactions