Background
In my production environment it is hard to add a listener to the Spark shell, so I thought it might work to get the execution plans of Spark jobs from the history server instead. Is it possible to use a downloaded execution plan in place of the Spark job listener?
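For reference, the documented way to attach the agent without changing job code is through Spark configuration when launching the shell. A minimal sketch, assuming the codeless-init bundle (the artifact coordinates, version, and producer URL below are illustrative and must match your Spark/Scala versions and Spline deployment):

spark-shell \
  --packages za.co.absa.spline.agent.spark:spark-3.3-spline-agent-bundle_2.12:2.0.0 \
  --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
  --conf spark.spline.producer.url=http://localhost:8080/producer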
Thank you!
Feature
Add a feature that reads a Spark execution plan file, extracts the lineage, and sends the data to the Spline producer.
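A minimal sketch of what such a reader could look like, assuming the plan is available as a text file and is posted to a Spline producer REST endpoint. The endpoint path, media type, and the parsePlanToSplineJson helper are hypothetical placeholders, not an existing Spline API; the parsing step is exactly the hard part this feature request asks for:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.file.{Files, Paths}

object PlanFileLineage {
  def main(args: Array[String]): Unit = {
    // Read a logical-plan dump downloaded from the history server.
    val planText = Files.readString(Paths.get(args(0)))

    // Hypothetical step: turn the plan text into a Spline execution-plan JSON.
    // A real implementation would have to parse operators such as
    // InsertIntoHadoopFsRelationCommand and HiveTableRelation into lineage nodes.
    val executionPlanJson = parsePlanToSplineJson(planText)

    // Send the result to the Spline producer (URL, path, and media type are
    // placeholders; adjust them to your deployment).
    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8080/producer/execution-plans"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(executionPlanJson))
      .build()
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"Producer responded: ${response.statusCode()}")
  }

  // Placeholder for the actual plan-text-to-lineage conversion.
  def parsePlanToSplineJson(planText: String): String = ???
}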
Example [Optional]
Example logical plan:
InsertIntoHadoopFsRelationCommand s3://example/dw/ods/ods_example_table, [dt=2023-04-18], false, [dt#62], Parquet, [field.delim=\u0001, line.delim=\n, serialization.format=\u0001, partitionOverwriteMode=DYNAMIC, parquet.compression=SNAPPY, mergeSchema=false], Overwrite, CatalogTable(
Database: default
Table: ods_example_table
Owner: hdfs
Created Time: Wed Apr 19 16:03:44 CST 2023
Last Access: UNKNOWN
Created By: Spark 2.2 or prior
Type: EXTERNAL
Provider: hive
Table Properties: [bucketing_version=2, parquet.compression=SNAPPY, serialization.null.format=, transient_lastDdlTime=1681891424]
Location: s3://example/dw/ods/ods_example_table
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=\u0001, line.delim=\n, field.delim=\u0001]
Partition Provider: Catalog
Partition Columns: [dt]
Schema: root
|-- seller_id: string (nullable = true)
|-- code: string (nullable = true)
|-- country_code: string (nullable = true)
|-- org_code: string (nullable = true)
|-- name_cn: string (nullable = true)
|-- name_en: string (nullable = true)
|-- api: string (nullable = true)
|-- create_time: string (nullable = true)
|-- creater: string (nullable = true)
|-- update_time: string (nullable = true)
|-- operator: string (nullable = true)
|-- status: string (nullable = true)
|-- org_code_add: string (nullable = true)
|-- user_name: string (nullable = true)
|-- is_online: string (nullable = true)
|-- dt: string (nullable = true)
), org.apache.spark.sql.execution.datasources.CatalogFileIndex@a03e7be2, [seller_id, code, country_code, org_code, name_cn, name_en, api, create_time, creater, update_time, operator, status, org_code_add, user_name, is_online, dt]
+- Project [seller_id#47, code#48, country_code#49, org_code#50, name_cn#51, name_en#52, api#53, create_time#54, creater#55, update_time#56, operator#57, status#58, org_code_add#59, user_name#60, is_online#61, cast(2023-04-18 as string) AS dt#62]
   +- Project [cast(seller_id#0 as string) AS seller_id#47, cast(code#1 as string) AS code#48, cast(country_code#2 as string) AS country_code#49, cast(org_code#3 as string) AS org_code#50, cast(name_cn#4 as string) AS name_cn#51, cast(name_en#5 as string) AS name_en#52, cast(api#6 as string) AS api#53, cast(create_time#7 as string) AS create_time#54, cast(creater#8 as string) AS creater#55, cast(update_time#9 as string) AS update_time#56, cast(operator#10 as string) AS operator#57, cast(status#11 as string) AS status#58, cast(org_code_add#12 as string) AS org_code_add#59, cast(user_name#13 as string) AS user_name#60, cast(is_online#14 as string) AS is_online#61]
      +- Project [seller_id#0, code#1, country_code#2, org_code#3, name_cn#4, name_en#5, api#6, create_time#7, creater#8, update_time#9, operator#10, status#11, org_code_add#12, user_name#13, is_online#14]
         +- SubqueryAlias spark_catalog.dbinit.ods_example_table
            +- HiveTableRelation [dbinit.ods_example_table, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [seller_id#0, code#1, country_code#2, org_code#3, name_cn#4, name_en#5, api#6, create_time#7, cre..., Partition Cols: []]
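For reference, a textual plan like the one above can also be printed directly from a session. A minimal spark-shell sketch (the table name is taken from the example above):

val qe = spark.sql("SELECT * FROM dbinit.ods_example_table").queryExecution
println(qe.optimizedPlan.toString)  // or qe.toString to print all plan stages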
Proposed Solution [Optional]
No, the Spark Agent is meant to be used as a listener; the execution-plan representation from the Spark history server isn't enough. Such functionality could, however, be implemented as a separate project, something like a Spline Agent for the Spark History Server.
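For anyone exploring that route: a plausible starting point would be the Spark event-log files that back the history server (JSON lines), where SQL execution start events carry the textual plan description. A rough sketch, assuming the standard event-log format (the event name matches Spark's SparkListenerSQLExecutionStart; treat the rest as illustrative):

import scala.io.Source

// Scan a Spark event log (JSON lines) for SQL execution start events,
// which contain the plan text shown in the history server's SQL tab.
object EventLogPlans {
  private val SqlStartEvent =
    "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"

  def main(args: Array[String]): Unit = {
    val source = Source.fromFile(args(0))
    try {
      source.getLines()
        .filter(_.contains(SqlStartEvent))
        .foreach { line =>
          // A real implementation would parse the JSON and read the
          // "physicalPlanDescription" and "sparkPlanInfo" fields; here we
          // only surface the raw event for inspection.
          println(line.take(200) + " ...")
        }
    } finally source.close()
  }
}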