When manually starting the lineage-enabled PySpark pipeline, a Py4JError is raised. The same issue is also found in the 1.7.0 release.
2024/10/25 19:43:36 INFO PythonPipeline: STARTED: PythonPipeline driver
Creating the PipelineBase
2024/10/25 19:43:36 INFO LineageUtil: Recording lineage data...
Traceback (most recent call last):
  File "/opt/spark/jobs/pipelines/python-pipeline/python_pipeline_driver.py", line 30, in <module>
    PipelineBase().record_pipeline_lineage_start_event()
  File "/usr/local/lib/python3.11/dist-packages/python_pipeline/generated/pipeline/pipeline_base.py", line 88, in record_pipeline_lineage_start_event
    self._lineage_util.record_lineage(self._emitter, run_event)
  File "/usr/local/lib/python3.11/dist-packages/aissemble_data_lineage/util/lineage_util.py", line 228, in record_lineage
    emitter.emit_run_event(event)
  File "/usr/local/lib/python3.11/dist-packages/aissemble_data_lineage/emitter.py", line 83, in emit_run_event
    self.build_message_client()
  File "/usr/local/lib/python3.11/dist-packages/aissemble_data_lineage/emitter.py", line 63, in build_message_client
    self._emitter = MessagingClient(
                    ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/aissemble_messaging/messaging_client.py", line 36, in __init__
    self.service_port = self.start_service_jvm()
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/aissemble_messaging/messaging_client.py", line 170, in start_service_jvm
    return launch_gateway(
           ^^^^^^^^^^^^^^^
  File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 297, in launch_gateway
py4j.protocol.Py4JError: Could not find py4j jar at
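One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: py4j's launch_gateway accepts a jarpath argument, and when none is supplied it searches a few default locations for the jar. Because py4j is loaded here from Spark's bundled source zip, that search may come back empty. The sketch below is a stdlib-only diagnostic that could be run inside the pipeline container to see where a py4j jar actually lives. The function names and the two directories it checks are illustrative choices, not part of aissemble or py4j.

```python
import glob
import os
import sys


def candidate_roots():
    """Directories worth checking; these are assumptions, not a definitive list."""
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    return [
        # A pip-installed py4j usually places its jar under <prefix>/share/py4j.
        os.path.join(sys.prefix, "share", "py4j"),
        # Spark distributions normally bundle a py4j jar with their other jars.
        os.path.join(spark_home, "jars"),
    ]


def find_py4j_jars(roots):
    """Return a sorted list of py4j*.jar files found directly inside the given directories."""
    found = set()
    for root in roots:
        found.update(glob.glob(os.path.join(root, "py4j*.jar")))
    return sorted(found)


if __name__ == "__main__":
    roots = candidate_roots()
    jars = find_py4j_jars(roots)
    if jars:
        print("py4j jar candidates:", *jars, sep="\n  ")
    else:
        print("no py4j jar found in:", *roots, sep="\n  ")
```

If a jar does turn up, passing its path explicitly through the jarpath argument of launch_gateway is one avenue to try. Whether MessagingClient exposes a way to set that argument is not something this report confirms.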
DoD
Resolve py4j jar not found issue
Steps to Reproduce
1. Create a new aissemble-based project using the latest archetype snapshot.
2. Add the PythonPipeline.json files.
3. Fully generate the project by running mvn clean install and following the manual actions.
4. Build the project without the cache and follow the last manual action:
   mvn clean install -Dmaven.build.cache.skipCache
5. Deploy the project and wait for all services to be ready:
   tilt up; tilt down
6. Manually trigger the python-pipeline pod.
Expected Behavior
The pipeline should start and complete without any errors.
Log in to Kafka to check that the message was successfully received:
Go into the Kafka container: kubectl exec -it kafka-cluster-0 -- sh
Check messages in the lineage-event-out topic: /opt/bitnami/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic lineage-event-out --from-beginning
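For repeated verification, the two manual Kafka steps above could be collapsed into a single non-interactive command. The following is a minimal sketch, assuming kubectl is on the PATH and the pod, script path, port, and topic are exactly those listed above. The helper name build_consumer_command and the --timeout-ms value are additions for illustration; the timeout only makes the consumer exit once no further messages arrive.

```python
import shlex


def build_consumer_command(pod="kafka-cluster-0",
                           topic="lineage-event-out",
                           bootstrap_server="localhost:9093",
                           timeout_ms=10000):
    """Build a kubectl command that reads a topic from the beginning inside the Kafka pod."""
    return [
        "kubectl", "exec", pod, "--",
        "/opt/bitnami/kafka/bin/kafka-console-consumer.sh",
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--from-beginning",
        # Not in the original steps: stop once no message arrives within this interval.
        "--timeout-ms", str(timeout_ms),
    ]


if __name__ == "__main__":
    # Only print the command; running it requires a deployed cluster.
    print(shlex.join(build_consumer_command()))
```

The returned list can be handed to subprocess.run(cmd, capture_output=True, text=True), after which the captured stdout can be checked for the expected lineage events.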
Actual Behavior
The pipeline stopped with the Py4JError shown in the traceback at the top of this issue.