[data_pipelines][rne] Save flux json file even if it's not completed #207
I don't really understand how you use the `first_exec` logic to determine that the previous workflow failed. Plus, I think `ti` needs a better name.
@@ -65,6 +100,8 @@ def compute_start_date():
def get_and_save_daily_flux_rne(
    start_date: str,
    end_date: str,
    first_exec: bool,
    ti,
`ti` is not very self-explanatory.
As discussed, `ti` stands for task instance and represents a specific run of a task within the context of a DAG. The name `ti` is integral to Airflow's design and is used throughout the codebase.
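For illustration, here is a minimal sketch of how a task callable typically uses `ti` to exchange values with another task through XCom. `xcom_push` and `xcom_pull` are real Airflow task instance methods, but everything else is an assumption made for this example: the `FakeTaskInstance` stub (so the snippet runs without Airflow), the function bodies, the `start_date` key, the `get_daily_flux` function, and the placeholder date. It is not the code from this PR.

```python
from typing import Any, Dict, Tuple


class FakeTaskInstance:
    """Hypothetical stub mimicking the two XCom methods used below."""

    def __init__(self, task_id: str, xcom: Dict[Tuple[str, str], Any]) -> None:
        self.task_id = task_id
        self._xcom = xcom  # shared by all fake tasks of one fake run

    def xcom_push(self, key: str, value: Any) -> None:
        self._xcom[(self.task_id, key)] = value

    def xcom_pull(self, task_ids: str, key: str) -> Any:
        return self._xcom.get((task_ids, key))


def compute_start_date(ti) -> None:
    # Assumed body: publish a placeholder date for downstream tasks.
    ti.xcom_push(key="start_date", value="2023-11-01")


def get_daily_flux(ti) -> str:
    # Airflow hands the callable its own task instance as `ti`.
    start_date = ti.xcom_pull(task_ids="compute_start_date", key="start_date")
    return f"fetching flux from {start_date}"


shared: Dict[Tuple[str, str], Any] = {}
compute_start_date(FakeTaskInstance("compute_start_date", shared))
message = get_daily_flux(FakeTaskInstance("get_daily_flux", shared))
```

Inside a real DAG, Airflow supplies `ti` itself when the callable declares that argument, which is why the short name shows up in so many task signatures.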
@XavierJp The current implementation of the |
Given the unreliable nature of the RNE API, it's crucial to enhance the robustness of our workflow to ensure we capture as much data as possible. Specifically, the goal is to persist all data received from the API, even if it's incomplete for a particular day.
In response, this PR introduces a mechanism that saves JSON files to MinIO, even unfinished ones, when an exception occurs during the API request. On the next execution, the workflow retrieves the latest saved file from MinIO and extracts the last `siren` in it, so that it can resume from that point onwards.
This approach safeguards against data loss caused by API disruptions. By storing partial data in MinIO, we ensure that any information obtained before an exception is preserved, and the workflow can pick up where it left off.
In summary, this PR not only addresses the fickle nature of the RNE API by saving incomplete data but also establishes a reliable mechanism to resume workflow execution from the last successfully obtained `siren`, contributing to the overall robustness of our data retrieval process.
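To make the save-and-resume idea concrete, here is a rough sketch under stated assumptions. It is not the PR's actual code: `InMemoryStore` is a hypothetical stand-in for the MinIO bucket (it does not use the MinIO client API), `fetch` is a hypothetical callable wrapping the RNE API, and the file name pattern and one-JSON-record-per-line layout are invented for the example.

```python
import json
from typing import Callable, Dict, Iterable, Optional


class InMemoryStore:
    """Hypothetical stand-in for a MinIO bucket (not the MinIO client API)."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def put(self, name: str, content: str) -> None:
        self.objects[name] = content

    def get(self, name: str) -> str:
        return self.objects.get(name, "")

    def latest(self) -> Optional[str]:
        # Names embed an ISO date, so the lexically greatest is the newest.
        return self.objects[max(self.objects)] if self.objects else None


def last_siren(store: InMemoryStore) -> Optional[str]:
    """Return the last siren in the most recently saved file, if any."""
    content = store.latest() or ""
    lines = [line for line in content.splitlines() if line.strip()]
    return json.loads(lines[-1])["siren"] if lines else None


def get_and_save_flux(
    fetch: Callable[[str, Optional[str]], Iterable[dict]],
    store: InMemoryStore,
    day: str,
) -> int:
    """Fetch records for `day`, resuming after the last saved siren."""
    name = f"rne_flux_{day}.json"  # assumed naming scheme
    resume_from = last_siren(store)
    lines = store.get(name).splitlines()
    already_saved = len(lines)
    try:
        for record in fetch(day, resume_from):
            lines.append(json.dumps(record))
    finally:
        # Runs on success and on failure, so partial data is persisted
        # before any exception propagates to the caller.
        if len(lines) > already_saved:
            store.put(name, "\n".join(lines))
    return len(lines) - already_saved
```

Saving in a `finally` block keeps one code path for both complete and interrupted runs. The exception still propagates, so the orchestrator sees the task as failed and can retry it.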