[data_pipelines][rne] Save flux json file even if it's not completed #207

Merged
HAEKADI merged 4 commits into main from rne-last-siren
Nov 17, 2023

Conversation

HAEKADI
Contributor

@HAEKADI HAEKADI commented Nov 16, 2023

Given the unreliable nature of the RNE API, it's crucial to enhance the robustness of our workflow to ensure we capture as much data as possible. Specifically, the goal is to persist all data received from the API, even if it's incomplete for a particular day.

In response to this, this PR introduces a mechanism to save JSON files, even if they are unfinished, into MinIO in the event of an exception during the API request. In the subsequent execution of the workflow, the system will retrieve the latest saved file from MinIO. It will then extract the last siren in the file, allowing the workflow to resume from that point onwards.

This approach serves as a resilient strategy, safeguarding against potential data loss caused by API disruptions. By storing partial data in MinIO, we ensure that any information obtained before encountering an exception is preserved. This way, the workflow can continue seamlessly by picking up where it left off, focusing on the latest available data.

In summary, this PR not only addresses the fickle nature of the RNE API by saving incomplete data but also establishes a reliable mechanism to resume workflow execution from the last successfully obtained siren, contributing to the overall robustness of our data retrieval process.
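The save-and-resume behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the JSON Lines layout, the `siren` key at the top level of each record, and the `fetch_page` and `upload` callables (standing in for the RNE API request and the MinIO upload) are all assumptions made for the example.

```python
import json


def last_siren_in_file(path):
    """Return the `siren` of the last JSON line in `path`, or None if the file is missing or empty."""
    last = None
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    last = json.loads(line)["siren"]
    except FileNotFoundError:
        return None
    return last


def fetch_and_save(fetch_page, path, upload):
    """Append pages to `path` until the API is exhausted or fails.

    The file is uploaded in both cases, so a partial day is never lost.
    Returns True when the day completed, False when it was interrupted.
    """
    last_siren = last_siren_in_file(path)  # resume point, None on a fresh start
    completed = False
    try:
        with open(path, "a", encoding="utf-8") as f:
            while True:
                page = fetch_page(last_siren)  # hypothetical API call
                if not page:
                    completed = True
                    break
                for record in page:
                    f.write(json.dumps(record) + "\n")
                last_siren = page[-1]["siren"]
    except Exception as exc:  # broad on purpose: any API failure must not lose data
        print(f"API request failed after siren {last_siren}: {exc}")
    finally:
        upload(path)  # hypothetical: e.g. push the file to MinIO
    return completed
```

Calling `fetch_and_save` a second time with the same `path` picks up after the last siren already written, which mirrors the resume step in the next workflow run.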

@HAEKADI HAEKADI self-assigned this Nov 16, 2023
@HAEKADI HAEKADI added enhancement New feature or request data pipelines labels Nov 16, 2023
@HAEKADI HAEKADI requested a review from XavierJp November 16, 2023 14:38
@HAEKADI HAEKADI changed the title [data-pipelines][rne] Save json file even if it's not completed [data-pipelines][rne] Save flux json file even if it's not completed Nov 16, 2023
Contributor

@XavierJp XavierJp left a comment


I don't really understand how you use the first_exec logic to determine that the previous workflow failed. Plus, I think ti needs a better name.

@@ -65,6 +100,8 @@ def compute_start_date():
def get_and_save_daily_flux_rne(
    start_date: str,
    end_date: str,
    first_exec: bool,
    ti,
Contributor


ti is not very self-explanatory.

Contributor Author


As discussed, ti stands for task instance and represents a specific run of a task within the context of a DAG.
The name ti is integral to Airflow's design and is used throughout the codebase.
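For readers unfamiliar with the convention, here is a small sketch of how a task callable typically uses `ti`. In Airflow 2, a Python callable that declares a parameter named `ti` receives the running task instance, which exposes `xcom_push` and `xcom_pull`. The task names, the `last_siren` key, the placeholder value, and the `FakeTaskInstance` stub (used only so the example runs without Airflow installed) are assumptions for illustration, not code from this PR.

```python
class FakeTaskInstance:
    """Tiny stand-in for Airflow's TaskInstance so the sketch runs without Airflow."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self.store = store  # shared dict playing the role of the XCom backend

    def xcom_push(self, key, value):
        self.store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key):
        return self.store.get((task_ids, key))


def save_flux(ti):
    """Hypothetical producer task: publishes the last siren it processed."""
    last_siren = "123456789"  # placeholder value for illustration
    ti.xcom_push(key="last_siren", value=last_siren)
    return last_siren


def resume_flux(ti):
    """Hypothetical consumer task: reads the value pushed by `save_flux`."""
    return ti.xcom_pull(task_ids="save_flux", key="last_siren")


store = {}
save_flux(FakeTaskInstance("save_flux", store))
print(resume_flux(FakeTaskInstance("resume_flux", store)))
```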

@HAEKADI
Contributor Author

HAEKADI commented Nov 16, 2023

@XavierJp The current implementation of the first_exec logic is admittedly not optimal, but in my opinion it is the quickest solution. The goal is to retrieve the most recent file from MinIO only once per workflow run, on the first iteration of the loop. Without this flag, the workflow would try to retrieve the latest file saved to MinIO on every iteration (that is, for every day of the execution), causing unnecessary calls to both MinIO and the RNE API.
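A rough sketch of the idea behind the flag, under stated assumptions: `get_last_siren` stands in for the MinIO lookup and `fetch_day` for the per-day RNE API call. Both are hypothetical callables, and the loop shape is an illustration rather than the code merged in this PR.

```python
from datetime import date, timedelta


def run_flux(start, end, get_last_siren, fetch_day):
    """Process each day in [start, end], looking up the resume point only once."""
    first_exec = True
    current = start
    while current <= end:
        last_siren = None
        if first_exec:
            # Only the first day can be a resumed, partially saved day,
            # so the storage lookup happens a single time per run.
            last_siren = get_last_siren()
            first_exec = False
        fetch_day(current, last_siren)
        current += timedelta(days=1)
```

With a three-day range, `get_last_siren` is called once and `fetch_day` three times; only the first call receives a non-empty resume point.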

@HAEKADI HAEKADI changed the title [data-pipelines][rne] Save flux json file even if it's not completed [data_pipelines][rne] Save flux json file even if it's not completed Nov 16, 2023
@HAEKADI HAEKADI merged commit 74c58eb into main Nov 17, 2023
2 checks passed
@HAEKADI HAEKADI deleted the rne-last-siren branch November 17, 2023 15:41