2024-12-05 - Green Software Playbooks agenda #19

Open
6 tasks
bryaki02 opened this issue Dec 4, 2024 · 4 comments

@bryaki02

bryaki02 commented Dec 4, 2024

Date

2024-12-05 - 15:00 UTC - See the time in your timezone https://everytimezone.com

Roll Call

Please add a comment to this issue during the meeting to denote attendance.
Any untracked attendees will be added by the GSF team below:

  • Full Name, Affiliation, (optional) GitHub username

Previous Meeting

Notes from the previous meeting:

  • Discussed the process for capturing the playbook instructions for the Data Engineer Directory

Agenda

  • Convene & Roll Call
  • Review submissions since last meeting
  • Plan for new year to attract new contributors
  • Review the agenda and suggest new agenda points
  • [Agenda Item]
  • AOB, Q&A & Adjourn

Any Other Business

@moin-oss

moin-oss commented Dec 5, 2024

I'm going to be a few minutes late to the meeting.

@f-mellinghoff

Create a suggestion for: "Move archived data to appropriate storage (cold storage may be sufficient for some archived data)"

@moin-oss

moin-oss commented Dec 5, 2024

Will add a write-up on "minimizing the frequency of batch jobs" and determine how much it overlaps with "only load data where changes occurred; consider event-based triggers (load only the delta, and start follow-up ETL processes only if some source data changed)"
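A minimal sketch of the "only load the delta, only trigger follow-up ETL on change" idea, assuming the source can be read as named partitions of JSON-serialisable rows. Change detection here is done by comparing a SHA-256 fingerprint of each partition with the one recorded on the previous run (a polling variant). An event-based setup would instead call `run_incremental` only when a change notification arrives. All names (`fingerprint`, `run_incremental`, the partition keys) are hypothetical.

```python
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 hash of a partition's rows (rows must be JSON-serialisable)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_incremental(source, state, load, downstream):
    """Load only partitions whose contents changed since the last run.

    source: partition name -> list of rows; state: partition name -> last fingerprint.
    Follow-up ETL (downstream) is triggered only when at least one partition changed.
    """
    delta = sorted(
        name for name, rows in source.items() if state.get(name) != fingerprint(rows)
    )
    for name in delta:
        load(name, source[name])
        state[name] = fingerprint(source[name])
    if delta:
        downstream(delta)
    return delta


# Example run with in-memory stand-ins for the source system and the loader.
loaded, triggered = [], []
source = {"2024-12-01": [{"id": 1}], "2024-12-02": [{"id": 2}]}
state = {}


def load(name, rows):
    loaded.append(name)


first = run_incremental(source, state, load, triggered.append)   # loads both partitions
second = run_incremental(source, state, load, triggered.append)  # nothing changed: no load, no trigger
source["2024-12-02"].append({"id": 3})
third = run_incremental(source, state, load, triggered.append)   # only the modified partition
print(first, second, third)
```

The second run does no loading and starts no downstream work, which is where the energy saving comes from; the trade-off is the cost of reading each partition to fingerprint it, so a cheaper change signal (modification timestamps or storage events) is preferable where one exists.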

@f-mellinghoff

f-mellinghoff commented Dec 19, 2024

Green Software Playbooks – Data Engineering

Improvements to existing projects

Move data to appropriate storage (hot vs. cold storage)

Analyse your existing data and move all rarely accessed data that still needs to be stored (e.g. for compliance or legal reasons) to cold storage.
By default, most data is stored in hot storage, which is intended for frequently used data that requires fast, reliable access. However, projects often also contain data that only needs to be archived and is no longer accessed regularly. For such data, cold storage with slower access is sufficient, and the data should be moved there.
Establish a continuous process to decide whether data can be moved to cold storage or needs to be kept in hot storage.
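As a rough illustration of the recurring review step, the sketch below scans a storage inventory and lists hot objects that must be retained but have not been read for a set period. It assumes the storage layer exposes a last-access timestamp per object, which is not true of every system; the 90-day threshold, the tier names and the `StoredObject` / `cold_storage_candidates` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy value: retained data not read for this long moves to cold storage.
COLD_AFTER = timedelta(days=90)


@dataclass
class StoredObject:
    key: str
    tier: str             # illustrative tier names: "hot" or "cold"
    last_accessed: datetime
    must_retain: bool     # True if the data must be kept, e.g. for legal reasons


def cold_storage_candidates(objects, now, cold_after=COLD_AFTER):
    """Return keys of hot objects that must be kept but have not been read recently."""
    return [
        obj.key
        for obj in objects
        if obj.tier == "hot"
        and obj.must_retain
        and now - obj.last_accessed >= cold_after
    ]


# Example inventory (made-up keys and timestamps).
now = datetime(2024, 12, 19, tzinfo=timezone.utc)
inventory = [
    StoredObject("sales/2021.parquet", "hot", now - timedelta(days=400), True),
    StoredObject("sales/2024-12.parquet", "hot", now - timedelta(days=2), True),
    StoredObject("audit/2019.csv", "cold", now - timedelta(days=1500), True),
]
print(cold_storage_candidates(inventory, now))  # ['sales/2021.parquet']
```

Run on a schedule (for example nightly), this becomes the continuous process described above. Data that no longer needs to be kept at all is deliberately left out; whether to delete it is a separate decision. Many cloud object stores also offer built-in lifecycle rules that can replace a hand-written job like this.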

Green IT Advantages: Moving data to cold storage not only reduces storage costs (although access costs increase) but also saves energy, because the servers storing the data do not need to be constantly available.

Considerations during setup of a new project

Move data to appropriate storage (hot vs. cold storage)

Define a data strategy at the start of your project that specifies which storage option (hot vs. cold) should be used for which data.
Hot storage is for data that needs to be accessed frequently, whereas cold storage is for data that is not accessed regularly.
Establish a continuous process to decide whether data can be moved to cold storage or needs to be kept in hot storage.
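One way to make such a data strategy explicit from day one is to write it down as a small, reviewable table of rules per data category. The sketch below assumes each category has an initial tier and an optional age after which it moves to cold storage; the category names, thresholds and the `TierRule` / `target_tier` names are hypothetical, and data age is used only as a simple stand-in for access frequency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierRule:
    initial_tier: str                # where new data lands: "hot" or "cold"
    cold_after_days: Optional[int]   # age at which data moves to cold; None = never moves


# Hypothetical strategy agreed at project start; category names are illustrative only.
DATA_STRATEGY = {
    "raw_events": TierRule("hot", 30),
    "curated_tables": TierRule("hot", 365),
    "audit_archive": TierRule("cold", None),
}


def target_tier(category: str, age_days: int) -> str:
    """Return the tier a dataset of the given category and age should live in."""
    rule = DATA_STRATEGY[category]
    if rule.cold_after_days is not None and age_days >= rule.cold_after_days:
        return "cold"
    return rule.initial_tier


print(target_tier("raw_events", 10))      # hot
print(target_tier("raw_events", 45))      # cold
print(target_tier("audit_archive", 0))    # cold
```

Keeping the rules in one place makes the periodic review concrete: revisiting the strategy means adjusting a threshold or adding a category rather than rediscovering where each dataset should live.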

Green IT Advantages: Moving data to cold storage not only reduces storage costs (although access costs increase) but also saves energy, because the servers storing the data do not need to be constantly available.
