
Feature backup / restore #1359

Merged: 32 commits, Jun 29, 2020
Conversation

@sk4zuzu (Contributor) commented Jun 16, 2020

No description provided.

@seriva (Collaborator) commented Jun 18, 2020

Instead of passing the complete cluster yml with added backup/restore configuration to the epicli recovery/backup commands, I would suggest using only the backup/restore configuration and getting the rest of the cluster definition from the manifest.yml, which can be loaded from the build directory input.

This way it's better separated, and it's easier to use the same backup/restore configuration for many clusters.
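The split suggested here could look roughly like the following sketch: the cluster definition comes from the already-loaded manifest.yml documents, and only the backup/recovery documents come from the user's input. The function name `combine_with_manifest` and the document shapes are assumptions for illustration, not the actual epicli API.

```python
# Hedged sketch of the suggested flow: reuse the cluster definition from
# manifest.yml and take only backup/recovery documents from user input,
# so the same backup/recovery configuration works for many clusters.
BACKUP_KINDS = ('configuration/backup', 'configuration/recovery')

def combine_with_manifest(manifest_docs, user_docs):
    # Everything describing the cluster itself comes from manifest.yml ...
    combined = list(manifest_docs)
    # ... and only the backup/recovery documents come from the user input.
    combined += [doc for doc in user_docs if doc.get('kind') in BACKUP_KINDS]
    return combined
```

With this shape, the same user-supplied backup/recovery file can be combined with any cluster's manifest.yml without carrying a full cluster definition around.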

core/src/epicli/cli/engine/PatchEngine.py (two outdated review threads, resolved)
```diff
@@ -15,6 +15,7 @@ specification:
   - "--web.console.libraries=/etc/prometheus/console_libraries" # Directory should be the same as "config_directory"
   - "--web.console.templates=/etc/prometheus/consoles" # Directory should be the same as "config_directory"
   - "--web.listen-address=0.0.0.0:9090" # Address that Prometheus console will be available
+  - "--web.enable-admin-api" # Enables administrative HTTP API
```
Collaborator:

Do we need to enable this permanently?

Contributor Author:

Not really, I guess. The problem is that doing this "on-demand" requires modifying the systemd unit and, as a result, a full stop/start cycle. I wanted to avoid restarts...

@mkyc (Contributor) previously requested changes Jun 19, 2020

This kind of PR is impossible to review. In my opinion it should be divided into 6 separate PRs, one per component. Also, isn't some documentation missing?

@sk4zuzu (Contributor Author) commented Jun 19, 2020

I believe documentation is being prepared by @plirglo, am I correct?

@sk4zuzu (Contributor Author) commented Jun 19, 2020

@mkyc, I've removed some unneeded code in the last few commits, I hope it helps with the review. 👍

@plirglo (Contributor) commented Jun 22, 2020

@sk4zuzu Yes, it's in progress

@mkyc (Contributor) left a comment

@sk4zuzu thanks for those cleanups.

sk4zuzu and others added 9 commits June 29, 2020 13:09
…1257)

* backup/recovery: adding ansible code for syncing files between hosts

* backup/recovery: adding rsync to package requirements

* backup/recovery: fixing rsync code + adding random name for private key

* backup/recovery: adding ssh code (transfer via pipeline)

* backup/recovery/prometheus: enabling admin API + initial implementation
…ts (#1259)

* backup/recovery: moving sync code to roles + cleanups

* backup/recovery: adding initial code for restoring monitoring snapshots

* backup/recovery: adding kubernetes backup sync (copy-only)

* backup/recovery: checking if snapshots are available first
* backup/recovery: adding on-the-fly sha1 calculation in the download_via_ssh copy method

* backup/recovery: fixing on-the-fly sha1 calculation

- monitoring: reverting invalid tar argument order
- download_via_ssh: making the inline_script fail properly
* postgres backup

* Add all config files to backup
- switching from ssh to rsync
- adding grafana backup/restore
- adding prometheus etc backup/restore
- work in progress, does not work with elasticsearch clusters yet
* backup/recovery: adding rabbitmq (without messages)

* backup/recovery: adding rabbitmq (archive fix)
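The "on-the-fly sha1 calculation" mentioned in the commits above can be sketched as hashing the stream while it is written to disk, so the downloaded archive never needs a second pass for checksumming. This is an illustrative sketch, not the actual `download_via_ssh` implementation: `download_with_sha1` and the injected file-like `stream` (standing in for the ssh pipe) are assumptions, and the chunk size is an arbitrary choice.

```python
import hashlib

def download_with_sha1(stream, destination_path, chunk_size=64 * 1024):
    """Copy a byte stream to destination_path, hashing it on the fly."""
    sha1 = hashlib.sha1()
    with open(destination_path, 'wb') as destination:
        for chunk in iter(lambda: stream.read(chunk_size), b''):
            sha1.update(chunk)        # hash while copying, no second pass
            destination.write(chunk)
    return sha1.hexdigest()
```

The returned digest can then be compared against a checksum computed on the remote host to verify the transfer.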
sk4zuzu and others added 3 commits June 29, 2020 13:10
* backup/recovery: logging/elasticsearch snapshot restore fix

- ensuring kibana is always down
- deleting all indices prior to restore

* backup/recovery: logging/elasticsearch snapshot restore fix

- ensuring all filebeat instances are always down

* backup/recovery: logging/elasticsearch snapshot restore fix

- ensuring kibana and filebeat instances will not be started (via reboot) during restore
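The restore sequence described in these commits (log consumers stopped, all indices deleted, snapshot restored) maps onto the standard Elasticsearch snapshot API. A minimal sketch, with the HTTP layer injected as a `do_request` callable so the example stays self-contained; `do_request` and the function name are assumptions, not epicli code:

```python
def restore_snapshot(do_request, repository, snapshot):
    """Sketch of a snapshot restore against the Elasticsearch REST API.

    Assumes kibana and all filebeat instances are already stopped, as the
    commits above ensure, so nothing recreates indices during the restore.
    """
    # Delete all indices so the restore starts from a clean state.
    do_request('DELETE', '/_all')
    # Restore every index from the snapshot and wait for completion.
    do_request('POST',
               '/_snapshot/{}/{}/_restore?wait_for_completion=true'
               .format(repository, snapshot))
```

In practice `do_request` would wrap an HTTP client pointed at the Elasticsearch endpoint, and the services would be started again only after the restore completes.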
@sk4zuzu sk4zuzu force-pushed the feature/backup-restore branch from 70760a5 to d5a36ef Compare June 29, 2020 11:18
@sk4zuzu sk4zuzu dismissed mkyc’s stale review June 29, 2020 11:23

Incomplete review.

Comment on lines +67 to +89

```python
"""Populate input yaml documents with additional required ad-hoc data."""

# Seed the self.configuration_docs
self.configuration_docs = copy.deepcopy(self.input_docs)

# Please notice using DefaultMerger is not needed here, since it is done already at this point.
# We just check if documents are missing and insert default ones without the unneeded merge operation.
for kind in {'configuration/backup', 'configuration/recovery'}:
    try:
        # Check if the required document is in user inputs
        document = select_single(self.configuration_docs, lambda x: x.kind == kind)
    except ExpectedSingleResultException:
        # If there is no document provided by the user, then fallback to defaults
        document = load_yaml_obj(data_types.DEFAULT, 'common', kind)
        # Inject the required "version" attribute
        document['version'] = VERSION
        # Copy the "provider" value from the cluster model
        document['provider'] = self.cluster_model.provider
        # Save the document for later use
        self.configuration_docs.append(document)
    finally:
        # Copy the "provider" value to the specification as well
        document.specification['provider'] = document['provider']
```
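The fallback-to-defaults pattern in the quoted snippet reduces to a small standalone sketch. The names here (`ensure_required_docs`, `load_default`) are illustrative only, not the epicli API; `select_single` raising `ExpectedSingleResultException` plays the role of the `any` check below.

```python
def ensure_required_docs(docs, required_kinds, load_default):
    """Keep each user-provided document of a required kind if present;
    otherwise append a default one (illustrative sketch, not epicli)."""
    result = list(docs)
    for kind in required_kinds:
        if not any(doc.get('kind') == kind for doc in result):
            result.append(load_default(kind))
    return result
```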
Contributor:

Why did we leave so many comments in ready-to-use code?

Contributor Author:

I think that, compared to other projects, we don't have enough comments 🤔. Which ones do you want me to remove?

Contributor:

None. It was only a general question. I don't see many places commented like this; I suspect it sets a new standard.

@rafzei rafzei self-requested a review June 29, 2020 11:51
6 participants