Problem while trying to backup (legacy) #2353

Closed
vsfomin opened this issue Aug 17, 2023 · 25 comments
Comments

@vsfomin

vsfomin commented Aug 17, 2023

Self-Hosted Version

20.11.0

CPU Architecture

x86_64

Docker Version

24.0.5

Docker Compose Version

v2.20.2

Steps to Reproduce

cd /opt/docker/sentry
sudo docker-compose run -v $(pwd)/sentry:/sentry-data/backup  --rm -T -e SENTRY_LOG_LEVEL=CRITICAL web export /sentry-data/backup/backup.json
Creating sentry_onpremise_web_run ... done
/usr/local/lib/python2.7/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography import x509
>> Beginning export
Error: Could not open file /sentry-data/backup/backup.json: Permission denied

Expected Result

Successfully backing up all data to backup.json file

Actual Result

Error: Could not open file /sentry-data/backup/backup.json: Permission denied while trying to back up (legacy style).
For some reason, a permission denied error occurred.
My goal is to make a backup and use it to restore onto a new Rocky Linux 9 server running Sentry 20.11. After that I will update it: 21.5.0 -> 21.6.3 -> 23.6.2 -> latest. If I'm not mistaken, that just means git checkout tags/ and then ./install.sh for each step.

If this kind of error cannot be solved, could you explain how to do a volume backup in the Full Backup style, using sentry-clickhouse as an example? This is not clear to me.

s24765:sentry root# docker volume inspect sentry-clickhouse
[
    {
        "CreatedAt": "2023-06-08T13:30:28+03:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/sentry-clickhouse/_data",
        "Name": "sentry-clickhouse",
        "Options": null,
        "Scope": "local"
    }
]

docker run --rm --volumes-from -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar <what should I type here? Is this the full path of the mountpoint?>
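
On the question in the command above: with `--volumes-from`, the path given to tar is wherever the volume is mounted inside the named container, not the host Mountpoint reported by `docker volume inspect`. A minimal sketch that sidesteps this by mounting the named volume directly at a known path. The function name `backup_named_volume`, the `/volume` and `/backup` mount points, the `alpine` image, and the `DOCKER` override variable are assumptions for illustration, not something taken from the Sentry documentation:

```shell
#!/bin/bash
# Sketch: archive one named Docker volume into <dest_dir>/<volume>.tar.
# dest_dir must be an absolute path (it is used as a bind mount).
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
backup_named_volume() {
  local volume="$1" dest_dir="$2"
  "${DOCKER:-docker}" run --rm \
    -v "${volume}:/volume:ro" \
    -v "${dest_dir}:/backup" \
    alpine tar cf "/backup/${volume}.tar" -C /volume .
}

# Example (stop the stack first so the files are not changing):
#   backup_named_volume sentry-clickhouse "$(pwd)"
```

Running it with `DOCKER=echo` prints the docker command instead of executing it, which is a cheap way to check the mounts before touching real data.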

Event ID

No response

@azaslavsky
Contributor

Is it possible that you already have a backup.json file at that location with restricted permissions? Or that there is a system level permission restriction?
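
Both possibilities can be checked on the host before running the export. A minimal sketch, assuming a hypothetical helper named `check_backup_target`; it only tests permissions for the user running it on the host, which may differ from the user inside the `web` container:

```shell
#!/bin/bash
# Sketch: report why a backup file might not be writable.
# Returns 0 if the target looks writable, 1 otherwise.
check_backup_target() {
  local target="$1" dir
  dir="$(dirname "$target")"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir"; return 1
  fi
  if [ -e "$target" ] && [ ! -w "$target" ]; then
    echo "existing file is not writable: $target"; return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "directory is not writable: $dir"; return 1
  fi
  echo "ok: $target"
}

# Example, using the host side of the bind mount from the report:
#   check_backup_target /opt/docker/sentry/sentry/backup.json
```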

@vsfomin
Author

vsfomin commented Aug 18, 2023

@azaslavsky yes, the reason was that the folder had the wrong permissions.
I copied backup.json to the sentry/sentry/ folder on the new VM, but when I try to restore it, the following error occurs:

docker-compose run --rm -T web import /etc/sentry/backup.json
...
07:33:00 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
/usr/local/lib/python2.7/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography import x509
07:33:03 [INFO] sentry.plugins.github: apps-not-configured
Traceback (most recent call last):
  File "/usr/local/bin/sentry", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/__init__.py", line 166, in main
    cli(prog_name=get_prog(), obj={}, max_content_width=100)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/decorators.py", line 30, in inner
    return ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/commands/backup.py", line 16, in import_
    obj.save()
  File "/usr/local/lib/python2.7/site-packages/django/core/serializers/base.py", line 205, in save
    models.Model.save_base(self.object, using=using, raw=True, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/base.py", line 838, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/base.py", line 905, in _save_table
    forced_update)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/base.py", line 955, in _do_update
    return filtered._update(values) > 0
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 667, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 1204, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
    raise original_exception
django.db.utils.IntegrityError: UniqueViolation('duplicate key value violates unique constraint "django_content_type_app_label_model_76bd3d3b_uniq"\nDETAIL:  Key (app_label, model)=(sentry, ruleactivity) already exists.\n',)
SQL: UPDATE "django_content_type" SET "app_label" = %s, "model" = %s WHERE "django_content_type"."id" = %s

What can I do to resolve it?

@azaslavsky
Contributor

If you go into the JSON file itself, are there any models that start with something besides `sentry.*`? Which ones in particular?

@vsfomin
Author

vsfomin commented Aug 18, 2023

@azaslavsky
Yes, there are two models that start with something besides "sentry.*": "model": "contenttypes.contenttype" and "model": "auth.permission".

@azaslavsky
Contributor

Hmm, neither of those should be in there: https://github.com/getsentry/sentry/blob/5b47ccc4bcb416f7cda37e80ad09a114aa89f1a5/src/sentry/backup/helpers.py#L7 🤔. I believe this is the exact reason we put those exclusions in. If you are importing on a fresh instance, I would just manually remove these models from the JSON and try again.
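
For a large export, the manual removal can be scripted. A minimal sketch, assuming the export is a JSON array of objects that each carry a "model" key (the usual Django serialization layout) and that `jq` is installed; `strip_excluded_models` is a hypothetical helper name, and the output is written to a new file so the original export stays untouched:

```shell
#!/bin/bash
# Sketch: drop entries for the two unwanted models from a Sentry export.
# Assumes the file is a JSON array of {"model": ..., "pk": ..., "fields": ...}
# objects and that jq is installed.
strip_excluded_models() {
  local in_file="$1" out_file="$2"
  jq '[ .[] | select(.model != "contenttypes.contenttype"
                 and .model != "auth.permission") ]' \
    "$in_file" > "$out_file"
}

# Example:
#   strip_excluded_models backup.json backup.filtered.json
```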

@vsfomin
Author

vsfomin commented Aug 18, 2023

I've deleted them.
Now I see this:
cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
raise original_exception
django.db.utils.IntegrityError: UniqueViolation('duplicate key value violates unique constraint "sentry_useremail_user_id_email_ade975f1_uniq"\nDETAIL: Key (user_id, email)=(6, a.zhulin@<our_company_domain>) already exists.\n',)
SQL: UPDATE "sentry_useremail" SET "user_id" = %s, "email" = %s, "validation_hash" = %s, "date_hash_added" = %s, "is_verified" = %s WHERE "sentry_useremail"."id" = %s

@azaslavsky
Contributor

I would delete those lines as well, see #2259 (comment). We're working on improving this script by 23.10.0, but that would still require a major upgrade on your end to see the benefits.

@vsfomin
Author

vsfomin commented Aug 23, 2023

OK, my mistake was in the path where tar extracts the files.
These are the proper commands for restoring a full backup:

docker run --rm --volumes-from sentry_onpremise_clickhouse_1 -v $(pwd):/backup ubuntu bash -c "cd /var && tar xvf /backup/backup_clickhouse.tar --strip 1"
docker run --rm --volumes-from sentry_onpremise_web_1 -v $(pwd):/backup ubuntu bash -c "cd /data && tar xvf /backup/backup_web.tar --strip 1"
docker run --rm --volumes-from sentry_onpremise_postgres_1 -v $(pwd):/backup ubuntu bash -c "cd /var && tar xvf /backup/backup_postgres.tar --strip 1"
docker run --rm --volumes-from sentry_onpremise_redis_1 -v $(pwd):/backup ubuntu bash -c "cd /data && tar xvf /backup/backup_redis.tar --strip 1"
docker run --rm --volumes-from sentry_onpremise_zookeeper_1 -v $(pwd):/backup ubuntu bash -c "cd /var && tar xvf /backup/backup_zookeeper.tar --strip 1"
docker run --rm --volumes-from sentry_onpremise_kafka_1 -v $(pwd):/backup ubuntu bash -c "cd /var && tar xvf /backup/backup_kafka.tar --strip 1"
docker run --rm --volumes-from sentry_onpremise_symbolicator_1 -v $(pwd):/backup ubuntu bash -c "cd /data && tar xvf /backup/backup_symbo.tar --strip 1"

But after that I see this error in the web container (full log in web.txt):
sentry.models.project.DoesNotExist: Project matching query does not exist

@williamdes
Contributor

williamdes commented Aug 23, 2023

I made a nice script that does not require Docker to work, nor the overkill ubuntu image, which could be alpine:latest instead.

@vsfomin
Author

vsfomin commented Aug 23, 2023

@williamdes Do you have a script that does the restore?

@williamdes
Contributor

> @williamdes Do you have a script that does the restore?

Not yet, but I think I will have to write one today or this week, since the goal is to move ingest traffic to another node and perform an upgrade.

@vsfomin
Author

vsfomin commented Aug 23, 2023

@williamdes I'd really appreciate it if you'd paste it here.

@azaslavsky
Contributor

The backup script is intended to restore "low-volume" information, like orgs, teams, and projects, not "high-volume" data like events and issues. So you won't see a full restore.

We are currently working on vastly improving this script and supporting full backup capabilities: getsentry/team-ospo#153

@vsfomin
Author

vsfomin commented Aug 25, 2023

@azaslavsky Will this script work for older versions of Sentry?

@williamdes
Contributor

> The backup script is intended to restore "low-volume" information, like orgs, teams, and projects, not "high-volume" data like events and issues. So you won't see a full restore.

Can you explain why?
I intentionally left out the postgres volume, as it's on another external server.

@williamdes
Contributor

Here are the backup and restore scripts; the backup one has changed a bit:

#!/bin/bash

set -eu

BACKUP_NAME="backup_$(date --iso-8601=seconds)_"

echo "Backing up prefix: $BACKUP_NAME"

# Print the host directory that backs the given Docker volume
getFolderForVolume() {
  docker inspect "$1" --format '{{ .Mountpoint }}'
}

# Archive one volume into ./<prefix><volume>.tar.lz4
backupVolume() {
  VOL_DIR="$(getFolderForVolume "$1")"
  echo "Backing up $1: $VOL_DIR"
  # See: https://stackoverflow.com/a/76008162/5155484
  tar --use-compress-program=lz4 -cf "./${BACKUP_NAME}${1}.tar.lz4" -C "$VOL_DIR" .
}

# See: docker volume ls -q
backupVolume "sentry-clickhouse"
backupVolume "sentry-data"
backupVolume "sentry-kafka"
backupVolume "sentry-redis"
backupVolume "sentry-symbolicator"
backupVolume "sentry-zookeeper"

ls -lah "./${BACKUP_NAME}"*.tar.lz4

restore.sh

#!/bin/bash

set -eu

read -r -p "Enter backup name (example: backup_2023-08-27T23:23:48+02:00_): " BACKUP_NAME

echo "Restoring backup with prefix: $BACKUP_NAME"

ls -lah "./${BACKUP_NAME}"*.tar.lz4

getFolderForVolume() {
  # Print the volume's host directory, or an empty line if it cannot be inspected
  docker inspect "$1" --format '{{ .Mountpoint }}' 2> /dev/null || echo ""
}

restoreVolume() {
  VOL_DIR="$(getFolderForVolume "$1")"

  # Remove the existing volume so the restore starts from an empty one
  if [ -d "${VOL_DIR}" ]; then
    echo "Cleaning $1: $VOL_DIR"
    docker volume rm "$1" > /dev/null 2>&1
  fi

  docker volume create "$1" > /dev/null 2>&1
  VOL_DIR="$(getFolderForVolume "$1")"

  if [ ! -d "${VOL_DIR}" ]; then
    echo "The volume $1 seems broken, path: $VOL_DIR"
    exit 1
  fi

  echo "Restoring $1: $VOL_DIR"
  if [ ! -f "./${BACKUP_NAME}${1}.tar.lz4" ]; then
    echo "Could not find the backup: ./${BACKUP_NAME}${1}.tar.lz4"
    exit 1
  fi
  # See: https://stackoverflow.com/a/76008162/5155484
  tar --use-compress-program=lz4 -xf "./${BACKUP_NAME}${1}.tar.lz4" -C "$VOL_DIR"
}

# See: docker volume ls -q
restoreVolume "sentry-clickhouse"
restoreVolume "sentry-data"
restoreVolume "sentry-kafka"
restoreVolume "sentry-redis"
restoreVolume "sentry-symbolicator"
restoreVolume "sentry-zookeeper"
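
One thing to watch in the restore script: restoreVolume removes the existing volume before it checks that the matching archive exists, so a single missing archive would leave that volume empty. A minimal sketch of a pre-flight check that could run before the first restoreVolume call; `check_archives` is a hypothetical helper, not part of the scripts posted here:

```shell
#!/bin/bash
# Sketch: verify every expected archive exists before touching any volume.
# Prints each missing file and returns 1 if any are missing.
check_archives() {
  local prefix="$1" missing=0 vol
  shift
  for vol in "$@"; do
    if [ ! -f "./${prefix}${vol}.tar.lz4" ]; then
      echo "missing: ./${prefix}${vol}.tar.lz4"
      missing=1
    fi
  done
  return "$missing"
}

# Example, before the first restoreVolume call:
#   check_archives "$BACKUP_NAME" sentry-clickhouse sentry-data sentry-kafka \
#     sentry-redis sentry-symbolicator sentry-zookeeper || exit 1
```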

@azaslavsky
Contributor

> The backup script is intended to restore "low-volume" information, like orgs, teams, and projects, not "high-volume" data like events and issues. So you won't see a full restore.

That's just the nature of the script - it has always been that way. We're working on improving this, as tracked by the linked issue.

If you're asking about the technical reasons, it is just much more difficult to do backup/restore of high-volume models like events, of which there could be on the order of millions. Current backups are realistically limited to a few MBs, whereas including millions of events could push us into 100+GB territory. Many parts of the system (e.g., our use of JSON) start to break down at that scale, necessitating a lot more work to ensure reliability.

@vsfomin
Author

vsfomin commented Aug 29, 2023

@williamdes for some reason the backup script only backs up sentry-clickhouse

s24765:sentry root# ./backup_script.sh
Backing up prefix: backup_2023-08-29T11:02:10+0300_
Backing up sentry-clickhouse: /var/lib/docker/volumes/sentry-clickhouse/_data
tar: ./data/default/outcomes_hourly_local: file changed as we read it
tar: ./data/default/sentry_local/20230828-90_7786249_7786249_0: File removed before we read it
tar: ./data/default/sentry_local/20230828-90_7786250_7786250_0: File removed before we read it
tar: ./data/default/sentry_local: file changed as we read it
tar: ./data/default/outcomes_raw_local/20230828_8608329_8617230_7815: File removed before we read it
tar: ./data/default/outcomes_raw_local/20230828_8617232_8617232_0: File removed before we read it
tar: ./data/default/outcomes_raw_local/20230828_8608329_8617231_7816: File removed before we read it
tar: ./data/default/outcomes_raw_local: file changed as we read it
tar: ./data/system/metric_log/202308_10745864_10745864_0: File removed before we read it
tar: ./data/system/metric_log/202308_10745863_10745863_0: File removed before we read it
tar: ./data/system/metric_log/202308_10742514_10745864_931: File removed before we read it
tar: ./data/system/metric_log/202308_10742514_10745862_929: File removed before we read it
tar: ./data/system/metric_log/202308_10742514_10745863_930: File removed before we read it
tar: ./data/system/metric_log: file changed as we read it

@williamdes
Contributor

> @williamdes for some reason the backup script only backs up sentry-clickhouse


Yes, this is probably why they seem to say the backup script solution is not great.
You basically need to shut down everything to get a non-moving backup.
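
A possible explanation for the script stopping after the first volume, offered as an assumption rather than something confirmed in this thread: GNU tar exits with status 1 when files change or disappear while being archived, and `set -e` then aborts the script. Stopping the containers first remains the safer fix. The sketch below only shows how a wrapper could treat status 1 as a warning while still failing on real errors (status 2 and above); `tolerate_changed_files` is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: run a command, treating exit status 1 as a warning.
# GNU tar uses 1 for "some files changed while being archived"
# and 2 for fatal errors.
tolerate_changed_files() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 1 ]; then
    echo "warning: files changed during archiving; backup may be inconsistent" >&2
    return 0
  fi
  return "$status"
}

# Example, inside backupVolume:
#   tolerate_changed_files tar --use-compress-program=lz4 \
#     -cf "./${BACKUP_NAME}${1}.tar.lz4" -C "$VOL_DIR" .
```

An archive produced this way may still be internally inconsistent, so it is best treated as a fallback for cases where downtime is not an option.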

@vsfomin
Author

vsfomin commented Aug 29, 2023

@williamdes Hm.. for some reason I can't restore:

root@s24769:/opt/sentry# ./restore.sh
Enter backup name (example: backup_2023-08-27T23:23:48+02:00_): backup_2023-08-29T19:33:48+02:00_
Restoring backup with prefix: backup_2023-08-29T19:33:48+02:00_
-rw-r--r-- 1 fomin fomin 3.6G Aug 29 18:53 ./backup_2023-08-29T19:33:48+02:00_sentry-clickhouse.tar.lz4
-rw-r--r-- 1 fomin fomin 887K Aug 29 18:53 ./backup_2023-08-29T19:33:48+02:00_sentry-data.tar.lz4
-rw-r--r-- 1 fomin fomin 351M Aug 29 18:53 ./backup_2023-08-29T19:33:48+02:00_sentry-kafka.tar.lz4
-rw-r--r-- 1 fomin fomin  53G Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-postgres.tar.lz4
-rw-r--r-- 1 fomin fomin 409K Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-redis.tar.lz4
-rw-r--r-- 1 fomin fomin  313 Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-symbolicator.tar.lz4
-rw-r--r-- 1 fomin fomin  40K Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-zookeeper.tar.lz4
Cleaning sentry-clickhouse: /var/lib/docker/volumes/sentry-clickhouse/_data
Restoring sentry-clickhouse: /var/lib/docker/volumes/sentry-clickhouse/_data
tar (child): lz4: Cannot exec: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now

I tried it both after docker compose down and after docker compose up -d.
The output above is with the stack down; the output below is with it up:


Enter backup name (example: backup_2023-08-27T23:23:48+02:00_): backup_2023-08-29T19:33:48+02:00_
Restoring backup with prefix: backup_2023-08-29T19:33:48+02:00_
-rw-r--r-- 1 fomin fomin 3.6G Aug 29 18:53 ./backup_2023-08-29T19:33:48+02:00_sentry-clickhouse.tar.lz4
-rw-r--r-- 1 fomin fomin 887K Aug 29 18:53 ./backup_2023-08-29T19:33:48+02:00_sentry-data.tar.lz4
-rw-r--r-- 1 fomin fomin 351M Aug 29 18:53 ./backup_2023-08-29T19:33:48+02:00_sentry-kafka.tar.lz4
-rw-r--r-- 1 fomin fomin  53G Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-postgres.tar.lz4
-rw-r--r-- 1 fomin fomin 409K Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-redis.tar.lz4
-rw-r--r-- 1 fomin fomin  313 Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-symbolicator.tar.lz4
-rw-r--r-- 1 fomin fomin  40K Aug 29 18:59 ./backup_2023-08-29T19:33:48+02:00_sentry-zookeeper.tar.lz4
Cleaning sentry-clickhouse: /var/lib/docker/volumes/sentry-clickhouse/_data

UPD: Sorry, my mistake. I need to install lz4.
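
A missing tool like this can be caught before any volume is touched. A minimal sketch, assuming a hypothetical helper named `require_commands` placed near the top of the backup and restore scripts:

```shell
#!/bin/bash
# Sketch: fail early if a required command is not on PATH.
# Prints one line per missing command and returns 1 if any are missing.
require_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, near the top of the backup or restore script:
#   require_commands docker tar lz4 || exit 1
```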

@vsfomin
Author

vsfomin commented Aug 29, 2023

@williamdes Thanks! Your scripts helped me back up and restore all data and update from 20.11.0 to 23.8.0.

@azaslavsky
Contributor

Going to mark this as resolved. Thank you for all of the help @williamdes!

@williamdes
Contributor

Awesome, glad it helped
Sentry team, please point us to issues we can subscribe to in order to follow the backup topic :)

@azaslavsky
Contributor

getsentry/team-ospo#153 is the tracking bug for this. The end goal is a robust backup/restore script that covers all data.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 14, 2023