
Schema restore failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" #4041

yarongilor opened this issue Sep 23, 2024 · 10 comments

@yarongilor

Packages

Scylla version: 2024.2.0~rc2-20240904.4c26004e5311 with build-id a8549197de3c826053f88ddfd045b365b9cd8692

Kernel Version: 5.15.0-1068-aws

Issue description

The backup restore failed with the following error:

restore data: create "100gb_sizetiered_6_0" ("100gb_sizetiered_6_0") with CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true: Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace 100gb_sizetiered_6_0

The restore task was started as follows:

< t:2024-09-19 15:13:31,266 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: restore/08395d3e-1492-4af3-86dc-d9b0b03039fc
< t:2024-09-19 15:13:31,564 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.255Z","N":"restore","M":"Initialized views","views":null,"_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}
< t:2024-09-19 15:13:31,564 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.257Z","N":"scheduler","M":"PutTask","task":"restore/08395d3e-1492-4af3-86dc-d9b0b03039fc","schedule":{"cron":"{\"spec\":\"\",\"start_date\":\"0001-01-01T00:00:00Z\"}","window":null,"timezone":"Etc/UTC","start_date":"0001-01-01T00:00:00Z","interval":"","num_retries":3,"retry_wait":"10m"},"properties":{"location":["s3:manager-backup-tests-permanent-snapshots-us-east-1"],"restore_schema":true,"snapshot_tag":"sm_20240812164539UTC"},"create":true,"_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}
< t:2024-09-19 15:13:31,565 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.264Z","N":"scheduler.4253a65e","M":"Schedule","task":"restore/08395d3e-1492-4af3-86dc-d9b0b03039fc","in":"0s","begin":"2024-09-19T15:13:31.264Z","retry":0,"_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}
< t:2024-09-19 15:13:31,565 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.264Z","N":"http","M":"POST /api/v1/cluster/4253a65e-2c97-48dc-a939-7c7590741a75/tasks","from":"127.0.0.1:34234","status":201,"bytes":0,"duration":"3766ms","_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}

and then failed:

< t:2024-09-19 15:13:35,364 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Command "sudo sctool  -c 4253a65e-2c97-48dc-a939-7c7590741a75 progress restore/08395d3e-1492-4af3-86dc-d9b0b03039fc" finished with status 0
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > sctool output: Restore progress
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Run:              bb1652f4-7699-11ef-bc2a-0a833fefb519
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Status:           ERROR (restoring backed-up data)
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Cause:            restore data: create "100gb_sizetiered_6_0" ("100gb_sizetiered_6_0") with CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true: Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace 100gb_sizetiered_6_0
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Start time:       19 Sep 24 15:13:31 UTC
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > End time: 19 Sep 24 15:13:33 UTC
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Duration: 2s
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Progress: 0% | 0%
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Snapshot Tag:     sm_20240812164539UTC
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > 
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > ╭───────────────┬──────────┬──────────┬─────────┬────────────┬────────╮
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > │ Keyspace      │ Progress │     Size │ Success │ Downloaded │ Failed │
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > ├───────────────┼──────────┼──────────┼─────────┼────────────┼────────┤
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > │ system_schema │  0% | 0% │ 352.731k │       0 │          0 │      0 │
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > ╰───────────────┴──────────┴──────────┴─────────┴────────────┴────────╯
< t:2024-09-19 15:13:35,364 f:cli.py          l:1148 c:sdcm.mgmt.cli        p:DEBUG > sctool res after parsing: [['Restore progress'], ['Run: bb1652f4-7699-11ef-bc2a-0a833fefb519'], ['Status: ERROR (restoring backed-up data)'], ['Cause: restore data: create "100gb_sizetiered_6_0" ("100gb_sizetiered_6_0") with CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {\'class\': \'org.apache.cassandra.locator.NetworkTopologyStrategy\', \'us-east\': \'3\'} AND durable_writes = true: Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace 100gb_sizetiered_6_0'], ['Start time: 19 Sep 24 15:13:31 UTC'], ['End time: 19 Sep 24 15:13:33 UTC'], ['Duration: 2s'], ['Progress: 0%', '0%'], ['Snapshot Tag: sm_20240812164539UTC'], ['Keyspace', 'Progress', 'Size', 'Success', 'Downloaded', 'Failed'], ['system_schema', '0%', '0%', '352.731k', '0', '0', '0']]
2024-09-19 15:13:39.530: (DisruptionEvent Severity.ERROR) period_type=end event_id=74265a87-4830-422e-a42f-7081a9ec6230 duration=58s: nemesis_name=MgmtRestore target_node=Node alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-3 [34.242.246.113 | 10.4.3.150] errors=Schema restoration of sm_20240812164539UTC has failed!
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5207, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2966, in disrupt_mgmt_restore
    assert restore_task.status == TaskStatus.DONE, \
AssertionError: Schema restoration of sm_20240812164539UTC has failed!

Installation details

Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-6 (18.202.235.208 | 10.4.3.36) (shards: -1)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-5 (54.75.40.118 | 10.4.3.65) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-4 (34.241.184.210 | 10.4.0.247) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-3 (34.242.246.113 | 10.4.3.150) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-2 (108.129.126.116 | 10.4.1.130) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-1 (34.245.137.137 | 10.4.1.50) (shards: 14)

OS / Image: ami-0555cb82c50d0d5f1 (aws: undefined_region)

Test: longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-sisyphus-test
Test id: 4afc0c3a-7457-4d8b-a69a-8ee387d26369
Test name: enterprise-2024.2/alternator_tablets/longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-sisyphus-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 4afc0c3a-7457-4d8b-a69a-8ee387d26369
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 4afc0c3a-7457-4d8b-a69a-8ee387d26369

Logs:

Jenkins job URL
Argus

@Michal-Leszczynski
Collaborator

Well, this is expected.
SM restores the schema by applying the output of DESC SCHEMA WITH INTERNALS queried during the backup.
The problem is that the schema contains topology-related information: the DCs in which the keyspace is replicated.
So in order to use the SM restore schema task, the restore destination cluster needs to consist of the same DCs as the backed-up cluster.
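For illustration, a minimal sketch of the statements involved; the CREATE KEYSPACE line is taken verbatim from the error above, while running it through cqlsh is my assumption:

-- What SM queries at backup time and replays on restore:
DESC SCHEMA WITH INTERNALS;

-- Its output for the affected keyspace pins replication to the source cluster's DC names:
CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true;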

A workaround is to take the schema file from the backup location, modify it to fit your needs, and apply it manually.
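For example, a sketch of that manual edit (assuming the destination DC is named eu-west, as mentioned later in this thread; substitute your cluster's actual DC name):

-- Schema statement from the backup, with 'us-east' replaced by the destination DC,
-- applied manually (e.g. via cqlsh) on the destination cluster:
CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'eu-west': '3'} AND durable_writes = true;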

@karol-kokoszka karol-kokoszka changed the title restore data failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Schema restore data failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Sep 30, 2024
@karol-kokoszka
Collaborator

karol-kokoszka commented Sep 30, 2024

@yarongilor Is there anything you suggest changing in Scylla Manager? As per #4041 (comment), this is expected behavior of the manager.

It looks like there is no datacenter named "us-east" in the destination cluster.

# cassandra-rackdc.properties
#
# The lines may include white spaces at the beginning and the end.
# The rack and data center names may also include white spaces.
# All trailing and leading white spaces will be trimmed.
#
dc=thedatacentername
rack=therackname
# prefer_local=<false | true>
# dc_suffix=<Data Center name suffix, used by EC2SnitchXXX snitches>
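A quick way to confirm which DC name the destination nodes actually report (my suggestion, not from the thread) is to query system.local via cqlsh:

-- system.local stores the DC name of the node you are connected to:
SELECT data_center FROM system.local;

nodetool status prints the datacenter name per node as well.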

@yarongilor
Author

yarongilor commented Sep 30, 2024

@roydahan, @fruch, is there any known resolution for this issue?
The test ran in the eu-west-1 region (with Datacenter: eu-west) and failed restoring a backup whose schema references the us-east datacenter. Is it a matter of a wrongly selected test region, or does it require an SCT fix?

@roydahan

It's not a new issue, mostly a usability issue.
@mikliapko I think the original issue is assigned to you. Are you planning to change SCT so that it changes the DC name when trying to restore?

@Michal-Leszczynski
Collaborator

Issue about restoring schema into a different DC setting: #4049.

@fruch
Contributor

fruch commented Sep 30, 2024

Issue about restoring schema into a different DC setting: #4049.

So currently the user is supposed to do the schema restore manually.

@mikliapko, so I'd say we should at least skip the nemesis if the region of the snapshot doesn't match.

At least until a fix is implemented on the test end or the manager end.

@karol-kokoszka karol-kokoszka changed the title Schema restore data failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Schema restore failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Sep 30, 2024
@mikliapko

It's not a new issue, mostly a usability issue. @mikliapko I think the original issue is assigned to you. Are you planning to change SCT so that it changes the DC name when trying to restore?

I don't remember us having an issue for that.
Created a new one for me to correctly handle the case when the region of the snapshot doesn't match:
#4052

@rayakurl

rayakurl commented Oct 1, 2024

@mikliapko - IMO we can plan for a workaround, depending on when this issue will be fixed on the Manager side.
@karol-kokoszka, @Michal-Leszczynski - please discuss this in the next Manager refinement meeting. If it's not going to be handled soon, @mikliapko will create a workaround in the test for it.

@fruch
Contributor

fruch commented Oct 1, 2024

It's not a new issue, mostly a usability issue. @mikliapko I think the original issue is assigned to you. Are you planning to change SCT so that it changes the DC name when trying to restore?

I don't remember us having an issue for that. Created a new one for me to correctly handle the case when the region of the snapshot doesn't match: #4052

There was an issue about this, long ago:
https://github.com/scylladb/qa-tasks/issues/1477

I don't know if anything was done to try to apply a workaround.

@timtimb0t

timtimb0t commented Oct 22, 2024
