Recovery from data centre failure in multi-dc not working #420

Open
mikenorgate opened this issue Apr 24, 2020 · 2 comments

@mikenorgate

I am evaluating whether Antidote would be a suitable solution for us. My test setup is based on this Docker Compose example: https://github.com/AntidoteDB/docker-antidote/blob/master/compose-files/dc3n1/docker-compose.yml

I am testing the scenario where a single data centre becomes unavailable, to see what the recovery behaviour is. However, I have found that after the failed DC is restarted, it does not catch up with the changes it missed.

Here is how I'm doing my test:

  • docker stop dc2n1
  • Increment a counter in dc1n1
  • docker start dc2n1
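
The steps above can be sketched as a small Python script. This is only an illustration: `reproduce`, `docker`, and the `increment_counter` callback are hypothetical names, and the callback stands in for whatever Antidote client call is used to update the counter on dc1n1 (the report does not say which client was used). Only the `docker stop` and `docker start` commands come from the steps themselves.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes docker CLI arguments, e.g. ["stop", "dc2n1"].
Runner = Callable[[Sequence[str]], None]


def docker(args: Sequence[str]) -> None:
    """Run a docker CLI command and raise if it exits with an error."""
    subprocess.run(["docker", *args], check=True)


def reproduce(increment_counter: Callable[[], None],
              failed_dc: str = "dc2n1",
              run: Runner = docker) -> None:
    """Replay the three steps from the report, in order."""
    # 1. Take one data centre offline.
    run(["stop", failed_dc])
    # 2. Update a counter on a DC that is still running (dc1n1 in the report).
    #    The caller supplies this step; it is a placeholder here.
    increment_counter()
    # 3. Bring the stopped data centre back.
    run(["start", failed_dc])
```

Passing a different `run` function makes it possible to dry-run the sequence without touching any containers.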

After restarting dc2n1, the following is output in the logs, and the line "Buffering txn in {{'antidote@dc1n1', ..." is repeated forever:

2020-04-24T13:05:00.728870+00:00 notice:
disk_log: repairing "/antidote-data/data/data_antidote/0--0.LOG" ...

2020-04-24T13:05:00.756855+00:00 notice:
disk_log: repairing "/antidote-data/data/data_antidote/730750818665451459101842416358141509827966271488--730750818665451459101842416358141509827966271488.LOG" ...

2020-04-24T13:05:01.790063+00:00 info:
Starting heartbeat sender timers
2020-04-24T13:05:01.791126+00:00 notice:
Forgetting DC {'antidote@dc3n1',
                  {1587,733399,89049}}
2020-04-24T13:05:01.791356+00:00 notice:
Forgetting DC {'antidote@dc1n1',
                  {1587,733359,875651}}
2020-04-24T13:05:01.791527+00:00 info:
Observing DC {'antidote@dc3n1',
                 {1587,733399,89049}}
2020-04-24T13:05:07.827192+00:00 info:
Observing DC {'antidote@dc1n1',
                 {1587,733359,875651}}
2020-04-24T13:05:10.434336+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434412+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434554+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434580+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434671+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434691+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:13.522205+00:00 info:
Connected to DC {'antidote@dc3n1',
                    {1587,733399,89049}}
2020-04-24T13:05:13.553140+00:00 info:
Waiting for DC {'antidote@dc1n1',
                   {1587,733359,875651}}
2020-04-24T13:05:14.473092+00:00 info:
Whoops, lost message. New is 42, last was 27. Asking the remote DC {{'antidote@dc1n1',
                                                                     {1587,
                                                                      733359,
                                                                      875651}},
                                                                    730750818665451459101842416358141509827966271488}
2020-04-24T13:05:14.474678+00:00 critical:
Received unexpected log_reader_resp messages in state normal Message []. State {inter_dc_sub_buf,
                                                                                normal,
                                                                                {{'antidote@dc2n1',
                                                                                  {1587,
                                                                                   733385,
                                                                                   215012}},
                                                                                 730750818665451459101842416358141509827966271488},
                                                                                init,
                                                                                {[],
                                                                                 []},
                                                                                true}
2020-04-24T13:05:14.553899+00:00 info:
Waiting for DC {'antidote@dc1n1',
                   {1587,733359,875651}}
2020-04-24T13:05:15.474089+00:00 info:
Buffering txn in {{'antidote@dc1n1',
                      {1587,733359,875651}},
                  730750818665451459101842416358141509827966271488}
2020-04-24T13:05:15.554915+00:00 info:
Connected to DC {'antidote@dc1n1',
                    {1587,733359,875651}}
2020-04-24T13:05:15.615894+00:00 info:
Starting heartbeat sender timers
2020-04-24T13:05:15.616401+00:00 info:
    application: antidote
    started_at: 'antidote@dc2n1'
2020-04-24T13:05:15.686080+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686282+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686460+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686635+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686875+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686993+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:16.474943+00:00 info:
Buffering txn in {{'antidote@dc1n1',
                      {1587,733359,875651}},
                  730750818665451459101842416358141509827966271488}
2020-04-24T13:05:17.476105+00:00 info:
Buffering txn in {{'antidote@dc1n1',
                      {1587,733359,875651}},
                  730750818665451459101842416358141509827966271488}
2020-04-24T13:05:18.476976+00:00 info:
Buffering txn in {{'antidote@dc1n1',
                      {1587,733359,875651}},
                  730750818665451459101842416358141509827966271488}
2020-04-24T13:05:19.477951+00:00 info:
Buffering txn in {{'antidote@dc1n1',
                      {1587,733359,875651}},
                  730750818665451459101842416358141509827966271488}
2020-04-24T13:05:20.479036+00:00 info:
Buffering txn in {{'antidote@dc1n1',
                      {1587,733359,875651}},
                  730750818665451459101842416358141509827966271488}
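
To check whether a restarted DC is stuck in the same way, a log like the one above can be scanned for the two telltale lines. The following is a minimal sketch, assuming the log has been saved as plain text; `summarize` is a hypothetical helper, and the regular expressions match only the message wording shown in this report. It reports the two numbers from the "lost message" line and their difference without interpreting what they count.

```python
import re
from typing import Dict, Optional

# Patterns taken from the message wording in the log output of this report.
LOST = re.compile(r"Whoops, lost message\. New is (\d+), last was (\d+)")
BUFFERING = re.compile(r"Buffering txn in")


def summarize(log_text: str) -> Dict[str, Optional[int]]:
    """Extract the first 'lost message' numbers and count buffering lines."""
    new: Optional[int] = None
    last: Optional[int] = None
    match = LOST.search(log_text)
    if match:
        new, last = int(match.group(1)), int(match.group(2))
    return {
        "new": new,
        "last": last,
        # Plain difference between the two logged numbers, if present.
        "gap": None if new is None or last is None else new - last,
        "buffering_lines": len(BUFFERING.findall(log_text)),
    }
```

A `buffering_lines` count that keeps growing between two runs over the same log file, while `gap` stays non-zero, matches the behaviour described here.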
@peterzeller
Member

@dajenet This might be related to what you are working on. Can you check if you can reproduce this problem and whether it is fixed on your branch?

@dajenet
Contributor

dajenet commented Apr 27, 2020

Looks like the problem I encountered. I will take a look at it.

@albsch added the bug label Sep 1, 2020