rqlited does not load #260

omerkorenn · 2017-01-16T16:24:38Z

Hi,
I preloaded a single node of rqlited (which created a 1.5 GB raft.db file).
After gracefully shutting down the process, I tried to restart it, but the process exits with "set peer timeout expired" before all the data has been loaded into memory.

Please advise,

Thanks !!!

See the log here:
sudo ./rqlited --http 0.0.0.0:4001 ./node1

            _ _ _           _
           | (_) |         | |
  _ __ __ _| |_| |_ ___  __| |
 | '__/ _  | | | __/ _ \/ _  |   The lightweight, distributed
 | | | (_| | | | ||  __/ (_| |   relational database.
 |_|  \__, |_|_|\__\___|\__,_|
         | |
         |_|

[rqlited] 2017/01/16 16:22:49 rqlited starting, version v3.9.2, commit 46589e9, branch master
[store] 2017/01/16 16:22:49 SQLite in-memory database opened
[store] 2017/01/16 16:22:49 enabling single-node mode
[tcp] 2017/01/16 16:22:49 mux serving on 127.0.0.1:4002, advertising 127.0.0.1:4002
[cluster] 2017/01/16 16:22:49 service listening on 127.0.0.1:4002
2017/01/16 16:22:49 [INFO] raft: Node at 127.0.0.1:4002 [Follower] entering Follower state (Leader: "")
[rqlited] 2017/01/16 16:22:50 failed to set peer for 127.0.0.1:4002 to 0.0.0.0:4001: no leader available (retrying)
2017/01/16 16:22:50 [WARN] raft: Heartbeat timeout from "" reached, starting election
2017/01/16 16:22:50 [INFO] raft: Node at 127.0.0.1:4002 [Candidate] entering Candidate state
2017/01/16 16:22:50 [DEBUG] raft: Votes needed: 1
2017/01/16 16:22:50 [DEBUG] raft: Vote granted from 127.0.0.1:4002. Tally: 1
2017/01/16 16:22:50 [INFO] raft: Election won. Tally: 1
2017/01/16 16:22:50 [INFO] raft: Node at 127.0.0.1:4002 [Leader] entering Leader state
2017/01/16 16:22:50 [DEBUG] raft: Node 127.0.0.1:4002 updated peer set (2): [127.0.0.1:4002]
2017/01/16 16:22:50 [DEBUG] raft: Node 127.0.0.1:4002 updated peer set (2): [127.0.0.1:4002]
2017/01/16 16:22:50 [DEBUG] raft: Node 127.0.0.1:4002 updated peer set (2): [127.0.0.1:4002]
[cluster] 2017/01/16 16:23:01 received connection from 127.0.0.1:45257
[rqlited] 2017/01/16 16:23:11 failed to set peer for 127.0.0.1:4002 to 0.0.0.0:4001: timed out enqueuing operation (retrying)
[cluster] 2017/01/16 16:23:21 received connection from 127.0.0.1:45258
[rqlited] 2017/01/16 16:23:31 failed to set peer for 127.0.0.1:4002 to 0.0.0.0:4001: timed out enqueuing operation (retrying)
[rqlited] 2017/01/16 16:23:31 failed to set peer for localhost:4002 to 0.0.0.0:4001: set peer timeout expired

otoolep · 2017-01-17T21:14:03Z

Thanks for the report @omerkorenn -- I will take a look.

otoolep · 2017-01-18T05:47:33Z

@omerkorenn -- did you load your node via this technique?

https://github.com/rqlite/rqlite/blob/master/doc/RESTORE_FROM_SQLITE.md

otoolep · 2017-01-18T05:54:59Z

If you are loading from a SQLite dump, I would guess you're hitting the 10-second timeout at this line of code:

https://github.com/rqlite/rqlite/blob/master/store/store.go#L441

I can make this configurable, so you can increase it.

otoolep · 2017-01-18T06:17:59Z

@omerkorenn -- can you build top of tree, as per these instructions:

https://github.com/rqlite/rqlite/blob/master/CONTRIBUTING.md#building-rqlite

If so, please launch rqlite with a larger timeout like so:

rqlited -raftapplytimeout 30s

30 seconds is only a suggestion; you might need to go higher. I'd be interested in knowing what value you need. v3.9.2 has a 10-second timeout.

otoolep · 2017-01-21T04:37:43Z

OK, I'm going to assume this is solved.

omerkorenn · 2017-01-21T11:10:34Z

Hi,

Sorry for the late response.

The timeout is indeed the problem.
It is hard to know how much time is needed to load the data.
It's really machine- and dataset-dependent. In any case, I aim to use a ~32 GB dataset.

Do you think it is possible to overcome the problem by connecting to Raft after the data has been loaded, instead of in parallel?

otoolep · 2017-01-21T17:05:20Z

Did you try increasing the timeout at all?

omerkorenn · 2017-01-22T08:31:31Z

Yeah, sure. It worked with 120s.

otoolep · 2017-01-22T17:37:28Z

OK, 120s is kinda long. I don't follow your suggestion about "parallel". Can you explain more?

You could split the file you're loading into multiple smaller files, which will help you keep your timeout setting low.

omerkorenn · 2017-01-22T17:58:12Z

Thanks, I'll try this.

I saw that store.open(..)
https://github.com/rqlite/rqlite/blob/master/cmd/rqlited/main.go#L176
returns before all the data is loaded into memory, and publishAPIAddr is called after it
https://github.com/rqlite/rqlite/blob/master/cmd/rqlited/main.go#L216
which causes the process to fail once the timeout expires.

So I thought it might be possible to wait for the store to be ready before calling publishAPIAddr.

But I'm not sure about it.

otoolep · 2017-01-22T18:26:04Z

Interesting @omerkorenn -- you might be onto something there, though I would need to confirm that Open doesn't block until all Raft log messages have been applied.

Let me re-open to investigate.

otoolep · 2017-02-03T06:56:18Z

@omerkorenn -- I have confirmed that NewRaft does return before all the log entries have been applied. I'm trying to see if there is a way to allow it to block instead.

Better fix for issue #260.

otoolep · 2017-02-03T07:45:49Z

@omerkorenn -- top of tree solves this problem correctly. It waits, by default, up to 120 seconds for the initial state to be applied. You no longer need to set -raftapplytimeout. If 120 seconds is not sufficiently long, you can increase it.

otoolep mentioned this issue Jan 18, 2017

Allow Raft Apply timeout to be configurable #261

Merged

otoolep closed this as completed Jan 21, 2017

otoolep reopened this Jan 22, 2017

otoolep added a commit that referenced this issue Feb 3, 2017

Allow Store to wait for initial logs to be applied

059ab07

Better fix for issue #260.

otoolep closed this as completed Feb 3, 2017
