
fix(client): setup separate SyncArbiter for ViewClientActor with 4 threads #2970

Merged: 8 commits from actix_threads into master on Jul 11, 2020

Conversation

mikhailOK (Contributor):

Change from #2752 + allow storage failures in code reachable from view client.

Test plan

Run existing tests

mikhailOK force-pushed the actix_threads branch 2 times, most recently from c527f90 to e08a5ee on July 10, 2020 06:54
@@ -32,6 +34,8 @@ impl ShutdownableThread {
 impl Drop for ShutdownableThread {
     fn drop(&mut self) {
         self.shutdown();
+        // Leaving some time for all threads to stop after system is stopped.
+        thread::sleep(Duration::from_millis(100));
Collaborator:

It seems that this is not a solution but a symptomatic patch. Is there an issue tracking this?

MaksymZavershynskyi (Contributor) commented Jul 10, 2020:

I agree, we should be explicitly waiting for all threads to stop.
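
A minimal sketch of what explicit waiting could look like, assuming `ShutdownableThread` keeps the `JoinHandle` of the thread it spawned; the field name and method bodies below are illustrative, not the actual nearcore code:

```rust
use std::thread::JoinHandle;

// Hypothetical shape of the struct; the real ShutdownableThread may differ.
struct ShutdownableThread {
    join: Option<JoinHandle<()>>,
}

impl ShutdownableThread {
    fn shutdown(&self) {
        // ... signal the worker (e.g. stop the actix System) ...
    }
}

impl Drop for ShutdownableThread {
    fn drop(&mut self) {
        self.shutdown();
        // Instead of sleeping for a fixed 100ms, block until the worker
        // thread has actually exited.
        if let Some(handle) = self.join.take() {
            let _ = handle.join();
        }
    }
}
```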

Resolved review threads (outdated): tests/test_tps_regression.rs, neard/src/main.rs
Commit: fix(client): setup separate SyncArbiter for ViewClientActor with 4 threads

Change from #2752 + allow storage failures in code reachable from view client.

Test plan
---------
Run existing tests
Resolved review thread (outdated): neard/tests/rpc_nodes.rs
mikhailOK linked an issue on Jul 10, 2020 that may be closed by this pull request.
) -> Addr<ViewClientActor> {
    let request_manager = Arc::new(RwLock::new(ViewClientRequestManager::new()));
    SyncArbiter::start(config.view_client_threads, move || {
        // ViewClientActor::start_in_arbiter(&Arbiter::current(), move |_ctx| {
Collaborator:

should this line be removed?
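
For context, a minimal sketch of the SyncArbiter pattern this code adopts (the `ViewWorker` and `Query` names are illustrative, not the real nearcore types): `SyncArbiter::start(n, factory)` spawns n dedicated OS threads, each running one actor instance produced by the factory closure, and messages sent to the returned `Addr` are distributed across those instances.

```rust
use actix::prelude::*;

// Hypothetical stand-in for ViewClientActor.
struct ViewWorker;

impl Actor for ViewWorker {
    // SyncArbiter actors run on dedicated OS threads and use SyncContext.
    type Context = SyncContext<Self>;
}

#[derive(Message)]
#[rtype(result = "u64")]
struct Query(u64);

impl Handler<Query> for ViewWorker {
    type Result = u64;
    fn handle(&mut self, msg: Query, _ctx: &mut SyncContext<Self>) -> u64 {
        // Placeholder for a storage read handled by the view client.
        msg.0 * 2
    }
}

fn start_workers(threads: usize) -> Addr<ViewWorker> {
    // Each of `threads` OS threads runs its own ViewWorker instance;
    // requests sent to the returned Addr are load-balanced across them.
    SyncArbiter::start(threads, || ViewWorker)
}
```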

@@ -171,5 +166,5 @@ pub fn start_with_config(

     trace!(target: "diagnostic", key="log", "Starting NEAR node with diagnostic activated");

-    (client_actor, view_client)
+    (client_actor, view_client, vec![client_arbiter, arbiter])
Collaborator:

why do we need to keep the arbiters?

mikhailOK (Contributor, Author):

We join them at the end. It seems that System::run only blocks until the stop signal, and we want to wait for all threads to end, but I'm not 100% sure it's needed.

MaksymZavershynskyi (Contributor) left a comment:

Does it also fix #2948?

mikhailOK (Contributor, Author):

It does not fix it; it only makes it affect the chain store.

MaksymZavershynskyi (Contributor) left a comment:

SyncArbiter seems to be the right thing to use here, but does it mean that every other actor in our node runs on the system arbiter, including the network and the RPC server? This would mean we are using a single OS thread for everything but the view clients. CC @pmnoxx

I'll delegate reviewing concurrency in this code to @pmnoxx and @frol

Also, @damons mentioned that our storage might allow dirty reads (we are using batched operations instead of the transactions API in RocksDB), so this PR might expose that issue now. If he is right, prepare for nodes returning garbage through RPC. CC @ailisp.

mikhailOK (Contributor, Author):

The network spawns many threads since #2772.
The regular client runs in one thread until we make the bigger storage change.
JsonRpc requests that go to the view client can now fail with these random storage errors, but we don't know yet how common they will be in practice.

MaksymZavershynskyi (Contributor):

Our JSONRpc server (not to be confused with ViewClientActor) still uses the system arbiter, is that correct? If so, it means we have one thread for all RPC and the regular ClientActor. CC @frol

> JsonRpc requests that go to view client can now fail with these random storage errors, but we don't know yet how common they will be in practice

This is quite dangerous. Can we make sure our ViewClientActor does not serve torn data? If that is the case, lots of tools built on top of it will start failing, e.g. the bridge.

SkidanovAlex (Collaborator):

> This is quite dangerous. Can we make sure our ViewClientActor does not serve torn data? If it is the case then lots of tools built on top of it will start failing, e.g. the bridge.

Our database is generally set up in a way that we do not modify data; we only insert and delete (with very few exceptions). Correspondingly, the ViewClient will either serve your request successfully (all the data it expected was present throughout), or it will fail.

Most failures will be due to concurrent GC, in which case the same request sent ~1 second later would fail anyway.

The alternatives to this fix are:

  1. Implement a proper snapshot isolation level. Mikhail is working on it, but it will not be done in the timeframe for Phase 1.
  2. Keep ViewClientActor and ClientActor on the same thread. That makes it very easy to practically stall block production, even if we disable JsonRPC (via e.g. state sync requests or transaction propagation).

Let's observe how ViewClient actually behaves with this change. In expectation, we will not observe any transient errors.

MaksymZavershynskyi (Contributor):

> Correspondingly, the ViewClient will either serve your request successfully (all the data it expected was present throughout), or will fail.

Since we have multiple columns in the database, is it possible that, while block production or block import is happening and we have populated only some of the columns, the ViewClient will swoop in and read fresh data from some columns and old data from others?

> Let's observe how ViewClient actually behaves with this change. In expectation, we will not observe any transient errors.

I would prefer that we be certain before merging. I am launching a persistent bridge today, and some of our partners will be using it. It would not look good for us if it breaks because one of our RPC endpoints occasionally returns garbage data.

SkidanovAlex (Collaborator) commented Jul 10, 2020:

> Since we have multiple columns in the database can it be possible that while block production or block import is happening and we have populated only some of the columns the ViewClient will swoop in and read fresh data from some columns and old data from other columns?

@mikhailOK can correct me if I'm wrong, but I believe that all the affected endpoints in the ViewClient have the following pattern:

  1. Read A from the storage (say a block). If it doesn't exist, error out
  2. Read B from the storage (say a chunk) that we know exists given A exists. Before this change: panic if failed to read. After this change: trace and error out if failed to read.

This "read B" part can fail for two reasons:

a) We GCed B after A was read. Note that GC would not GC B before it GCed A, but naturally the timing could have been:

VC reads A -> GC deletes A -> GC deletes B -> VC reads B.

In this case the ViewClient will fail with this change (and would not have failed before this change, because VC and GC were on the same thread), but that is OK: since A is GCed, such a request is expected to start failing.

b) We inserted A before the ViewClient read it, but haven't had a chance to insert B. Indeed, if block production were not writing atomically, the following would be possible:

BP writes A -> VC reads A -> VC attempts to read B -> BP writes B

I see you mention above that we do not use the Transaction API and are concerned that this might allow dirty reads. Fortunately, in RocksDB batched writes are atomic, so dirty reads are impossible (see the second paragraph of the linked documentation).
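
For illustration, a small sketch of the batched-write pattern using the rust-rocksdb crate (the key names are made up, and exact method signatures vary between crate versions): everything put into one `WriteBatch` becomes visible atomically when `db.write` is called, so a concurrent reader sees either both keys or neither.

```rust
use rocksdb::{DB, Options, WriteBatch};

fn write_block_and_chunk(db: &DB) -> Result<(), rocksdb::Error> {
    // Both puts are applied atomically: a reader never observes
    // "block:A" without "chunk:B" (or vice versa) from this batch.
    let mut batch = WriteBatch::default();
    batch.put(b"block:A", b"block data");
    batch.put(b"chunk:B", b"chunk data");
    db.write(batch)
}

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, "/tmp/atomic-batch-demo")?;
    write_block_and_chunk(&db)
}
```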

Note that even with atomic writes, it is still possible to have:

VC reads A -> (atomically BP writes A, BP writes B)

But it's OK, because VC will fail on reading A, and will not attempt to read B, so this data race is also acceptable.
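
To make the two-step pattern concrete, here is a hypothetical sketch (a simplified stand-in, not the actual ViewClient code) of reading A and then B, returning an error instead of panicking when B turns out to be missing:

```rust
// Illustrative types only; the real code works with blocks, chunks and a
// ChainStore rather than these placeholders.
#[derive(Debug)]
enum ViewError {
    BlockMissing,
    ChunkMissing,
}

struct Block { chunk_hash: u64 }
struct Chunk;

trait Store {
    fn get_block(&self, hash: u64) -> Option<Block>;
    fn get_chunk(&self, hash: u64) -> Option<Chunk>;
}

fn get_chunk_for_block(store: &dyn Store, block_hash: u64) -> Result<Chunk, ViewError> {
    // Step 1: read A (the block). If it was GC'ed, the request fails cleanly.
    let block = store.get_block(block_hash).ok_or(ViewError::BlockMissing)?;
    // Step 2: read B (the chunk). Concurrent GC may have removed it between
    // the two reads; before this change this case was treated as unreachable
    // and panicked, after it the error is traced and returned to the caller.
    store.get_chunk(block.chunk_hash).ok_or(ViewError::ChunkMissing)
}
```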

pmnoxx (Contributor) commented Jul 11, 2020:

> SyncArbiter seems to be the right thing to use here, but does it mean that every other actor in our node runs on system arbiter, including the network and the RPC server? This would mean we are using a single OS thread for everything, but view clients. CC @pmnoxx
>
> I'll delegate reviewing concurrency in this code to @pmnoxx and @frol
>
> Also, @damons mentioned that our storage might allow dirty reads (we are using batched operations, instead of transactions API in rocksdb) so this PR might expose this issue now. If he is right, prepare for nodes returning garbage through RPC. CC @ailisp.

The HTTP server starts its own threads for accepting connections and for workers.
PeerManagerActor starts in an arbiter, which uses its own thread, and each Peer also has its own dedicated Arbiter.
As far as I can see, everything else runs on the main thread.

There is a bug in the Actix library: it looks like Actix doesn't guarantee fairness. While there are messages in an actor's mailbox, Actix will prioritize processing them until the mailbox is empty; execution of other actors, or of tasks scheduled with run_later, will be delayed.
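
A rough sketch of the scenario being described, written against the actix 0.9/0.10-era API with illustrative names and numbers; per the behavior described above, the `run_later` callback is only expected to run once the flooded mailbox has been drained:

```rust
use actix::prelude::*;
use std::time::Duration;

struct Busy;

impl Actor for Busy {
    type Context = Context<Self>;

    fn started(&mut self, ctx: &mut Context<Self>) {
        // Schedule a timer for 10ms from now.
        ctx.run_later(Duration::from_millis(10), |_actor, _ctx| {
            println!("run_later finally ran");
            System::current().stop();
        });
        // Flood our own mailbox with work.
        for i in 0..1_000 {
            ctx.address().do_send(Work(i));
        }
    }
}

#[derive(Message)]
#[rtype(result = "()")]
struct Work(u64);

impl Handler<Work> for Busy {
    type Result = ();
    fn handle(&mut self, msg: Work, _ctx: &mut Context<Self>) {
        // Simulate a handler that takes about 1ms per message.
        std::thread::sleep(Duration::from_millis(1));
        if msg.0 % 100 == 0 {
            println!("processed message {}", msg.0);
        }
    }
}

fn main() {
    // actix 0.9/0.10-style; newer actix versions construct System differently.
    let sys = System::new("fairness-demo");
    let _addr = Busy.start();
    sys.run().unwrap();
}
```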

bowenwang1996 (Collaborator):

@mikhailOK is there something that stops us from merging this PR?

pmnoxx (Contributor) left a comment:

lgtm

SkidanovAlex merged commit a14446f into master on Jul 11, 2020.
SkidanovAlex deleted the actix_threads branch on July 11, 2020 23:58.
MaksymZavershynskyi (Contributor):

Thanks @pmnoxx. Since @pmnoxx and @frol approved it, you don't need my approval.


MaksymZavershynskyi commented Jul 13, 2020 via email

bowenwang1996 added a commit that referenced this pull request on Jul 28, 2020:
We currently call `genesis_state` inside `Chain::new`, which means that every time we initialize a client or view client we compute the genesis state again. Since #2970 we start 4 view client actors and one client actor, which means we could be calling `genesis_state` five times at the same time; this consumes too much memory and doesn't make any sense. This PR fixes it by computing the genesis state only once on initialization.

Test plan
---------
Manually verify that with this fix, we can start a testnet node without needing an absurd amount of memory.
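
A schematic sketch of the fix described in that commit (all types and names here are illustrative, not the actual nearcore API): compute the expensive genesis state once and hand a shared handle to every client and view client, instead of recomputing it inside each constructor.

```rust
use std::sync::Arc;

// Placeholder for the expensive-to-build genesis state.
struct GenesisState { /* state roots, records, ... */ }

fn compute_genesis_state() -> GenesisState {
    // Expensive: builds the state from the genesis records.
    GenesisState {}
}

struct Chain { genesis: Arc<GenesisState> }

impl Chain {
    // The constructor receives the precomputed state instead of recomputing it.
    fn new(genesis: Arc<GenesisState>) -> Self {
        Chain { genesis }
    }
}

fn start_node(view_client_threads: usize) -> Vec<Chain> {
    // Compute once, then share: one client plus `view_client_threads`
    // view clients all reuse the same genesis state.
    let genesis = Arc::new(compute_genesis_state());
    (0..=view_client_threads).map(|_| Chain::new(Arc::clone(&genesis))).collect()
}
```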

Successfully merging this pull request may close these issues:

Split ViewClient and Client into separate threads
7 participants