trouble adding a fourth node - will not become functional #19013
Downgraded all nodes to v1.0.6 and the fourth node is happy now, accepting and serving requests.
Thanks for the details, @tomholub! Getting the full dump of logs is fantastic. Just to make sure I'm understanding you: all of the nodes, including the fourth one, were running 1.1-beta.20170928? Do you have the logs from when the fourth node died, or from the other nodes? And can you take screenshots of the
Yes, during the time that this was happening, all nodes were running 1.1-beta.20170928, including the fourth one. I retrieved the logs from n-4 through another node's dashboard while n-4 was running, in between crashes. I have copied the full log until the end of the fourth node - or so I believe. You will see two

Unfortunately, that cluster was a testing one and I didn't retain any records of it. Note that this was tested on machines with 1 GB of RAM (I understand that 2 GB is recommended). There were other services running on these nodes, and although those services were idle, I wonder if the fourth node was simply running out of memory and crashing for that reason. Either way, that should not affect the remaining nodes in the cluster. Interestingly, the stable v1.0.6 does not seem to have this problem.
Also, looking at my records, there was a replication factor of 3 set this way:
(part of a node starting script)
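The actual script line isn't preserved above. For reference, on a 1.x cluster a replication factor of 3 was typically set with something along these lines (a sketch, not the reporter's exact script):

```sh
# Sketch only: pipe a zone config into `cockroach zone set` to change the
# default replication factor on a 1.x cluster. The --insecure flag is an
# assumption about this test setup, not taken from the thread.
echo 'num_replicas: 3' | ./cockroach zone set .default --insecure -f -
```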
Alright, I've got a repro and will make sure we get a fix in soon. It looks like we're hitting a case along the lines of #17971. Thanks a ton for the report!
Awesome, I'm glad we got this fixed! By the way, @tomholub, it looks like you are running a globally distributed cluster, which is something we caution people about since there are still several really exciting improvements we are making to the product that would make that experience better. @a-robinson is very much leading the charge on those features. How has that been working for you so far? I'd love to get your general thoughts on what we could improve for you.
TL;DR: Flexible write consistency per tx, less memory, more stability. Thanks for this db, it has potential!

Full version: I thought roach may be a perfect fit for us, as we don't have lots of throughput and don't require a whole lot of optimization, just a low-maintenance geo-distributed db to reduce latency for our users across the globe. Roach does that for reads, but the writes are painfully slow in a geo-distributed db. Here are some of my notes, kind of my x-mas wishlist for a perfect db:
They say it's worth solving a problem if potential users are already solving it in their own painful way. We ended up using a MariaDB Galera cluster for now - painful indeed, but workable. We can run these on $5 nodes instead of the $20 roach nodes. That means we can afford more nodes and get closer to our users, globally. Geo-distributed writes are slower, though comparable to roach. We resorted to developing primitive eventually-consistent writes in our app code like this:

Consistent writes:
Eventually consistent writes:
All reads:
It allows us to do fast local reads as well as writes, while still having the option to do consistent strong writes when needed. But really, we're replicating what I think the db should be able to do itself. If roach could do something like that, with a bit less memory, it would be a solid choice.
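The snippets behind the three labels above didn't survive the formatting. Purely to illustrate the pattern being described - hypothetical code, not the commenter's actual application - an app-level split between strong and eventually consistent writes might look roughly like this:

```python
# Hypothetical sketch of the app-level pattern described above: strong writes go
# through the synchronously replicated path, weak writes hit the local node and
# are replayed to the other regions asynchronously. All names are invented.
import queue
import threading

replication_queue = queue.Queue()  # stands in for whatever async channel the app uses

def consistent_write(cluster_conn, sql, params):
    # Strong write: executed against the synchronously replicated cluster,
    # so it is durable in every region before this call returns.
    cluster_conn.execute(sql, params)

def eventually_consistent_write(local_conn, sql, params):
    # Weak write: committed on the local node only, immediately readable locally;
    # remote regions pick it up from the queue with some delay.
    local_conn.execute(sql, params)
    replication_queue.put((sql, params))

def read(local_conn, sql, params):
    # All reads are served from the nearest node for low latency.
    return local_conn.execute(sql, params)

def replication_worker(remote_conns):
    # Background worker draining the queue toward the other regions.
    while True:
        sql, params = replication_queue.get()
        for conn in remote_conns:
            conn.execute(sql, params)
        replication_queue.task_done()

def start_replication(remote_conns):
    threading.Thread(target=replication_worker, args=(remote_conns,), daemon=True).start()
```

The sketch only shows the split the comment describes: two write paths with different consistency guarantees, and reads always served locally.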
Amazing, thanks for the detailed feedback, @tomholub. Regarding making globally deployed clusters more performant, we have several ideas floating around that might interest you:

1. allowing non-voting replicas to service potentially stale reads,
2. setting a safe max timestamp at which a client can make a safe read of a non-voting replica,
3. geo-partitioning, coming up in 1.2, which lets you pin rows / ranges to specific datacenters.

Supporting eventual consistency on writes is almost an existential question for us. We've erred towards trying to find ways to make writes consistent, but faster. If we allowed eventual consistency, you would end up having many of the issues that other databases have. It's a catch-22! Have you considered trying to use zone configs at a greater granularity? For example, if you don't care as much about resiliency, you could theoretically have a Singapore table where all three replicas live in Singapore, and you would be able to serve faster reads and writes for Singapore data to Singapore users. Not really the best production setup, but a potentially interesting thought experiment. This should get better when we build out row-level partitioning, which will allow this configuration at the row level. The cost efficiency metric you provided is super helpful - really useful to see that stark contrast. We should work on making that better...
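For what it's worth, the per-table zone config suggestion above would have looked roughly like this on a 1.x cluster (a sketch; the table name, locality tier, and exact constraint syntax are assumptions for illustration, not taken from the thread):

```sh
# Sketch only: pin all three replicas of a hypothetical bank.singapore_users
# table to nodes started with --locality=datacenter=sg, using the 1.x
# zone-config YAML. Names and the constraint format here are assumptions.
cat <<EOF | ./cockroach zone set bank.singapore_users --insecure -f -
num_replicas: 3
constraints: [+datacenter=sg]
EOF
```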
We also need Germans to read data about Singaporean users (from a German server, but not necessarily in real time). A weak write in Singapore needs to be immediately readable in Singapore. It's ok if it takes 10 seconds to be readable in Germany (for us). In other words - we expect to see a lot of global dirty reads between datacenters (writes may take a bit of delay to sync), but we want to be able to configure immediate sync / no dirty reads within a locality. We'd also need to retain the option to use the current write mechanism when we need a strong global write. Pack that onto a 1 GB machine reliably and we are golden.

Approach 1 - integrated queue as an option

Approach 2 - locality has authority over rows it created - configurable option
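Put in code terms, the per-write consistency knob being asked for might look like this from an application's point of view (purely hypothetical; no such API exists in CockroachDB, and `db.write` is a placeholder):

```python
# Hypothetical illustration of the feature request above; this API does not exist.
from enum import Enum

class WriteConsistency(Enum):
    LOCAL = "local"    # durable in the writer's locality, replicated elsewhere asynchronously
    GLOBAL = "global"  # the current behaviour: durable across a global quorum before returning

def save_user_event(db, event, consistency=WriteConsistency.LOCAL):
    # LOCAL: immediately readable in the same datacenter, readable in other
    # datacenters within some bounded delay (~10 seconds in the example above).
    # GLOBAL: the existing strong write path, paid for with cross-ocean latency.
    db.write(event, consistency=consistency.value)  # `db.write` is a placeholder, not a real API
```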
Fixed as part of #17971
@tomholub I just read through this issue again since it came through my email with the most recent fix. Apologies for the lack of follow-up. Thank you for the feedback and thoughts - you should definitely consider playing around with geo-partitioning, our next new feature coming up in the upcoming release. It won't let you do those eventually consistent writes, but it will give you much better control over where your rows live.
Thanks for the update.
My hope and expectation was that the data would replicate to the fourth node, and maybe it would even start transparently helping the other node in the same datacenter (Elasticsearch style)
Instead, the moment the node started, throughput went down the drain and stayed there, until the node that joined died a few minutes later:
All nodes running 1.1-beta.20170928
Eventually, after a few similar cycles, I was able to get the fourth node to be part of the cluster without affecting the neighbor, but it will not accept SQL connections, keeps alternating between healthy and suspect, and dies within 10 minutes of starting.
I'm working with 3 tables, fewer than 10 rows, and maybe 10 KB of data. I also tried letting the cluster rest, but the fourth node will not become functional.
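For context, a node joins an existing cluster in this era with a start command along these lines (a sketch; the exact flags, addresses, and localities used in this deployment aren't shown in the report):

```sh
# Sketch only: starting a fourth node and pointing it at the existing cluster.
# Hostnames, --insecure, and the locality tier are illustrative assumptions.
./cockroach start \
  --insecure \
  --host=node4.ny.example.com \
  --join=node1.example.com:26257,node2.example.com:26257,node3.example.com:26257 \
  --locality=datacenter=ny
```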
These are the logs from node 4, colocated with the NY node (a different machine in the same datacenter):

Sorry for the big dump; it's hard to say which logs are important.