Cluster stops working after a couple of minutes #53609
Comments
Hello, I am Blathers. I am here to help you get the issue triaged. Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Pretty similar to this issue, where the cluster stays online for just half a minute: |
Hi @luiscosio, mind sharing details about the machines these CRDB nodes are running on? Given that 1.5 TB of data is being stored, I just want to check that these boxes aren't overloaded. |
Hello @irfansharif, the servers have 64 vCPUs and 64 GB of RAM. |
@nvanbenschoten, @aayushshah15: Is this the sort of thing that #51888 and #51894 would also help mitigate? What was CRDB's behavior on cold restarts when there are as many ranges as shown above? Is that now improved? |
That is only available in v20.2.0-alpha.3, right @irfansharif? What would you recommend for starting the cluster again? I see a couple of configuration suggestions over at #46660 (comment), but I cannot find anywhere in the docs where I can configure |
Yes, we're hoping to include those patches in our upcoming 20.2 release, but if you're open to trying out the alpha in a test setup, I'd be curious to know what happens. As for setting the range size, in 20.1 we should already default to 512 MB. The link you were looking for is https://www.cockroachlabs.com/docs/stable/configure-zone.html#variables. |
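As a rough sketch, inspecting and setting the range size through a zone configuration on the setup described in this issue might look like the following. The container name `cockroach` and the `--insecure` flag are taken from the `docker run` command in the report; the 512 MiB value simply restates the default mentioned above and is an assumption about what one would want to set. The script only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: view zone configurations and set range_max_bytes for the default range.
# Assumes a container named "cockroach" started with --insecure, as in the report.
RANGE_MAX_BYTES=$((512 * 1024 * 1024))   # 512 MiB expressed in bytes

SHOW_SQL="SHOW ZONE CONFIGURATIONS;"
ALTER_SQL="ALTER RANGE default CONFIGURE ZONE USING range_max_bytes = ${RANGE_MAX_BYTES};"

run_sql() {
  # Print the command by default; execute it only when RUN=1 is set.
  if [ "${RUN:-0}" = "1" ]; then
    docker exec cockroach ./cockroach sql --insecure --execute="$1"
  else
    echo "docker exec cockroach ./cockroach sql --insecure --execute=\"$1\""
  fi
}

run_sql "$SHOW_SQL"
run_sql "$ALTER_SQL"
```

Running it without `RUN=1` makes it easy to review the exact statements before applying them to a live cluster.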
We'll try the alpha release and let you know if this fixes the issue. Regarding setting |
I believe those are expected to be set as environment variables when running the cockroach binary. |
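For a containerized node like the one in the report, one way to hand an environment variable to the cockroach binary is Docker's `-e` flag. The sketch below reuses the reporter's `docker run` command; `EXAMPLE_COCKROACH_VAR` and its value are placeholders invented for illustration, not real CockroachDB settings, and should be replaced with the variables suggested in the linked comment. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: pass an environment variable to the cockroach process in the container.
# VAR_NAME and VAR_VALUE are hypothetical placeholders, NOT real CockroachDB
# settings; substitute the variables suggested in the linked comment.
VAR_NAME="EXAMPLE_COCKROACH_VAR"
VAR_VALUE="example-value"

# Same command as in the report, with one -e flag added.
CMD="docker run -d -e ${VAR_NAME}=${VAR_VALUE} -p 26257:26257 -p 8085:8080 \
-v /var/server/cockroach-data:/cockroach/cockroach-data \
-v /var/server/certs:/etc/ssl/private \
--name cockroach cockroachdb/cockroach:v20.1.4 start --insecure \
--advertise-addr=10.67.42.158 --join=10.67.42.157,10.67.42.158,10.67.42.159 \
--max-sql-memory=1024MiB"

# Print rather than execute, so the command can be reviewed first.
echo "$CMD"
```

Each additional variable would need its own `-e NAME=value` pair, and an existing container would have to be removed and recreated for the change to take effect.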
Yes, you are right. We set up those variables and now the cluster is back online. We will test with the alpha version as well and report the results.
|
@irfansharif The cluster is working just fine, but since we updated it per this ticket to be able to bring it back online from a cold start, now we are experiencing this issue: |
Closing due to age. |
Describe the problem
After the cluster has been running for a couple of minutes, the status of all three nodes changes to suspect, and then the cluster stops working. Here is the error:
To Reproduce
Running on three different nodes:
docker run -d -p 26257:26257 -p 8085:8080 -v /var/server/cockroach-data:/cockroach/cockroach-data -v /var/server/certs:/etc/ssl/private --name cockroach cockroachdb/cockroach:v20.1.4 start --insecure --advertise-addr=10.67.42.158 --join=10.67.42.157,10.67.42.158,10.67.42.159 --max-sql-memory=1024MiB
Database size is 1.5TB