import: allow worker node failure #25480
Comments
I investigated further, and one reason for this is that the cluster gets too busy while building indexes. This particular schema has 15 fields and 15 indexes. I suppose that is too many indexes. Could you please confirm?
Indexes do contribute to the time, yes. You could try without them and backfill them later.
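For illustration, backfilling indexes after the data is loaded might look roughly like this; the table schema, column names, and file URL are hypothetical, and the IMPORT syntax shown is the 2.0-era form:

```sql
-- Import with only the primary key defined, so no secondary indexes
-- have to be built while the data is streaming in.
IMPORT TABLE events (
    id INT PRIMARY KEY,
    user_name STRING,
    created_at TIMESTAMP
) CSV DATA ('https://example.com/events.csv');

-- Backfill the secondary indexes once the import has finished.
CREATE INDEX events_user_name_idx ON events (user_name);
CREATE INDEX events_created_at_idx ON events (created_at);
```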
This is a great idea. Is it possible to build an index on a field inside a JSONB column? I know about inverted indexes, but my JSONB has many fields and I only want some of them indexed.
Yes, you can create an index over a field in a JSONB value: 1) create a computed column that extracts the value you want from the JSONB field; 2) create an index over that computed column. After that, to use the index you'll need to write the WHERE constraint over the computed column (not the original JSON field), but at least the index will be fully automated this way.
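A minimal sketch of that recipe; the table, column, and JSON field names are made up for illustration:

```sql
-- JSONB column plus a stored computed column extracting one field from it.
CREATE TABLE docs (
    id INT PRIMARY KEY,
    payload JSONB,
    user_name STRING AS (payload->>'user_name') STORED
);

-- Index over the computed column only, not the whole JSONB value.
CREATE INDEX docs_user_name_idx ON docs (user_name);

-- The filter must reference the computed column for the index to be used.
SELECT id FROM docs WHERE user_name = 'alice';
```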
Currently it is not possible to add a computed column after IMPORT.
I regenerated the CSV again. Now I am trying to IMPORT without creating indexes.
You can add computed columns after IMPORT in the 2.1 alpha release. We may add them to IMPORT itself in the future too, but it's complicated and not a priority yet. OK, those error messages are interesting. I'll look into this on Monday.
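If the table was created without the computed column before IMPORT, the post-import backfill described above might look roughly like this, assuming the 2.1-alpha ALTER TABLE support and the same hypothetical docs table and JSONB field as in the earlier sketch:

```sql
-- Add the stored computed column to the already-imported table, then index it.
ALTER TABLE docs
    ADD COLUMN user_name STRING AS (payload->>'user_name') STORED;

CREATE INDEX docs_user_name_idx ON docs (user_name);
```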
After importing for 7 hours I got the following message in the command line. On the other side, in the web UI, I still see this job running, and after a while it now shows Failed. The CSV file is 150 million records / 200 GB and CRDB cannot handle it. n1 logs: W180520 21:19:25.282063 213 storage/node_liveness.go:501 [n1,hb] slow heartbeat took 1.1s. .sst files are added periodically in the CRDB folder.
Reason for the stop: so after a day of IMPORT it stops because of just one line. I opened the file to see where this character is and I cannot find it. I suppose it is some Unicode character. Is there a way to filter it out during import? Do you have code for a Golang web server that can do what Caddy is doing, so I can add some regex filtering too? This is important because I have one database with 400 million records, and this can happen at any time when I import. If I could filter out special characters during import I could prevent that, or have a separate step that first generates a safe CSV file by filtering them out, but that is very time consuming. I made a comparison IMPORT test with 100 million records:
This is a 24x difference and something is very wrong with CRDB. With such a long run time, the chance of being interrupted by a communication issue is big. I think the way IMPORT should work is to get all records onto the node doing the importing first and only then begin replication. There is also one more consideration: imagine I have a 500 million record / 500 GB initial master table which I need to update daily once it is created. There are several issues:
I could live with the fact that it takes so long if it didn't stop. Daily updates with UPSERT would be fine. Now with CRDB I need at least 3 servers, and the way I see it I will probably need 5 to be sure the cluster does not go down for some reason. I need to watch and restart CRDB servers manually because the dashboard does not always show that some server needs a restart. So even with 3 servers there is no guarantee the cluster will not go down. If I UPSERT 10 million records per day it will probably take several hours, and I am not sure how this will affect the users using it live, or how slow it will be. 3 servers is not cheap to invest in, and still I do not get a stable cluster. With RocksDB and 2 servers the only inconvenience is switching live servers, which I managed to automate. Yes, there is no SQL, but at least it works. Is having SQL too big a price to pay for that? Now there is more. Currently I have servers in a datacenter on the West Coast; if I want to add servers on the East Coast, here is what will happen:
So I don't know what is better: a system that works, albeit a little primitive at first look, or to fight with CRDB? I can live with the fact that it takes 2 weeks to IMPORT the master DB and then several hours daily, if it works. But assuming it works for our West Coast datacenter, how do we expand to the East Coast? First, it will take forever just to create the master DB, not to mention the bandwidth cost. There should be a way to attach a ready-to-use master DB to CRDB from a USB drive and then use UPSERT for daily updates from West Coast to East Coast. The time spent and the bandwidth paid would be tolerable if our users got daily updates and there were no switching between servers. If not, CRDB would not be practical for real production use with hundreds of GBs of data.
Regarding orchestration: if CRDB were stable there would not be much need for orchestration. But it is not stable. I see that the dashboard does not always represent what the logs show. To be sure, before an IMPORT I now check every node's logs, and usually I need to restart at least one node first.

With RocksDB we are currently using 2 separate k8s installations as single master/node on 2 physical servers. k8s + ElasticSearch + Kibana + Sentinel for alerts is what is needed to guarantee uptime, monitoring, and alerts. On each server, 2 pods with a web service pointing to a folder handled as hostPath guarantee that if one pod goes down, k8s brings it up within minutes. hostPath, with the unclear statements about how it behaves on a k8s cluster, made us use 1 physical server per k8s master/node. It is fine to have two k8s installations on two separate physical servers; the benefits of k8s are still there.

Now with CRDB: it needs at least 3 physical servers to work at all. This is fine if justified, but using k8s with 3 servers and persistent storage is a problem. hostPath prevents using k8s effectively (the explanation is very vague). DaemonSet prevents k8s from recovering from failures. The other solution, StatefulSet, is still in beta and not production ready. So we cannot use k8s as separate master/node installations on each physical server as before, and the storage story of k8s with multiple servers is not yet ready for production (we waited a year for that and it is still not ready). DC/OS needs at least 5 servers, they must be configured as planned with no possibility of changing it later, and masters/nodes on one server are not possible, so you cannot begin small and test in a real situation before investing too much. The CRDB app in the DC/OS catalog is not recommended for production yet either. So orchestration for CRDB is not ready for production.
Hi @thstart, thank you for the quality feedback. I do not know much about orchestration myself, but I can provide a general response to the other points. There are many specific questions in your comments which we can answer one by one. Before I go there, I would like to highlight that there are other (production) CockroachDB users who are able to sustain similar read/write workloads as yours against multiple datacenters in different regions (including west + east US) without much concern, so I am confident that the hurdles you see can be overcome.

First, as to the number of CockroachDB nodes vs. performance. As you know, 3 is the minimum number of nodes needed to create a consensus group for any range. Consider that any write transaction needs a response from a majority (2 in this case) before it is acknowledged. If you use 3 nodes with 2 on one side and 1 on the other side of the country, any transaction issued from the 1-node side will span the entire country, which is not a happy case for performance. Of course this is true of any strongly consistent system, not just CockroachDB. Also, if you only use 3 nodes in that way and there is a whole-datacenter outage in the 2-node region, the entire cluster becomes unavailable, which is not ideal either. This is why we recommend using groups of 3 nodes in each datacenter and then configuring replication zones to ensure that the data specific to each region is co-located in that region (a sketch of such a zone configuration follows below). This increases the performance within each region and ensures that the data of that region is still available when another DC goes down. (Then perhaps you'll want to use 3 datacenters, not 2, so that there is still a majority of nodes for the meta-information common to the whole cluster in case 1 DC goes down.)

Then, regarding the performance of IMPORT and subsequent data updates. As you have found out, CockroachDB's IMPORT is slower than PostgreSQL's. Part of this is because the data is replicated across multiple nodes upfront. Another part is that we have not yet fully optimized IMPORT, so there is still room for improvement (after all, CockroachDB is still younger than PostgreSQL). But beyond that, as you pointed out, the initial import of your master database should be a one-time operation, a one-time cost which you can amortize. Once this import is done, updating new records each day should not impact performance on the existing data (if it does, we'll certainly want to look into it). You should probably spread out the updates over time, though, instead of issuing them all in one go, because you want to avoid transaction contention. Again, these are general SQL best practices and not specific to CockroachDB.

Regarding the error during IMPORT: it is indeed very inconvenient that the entire IMPORT job was aborted because of a single data error. There are multiple things we could do here. One could be to park the CSV lines containing errors in a separate file for manual inspection after IMPORT completes. Another could be to suspend the job and give you the opportunity to fix the file. Then, as you propose, perhaps you could also decide to ignore the errors entirely with an IMPORT option. I agree that the current situation is not appropriate, so I filed the following issue for you: #25751

Regarding the overall performance of CockroachDB: I recall that on Gitter you referred to a setup where you were using Apple Mac Minis to run CockroachDB. Although this hardware enables CockroachDB to run, it is certainly not appropriate for making CockroachDB run fast. If this is the hardware you are using to run your master IMPORT job, then I am not surprised that you observe relatively poor performance. There are many ways to select hardware for server software, and we can certainly make recommendations as to which server type to select if you use one of the public clouds. In general we recommend a minimum of 6-8 cores and 4 GB of RAM per CockroachDB node, and depending on your workload perhaps more.

Again, as I wrote earlier, I am not an orchestration expert, so I will refrain from commenting on your k8s/DC/OS issues. However, I'd like to point out that other users have had success there before, so if we identify the issues you're running into more specifically, it is likely we can help alleviate the problems you see.
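As a rough illustration of the per-region replication zones mentioned above, here is a minimal sketch using the 2.1-era CONFIGURE ZONE syntax; the table name and the region/locality values are assumptions, and older releases use the `cockroach zone set` CLI instead:

```sql
-- Assumes each node was started with a locality flag, e.g.:
--   cockroach start --locality=region=us-west ...
-- Keep the replicas of a (hypothetical) west-coast table inside that region.
ALTER TABLE users_west CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '[+region=us-west]';
```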
We're tracking the error message problem in #25532.
I'm adding 3 more Dell PowerEdge servers to address hardware performance. The only thing that matters for these issues is CRDB:
I made another attempt to IMPORT. Again unsuccessful, with the message: "Error : pq: Context deadline exceeded". Now I have 6 servers and I hoped that with more servers CRDB would find a way to finish.
We're tracking that bug in #25866.
All but one node connected; only one node got the following errors when the others tried to connect to it. So I suppose that with 6 CRDB servers it will be able to finish when you fix it?
26811: kv: update the TCS's txn on requests' way out r=andreimatei a=andreimatei

The TxnCoordSender maintains a copy of the transaction record, used for things like heartbeating and creating new transactions after a TransactionAbortedError. This copy is supposed to be kept in sync with the client.Txn's copy. Before this patch, the syncing was done by updating the TCS's txn when a response comes back from a request. This patch moves to updating the TCS's txn on a request's way out, in addition to continuing to update it when a request comes back. Besides being the sane thing to do™, this ensures that, if the heartbeat loop triggers before the response to the BeginTransaction's batch comes back, the transaction already has the key set. Without this patch, if the heartbeat loop triggered before the BeginTxn response, it would heartbeat key /Min, which is nonsensical (and was creating load on range 0 for TPCC load tests).

Release note: None

26881: importccl: restart IMPORT on worker node failure r=mjibson a=mjibson

Attempt to detect a context canceled error in IMPORT which is caused by a node going away in the DistSQL run. Send a special error back to the job registry indicating a restart should happen instead of a failure. We are shipping this with a skipped test because it is flaky. We are OK doing that because it is still better than what we had before in many cases, just not all. We will work to improve the other things so that we can correctly detect when IMPORT can be restarted due to a node outage, which will allow us to unskip this test.

Fixes #25866
Fixes #25480

Release note (bug fix): IMPORT now detects node failure and will restart instead of fail.

26968: settings: bump minimum supported version to v2.0 r=nvanbenschoten a=nvanbenschoten

We're currently shipping v2.1 alphas, so enforce a minimum binary version of v2.0. This ensures that no one can upgrade directly from v1.1 to v2.1. Instead, they need to make a pit stop in v2.0.

Release note: None

26984: storageccl: retry SST chunks with new splits on err r=dt a=dt

Simpler alternative to #26930. Closes #26930. Previously an ImportRequest would fail to add SSTables that spanned the boundaries of the target range(s). This reattempts the AddSSTable call with re-chunked SSTables that avoid spanning the bounds returned in the range mismatch error. It does this by iterating the SSTable to build and add smaller SSTables for either side of the split. This error currently happens rarely in practice -- we usually explicitly split ranges immediately before sending an Import with matching bounds to them. Usually the empty, just-split range has no reason to split again, so the Import usually succeeds. However in some cases, like resuming a prior RESTORE, we may be re-Importing into ranges that are *not* empty and could have split at points other than those picked by the RESTORE statement.

Fixes #17819. Subsumes #24299. Closes #24299.

Release note: none.

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Was this intentionally closed?
On Tue, Jun 26, 2018, 7:46 PM craig[bot] wrote: Closed #25480 via #26881.
-- Tobias
Yes, this is correct.
During IMPORT the job coordinator can go down and the job will be restarted. However, if another node goes down, I think the job will be marked as failed. In this case we should recognize that this is a transient error and restart the job so that it makes a new plan with the healthy nodes.
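For reference, the job state can be observed and manipulated from SQL while this is being worked on. A minimal sketch follows; the job ID is hypothetical, and whether a given job type supports pausing depends on the release:

```sql
-- List all jobs with their status (running, paused, failed, succeeded, ...).
SHOW JOBS;

-- Paused jobs can be resumed; a job that was marked failed because a worker
-- node went away currently has to be re-run, which is the gap this issue
-- is about.
PAUSE JOB 123456789012345;
RESUME JOB 123456789012345;
```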