
Batch inserts don't reliably work in 19.2 #43017

Closed
ricardocrdb opened this issue Dec 6, 2019 · 10 comments
Labels
A-sql-execution Relating to SQL execution. C-investigation Further steps needed to qualify. C-label will change. O-community Originated from the community

Comments

@ricardocrdb

A user is running into a possible regression in 19.2 where a transaction involving batch UPSERTs of 256 rows commits, but the results do not appear when performing a count of the added rows. The UPSERT the user is performing can be found at the location here, which also contains the DDL for the relevant tables for the query.

The customer has reported that the same transactions and batch UPSERT behave as expected in 19.1.3, as a COUNT shows the newly added rows.
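A minimal sketch of the reported pattern (table and column names are hypothetical; the user's actual statements and DDL are in the linked file):

BEGIN;
UPSERT INTO refr_data_apr_drg (coding_sys_id, code)
VALUES (1, 'A01'), (1, 'A02')  -- ... 256 rows per batch
RETURNING NOTHING;
COMMIT;

-- The check that comes up short even though the transactions report success:
SELECT count(*) FROM refr_data_apr_drg;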

@ricardocrdb ricardocrdb added C-investigation Further steps needed to qualify. C-label will change. O-community Originated from the community A-sql-execution Relating to SQL execution. labels Dec 6, 2019
@RoachietheSupportRoach
Collaborator

Zendesk ticket #4218 has been linked to this issue.

@knz
Contributor

knz commented Dec 6, 2019

cc @RaduBerinde @andreimatei for triage

@RaduBerinde
Member

I don't think the planning or SQL execution of UPSERT has changed materially between 19.1 and 19.2. I would start with KV. We will likely need to reproduce this.

@bdarnell
Contributor

bdarnell commented Dec 6, 2019

Summarizing the private file in gdrive:

The table has a composite PK with types (INT8, STRING). The INT8 portion of the PK is also a foreign key referencing another table, and has a secondary index (as required by the FK, even though it's redundant).

The statements are UPSERT RETURNING NOTHING with a 256-element VALUES clause. If I'm reading correctly, these are being run as two-phase transactions (with an explicit COMMIT issued after the UPSERT statement returns). They are running seven such transactions at once, and report that all transactions succeed but a subsequent count query only includes the results of one or two batches (so a value of 256 or 512 instead of the expected 1792?)

Tagging @nvanbenschoten since parallel commits is the most likely suspect for a change between 19.1 and 19.2 (also RETURNING NOTHING was made a no-op in #35959, also for 19.2).
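A rough DDL sketch matching this description (names are hypothetical, loosely inferred from the index name mentioned below):

CREATE TABLE coding_sys (
    id INT8 PRIMARY KEY
);

CREATE TABLE refr_data_apr_drg (
    coding_sys_id INT8 NOT NULL REFERENCES coding_sys (id),
    code STRING NOT NULL,
    PRIMARY KEY (coding_sys_id, code),
    -- Redundant with the PK prefix, but present for the FK as described above.
    INDEX refr_data_apr_drg_coding_sys_id_idx (coding_sys_id)
);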

@nvanbenschoten
Member

The 7 concurrent transactions are each writing 256-row batches. Are we sure that these batches are disjoint from one another, or could they be overlapping on either their primary key or on refr_data_apr_drg_coding_sys_id_idx? If they were overlapping, then the UPSERT statements would all succeed but we'd be left with fewer than 1792 rows. If they weren't overlapping, then the fact that there were multiple transactions running concurrently shouldn't matter.

Could we get the query plan for the COUNT statement?

I tried reproducing but didn't have any luck. @ricardocrdb could you try to get more information about what exactly was generating the batch inserts? A script that reproduces the problem would get us a very long way.
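For reference, the requested plan could be captured along these lines (table name hypothetical):

EXPLAIN SELECT count(*) FROM refr_data_apr_drg;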

@nvanbenschoten
Member

If we suspect that parallel commits is responsible, we can have them try disabling it and seeing if they can still reproduce. This would help narrow down what's going wrong.

SET CLUSTER SETTING kv.transaction.parallel_commits_enabled = false;

@nvanbenschoten
Member

FWIW, I'd be pretty surprised if that changes anything because these transactions shouldn't even be using parallel commits. If they're only running a single KV batch which is touching more than 128 keys (the default for kv.transaction.write_pipelining_max_batch_size) then no writes will be pipelined. That means that we'll never even enter the STAGING transaction state when we go to commit. Instead, we'll go straight to COMMITTED.
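The setting in question can be inspected, and adjusted for experimentation, like so (128 is the default mentioned above):

SHOW CLUSTER SETTING kv.transaction.write_pipelining_max_batch_size;
SET CLUSTER SETTING kv.transaction.write_pipelining_max_batch_size = 128;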

@nvanbenschoten
Member

If they were overlapping, then the UPSERT statements would all succeed but we'd be left with fewer than 1792 rows.

@andy-kimball made a good point that we could easily test this by changing the UPSERT statements into INSERT statements. If any INSERTs collide, a duplicate key error will be thrown.
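A sketch of that test, using the hypothetical names from above; running the same batches with INSERT makes any overlap surface as an error instead of silently coalescing rows:

INSERT INTO refr_data_apr_drg (coding_sys_id, code)
VALUES (1, 'A01'), (1, 'A02')  -- same 256-row batches as the UPSERTs
RETURNING NOTHING;
-- Overlapping keys now fail with a duplicate key constraint violation
-- rather than quietly rewriting existing rows.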

@nvanbenschoten
Member

@ricardocrdb and I met with the original reporter of this issue today. After some back and forth, we were able to narrow the issue down to a bug in the vectorized execution engine's handling of ordered, limited scans. The customer's client application maintained a partial copy of a table's state, and a buggy SELECT was confusing it into DELETEing one set of rows while UPSERTing a different set of rows (instead of the same set). So when they checked the table, they were losing rows.

With the vectorized execution engine disabled (SET CLUSTER SETTING sql.defaults.vectorize = 0;), the issue went away. This explains why the issue only began happening after an upgrade to v19.2.

The customer seems to be ok with leaving the vectorized execution engine disabled until the issue is resolved.

@ricardocrdb you mentioned that you were going to open a new issue to track the scan bug and attach the data set we were able to gather. Have you gotten a chance to do so? I'm going to go ahead and close this issue once we have the new one.
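As an aside for anyone hitting something similar: in 19.2 one can check whether a particular statement would be handled by the vectorized engine with EXPLAIN (VEC), e.g. (table and column names hypothetical):

EXPLAIN (VEC) SELECT * FROM refr_data_apr_drg ORDER BY coding_sys_id LIMIT 10;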

@jordanlewis
Member

Thanks for your investigation. Looks like this boils down to #42816, which was fixed in #42843 and will be present in 19.2.2.
