[YSQL] Insert throughput drops by ~3x with a "GENERATED ALWAYS AS IDENTITY" column in the table schema. #22837

sonalsagarwal · 2024-06-12T16:39:44Z

Description

Steps to reproduce:

CREATE TABLE test_table (id int8 GENERATED ALWAYS AS IDENTITY NOT NULL, k text primary key, v TEXT);
CREATE TABLE test_table (id int8 NOT NULL, k text primary key, v TEXT);

Inserted rows in the tables with 100 threads for 5 mins.

Results :

1 column with "GENERATED ALWAYS AS IDENTITY"  :  throughput => 20k and CPU Util => 80%
without "GENERATED ALWAYS AS IDENTITY"  : throughput => 57k and CPU Util => 70%

Cluster details : 3-node RF3 1AZ m6i.4xlarge
Please refer attached metrics screenshot for the 2 runs.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.

The text was updated successfully, but these errors were encountered:

FranckPachot · 2024-06-12T21:24:55Z

The default cache is 100 (set by --ysql_sequence_cache_minval) so with this throughput it's still hitting the sequence 200 times per second. You can increase the cache:

CREATE TABLE test_table (id int8 GENERATED ALWAYS AS IDENTITY (cache 1000)  NOT NULL, k text primary key, v TEXT);

sonalsagarwal · 2024-06-18T16:22:28Z

We tried increasing cache (1000 / 10000 / 1000000), but didn't see any difference in the latency or throughput.

FranckPachot · 2024-06-22T12:49:58Z

@sonalsagarwal do you use prepared statement or is it planned each time? I've seen recently that during the planning phase it reads dependencies from the catalog (which is in yb-master so network call)

FranckPachot · 2024-06-22T21:27:06Z

I confirm that there generated always as identity has bad performance compared to bigserial or sequence.
I've run 6 concurrent insert workloads on 2024.1:

bigint generated always as identity, simple and prepared
bigserial, simple and prepared
bigint and inserting with nextval from a sequence, simple and prepared
And gathered wait event sampling with the new ASH
YSQL:

TServer:

With bigint generated always as identity a lot of time is spent on TServer "Raft_WaitingForReplication" and YSQL waits on it ("StorageFlush"). Non-prepared has some waits on "CatalogRead"

The sequence spends most of its time in YSQL "QueryProcessing" and very little in TServer.

This is what I've run for the workload above:

drop sequence s;
drop table if exists demo;
create table demo ( id bigint generated always as identity primary key, value int) ;
\! echo "insert into demo(value)     values(0);              " | pgbench -M simple   -c 30 -nf /dev/stdin -t 10000
\! sleep 30
\! echo "insert into demo(value)     values(0);              " | pgbench -M prepared -c 30 -nf /dev/stdin -t 10000
\! sleep 120
drop table if exists demo;
create table demo ( id bigserial                           primary key, value int) ;
\! echo "insert into demo(value)     values(0);              " | pgbench -M simple   -c 30 -nf /dev/stdin -t 10000
\! sleep 30
\! echo "insert into demo(value)     values(0);              " | pgbench -M prepared -c 30 -nf /dev/stdin -t 10000
\! sleep 120
drop table if exists demo;
create sequence s;
create table demo ( id bigint                              primary key, value int) ;
\! echo "insert into demo(id, value) values (0,nextval('s'));" | pgbench -M simple   -c 30 -nf /dev/stdin -t 10000
\! sleep 30
\! echo "insert into demo(id, value) values (0,nextval('s'));" | pgbench -M prepared -c 30 -nf /dev/stdin -t 10000
\! sleep 120

FranckPachot · 2024-06-22T22:10:16Z

Apparently, the difference is that, without secondary indexes, bigserial uses fast-path but generated is transactional:

Write operation with generated:

yugabyte=# 2024-06-22 21:50:28.337 UTC [386] LOG:  statement: execute ins;
2024-06-22 21:50:28.337 UTC [386] DETAIL:  prepare: prepare ins as insert into demo(value) values(0);
I0622 21:50:28.341383   386 pg_session.cc:281] Session id 2: Buffering operation: ...
I0622 21:50:28.342567   386 pg_session.cc:653] Session id 2: Flushing buffered operations, using transactional session (num ops: 1)
I0622 21:50:28.345868   386 pg_client.cc:487] Session id 2: Committing transaction

Write operation with bigserial:

yugabyte=# 2024-06-22 21:51:11.206 UTC [386] LOG:  statement: execute ins;
2024-06-22 21:51:11.206 UTC [386] DETAIL:  prepare: prepare ins as insert into demo(value) values(0);
I0622 21:51:11.206781   386 pg_session.cc:318] Session id 2: Applying operation: ...
I0622 21:51:11.206837   386 pg_session.cc:345] Session id 2: Flushing collected operations, using session type: kRegular num ops: 1

Summary: There are differences in implementation of autoincrement columns defined as "serial" and "generated as identity". Serial columns have defaults, defined as nextval('<sequence name'). As a builtin function nextval is considered safe to run in a single shard transactions. Identity columns are implemented by special expression node NextValueExpr, which was on the list of nodes preventing single shard expressions. In fact, NextValueExpr is safe, so removing it from the list. Jira: DB-11735 Test Plan: ./yb-build.sh --java-test 'org.yb.pgsql.TestPgTransactions#testIdentityNoTransaction' Reviewers: tnayak Reviewed By: tnayak Subscribers: tnayak, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36248

Summary: 25d427a [PLAT-13975] Support for non-restart gflags upgrade in Kubernetes universes 26cd461 [PLAT-14165] Health check to report on time drift. 9b83b80 [#23036] docdb: Switch CloneStateInfo object from scoped_refptr to std::shared_ptr e5399d9 [PLAT-14511] Improved yba installer migrations to allow some versions to not exist 0db5d16 [14279] YSQL: Add basic flag validation for ysql_pg_conf_csv, ysql_hba_conf_csv and ysql_ident_conf_csv 841fbdb [#21829] docdb: Fix scan that does not honor timeouts in certain scenarios, to not run into runaway scan scenarios. f20048b [#22840] DocDB: Fix DFATAL in WaitQueue::Impl::HandleWaiterStatusRpcResponse da6cbd5 [DOC-396] Set up SSO using Jumpcloud (#22861) d0e93c9 [PLAT-14104] Network configuration module for YNP 63cceb3 [#22837] YSQL: Fix fast path insert into tables with identity columns 5c7715f [#23057] YSQL: Fix missing newlines and pfree b3bc338 [doc][yba] DR clarifications (#23050) f897398 [#21500] YSQL: Skip schema version mismatch tests with Connection Manager enabled e77b95b [PLAT-14504] - feat : db audit log ui improvements 7e289b5 [#21859] ASH: Make ASH uuid data human readable when viewed through /rpcz 23acae3 [#22957] YSQL: Make connection sticky for GUC variables that cannot be SET in an explicit transaction with YSQL Connection Manager cfa6300 [#22556] docdb: Speed up backward scans for flat doc reader for packed row V2 a446f27 [PLAT-14530] - fix : Edit Azure Provider is failing "I worked around it by removing the two PNG files from the revision." - Steve Varnau Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36285

…dentity columns Summary: Original commit: 63cceb3 / D36248 There are differences in implementation of autoincrement columns defined as "serial" and "generated as identity". Serial columns have defaults, defined as nextval('<sequence name'). As a builtin function nextval is considered safe to run in a single shard transactions. Identity columns are implemented by special expression node NextValueExpr, which was on the list of nodes preventing single shard expressions. In fact, NextValueExpr is safe, so removing it from the list. Jira: DB-11735 Test Plan: ./yb-build.sh --java-test 'org.yb.pgsql.TestPgTransactions#testIdentityNoTransaction' Reviewers: tnayak Reviewed By: tnayak Subscribers: yql, tnayak, smishra Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36302

… identity columns Summary: Original commit: 63cceb3 / D36248 There are differences in implementation of autoincrement columns defined as "serial" and "generated as identity". Serial columns have defaults, defined as nextval('<sequence name'). As a builtin function nextval is considered safe to run in a single shard transactions. Identity columns are implemented by special expression node NextValueExpr, which was on the list of nodes preventing single shard expressions. In fact, NextValueExpr is safe, so removing it from the list. Jira: DB-11735 Test Plan: ./yb-build.sh --java-test 'org.yb.pgsql.TestPgTransactions#testIdentityNoTransaction' Reviewers: tnayak Reviewed By: tnayak Subscribers: yql, tnayak, smishra Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36299

sonalsagarwal added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jun 12, 2024

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 12, 2024

m-iancu assigned andrei-mart Jun 21, 2024

andrei-mart closed this as completed Jun 28, 2024

sushantrmishra added 2.20 Backport Required 2024.1 Backport Required and removed status/awaiting-triage Issue awaiting triage labels Jun 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] Insert throughput drops by ~3x with a "GENERATED ALWAYS AS IDENTITY" column in the table schema. #22837

[YSQL] Insert throughput drops by ~3x with a "GENERATED ALWAYS AS IDENTITY" column in the table schema. #22837

sonalsagarwal commented Jun 12, 2024 •

edited by jira bot

Loading

FranckPachot commented Jun 12, 2024

sonalsagarwal commented Jun 18, 2024

FranckPachot commented Jun 22, 2024

FranckPachot commented Jun 22, 2024 •

edited

Loading

FranckPachot commented Jun 22, 2024

[YSQL] Insert throughput drops by ~3x with a "GENERATED ALWAYS AS IDENTITY" column in the table schema. #22837

[YSQL] Insert throughput drops by ~3x with a "GENERATED ALWAYS AS IDENTITY" column in the table schema. #22837

Comments

sonalsagarwal commented Jun 12, 2024 • edited by jira bot Loading

Description

Issue Type

Warning: Please confirm that this issue does not contain any sensitive information

FranckPachot commented Jun 12, 2024

sonalsagarwal commented Jun 18, 2024

FranckPachot commented Jun 22, 2024

FranckPachot commented Jun 22, 2024 • edited Loading

FranckPachot commented Jun 22, 2024

sonalsagarwal commented Jun 12, 2024 •

edited by jira bot

Loading

FranckPachot commented Jun 22, 2024 •

edited

Loading