Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Insert throughput drops by ~3x with a "GENERATED ALWAYS AS IDENTITY" column in the table schema. #22837

Closed
1 task done
sonalsagarwal opened this issue Jun 12, 2024 · 5 comments
Assignees
Labels

Comments

@sonalsagarwal
Copy link

sonalsagarwal commented Jun 12, 2024

Jira Link: DB-11735

Description

Steps to reproduce:

CREATE TABLE test_table (id int8 GENERATED ALWAYS AS IDENTITY NOT NULL, k text primary key, v TEXT);
CREATE TABLE test_table (id int8 NOT NULL, k text primary key, v TEXT);

Inserted rows in the tables with 100 threads for 5 mins.

Results :

1 column with "GENERATED ALWAYS AS IDENTITY"  :  throughput => 20k and CPU Util => 80%
without "GENERATED ALWAYS AS IDENTITY"  : throughput => 57k and CPU Util => 70%

Cluster details : 3-node RF3 1AZ m6i.4xlarge
Please refer attached metrics screenshot for the 2 runs.
CPU
throughput

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@sonalsagarwal sonalsagarwal added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jun 12, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 12, 2024
@FranckPachot
Copy link
Contributor

The default cache is 100 (set by --ysql_sequence_cache_minval) so with this throughput it's still hitting the sequence 200 times per second. You can increase the cache:

CREATE TABLE test_table (id int8 GENERATED ALWAYS AS IDENTITY (cache 1000)  NOT NULL, k text primary key, v TEXT);

@sonalsagarwal
Copy link
Author

We tried increasing cache (1000 / 10000 / 1000000), but didn't see any difference in the latency or throughput.

@FranckPachot
Copy link
Contributor

@sonalsagarwal do you use prepared statement or is it planned each time? I've seen recently that during the planning phase it reads dependencies from the catalog (which is in yb-master so network call)

@FranckPachot
Copy link
Contributor

FranckPachot commented Jun 22, 2024

I confirm that there generated always as identity has bad performance compared to bigserial or sequence.
I've run 6 concurrent insert workloads on 2024.1:

  • bigint generated always as identity, simple and prepared
  • bigserial, simple and prepared
  • bigint and inserting with nextval from a sequence, simple and prepared
    And gathered wait event sampling with the new ASH
    YSQL:
    image
    TServer:
    image

With bigint generated always as identity a lot of time is spent on TServer "Raft_WaitingForReplication" and YSQL waits on it ("StorageFlush"). Non-prepared has some waits on "CatalogRead"

The sequence spends most of its time in YSQL "QueryProcessing" and very little in TServer.

This is what I've run for the workload above:

drop sequence s;
drop table if exists demo;
create table demo ( id bigint generated always as identity primary key, value int) ;
\! echo "insert into demo(value)     values(0);              " | pgbench -M simple   -c 30 -nf /dev/stdin -t 10000
\! sleep 30
\! echo "insert into demo(value)     values(0);              " | pgbench -M prepared -c 30 -nf /dev/stdin -t 10000
\! sleep 120
drop table if exists demo;
create table demo ( id bigserial                           primary key, value int) ;
\! echo "insert into demo(value)     values(0);              " | pgbench -M simple   -c 30 -nf /dev/stdin -t 10000
\! sleep 30
\! echo "insert into demo(value)     values(0);              " | pgbench -M prepared -c 30 -nf /dev/stdin -t 10000
\! sleep 120
drop table if exists demo;
create sequence s;
create table demo ( id bigint                              primary key, value int) ;
\! echo "insert into demo(id, value) values (0,nextval('s'));" | pgbench -M simple   -c 30 -nf /dev/stdin -t 10000
\! sleep 30
\! echo "insert into demo(id, value) values (0,nextval('s'));" | pgbench -M prepared -c 30 -nf /dev/stdin -t 10000
\! sleep 120

@FranckPachot
Copy link
Contributor

Apparently, the difference is that, without secondary indexes, bigserial uses fast-path but generated is transactional:

Write operation with generated:

yugabyte=# 2024-06-22 21:50:28.337 UTC [386] LOG:  statement: execute ins;
2024-06-22 21:50:28.337 UTC [386] DETAIL:  prepare: prepare ins as insert into demo(value) values(0);
I0622 21:50:28.341383   386 pg_session.cc:281] Session id 2: Buffering operation: ...
I0622 21:50:28.342567   386 pg_session.cc:653] Session id 2: Flushing buffered operations, using transactional session (num ops: 1)
I0622 21:50:28.345868   386 pg_client.cc:487] Session id 2: Committing transaction

Write operation with bigserial:

yugabyte=# 2024-06-22 21:51:11.206 UTC [386] LOG:  statement: execute ins;
2024-06-22 21:51:11.206 UTC [386] DETAIL:  prepare: prepare ins as insert into demo(value) values(0);
I0622 21:51:11.206781   386 pg_session.cc:318] Session id 2: Applying operation: ...
I0622 21:51:11.206837   386 pg_session.cc:345] Session id 2: Flushing collected operations, using session type: kRegular num ops: 1

andrei-mart added a commit that referenced this issue Jun 28, 2024
Summary:
There are differences in implementation of autoincrement columns defined
as "serial" and "generated as identity". Serial columns have defaults,
defined as nextval('<sequence name'). As a builtin function nextval is
considered safe to run in a single shard transactions.

Identity columns are implemented by special expression node
NextValueExpr, which was on the list of nodes preventing single shard
expressions. In fact, NextValueExpr is safe, so removing it from the
list.
Jira: DB-11735

Test Plan: ./yb-build.sh --java-test 'org.yb.pgsql.TestPgTransactions#testIdentityNoTransaction'

Reviewers: tnayak

Reviewed By: tnayak

Subscribers: tnayak, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36248
jasonyb pushed a commit that referenced this issue Jul 2, 2024
Summary:
 25d427a [PLAT-13975] Support for non-restart gflags upgrade in Kubernetes universes
 26cd461 [PLAT-14165] Health check to report on time drift.
 9b83b80 [#23036] docdb: Switch CloneStateInfo object from scoped_refptr to std::shared_ptr
 e5399d9 [PLAT-14511] Improved yba installer migrations to allow some versions to not exist
 0db5d16 [14279] YSQL: Add basic flag validation for ysql_pg_conf_csv, ysql_hba_conf_csv and ysql_ident_conf_csv
 841fbdb [#21829] docdb: Fix scan that does not honor timeouts in certain scenarios, to not run into runaway scan scenarios.
 f20048b [#22840] DocDB: Fix DFATAL in WaitQueue::Impl::HandleWaiterStatusRpcResponse
 da6cbd5 [DOC-396] Set up SSO using Jumpcloud (#22861)
 d0e93c9 [PLAT-14104] Network configuration module for YNP
 63cceb3 [#22837] YSQL: Fix fast path insert into tables with identity columns
 5c7715f [#23057] YSQL: Fix missing newlines and pfree
 b3bc338 [doc][yba] DR clarifications (#23050)
 f897398 [#21500] YSQL: Skip schema version mismatch tests with Connection Manager enabled
 e77b95b [PLAT-14504] - feat : db audit log ui improvements
 7e289b5 [#21859] ASH: Make ASH uuid data human readable when viewed through /rpcz
 23acae3 [#22957] YSQL: Make connection sticky for GUC variables that cannot be SET in an explicit transaction with YSQL Connection Manager
 cfa6300 [#22556] docdb: Speed up backward scans for flat doc reader for packed row V2
 a446f27 [PLAT-14530] - fix : Edit Azure Provider is failing

"I worked around it by removing the two PNG files from the revision." - Steve Varnau

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36285
andrei-mart added a commit that referenced this issue Jul 2, 2024
…dentity columns

Summary:
Original commit: 63cceb3 / D36248
There are differences in implementation of autoincrement columns defined
as "serial" and "generated as identity". Serial columns have defaults,
defined as nextval('<sequence name'). As a builtin function nextval is
considered safe to run in a single shard transactions.

Identity columns are implemented by special expression node
NextValueExpr, which was on the list of nodes preventing single shard
expressions. In fact, NextValueExpr is safe, so removing it from the
list.
Jira: DB-11735

Test Plan: ./yb-build.sh --java-test 'org.yb.pgsql.TestPgTransactions#testIdentityNoTransaction'

Reviewers: tnayak

Reviewed By: tnayak

Subscribers: yql, tnayak, smishra

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36302
andrei-mart added a commit that referenced this issue Jul 2, 2024
… identity columns

Summary:
Original commit: 63cceb3 / D36248
There are differences in implementation of autoincrement columns defined
as "serial" and "generated as identity". Serial columns have defaults,
defined as nextval('<sequence name'). As a builtin function nextval is
considered safe to run in a single shard transactions.

Identity columns are implemented by special expression node
NextValueExpr, which was on the list of nodes preventing single shard
expressions. In fact, NextValueExpr is safe, so removing it from the
list.
Jira: DB-11735

Test Plan: ./yb-build.sh --java-test 'org.yb.pgsql.TestPgTransactions#testIdentityNoTransaction'

Reviewers: tnayak

Reviewed By: tnayak

Subscribers: yql, tnayak, smishra

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36299
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants