Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High memory consumption INSERTing Decimal type #1293

Open
7 tasks done
flarco opened this issue May 8, 2024 · 8 comments
Open
7 tasks done

High memory consumption INSERTing Decimal type #1293

flarco opened this issue May 8, 2024 · 8 comments

Comments

@flarco
Copy link

flarco commented May 8, 2024

Observed

Hi, I am seeing a big memory usage from the resultFromStatement after making a INSERT query.
I believe the leak is happening here:

case *proto.ColDecimal128:

See graph below.
This happens when I insert millions of rows with Decimal & Int64 values.
The decimal values are made using github.com/shopspring/decimal, with decimal.NewFromString.
I am actually not using any of the result since I am making an INSERT. Not sure why it's appending from result and taking so much memory.

Here is the pprof output:

$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/fritz/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
Type: inuse_space
Time: May 8, 2024 at 9:45am (-03)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 472.25MB, 97.42% of 484.75MB total
Dropped 52 nodes (cum <= 2.42MB)
Showing top 20 nodes out of 28
      flat  flat%   sum%        cum   cum%
  296.42MB 61.15% 61.15%   296.42MB 61.15%  github.com/ClickHouse/ch-go/proto.(*ColDecimal128).Append (inline)
  138.53MB 28.58% 89.73%   138.53MB 28.58%  github.com/ClickHouse/ch-go/proto.(*ColInt64).Append (inline)
   33.01MB  6.81% 96.54%    33.01MB  6.81%  github.com/ClickHouse/ch-go/proto.(*ColUInt8).Append (inline)

image

Expected behaviour

Should not take up much memory after an INSERT call.

Code example

See https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/database/database_clickhouse.go#L157

			insertStatement := conn.GenerateInsertStatement(
				table.FullName(),
				insFields,
				1,
			)

			stmt, err := conn.Prepare(insertStatement)
			if err != nil {
				g.Trace("%s: %#v", table, columns.Names())
				return g.Error(err, "could not prepare statement")
			}

			decimalCols := []int{}
			intCols := []int{}
			int64Cols := []int{}
			floatCols := []int{}
			for row := range batch.Rows {
				var eG g.ErrorGroup

				// set decimals correctly
				for _, colI := range decimalCols {
					if row[colI] != nil {
						val, err := decimal.NewFromString(cast.ToString(row[colI]))
						if err == nil {
							row[colI] = val
						}
						eG.Capture(err)
					}
				}

				// set Int32 correctly
				for _, colI := range intCols {
					if row[colI] != nil {
						row[colI], err = cast.ToIntE(row[colI])
						eG.Capture(err)
					}
				}

				// set Int64 correctly
				for _, colI := range int64Cols {
					if row[colI] != nil {
						row[colI], err = cast.ToInt64E(row[colI])
						eG.Capture(err)
					}
				}

				// set Float64 correctly
				for _, colI := range floatCols {
					if row[colI] != nil {
						row[colI], err = cast.ToFloat64E(row[colI])
						eG.Capture(err)
					}
				}

				if err = eG.Err(); err != nil {
					err = g.Error(err, "could not convert value for COPY into table %s", tableFName)
					ds.Context.CaptureErr(err)
					return err
				}

				count++
				// Do insert
				ds.Context.Lock()
				_, err := stmt.Exec(row...)
				ds.Context.Unlock()
				if err != nil {
					ds.Context.CaptureErr(g.Error(err, "could not COPY into table %s", tableFName))
					g.Trace("error for row: %#v", row)
					return g.Error(err, "could not execute statement")
				}
			}

Environment

  • clickhouse-go version: v2.24.0
  • Interface: database/sql
  • Go version: 1.21.1
  • Operating system: mac
  • ClickHouse version: yandex/clickhouse-server:21.3
  • Is it a ClickHouse Cloud? No
  • CREATE TABLE statements for tables involved:
create table `default`.`tpcds_store_sales_tmp` (`ss_sold_date_sk` Nullable(Int64),
`ss_sold_time_sk` Nullable(Int64),
`ss_item_sk` Nullable(Int64),
`ss_customer_sk` Nullable(Int64),
`ss_cdemo_sk` Nullable(Int64),
`ss_hdemo_sk` Nullable(Int64),
`ss_addr_sk` Nullable(Int64),
`ss_store_sk` Nullable(Int64),
`ss_promo_sk` Nullable(Int64),
`ss_ticket_number` Nullable(Int64),
`ss_quantity` Nullable(Int64),
`ss_wholesale_cost` Nullable(Decimal(24,6)),
`ss_list_price` Nullable(Decimal(24,6)),
`ss_sales_price` Nullable(Decimal(24,6)),
`ss_ext_discount_amt` Nullable(Decimal(24,6)),
`ss_ext_sales_price` Nullable(Decimal(24,6)),
`ss_ext_wholesale_cost` Nullable(Decimal(24,6)),
`ss_ext_list_price` Nullable(Decimal(24,6)),
`ss_ext_tax` Nullable(Decimal(24,6)),
`ss_coupon_amt` Nullable(Decimal(24,6)),
`ss_net_paid` Nullable(Decimal(24,6)),
`ss_net_paid_inc_tax` Nullable(Decimal(24,6)),
`ss_net_profit` Nullable(Decimal(24,6))) engine=MergeTree  ORDER BY tuple()
@flarco flarco changed the title Decimal Memory leak? Memory leak from INSERT? May 8, 2024
@jkaflik jkaflik self-assigned this May 9, 2024
@alisman
Copy link

alisman commented May 28, 2024

@jkaflik any timeframe on this?

@jkaflik
Copy link
Contributor

jkaflik commented May 31, 2024

@flarco, thanks for reporting this. There is definitely some unexpected behavior here. Could you clarify on:

I am actually not using any of the result since I am making an INSERT.

please?

@alisman initially assigned me to triage. Currently, we don't have the capacity to have a look shortly. Any contributions are welcome.

@flarco
Copy link
Author

flarco commented Jun 1, 2024

Limiting the number of rows inserted per transaction helps a lot with lowering the memory usage. So that works as a work-around.

The issue stands, though: if millions of rows are inserted in 1 transaction, the memory leakage crashes the process.

@gofort
Copy link

gofort commented Jul 22, 2024

Same thing for us. We are inserting around 1 mln records per 1 insert (we can't decrease it unfortunately, too much data) and memory consumption is enourmous.

@jkaflik jkaflik changed the title Memory leak from INSERT? High memory consumption using Decimal type Aug 21, 2024
@jkaflik jkaflik changed the title High memory consumption using Decimal type High memory consumption INSERTing Decimal type Aug 21, 2024
@jkaflik
Copy link
Contributor

jkaflik commented Aug 21, 2024

@gofort is your case also related to Decimal type? Could you provide a bit of details:

  • number of rows insert per batch
  • target data types
  • Go data types

@flarco @alisman could you confirm your request is strictly related to Decimal type? What is number of rows?

@vkazmirchuk
Copy link

@jkaflik I'll answer you instead of @gofort

We insert 1M records, each record has 52 columns.

Here is the structure we pass to method batch.AppendStruct(item)

type Item struct {
	FieldA       time.Time `ch:"field_a"`
	FieldB       time.Time `ch:"field_b"`
	FieldC       net.IP    `ch:"field_c"`
	FieldD       string    `ch:"field_d"`
	FieldE       string    `ch:"field_e"`
	FieldF       string    `ch:"field_f"`
	FieldG       string    `ch:"field_g"`
	FieldH       string    `ch:"field_h"`
	FieldI       uint16    `ch:"field_i"`
	FieldJ       int64     `ch:"field_j"`
	FieldK       string    `ch:"field_k"`
	FieldL       string    `ch:"field_l"`
	FieldM       int64     `ch:"field_m"`
	FieldN       string    `ch:"field_n"`
	FieldO       uint32    `ch:"field_o"`
	FieldP       string    `ch:"field_p"`
	FieldQ       []uint32  `ch:"field_q"`
	FieldR       []int64   `ch:"field_r"`
	FieldS       string    `ch:"field_s"`
	FieldT       []uint16  `ch:"field_t"`
	FieldU       []uint32  `ch:"field_u"`
	FieldV       []uint32  `ch:"field_v"`
	FieldW       int32     `ch:"field_w"`
	FieldX       int32     `ch:"field_x"`
	FieldY       string    `ch:"field_y"`
	FieldZ       net.IP    `ch:"field_z"`
	FieldAA      string    `ch:"field_aa"`
	FieldAB      string    `ch:"field_ab"`
	FieldAC      string    `ch:"field_ac"`
	FieldAD      uint32    `ch:"field_ad"`
	FieldAE      string    `ch:"field_ae"`
	FieldAF      string    `ch:"field_af"`
	FieldAG      string    `ch:"field_ag"`
	FieldAH      string    `ch:"field_ah"`
	FieldAI      string    `ch:"field_ai"`
	FieldAJ      string    `ch:"field_aj"`
	FieldAK      string    `ch:"field_ak"`
	FieldAL      string    `ch:"field_al"`
	FieldAM      string    `ch:"field_am"`
	FieldAN      string    `ch:"field_an"`
	FieldAO      uint8     `ch:"field_ao"`
	FieldAP      string    `ch:"field_ap"`
	FieldAQ      []net.IP  `ch:"field_aq"`
	FieldAR      uint64    `ch:"field_ar"`
	FieldAS      string    `ch:"field_as"`
	FieldAT      uint32    `ch:"field_at"`
	FieldAU      uint32    `ch:"field_au"`
	FieldAV      string    `ch:"field_av"`
	FieldAW      uint16    `ch:"field_aw"`
	FieldAX      uint16    `ch:"field_ax"`
	FieldAY      int8      `ch:"field_ay"`
	FieldAZ      string    `ch:"field_az"`
}

The table in the database contains the following columns:

`field_a` DateTime('UTC'),
`field_b` Date,
`field_c` IPv6,
`field_d` LowCardinality(String),
`field_e` String,
`field_f` String,
`field_g` LowCardinality(String),
`field_h` LowCardinality(String),
`field_i` UInt16,
`field_j` Int64,
`field_k` String,
`field_l` String,
`field_m` Int64,
`field_n` String,
`field_o` UInt32,
`field_p` LowCardinality(String),
`field_q` Array(UInt32),
`field_r` Array(Int64),
`field_s` String,
`field_t` Array(UInt16),
`field_u` Array(UInt32),
`field_v` Array(UInt32),
`field_w` Int32,
`field_x` Int32,
`field_y` LowCardinality(String),
`field_z` IPv6,
`field_aa` String,
`field_ab` LowCardinality(String),
`field_ac` LowCardinality(String),
`field_ad` Nullable(UInt32),
`field_ae` LowCardinality(String),
`field_af` LowCardinality(String),
`field_ag` String,
`field_ah` LowCardinality(String),
`field_ai` LowCardinality(String),
`field_aj` LowCardinality(String),
`field_ak` LowCardinality(String),
`field_al` LowCardinality(String),
`field_am` LowCardinality(String),
`field_an` LowCardinality(String),
`field_ao` UInt8,
`field_ap` LowCardinality(String),
`field_aq` Array(IPv6),
`field_ar` UInt64,
`field_as` LowCardinality(String),
`field_at` UInt32,
`field_au` UInt32,
`field_av` LowCardinality(String),
`field_aw` Nullable(UInt16),
`field_ax` Nullable(UInt32),
`field_ay` Int8,
`field_az` LowCardinality(String)

@jkaflik
Copy link
Contributor

jkaflik commented Aug 22, 2024

@vkazmirchuk what is memory usage you notice? Do you have pprof memory report? I strongly recommend to create another issue. I want this one to be strictly focused on Decimal type support as originally mentioned by issue creator.

@vkazmirchuk
Copy link

Thanks! I created issue #1384

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants