max-sql-memory not set #35054

Closed
codepitbull opened this issue Feb 19, 2019 · 15 comments
Labels
C-investigation (Further steps needed to qualify. C-label will change.), O-community (Originated from the community)

Comments

@codepitbull

Describe the problem

After porting some memory-heavy queries from Postgres to CockroachDB, we see issues with exceeding the memory budget. We followed the production checklist and adjusted the values to the recommended 25%.

We still get:
ERROR: root: memory budget exceeded: 10240 bytes requested, 134210560 currently allocated, 134217728 bytes in budget

The value for "bytes in budget" still reflects 128 MiB, so it seems the values don't get passed through.

We checked cockroach.log and see:

I190218 13:32:22.083875 1 server/config.go:386  system total memory: 15 GiB
I190218 13:32:22.083901 1 server/config.go:388  server configuration:
max offset             500000000
cache size             3.8 GiB
SQL memory pool size   3.8 GiB
scan interval          10m0s
scan min idle time     10ms
scan max idle time     1s
event log enabled      true

So it looks like we set the values correctly.

To Reproduce
This is the command line we use for starting cockroach:

/opt/cockroachdb/cockroach start --store=/mnt/data/cockroachdb --temp-dir=/tmp --port=26257 --http-port=7005 --log-dir=/mnt/logs/cockroachdb --cache=.25 --max-sql-memory=.25 --insecure

Expected behavior
We expected the queries to be able to use more than 128 MiB.

Environment:

  • CockroachDB version 2.1.4
  • Server OS: Ubuntu
  • Client app: JDBC
@marcelbirkner

marcelbirkner commented Feb 19, 2019

Here is an example of the failing query.

Table structure:

SHOW COLUMNS FROM rollup_minutes;
  column_name | data_type | is_nullable | column_default | generation_expression |                                                              indices                                                              | is_hidden
+-------------+-----------+-------------+----------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------+
  customer    | STRING    |    false    | NULL           |                       | {"rollup_minutes_pkey","rollup_minutes_customer_timestamp_name_key","rollup_minutes_customer_idx","rollup_minutes_timestamp_idx"} |   false
  timestamp   | INT8      |    false    | NULL           |                       | {"rollup_minutes_pkey","rollup_minutes_customer_timestamp_name_key","rollup_minutes_customer_idx","rollup_minutes_timestamp_idx"} |   false
  name        | STRING    |    false    | NULL           |                       | {"rollup_minutes_pkey","rollup_minutes_customer_timestamp_name_key","rollup_minutes_customer_idx","rollup_minutes_timestamp_idx"} |   false
  sims        | INT       |    true     | NULL           |                       | {}                                                                                                                                |   false

Query:

SELECT 
  floor(timestamp/3600000)*3600000 AS ts, name as sensorname, sum(sims) as sims 
FROM 
  rollup_minutes 
WHERE 
  customer = :customerId AND 
  timestamp >= :timeTillRollup 
GROUP BY 
  name, floor(timestamp/3600000)*3600000

@marcelbirkner

Here is the EXPLAIN statement for the query above:

EXPLAIN SELECT floor(timestamp/3600000)*3600000 AS ts, name as sensorname, sum(sims) as sims FROM rollup_minutes WHERE customer = '-fOnY8rwTXOy0BJaXsexfQ' AND timestamp >= 1550376000000 GROUP BY name, floor(timestamp/3600000)*3600000;

         tree         |    field    |                                 description
+---------------------+-------------+-----------------------------------------------------------------------------+
  render              |             |
   └── group          |             |
        │             | aggregate 0 | name
        │             | aggregate 1 | column6
        │             | aggregate 2 | sum(sims)
        │             | group by    | @2,@1
        └── render    |             |
             └── scan |             |
                      | table       | rollup_minutes@rollup_minutes_pkey
                      | spans       | /"-fOnY8rwTXOy0BJaXsexfQ"/1550376000000-/"-fOnY8rwTXOy0BJaXsexfQ"/PrefixEnd
(10 rows)

Time: 1.39429ms

@marcelbirkner

EXPLAIN(opt) result:

EXPLAIN(opt) SELECT floor(timestamp/3600000)*3600000 AS ts, name as sensorname, sum(sims) as sims FROM rollup_minutes WHERE customer = '-fOnY8rwTXOy0BJaXsexfQ' AND timestamp >= 1550376000000 GROUP BY name, floor(timestamp/3600000)*3600000;
                                                    text
+----------------------------------------------------------------------------------------------------------+
  group-by
   ├── columns: ts:6(decimal) sensorname:3(string!null) sims:5(decimal)
   ├── grouping columns: name:3(string!null) column6:6(decimal)
   ├── side-effects
   ├── stats: [rows=3.33333333, distinct(3,6)=3.33333333]
   ├── cost: 3.83333333
   ├── key: (3,6)
   ├── fd: (3,6)-->(5)
   ├── prune: (5)
   ├── project
   │    ├── columns: column6:6(decimal) name:3(string!null) sims:4(int)
   │    ├── side-effects
   │    ├── stats: [rows=3.33333333, distinct(3,6)=3.33333333]
   │    ├── cost: 3.66666667
   │    ├── prune: (3,4,6)
   │    ├── scan rollup_minutes
   │    │    ├── columns: customer:1(string!null) timestamp:2(int!null) name:3(string!null) sims:4(int)
   │    │    ├── constraint: /1/2/3: [/'-fOnY8rwTXOy0BJaXsexfQ'/1550376000000 - /'-fOnY8rwTXOy0BJaXsexfQ']
   │    │    ├── stats: [rows=3.33333333, distinct(1)=1, distinct(2,3)=3.33333333]
   │    │    ├── cost: 3.6
   │    │    ├── key: (2,3)
   │    │    ├── fd: ()-->(1), (2,3)-->(4)
   │    │    └── interesting orderings: (+1,+2,+3) (+2,+1,+3)
   │    └── projections [outer=(2-4), side-effects]
   │         └── mult [type=decimal, outer=(2), side-effects]
   │              ├── function: floor [type=decimal, outer=(2), side-effects]
   │              │    └── div [type=decimal, outer=(2), side-effects]
   │              │         ├── variable: timestamp [type=int, outer=(2)]
   │              │         └── const: 3600000 [type=int]
   │              └── const: 3600000 [type=decimal]
   └── aggregations [outer=(4)]
        └── sum [type=decimal, outer=(4)]
             └── variable: sims [type=int, outer=(4)]
(33 rows)

@jordanlewis
Member

Thanks for the report. I'm a bit perplexed because the line here:

SQL memory pool size   3.8 GiB

is exactly the line that reflects the size of the root memory budget. As you point out, the error message reflects 128MB, not 3.8GB.

Are you completely sure that the server that printed that error message out was the same one that you ran the query on? If you are running a cluster, all nodes need to be started with that flag - not just one.

@marcelbirkner

Table size:

[screenshot: table size]

@marcelbirkner

marcelbirkner commented Feb 19, 2019

Hi @jordanlewis

I restarted the server on Monday with the 3.8 GiB cache and SQL memory pool size. That's also when we started seeing 3.8 GiB in the startup logs.

The query errors in the logs with the 128 MiB budget limit are from today. All applications using CockroachDB have been restarted as well. We are a bit puzzled as well.

We run a single node setup at the moment.

@marcelbirkner

As you can see in the hardware dashboard, memory usage increased from 1.1 GiB to 5.6 GiB after setting --cache=.25 --max-sql-memory=.25 on a 16 GB AWS m5.xlarge.

[screenshot: hardware dashboard]

@jordanlewis
Member

Can you please confirm that running the problematic query in ./cockroach sql still outputs the memory error?

@marcelbirkner

marcelbirkner commented Feb 19, 2019

Right now it works when I run the query via cockroach sql

(735 rows)

Time: 399.059261ms

We have about 500 Java clients using the Postgres JDBC driver that calculate daily/hourly/minute rollups, and we see spikes throughout the day during which the "budget exceeded" error shows up in cockroachdb.log.

During the spikes we have a handful of the 500 Java clients fail 20 - 50 times (see screenshot below).

[screenshot: client failure counts]

Database configs are 100% identical.

database:
  driverClass: "org.postgresql.Driver"
  abandonWhenPercentageFull: 0
  alternateUsernamesAllowed: false
  commitOnReturn: false
  rollbackOnReturn: false
  autoCommitByDefault: null
  readOnlyByDefault: null
  user: "XXXXXXXXXXXXXX"
  password: XXXXXXXXXXXX
  url: "jdbc:postgresql://cockroachdb-0-eu-west-1.XXXXXXXXXXXXX:26257/sales?sslmode=disable"
  properties:
    charSet: "UTF-8"
  defaultCatalog: null
  defaultTransactionIsolation: "DEFAULT"
  useFairQueue: true
  initialSize: 1
  minSize: 1
  maxSize: 3
  initializationQuery: null
  logAbandonedConnections: false
  logValidationErrors: true
  maxConnectionAge:
    present: false
  maxWaitForConnection: "5 seconds"
  minIdleTime: "1 minute"
  validationQuery: "/* MyService Health Check */ SELECT 1"
  validationQueryTimeout:
    present: true
  checkConnectionWhileIdle: true
  checkConnectionOnBorrow: true
  checkConnectionOnConnect: true
  checkConnectionOnReturn: false
  autoCommentsEnabled: true
  evictionInterval: "10 seconds"
  validationInterval: "30 seconds"
  validatorClassName:
    present: false
  removeAbandoned: false
  removeAbandonedTimeout: "60 seconds"
  jdbcInterceptors:
    present: false
  ignoreExceptionOnPreLoad: false
  defaultReadOnly: false
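
For reference, here is a minimal sketch of how one of these clients might issue the rollup query through the PostgreSQL JDBC driver configured above. This is illustrative only, not the actual application code: the connection URL, user, and class name are placeholders; the query is the one shown earlier in this thread with the named parameters made positional.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative sketch only: URL/user are placeholders, the query is the
// hourly rollup SELECT from earlier in this thread.
public class RollupQueryClient {
    private static final String ROLLUP_QUERY =
        "SELECT floor(timestamp/3600000)*3600000 AS ts, name AS sensorname, sum(sims) AS sims "
      + "FROM rollup_minutes "
      + "WHERE customer = ? AND timestamp >= ? "
      + "GROUP BY name, floor(timestamp/3600000)*3600000";

    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:26257/sales?sslmode=disable"; // placeholder host
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             PreparedStatement ps = conn.prepareStatement(ROLLUP_QUERY)) {
            ps.setString(1, "-fOnY8rwTXOy0BJaXsexfQ"); // :customerId
            ps.setLong(2, 1550376000000L);             // :timeTillRollup
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s %s %s%n",
                        rs.getBigDecimal("ts"), rs.getString("sensorname"), rs.getBigDecimal("sims"));
                }
            }
        }
    }
}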

@marcelbirkner

marcelbirkner commented Feb 19, 2019

With our current setup we tested different memory settings to see the connection behaviour:

128 MiB: 1245 connections
1 GiB: 9980 connections
2 GiB: 19968 connections

So we should be fine with our 500 clients.

@marcelbirkner

EXPLAIN ANALYZE results:

[screenshot: EXPLAIN ANALYZE results]

@jordanlewis
Member

I'm surprised this aggregate would be the one to cause the out of memory errors. Are there other concurrent queries? All queries share the root memory pool - perhaps there's one that's much heavier weight?

Is the error message you included at the top of the issue the actual captured error message from the application, after you changed the setting?

@jordanlewis self-assigned this on Feb 20, 2019
@jordanlewis added the C-investigation and O-community labels on Feb 20, 2019
@codepitbull
Author

codepitbull commented Feb 20, 2019

I checked again; there are multiple failing statements on top of the queries Marcel mentioned:
INSERT INTO rollup_minutes (customer, timestamp, name, sims) values (:customerId, :timestamp, :name, :sims) on conflict (customer, timestamp, name) do update set sims = rollup_minutes.sims+excluded.sims
These are done in batches of size 1000.
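
A minimal sketch of what such a batched upsert looks like on the client side follows; the class and row type are illustrative stand-ins, not our exact code, but the statement and the ~1000-row batch size are the ones described above.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

// Illustrative sketch of the batched upsert described above.
public class RollupBatchUpsert {
    private static final String UPSERT =
        "INSERT INTO rollup_minutes (customer, timestamp, name, sims) "
      + "VALUES (?, ?, ?, ?) "
      + "ON CONFLICT (customer, timestamp, name) "
      + "DO UPDATE SET sims = rollup_minutes.sims + excluded.sims";

    // Placeholder row type standing in for the application's rollup model.
    public record Row(String customer, long timestamp, String name, long sims) {}

    public static void upsertBatch(Connection conn, List<Row> rows) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT)) {
            for (Row r : rows) {                 // rows arrive in chunks of ~1000
                ps.setString(1, r.customer());
                ps.setLong(2, r.timestamp());
                ps.setString(3, r.name());
                ps.setLong(4, r.sims());
                ps.addBatch();                   // queue one row
            }
            ps.executeBatch();                   // send the whole batch in one round trip
        }
    }
}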

The value they report for the size of the pool in the error message is always the same: the 128 MiB mentioned above.

org.quartz.core.JobRunShell Job Cashier.Minute Rollup Job threw an unhandled Exception:
! org.jdbi.v3.core.statement.UnableToExecuteStatementException: java.sql.BatchUpdateException: Batch entry 0 /* MinuteRollupDao.addRollupDataAndIncrement */ INSERT INTO rollup_minutes (customer, timestamp, name, sims) values ('rtFZ9HHQSgKNV2efzjPx-A', 1550413560000, 'process', 14) on conflict (customer, timestamp, name) do update set sims = rollup_minutes.sims+excluded.sims was aborted: ERROR: root: memory budget exceeded: 10240 bytes requested, 134210560 currently allocated, 134217728 bytes in budget Call getNextException to see other errors in the batch. [statement:"INSERT INTO rollup_minutes (customer, timestamp, name, sims) values (:customerId, :timestamp, :name, :sims) on conflict (customer, timestamp, name) do update set sims = rollup_minutes.sims+excluded.sims", rewritten:"/* MinuteRollupDao.addRollupDataAndIncrement */ INSERT INTO rollup_minutes (customer, timestamp, name, sims) values (:customerId, :timestamp, :name, :sims) on conflict (customer, timestamp, name) do update set sims = rollup_minutes.sims+excluded.sims", parsed:"ParsedSql{sql='/* MinuteRollupDao.addRollupDataAndIncrement */ INSERT INTO rollup_minutes (customer, timestamp, name, sims) values (?, ?, ?, ?) on conflict (customer, timestamp, name) do update set sims = rollup_minutes.sims+excluded.sims', parameters=ParsedParameters{positional=false, parameterNames=[customerId, timestamp, name, sims]}}", arguments:{ positional:{}, named:{}, finder:[{lazy bean property arguments "com.instana.cashier.jdbi.model.RollupModel@10500320"]}]
!   at org.jdbi.v3.core.statement.PreparedBatch.internalBatchExecute(PreparedBatch.java:175)
! Caused by: java.sql.BatchUpdateException: Batch entry 0 /* MinuteRollupDao.addRollupDataAndIncrement */ INSERT INTO rollup_minutes (customer, timestamp, name, sims) values ('rtFZ9HHQSgKNV2efzjPx-A', 1550413560000, 'process', 14) on conflict (customer, timestamp, name) do update set sims = rollup_minutes.sims+excluded.sims was aborted: ERROR: root: memory budget exceeded: 10240 bytes requested, 134210560 currently allocated, 134217728 bytes in budget Call getNextException to see other errors in the batch.
!   at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148)
!   Suppressed: org.postgresql.util.PSQLException: ERROR: root: memory budget exceeded: 10240 bytes requested, 134210560 currently allocated, 134217728 bytes in budget
!     at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
! Caused by: org.postgresql.util.PSQLException: ERROR: root: memory budget exceeded: 10240 bytes requested, 134210560 currently allocated, 134217728 bytes in budget
!   at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)

I also traced through the cockroach code and everything looks correct to me, so the size should be set. But whatever we do, we always get those 128 MiB.

I am 100% sure it must be something very basic we are missing here, but we are currently out of ideas :)

@marcelbirkner

Hi @jordanlewis

We found the problem: the log messages from our log management provider are 2 days behind but carry current timestamps (see screenshot).

Thank you for your help 👍
You can close the ticket

[screenshot: log timestamps]

@jordanlewis
Member

No problem, I'm glad it was something like that!
