Reduce memory usage of TypedSet #4123

raunaqmorarka · 2020-06-21T11:13:05Z

No description provided.

Lewuathe

Do you have any microbenchmark result measuring how much memory usage we can reduce by this change?

sopel39

How does this PR reduce memory usage in TypedSet?

presto-main/src/main/java/io/prestosql/operator/aggregation/TypedSet.java

presto-main/src/main/java/io/prestosql/type/TypeUtils.java

sopel39 · 2020-06-30T09:21:05Z

presto-main/src/main/java/io/prestosql/type/TypeUtils.java

+        else if (type instanceof VarcharType) {
+            // If bound on length of varchar is smaller than defaultSize, use that as expected size
+            return ((VarcharType) type).getLength()
+                    .map(length -> Math.min(length, defaultSize))


Varchars usually won't occupy full declared length.

varchar length is a character count, and varchar is encoded in UTF-8, which means up to 4 bytes per character. If this is just an expected size, assuming ASCII is reasonable, but we should note that in a comment here.
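To illustrate the point about character count versus encoded size, here is a minimal standalone sketch. It uses only the JDK, and the class and method names are hypothetical (they are not part of this PR). It assumes that "character" means a Unicode code point.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the PR: compares character count with UTF-8 byte count.
final class Utf8WidthSketch
{
    private Utf8WidthSketch() {}

    // Number of Unicode code points (assumed here to be what a varchar(n) bound counts)
    static int characterCount(String value)
    {
        return value.codePointCount(0, value.length());
    }

    // Number of bytes the value occupies once encoded as UTF-8
    static int utf8ByteCount(String value)
    {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args)
    {
        // ASCII letter, Latin-1 letter, euro sign, emoji (written as a surrogate pair)
        String[] samples = {"a", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String sample : samples) {
            System.out.printf("%d character(s), %d byte(s)%n", characterCount(sample), utf8ByteCount(sample));
        }
    }
}
```

Each sample is a single character, yet the encoded size ranges from 1 to 4 bytes, so a byte estimate derived from the declared length is exact only for ASCII data.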

Added a comment for UTF-8

I still don't think that assuming a varchar will occupy its entire declared length is correct. It seems very pessimistic. What do you think @dain?

@sopel39 that's why we do min here, right?

I'm not sure what the contract of defaultSize is here. Could it be excessively large?

In current usage, defaultSize is 16, 32, or 100, depending on where it is called from. This change should just reduce the expected size estimate when a smaller bound is known (e.g. varchar(10)). This would help reduce memory usage in cases where a large number of TypedSets are created but each set has only a small number of entries that are a few characters long.

I think we should assume the default size is one byte. I would update the comment a bit:

It can take up to 4 bytes per character due to UTF-8 encoding, but we assume the data is ASCII and only needs one byte.

Updated the comments in code
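The estimate discussed in this thread can be sketched roughly as follows. This is a standalone illustration, not the merged code: the class and method names are hypothetical, the declared bound is modelled as an `Optional<Integer>`, and the fallback to `defaultSize` for unbounded varchar is an assumption, since the hunk above is truncated before that point.

```java
import java.util.Optional;

// Hypothetical sketch of the expected-entry-size estimate; not the actual TypeUtils code.
final class ExpectedEntrySizeSketch
{
    private ExpectedEntrySizeSketch() {}

    // boundedLength: declared bound of varchar(n), empty for unbounded varchar (modelling assumption)
    // defaultSize: the caller's default estimate (16, 32 or 100 in the usages mentioned above)
    static int expectedVarcharEntrySize(Optional<Integer> boundedLength, int defaultSize)
    {
        // A character can take up to 4 bytes in UTF-8, but the data is assumed to be ASCII
        // and to need one byte per character, so the bound is used directly as a byte count.
        return boundedLength
                .map(length -> Math.min(length, defaultSize))
                // Assumption: without a declared bound, keep the caller's default
                .orElse(defaultSize);
    }

    public static void main(String[] args)
    {
        for (int defaultSize : new int[] {16, 32, 100}) {
            System.out.printf("varchar(10), defaultSize=%d -> %d%n",
                    defaultSize, expectedVarcharEntrySize(Optional.of(10), defaultSize));
        }
        System.out.printf("unbounded varchar, defaultSize=32 -> %d%n",
                expectedVarcharEntrySize(Optional.empty(), 32));
    }
}
```

Under these assumptions, varchar(10) gets an estimate of 10 for every default, while a bound larger than the default never raises the estimate above `defaultSize`.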

dain

I agree with @sopel39's comments. I don't think we need the first commit that changes the fastutil IntArrayList to an int[]. The second commit looks good, but needs a fix for the CHAR branch.

findepi

(just skimming)

presto-main/src/main/java/io/prestosql/type/TypeUtils.java

sopel39

LGTM on "Use getInt to access IntArrayList". I suggest making two separate PRs for the two commits.

sopel39 · 2020-07-28T09:11:24Z

presto-main/src/main/java/io/prestosql/type/TypeUtils.java

presto-spi/src/main/java/io/prestosql/spi/type/VarcharType.java

presto-spi/src/main/java/io/prestosql/spi/type/CharType.java

sopel39

@dain do you want to take a look?

dain

Looks good to me.

dain · 2020-07-29T22:35:05Z

presto-main/src/main/java/io/prestosql/type/TypeUtils.java

Dain gave LGTM

sopel39 · 2020-07-30T09:45:31Z

merged, thanks!

cla-bot bot added the cla-signed label Jun 21, 2020

raunaqmorarka requested a review from dain June 22, 2020 13:41

Lewuathe reviewed Jun 25, 2020

sopel39 reviewed Jun 30, 2020

dain previously requested changes Jul 13, 2020

Use getInt to access IntArrayList

0fc03f4

raunaqmorarka force-pushed the typed_set_opt branch from 10ab1b3 to ddbf3ed Compare July 28, 2020 05:32

raunaqmorarka requested review from dain and sopel39 July 28, 2020 05:37

findepi reviewed Jul 28, 2020

raunaqmorarka force-pushed the typed_set_opt branch from ddbf3ed to 937403b Compare July 28, 2020 07:16

sopel39 reviewed Jul 28, 2020

raunaqmorarka force-pushed the typed_set_opt branch from 937403b to 76cd2f6 Compare July 28, 2020 13:25

raunaqmorarka requested a review from sopel39 July 28, 2020 13:26

sopel39 approved these changes Jul 29, 2020

dain reviewed Jul 29, 2020

Use length of VarcharType and CharType to calculate EXPECTED_ENTRY_SIZE

3fa8534

raunaqmorarka force-pushed the typed_set_opt branch from 76cd2f6 to 3fa8534 Compare July 30, 2020 08:47

raunaqmorarka requested a review from dain July 30, 2020 08:49

sopel39 merged commit 547eb1a into trinodb:master Jul 30, 2020

sopel39 mentioned this pull request Jul 30, 2020

Release notes for 340 #4527

Closed

8 tasks

raunaqmorarka deleted the typed_set_opt branch January 14, 2021 11:38
