[SPARK-25492][TEST] Refactor WideSchemaBenchmark to use main method #22501

Closed
wants to merge 6 commits into apache:master from wangyum:SPARK-25492

Conversation

@wangyum (Member) commented Sep 20, 2018

What changes were proposed in this pull request?

Refactor WideSchemaBenchmark to use main method.

  1. Use spark-submit:
     bin/spark-submit --class org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
  2. Generate the benchmark result:
     SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"

An illustrative sketch of the resulting main-method shape follows this list.
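For readers unfamiliar with Spark's main-method benchmarks, here is a minimal sketch of the general shape this refactoring produces. It is hypothetical and trimmed: the object name, the case names, and the output-file handling are illustrative assumptions, and the real WideSchemaBenchmark extends Spark's benchmark base class and covers many more cases (wide selects, deep nesting, Parquet read/write).

```scala
import java.io.{File, FileOutputStream}

import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.SparkSession

// Hypothetical, trimmed sketch of a "main method" benchmark; not the exact code in this PR.
object WideSchemaBenchmarkSketch {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("WideSchemaBenchmarkSketch")
      .getOrCreate()

    // The real harness redirects output to benchmarks/WideSchemaBenchmark-results.txt
    // when SPARK_GENERATE_BENCHMARK_FILES=1 is set; otherwise results go to stdout.
    val output =
      if (sys.env.get("SPARK_GENERATE_BENCHMARK_FILES").contains("1")) {
        Some(new FileOutputStream(new File("benchmarks/WideSchemaBenchmark-results.txt")))
      } else {
        None
      }

    val benchmark = new Benchmark("parsing large select expressions", 1, output = output)
    Seq(1, 100, 2500).foreach { width =>
      val selectExprs = (1 to width).map(i => s"id as a_$i")
      benchmark.addCase(s"$width select expressions") { _ =>
        spark.range(1).toDF().selectExpr(selectExprs: _*)
      }
    }
    benchmark.run()

    output.foreach(_.close())
    spark.stop()
  }
}
```

Run it with either of the commands above; the spark-submit variant needs both the spark-core and spark-sql test jars on the classpath, as shown.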

How was this patch tested?

manual tests

@SparkQA commented Sep 20, 2018

Test build #96369 has finished for PR 22501 at commit f56b732.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

* 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
* 2. build/sbt "sql/test:runMain <this class>"
* 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
* Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
Member

Could you fix the doc generation failure?

Member Author

Thanks @dongjoon-hyun. Actually I'm waiting for #22484. I want to move withTempDir() to RunBenchmarkWithCodegen.scala.

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala
@SparkQA commented Oct 6, 2018

Test build #97056 has finished for PR 22501 at commit e6f39f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
       if (!file.exists()) {
         file.createNewFile()
       }
-      output = Some(new FileOutputStream(file))
+      output = Option(new FileOutputStream(file))
Member

This looks like an irrelevant piggy-back.

Member Author

I changed it here because of #22443 (comment).

Member

IIUC, @HyukjinKwon meant to make this kind of change only when you need to touch this file anyway.

Member Author

I am worried that I will forget it later, so I am changing it this time. Should I revert it?

Member

Why did you replace Some with Option? Are you worried that new FileOutputStream(file) could be null?

Member

My point was that, from my cursory look, there is no point in checking for null below. If there is no chance that it becomes null, we can leave it as Some and remove the null check below.
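For reference on the Some vs. Option point being discussed, the only behavioral difference between the two constructors is how they treat null; here is a minimal, self-contained Scala sketch (independent of the Spark code):

```scala
object SomeVsOptionDemo {
  def main(args: Array[String]): Unit = {
    val kept: Option[String] = Some(null)       // Some(null): the null is kept inside the Option
    val absorbed: Option[String] = Option(null) // None: Option.apply turns null into None

    println(kept.isDefined)     // true  -> a caller must still guard against the inner null
    println(absorbed.isDefined) // false -> the null is absorbed at construction time

    // new FileOutputStream(file) either returns a stream or throws; it never returns null,
    // so Some(...) vs Option(...) makes no practical difference in BenchmarkBase, which is
    // why the reviewers call the downstream null check redundant.
  }
}
```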

@SparkQA commented Oct 18, 2018

Test build #97534 has finished for PR 22501 at commit 82e2367.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

1 cols x 100000 rows (read in-mem) 22 / 25 4.6 219.4 1.0X
1 cols x 100000 rows (exec in-mem) 22 / 28 4.5 223.8 1.0X
1 cols x 100000 rows (read parquet) 45 / 49 2.2 449.6 0.5X
1 cols x 100000 rows (write parquet) 204 / 223 0.5 2044.4 0.1X
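(A note on reading these quoted result rows, here and below: Spark's benchmark harness prints, per case, the best and average times in ms, the throughput in millions of rows per second, the per-row cost in ns, and a factor relative to the first case; the column headers are trimmed out of these quoted hunks. For example, in the first row above, 22 / 25 are best/avg times in ms, 4.6 is M rows/s, 219.4 is ns per row, and 1.0X marks the baseline case.)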
@dongjoon-hyun (Member) commented Oct 18, 2018

Given the difference in ratio, this might be a small regression in the Parquet writer since Spark 2.1.0 (SPARK-17335).

cc @cloud-fan, @gatorsmile, @rdblue

Contributor

I have no idea how this happens. Can you create a JIRA ticket to investigate this regression?

Member Author

It may be a Parquet issue. I found that binary write performance is a little worse after upgrading to Parquet 1.10.0: apache/parquet-java#505. I will verify it later.

Member

The following EC2 result shows a ratio consistent with Spark 2.1.0. The result on Mac seemed to be unstable for some unknown reason; see #22501 (comment).

1 cols x 100000 rows (read parquet)             61 /   70          1.6         610.2       0.6X
1 cols x 100000 rows (write parquet)           209 /  233          0.5        2086.1       0.2X

Contributor

@dongjoon-hyun, so you are saying that it doesn't appear that there is a performance regression, right?

Member

For this part, right, @rdblue, I guess so.
After merging the EC2 results into @wangyum's PR, I'll compare the numbers one by one once again.

2500 cols x 40 rows (read in-mem) 261 / 434 0.4 2607.3 0.1X
2500 cols x 40 rows (exec in-mem) 624 / 701 0.2 6240.5 0.0X
2500 cols x 40 rows (read parquet) 196 / 301 0.5 1963.4 0.1X
2500 cols x 40 rows (write parquet) 687 / 1049 0.1 6870.6 0.0X
@dongjoon-hyun (Member) commented Oct 18, 2018

The gap between best and average is too high in line 32 and line 33.
I'll try to run this on EC2, too.

Member

FYI, this large gap went away in the EC2 result.

1 deep x 100000 rows (write parquet) 195 / 219 0.5 1945.1 0.1X
100 deep x 1000 rows (read in-mem) 39 / 57 2.5 393.1 0.5X
100 deep x 1000 rows (exec in-mem) 480 / 556 0.2 4795.7 0.0X
100 deep x 1000 rows (read parquet) 7943 / 7950 0.0 79427.5 0.0X
Member

Ur, @wangyum . Is this 4 times slower than before?

cc @dbtsai .

@dongjoon-hyun (Member)

Hi, @wangyum. I ran the test on EC2 r3.xlarge, too. It looks more stable than this.
Could you review and merge wangyum#19?

@cloud-fan (Contributor)

Thank you guys for refreshing the benchmarks and results! It's very helpful.

If possible, can we post the perf regressions we found in the umbrella JIRA? Then people can see whether a regression is reasonable (if we have addressed it) or investigate how it was introduced.

Thanks!

@SparkQA commented Oct 20, 2018

Test build #97627 has finished for PR 22501 at commit 64e5ede.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

Seems Jenkins is broken, cc @shaneknapp

Command "/tmp/tmp.JfFHaoRFPU/3.5/bin/python -c "import setuptools, tokenize;__file__='/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps" failed with error code 1 in /home/jenkins/workspace/SparkPullRequestBuilder/python/
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Cleaning up temporary directory - /tmp/tmp.JfFHaoRFPU
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/run-pip-tests ; received return code 1

@dongjoon-hyun (Member) commented Oct 20, 2018

@cloud-fan After updating on EC2, most ratios and values look more stable and reasonable now. The following two are noticeable changes, but they look like a Parquet writer improvement (rather than a regression).

1. The read/write ratio is reversed (0.8 -> 1.7)

- 128 x 8 deep x 1000 rows (read parquet)         69 /   74          1.4         693.9       0.2X
- 128 x 8 deep x 1000 rows (write parquet)        78 /   83          1.3         777.7       0.2X
+ 128 x 8 deep x 1000 rows (read parquet)        351 /  379          0.3        3510.3       0.1X
+ 128 x 8 deep x 1000 rows (write parquet)       199 /  203          0.5        1988.3       0.2X

2. The read/write ratio changed noticeably (4.6 -> 8.3)

- 1024 x 11 deep x 100 rows (read parquet)        426 /  433          0.2        4263.7       0.0X
- 1024 x 11 deep x 100 rows (write parquet)        91 /   98          1.1         913.5       0.1X
+ 1024 x 11 deep x 100 rows (read parquet)       2063 / 2078          0.0       20629.2       0.0X
+ 1024 x 11 deep x 100 rows (write parquet)       248 /  266          0.4        2475.1       0.1X

Since this is the first attempt to track this and the previous result is too old, there are obvious limitations in the comparison. From Spark 2.4.0, we can get consistent comparisons instead of results from different personal Macs.

@dongjoon-hyun (Member)

Retest this please.

@SparkQA commented Oct 20, 2018

Test build #97642 has finished for PR 22501 at commit 64e5ede.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

retest this please

@SparkQA commented Oct 20, 2018

Test build #97644 has finished for PR 22501 at commit 64e5ede.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp (Contributor)

@cloud-fan -- pip isn't broken... the actual error is found right above what you cut and pasted:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)

I won't be able to look any deeper into this until tomorrow at the earliest.

@HyukjinKwon (Member)

I guess it's related to pip packaging, though.

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py", line 224, in <module>
        'Programming Language :: Python :: Implementation :: PyPy']
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 140, in setup
        return distutils.core.setup(**attrs)
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 38, in run
        self.install_for_development()
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 154, in install_for_development
        self.process_distribution(None, self.dist, not self.no_deps)
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 729, in process_distribution
        self.install_egg_scripts(dist)
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 189, in install_egg_scripts
        script_text = strm.read()
      File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)
    

It's from setup.py

@kiszk (Member) commented Oct 20, 2018

I am looking at each commit from the latest to the oldest at https://github.com/apache/spark/commits/master

@HyukjinKwon (Member)

Thanks. It might be more related to external factors.

@kiszk (Member) commented Oct 20, 2018

Thanks. When the build was successful, this was part of the log:

copying pyspark/streaming/util.py -> pyspark-3.0.0.dev0/pyspark/streaming
Writing pyspark-3.0.0.dev0/setup.cfg
Creating tar archive
removing 'pyspark-3.0.0.dev0' (and everything under it)
Installing dist into virtual env
Obtaining file:///home/jenkins/workspace/SparkPullRequestBuilder/python
Collecting py4j==0.10.7 (from pyspark==3.0.0.dev0)
  Downloading https://files.pythonhosted.org/packages/e3/53/c737818eb9a7dc32a7cd4f1396e787bd94200c3997c72c1dbe028587bd76/py4j-0.10.7-py2.py3-none-any.whl (197kB)
mkl-random 1.0.1 requires cython, which is not installed.
Installing collected packages: py4j, pyspark
  Running setup.py develop for pyspark
Successfully installed py4j-0.10.7 pyspark
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Run basic sanity check on pip installed version with spark-submit

Now, we are seeing the following. Is this a problem related to py4j or pyspark on external sites?

copying pyspark/streaming/util.py -> pyspark-3.0.0.dev0/pyspark/streaming
Writing pyspark-3.0.0.dev0/setup.cfg
Creating tar archive
removing 'pyspark-3.0.0.dev0' (and everything under it)
Installing dist into virtual env
Obtaining file:///home/jenkins/workspace/SparkPullRequestBuilder/python
Collecting py4j==0.10.7 (from pyspark==3.0.0.dev0)
  Downloading https://files.pythonhosted.org/packages/e3/53/c737818eb9a7dc32a7cd4f1396e787bd94200c3997c72c1dbe028587bd76/py4j-0.10.7-py2.py3-none-any.whl (197kB)
mkl-random 1.0.1 requires cython, which is not installed.
Installing collected packages: py4j, pyspark
  Running setup.py develop for pyspark
    Complete output from command /tmp/tmp.EWtmCOYUBn/3.5/bin/python -c "import setuptools, tokenize;__file__='/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps:
    running develop
    running egg_info
    writing dependency_links to pyspark.egg-info/dependency_links.txt
    writing pyspark.egg-info/PKG-INFO
    writing requirements to pyspark.egg-info/requires.txt
    writing top-level names to pyspark.egg-info/top_level.txt
    Could not import pypandoc - required to package PySpark
    package init file 'deps/bin/__init__.py' not found (or not a regular file)
    package init file 'deps/jars/__init__.py' not found (or not a regular file)
    package init file 'pyspark/python/pyspark/__init__.py' not found (or not a regular file)
    package init file 'lib/__init__.py' not found (or not a regular file)
    package init file 'deps/data/__init__.py' not found (or not a regular file)
    package init file 'deps/licenses/__init__.py' not found (or not a regular file)
    package init file 'deps/examples/__init__.py' not found (or not a regular file)
    reading manifest file 'pyspark.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no previously-included files matching '*.py[cod]' found anywhere in distribution
    warning: no previously-included files matching '__pycache__' found anywhere in distribution
    warning: no previously-included files matching '.DS_Store' found anywhere in distribution
    writing manifest file 'pyspark.egg-info/SOURCES.txt'
    running build_ext
    Creating /tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/pyspark.egg-link (link to .)
    Adding pyspark 3.0.0.dev0 to easy-install.pth file
    Installing load-spark-env.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-submit script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-class.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing beeline.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing find-spark-home.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing run-example script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-shell2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing pyspark script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing sparkR script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-sql script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-submit.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-shell script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing beeline script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-submit2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing find-spark-home script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing sparkR.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing run-example.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing sparkR2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-shell.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-sql.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Installing spark-class2.cmd script to /tmp/tmp.EWtmCOYUBn/3.5/bin
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py", line 224, in <module>
        'Programming Language :: Python :: Implementation :: PyPy']
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 140, in setup
        return distutils.core.setup(**attrs)
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 38, in run
        self.install_for_development()
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 154, in install_for_development
        self.process_distribution(None, self.dist, not self.no_deps)
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 729, in process_distribution
        self.install_egg_scripts(dist)
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 189, in install_egg_scripts
        script_text = strm.read()
      File "/tmp/tmp.EWtmCOYUBn/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)
    
    ----------------------------------------

@kiszk (Member) commented Oct 20, 2018

Is this the oldest test failure of this type?

@HyukjinKwon (Member)

Yup, I made a fix #22782

@kiszk (Member) commented Oct 20, 2018

Thanks, I found 0xc2 in docker-image-tool.sh. I will put my finding into #22782

@dongjoon-hyun (Member)

Retest this please.

@SparkQA commented Oct 20, 2018

Test build #97665 has finished for PR 22501 at commit 64e5ede.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Merged to master.

@dongjoon-hyun (Member)

Thank you, @wangyum and all!

@asfgit closed this in 62551cc on Oct 21, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Refactor `WideSchemaBenchmark` to use main method.
1. use `spark-submit`:
```console
bin/spark-submit --class  org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
```

2. Generate benchmark result:
```console
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
```

## How was this patch tested?

manual tests

Closes apache#22501 from wangyum/SPARK-25492.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk (Member) commented Jan 5, 2020

Any idea how much memory this benchmark requires? I set 32GB, but it crashes with an OOM on master:

export SBT_OPTS="-Xmx32g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
[error] Caused by: java.lang.reflect.InvocationTargetException
[error] 	at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
[error] 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[error] 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
[error] 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
[error] 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
[error] 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
[error] 	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
[error] 	... 132 more
[error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
[error] 	at java.util.Arrays.copyOf(Arrays.java:3332)
[error] 	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
[error] 	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
[error] 	at java.lang.StringBuilder.append(StringBuilder.java:136)
[error] 	at org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
[error] 	at org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
[error] 	at org.apache.spark.sql.types.StructType$$Lambda$2439/60886146.apply(Unknown Source)
[error] 	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)

@dongjoon-hyun (Member)

It sounds like a regression, @MaxGekk. This benchmark was originally executed with the command shown in the code, without any additional settings like export SBT_OPTS="-Xmx32g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m".

@dongjoon-hyun (Member)

FYI, this was initially tested in this PR on Oct 19, 2018. The latest run was 854a0f7, Oct 3, 2019, in the [SPARK-29320][TESTS] Compare sql/core module in JDK8/11 (Part 1) PR.

@wangyum deleted the SPARK-25492 branch on January 6, 2020 at 01:17
@MaxGekk (Member) commented Jan 6, 2020

I have rechecked the benchmark on the recent master - it is still failing with OOM. I opened the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-30429
