[SPARK-22793][SQL] Memory leak in Spark Thrift Server #20029

Closed

Conversation

zuotingbing

@zuotingbing zuotingbing commented Dec 20, 2017

What changes were proposed in this pull request?

  1. Start HiveThriftServer2.
  2. Connect to thriftserver through beeline.
  3. Close the beeline.
  4. Repeat steps 2 and 3 many times.
    We found that many directories under hive.exec.local.scratchdir and hive.exec.scratchdir are never deleted. Each scratch dir is registered with deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps growing until the JVM terminates (a minimal sketch of this behavior follows the list).
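The growth comes from how Hadoop's FileSystem tracks deleteOnExit paths: each registered path stays in an in-memory set until the JVM shuts down. A minimal sketch of that behavior, using a hypothetical scratch-dir path:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Every path registered with deleteOnExit is kept in the FileSystem's
// in-memory set until JVM exit, so per-session scratch dirs accumulate.
val fs = FileSystem.get(new Configuration())
val scratchDir = new Path("/tmp/hive/session-0001") // hypothetical scratch dir
fs.mkdirs(scratchDir)
fs.deleteOnExit(scratchDir) // removed only at JVM shutdown, not at session close
```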

In addition, we used jmap -histo:live [PID]
to print a histogram of live objects in the HiveThriftServer2 process. The counts of org.apache.spark.sql.hive.client.HiveClientImpl and org.apache.hadoop.hive.ql.session.SessionState keep increasing even after all beeline connections are closed, which indicates a memory leak.

How was this patch tested?

manual tests

This PR is a follow-up to #19989.

@srowen
Member

srowen commented Dec 20, 2017

This does indeed look like the primary change when diffed against master. #19989 had some concerns about whether this affects correctness, though?

@zuotingbing
Author

Thanks @srowen. Whom should I ping to make sure this change has no side effects?

@srowen
Member

srowen commented Dec 21, 2017

I'm asking you to respond to #19989 (comment)

@cloud-fan
Contributor

the counts of org.apache.spark.sql.hive.client.HiveClientImpl and org.apache.hadoop.hive.ql.session.SessionState keep increasing

Can you check the GC roots and explain why they are increasing? The fix does not look correct to me, since we should create a new session.

@zuotingbing
Author

It seems that each time we connect to the thrift server through beeline, SessionState.start(state) is called twice: once in HiveSessionImpl.open, and once in HiveClientImpl.newSession() for sql("use default"). When the beeline connection is closed, only the HiveSession is closed via HiveSessionImpl.close(); the SessionState created via HiveClientImpl.newSession() is left over. A rough sketch of this lifecycle is below.
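A minimal sketch of that asymmetry, using Hive's SessionState API directly rather than the actual thrift-server code path (variable names are illustrative):

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

val conf = new HiveConf()

// Started when the connection is opened (HiveSessionImpl.open).
val stateFromOpen = new SessionState(conf)
SessionState.start(stateFromOpen)

// Started a second time via HiveClientImpl.newSession() for the initial "use default".
val stateFromNewSession = new SessionState(conf)
SessionState.start(stateFromNewSession)

// On disconnect only the first one is torn down (HiveSessionImpl.close());
// nothing ever calls close() on the second, so its scratch dirs linger.
stateFromOpen.close()
```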

@zuotingbing
Author

zuotingbing commented Dec 21, 2017

override protected lazy val resourceLoader: HiveSessionResourceLoader = {
  val client: HiveClient = externalCatalog.client.newSession()
  new HiveSessionResourceLoader(session, client)
}

Is it necessary to create a new HiveClient here, given that externalCatalog already has a hiveClient?
If it is necessary, we need to supply a method to close the hiveClient created here, and in that method we also need to clean up the scratch dirs (hdfsSessionPath and localSessionPath) created by HiveClientImpl.
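For reference, the change in this PR essentially reuses the catalog's shared client here instead of creating a new one per session, roughly:

```scala
// Rough sketch of the change in this PR: reuse the shared client owned by
// HiveExternalCatalog instead of creating a new HiveClient (and with it a new
// SessionState and scratch dirs) for every session.
override protected lazy val resourceLoader: HiveSessionResourceLoader = {
  val client: HiveClient = externalCatalog.client
  new HiveSessionResourceLoader(session, client)
}
```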

@gatorsmile
Member

cc @liufengdb

@zuotingbing
Author

Could you please check this PR? Thanks @liufengdb

@mgaido91
Contributor

From your analysis, what seems never to be closed is the client used to interact with the metastore. This might be a problem we are simply not aware of in normal SQL applications, since we have only one client in those cases.

What your fix does is avoid creating a client for each HiveSessionStateBuilder, thus:

  1. it would mean that we are creating more than one SessionStateBuilder, i.e. more than one SparkSession, which is not true as far as I know;
  2. every session would share the same client to connect to the metastore, which is wrong IMHO.

Please let me know if I misunderstood or got something wrong.

@liufengdb

liufengdb commented Dec 28, 2017

@zuotingbing I took a close look at the related code and think the issue you raised is valid:

  1. The hiveClient created for the resourceLoader is only used for addJar, which in turn adds the jar to the shared IsolatedClientLoader. So we can just use the shared hive client for this purpose (see the sketch after this list).

  2. Another possible reason to use a new hive client is to run this hive statement, but I think that is just a leftover from old Spark and should be removed. So overall it is fine to use the shared client from HiveExternalCatalog without creating a new hive client, as this patch does.

  3. Currently, there is no way to clean up the resources created by a new session of SQLContext/SparkSession. I don't understand the design tradeoff behind this (@srowen). So it is not easy to remove the temp dirs as part of closing a session.

  4. To what extent does Spark need these scratch dirs? Could we make this step optional if it is not needed in all deployment modes? Not sure who is the best person to answer this question.
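For context, the resource loader only forwards addJar to the client, roughly as below (paraphrased from the Spark 2.x sources; details may differ):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.client.HiveClient
import org.apache.spark.sql.internal.SessionResourceLoader

class HiveSessionResourceLoader(
    session: SparkSession,
    client: HiveClient)
  extends SessionResourceLoader(session) {
  override def addJar(path: String): Unit = {
    // Register the jar with the Hive client (and hence with the shared
    // IsolatedClientLoader), then do the normal Spark-side handling.
    client.addJar(path)
    super.addJar(path)
  }
}
```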

@mgaido91
Contributor

The hiveClient created for the resourceLoader is only used for addJar, which in turn adds the jar to the shared IsolatedClientLoader. So we can just use the shared hive client for this purpose.

@liufengdb does it mean that we are creating more than one SparkSession in the thriftserver?

@liufengdb

By this line, yes.

@liufengdb

lgtm!

@gatorsmile
Member

ok to test

@viirya
Member

viirya commented Jan 6, 2018

The hiveClient created for the resourceLoader is only used for addJar, which in turn adds the jar to the shared IsolatedClientLoader. So we can just use the shared hive client for this purpose.

Shouldn't addJar be session-based? At least it seems to be in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources

Although it looks like addJar in SessionResourceLoader for SessionState isn't session-based either, so at least we have consistent behavior.

@SparkQA

SparkQA commented Jan 6, 2018

Test build #85736 has finished for PR 20029 at commit 2b1e166.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

addJar is cross-session.

asfgit pushed a commit that referenced this pull request Jan 6, 2018

Author: zuotingbing <[email protected]>

Closes #20029 from zuotingbing/SPARK-22793.

(cherry picked from commit be9a804)
Signed-off-by: gatorsmile <[email protected]>
Member

@gatorsmile gatorsmile left a comment


Also LGTM

Thanks! Merged to master/2.3

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Dec 6, 2018