[SPARK-22793][SQL] Memory leak in Spark Thrift Server #20029

Closed

Conversation

zuotingbing

@zuotingbing zuotingbing commented Dec 20, 2017

What changes were proposed in this pull request?

  1. Start HiveThriftServer2.
  2. Connect to thriftserver through beeline.
  3. Close the beeline.
  4. Repeat steps 2 and 3 many times.
    We found that many directories under hive.exec.local.scratchdir and hive.exec.scratchdir are never deleted. Each scratch dir is registered with deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps growing until the JVM terminates (a minimal sketch of this behavior follows the list).
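The growth comes from how Hadoop's FileSystem tracks deleteOnExit paths: each registered path stays in an in-memory set until the JVM shuts down. A minimal sketch of that behavior, using a hypothetical scratch-dir path:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Every path registered with deleteOnExit is kept in the FileSystem's
// in-memory set until JVM exit, so per-session scratch dirs accumulate.
val fs = FileSystem.get(new Configuration())
val scratchDir = new Path("/tmp/hive/session-0001") // hypothetical scratch dir
fs.mkdirs(scratchDir)
fs.deleteOnExit(scratchDir) // removed only at JVM shutdown, not at session close
```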

In addition, we used jmap -histo:live [PID]
to print a histogram of live objects in the HiveThriftServer2 process. The counts of org.apache.spark.sql.hive.client.HiveClientImpl and org.apache.hadoop.hive.ql.session.SessionState keep increasing even after all beeline connections are closed, which indicates a memory leak.

How was this patch tested?

manual tests

This PR is a follow-up to #19989.

@srowen
Member

srowen commented Dec 20, 2017

This does indeed look like the primary change when diffed against master. #19989 had some concerns about whether this affects correctness, though?

@zuotingbing
Author

Thanks @srowen. Whom should I ping to make sure this change has no side effects?

@srowen
Member

srowen commented Dec 21, 2017

I'm asking you to respond to #19989 (comment)

@cloud-fan
Contributor

the counts of org.apache.spark.sql.hive.client.HiveClientImpl and org.apache.hadoop.hive.ql.session.SessionState keep increasing

Can you check the GC roots and explain why they are increasing? The fix does not look correct to me, since we should create a new session.

@zuotingbing
Author

It seems that each time we connect to the thrift server through beeline, SessionState.start(state) is called twice: once in HiveSessionImpl.open, and once in HiveClientImpl.newSession() for sql("use default"). When the beeline connection is closed, only the HiveSession is closed via HiveSessionImpl.close(); the SessionState created via HiveClientImpl.newSession() is left over. A rough sketch of this lifecycle is below.
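A minimal sketch of that asymmetry, using Hive's SessionState API directly rather than the actual thrift-server code path (variable names are illustrative):

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

val conf = new HiveConf()

// Started when the connection is opened (HiveSessionImpl.open).
val stateFromOpen = new SessionState(conf)
SessionState.start(stateFromOpen)

// Started a second time via HiveClientImpl.newSession() for the initial "use default".
val stateFromNewSession = new SessionState(conf)
SessionState.start(stateFromNewSession)

// On disconnect only the first one is torn down (HiveSessionImpl.close());
// nothing ever calls close() on the second, so its scratch dirs linger.
stateFromOpen.close()
```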

@zuotingbing
Author

zuotingbing commented Dec 21, 2017

override protected lazy val resourceLoader: HiveSessionResourceLoader = {
  val client: HiveClient = externalCatalog.client.newSession()
  new HiveSessionResourceLoader(session, client)
}

Is it necessary to create a new HiveClient here, given that externalCatalog already has a hiveClient?
If it is necessary, we need to supply a method to close the hiveClient created here, and in that method we also need to clean up the scratch dirs (hdfsSessionPath and localSessionPath) created by HiveClientImpl.
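For reference, the change in this PR essentially reuses the catalog's shared client here instead of creating a new one per session, roughly:

```scala
// Rough sketch of the change in this PR: reuse the shared client owned by
// HiveExternalCatalog instead of creating a new HiveClient (and with it a new
// SessionState and scratch dirs) for every session.
override protected lazy val resourceLoader: HiveSessionResourceLoader = {
  val client: HiveClient = externalCatalog.client
  new HiveSessionResourceLoader(session, client)
}
```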

@gatorsmile
Member

cc @liufengdb

@zuotingbing
Author

Could you please check this PR? Thanks @liufengdb

@mgaido91
Contributor

From your analysis, what seems never to be closed is the client used to interact with the metastore. This might be a problem we are simply not aware of in normal SQL applications, since we have only one client in those cases.

What your fix does is avoid creating a client for each HiveSessionStateBuilder, thus:

  1. it would mean that we are creating more than one SessionStateBuilder, i.e. more than one SparkSession, which is not true as far as I know;
  2. every session would share the same client to connect to the metastore, which is wrong IMHO.

Please let me know if I misunderstood or got something wrong.

@liufengdb

liufengdb commented Dec 28, 2017

@zuotingbing I took a close look at the related code and think the issue you raised is valid:

  1. The hiveClient created for the resourceLoader is only used for addJar, which in turn adds the jar to the shared IsolatedClientLoader. So we can just use the shared hive client for this purpose (see the sketch after this list).

  2. Another possible reason to use a new hive client is to run this hive statement, but I think that is just a leftover from old Spark and should be removed. So overall it is fine to use the shared client from HiveExternalCatalog without creating a new hive client, as this patch does.

  3. Currently, there is no way to clean up the resources created by a new session of SQLContext/SparkSession. I don't understand the design tradeoff behind this (@srowen). So it is not easy to remove the temp dirs as part of closing a session.

  4. To what extent does Spark need these scratch dirs? Could we make this step optional if it is not needed in all deployment modes? Not sure who is the best person to answer this question.
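For context, the resource loader only forwards addJar to the client, roughly as below (paraphrased from the Spark 2.x sources; details may differ):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.client.HiveClient
import org.apache.spark.sql.internal.SessionResourceLoader

class HiveSessionResourceLoader(
    session: SparkSession,
    client: HiveClient)
  extends SessionResourceLoader(session) {
  override def addJar(path: String): Unit = {
    // Register the jar with the Hive client (and hence with the shared
    // IsolatedClientLoader), then do the normal Spark-side handling.
    client.addJar(path)
    super.addJar(path)
  }
}
```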

@mgaido91
Contributor

The hiveClient created for the resourceLoader is only used for addJar, which in turn adds the jar to the shared IsolatedClientLoader. So we can just use the shared hive client for this purpose.

@liufengdb does it mean that we are creating more than one SparkSession in the thriftserver?

@liufengdb

By this line, yes.

@liufengdb

lgtm!

@gatorsmile
Member

ok to test

@viirya
Member

viirya commented Jan 6, 2018

The hiveClient created for the resourceLoader is only used for addJar, which in turn adds the jar to the shared IsolatedClientLoader. So we can just use the shared hive client for this purpose.

Shouldn't addJar be session-based? At least it seems to be in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources

Although it looks like addJar in SessionResourceLoader for SessionState isn't session-based either, so at least we have consistent behavior.

@SparkQA

SparkQA commented Jan 6, 2018

Test build #85736 has finished for PR 20029 at commit 2b1e166.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

addJar is cross-session.

asfgit pushed a commit that referenced this pull request Jan 6, 2018

Author: zuotingbing <[email protected]>

Closes #20029 from zuotingbing/SPARK-22793.

(cherry picked from commit be9a804)
Signed-off-by: gatorsmile <[email protected]>
Member

@gatorsmile gatorsmile left a comment


Also LGTM

Thanks! Merged to master/2.3

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Dec 6, 2018