
Fix SPARK-1413: Parquet messes up stdout and stdin when used in Spark REPL #325

Closed

wants to merge 19 commits

Conversation

witgo
Contributor

@witgo witgo commented Apr 4, 2014

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

@witgo witgo changed the title Fix SPARK-1413: Parquet messes up stdout and stdin when used in Spark REPL [WIP] Fix SPARK-1413: Parquet messes up stdout and stdin when used in Spark REPL Apr 4, 2014
@marmbrus
Contributor

marmbrus commented Apr 4, 2014

ok to test

@@ -81,11 +81,14 @@ private[sql] case class ParquetRelation(val path: String)

private[sql] object ParquetRelation {

var isEnableLogForwarding = false
Contributor
We should probably just call this enabledLogForwarding. Thanks for the fix though!

@mateiz
Contributor

mateiz commented Apr 4, 2014

Jenkins, test this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@witgo
Contributor Author

witgo commented Apr 4, 2014

There is a problem and I don't know what is causing it. Running:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
case class Record(key: Int, value: String)
val sqlContext = new SQLContext(sc)
import sqlContext._
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
rdd.registerAsTable("records")
rdd.saveAsParquetFile("records.parquet")

produces:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

@marmbrus
Contributor

marmbrus commented Apr 4, 2014

@AndreSchumacher, any thoughts?

@AndreSchumacher
Contributor

Ouch. OK, I'll have a look at this. The default j.u.l. logger writes to stdout. Maybe the log forwarding doesn't work as intended.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13762/

@witgo
Contributor Author

witgo commented Apr 4, 2014

The parquet.Log class has a static block that adds a default handler in case there is none.

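What a static block like that does can be sketched against the JUL API (a minimal reconstruction for illustration, not parquet's actual code; the handler and level choices here are assumptions):

```scala
import java.util.logging.{ConsoleHandler, Level, Logger}

// Sketch of a parquet.Log-style static initializer: on class load, if the
// "parquet" logger has no handlers of its own, attach a default
// ConsoleHandler and stop delegating to the parent (root) logger. After
// this runs, records bypass any jul-to-slf4j bridge installed on the root.
object DefaultHandlerSketch {
  private val logger = Logger.getLogger("parquet")
  if (logger.getHandlers.isEmpty) {
    logger.setUseParentHandlers(false)
    logger.addHandler(new ConsoleHandler) // publishes straight to the console
    logger.setLevel(Level.INFO)
  }
  def force(): Unit = () // touching the object triggers the block above
}
```

Because this runs as a side effect of class loading, whether the REPL's console stays clean depends on whether the bridge is installed before or after parquet.Log is first touched.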
@AndreSchumacher
Contributor

@witgo that alone would not explain a deadlock if the forwarding is initialized multiple times, would it? Unless maybe the log forwarding is called before parquet.Log is fully initialized, I guess. Maybe that is why it did not show up in the tests that don't use the console. Great that you found it, and thanks a lot for the fix. I tried it and it works fine here. For completeness, let's run the Travis checks once more.

Could you still remove my now misleading comment about the parent loggers?

@witgo
Contributor Author

witgo commented Apr 5, 2014

Calling LogManager.getLogManager.reset() twice seems to cause some problems.
The loggers in Seq("parquet.hadoop.ColumnChunkPageWriteStore", ... all have Logger.getLogger("parquet") as their parent, so we only need to configure Logger.getLogger("parquet").

@pwendell
Contributor

pwendell commented Apr 5, 2014

So is the main problem here that parquet should not use a static block like this to do initialization? If so, @witgo, do you mind documenting this in the code to explain that it's a workaround? Also, should we reach out to the parquet team and @julienledem about removing this static initialization? It seems this initialization code breaks the use case where JUL is being routed to SLF4J via a bridge. Or maybe it's sensitive to the order in which the SLF4J bridge and the Parquet Log class do static initialization.

@pwendell
Contributor

pwendell commented Apr 5, 2014

Btw @julienledem in Spark we also add some default initialization, but we do it lazily instead of statically because this allows re-routing packages to statically initialize before we try to detect whether any logging has been configured:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L106
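The lazy pattern described above can be sketched like this (a simplified stand-in for Spark's Logging trait, not its actual code, with a counter added only to show the body runs once):

```scala
// Sketch of lazy, idempotent log initialization with double-checked locking.
// Nothing happens at class-load time, so any bridge another library installs
// statically is already in place when initializeIfNecessary() first runs.
object LazyLoggingSketch {
  @volatile private var initialized = false
  private val initLock = new Object()
  @volatile var initCount = 0 // illustration only: how often the body ran

  def initializeIfNecessary(): Unit = {
    if (!initialized) {
      initLock.synchronized {
        if (!initialized) { // re-check under the lock
          // ... detect whether a logging backend is configured and
          // install defaults only if nothing else has ...
          initCount += 1
          initialized = true
        }
      }
    }
  }
}
```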

@AndreSchumacher
Contributor

@witgo Good idea to move the forwarding setup out of there but in the process it seems that edd9630 broke the forwarding again.

Also: I think it is confusing to set the log level to null (root level). Parquet hides most log statements behind if (level >= WARN) etc. and relies on the compiler to remove these for performance reasons. So we should just do the same as parquet.Log does and do parquetLogger.setLevel(parquet.Log.LEVEL) so we won't confuse users. Note that the actual level can be still set via Log4J (assuming it's not finer than that).
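Concretely, the suggestion is to mirror parquet's compiled-in default rather than clearing the level (a sketch only; Level.INFO stands in here for parquet.Log.LEVEL, whose actual value is an assumption):

```scala
import java.util.logging.{Level, Logger}

val parquetLogger = Logger.getLogger("parquet")
// Match the level parquet compiled its log statements against, instead of
// setLevel(null); Level.INFO is a placeholder for parquet.Log.LEVEL.
parquetLogger.setLevel(Level.INFO)
// Finer filtering can still happen downstream, e.g. via Log4J.
```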

@witgo
Contributor Author

witgo commented Apr 7, 2014

@AndreSchumacher
parquet.Log has a static block that adds a default handler in case there is none.
The following code undoes that configuration on Logger.getLogger("parquet"):

val parquetLogger = java.util.logging.Logger.getLogger("parquet")
// Detach any handlers parquet.Log installed, clear the explicit level, and
// re-enable delegation to the parent handlers (where the SLF4J bridge lives).
parquetLogger.getHandlers.foreach(parquetLogger.removeHandler)
if (parquetLogger.getLevel != null) parquetLogger.setLevel(null)
if (!parquetLogger.getUseParentHandlers) parquetLogger.setUseParentHandlers(true)

@AndreSchumacher
Contributor

@witgo yes, it seems the parent handlers need to be set for the forwarding to work. My only remaining suggestion concerns the log level. I still think setting it to anything other than the pre-compiled one would be confusing (see my comment above).

@pwendell are you OK with having the initialization code there where it is now?

andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Apr 7, 2014
Modify spark on yarn to create SparkConf process
@@ -135,4 +136,6 @@ trait Logging {
private object Logging {
@volatile private var initialized = false
val initLock = new Object()
// SLF4JBridgeHandler.removeHandlersForRootLogger()
Contributor

Is this line not actually necessary?

If it turns out we need to directly call SLF4JBridgeHandler, let's change the call to just use reflection and fail gracefully if it's not found on the classpath. It's not pretty, but it will be friendlier to downstream users. And we can drop it if Parquet ditches static initializers.

Contributor

Specifically, I'd first check if the class can be loaded, then if it can, use java reflection to access and call the class methods.

Contributor Author

You are right. I'm not familiar with Scala reflection; could you give an example?

Contributor

For static methods it looks like this:

try {
  // We use reflection here to handle the case where users remove the
  // slf4j-to-jul bridge in order to route their logs to JUL.
  val bridgeClass = Class.forName("org.slf4j.bridge.SLF4JBridgeHandler")
  bridgeClass.getMethod("removeHandlersForRootLogger").invoke(null)
  val installed = bridgeClass.getMethod("isInstalled").invoke(null).asInstanceOf[Boolean]
  if (!installed) {
    bridgeClass.getMethod("install").invoke(null)
  }
} catch {
  case e: Exception => // can't log anything yet, so just fail silently
}

@pwendell
Contributor

@witgo I gave an example of using reflection - mind adjusting your patch and checking if this works?

@witgo
Contributor Author

witgo commented Apr 10, 2014

@pwendell
Thank you, this patch works.

But it only solves one problem. Spark's logging dependencies are fixed, so we can only use log4j.

In SparkBuild.scala file:

    "log4j"                      % "log4j"            % "1.2.17",
    "org.slf4j"                  % "slf4j-api"        % slf4jVersion,
    "org.slf4j"                  % "slf4j-log4j12"    % slf4jVersion,
    "org.slf4j"                  % "jul-to-slf4j"     % slf4jVersion,
    "org.slf4j"                  % "jcl-over-slf4j"   % slf4jVersion,

Should this become a configurable logging system?

@pwendell
Contributor

No, we don't need to change the dependencies. The case I wanted to handle is users downstream removing the dependencies (e.g. removing "jul-to-slf4j") and doing a custom Spark build. This is something people actually do.

@pwendell
Contributor

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13996/

@pwendell
Contributor

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13998/

@AndreSchumacher
Contributor

I'm seeing some Akka related problems in my local build but nothing that would seem related to this PR. The log forwarding seems to work.

@pwendell thanks a lot for the help on this. In the interest of getting saveAsParquetFile working inside the console, I think this should be merged soon.

@pwendell
Contributor

@AndreSchumacher it's likely related to an accidental merge of a buggy patch into master... try rebasing to the latest master which includes a reversion of the bug.

@pwendell
Contributor

Okay thanks, merged!

@asfgit asfgit closed this in a74fbbb Apr 10, 2014
@witgo witgo deleted the SPARK-1413 branch April 11, 2014 00:04
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
… REPL

Author: witgo <[email protected]>

Closes apache#325 from witgo/SPARK-1413 and squashes the following commits:

e57cd8e [witgo] use scala reflection to access and call the SLF4JBridgeHandler  methods
45c8f40 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
5e35d87 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
0d5f819 [witgo] review commit
45e5b70 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
fa69dcf [witgo] Merge branch 'master' into SPARK-1413
3c98dc4 [witgo] Merge branch 'master' into SPARK-1413
38160cb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
ba09bcd [witgo] remove set the parquet log level
a63d574 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
5231ecd [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
3feb635 [witgo] parquet logger use parent handler
fa00d5d [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
8bb6ffd [witgo] enableLogForwarding note fix
edd9630 [witgo]  move to
f447f50 [witgo] merging master
5ad52bd [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1413
76670c1 [witgo] review commit
70f3c64 [witgo] Fix SPARK-1413
tnachen pushed a commit to tnachen/spark that referenced this pull request Jun 4, 2017
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019