Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[minor] [sql] Partial revert of e683182c3e. #6243

Closed
wants to merge 1 commit into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented May 18, 2015

Moving this file to a different module breaks the maven build; because it
exposes a Guava type in an API used from a different module, unit tests
that end up calling into this code will fail because Guava is shaded in
the maven build.

Moving this file to a different module breaks the maven build; because it
exposes a Guava type in an API used from a different module, unit tests
that end up calling into this code will fail because Guava is shaded in
the maven build.
@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

/cc @rxin

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 18, 2015

Test build #33016 has started for PR 6243 at commit c1658cb.

@SparkQA
Copy link

SparkQA commented May 18, 2015

Test build #33016 has finished for PR 6243 at commit c1658cb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33016/
Test FAILed.

@JoshRosen
Copy link
Contributor

/cc @yhuai as well.

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

Jenkins, retest this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 18, 2015

Test build #33018 has started for PR 6243 at commit c1658cb.

@andrewor14
Copy link
Contributor

@vanzin @JoshRosen I was looking at the master maven builds and it appears that they have been passing for a few days (until today, but that's caused by a separate build break #6244):

https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/

even though e683182 was merged 5 days ago. Could you point me to specific instance where this causes a build break / test failure?

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

Do those builds run unit tests?

The issue is when you run the spark-sql unit tests; unit tests are run against the unshaded classes, so SQLContext still references JavaTypeInference.inferDataType(com.google.common.reflect.TypeToken). But the JavaTypeInference class in the classpath now would come from the catalyst jar, which would have been shaded, and would have a different signature (JavaTypeInference.inferDataType(org.spark-project.guava.reflect.TypeToken)). And the test would fail with a "NoSuchMethodError" or something.

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

Actually, after writing that, it's a little weird that it compiles at all; but I did run into the unit test issue. Let me clean up my local env and try again. (With that whole change reverted, things work fine locally.)

@JoshRosen
Copy link
Contributor

@andrewor14, maybe this is caused by that issue where the master Maven build didn't run SQL tests on every commit? We should prioritize merging #5955 to fix this.

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

@JoshRosen that PR seems to be for sbt only? (IIRC the dev/ scripts use sbt.)

@andrewor14
Copy link
Contributor

According to @yhuai #5955 is for both SBT and maven. So my guess is that if we merge in #5955 right now we will start seeing test failures that will be fixed by this patch. Is that correct @vanzin?

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

Here's what I ran into:

  • Running "mvn install -DskipTests" for everything works. Why this works is a little of a mystery to me (since spark-sql should be building against the shaded spark-catalyst artifact and thus fail), but...

  • If you run spark-sql unit tests, they fail:

    Running test.org.apache.spark.sql.JavaDataFrameSuite
    Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.621 sec <<< FAILURE! - in test.org.apache.spark.sql.JavaDataFrameSuite
    testCreateDataFrameFromJavaBeans(test.org.apache.spark.sql.JavaDataFrameSuite) Time elapsed: 0.028 sec <<< ERROR!
    java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(Lcom/google/common/reflect/TypeToken;)Lscala/Tuple2;
    at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:982)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:501)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:530)
    at test.org.apache.spark.sql.JavaDataFrameSuite.testCreateDataFrameFromJavaBeans(JavaDataFrameSuite.java:147)

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

@andrewor14 if maven builds are currently not running spark-sql tests, then yes, pushing that PR will cause builds to fail, and this one should fix that.

@yhuai
Copy link
Contributor

yhuai commented May 18, 2015

@vanzin I checked https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/2393/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/consoleFull. I found

Running test.org.apache.spark.sql.JavaDataFrameSuite
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.789 sec - in test.org.apache.spark.sql.JavaDataFrameSuite

Any idea on the reason that this test passed in jenkins?

@SparkQA
Copy link

SparkQA commented May 18, 2015

Test build #33018 has finished for PR 6243 at commit c1658cb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33018/
Test FAILed.

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

@yhuai that's interesting. I don't have a good explanation. I'm trying to reproduce that locally to see if I missed something. The only difference I can see is that jenkins is running mvn package while I'm running mvn install.

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

So, the only explanation I have is: mvn package does not deploy things to the local repository. But the subsequent mvn test picks up things from the local reposistory. So you're running the tests against a different set of artifacts than the ones that were built.

I checked by running mvn test -X and the classpaths all point to files in ~/.m2/repository. No references to the classes directory under the target dir, except for the particular module being currently built (spark-sql in my case).

That would actually imply that the maven builds are technically broken. Tests are being run against some random artifact that is under the jenkins account's home dir, not against what was just built. IMO the correct way to do things would be to have a per-build local repository and use mvn install instead of mvn package; I'm not sure how you'd do that (I think in our internal builds we use chroot for that).

@vanzin
Copy link
Contributor Author

vanzin commented May 18, 2015

(Update: I checked ~/.m2/repository on the jenkins machine and it doesn't contain any 1.4.0-SNAPSHOT artifacts, so that kinda puts a damper on my theory. I'm still trying to figure out why the compilation succeeds while the tests fail - that still doesn't make sense to me. It's probably some obscure maven thing that I'm not familiar with.)

@vanzin
Copy link
Contributor Author

vanzin commented May 19, 2015

For those curious, the compilation thing seems to be something super weird with scalac. This code compiles fine:

package org.apache.spark.sql

import org.apache.spark.sql.catalyst.JavaTypeInference
import com.google.common.reflect.TypeToken

object foo{

  def main(args: Array[String]): Unit = {
    JavaTypeInference.inferDataType(TypeToken.of(classOf[Integer]))
  }

}

While the following java code:

import org.apache.spark.sql.catalyst.JavaTypeInference$;
import com.google.common.reflect.TypeToken;

class foo {

  public static void main(String[] args) {
    JavaTypeInference$.inferDataType(TypeToken.of(foo.class));
  }

}

Fails with the expected error:

test.java:7: inferDataType(org.spark-project.guava.reflect.TypeToken<?>) in org.apache.spark.sql.catalyst.JavaTypeInference$ cannot be applied to (com.google.common.reflect.TypeToken<foo>)
  JavaTypeInference$.inferDataType(TypeToken.of(foo.class));

Perhaps it has something to do with the fact that we're using a technically illegal package name for the shaded classes (org.spark-project).

@andrewor14
Copy link
Contributor

I see. Unfortunately I am not familiar enough with scalac or SQL to comment on this in any substantial way. @marmbrus @rxin any ideas?

@marmbrus
Copy link
Contributor

test this please

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 20, 2015

Test build #33115 has started for PR 6243 at commit c1658cb.

@SparkQA
Copy link

SparkQA commented May 20, 2015

Test build #33115 has finished for PR 6243 at commit c1658cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33115/
Test PASSed.

@marmbrus
Copy link
Contributor

Are we sure that it was changing the package that broke this? Is it possible that the maven shading plugin is not playing nicely with the scala compiler?

Either way, I'm okay with this change if it solves the problem.

@vanzin
Copy link
Contributor Author

vanzin commented May 20, 2015

Are we sure that it was changing the package that broke this?

Not changing of the package, but changing of the module where the file resides. That's what causes shading to get between SQLContext tests and their use of JavaTypeInference.

@marmbrus
Copy link
Contributor

Ah, thanks for explaining. I'm still okay with this if we don't have any other ideas about how to fix it.

@vanzin
Copy link
Contributor Author

vanzin commented May 21, 2015

So, can we push this? Or does anyone have a better idea of how to fix it?

@srowen
Copy link
Member

srowen commented May 25, 2015

I'm going to commit this soon if there are no further comments.

@rxin
Copy link
Contributor

rxin commented May 25, 2015

I'm still confused. Is our Jenkins maven build broken? If not, what builds are broken?

Is this related to the "catalyst" module, or the package name? If it is about package name, I think it is better to put this in core's catalyst module.

@vanzin
Copy link
Contributor Author

vanzin commented May 25, 2015

Any build that does "mvn install" followed by "mvn test" is broken. Try for yourself.

As I explained above, the problem is the file living in a different module - as in a different project in the tree. It has nothing to do with the package name. See how the file is being moved - I just placed it where it lived before, which was under a different package.

@rxin
Copy link
Contributor

rxin commented May 25, 2015

Thanks - does this mean shading for "catalyst" module is broken?

@rxin
Copy link
Contributor

rxin commented May 26, 2015

BTW if we want to merge this, let's create a catalyst package in core for now and move that there.

@rxin
Copy link
Contributor

rxin commented May 26, 2015

(And don't we need to update Jenkins to make sure it is running the appropriate tests?)

@vanzin
Copy link
Contributor Author

vanzin commented May 26, 2015

Thanks - does this mean shading for "catalyst" module is broken?

No, it just means that you cannot expose inter-module APIs that expose shaded classes without breaking things. As for the package, it sounds weird to create a catalyst package inside sql/core, since there's none at the moment. Which is why I chose to just do a plain revert of that part of the original patch.

Regarding the maven build, I don't know. Everything I tried locally did not work, so if something works in jenkins here, it sure to me sounds like something might be broken in its setup.

@rxin
Copy link
Contributor

rxin commented May 27, 2015

Thanks for the explanation. Then this pull request should fix the problem right? #6431

@vanzin
Copy link
Contributor Author

vanzin commented May 27, 2015

Yes, that should achieve the same thing. I'll close this one then.

@vanzin vanzin closed this May 27, 2015
asfgit pushed a commit that referenced this pull request May 27, 2015
This should also close #6243.

Author: Reynold Xin <[email protected]>

Closes #6431 from rxin/JavaTypeInference-guava and squashes the following commits:

e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
asfgit pushed a commit that referenced this pull request May 27, 2015
This should also close #6243.

Author: Reynold Xin <[email protected]>

Closes #6431 from rxin/JavaTypeInference-guava and squashes the following commits:

e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.

(cherry picked from commit 6fec1a9)
Signed-off-by: Reynold Xin <[email protected]>
@vanzin vanzin deleted the sql-fix branch June 2, 2015 18:24
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
This should also close apache#6243.

Author: Reynold Xin <[email protected]>

Closes apache#6431 from rxin/JavaTypeInference-guava and squashes the following commits:

e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
This should also close apache#6243.

Author: Reynold Xin <[email protected]>

Closes apache#6431 from rxin/JavaTypeInference-guava and squashes the following commits:

e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants