
SPARK-6433 hive tests to import spark-sql test JAR for QueryTest access #5119

Conversation

steveloughran
Contributor

  1. Test JARs are built & published.
  2. log4j.properties is explicitly excluded from the test JARs. Without this, logging in downstream test runs depends on the order in which the JARs are listed/loaded.
  3. sql/hive pulls in the spark-sql and spark-catalyst test JARs for its test runs (see the dependency sketch after this list).
  4. The copied-in test classes were removed, and a test was edited to drop its now-duplicate assert method.
  5. Spark streaming is now built with the same plugin/phase as the rest, but its shade plugin declaration is kept in (so it differs from the other test plugins). Due to (2), its test JAR no longer includes its log4j file.
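
For reference, a minimal sketch of how a module such as sql/hive can consume another module's test JAR in Maven. The artifact ID, version property, and scope below are illustrative assumptions rather than the literal entries in the patch:

    <!-- illustrative sketch: depend on spark-sql's test JAR for the test runs -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>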

Outstanding issues:

  • Should the JARs be shaded? spark-streaming-test.jar is, but given these are test JARs for developers only, especially within the same Spark source tree, it's hard to justify.
  • maven-jar-plugin v2.6 was explicitly selected; without this, the apache parent template's JAR plugin version (2.4) is chosen.
  • Are there any other resources to exclude?

@steveloughran steveloughran changed the title SPARK-6433 patch 001: test JARs are built; sql/hive pulls in spark-sql &... SPARK-6433 hive tests to import spark-sql test JAR for QueryTest access Mar 21, 2015
@AmplabJenkins

Can one of the admins verify this patch?

@@ -158,6 +158,7 @@
<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
<snappy.version>1.1.1.6</snappy.version>
<netlib.java.version>1.1.2</netlib.java.version>
<maven-jar-plugin.version>2.6</maven-jar-plugin.version>
Member

Nit: I don't think this needs to be factored out into a property, since I don't think it would be overridden, and it only occurs in one place. The plugin versions are just written once in the parent.
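
A minimal sketch of the alternative being suggested, assuming the version is pinned once in the parent POM's pluginManagement rather than via a property (illustrative, not the literal patch):

    <!-- illustrative: pin the jar plugin version once, in the parent's pluginManagement -->
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-jar-plugin</artifactId>
          <version>2.6</version>
        </plugin>
      </plugins>
    </pluginManagement>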

Contributor Author

Lifted it from the Hadoop code. The parent one is at v2.4, so it comes down to whether you're happy with what that parent gives you or not. Easy to alter.

@steveloughran
Contributor Author

Updated patch with the indentation corrected; the plugin version is entrusted to the apache parent template.

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
Member

Sorry, I'm still not sure what this config is for. What does it do?

@marmbrus
Contributor

+1 from the SQL side. I only duplicated that code because I was afraid of the Maven changes needed to avoid doing so. I'll defer to Sean's Maven expertise on when this is ready to be merged.

@steveloughran
Contributor Author

As an experiment, I changed the plugin declaration to exclude the main JAR phase; that is, I commented out this bit:

          <execution>
            <id>prepare-jar</id>
            <phase>prepare-package</phase>
            <goals>
              <goal>jar</goal>
            </goals>
          </execution>

Running mvn help:effective-pom showed that the entire executions section had been overridden, with the result that the main build didn't work: JARs weren't being published.

       <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>2.4</version>
          <executions>
            <execution>
              <id>prepare-test-jar</id>
              <phase>prepare-package</phase>
              <goals>
                <goal>test-jar</goal>
              </goals>
              <configuration>
                <excludes>
                  <exclude>log4j.properties</exclude>
                </excludes>
                <archive>
                  <manifestEntries>
                    <mode>development</mode>
                    <url>http://spark.apache.org/</url>
                  </manifestEntries>
                  <manifest>
                    <addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
                    <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
                  </manifest>
                </archive>
              </configuration>
            </execution>
          </executions>

To summarise: yes, you do need to re-declare the parent's executions when adding new ones. This is not just superstition.
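
For reference, a minimal sketch of the combined declaration being described at this point in the thread, re-declaring the parent's default jar execution alongside the new test-jar execution (archive details trimmed for brevity; as discussed below, the explicit jar execution was later dropped again):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <executions>
        <!-- re-declare the default jar goal so the main JARs are still built -->
        <execution>
          <id>prepare-jar</id>
          <phase>prepare-package</phase>
          <goals>
            <goal>jar</goal>
          </goals>
        </execution>
        <!-- add the test-jar goal, with the test log4j.properties excluded -->
        <execution>
          <id>prepare-test-jar</id>
          <phase>prepare-package</phase>
          <goals>
            <goal>test-jar</goal>
          </goals>
          <configuration>
            <excludes>
              <exclude>log4j.properties</exclude>
            </excludes>
          </configuration>
        </execution>
      </executions>
    </plugin>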

@srowen
Member

srowen commented Mar 26, 2015

Hm, so I tried changing the config to:

        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-jar-plugin</artifactId>
          <version>2.4</version>
          <executions>
            <execution>
              <id>prepare-test-jar</id>
              <phase>prepare-package</phase>
              <goals>
                <goal>test-jar</goal>
              </goals>
              <configuration>
                <excludes>
                  <exclude>log4j.properties</exclude>
                </excludes>
              </configuration>
            </execution>
          </executions>
        </plugin>

The effective POM doesn't show a default jar goal, yeah. But I saw -tests.jar and .jar artifacts in all the modules. Are you sure the main JARs weren't created, or just that it didn't show in the effective POM? I have another project that declares basically what's above, and it definitely makes both JARs and test JARs.

@steveloughran
Contributor Author

I did a clean build and it didn't work, at least not with this command:

    mvn clean install -DskipTests -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Dhbase.profile=hadoop2 -Phive

Put the jar execution back in and it worked again. Maven 3.0.5, BTW.

I can update this pull request with a version without the target and see what Jenkins says.

@srowen
Member

srowen commented Mar 27, 2015

Hm, could be a Maven version thing; I'm on 3.2. Are you looking at target or what is actually installed in .m2? I was looking at target. Heh, yeah maybe see what Jenkins has to say. Otherwise, no point belaboring it more and we can just leave in the declaration. (Though I still think the archive bit isn't essential?)

@steveloughran
Contributor Author

I commented it out again and it did work, so I now conclude I trust Maven even less than before. Pushed a new commit with the JAR execution omitted.

@srowen
Member

srowen commented Mar 28, 2015

Jenkins, add to whitelist

@srowen
Member

srowen commented Mar 28, 2015

ok to test

@srowen
Member

srowen commented Mar 28, 2015

Looking good, except I still don't think the <archive> stanza is needed. The manifest isn't used in Spark, and since it gets mashed down in the assembly JAR, I tend not to want to put things in it that may not end up being the same in the final assembly.

@SparkQA

SparkQA commented Mar 28, 2015

Test build #29341 has started for PR 5119 at commit c2b5f89.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 28, 2015

Test build #29341 has finished for PR 5119 at commit c2b5f89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29341/
Test PASSed.

</goals>
<configuration>
<excludes>
<exclude>log4j.properties</exclude>
Member

Just being thorough here at the end: so the idea is that the tests shouldn't 'publish' their logging config, right? I think that will only affect the log4j.properties in the test dirs, yeah. As long as it doesn't remove the one actually used, in the core source tree, from the final assembly.

@steveloughran
Contributor Author

It leaves the log4j.properties out of every test JAR, to stop it contaminating downstream tests. You don't want to be trying to debug exactly which log4j file is coming first on the classpath. It doesn't touch the production JARs.

@steveloughran
Contributor Author

To clarify something: do you expect there to be a log4j.properties file in the Spark assembly JAR? Because there isn't one, not in trunk @ 0e2753f:

 $ mvn package -DskipTests  -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0  -Dhbase.profile=hadoop2 -Phive
 $ pushd assembly/target/scala-2.10/
 $ unzip spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar
 $ find . -name log4j.properties
 $

Looking in the source tree, such files appear only in the test resources, apart from one in kinesis-asl:

$ find . -name log4j.properties
./bagel/src/test/resources/log4j.properties
./bagel/target/scala-2.10/test-classes/log4j.properties
./core/src/test/resources/log4j.properties
./core/target/scala-2.10/test-classes/log4j.properties
./external/flume/src/test/resources/log4j.properties
./external/flume/target/scala-2.10/test-classes/log4j.properties
./external/flume-sink/src/test/resources/log4j.properties
./external/flume-sink/target/scala-2.10/test-classes/log4j.properties
./external/kafka/src/test/resources/log4j.properties
./external/kafka/target/scala-2.10/test-classes/log4j.properties
./external/mqtt/src/test/resources/log4j.properties
./external/mqtt/target/scala-2.10/test-classes/log4j.properties
./external/twitter/src/test/resources/log4j.properties
./external/twitter/target/scala-2.10/test-classes/log4j.properties
./external/zeromq/src/test/resources/log4j.properties
./external/zeromq/target/scala-2.10/test-classes/log4j.properties
./extras/java8-tests/src/test/resources/log4j.properties
./extras/kinesis-asl/src/main/resources/log4j.properties
./extras/kinesis-asl/src/test/resources/log4j.properties
./graphx/src/test/resources/log4j.properties
./graphx/target/scala-2.10/test-classes/log4j.properties
./launcher/src/test/resources/log4j.properties
./launcher/target/scala-2.10/test-classes/log4j.properties
./mllib/src/test/resources/log4j.properties
./mllib/target/scala-2.10/test-classes/log4j.properties
./repl/src/test/resources/log4j.properties
./repl/target/scala-2.10/test-classes/log4j.properties
./sql/catalyst/src/test/resources/log4j.properties
./sql/catalyst/target/scala-2.10/test-classes/log4j.properties
./sql/core/src/test/resources/log4j.properties
./sql/core/target/scala-2.10/test-classes/log4j.properties
./sql/hive/src/test/resources/log4j.properties
./sql/hive/target/scala-2.10/test-classes/log4j.properties
./streaming/src/test/resources/log4j.properties
./streaming/target/scala-2.10/test-classes/log4j.properties
./yarn/src/test/resources/log4j.properties
./yarn/target/scala-2.10/test-classes/log4j.properties

This pull request will make sure that none of the test log4j files will make it onto the classpath of any downstream tests importing the test JARs.

I manually tested this by:

  1. Switching to this pull-request branch
  2. Creating a core/src/main/resources/log4j.properties file:
      echo "#inserted into core" > core/src/main/resources/log4j.properties
  3. Building a clean package
  4. Verifying the file was in the spark-core JAR
  5. Verifying it made it through to the assembly JAR

@srowen
Member

srowen commented Mar 29, 2015

Yes, that all looks correct with regard to log4j.properties. The remaining issue, I think, is the <archive> stanza, which does not need to be added.

@srowen
Member

srowen commented Mar 31, 2015

I'm ready to merge this but still haven't heard anything on the <archive> element. Unless there's a need for it, it should be removed.

@SparkQA

SparkQA commented Mar 31, 2015

Test build #29488 has started for PR 5119 at commit a6dca33.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29488/
Test FAILed.

@srowen
Member

srowen commented Mar 31, 2015

Jenkins, retest this please. (Timeout, but doesn't look like any stuck test; it was almost done with Python tests when it stopped.)

@SparkQA

SparkQA commented Mar 31, 2015

Test build #29498 has started for PR 5119 at commit a6dca33.

@SparkQA

SparkQA commented Mar 31, 2015

Test build #29498 has finished for PR 5119 at commit a6dca33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29498/
Test PASSed.

@srowen
Member

srowen commented Mar 31, 2015

I like this. Let me give @pwendell a day or so to comment since it touches the build but I think this is a good change.

@@ -1472,6 +1473,25 @@
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
</plugin>
<!-- Build the JARs-->
Contributor

Can this say something more useful? Maybe like:

<!-- Build test-jar's for all projects, since some projects depend on tests from others -->

Otherwise it might be hard for someone to understand what is going on here.

@pwendell
Contributor

pwendell commented Apr 1, 2015

The overall approach LGTM, but I would suggest adding a better comment since it's non-obvious what is going on.

@steveloughran
Contributor Author

OK

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29538 has started for PR 5119 at commit 81ceb01.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29538/
Test FAILed.

@srowen
Member

srowen commented Apr 1, 2015

Jenkins, retest this please. (Timeout again.)

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29544 has started for PR 5119 at commit 81ceb01.

@SparkQA

SparkQA commented Apr 1, 2015

Test build #29544 has finished for PR 5119 at commit 81ceb01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29544/
Test PASSed.

@asfgit asfgit closed this in ee11be2 Apr 1, 2015