Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21276][CORE] Update lz4-java to the latest (v1.4.0) #18883

Closed
wants to merge 2 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Aug 8, 2017

What changes were proposed in this pull request?

This pr updated lz4-java to the latest (v1.4.0) and removed custom LZ4BlockInputStream. We currently use custom LZ4BlockInputStream to read concatenated byte stream in shuffle. But, this functionality has been implemented in the latest lz4-java (lz4/lz4-java#105). So, we might update the latest to remove the custom LZ4BlockInputStream.

Major diffs between the latest release and v1.3.0 in the master are as follows (lz4/lz4-java@62f7547...6d4693f);

  • fixed NPE in XXHashFactory similarly
  • Don't place resources in default package to support shading
  • Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed
  • Try to load lz4-java from java.library.path, then fallback to bundled
  • Add ppc64le binary
  • Add s390x JNI binding
  • Add basic LZ4 Frame v1.5.0 support
  • enable aarch64 support for lz4-java
  • Allow unsafeInstance() for ppc64le archiecture
  • Add unsafeInstance support for AArch64
  • Support 64-bit JNI build on Solaris
  • Avoid over-allocating a buffer
  • Allow EndMark to be incompressible for LZ4FrameInputStream.
  • Concat byte stream

How was this patch tested?

Existing tests.

@SparkQA
Copy link

SparkQA commented Aug 8, 2017

Test build #80394 has finished for PR 18883 at commit 531a216.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Aug 8, 2017

I'll update soon.

*
* TODO: merge this into upstream
*/
public final class LZ4BlockInputStream extends FilterInputStream {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess this needs a MiMa exclude. It's technically public but nobody should have ever referenced this directly. The codec class is separate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, thanks! I'll check and update soon.

@SparkQA
Copy link

SparkQA commented Aug 8, 2017

Test build #80395 has finished for PR 18883 at commit 49135d7.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@SparkQA
Copy link

SparkQA commented Aug 8, 2017

Test build #80406 has finished for PR 18883 at commit 49135d7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Aug 8, 2017

Test build #80412 has finished for PR 18883 at commit 49135d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Aug 8, 2017

@dongjoon-hyun Thanks for your test re-triggers!

@dongjoon-hyun
Copy link
Member

It's because I like your PR. :)

@kiszk
Copy link
Member

kiszk commented Aug 9, 2017

LGTM

@srowen
Copy link
Member

srowen commented Aug 9, 2017

Merged to master

@asfgit asfgit closed this in b78cf13 Aug 9, 2017
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this"),

// [SPARK-21276] Update lz4-java to the latest (v1.4.0)
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.io.LZ4BlockInputStream")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen This is a breaking change. We should not remove a public class that is in the api docs: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.io.LZ4BlockInputStream

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's "public" only insofar as it has to be in Java to use it this way. There's no case where a user should or would use this class directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the user may write some codes to run different logics according to the InputStream types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I'm not sure if we want to pursue strictly compatible. Just to point out the issue here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants