Skip to content

Commit

Permalink
[SPARK-3121] Wrong implementation of implicit bytesWritableConverter
Browse files Browse the repository at this point in the history
val path = ... //path to seq file with BytesWritable as type of both key and value
val file = sc.sequenceFile[Array[Byte],Array[Byte]](path)
file.take(1)(0)._1

This prints incorrect content of byte array. Actual content starts with correct one and some "random" bytes and zeros are appended. BytesWritable has two methods:

getBytes() - return content of all internal array which is often longer then actual value stored. It usually contains the rest of previous longer values

copyBytes() - return just begining of internal array determined by internal length property

It looks like in implicit conversion between BytesWritable and Array[byte] getBytes is used instead of correct copyBytes.

dbtsai

Author: Jakub Dubovský <[email protected]>
Author: Dubovsky Jakub <[email protected]>

Closes #2712 from james64/3121-bugfix and squashes the following commits:

f85d24c [Jakub Dubovský] Test name changed, comments added
1b20d51 [Jakub Dubovský] Import placed correctly
406e26c [Jakub Dubovský] Scala style fixed
f92ffa6 [Dubovsky Jakub] performance tuning
480f9cd [Dubovsky Jakub] Bug 3121 fixed

(cherry picked from commit fc616d5)
Signed-off-by: Josh Rosen <[email protected]>
  • Loading branch information
Jakub Dubovský authored and JoshRosen committed Oct 13, 2014
1 parent b539b0e commit dc18167
Show file tree
Hide file tree
Showing 2 changed files with 45 additions and 1 deletion.
6 changes: 5 additions & 1 deletion core/src/main/scala/org/apache/spark/SparkContext.scala
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ import scala.language.implicitConversions

import java.io._
import java.net.URI
import java.util.Arrays
import java.util.concurrent.atomic.AtomicInteger
import java.util.{Properties, UUID}
import java.util.UUID.randomUUID
Expand Down Expand Up @@ -1364,7 +1365,10 @@ object SparkContext extends Logging {
simpleWritableConverter[Boolean, BooleanWritable](_.get)

implicit def bytesWritableConverter(): WritableConverter[Array[Byte]] = {
simpleWritableConverter[Array[Byte], BytesWritable](_.getBytes)
simpleWritableConverter[Array[Byte], BytesWritable](bw =>
// getBytes method returns array which is longer then data to be returned
Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
)
}

implicit def stringWritableConverter(): WritableConverter[String] =
Expand Down
40 changes: 40 additions & 0 deletions core/src/test/scala/org/apache/spark/SparkContextSuite.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark

import org.scalatest.FunSuite

import org.apache.hadoop.io.BytesWritable

class SparkContextSuite extends FunSuite {
//Regression test for SPARK-3121
test("BytesWritable implicit conversion is correct") {
val bytesWritable = new BytesWritable()
val inputArray = (1 to 10).map(_.toByte).toArray
bytesWritable.set(inputArray, 0, 10)
bytesWritable.set(inputArray, 0, 5)

val converter = SparkContext.bytesWritableConverter()
val byteArray = converter.convert(bytesWritable)
assert(byteArray.length === 5)

bytesWritable.set(inputArray, 0, 0)
val byteArray2 = converter.convert(bytesWritable)
assert(byteArray2.length === 0)
}
}

0 comments on commit dc18167

Please sign in to comment.