[SPARK-13646] [MLlib] QuantileDiscretizer counts dataset twice in getSampledInput.
eliasah committed Mar 3, 2016
1 parent 7b25dc7 commit 0f319ed
Showing 1 changed file with 1 addition and 1 deletion.
@@ -118,7 +118,7 @@ object QuantileDiscretizer extends DefaultParamsReadable[QuantileDiscretizer] wi
     require(totalSamples > 0,
       "QuantileDiscretizer requires non-empty input dataset but was given an empty input.")
     val requiredSamples = math.max(numBins * numBins, minSamplesRequired)
-    val fraction = math.min(requiredSamples.toDouble / dataset.count(), 1.0)
+    val fraction = math.min(requiredSamples.toDouble / totalSamples , 1.0)
     dataset.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt()).collect()
   }
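
The effect of this one-line change can be sketched without Spark. The snippet below is an illustration only, not the code in the repository: `CountingDataset`, `fractionBefore`, and `fractionAfter` are hypothetical names, and the sketch assumes that `totalSamples` is assigned from an earlier `dataset.count()` call that falls outside the hunk shown here. Each `count()` call stands in for one full pass over the data.

```scala
// Hypothetical stand-in for a dataset: every count() call represents a full pass.
final class CountingDataset(values: Seq[Double]) {
  var passes: Int = 0
  def count(): Long = {
    passes += 1
    values.length.toLong
  }
}

object SampleFraction {
  // Shape of the code before the change: count() runs a second time
  // inside the fraction computation.
  def fractionBefore(data: CountingDataset, numBins: Int, minSamplesRequired: Int): Double = {
    val totalSamples = data.count() // assumed to precede the hunk
    require(totalSamples > 0, "input must be non-empty")
    val requiredSamples = math.max(numBins * numBins, minSamplesRequired)
    math.min(requiredSamples.toDouble / data.count(), 1.0)
  }

  // Shape of the code after the change: the cached totalSamples is reused.
  def fractionAfter(data: CountingDataset, numBins: Int, minSamplesRequired: Int): Double = {
    val totalSamples = data.count() // assumed to precede the hunk
    require(totalSamples > 0, "input must be non-empty")
    val requiredSamples = math.max(numBins * numBins, minSamplesRequired)
    math.min(requiredSamples.toDouble / totalSamples, 1.0)
  }
}
```

Both variants return the same fraction; the only difference is how many passes are made over the data, which is the cost the commit title refers to.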
