Incorrect rawPrediction and probability from scoring test data? #788

devilwing0723 · 2020-01-29T03:30:51Z

Hi, I trained a binary classification model with mmlspark LightGBM and got very odd rawPrediction and probability values when scoring the test data. I ran the following code:

"
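from mmlspark.train import ComputeModelStatistics, TrainedClassifierModel
# LightGBMClassificationModel must be imported before it is used
from mmlspark.lightgbm import LightGBMClassificationModel
# Load the saved native LightGBM model and score the test DataFrame
predictionModel = LightGBMClassificationModel.loadNativeModelFromFile("s3a://cof-risk-ccrm-mad/users/dhq076/rt_v30_rebuld_ml/ums_ndq_201607_ds")
prediction = predictionModel.transform(test)
prediction.limit(10).toPandas()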
"

The resulting rawPrediction looks like "[0.8350025657401177, -0.8350025657401177]", and the probability looks like "[1.8350025657401177, -0.8350025657401177]". The prediction column itself is 1 or 0 as expected, but the rawPrediction and probability values look wrong. Is this a bug, or is this the expected output? If it's not a bug, how should the rawPrediction and probability be interpreted?

imatiach-msft · 2020-01-29T20:32:26Z

@devilwing0723 What version of mmlspark are you using? I recall this issue was fixed recently. There were several related issues like this:
#676
#578
and one related PR to lightgbm:
microsoft/LightGBM#2356

devilwing0723 · 2020-01-29T21:25:27Z

@imatiach-msft I used the mmlspark_2.11 0.18.1 JAR downloaded from https://jar-download.com/artifacts/com.microsoft.ml.spark/mmlspark_2.11/0.18.1/source-code. I saw in a document in the repo that the package needs to be loaded from Maven, so I first downloaded the same package from the Maven website, but that didn't work for me. The package from the website above did work, and it appeared to be the most recent version.

imatiach-msft · 2020-01-29T21:38:08Z

It looks like 0.18.1 does not have the fix:
https://mvnrepository.com/artifact/com.microsoft.ml.spark/mmlspark_2.11/0.18.1
It uses lightgbm 2.2.350, but the fix was in 2.2.400. The RC 1.0 version or the latest snapshot should have the fix. Not sure when the next release will be out; adding @mhamilton723.

devilwing0723 · 2020-01-29T22:23:36Z

@imatiach-msft Thanks for the clarification. Can you point me to the most recent version of mmlspark? One more thing I want to clarify: what does rawPrediction represent when I train the model with the binary objective? It doesn't look like anything I am familiar with. In the local version of lightgbm, we can choose whether to score the test data as the logit for a binary target. Is this doable in mmlspark lightgbm?
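For reference, here is a minimal sketch of how the two columns are usually related for a binary classifier. It assumes the fixed versions follow the common Spark ML convention (rawPrediction = [-margin, margin], probability = [1 - p, p] with p = sigmoid(margin)); the thread does not confirm this, and the helper names `sigmoid`, `binary_vectors`, and `is_valid_probability` are hypothetical, not part of mmlspark. It also shows that the probability vector reported above cannot be a valid one, since its entries fall outside [0, 1].

```python
import math


def sigmoid(margin: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


def binary_vectors(margin: float):
    """Build (rawPrediction, probability) under the assumed convention:
    index 0 is the negative class, index 1 is the positive class."""
    p = sigmoid(margin)
    return [-margin, margin], [1.0 - p, p]


def is_valid_probability(vec, tol: float = 1e-9) -> bool:
    """A probability vector has entries in [0, 1] that sum to 1."""
    return all(0.0 <= v <= 1.0 for v in vec) and abs(sum(vec) - 1.0) <= tol


# The probability vector reported in this issue (entries outside [0, 1]).
reported = [1.8350025657401177, -0.8350025657401177]
reported_ok = is_valid_probability(reported)

# What the assumed convention would give for the same raw score.
raw, prob = binary_vectors(-0.8350025657401177)
expected_ok = is_valid_probability(prob)

print("reported vector valid:", reported_ok)
print("rawPrediction:", raw)
print("probability:", prob)
```

Under this assumption, a negative margin gives a positive-class probability below 0.5, which matches a predicted label of 0.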

leihuang · 2021-10-12T20:52:00Z

I am using com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3 (on Databricks) and am still having the problem. Is this expected? Thank you!

leihuang · 2021-10-13T21:09:31Z

After installing the latest version of mmlspark (coordinate "com.microsoft.ml.spark:mmlspark:1.0.0-rc4") and using Spark 3.1.1 and Scala 2.12, things work fine now! Thanks @imatiach-msft!

imatiach-msft self-assigned this Jan 29, 2020

imatiach-msft added fixed in latest area/lightgbm question labels Jan 29, 2020
