Incorrect rawPrediction and probability from scoring test data? #788

Open
devilwing0723 opened this issue Jan 29, 2020 · 6 comments

Comments

@devilwing0723

Hi, I trained a binary classification model using mmlspark LightGBM and found very odd rawPrediction and probability values after scoring the test data. I ran the following code:

"
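from mmlspark.train import ComputeModelStatistics, TrainedClassifierModel
# Import needed for the loader below (module path assumes mmlspark 0.18.x)
from mmlspark.lightgbm import LightGBMClassificationModel
# Load a native LightGBM model file and score the test DataFrame
predictionModel = LightGBMClassificationModel.loadNativeModelFromFile("s3a://cof-risk-ccrm-mad/users/dhq076/rt_v30_rebuld_ml/ums_ndq_201607_ds")
prediction = predictionModel.transform(test)
# Inspect the first rows, including rawPrediction and probability
prediction.limit(10).toPandas()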
"

The resulting rawPrediction looks like "[0.8350025657401177, -0.8350025657401177]", and the probability looks like "[1.8350025657401177, -0.8350025657401177]". The prediction column is 1 or 0 as expected, but the rawPrediction and probability values look wrong: the probability entries fall outside [0, 1]. Is this a bug, or is this the expected output? If it is not a bug, how should the rawPrediction and probability be interpreted?
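
For reference, a quick way to confirm that a probability vector like the one above is malformed is to check that each entry lies in [0, 1] and that the entries sum to 1. The sketch below is plain Python and does not call mmlspark; the helper name `is_valid_probability_vector` is hypothetical and only illustrates the check.

```python
import math


def is_valid_probability_vector(probs, tol=1e-9):
    """Return True if every entry is in [0, 1] and the entries sum to 1."""
    in_range = all(-tol <= p <= 1.0 + tol for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# Probability vector reported in this issue
reported_probability = [1.8350025657401177, -0.8350025657401177]

print(is_valid_probability_vector(reported_probability))  # False
print(is_valid_probability_vector([0.3, 0.7]))  # True
```

Note that the reported vector has the form [1 - m, m] with m = -0.8350025657401177, the second rawPrediction entry. That pattern would be consistent with a raw score being used where a probability was expected, although this is only an observation about the numbers, not a confirmed diagnosis.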

@imatiach-msft
Contributor

@devilwing0723 What version of mmlspark are you using? I recall this issue was fixed recently. There were several related issues like this:
#676
#578
and one related PR to lightgbm:
microsoft/LightGBM#2356

@devilwing0723
Author

@imatiach-msft I used the mmlspark_2.11 0.18.1 JAR downloaded from https://jar-download.com/artifacts/com.microsoft.ml.spark/mmlspark_2.11/0.18.1/source-code. The repo documentation says the package needs to be loaded from Maven, so I first downloaded the same package from the Maven website, but that didn't work for me. I then found that the package from the website mentioned above did work. It appeared to be the most up-to-date version.

@imatiach-msft
Contributor

It looks like 0.18.1 does not have the fix:
https://mvnrepository.com/artifact/com.microsoft.ml.spark/mmlspark_2.11/0.18.1
It uses LightGBM 2.2.350, but the fix was in 2.2.400. The RC 1.0 version or the latest snapshot should have the fix. I'm not sure when the next release will be out; adding @mhamilton723.

@devilwing0723
Author

@imatiach-msft Thanks for the clarification. Can you point me to the most recent version of mmlspark? I also want to clarify what the rawPrediction represents when I train the model with the binary objective, since it doesn't look like anything I am familiar with. In the local version of LightGBM, we can choose to score the test data as the logit for a binary target. Is this doable in mmlspark LightGBM?
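
For reference, the relationship between a logit (margin) and the two output columns can be sketched in plain Python. This assumes the usual Spark ML convention for binary classifiers, where rawPrediction is [-margin, margin] and probability is [1 - p, p] with p = sigmoid(margin), and it assumes LightGBM's `sigmoid` scaling parameter is left at 1.0. Whether a particular mmlspark build follows this exactly is an assumption; the function names below are hypothetical and do not come from mmlspark.

```python
import math


def sigmoid(margin):
    """Logistic function mapping a logit (margin) to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def binary_outputs_from_margin(margin):
    """Build rawPrediction and probability vectors from a single margin.

    Assumed convention: index 0 is the negative class, index 1 the positive.
    """
    p_positive = sigmoid(margin)
    raw_prediction = [-margin, margin]
    probability = [1.0 - p_positive, p_positive]
    return raw_prediction, probability


def logit(p):
    """Inverse of sigmoid: recover the margin from a probability in (0, 1)."""
    return math.log(p / (1.0 - p))


# Margin taken from the second rawPrediction entry reported in this issue
margin = -0.8350025657401177
raw_prediction, probability = binary_outputs_from_margin(margin)

print(raw_prediction)         # same two values as the reported rawPrediction
print(probability)            # both entries lie in [0, 1] and sum to 1
print(logit(probability[1]))  # recovers the margin, up to rounding
```

Under these assumptions, the second entry of rawPrediction is the logit for the positive class, so a logit score can be read from that column or recovered from a valid probability with `logit`.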

@leihuang

leihuang commented Oct 12, 2021

I am using com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3 (on Databricks) and am still having the problem. Is this expected? Thank you!


@leihuang

After installing the latest version of mmlspark (coordinate "com.microsoft.ml.spark:mmlspark:1.0.0-rc4", with Spark 3.1.1 and Scala 2.12), things work fine now! Thanks @imatiach-msft!
