Incorrect rawPrediction and probability from scoring test data? #788
Comments
@devilwing0723 what version of mmlspark are you using? I recall this issue was fixed recently. There were actually several related issues like this.
@imatiach-msft I used the mmlspark_2.11 JAR, version 0.18.1, downloaded from https://jar-download.com/artifacts/com.microsoft.ml.spark/mmlspark_2.11/0.18.1/source-code. A document in the repo says the package needs to be loaded from Maven, so I first downloaded the same package from the Maven website, but that didn't work for me. I then found that the package from the site above did work. It appeared to be the most recent version.
It looks like 0.18.1 does not have the fix:
@imatiach-msft Thanks for the clarification. Can you point me to the most recent version of mmlspark? I also want to clarify what the rawPrediction column means when the model is trained with the binary objective. It doesn't look like anything I am familiar with. With the local version of lightgbm, we can choose to score the test data as the logit for a binary target. Is this doable in mmlspark lightgbm?
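For readers following the logit question: in standalone LightGBM with the binary objective, the raw score is the log-odds, and the Python API returns it via `Booster.predict(..., raw_score=True)`. Whether mmlspark exposes the same switch is not confirmed in this thread. The sketch below uses plain Python only and shows how a margin and a probability relate under that assumption. The helper `spark_style_vectors` is hypothetical, and the `[-m, m]` / `[1 - p, p]` layout is the one Spark ML's own LogisticRegression uses, assumed here for illustration.

```python
import math


def sigmoid(margin: float) -> float:
    """Map a raw margin (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


def logit(p: float) -> float:
    """Inverse of sigmoid: map a probability in (0, 1) back to log-odds."""
    return math.log(p / (1.0 - p))


def spark_style_vectors(margin: float):
    """Hypothetical helper: build rawPrediction as [-m, m] and
    probability as [1 - p, p], with p = sigmoid(m)."""
    p = sigmoid(margin)
    return [-margin, margin], [1.0 - p, p]


# Margin taken from the second entry of the rawPrediction reported in this issue.
raw, prob = spark_style_vectors(-0.8350025657401177)
print(raw)   # margin pair, symmetric around zero
print(prob)  # both entries lie in [0, 1] and sum to 1
```

If the second entry of rawPrediction really is a log-odds margin, the matching probability for class 1 would be roughly 0.30, not a negative number.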
After installing the latest version of
Hi, I was training a binary model using mmlspark lightgbm and got very odd rawPrediction and probability values after scoring the test data. I ran the following code:
"
from mmlspark.lightgbm import LightGBMClassificationModel
from mmlspark.train import ComputeModelStatistics, TrainedClassifierModel

# Load a LightGBM model from its native model file, then score the test DataFrame
predictionModel = LightGBMClassificationModel.loadNativeModelFromFile("s3a://cof-risk-ccrm-mad/users/dhq076/rt_v30_rebuld_ml/ums_ndq_201607_ds")
prediction = predictionModel.transform(test)
prediction.limit(10).toPandas()
"
The resulting rawPrediction looks like "[0.8350025657401177, -0.8350025657401177]", and the probability looks like "[1.8350025657401177, -0.8350025657401177]". Although the resulting prediction is 1 or 0, the rawPrediction and probability values look wrong. Is this a bug, or is this what we should expect? If it's not a bug, how should we interpret the rawPrediction and probability?
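A quick sanity check on the two vectors quoted above, as a minimal sketch in plain Python. It only tests whether the reported probability is a valid distribution and whether both vectors fit the pattern `[-s, s]` and `[1 - s, s]` for a single score `s`. This is an observation about the numbers, not a diagnosis confirmed by the maintainers, and the helper name `is_valid_distribution` is hypothetical.

```python
import math

# Values copied verbatim from the report above.
raw_prediction = [0.8350025657401177, -0.8350025657401177]
probability = [1.8350025657401177, -0.8350025657401177]


def is_valid_distribution(vec, tol=1e-9):
    """True if every entry lies in [0, 1] and the entries sum to 1."""
    in_range = all(0.0 <= v <= 1.0 for v in vec)
    return in_range and math.isclose(sum(vec), 1.0, abs_tol=tol)


# Treat the second rawPrediction entry as a single score s.
s = raw_prediction[1]

# Do both vectors follow [-s, s] and [1 - s, s]?
matches_raw_pattern = math.isclose(raw_prediction[0], -s)
matches_prob_pattern = (
    math.isclose(probability[0], 1.0 - s) and math.isclose(probability[1], s)
)

print(is_valid_distribution(probability))
print(matches_raw_pattern, matches_prob_pattern)
```

If both pattern checks hold, the numbers are consistent with the score `s` being written into the probability column directly, without first being mapped into the range [0, 1].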