Correct conversion of Spark model stages into MLeap local models #261
Conversation
Codecov Report
@@ Coverage Diff @@
## master #261 +/- ##
==========================================
- Coverage 86.67% 86.36% -0.31%
==========================================
Files 317 318 +1
Lines 10403 10447 +44
Branches 322 552 +230
==========================================
+ Hits 9017 9023 +6
- Misses 1386 1424 +38
Continue to review full report at Codecov.
So now we have to spool up Spark to use this Spark-less scoring? Could you get the necessary info another way, e.g. serialize the dataframe schema along with the model?
case m: VectorSlicerModel => x => m.apply(x(0).asInstanceOf[Vector])
case m: WordLengthFilterModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
case m: WordToVectorModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
case m => throw new RuntimeException(s"Unsupported MLeap model: ${m.getClass.getName}")
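The dispatch above can be sketched in isolation: a pattern match that maps each supported model type to an untyped `Array[Any] => Any` scoring function, falling through to an error for anything unsupported. The toy model classes below are illustrative stand-ins, not the real MLeap models:

```scala
// Toy stand-ins for MLeap local models (names and behavior are illustrative only).
case class WordLengthFilterModel(minLength: Int) {
  def apply(tokens: Seq[String]): Seq[String] = tokens.filter(_.length >= minLength)
}
case class WordToVectorModel(dim: Int) {
  def apply(tokens: Seq[String]): Array[Double] = Array.fill(dim)(tokens.size.toDouble)
}

object LocalScoring {
  // Convert a model instance into an untyped row-scoring function,
  // mirroring the pattern-match dispatch in the PR.
  def toScoreFn(model: Any): Array[Any] => Any = model match {
    case m: WordLengthFilterModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
    case m: WordToVectorModel     => x => m.apply(x(0).asInstanceOf[Seq[String]])
    case m => throw new RuntimeException(s"Unsupported MLeap model: ${m.getClass.getName}")
  }
}

// Usage: any model type not enumerated in the match throws at dispatch time,
// which is why every wrapped stage has to appear in the list.
val fn = LocalScoring.toScoreFn(WordLengthFilterModel(minLength = 3))
val out = fn(Array[Any](Seq("a", "abc", "abcd"))) // keeps tokens of length >= 3
```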
So every wrapped Spark stage has to be in this list? We should add that to the docs on wrapping...
I currently added all the stages from the features package. We could also add models from the classification, regression, and recommendation packages, but we already have the first two covered as our own OpTransformer stages, so I did not see much point in adding them.
So, to your question: for right now I think we have everything covered except recommenders, which I am planning to add once we are ready.
Can you add a TODO for the classification and regression models? I don't know that this will be much use without them...
Adding those is very easy. The thing is, we already have classification and regression models as OpTransformers, so MLeap won't be used to run them.
good point :-)
@leahmcguire we used to do that before when loading Spark stages. Now I have just explicitly exposed to users the ability to control the Spark session lifecycle. The only way to avoid a Spark session entirely is to export our models into MLeap format, which is indeed a possibility and I am open to discussing it. As of right now, local scoring assumes the model format as we have it now (i.e. json + parquet files).
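A minimal sketch of the "caller controls the session lifecycle" idea described above. All names here are hypothetical, dependency-free stand-ins, not the actual TransmogrifAI or Spark API:

```scala
// Hypothetical stand-in for SparkSession: the caller, not the loader, owns it.
final class Session {
  private var open = true
  def isOpen: Boolean = open
  def stop(): Unit = { open = false }
}

// The loader borrows the session to read the json + parquet model files,
// but never creates or stops it itself.
object ModelLoader {
  def load(path: String)(implicit session: Session): String = {
    require(session.isOpen, "session must be open while loading")
    s"model loaded from $path" // stub for the real deserialization
  }
}

// Usage: the user decides when the session starts and stops;
// after loading, local scoring needs no session at all.
implicit val session: Session = new Session
val model = ModelLoader.load("/models/my-model")
session.stop()
```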
Related issues
When converting a Spark StringIndexerModel into an MLeap model, it was expecting ml_attr metadata to be present in the transformed Dataframe. The apply method on MLeap models also did not work correctly, since many models had more than one apply method present.
Describe the proposed solution
Avoid calling the generic apply method on MLeap models and explicitly convert MLeap models into scoring methods.
Describe alternatives you've considered
N/A