What's nice about logistic regression is that the scoring can be done in a simple SQL query (it's just a weighted sum plus an intercept, transformed with a sigmoid). But how do you persuade RapidMiner to generate the SQL?
Assuming that your model was trained with the Generalized Linear Model operator, pass the trained model to an Execute Script operator and paste the content of glm2sql.java into it. The script generates the core of the SQL, which may look like:
select "ID"
, 1/(1 + exp(-(-7.2194 + "ADRESS_COUNT" + "ADRESS_TYPE_LAST = temporal"))) as "PREDICTED_PROBABILITY"
from (
select "ID"
, 0.0688 * "ADRESS_COUNT" as "ADRESS_COUNT" -- Example of a numeric feature
, case when "ADRESS_TYPE_LAST" = 'temporal' then 0.7613 else 0 end as "ADRESS_TYPE_LAST = temporal" -- Example of a nominal feature
from "MAINSAMPLE"
) t1
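To sanity-check the generated query, the same arithmetic can be reproduced outside the database. The following is a minimal sketch in Java, assuming the coefficients from the sample query above; the class and method names are hypothetical and are not part of glm2sql.java.

```java
public class GlmScoreSketch {
    // Coefficients copied from the sample SQL above.
    static final double INTERCEPT = -7.2194;
    static final double COEF_ADRESS_COUNT = 0.0688;
    static final double COEF_ADRESS_TYPE_LAST_TEMPORAL = 0.7613;

    /** Logistic function, the inverse of the logit link. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Mirrors the sample query: the inner select builds the terms, the outer select applies the sigmoid. */
    static double predictedProbability(double adressCount, String adressTypeLast) {
        // Numeric feature: coefficient times the column value.
        double numericTerm = COEF_ADRESS_COUNT * adressCount;
        // Nominal feature: the coefficient if the value matches, otherwise 0.
        double nominalTerm = "temporal".equals(adressTypeLast)
                ? COEF_ADRESS_TYPE_LAST_TEMPORAL
                : 0.0;
        return sigmoid(INTERCEPT + numericTerm + nominalTerm);
    }

    public static void main(String[] args) {
        // Two made-up rows that differ only in the nominal feature.
        System.out.println(predictedProbability(10, "temporal"));
        System.out.println(predictedProbability(10, "permanent"));
    }
}
```

If the SQL and this function disagree for the same row, the usual suspects are a nominal value that is spelled differently in the database or a NULL in a numeric column.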
Supported features
- Linear and logistic regression (with the logit link function)
- Numerical and nominal features
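How these two feature types and two model types map onto SQL can be illustrated with a small standalone generator. This is only a sketch under stated assumptions: it does not use the RapidMiner API and is not the code in glm2sql.java; the class name, method names, and the "PREDICTION" column name for the linear case are hypothetical. It takes the coefficients as plain maps and produces a query shaped like the sample above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class GlmSqlSketch {

    /** Quotes an SQL identifier, doubling any embedded double quotes. */
    static String quoteIdent(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    /** Quotes an SQL string literal, doubling any embedded single quotes. */
    static String quoteLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    static String toSql(String table, String idColumn, double intercept,
                        Map<String, Double> numeric,
                        Map<String, Map<String, Double>> nominal,
                        boolean logistic) {
        StringJoiner sum = new StringJoiner(" + ");
        sum.add(Double.toString(intercept));
        StringBuilder inner = new StringBuilder("select " + quoteIdent(idColumn));

        // Numeric feature: coefficient * column.
        for (Map.Entry<String, Double> e : numeric.entrySet()) {
            String alias = quoteIdent(e.getKey());
            inner.append("\n, ").append(e.getValue()).append(" * ").append(alias)
                 .append(" as ").append(alias);
            sum.add(alias);
        }

        // Nominal feature: one "case when" term per level that has a coefficient.
        for (Map.Entry<String, Map<String, Double>> col : nominal.entrySet()) {
            for (Map.Entry<String, Double> level : col.getValue().entrySet()) {
                String alias = quoteIdent(col.getKey() + " = " + level.getKey());
                inner.append("\n, case when ").append(quoteIdent(col.getKey()))
                     .append(" = ").append(quoteLiteral(level.getKey()))
                     .append(" then ").append(level.getValue())
                     .append(" else 0 end as ").append(alias);
                sum.add(alias);
            }
        }

        // Logistic regression applies the sigmoid; linear regression uses the sum directly.
        String score = logistic ? "1/(1 + exp(-(" + sum + ")))" : sum.toString();
        String outputName = logistic ? "PREDICTED_PROBABILITY" : "PREDICTION"; // "PREDICTION" is an assumed name
        return "select " + quoteIdent(idColumn) + "\n, " + score + " as " + quoteIdent(outputName)
                + "\nfrom (\n" + inner + "\nfrom " + quoteIdent(table) + "\n) t1";
    }

    /** Rebuilds the sample query from its coefficients. */
    static String example() {
        Map<String, Double> numeric = new LinkedHashMap<>();
        numeric.put("ADRESS_COUNT", 0.0688);
        Map<String, Double> levels = new LinkedHashMap<>();
        levels.put("temporal", 0.7613);
        Map<String, Map<String, Double>> nominal = new LinkedHashMap<>();
        nominal.put("ADRESS_TYPE_LAST", levels);
        return toSql("MAINSAMPLE", "ID", -7.2194, numeric, nominal, true);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Precomputing each term in an inner select, as the sample query does, keeps the outer expression short and makes it easy to inspect the contribution of a single feature.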
If you want to extract the coefficients and store them in the database (e.g. for a dashboard), use glm2table.java, which returns an ExampleSet.
Pull requests are welcome.