MLJModelInterface.fit does not accept tables? #169

Closed
olivierlabayle opened this issue Oct 5, 2022 · 6 comments

@olivierlabayle

Hello,

Thank you for the work here!

Apologies if this is not the right place for the following question. As I understand it, the MLJModelInterface.fit method for EvoTypes does not accept general tables (the machine interface works because it calls the reformat function beforehand):

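```julia
using EvoTrees
using MLJBase

n = 100
X = MLJBase.table(rand(n, 3))  # wrap a random matrix as a generic table
y = rand(n)

evo = EvoTreeRegressor()
# The call in question: a table is passed directly to the model-level fit
MLJBase.fit(evo, 1, X, y)
```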

From the MLJ docs I thought fit should accept tables, or am I misunderstanding?

@jeremiedb
Member

Indeed, fit expects that the data provided has gone through the reformat step. However, MLJBase.fit! works fine on tabular data, and you can start training using that function as well, so it may be all that is needed.
I'd like to have dedicated tabular handling within EvoTrees, notably to manage categorical data, but I lack the time for that!
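
For reference, a minimal sketch of the machine-based route mentioned above, reusing the data from the original snippet. It assumes only the exported MLJBase functions machine, fit! and predict; the variable names mach and yhat are illustrative.

```julia
using EvoTrees
using MLJBase

n = 100
X = MLJBase.table(rand(n, 3))
y = rand(n)

evo = EvoTreeRegressor()

# The machine binds the model to the user-facing table; fit! calls
# reformat internally before dispatching to the model-level fit.
mach = machine(evo, X, y)
fit!(mach, verbosity=0)

# predict on a machine also accepts a table
yhat = predict(mach, X)
```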

@ablaom
Contributor

ablaom commented Oct 5, 2022

If reformat is implemented, then fit is not required to accept tables. Rather it accepts the form of data output by reformat. I thought the docs were clear on this point but am happy for a PR to clarify.
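
A sketch of what this means for the snippet in the original post: convert the table with reformat first, then pass the result to fit. This assumes MLJModelInterface is available in the active environment and that the installed EvoTrees version implements the data front-end, as described in this thread; the names data, fitresult and yhat are illustrative.

```julia
using EvoTrees
using MLJBase
import MLJModelInterface
const MMI = MLJModelInterface

n = 100
X = MLJBase.table(rand(n, 3))
y = rand(n)

evo = EvoTreeRegressor()

# Convert user-facing data into the model-specific form expected by fit
data = MMI.reformat(evo, X, y)

# fit receives the output of reformat, not the raw table
fitresult, cache, report = MMI.fit(evo, 0, data...)

# The same conversion has to be applied to new data before predict
yhat = MMI.predict(evo, fitresult, MMI.reformat(evo, X)...)
```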

The "data front end" apparatus allows machines to avoid reconverting data from user form (e.g., a table) into model-specific form (e.g., a matrix) in certain cases, in particular when retraining on the same view of the data (rows) but with new hyper-parameters, such as an iteration parameter. In this way, external control of iterative models (for example, using the IteratedModel wrapper) is possible without data conversions happening at every iteration.

Conversions are also avoided when choosing a different view of the same data (new rows) with the same hyper-parameters, as in cross-validation. For this, the model overloads selectrows for its model-specific format.
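
To make the mechanism concrete, here is a minimal sketch of a data front-end for a toy model. ToyLinearRegressor is hypothetical and not part of any package; the sketch assumes MLJModelInterface and Tables are available in the active environment, and it calls the interface methods by hand to mimic what a machine does.

```julia
import MLJModelInterface
import Tables
using LinearAlgebra
const MMI = MLJModelInterface

# Hypothetical toy model: least squares without an intercept
mutable struct ToyLinearRegressor <: MMI.Deterministic end

# Data front-end: table -> matrix conversion, performed once
MMI.reformat(::ToyLinearRegressor, X, y) = (Tables.matrix(X), y)
MMI.reformat(::ToyLinearRegressor, X) = (Tables.matrix(X),)

# Row selection acts on the model-specific form, so resampling
# does not need to go back to the table
MMI.selectrows(::ToyLinearRegressor, I, A, y) = (A[I, :], y[I])
MMI.selectrows(::ToyLinearRegressor, I, A) = (A[I, :],)

# fit and predict only ever see the output of reformat/selectrows
function MMI.fit(::ToyLinearRegressor, verbosity, A, y)
    coefs = A \ y
    return coefs, nothing, NamedTuple()
end
MMI.predict(::ToyLinearRegressor, coefs, A) = A * coefs

# Usage, mimicking by hand what a machine does during resampling
X = (x1 = rand(50), x2 = rand(50))   # a column table
y = 2 .* X.x1 .- X.x2                # exact linear target, no noise

model = ToyLinearRegressor()
data = MMI.reformat(model, X, y)               # converted once
train = MMI.selectrows(model, 1:40, data...)   # new rows, no reconversion
coefs, _, _ = MMI.fit(model, 0, train...)

test = MMI.selectrows(model, 41:50, MMI.reformat(model, X)...)
yhat = MMI.predict(model, coefs, test...)
```

Because selectrows operates on the matrix rather than the table, each cross-validation fold would reuse the single conversion done by reformat.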

@olivierlabayle
Author

I understand, thank you both for the clarification!

Maybe the sentence that would benefit from clarification is the following: "If the core algorithm being wrapped requires data in a different or more specific form, then fit will need to coerce the table into the form desired (and the same coercions applied to X will have to be repeated for Xnew in predict)."

It is indeed later said that the data front-end is an alternative option but it wasn't obvious that the MLJModelInterface.fit would then not be required to respect the "table input contract".

@ablaom
Contributor

ablaom commented Oct 5, 2022

How about, following the cited sentence, we add the new sentence:

"An exception to this requirement occurs when a data front-end is implemented; see Implementing a data front-end below."

@olivierlabayle
Author

That would be great, thank you!

@jeremiedb
Member

I'm assuming this can be closed. Feel free to reopen otherwise.
