Add MLJ compliant docstrings #130

josephsdavid · 2022-10-26T19:22:41Z

In service of #913, as documented here!

yaxxie · 2022-10-27T09:54:24Z

Hi @josephsdavid
Thanks for the contribution!

A couple of remarks:

- Please avoid simply reformatting code. This makes diffs harder to read and muddies the purpose of the contribution.
- Please fill in the description to the PR. For example, a link to the documentation about "MLJ docstrings" would be useful to the reader.
- Please also don't forget to add yourself to the contributors list in CONTRIBUTORS.md 🙂

yaxxie · 2022-10-27T10:00:14Z

src/MLJInterface.jl

-    weights=true,
-    descr="Microsoft LightGBM FFI wrapper: Classifier",
+    weights=true
+    # descr="Microsoft LightGBM FFI wrapper: Classifier",


Specifically, how come you're commenting these ones out? And if there's a good reason for it, I'd expect it to be deleted rather than commented out.

Oh whoops! I meant to delete them! There is a good reason: I believe that a docstring existing after the model metadata is created overwrites the descr field, making it no longer needed (paging @ablaom to confirm; there is a reason, but I may have mixed it up :) )

The MLJ model trait docstring (alias descr) used to be for a short summary string, which was not that useful, in retrospect. Now it is not to be overloaded but instead falls back to the full docstring (the one @josephsdavid has worked on here).

So, yes, these should be deleted.

yaxxie · 2022-10-27T10:01:48Z

src/MLJInterface.jl

 )

+"""


I'd like to ask that you revert most of the changes above these lines (except the descr ones, given the explanations).

Oh oops, did not realize i made changes above these lines 😅 Will change back so we have proper git blames :)

I can't tell if this has been addressed. @yaxxie / @josephsdavid Is this done?


josephsdavid · 2022-10-27T15:44:34Z

Please fill in the description to the PR. For example, a link to the documentation about "MLJ docstrings" would be useful to the reader.

hah i was so excited to have all the parameters documented i missed the other pieces of work 😓

ablaom · 2022-10-27T21:28:53Z

src/MLJInterface.jl

 )

+"""
+`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a framework for gradient boosting


@josephsdavid I don't see a proper header for the docstring, or the code that auto-generates one? Is this because we are adding to an existing docstring? Or maybe some other reason?


ablaom · 2022-10-27T21:56:50Z

src/MLJInterface.jl

+- `feature_fraction::Float64 = 1.0`: The fraction of features to select before fitting a tree. Can be used to speed up training and reduce over-fitting.
+- `feature_fraction_bynode::Float64 = 1.0`: The fraction of features to select for each tree node. Can be used to reduce over-fitting.
+- `feature_fraction_seed::Int = 2`: Random seed to use for the feature fraction
+- `bagging_fraction::Float64 = 1.0`: The fraction of samples to use before fitting a tree. Can be used to speed up training and reduce over-fitting.


Can we please keep the length of lines below 92 (Blue style recommendation)? I can't easily see the end of the lines to review.
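
As a rough aid for the line-length request above, a few lines of plain Julia can list overlong lines before pushing. This is only a sketch: the helper name `overlong_lines` and the sample text are made up for illustration, and the limit is read here as "at most 92 characters", which is an assumption about how the Blue style recommendation is meant.

```julia
# Hypothetical helper (not part of LightGBM.jl or MLJ): report lines longer than `limit`.
function overlong_lines(text::AbstractString; limit::Integer=92)
    hits = Tuple{Int,Int}[]
    for (i, line) in enumerate(split(text, '\n'))
        n = length(line)  # counts characters, not bytes
        n > limit && push!(hits, (i, n))
    end
    return hits
end

# Sample input: one short line and one 100-character line.
sample = "short line\n" * repeat("x", 100)
for (i, n) in overlong_lines(sample)
    println("line $i has $n characters")
end
```

To check a real file, the same function could be called on `read("src/MLJInterface.jl", String)`.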


ablaom · 2022-10-27T22:18:38Z

src/MLJInterface.jl

+
+predict(lgbm, train)
+```
+"""


Couple of suggestions about the example:

I think PrettyPrinting and Statistics are both redundant. The function pretty is exported by MLJ. Actually, I don't think we need to call pretty here anyway.

Earlier in the docstring we use X and y but here it's features and targets which is confusing. For consistency with other MLJ docs, I suggest sticking to X and y.

I don't think the @show lines add much.

This is resolved.

ablaom

@josephsdavid Thanks for this mammoth effort. 🦣

I don't see sections "Fitted parameters" or "Report", which are required.

Given the fact that all the models have a lot of hyper-parameters in common, I wonder if you would consider, for easier maintenance, interpolating a string constant for the common ones?

I've looked over the first docstring for now. Please ping me when you've addressed my comments and I'll review the others too.
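
One possible shape for the shared-string idea, combined with the two required sections, is sketched below. `ToyRegressor`, `COMMON_HYPERPARAMETERS_DOC` and the placeholder section bodies are invented for illustration; only the two hyper-parameter descriptions are taken from the diff under review.

```julia
# Shared text, written once and interpolated into every model's docstring.
const COMMON_HYPERPARAMETERS_DOC = """
- `feature_fraction::Float64 = 1.0`: The fraction of features to select before
  fitting a tree. Can be used to speed up training and reduce over-fitting.
- `bagging_fraction::Float64 = 1.0`: The fraction of samples to use before
  fitting a tree. Can be used to speed up training and reduce over-fitting.
"""

"""
A stand-in model type; a real docstring would start with the generated header.

# Hyper-parameters

$COMMON_HYPERPARAMETERS_DOC

# Fitted parameters

Placeholder: describe the fields of `fitted_params(mach)` here.

# Report

Placeholder: describe the fields of `report(mach)` here.
"""
struct ToyRegressor end
```

With this layout, a wording fix to a common hyper-parameter is made in one place and picked up by every docstring that interpolates the constant.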

josephsdavid · 2022-10-28T15:11:11Z

Will do! going to go over more closely over the weekend :)

Co-authored-by: Anthony Blaom, PhD <[email protected]>

ablaom · 2022-12-01T01:28:28Z

src/MLJInterface.jl

@@ -402,4 +402,363 @@ MLJModelInterface.metadata_model(
    descr="Microsoft LightGBM FFI wrapper: Regressor",


These descr declarations should be deleted.

To make the automatically generated header more readable, add

human_name="LightGBM regressor"

Make similar changes to the other models.
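
A minimal sketch of what a metadata declaration might look like after this change, with no `descr` and with `human_name` set. The model `ToyBooster`, its single field, and the `load_path` value are hypothetical; the keyword names follow the trait names in MLJModelInterface as I understand them and may differ from the aliases used in this package's existing call.

```julia
import MLJModelInterface
const MMI = MLJModelInterface

# Hypothetical stand-in model, not the real LGBMRegressor.
MMI.@mlj_model mutable struct ToyBooster <: MMI.Deterministic
    num_iterations::Int = 100
end

# No `descr` keyword: the docstring trait falls back to the full docstring.
MMI.metadata_model(
    ToyBooster;
    input_scitype=MMI.Table(MMI.Continuous),
    target_scitype=AbstractVector{MMI.Continuous},
    supports_weights=true,
    human_name="toy booster",            # used in the generated header
    load_path="ToyPackage.ToyBooster",   # made-up path for illustration
)

println(MMI.human_name(ToyBooster))
```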

ablaom · 2022-12-01T01:32:28Z

src/MLJInterface.jl

+X, y = @load_boston # a table and a vector X = DataFrame(X) train, test =
+partition(collect(eachindex(y)), 0.70, shuffle=true)


Missing carriage return:

Suggested change

-X, y = @load_boston # a table and a vector X = DataFrame(X) train, test =
-partition(collect(eachindex(y)), 0.70, shuffle=true)
+X, y = @load_boston # a table and a vector
+X = DataFrame(X)
+train, test = partition(collect(eachindex(y)), 0.70, shuffle=true)

ablaom · 2022-12-01T01:32:46Z

src/MLJInterface.jl

+X, y = @load_boston # a table and a vector X = DataFrame(X) train, test =
+partition(collect(eachindex(y)), 0.70, shuffle=true)
+
+first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default


Missing carriage return:

Suggested change

-first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default
+first(X, 3)
+lgb = LGBMRegressor() #initialised a model with default

ablaom · 2022-12-01T01:33:58Z

src/MLJInterface.jl

+partition(collect(eachindex(y)), 0.70, shuffle=true)
+
+first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default
+params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!


Is this params redundant?

@josephsdavid It would be great if you could test the examples before committing.

ablaom

Thanks @josephsdavid for the progress! We're getting there.

Particular attention is still needed in the examples. If you could please check they run, that will save me some review time.

ablaom · 2022-12-01T01:36:02Z

src/MLJInterface.jl

+
+predict(lgbm, train)
+```
+"""


This is resolved.

ablaom · 2022-12-01T01:51:31Z

src/MLJInterface.jl

+"""
+$(MLJModelInterface.doc_header(LGBMRegressor))
+
+`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a


I'd say this initial "LightGBMRegressor: " is redundant. The automated header already makes it clear what we are talking about. (And I have been silently removing these before merging your PRs in other packages.) Also, it's confusing in this instance because the model struct is actually called LGBMRegressor, not LightGBMRegressor.

Suggested change

-`LightGBMRegressor`: LightGBM, short for light gradient-boosting machine, is a
+LightGBM, short for light gradient-boosting machine, is a

Please address in all the models.

ablaom · 2022-12-01T02:48:46Z

src/MLJInterface.jl

+
+```julia
+
+using DataFrames: DataFrame using MLJ


Add carriage return:

Suggested change

-using DataFrames: DataFrame using MLJ
+using DataFrames: DataFrame
+using MLJ

ablaom · 2022-12-01T02:59:52Z

src/MLJInterface.jl

+partition(collect(eachindex(y)), 0.70, shuffle=true)
+
+first(X, 3) |> pretty lgb = LGBMRegressor() #initialised a model with default
+params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!


y is a vector.

Suggested change

-params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!
+params lgbm = machine(lgb, X[train, :], y[train]) |> MLJ.fit!
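
For context on this suggestion: base Julia accepts trailing indices equal to 1, so `y[train, 1]` on a vector is not an error, but `y[train]` says the same thing more plainly. The values below are invented for illustration.

```julia
y = [10.0, 20.0, 30.0, 40.0]   # stand-in target vector
train = [1, 3]                  # stand-in training indices

with_extra_index = y[train, 1]  # allowed: trailing index of 1 on a vector
plain = y[train]                # the suggested, more idiomatic form

println(with_extra_index == plain)
```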

ablaom · 2022-12-01T03:03:14Z

src/MLJInterface.jl

+params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!
+
+predict(lgbm, X[test, :]) ```
+


Can we please add to the example a couple of lines showing how to access the feature importances? This is a pretty popular feature of tree boosters.

Ditto the other models.

ablaom · 2022-12-01T03:04:31Z

src/MLJInterface.jl

+framework for gradient boosting based on decision tree algorithms and used for
+ranking, classification and other machine learning tasks, with a focus on
+performance and scalability. This model in particular is used for various types of
+regression


The last sentence mentions "regression". That doesn't sound right for this classifier.

ablaom · 2022-12-01T03:07:10Z

src/MLJInterface.jl

+# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
+machine(model, X, y) Here:


Missing carriage returns:

Suggested change

-# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
-machine(model, X, y) Here:
+# Training data
+In MLJ or MLJBase, bind an instance `model` to data with
+mach = machine(model, X, y)
+Here:

ablaom · 2022-12-01T03:08:38Z

src/MLJInterface.jl

+# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
+machine(model, X, y) Here:


Suggested change

-# Training data In MLJ or MLJBase, bind an instance `model` to data with mach =
-machine(model, X, y) Here:
+# Training data
+In MLJ or MLJBase, bind an instance `model` to data with
+mach = machine(model, X, y)
+Here:

ablaom · 2022-12-01T03:10:00Z

src/MLJInterface.jl

+- `predict(mach, Xnew)`: return predictions of the target given new features
+  `Xnew` having the same Scitype as `X` above.


Suggested change

-- `predict(mach, Xnew)`: return predictions of the target given new features
-  `Xnew` having the same Scitype as `X` above.
+- `predict(mach, Xnew)`: return predictions of the target given new features
+  `Xnew`, which should have the same scitype as `X` above.

ablaom · 2022-12-01T03:10:37Z

src/MLJInterface.jl

+params lgbm = machine(lgb, X[train, :], y[train, 1]) |> MLJ.fit!
+
+predict(lgbm, train) ```
+


Similar comments apply here as in the regressor example.

Update MLJInterface.jl

9849f62

yaxxie changed the title ~~Add MLJ compliant docstrings!~~ Add MLJ compliant docstrings Oct 27, 2022

yaxxie reviewed Oct 27, 2022
