Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing parameters in ML.NET Public API #2665

Merged
merged 4 commits into from
Feb 25, 2019

Conversation

abgoswam
Copy link
Member

@abgoswam abgoswam commented Feb 20, 2019

Fixes #2257, #2660, #2177

Also related to #2680 #2613

Using ApiReview tool, went through the public surface area making the changes (as described in the issue)

@codecov
Copy link

codecov bot commented Feb 20, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@44c3113). Click here to learn what that means.
The diff coverage is 97.67%.

@@            Coverage Diff            @@
##             master    #2665   +/-   ##
=========================================
  Coverage          ?   71.65%           
=========================================
  Files             ?      804           
  Lines             ?   141872           
  Branches          ?    16111           
=========================================
  Hits              ?   101654           
  Misses            ?    35781           
  Partials          ?     4437
Flag Coverage Δ
#Debug 71.65% <97.67%> (?)
#production 67.95% <97.05%> (?)
#test 85.74% <100%> (?)
Impacted Files Coverage Δ
...actorizationMachine/FactorizationMachineCatalog.cs 50% <0%> (ø)
...oft.ML.StandardLearners/StandardLearnersCatalog.cs 88.57% <100%> (ø)
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 88.37% <100%> (ø)
src/Microsoft.ML.HalLearners/HalLearnersCatalog.cs 94.28% <100%> (ø)
test/Microsoft.ML.Tests/OnnxConversionTest.cs 97.16% <100%> (ø)
.../Microsoft.ML.Tests/TrainerEstimators/SdcaTests.cs 100% <100%> (ø)
src/Microsoft.ML.LightGBM/LightGbmCatalog.cs 87.5% <100%> (ø)
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 80.4% <100%> (ø)
src/Microsoft.ML.KMeansClustering/KMeansCatalog.cs 100% <100%> (ø)
src/Microsoft.ML.PCA/PCACatalog.cs 53.84% <100%> (ø)
... and 2 more

@abgoswam abgoswam changed the title Fixing weights parameter in ML.NET Public API [WIP] Fixing (weights , groupid) parameters in ML.NET Public API Feb 20, 2019
@@ -29,7 +29,7 @@ public static class KMeansClusteringExtensions
/// </example>
public static KMeansPlusPlusTrainer KMeans(this ClusteringCatalog.ClusteringTrainers catalog,
string featureColumn = DefaultColumnNames.Features,
Copy link
Contributor

@TomFinley TomFinley Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

featureColumn [](start = 18, length = 13)

So we're calling this exampleWeightColumnName, but calling the above featureColumn. I'm not terribly insistent one way or the other, but we should at least be consistent I think.

In the original issue you proposed calling this * exampleWeightColumn, which seems to me to be consistent with the existing behavior here. Yet elsewhere I see that we are calling things labelColumn and labelColumnName here and there.

Is this on someone's radar? If we are not going to fix it (we should!) a second best alternative would be to be at least locally consistent. Not sure what @rogancarr and @eerhardt and @sfilipi think? #Resolved

Copy link
Member Author

@abgoswam abgoswam Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #2177 proposes we use suffix ColumnName consistently

In this PR I am making that change for weight, groupid parameters. In a follow up PR I will do it for label and features.

So once both PRs go in, we will mark #2177 as complete, and the following convention will be used

  • labelColumnName
  • featureColumnName
  • exampleWeightColumnName
  • rowGroupColumnName

This is certainly on my radar, as we drive towards consistency in the APIs . I have also updated the description of the issue so it has reference to issue #2177 . Hope that sounds good ? #Closed

Copy link
Contributor

@TomFinley TomFinley Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good, thank you @abgoswam ! Sorry missed that discussion. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomFinley. I have consolidated all the changes in this PR. So this PR addresses {features, labels, groupid, weight} column names in a consistent manner.


In reply to: 258779314 [](ancestors = 258779314)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomFinley. I have consolidated all the changes in this PR. So this PR addresses {features, labels, groupid, weight} column names in a consistent manner.


In reply to: 258776812 [](ancestors = 258776812)

{
Contracts.CheckValue(catalog, nameof(catalog));
var env = CatalogUtils.GetEnvironment(catalog);
var options = new OlsLinearRegressionTrainer.Options
{
LabelColumn = labelColumnName,
FeatureColumn = featureColumnName,
WeightColumn = weightColumn != null ? Optional<string>.Explicit(weightColumn) : Optional<string>.Implicit(DefaultColumnNames.Weight)
WeightColumn = exampleWeightColumnName != null ? Optional<string>.Explicit(exampleWeightColumnName) : Optional<string>.Implicit(DefaultColumnNames.Weight)
Copy link
Contributor

@TomFinley TomFinley Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh oh. Am I wrong or is there literally no way to have non-weighted training if you happen to have a column named Weight????

It looks like there's an existing issue #1065 that is capturing this. Not your code, so don't have to fix, I just notice this way. #Resolved

Copy link
Contributor

@TomFinley TomFinley Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I see it's incidentally assigned to you @abgoswam . 😄 Maybe after this PR we can think about how to address this. #Resolved

Copy link
Member Author

@abgoswam abgoswam Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. We should fix this behavior as part of #1065 (in a separate PR). #Closed

/// <param name="rank">The number of principal components.</param>
/// <param name="overSampling">Oversampling parameter for randomized PrincipalComponentAnalysis training.</param>
/// <param name="center">If enabled, data is centered to be zero mean.</param>
/// <param name="seed">The seed for random number generation.</param>
public static PrincipalComponentAnalysisEstimator ProjectToPrincipalComponents(this TransformsCatalog.ProjectionTransforms catalog,
string outputColumnName,
string inputColumnName = null,
string weightColumn = PrincipalComponentAnalysisEstimator.Defaults.WeightColumn,
string exampleWeightColumnName = PrincipalComponentAnalysisEstimator.Defaults.WeightColumn,
Copy link
Member

@eerhardt eerhardt Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting this to null will be covered in #1065, right? #Resolved

Copy link
Member Author

@abgoswam abgoswam Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, perhaps #1065 would be the right place to set this to null as default value, since that issue related to changing the behavior of the weights parameters.

The issues addressed by this PR are all related to consistency in API naming convention. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1065 did not address this. hence taking care of this in this PR


In reply to: 259028009 [](ancestors = 259028009)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in this PR itself.


In reply to: 259039000 [](ancestors = 259039000)

@abgoswam abgoswam changed the title [WIP] Fixing (weights , groupid) parameters in ML.NET Public API Fixing parameters in ML.NET Public API Feb 23, 2019
/// <param name="weights">The optional weights column.</param>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="featureColumnName">The name of the feature column.</param>
/// <param name="rowGroupColumnName">The name of the group column.</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessary for this PR, but I wonder if we need some documentation around the row group column. I'm obviously not an expert, but I have no idea what this is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi...#2536 gives some context on this. we can add more documentation around this in follow up documentation focused PRs.

@@ -513,7 +513,7 @@ private static void RunEndToEnd(MLContext mlContext, IDataView trainData, string
// Construct the learning pipeline. Note that we are now providing a contract name for the custom mapping:
// otherwise we will not be able to save the model.
var estimator = mlContext.Transforms.CustomMapping<InputRow, OutputRow>(CustomMappings.IncomeMapping, nameof(CustomMappings.IncomeMapping))
.Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumn: "Label"));
.Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumnName: "Label"));
Copy link
Member

@eerhardt eerhardt Feb 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This usually means the cookbook .md file needs to be updated as well. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making me take a second look at this. Updated the cookbook .md file in a couple of places.

p.s. my initial search for "labelColumn" in the cookbook .md file did not return relevant results. Apparently it used to be called just "label" at some point :)


In reply to: 259626158 [](ancestors = 259626158)

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@abgoswam abgoswam merged commit 7cc208c into dotnet:master Feb 25, 2019
@abgoswam abgoswam deleted the abgoswam/parameter_consistency branch March 20, 2019 20:13
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants