Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify API for advanced settings. (SDCA) #2093

Merged
merged 13 commits into from
Jan 18, 2019
17 changes: 8 additions & 9 deletions docs/samples/Microsoft.ML.Samples/Dynamic/SDCA.cs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
using System;
using System.Linq;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples.Dynamic
{
Expand Down Expand Up @@ -59,15 +60,13 @@ public static void SDCA_BinaryClassification()
// If we wanted to specify more advanced parameters for the algorithm,
// we could do so by tweaking the 'advancedSetting'.
var advancedPipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
.Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent
(labelColumn: "Sentiment",
featureColumn: "Features",
advancedSettings: s=>
{
s.ConvergenceTolerance = 0.01f; // The learning rate for adjusting bias from being regularized
s.NumThreads = 2; // Degree of lock-free parallelism
})
);
.Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(
new SdcaBinaryTrainer.Options {
LabelColumn = "Sentiment",
FeatureColumn = "Features",
ConvergenceTolerance = 0.01f, // The learning rate for adjusting bias from being regularized
NumThreads = 2, // Degree of lock-free parallelism
}));

// Run Cross-Validation on this second pipeline.
var cvResults_advancedPipeline = mlContext.BinaryClassification.CrossValidate(data, pipeline, labelColumn: "Sentiment", numFolds: 3);
Expand Down
149 changes: 70 additions & 79 deletions src/Microsoft.ML.StandardLearners/Standard/SdcaBinary.cs

Large diffs are not rendered by default.

31 changes: 13 additions & 18 deletions src/Microsoft.ML.StandardLearners/Standard/SdcaMultiClass.cs
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
using Microsoft.ML.Training;
using Float = System.Single;

[assembly: LoadableClass(SdcaMultiClassTrainer.Summary, typeof(SdcaMultiClassTrainer), typeof(SdcaMultiClassTrainer.Arguments),
[assembly: LoadableClass(SdcaMultiClassTrainer.Summary, typeof(SdcaMultiClassTrainer), typeof(SdcaMultiClassTrainer.Options),
new[] { typeof(SignatureMultiClassClassifierTrainer), typeof(SignatureTrainer), typeof(SignatureFeatureScorerTrainer) },
SdcaMultiClassTrainer.UserNameValue,
SdcaMultiClassTrainer.LoadNameValue,
Expand All @@ -29,14 +29,14 @@ namespace Microsoft.ML.Trainers
{
// SDCA linear multiclass trainer.
/// <include file='doc.xml' path='doc/members/member[@name="SDCA"]/*' />
public class SdcaMultiClassTrainer : SdcaTrainerBase<SdcaMultiClassTrainer.Arguments, MulticlassPredictionTransformer<MulticlassLogisticRegressionModelParameters>, MulticlassLogisticRegressionModelParameters>
public class SdcaMultiClassTrainer : SdcaTrainerBase<SdcaMultiClassTrainer.Options, MulticlassPredictionTransformer<MulticlassLogisticRegressionModelParameters>, MulticlassLogisticRegressionModelParameters>
{
public const string LoadNameValue = "SDCAMC";
public const string UserNameValue = "Fast Linear Multi-class Classification (SA-SDCA)";
public const string ShortName = "sasdcamc";
internal const string Summary = "The SDCA linear multi-class classification trainer.";

public sealed class Arguments : ArgumentsBase
public sealed class Options : ArgumentsBase
Copy link
Contributor

@artidoro artidoro Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArgumentsBase [](start = 38, length = 13)

Are we planning to rename this too? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point the focus is to get the public APIs fixed.

Rename of internal classes / variables is low priority at this point.


In reply to: 248102673 [](ancestors = 248102673)

{
[Argument(ArgumentType.Multiple, HelpText = "Loss Function", ShortName = "loss", SortOrder = 50)]
public ISupportSdcaClassificationLossFactory LossFunction = new LogLossFactory();
Expand All @@ -57,41 +57,36 @@ public sealed class Arguments : ArgumentsBase
/// <param name="l2Const">The L2 regularization hyperparameter.</param>
/// <param name="l1Threshold">The L1 regularization hyperparameter. Higher values will tend to lead to more sparse model.</param>
/// <param name="maxIterations">The maximum number of passes to perform over the data.</param>
/// <param name="advancedSettings">A delegate to set more settings.
/// The settings here will override the ones provided in the direct method signature,
/// if both are present and have different values.
/// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
public SdcaMultiClassTrainer(IHostEnvironment env,
Copy link
Member

@sfilipi sfilipi Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public SdcaMultiClassTrainer [](start = 6, length = 30)

same #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made internal. deletion will happen in #2100


In reply to: 248770465 [](ancestors = 248770465)

string labelColumn = DefaultColumnNames.Label,
string featureColumn = DefaultColumnNames.Features,
string weights = null,
ISupportSdcaClassificationLoss loss = null,
float? l2Const = null,
float? l1Threshold = null,
int? maxIterations = null,
Action<Arguments> advancedSettings = null)
: base(env, featureColumn, TrainerUtils.MakeU4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weights), advancedSettings,
l2Const, l1Threshold, maxIterations)
int? maxIterations = null)
: base(env, featureColumn, TrainerUtils.MakeU4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weights),
l2Const, l1Threshold, maxIterations)
{
Host.CheckNonEmpty(featureColumn, nameof(featureColumn));
Host.CheckNonEmpty(labelColumn, nameof(labelColumn));
_loss = loss ?? Args.LossFunction.CreateComponent(env);
Loss = _loss;
}

internal SdcaMultiClassTrainer(IHostEnvironment env, Arguments args,
internal SdcaMultiClassTrainer(IHostEnvironment env, Options options,
Copy link
Member

@sfilipi sfilipi Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal SdcaMultiClassTrainer [](start = 8, length = 30)

delete, maybe #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2100


In reply to: 248770420 [](ancestors = 248770420)

string featureColumn, string labelColumn, string weightColumn = null)
: base(env, args, TrainerUtils.MakeU4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weightColumn))
: base(env, options, TrainerUtils.MakeU4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weightColumn))
{
Host.CheckValue(labelColumn, nameof(labelColumn));
Host.CheckValue(featureColumn, nameof(featureColumn));

_loss = args.LossFunction.CreateComponent(env);
_loss = options.LossFunction.CreateComponent(env);
Loss = _loss;
}

internal SdcaMultiClassTrainer(IHostEnvironment env, Arguments args)
: this(env, args, args.FeatureColumn, args.LabelColumn)
internal SdcaMultiClassTrainer(IHostEnvironment env, Options options)
: this(env, options, options.FeatureColumn, options.LabelColumn)
{
}

Expand Down Expand Up @@ -455,14 +450,14 @@ public static partial class Sdca
ShortName = SdcaMultiClassTrainer.ShortName,
XmlInclude = new[] { @"<include file='../Microsoft.ML.StandardLearners/Standard/doc.xml' path='doc/members/member[@name=""SDCA""]/*' />",
@"<include file='../Microsoft.ML.StandardLearners/Standard/doc.xml' path='doc/members/example[@name=""StochasticDualCoordinateAscentClassifier""]/*' />" })]
public static CommonOutputs.MulticlassClassificationOutput TrainMultiClass(IHostEnvironment env, SdcaMultiClassTrainer.Arguments input)
public static CommonOutputs.MulticlassClassificationOutput TrainMultiClass(IHostEnvironment env, SdcaMultiClassTrainer.Options input)
{
Contracts.CheckValue(env, nameof(env));
var host = env.Register("TrainSDCA");
host.CheckValue(input, nameof(input));
EntryPointUtils.CheckInputArgs(host, input);

return LearnerEntryPointsUtils.Train<SdcaMultiClassTrainer.Arguments, CommonOutputs.MulticlassClassificationOutput>(host, input,
return LearnerEntryPointsUtils.Train<SdcaMultiClassTrainer.Options, CommonOutputs.MulticlassClassificationOutput>(host, input,
() => new SdcaMultiClassTrainer(host, input),
() => LearnerEntryPointsUtils.FindColumn(host, input.TrainingData.Schema, input.LabelColumn));
}
Expand Down
33 changes: 14 additions & 19 deletions src/Microsoft.ML.StandardLearners/Standard/SdcaRegression.cs
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
using Microsoft.ML.Trainers;
using Microsoft.ML.Training;

[assembly: LoadableClass(SdcaRegressionTrainer.Summary, typeof(SdcaRegressionTrainer), typeof(SdcaRegressionTrainer.Arguments),
[assembly: LoadableClass(SdcaRegressionTrainer.Summary, typeof(SdcaRegressionTrainer), typeof(SdcaRegressionTrainer.Options),
new[] { typeof(SignatureRegressorTrainer), typeof(SignatureTrainer), typeof(SignatureFeatureScorerTrainer) },
SdcaRegressionTrainer.UserNameValue,
SdcaRegressionTrainer.LoadNameValue,
Expand All @@ -24,19 +24,19 @@
namespace Microsoft.ML.Trainers
{
/// <include file='doc.xml' path='doc/members/member[@name="SDCA"]/*' />
public sealed class SdcaRegressionTrainer : SdcaTrainerBase<SdcaRegressionTrainer.Arguments, RegressionPredictionTransformer<LinearRegressionModelParameters>, LinearRegressionModelParameters>
public sealed class SdcaRegressionTrainer : SdcaTrainerBase<SdcaRegressionTrainer.Options, RegressionPredictionTransformer<LinearRegressionModelParameters>, LinearRegressionModelParameters>
{
internal const string LoadNameValue = "SDCAR";
internal const string UserNameValue = "Fast Linear Regression (SA-SDCA)";
internal const string ShortName = "sasdcar";
internal const string Summary = "The SDCA linear regression trainer.";

public sealed class Arguments : ArgumentsBase
public sealed class Options : ArgumentsBase
{
[Argument(ArgumentType.Multiple, HelpText = "Loss Function", ShortName = "loss", SortOrder = 50)]
public ISupportSdcaRegressionLossFactory LossFunction = new SquaredLossFactory();

public Arguments()
public Options()
{
// Using a higher default tolerance for better RMS.
ConvergenceTolerance = 0.01f;
Expand All @@ -61,40 +61,35 @@ public Arguments()
/// <param name="l2Const">The L2 regularization hyperparameter.</param>
/// <param name="l1Threshold">The L1 regularization hyperparameter. Higher values will tend to lead to more sparse model.</param>
/// <param name="maxIterations">The maximum number of passes to perform over the data.</param>
/// <param name="advancedSettings">A delegate to set more settings.
/// The settings here will override the ones provided in the direct method signature,
/// if both are present and have different values.
/// The columns names, however need to be provided directly, not through the <paramref name="advancedSettings"/>.</param>
public SdcaRegressionTrainer(IHostEnvironment env,
Copy link
Member

@sfilipi sfilipi Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public SdcaRegressionTrainer(IHos [](start = 5, length = 36)

shall this ctor go away? #Resolved

Copy link
Member Author

@abgoswam abgoswam Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made internal. deletion will happen in #2100


In reply to: 248770050 [](ancestors = 248770050)

string labelColumn = DefaultColumnNames.Label,
string featureColumn = DefaultColumnNames.Features,
string weights = null,
ISupportSdcaRegressionLoss loss = null,
float? l2Const = null,
float? l1Threshold = null,
int? maxIterations = null,
Action<Arguments> advancedSettings = null)
: base(env, featureColumn, TrainerUtils.MakeR4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weights), advancedSettings,
l2Const, l1Threshold, maxIterations)
int? maxIterations = null)
: base(env, featureColumn, TrainerUtils.MakeR4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weights),
l2Const, l1Threshold, maxIterations)
{
Host.CheckNonEmpty(featureColumn, nameof(featureColumn));
Host.CheckNonEmpty(labelColumn, nameof(labelColumn));
_loss = loss ?? Args.LossFunction.CreateComponent(env);
Loss = _loss;
}

internal SdcaRegressionTrainer(IHostEnvironment env, Arguments args, string featureColumn, string labelColumn, string weightColumn = null)
: base(env, args, TrainerUtils.MakeR4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weightColumn))
internal SdcaRegressionTrainer(IHostEnvironment env, Options options, string featureColumn, string labelColumn, string weightColumn = null)
Copy link
Member

@sfilipi sfilipi Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal SdcaRegressionTrainer [](start = 7, length = 31)

I'd remove this one too. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2100


In reply to: 248770295 [](ancestors = 248770295)

: base(env, options, TrainerUtils.MakeR4ScalarColumn(labelColumn), TrainerUtils.MakeR4ScalarWeightColumn(weightColumn))
{
Host.CheckValue(labelColumn, nameof(labelColumn));
Host.CheckValue(featureColumn, nameof(featureColumn));

_loss = args.LossFunction.CreateComponent(env);
_loss = options.LossFunction.CreateComponent(env);
Loss = _loss;
}

internal SdcaRegressionTrainer(IHostEnvironment env, Arguments args)
: this(env, args, args.FeatureColumn, args.LabelColumn)
internal SdcaRegressionTrainer(IHostEnvironment env, Options options)
Copy link
Member

@singlis singlis Jan 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SdcaRegressionTrainer [](start = 17, length = 21)

isnt there suposed to be a constructor with common arguments and one with options? If we make the other two constructors internal (and eventually remove), then we have just this constructor with the options. Is that OK? Does SDCARegressionTrainer have any common args? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite sure what you mean.

After #2100, only 1 constructor should remain.


In reply to: 248865494 [](ancestors = 248865494)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that main point with the API changes is that we end up with two constructors that are publicly available to the user:

  • one constructor that would be used for most scenarios and contains a set of arguments that are just passed in Foo(arg1, arg2...)
  • one constructor that only takes in options if we want to set advanced features Foo(options)

Now the first constructor can use the options constructor under the covers, i.e. Foo(arg1, arg2): this(new Options(){ arg1 = arg1, arg2 = arg2 }... but we would still expose two different constructors publicly that can be used.

Is that not the case for SDCA?


In reply to: 248870888 [](ancestors = 248870888,248865494)

Copy link
Member Author

@abgoswam abgoswam Jan 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We should have two MLContext extensions that are publicly available to the user. This is the public API for ML.NET
  2. All constructors are internal
  3. With Refactoring of Constructors #2100 we will unify the internal constructors. But thats P2 for now.

In reply to: 248880304 [](ancestors = 248880304,248870888,248865494)

: this(env, options, options.FeatureColumn, options.LabelColumn)
{
}

Expand Down Expand Up @@ -178,14 +173,14 @@ public static partial class Sdca
ShortName = SdcaRegressionTrainer.ShortName,
XmlInclude = new[] { @"<include file='../Microsoft.ML.StandardLearners/Standard/doc.xml' path='doc/members/member[@name=""SDCA""]/*' />",
@"<include file='../Microsoft.ML.StandardLearners/Standard/doc.xml' path='doc/members/example[@name=""StochasticDualCoordinateAscentRegressor""]/*' />" })]
public static CommonOutputs.RegressionOutput TrainRegression(IHostEnvironment env, SdcaRegressionTrainer.Arguments input)
public static CommonOutputs.RegressionOutput TrainRegression(IHostEnvironment env, SdcaRegressionTrainer.Options input)
{
Contracts.CheckValue(env, nameof(env));
var host = env.Register("TrainSDCA");
host.CheckValue(input, nameof(input));
EntryPointUtils.CheckInputArgs(host, input);

return LearnerEntryPointsUtils.Train<SdcaRegressionTrainer.Arguments, CommonOutputs.RegressionOutput>(host, input,
return LearnerEntryPointsUtils.Train<SdcaRegressionTrainer.Options, CommonOutputs.RegressionOutput>(host, input,
() => new SdcaRegressionTrainer(host, input),
() => LearnerEntryPointsUtils.FindColumn(host, input.TrainingData.Schema, input.LabelColumn));
}
Expand Down
Loading