switch housing dataset to wine #170

Merged

Ivanidzo4ka
Contributor

This commit replaces the housing dataset with the wine dataset, which we download from an external source during the build.

@Ivanidzo4ka
Contributor Author

Addresses #3

build.proj Outdated
</TestFile>
</ItemGroup>

<Target Name="DownloadExternalTestFiles" Inputs="@(TestFile)" Outputs="%(TestFile.Result)">
Member

Can you do me a favor and test this with 2 or 3 datasets downloaded from the internet? (I guess it can be any file; it doesn't have to be a .csv file.) I just want to ensure that this works with more than one file, and that on a second build the datasets aren't downloaded again.

That way the next person who needs to add a dataset isn't bitten by a bug here.
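
A minimal sketch of the skip-if-present behavior being asked about, not the target from this PR: it assumes MSBuild 15.8 or later, which ships a built-in `DownloadFile` task, and it uses a task-level `Condition` with `Exists` instead of the `Inputs`/`Outputs` attributes shown in the diff. The item list is shortened to a single file for illustration.

```xml
<Project>
  <ItemGroup>
    <!-- Include is the local path; Url is custom metadata naming the download source. -->
    <TestFile Include="$(MSBuildThisFileDirectory)test/data/external/winequality-white.csv"
              Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv" />
  </ItemGroup>

  <Target Name="DownloadExternalTestFiles">
    <!-- Task batching: the task runs once per TestFile item, and only when
         the destination file is not already on disk, so a second build
         should skip the download (assumption: presence on disk is a good
         enough freshness check for test data). -->
    <DownloadFile SourceUrl="%(TestFile.Url)"
                  DestinationFolder="%(TestFile.RootDir)%(TestFile.Directory)"
                  DestinationFileName="%(TestFile.Filename)%(TestFile.Extension)"
                  Condition="!Exists('%(TestFile.FullPath)')" />
  </Target>
</Project>
```

Running `msbuild build.proj /t:DownloadExternalTestFiles` twice and comparing the two logs is one way to check that the second run does no network work.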

Contributor Author

@Ivanidzo4ka Ivanidzo4ka May 16, 2018

<ItemGroup> <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/winequality-white.csv" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"> <DestinationFile>$(MSBuildThisFileDirectory)test/data/external/winequality-white.csv</DestinationFile> </TestFile> <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/image.png" Url="https://www.google.com/logos/doodles/2018/tamara-de-lempickas-120th-birthday-4614326680813568-l.png"> <DestinationFile>$(MSBuildThisFileDirectory)test/data/external/image.png</DestinationFile> </TestFile> <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/logo.png" Url="https://www.baidu.com/img/bd_logo1.png"> <DestinationFile>$(MSBuildThisFileDirectory)test/data/external/logo.png</DestinationFile> </TestFile> </ItemGroup>
I tested it with this. I tried different combinations, and they all work.
Is there any way to format it properly?

Member

Awesome! Nice to hear it works for many items.

As for the formatting, if you use triple back-ticks ``` at the start of a line, markdown will show exactly what you type until the next set of triple back-ticks. This is useful for displaying code. You can even put a language hint right after the opening triple back-ticks (for example, XML), and markdown will syntax-highlight the block. So if I take your comment above, it looks like this:

<ItemGroup>
    <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/winequality-white.csv" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv">
        <DestinationFile>$(MSBuildThisFileDirectory)test/data/external/winequality-white.csv</DestinationFile>
    </TestFile>
    <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/image.png" Url="https://www.google.com/logos/doodles/2018/tamara-de-lempickas-120th-birthday-4614326680813568-l.png">
        <DestinationFile>$(MSBuildThisFileDirectory)test/data/external/image.png</DestinationFile>
    </TestFile>
    <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/logo.png" Url="https://www.baidu.com/img/bd_logo1.png">
        <DestinationFile>$(MSBuildThisFileDirectory)test/data/external/logo.png</DestinationFile>
    </TestFile>
</ItemGroup>

See https://guides.github.com/features/mastering-markdown/#GitHub-flavored-markdown for more info.

@eerhardt
Member

I see a few more references to housing.txt in the test code:

F:\git\machinelearning\test\Microsoft.ML.Core.Tests\UnitTests\TestCSharpApi.cs(275):            var dataPath = GetDataPath(@"housing.txt");
  F:\git\machinelearning\test\Microsoft.ML.Core.Tests\UnitTests\TestEntryPoints.cs(861):            var dataPath = GetDataPath("housing.txt");
  F:\git\machinelearning\test\Microsoft.ML.Core.Tests\UnitTests\TestEntryPoints.cs(893):            TestEntryPointRoutine("housing.txt", "Trainers.StochasticDualCoordinateAscentRegressor");
  F:\git\machinelearning\test\Microsoft.ML.Core.Tests\UnitTests\TestEntryPoints.cs(967):            TestEntryPointRoutine("housing.txt", "Trainers.PoissonRegressor");
  F:\git\machinelearning\test\Microsoft.ML.TestFramework\Datasets.cs(151):            trainFilename = "housing.txt",
  F:\git\machinelearning\test\Microsoft.ML.TestFramework\Datasets.cs(152):            testFilename = "housing.txt"
  F:\git\machinelearning\test\Microsoft.ML.TestFramework\TestCommandBase.cs(1496):                string pathData = GetDataPath(@"..\Housing (regression)", "housing.txt");
  F:\git\machinelearning\test\Microsoft.ML.TestFramework\TestCommandBase.cs(1536):            string pathData = GetDataPath(@"..\Housing (regression)", "housing.txt");
  F:\git\machinelearning\test\Microsoft.ML.TestFramework\TestCommandBase.cs(1608):            string pathData = GetDataPath(@"..\Housing (regression)", "housing.txt");
  F:\git\machinelearning\test\Microsoft.ML.TestFramework\TestCommandBase.cs(1649):                string pathData = GetDataPath(@"..\Housing (regression)", "housing.txt");

Can you update them all?

build.proj Outdated


<ItemGroup>
<TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/winequality-white.csv" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv">
Member

@danmoseley danmoseley May 17, 2018

Nit: I believe the attribute and sub-element formats are equivalent, so you might want to pick one or the other. #Resolved

Contributor Author

I can't move Include to a sub-element, but I can move Url to one. Having everything as attributes looks too long.


Member

I'm not requesting a change here, just giving some information for future reference.

The Include attribute on an MSBuild Item is "special". It has to be an XML attribute. Before MSBuild v15, all "custom" metadata (like your Url attribute) could NOT be put as an XML attribute - instead it had to be a sub-element. In v15, they changed it so custom metadata now can be in either an attribute or sub-element.

A format I like and use regularly is the following:

    <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/winequality-white.csv"
              Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
              DestinationFile="$(MSBuildThisFileDirectory)test/data/external/winequality-white.csv" />

I think it is compact, reads easily, and is easy to maintain/update.
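
For comparison, here is a sketch of the same item with the custom metadata written as sub-elements, which per the note above is the only form accepted before MSBuild v15. `Include` stays an attribute in both forms, and the two declarations are intended to be equivalent.

```xml
<ItemGroup>
  <!-- Include must be an XML attribute; Url and DestinationFile are custom
       metadata, so they may be written as child elements instead. -->
  <TestFile Include="$(MSBuildThisFileDirectory)/test/data/external/winequality-white.csv">
    <Url>https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv</Url>
    <DestinationFile>$(MSBuildThisFileDirectory)test/data/external/winequality-white.csv</DestinationFile>
  </TestFile>
</ItemGroup>
```

Either way, the values are read the same from a target, for example as `%(TestFile.Url)`.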

@Ivanidzo4ka
Contributor Author

We don't run any of these tests. Anirudh has a PR (#78) to enable them, and this should be resolved in that PR (either by Anirudh or by me).


@Ivanidzo4ka
Contributor Author

@dotnet-bot test Windows_NT Debug

Member

@eerhardt eerhardt left a comment

:shipit:

@Ivanidzo4ka
Contributor Author

Can anyone other than Eric take a look at this? @glebuk

@glebuk
Contributor

glebuk commented May 22, 2018

        using (var loader = new BinaryLoader(Env, new BinaryLoader.Arguments(), confusionMatrixPath))

Missing TestCSharpApi?
#Closed


Refers to: test/Microsoft.ML.Core.Tests/UnitTests/TestEntryPoints.cs:1037 in 45478c2.

 public void EntryPointEvaluateRegression()
 {
-    var dataPath = GetDataPath("housing.txt");
+    var dataPath = GetDataPath(@"external/winequality-white.csv");
Contributor

@glebuk glebuk May 22, 2018

@"external/winequality-white.csv

Extract the dataset info into static dataset classes so that there is no need to repeat paths and schemas. Ideally, you could even just return a new loader for each dataset. #Closed

Contributor

@glebuk glebuk left a comment

:shipit:

@Ivanidzo4ka Ivanidzo4ka merged commit d51321c into dotnet:master May 23, 2018
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
* replace housing uci dataset to wine quality
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
4 participants