Is there a specific reason we are storing the datasets in their serialised form rather than static fields/properties in the assembly? #1941

epignatelli · 2020-08-20T08:49:47Z

Is there a specific reason we are storing the datasets in their serialised form rather than static fields/properties in the assembly?

IsakNaslundBh · 2020-08-20T08:59:27Z

Not sure I fully understand the question?

Are you asking if there is a reason we are storing json files at all? Or is the question about the inner workings of the Library_Engine?

epignatelli · 2020-08-20T09:04:18Z

The dataset is a collection of constant objects. Why do we store this collection in JSON format, in a separate file, rather than in a class (e.g. public static class Gradients : Dataset)?

IsakNaslundBh · 2020-08-20T09:16:25Z

Some varying reasons:

Ease of use for less experienced developers, as well as more experienced ones. Having it in JSON files and letting the menu be dictated purely by the file structure means that all you have to do to create a dataset is to generate the objects you like, in whatever UI you like, then plug them into a Dataset object and push that object through the file adapter. Then just put the file in the correct folder and raise a PR. No need to change any code. Easy to do, easy to understand.
Those datasets can be massive. If we were to put all of the SteelSections on the steel section class it would be thousands of lines. It would also require constant updates to that specific class, requiring far more code knowledge.
IMO, keeping the schema and the data separate makes more sense in general. Partly for the reasons above, but also as a general concept. The class is just the class definition, not a lot of specific instances of it.

Having said all of this, we have a few cases of static properties on the class (as I know you know). For example the Vector class and Point class (getting X, Y, and Z vectors as well as the origin point). Something like that might make sense for the gradient case, but I still think that in general data is better kept separate from the class definition.
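
For readers following the thread, the "schema in code, data in a file" approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SteelSection and DatasetReader are hypothetical stand-ins rather than the BHoM types, and System.Text.Json is used in place of the BHoM serialiser and file adapter, which are not shown here.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical schema class: the code holds only the definition.
public class SteelSection
{
    public string Name { get; set; } = "";
    public double Height { get; set; }
    public double Width { get; set; }
}

// Hypothetical reader: the instances come from JSON text, e.g. the contents of a dataset file.
public static class DatasetReader
{
    public static List<SteelSection> Read(string json)
    {
        // Deserialize returns null for the JSON literal "null"; fall back to an empty list.
        return JsonSerializer.Deserialize<List<SteelSection>>(json) ?? new List<SteelSection>();
    }
}
```

Under these assumptions, adding a section means adding an entry to the JSON text passed to DatasetReader.Read, while the SteelSection class itself stays untouched.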

epignatelli · 2020-08-20T09:42:40Z

I am not sure how much easier and more intuitive that is. Unless the data in the dataset can be generated procedurally with an algorithm, I don't see the benefits.

The menu organization is a superstructure that we generated - we could do the same if datasets were organised in a different way, e.g. in classes (we do it already to cluster methods in components). There is a lot of knowledge in what you're describing - why should I plug everything into a Dataset object? Then use a FileAdapter? What's the correct folder? Again, if the data is not procedurally generated, that's not easier than writing the same thing you would write in Grasshopper into a cs file.

Yep, this I understand, it makes sense. Maybe that's the case for procedurally generated data, I suppose.
It has a drawback, though. If you change a schema without versioning it, the project still compiles and you only find out that you broke the dataset at runtime, if you use it.

I am not proposing to mix schema and data instances - the opposite. Ideally what I would do is to add another angle and have a MachineLearning_Datasets project (as well as one for any other namespace) that holds the instances.

My whole point here is: I can't use datasets in their simplest form: by typing the data that's in there. I need to open the UI, create the objects, serialise them, create a file adapter and push them into the correct folder.
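
The alternative proposed in this comment, instances typed directly into a class in a separate project, might look something like the sketch below. Everything in it is an assumption made for illustration: the Gradient schema, its properties and the colour values are made up, and because a C# static class cannot derive from a base class, the sketch uses a plain static class rather than the Gradients : Dataset form mentioned earlier.

```csharp
using System.Collections.Generic;

// Hypothetical schema class, living in the object model project.
public class Gradient
{
    public string Name { get; set; } = "";
    public List<string> Colours { get; set; } = new List<string>();
}

// Hypothetical data class, living in a separate project such as MachineLearning_Datasets.
public static class Gradients
{
    // Illustrative values only, not taken from any real dataset.
    public static Gradient Greyscale { get; } = new Gradient
    {
        Name = "Greyscale",
        Colours = new List<string> { "#000000", "#FFFFFF" }
    };

    public static Gradient Heat { get; } = new Gradient
    {
        Name = "Heat",
        Colours = new List<string> { "#0000FF", "#FF0000" }
    };

    // Declared last: static initialisers run in textual order.
    public static IReadOnlyList<Gradient> All { get; } = new List<Gradient> { Greyscale, Heat };
}
```

In this form, renaming Colours on Gradient would stop the Gradients project from compiling, so the break shows up at build time rather than when a file is read.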

FraserGreenroyd · 2020-08-20T09:44:55Z

Hi guys, this is a perfect opportunity to use Discussions - could we move this conversation there until we get to a stage of having an actionable issue? 😄

IsakNaslundBh · 2020-08-20T10:00:14Z

Good point @FraserGreenroyd. @epignatelli, want to port across, and we can continue there? :)

epignatelli · 2020-08-20T10:03:51Z

Here you go guys!
BHoM/BHoM#973

epignatelli added the type:question label (Ask for further details or start conversation) Aug 20, 2020

epignatelli assigned alelom, al-fisher, adecler, FraserGreenroyd and IsakNaslundBh Aug 20, 2020

epignatelli closed this as completed Aug 20, 2020
