Patch to version 0.4.0 #257

Mec-iS · 2023-04-03T03:10:57Z

Collecting changes for version 0.4.0

New expected behaviour

see #256

Change logs

Add Logistic Regression random test
Add Result type to DenseMatrix constructor by @EdmundsEcho
BREAKING CHANGES on DenseMatrix constructor, now constructor returns a Result

codecov-commenter · 2023-04-03T03:17:52Z

Codecov Report

Attention: Patch coverage is 54.16667% with 33 lines in your changes are missing coverage. Please review.

Project coverage is 45.49%. Comparing base (80a93c1) to head (3fcd722).

Files	Patch %	Lines
src/linalg/basic/matrix.rs	57.35%	29 Missing ⚠️
src/algorithm/neighbour/bbd_tree.rs	0.00%	2 Missing ⚠️
src/error/mod.rs	0.00%	2 Missing ⚠️

Additional details and impacted files

@@               Coverage Diff               @@
##           development     #257      +/-   ##
===============================================
- Coverage        46.44%   45.49%   -0.96%     
===============================================
  Files               85       85              
  Lines             7247     7254       +7     
===============================================
- Hits              3366     3300      -66     
- Misses            3881     3954      +73

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

morenol

LGTM

Mec-iS · 2023-04-05T08:23:07Z

@morenol we need to fix Wasm tests

src/linalg/basic/matrix.rs

Mec-iS · 2023-04-06T05:29:14Z

releasing a new minor for these changes is not completely justified. We need to add some more implementations to this to justify a minor release.

Mec-iS · 2023-04-06T06:22:15Z

Also: probably, in accord with the library principles, probably it is better to panic early at initialization than returning a Result. smartcore is meant to be as stable as possible after compilation, using a Result may just move problems into runtime. So there is the option of just allow controlled panic!(..) instead of returning a Fail.

EdmundsEcho · 2023-04-06T10:07:06Z

To which function are you referring?

I get that if a config or something early is flawed, the app should exit with useful feedback. Elsewhere, the user can "opt-in" to a panic by using "unwrap".

Here is a link that might be useful https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html. I see a line between if "dangerous/harmful" then panic, otherwise Err. Err is a valid answer given the input. Panic is not generally expected behavior. Furthermore "fail early" is not what we are talking about, it's more "at the time a mistake is discovered" what does the lib return.

As a user of the library, I would expect to be able to send commands over time. A user (or service), might "get it wrong". Does that mean the user cannot "try again"?

In the event I run this in Tokio, I would not want the thread to panic (very bad and tough to navigate), but rather give me the answer - that can include: Err could not process b/c data was wrong, a match branch I can recover from.

Runtime errors in py is the reason I moved to Rust. Returning Err is not a runtime error like we experience in py. It's only a decision on what to return - app crash/panic or Err.

More generally about runtime errors: by definition of the compile-time checks in Rust, they don't equate to py (incomparable).

Right now the lib is fragile because it doesn't have strong typing through and through (likely mostly on the result side of the process). It will get better over time where the app does not instantiate a bad state (like what we have with DenseMatrix in the 0.4). Until then let the user know with an Err. For instance getting results is a fragile process.

What do you think?

morenol · 2023-04-06T12:53:56Z

I also think that a Result is better than panic for this case. Also agree that we could delay the crates.io release until we have more things that we could add to a breaking change release

EdmundsEcho · 2023-04-06T15:37:03Z

I agree there is more that can be done for the next major release. I have some ideas for how to add to the changes for 0.4. Much of the ideas come from what Axum has done for http.

Reporting

I tried to use the coefficients and intercept to output probabilities (not binary classes) and was surprised by how fragile the process was. e.g., get_row(0), get((0,0)) requires a lot of "under the hood" knowledge not to mention the gymnastics required to extract values from &X.

One way to go from a -> b -> c without the user being aware, is create what I call "transient types". Using the transient types you build implementations of std::convert::From. This way, the end user can do something like:

let coefficients: Vec<_> = lr.coefficients().into();
let intercept: f64 = lr.intercept().into();

// or 
let coefficients: Vec<_> = lr.coefficients().iter().collect();
let intercept: f64 = lr.intercept(); // casts to f64

There are other "best practices" for increasing the usability of each method by using traits such as AsRef, Borrow that I wonder might be worth looking through.

Augment what is included in a report e.g., F stat. I'm not all familiar with the subtle differences between ML and traditional stat modeling. I come from the latter. The incremental effort to report on "common sense" "checks" that we should be interested in is likely good fodder for discussion. It might be a feature, or the user choosing to "see" more of what has already been computed in the default features.
Move away from building your own underlying data structures for hosting and executing common math operations e.g., use ndarray. We are using argmin. We should do the same for the underlying data hosting structs.
Implement "the builder" pattern, like what we experience in many Rust crates, e.g., build a http request. I see what we are doing here in a similar vein. This will augment the capacity to build a "default" for whatever analysis that is "more often" what you want. This would also require augmenting the documentation to fully explain the additional options (accelerated by pointing users to generically available descriptions of the concept on established stats/ml sites?).
All in all, it might be useful to "draw a line" where you want to be able to rely on modules that emit Result vs other, internal private methods that only assert to convey a critical dependence, but that can be made "sound" due to internal, actively monitored assertions. In general, outside "monad", inside "pure".

For the next major after this, or the next after that :)), I would start thinking more about performance. I hit performance issues with 50k x 100 sample size. Part of this would include a review of when we borrow vs take ownership. Ironically, I wonder if there is too much effort to avoid copies. My main point is this, it's still too early to worry about performance in my view. Just note, we have room to improve when the time comes. Unlike python, it will actually make the code both better and more readable, not more convoluted :))

Finally, I've surveyed several packages to accomplish what I need for building stat models. It's a fragmented space with many "half-way" efforts. I like what you all have accomplished. I believe the core strength is assembly: data -> design -> results. I see Axum in smartcore. They assemble the best while focusing on a productive, type-safe, Rust user experience.

I hope this input is helpful. I spent a many years working in Rust. I'm happy to help however you all like.

Mec-iS · 2023-04-07T03:13:04Z

Thanks for your ideas. I am glad you found smartcore interesting and you plan to contribute more!

I believe the library can be very useful in different domains and I am now focusing into using the library for applications to see what it can do. My idea is to keep the library different from the others on multiple dimensions. I think there is already a lot of effort into making things work for "larger" high-level use-cases and that side-tracking this approach can be very interesting in the long term for domains that are maybe not in the "big data" scenario; polars and linfa are already doing great on that, we don't want to overlap too much with their use cases; please see the CONTRIBUTING.
I am currently taking feedback from embedded developers and audio technologists mostly, focusing on real-time processing as an objective; so the use cases I have in mind are more in the context of low-level, small devices, highly portable, high-frequency computations. For example, this field of applications probably needs more something like the latest research on dimensionality reduction techniques than supporting billions of features, or else making ML algorithms more useful on tabular datasets. In this context user experience is a bit of a second-order problem but as far as resource usage stays and these levels (both in terms of time but maybe more important for small devices also space complexity) or improve I am in general OK with better APIs and more features.

It would be great to have more idiomatic Rust, I think let's start with the transient types and the builder pattern, after that we want probably to set up good profiling (we have a bench repo but it is very limited and difficult to run, it would be nice to have something like tarpaulin for profiling, that runs with CI/CD). Vlad, that created the library, is also of the idea that we should drop our custom traits system on arrays but this is something we can see later as far as we provide the bindings to ndarray.
Also it would be nice to have finally fixed predict_proba #211, but I don't know the functionality well enough to know if it is a problem with numerical stability of the library or the implementation itself.

Welcome! @EdmundsEcho

EdmundsEcho · 2023-04-08T00:05:54Z

To better understand the design intent was implemented, may I ask

what the benefits of using a trait Array2d instead of a concrete type that works optimally with the solver? e.g., why not use DenseMatrix and convert whatever users want to this standard?
are all of the methods in the Array-related traits used internally? Or are they there to support pre-processing?
how do you see the line between pre-processing / post-processing and smartcore?
why not use argmin?

Mec-iS · 2023-04-08T02:46:05Z

To better understand the design intent was implemented, may I ask

1. what the benefits of using a trait `Array2d` instead of a concrete type that works optimally with the solver? e.g., why not use `DenseMatrix` and convert whatever users want to this standard?

2. are all of the methods in the `Array`-related traits used internally?  Or are they there to support pre-processing?

3. how do you see the line between pre-processing / post-processing and smartcore?

4. why not use argmin?

I think it is because to take advantage of rust zero-cost abstractions and provide zero-copy slicing and iterators, also it is possible this way to implement other custom concrete types (like a SparseMatrix for example)
They are used internally but they can also be used to define other types. I would keep the custom trait system as we may tend to have a no-dependency (or even a no_std) feature/fork at some point in the future
the library always focused on linear algebra and matrices manipulation, the rest is left to the user. Probably we can add anything that is useful behind feature configurations but the core functionalities have to stay stand-alone for minimal installation.
I guess it didn't exist at the time (2018), check discussions for other hypothesis of possible integrations. Again, using any library that is based on ndarray in the internals will make everything dependant on that, and I don't know if it's worth as we would just overlap other libraries' use cases and working patterns.

Mec-iS · 2024-03-04T08:43:06Z

@morenol please take a look

CHANGELOG.md

morenol · 2024-03-04T13:25:41Z

I think that we need to run cargo fmt

Mec-iS added 4 commits April 3, 2023 11:14

uncomment test

c79d3e8

Add random test for logistic regression

e5250de

linting

28f8b06

Bump version

b734e8c

morenol previously approved these changes Apr 3, 2023

View reviewed changes

Add test for logistic regression

63322c1

Mec-iS dismissed morenol’s stale review via 63322c1 April 4, 2023 10:14

linting

5aed7f5

Mec-iS mentioned this pull request Apr 4, 2023

index out of bounds error using logistic_regression #256

Open

2 tasks

Edmund Cape and others added 7 commits April 4, 2023 15:58

initial commit

2c781ec

final

4c9a650

final-clean

de1a43a

merge

8855efd

Bump to 0.4.0

4c120d3

Fix linter

c1b3677

cleanup

5feab83

Mec-iS mentioned this pull request Apr 5, 2023

Fix 256: Makes instantiating DenseMatrix and others fallible #258

Closed

2 tasks

Mec-iS changed the title ~~Patch 0.3.2~~ Patch 0.4.0 Apr 5, 2023

Mec-iS added 2 commits April 5, 2023 17:27

Update CHANDELOG with breaking changes

574b7fe

Update CHANDELOG date

724268b

Mec-iS commented Apr 5, 2023

View reviewed changes

src/linalg/basic/matrix.rs Outdated Show resolved Hide resolved

Mec-iS commented Apr 5, 2023

View reviewed changes

src/linalg/basic/matrix.rs Outdated Show resolved Hide resolved

Mec-iS commented Apr 5, 2023

View reviewed changes

src/linalg/basic/matrix.rs Show resolved Hide resolved

Mec-iS commented Apr 5, 2023

View reviewed changes

src/linalg/basic/matrix.rs Show resolved Hide resolved

Mec-iS added 4 commits April 6, 2023 10:56

Add functional methods to DenseMatrix implementation

d4e9669

linting

0a377d5

add type declaration in test

4e55346

Fix Wasm tests failing

c987a80

Add type annotations on BBDTree constructor

af5033b

Mec-iS dismissed morenol’s stale review via af5033b April 6, 2023 09:05

Mec-iS mentioned this pull request Apr 10, 2023

Add multiclass classification strategies #65

Open

Mec-iS marked this pull request as ready for review March 4, 2024 08:23

Merge branch 'development' into patch-0.3.2

ea74b32

Mec-iS changed the title ~~Patch 0.4.0~~ Patch 0.3.3 Mar 4, 2024

Mec-iS changed the title ~~Patch 0.3.3~~ Patch to version 0.3.3 Mar 4, 2024

lorenzo-moriondo-agreena added 3 commits March 4, 2024 08:32

fix clippy

5667847

fix clippy

32d7e45

fix tests

92cb697

morenol reviewed Mar 4, 2024

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

morenol reviewed Mar 4, 2024

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Mec-iS changed the title ~~Patch to version 0.3.3~~ Patch to version 0.4.0 Mar 4, 2024

bump version

c1b0b53

run fmt. fix changelog

3fcd722

Mec-iS requested a review from morenol March 4, 2024 13:29

morenol approved these changes Mar 4, 2024

View reviewed changes

morenol merged commit 239c004 into development Mar 4, 2024
13 of 14 checks passed

Mec-iS deleted the patch-0.3.2 branch March 4, 2024 15:03

Mec-iS mentioned this pull request Mar 25, 2024

Comparison with linfa #277

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Patch to version 0.4.0 #257

Patch to version 0.4.0 #257

Mec-iS commented Apr 3, 2023 •

edited

Loading

codecov-commenter commented Apr 3, 2023 •

edited by codecov bot

Loading

morenol left a comment

Mec-iS commented Apr 5, 2023

Mec-iS commented Apr 6, 2023

Mec-iS commented Apr 6, 2023 •

edited

Loading

EdmundsEcho commented Apr 6, 2023 •

edited

Loading

morenol commented Apr 6, 2023

EdmundsEcho commented Apr 6, 2023

Mec-iS commented Apr 7, 2023 •

edited

Loading

EdmundsEcho commented Apr 8, 2023

Mec-iS commented Apr 8, 2023

Mec-iS commented Mar 4, 2024

morenol commented Mar 4, 2024

Patch to version 0.4.0 #257

Patch to version 0.4.0 #257

Conversation

Mec-iS commented Apr 3, 2023 • edited Loading

New expected behaviour

Change logs

codecov-commenter commented Apr 3, 2023 • edited by codecov bot Loading

Codecov Report

morenol left a comment

Choose a reason for hiding this comment

Mec-iS commented Apr 5, 2023

Mec-iS commented Apr 6, 2023

Mec-iS commented Apr 6, 2023 • edited Loading

EdmundsEcho commented Apr 6, 2023 • edited Loading

morenol commented Apr 6, 2023

EdmundsEcho commented Apr 6, 2023

Mec-iS commented Apr 7, 2023 • edited Loading

EdmundsEcho commented Apr 8, 2023

Mec-iS commented Apr 8, 2023

Mec-iS commented Mar 4, 2024

morenol commented Mar 4, 2024

Mec-iS commented Apr 3, 2023 •

edited

Loading

codecov-commenter commented Apr 3, 2023 •

edited by codecov bot

Loading

Mec-iS commented Apr 6, 2023 •

edited

Loading

EdmundsEcho commented Apr 6, 2023 •

edited

Loading

Mec-iS commented Apr 7, 2023 •

edited

Loading