Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Use more iter_rows and select_rows #107

Merged
merged 10 commits into from
Jul 26, 2016
Merged

Conversation

tafia
Copy link
Contributor

@tafia tafia commented Jul 25, 2016

Partly closes #93

I greped data() to search for relevant lines.

Replace with iter_rows and select_rows whenever I could.

Also sometimes use zip iterator instead of enumerate. Hopefully it'll lead to less bound checks.

@AtheMathmo
Copy link
Owner

This looks like it adds a lot of really nice changes! Thank you! I need to properly review so these comments may not be valid, but there are a couple caveats with zip and select_rows.

From memory zip can prevent vectorization in some cases - and optimizes funnily with floats. I can't find any references right now but hopefully it isn't important for this PR.

And select_rows will copy the underlying data. In some places it may be more efficient to index, or slice, or something else instead. I did have plans a while ago to rename this to copy_rows or something but never followed through...

I'll review properly tomorrow morning but I hope the above doesn't invalidate anything you've done!

@tafia
Copy link
Contributor Author

tafia commented Jul 25, 2016

For the zip, I believe the issue for vectorization has been fixed for slices (which is our case), though it may not be in rust stable yet ...
I didn't know about the issue regarding the floats.

I didn't notice about the select_rows ... this is indeed problematic (even if as you stated it could sometimes lead to faster code).

I couldn't find a get_row fn in rulinalg crate, (a fn that just borrows a slice of the matrix). I think the code is easier to read (and probably to generalized, if ever needed) with such a function than without using the &matrix.data()[i*col .. (i + 1) * col]

If needed I think we could as a start create an Trait that adds it to Matrix, MatrixSlice and MatrixSliceMut.

To conclude I think having some benchmarks would probably help taking a decision ...

@AtheMathmo
Copy link
Owner

That zip change is awesome! I believe I remember reading the float stuff in a conversation that bluss was involved in - perhaps that has been addressed since too. In this case I'm happy to use zip liberally where relevant.

There is an open issue on rulinalg to add get_rows. If you think it would help this PR a great deal I'm happy to implement this soon (or of course you can have a go too!).

And yes I agree - benchmarks would help a great deal. Sadly I have no benchmarks for rusty-machine yet - only for rulinalg.

With all that said, I still haven't reviewed this PR. I'll do so soon. Thanks again!

1) *
inputs.cols()],
inputs);
let sub_neighbours = self.region_query(inputs.select_rows(&[*data_point_idx]).data(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to keep this as is until AtheMathmo/rulinalg#7 has been tackled. I'll add a comment on that ticket to note that this piece of code should be revisited once that feature lands.

Happy to hear your thoughts on it.

@AtheMathmo
Copy link
Owner

AtheMathmo commented Jul 25, 2016

Having thought a little more about the select_rows introductions - I think it is best to use the select_rows method as in this PR.

Even if it is less efficient - it will be minimally so. And it will be easier to track these down (in addition to some other existing cases I imagine) when the get_rows methods do land.

If you could just check out the comment on the SVM inner loop multiplication I'll be happy to merge after.

@tafia
Copy link
Contributor Author

tafia commented Jul 26, 2016

Made a PR on rulinalg for get_rows:
AtheMathmo/rulinalg#8

I'll update this PR if it gets merged.

@AtheMathmo
Copy link
Owner

I added a few dictator-esque remarks. Nothing too heavy just personal taste and tests. The code itself looks good and will make this PR even more valuable. Thanks for all your help!

@tafia
Copy link
Contributor Author

tafia commented Jul 26, 2016

I added a few dictator-esque remarks

I think you're right! Always better to follow the same coding style throughout the whole crate, it helps readability.

@tafia
Copy link
Contributor Author

tafia commented Jul 26, 2016

I'll update this PR if it gets merged.

I actually think we should merge this PR as-is ... and open another issue to replace select_rows with get_rows

I've done a quick grep, and it looks like it is used widely enough to have its own PR.

@AtheMathmo
Copy link
Owner

Yep, I agree with you too! Will merge this in the next hour or so. Thanks.

@AtheMathmo AtheMathmo merged commit f25dba2 into AtheMathmo:master Jul 26, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replacing row iteration by chunks with iter_rows method
2 participants