Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Integer Range Operators #931

Merged
merged 9 commits into from
Jun 7, 2016
Merged

Faster Integer Range Operators #931

merged 9 commits into from
Jun 7, 2016

Conversation

manofstick
Copy link
Contributor

As discussed in #520, the current implementation of the range operators is kind of slow. The discussion on #520 listed some issues with the provided implementation which included some issues where the resultant series didn't match, as well as not matching the exceptions thrown in the invalid state of Current (although MSDN docs says that such calls are undefined, so could probably return anything anyway...) These issues, I believe, have been addressed in this implementation.

Paul Westcott added 2 commits February 3, 2016 13:37
A lighter implementation of the range operators that conforms (i think?)
to current impl
@msftclas
Copy link

msftclas commented Feb 5, 2016

Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!

The agreement was validated by Microsoft and real humans are currently evaluating your PR.

TTYL, MSBOT;

@forki
Copy link
Contributor

forki commented Feb 5, 2016

@manofstick do you have any numbers that show that your implementation is indeed faster?

Do we need to add additional tests?

@manofstick
Copy link
Contributor Author

Some performance numbers:

let sw = System.Diagnostics.Stopwatch.StartNew ()

let data = seq {
    let r = System.Random 42
    for i = 1 to 10000000 do
        yield r.NextDouble ()    
}

let average = data |> Seq.average
let minimum = data |> Seq.min
let maximum = data |> Seq.max

System.Console.WriteLine ("{0} ({1},{2},{3})", sw.ElapsedMilliseconds, minimum, average, maximum)
State x32 % of release x64 % of release
release 4664 100% 2166 100%
branch 1929 42% 1278 59%

TIme in milliseconds

@manofstick
Copy link
Contributor Author

@forki

This implementation is basically equivalent to @tpetricek's version, (especially if you get rid of the exceptions in Current property, although that was one of the comments that @latkin made - i.e. to match existing functionality)

I also have a "fast path" for the standard increment by 1, which is faster still, but really we're just tweaking a little bit; although it does affect the performance in the previously listed example by > 10%, so I thought it was worth it.

I probably need to do some better digging to see what existing tests there are to determine if I need to add some, (and probably do; hopefully I'll find some time over the w/end)

@enricosada
Copy link
Contributor

@manofstick if you can rebase (or cherry-pick) and force push to remove the merge master commit 9344c31, it's more clean. Otherwise np

@forki
Copy link
Contributor

forki commented Feb 5, 2016

would it help if the implementation would be special cased for every (or some important like int32) type?

let inline variableStepIntegralRange n step m =
if step = LanguagePrimitives.GenericZero
then invalidArg "step" (SR.GetString(SR.stepCannotBeZero));
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the else it's not needed, because invalidArg raise an exception ( so a if then it's enough, like there
Less indentation

@manofstick
Copy link
Contributor Author

@enricosada

We use Mercurial at work and so I have never bothered to learn git. Send me commands, and I'll do 'em (...I'm just a noob living in GitHub Desktop land...)

else state.Current

member this.Current =
box this.Current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why box? this.Current it' not of generic IEnumerator<'a> so abstract Current : 'a with get?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the "IEnumerable.Current : obj" version.

Probably a bad habbit, but I find myself removing all explicit types.

Happy to put the ": obj" in if it makes this more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, my bad, it's ok

@manofstick
Copy link
Contributor Author

@forki

Due to the statically resolved types, and the way the code is structured (i.e. I do a check for overflow combined with a check for exceeding upper bound) I believe that the code is as efficient as it could be for all data sizes (it is only a replace for "integer" types; i.e not floating point) (... well it could be split for positive and negative step so that check only occurs once, but as I'm dealing with the increment by 1 as a separate case, which I think is the most common usage, I wasn't too concerned)

@@ -6497,6 +6497,95 @@ namespace Microsoft.FSharp.Core
interface IEnumerable with
member x.GetEnumerator() = (gen() :> IEnumerator) }

[<NoEquality; NoComparison>]
type VariableStepIntegralRangeState<'a> = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a struct to save an allocation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we just use 3 different variables and remove this type altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't say I like the allocation, but without changes to the underlying F# code generator I don't think this is particularly easy to fix (without a lot of duplicated code per type.) So the problem with converting to a struct, or having the 3 variables separately (although I do like from a semantic perspective the mutable state all grouped together) is that the class that F# generates in a closure from let mutable blah = ... is just a ref type, so you are paying for the allocation anyway, it's just hidden. So then I thought about actually creating a real type instead of just using the object expression, but because I'm using statically resolved types functionality I would need an inline type which doesn't exist. So I would have to provide a specific type per underlying integer type which seems a bit excessive.

@dsyme
Copy link
Contributor

dsyme commented Feb 5, 2016

Great work. Please run all tests :) (though I believe master has failing tests at the moment :( )

@forki
Copy link
Contributor

forki commented Feb 5, 2016

How can master have failing tests? ;-)
On Feb 5, 2016 11:53, "Don Syme" [email protected] wrote:

Great work. Please run all tests :) (though I believe master has failing
tests at the moment :( )


Reply to this email directly or view it on GitHub
#931 (comment)
.

@dsyme
Copy link
Contributor

dsyme commented Feb 5, 2016

@forki I know, I know.... Poor CI not able to run enough tests, we have to crank it up,

@manofstick manofstick changed the title Manofstick perf range enumerators Faster Integer Range Operators Feb 7, 2016
@dsyme
Copy link
Contributor

dsyme commented Mar 2, 2016

@manofstick I like this, but I suspect we need to add specific new tests to match the cases in the implementation. For example, I look at cases like this: https://github.com/manofstick/visualfsharp/blob/manofstick-perf-range-enumerators/src/fsharp/FSharp.Core/prim-types.fs#L6665 and I reckon we don't have test coverage for edge cases in floating point enumeration.

Are there specific tests cases we could add? Perhaps someone could volunteer to do some hammering on this?

@latkin
Copy link
Contributor

latkin commented Mar 2, 2016

I had a partial list of nasty cases here #520 (comment), not sure if those are incorporated.

@manofstick
Copy link
Contributor Author

Must say I was a bit slack here, I did kind of assume that strange bounds checks would have already been included in the original test suite, and this was just a refactoring.

But anyway, when I get some time (he says, scrambling to try and put sometime behind the keyboard on a Sunday evening) I'll either ensure that such tests do already exist, or that I add some official ones.

(I did poke an prod them a bit, obviously, so I would be surprised if I stuffed up. But hey, coming off the back of my failed recursive types, I think I would like to double checking my stuff!!)

@manofstick
Copy link
Contributor Author

@dsyme / @latkin

Oh, there were no changes to the floating point enumerations... We're you just referring in general to have some extra tests?

@dsyme
Copy link
Contributor

dsyme commented Mar 7, 2016

@manofstick You're right - ignore my comment on floating point, was looking at existing code rather than your new code. We can assume there is testing for the existing code.

@dsyme
Copy link
Contributor

dsyme commented Mar 8, 2016

@manofstick Could you check these integer test cases please linked by @latkin above?

testSeq 1 10 (Some(Int32.MaxValue))  // new impl overflows, underflows
testSeq 1 10 (Some(-1))              // old impl returns 0-length seq, new impl underflows and gives huge seq
testSeq 3 1 None                     // old impl gives empty seq for a .. b with a < b

I don't know if those comments apply to this implementation - I don't think so - but the testing should be added if it's not there already. Also please add some kind of systematic testing for additional edge cases similar to these cases.

If those test cases are already there and there's no additional testing to be done then this can be pulled.

Thanks

@manofstick
Copy link
Contributor Author

OK; I'll dig in and punish edge cases (for all the various integer types). Hopefully find time this weekend.

@manofstick
Copy link
Contributor Author

@dsyme

So much for me trusting that there would have been tests around edges cases to begin with! And actually unless I'm mistaken, I can't actually find any testing around this functionality at all, beyond being utilized in auxiliary functionality in tests in collection modules. (i.e. can't find testing around the integer edges case that are used in floating point sequences that you mentioned before as an example.)

Anyway, I am rather blind, my wife always grabs the thing that I'm looking for from right in front of me, so maybe I'm missing it.

But I'll proceed as if there is currently no testing on ranges; but let me know if you know that there is. (The one place where there does seem to be a little of this kind of testing in regards to BigIntType).

@dsyme
Copy link
Contributor

dsyme commented Mar 14, 2016

Great, thanks. There's some in tests\fsharp\core\libtest\test.fsx, e.g. IntegerLoopsWithMinAndMaxIntAndKnownBounds, IntegerLoopsWithMinAndMaxIntAndKnownBoundsGoingDown, but they only test int32. Minimally duplicating out all these tests for the other numeric types would make sense, plus systematically testing under different strides and edge cases

Various tests for both signed and unsigned integer values or various bit
sizings
After MoveNext has been returning false, we should have been throwing an
exception. This has now been rectified.
@manofstick
Copy link
Contributor Author

The tests added give reasonable coverage (I feel) of original functionality, but as singleStepRangeEnumerator special cases single increments when the range bounds are not max and I so I need to add a couple of extra things around that. Anyway, out of time, as usual. Hopefully this weekend at the latest.

[<Test>] member this.Int32 () = RangeTestsHelpers.signed System.Int32.MinValue System.Int32.MaxValue
[<Test>] member this.UInt32 () = RangeTestsHelpers.unsigned System.UInt32.MinValue System.UInt32.MaxValue
[<Test>] member this.Int64 () = RangeTestsHelpers.signed System.Int64.MinValue System.Int64.MaxValue
[<Test>] member this.UInt64 () = RangeTestsHelpers.unsigned System.UInt64.MinValue System.UInt64.MaxValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about IntPtr and UIntPtr? Thanks

- singleStepRangeEnumerator cases
- exceptions for variableStepRangeEnumerator
- IntPtr & UIntPtr
Well maybe it can be configured, but just shows the method name, not the
class, so prepended Range to the names, which appears to be a mostly
followed naming convention for tests.
@KevinRansom
Copy link
Member

@manofstick @dsyme . Hey guys, is this ready to pull?

it looks pretty great to me.

@dsyme
Copy link
Contributor

dsyme commented Jun 7, 2016

Yes, LGTM

@KevinRansom KevinRansom merged commit 3617da4 into dotnet:master Jun 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants