
Monomorphize generic functions #1736

Closed

marijnh opened this issue Feb 2, 2012 · 10 comments
Labels
A-codegen (Area: Code generation) · E-hard (Call for participation: Hard difficulty. Experience needed to fix: A lot.) · I-slow (Issue: Problems and improvements with respect to performance of generated code.)

Comments

marijnh (Contributor) commented Feb 2, 2012

There is consensus that we should at least experiment with generating single-type instances of generic functions. So if you have map<T, U> and you call it on <int, int>, the compiler can compile a version specifically for those types, and if you then call it on <@int, (float, float)>, it can generate another version.
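
(For illustration: a minimal sketch, in today's Rust syntax rather than the 2012 dialect used in this issue, of what monomorphization conceptually produces. The names map_generic, map_i32_i32, and map_f64_pair are hypothetical; the two concrete functions stand for the per-instantiation copies the compiler would emit.)

// Generic source: one definition written by the programmer.
fn map_generic<T: Copy, U>(c: &[T], f: fn(T) -> U) -> Vec<U> {
    c.iter().copied().map(f).collect()
}

// Conceptually, the first call site gets its own concrete copy ...
fn map_i32_i32(c: &[i32], f: fn(i32) -> i32) -> Vec<i32> {
    c.iter().copied().map(f).collect()
}

// ... and a call with different type arguments gets another one.
fn map_f64_pair(c: &[f64], f: fn(f64) -> (f64, f64)) -> Vec<(f64, f64)> {
    c.iter().copied().map(f).collect()
}

fn main() {
    println!("{:?}", map_generic(&[1, 2, 3], |x: i32| x + 1));
    println!("{:?}", map_i32_i32(&[1, 2, 3], |x| x + 1));
    println!("{:?}", map_f64_pair(&[1.0, 2.0], |x| (x, x * 2.0)));
}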

We can reuse generated functions for similar types. For example, since generics can't look 'into' types, types like int and uint could be passed to the same version of a function without problems. We also store tydescs in boxes now, so a single version could be used for all @ types, and simply look inside the box if it needs to call free or compare glue on them.
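
(A minimal sketch, in modern Rust, of the layout argument above: two types with identical size and alignment are indistinguishable to code that never looks inside the values, and every boxed value is just a pointer from the outside. The use of std::mem here is my illustration, not something from the issue.)

use std::mem::{align_of, size_of};

fn main() {
    // int and uint (today: i32/u32 or isize/usize) share size and alignment,
    // so one generated instance could serve both.
    assert_eq!(size_of::<i32>(), size_of::<u32>());
    assert_eq!(align_of::<i32>(), align_of::<u32>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());

    // Every boxed type is a single pointer from the outside; type-specific
    // glue can be found by looking inside the box.
    assert_eq!(size_of::<Box<i32>>(), size_of::<Box<String>>());
    println!("layouts match");
}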

This is related to the project of cross-crate inlining. To monomorphize a function, you also need some representation of its code.

ghost assigned marijnh Feb 2, 2012
marijnh (Contributor, Author) commented Feb 2, 2012

The extremely crude benchmark below yields less-than-promising results. The monomorphic version of map is 100 lines of LLVM bitcode, whereas the polymorphic version is 132. The speedup gained from monomorphizing, when running this program (with -O), is about five percent.

fn map<T, U>(c: [T], f: fn(T) -> U) -> [U] {
    let r = [];
    for x in c { r += [f(x)]; }
    r
}
fn main() {
    let i = 0, j = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20];
    while i < 100000 {
        i += 1;
        map(j, {|e| e + 1});
    }
}

nikomatsakis (Contributor) commented

I don't want to discourage benchmarking, but I don't think this captures the whole picture. It is true that, despite generating amazingly inefficient code for many generic functions, we probably don't suffer that much, because they mostly serve to glue other pieces together. However, first, performance is not the only concern: correctness is another. The current code for walking data structures dynamically is quite fragile and gets a number of cases wrong (alignment of enums being a prominent example). This stuff can be fixed, but it's going to require a lot of effort.

Second, I think monomorphizing is an enabler for a lot of other things, some of which will also improve performance:

  • specializing to a particular impl, which would allow inlining of method calls on interface-bounded generics;
  • much more straightforward stack maps for cycle collection, garbage collection, deferred reference counting, and the like;
  • eliminating the indirection for all return values (which may in turn aid LLVM optimizations);
  • optimizing the representation of enums in particular cases (for example, using a nullable pointer for option<@T>; see the sketch after this list);
  • etc.
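
(A minimal sketch, in modern Rust, of the enum-representation point in the last bullet: once the element type is known concretely, an Option wrapping a non-nullable pointer can be stored as a plain nullable pointer, with no separate tag. Box stands in here for the old @ pointer.)

use std::mem::size_of;

fn main() {
    // Box<i32> is never null, so None can be encoded as the null pointer:
    // Option<Box<i32>> is pointer-sized.
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());

    // A plain integer has no spare bit pattern, so the discriminant costs space.
    assert!(size_of::<Option<i32>>() > size_of::<i32>());
    println!("the nullable-pointer trick applies to the boxed case only");
}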

marijnh (Contributor, Author) commented Feb 2, 2012

Also: removing the && and ++ argument mode distinction.

Another benchmark, with more hopeful results (derived tydescs), shows a factor-of-70 speedup and a size reduction of the fst function from 171 to 29 lines of bitcode.

Given that most functions tend to be non-generic, and that we'll probably only generate a lot of versions for some heavily used, typically small functions like vec::map, I guess things are not looking too bad.

fn fst<T: copy, U: copy>(x: (T, U)) -> T {
    let (x, _) = x; x
}

fn main() {
    let u = (10, 10), v = ({a: 10, b: 20}, 1u), i = 0;
    while i < 3000000 {
        fst(u); fst(v);
        i += 1;
    }
}

marijnh added a commit that referenced this issue Feb 3, 2012
Adds a --monomorphize flag to rustc to turn it on. You probably don't
want to use it yet, since it's broken in a whole bunch of ways, but it
successfully monomorphizes simple generic functions called from within
the crate.

Issue #1736
marijnh added a commit that referenced this issue Feb 7, 2012
The free glue for opaque boxes will pick the actual tydesc out of the
box, and call its glue.

Issue #1736
marijnh added a commit that referenced this issue Feb 8, 2012
marijnh added a commit that referenced this issue Feb 9, 2012
marijnh (Contributor, Author) commented Feb 9, 2012

Monomorphization seems to mostly work now. More experimental data:

  • I set up librustc to include the vec module from core, and use its own version, so that we had more in-crate generic calls.
  • I compiled the result once without monomorphization and once with.
  • The monomorphized version works!
  • The binary is 10% bigger.
  • It compiles things about 1% faster.

This suggests that code blowup is not prohibitive, but definitely significant. There are still some tricks that can be used to reduce this.

The speedup is somewhat disappointing. Since we're still spending most of our time in malloc/free-related things, I wasn't exactly expecting a big boost from monomorphizing, but I did expect more than a percent.

marijnh added a commit that referenced this issue Feb 9, 2012
Enough to be able to compile librustc with --monomorphize.

Issue #1736
graydon (Contributor) commented Feb 10, 2012

Very exciting! 10% is tolerable on size. It's measurable but Kryder's law erases it quickly. I imagine it only takes a few of those 70x-perf-difference cases to lose a user though. So long as they retain some control over this functionality, I suspect users will be glad to see the feature.

nikomatsakis (Contributor) commented

Marijn, do you monomorphize vtables too? That is, if I have a generic type with an iface bound, do I get static dispatch in the end? Also, do you have a generic "all boxes" version?

marijnh (Contributor, Author) commented Feb 10, 2012

@graydon: The 70x perf difference was an increase in speed. I don't think being fast will lose us users.

@nikomatsakis: I'm not yet generating monomorphized vtables, but I am monomorphizing functions with bounded type parameters so that they don't use a vtable at all. Using one instance of a generic function for all box types is implemented.
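
(A minimal sketch, in modern Rust, of the dispatch difference being discussed: a bounded type parameter is monomorphized, so the method call is resolved statically and can be inlined, whereas a trait object keeps a vtable and dispatches at run time. The Shape and Square names are made up for illustration; "iface" corresponds to today's trait.)

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// One copy per concrete T: the call to area() is direct (and inlinable).
fn total_static<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// One shared copy: each call to area() goes through the vtable.
fn total_dynamic(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    let as_objects: [&dyn Shape; 2] = [&squares[0], &squares[1]];
    println!("{}", total_static(&squares));
    println!("{}", total_dynamic(&as_objects));
}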

graydon (Contributor) commented Feb 15, 2012

@marijnh I know! I mean "this is good -- in cases where we're 70x too slow and users could get that speed back by going to C++, we'd lose them". It's good we have an answer for those cases now!

marijnh (Contributor, Author) commented Mar 9, 2012

This mostly works as of caf04ce. Compilation slowdown is significant (~40%); I haven't profiled yet. Generic code becomes a lot faster (~6× in the artificial benchmark at https://gist.github.com/2008845).

marijnh (Contributor, Author) commented Mar 15, 2012

I'm about to land this. I opened #1980, #1981, and #1982 for things that need further attention.

marijnh closed this as completed Mar 15, 2012
marijnh removed their assignment Jun 16, 2014