Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very WIP: New optimizer #26079

Merged
merged 2 commits into from
Feb 28, 2018
Merged

Very WIP: New optimizer #26079

merged 2 commits into from
Feb 28, 2018

Conversation

Keno
Copy link
Member

@Keno Keno commented Feb 16, 2018

This PR introduces a new IR form and optimizer to address some of the performance problems encountered in #25261. The IR is pure SSA form and borrows from B3 and LLVM in its design choices. I'm planning to write a comprehensive design document once this is a bit further along. The new IR contains Phi nodes, so a significant part of this is adding support for those in codegen and the optimizer.

The current status is that this bootstraps (in the sense of building itself) fine on top of an existing system image, and can bootstrap a new system image to the same place as doing an external bootstrap on master (somewhere around version.jl at which point it gets confused about being not-quite-Base). It also passes core and string tests, in the externally bootstrapped version. It does not however, bootstrap itself internally yet (keeps recursing into inference), so this is quite an early state. Opening this PR to give @vtjnash and @JeffBezanson a place to collaborate on this.

Once this is done, we should be able to delete most of the existing passes from optimize.jl, so I'm hoping that this'll be overall net negative.

TODO status:

  • turning on optimization passes
  • line number info DONE
  • handling meta nodes correctly DONE
  • possible improvements (like not adding edges from error()::Undef{})
  • replace the naive algorithms with the fast ones / performance
  • handling enter/exit
  • cleaning up code / removing debug spew

@Keno Keno added this to the 1.0 milestone Feb 16, 2018
@Keno Keno mentioned this pull request Feb 17, 2018
3 tasks
Keno added a commit that referenced this pull request Feb 19, 2018
As was mentioned in #25988, there is a handy trick where you can load
a second copy of Base on top of an existing copy. This is useful for
at least two reasons:
1. Base printing is available, so things like MethodErrors print nicely
2. Even if the load fails, the resulting (broken) copy of base is inspectable
   by standard introspection tools from the REPL, as long as you're a bit
   careful not to mix types from the two copies of Base.

However, as I mentioned in #26079, this only actually works until about version.jl,
at which point things crash. This is because at that point it tries to use PCRE
which uses `Ref(0)`, which is actually an abstract type in Core, even though the
type of the constructed object (`RefValue`) is in Base. As a result, the new Base
gets the wrong kind of `RefValue` (the one from the original `Base`) and things break.

Luckily this is easily fixed by using an explicit `RefValue` call in the relevant places.

A second problem we run into is that `module`s nested under our new `Base`, get a default
import of the old `Base` (unless we declare the new Base to be the global top module, but
that would break the REPL subsequent to loading the new Base, which breaks reason 2 above).
I suggest (and implement in this PR) to have the default import be the next topmodule along
the parent link chain (as we already do for syntax defined in Base), which makes this work.
A small related detail is that in all such modules `import Base: x`, needs to be replaced by
`import .Base: x`, to make sure we resolve the identifier `Base` (as imported from our
new top module) rather than the global name `Base` (which still refers to the old module).

I changed sysimg.jl to avoid loading stdlibs in second Base mode, to avoid having to implement
the same changes there. Since the stdlibs are already decoupled from Base, they can already
be developed separately fairly easily, so there's not much reason to include them in this trick.

For completeness, there's a couple of ways to use this trick, but perhaps the simplest is:
```
cd("base")
baremodule NotBase
    Core.include(NotBase, "sysimg.jl")
end
```
from the REPL.
Keno added a commit that referenced this pull request Feb 20, 2018
As was mentioned in #25988, there is a handy trick where you can load
a second copy of Base on top of an existing copy. This is useful for
at least two reasons:
1. Base printing is available, so things like MethodErrors print nicely
2. Even if the load fails, the resulting (broken) copy of base is inspectable
   by standard introspection tools from the REPL, as long as you're a bit
   careful not to mix types from the two copies of Base.

However, as I mentioned in #26079, this only actually works until about version.jl,
at which point things crash. This is because at that point it tries to use PCRE
which uses `Ref(0)`, which is actually an abstract type in Core, even though the
type of the constructed object (`RefValue`) is in Base. As a result, the new Base
gets the wrong kind of `RefValue` (the one from the original `Base`) and things break.

Luckily this is easily fixed by using an explicit `RefValue` call in the relevant places.

A second problem we run into is that `module`s nested under our new `Base`, get a default
import of the old `Base` (unless we declare the new Base to be the global top module, but
that would break the REPL subsequent to loading the new Base, which breaks reason 2 above).
I suggest (and implement in this PR) to have the default import be the next topmodule along
the parent link chain (as we already do for syntax defined in Base), which makes this work.
A small related detail is that in all such modules `import Base: x`, needs to be replaced by
`import .Base: x`, to make sure we resolve the identifier `Base` (as imported from our
new top module) rather than the global name `Base` (which still refers to the old module).

I changed sysimg.jl to avoid loading stdlibs in second Base mode, to avoid having to implement
the same changes there. Since the stdlibs are already decoupled from Base, they can already
be developed separately fairly easily, so there's not much reason to include them in this trick.

For completeness, there's a couple of ways to use this trick, but perhaps the simplest is:
```
cd("base")
baremodule NotBase
    Core.include(NotBase, "sysimg.jl")
end
```
from the REPL.
Keno added a commit that referenced this pull request Feb 20, 2018
As was mentioned in #25988, there is a handy trick where you can load
a second copy of Base on top of an existing copy. This is useful for
at least two reasons:
1. Base printing is available, so things like MethodErrors print nicely
2. Even if the load fails, the resulting (broken) copy of base is inspectable
   by standard introspection tools from the REPL, as long as you're a bit
   careful not to mix types from the two copies of Base.

However, as I mentioned in #26079, this only actually works until about version.jl,
at which point things crash. This is because at that point it tries to use PCRE
which uses `Ref(0)`, which is actually an abstract type in Core, even though the
type of the constructed object (`RefValue`) is in Base. As a result, the new Base
gets the wrong kind of `RefValue` (the one from the original `Base`) and things break.

Luckily this is easily fixed by using an explicit `RefValue` call in the relevant places.

A second problem we run into is that `module`s nested under our new `Base`, get a default
import of the old `Base` (unless we declare the new Base to be the global top module, but
that would break the REPL subsequent to loading the new Base, which breaks reason 2 above).
I suggest (and implement in this PR) to have the default import be the next topmodule along
the parent link chain (as we already do for syntax defined in Base), which makes this work.
A small related detail is that in all such modules `import Base: x`, needs to be replaced by
`import .Base: x`, to make sure we resolve the identifier `Base` (as imported from our
new top module) rather than the global name `Base` (which still refers to the old module).

I changed sysimg.jl to avoid loading stdlibs in second Base mode, to avoid having to implement
the same changes there. Since the stdlibs are already decoupled from Base, they can already
be developed separately fairly easily, so there's not much reason to include them in this trick.

For completeness, there's a couple of ways to use this trick, but perhaps the simplest is:
```
cd("base")
baremodule NotBase
    Core.include(NotBase, "sysimg.jl")
end
```
from the REPL.
Keno added a commit that referenced this pull request Feb 20, 2018
As was mentioned in #25988, there is a handy trick where you can load
a second copy of Base on top of an existing copy. This is useful for
at least two reasons:
1. Base printing is available, so things like MethodErrors print nicely
2. Even if the load fails, the resulting (broken) copy of base is inspectable
   by standard introspection tools from the REPL, as long as you're a bit
   careful not to mix types from the two copies of Base.

However, as I mentioned in #26079, this only actually works until about version.jl,
at which point things crash. This is because at that point it tries to use PCRE
which uses `Ref(0)`, which is actually an abstract type in Core, even though the
type of the constructed object (`RefValue`) is in Base. As a result, the new Base
gets the wrong kind of `RefValue` (the one from the original `Base`) and things break.

Luckily this is easily fixed by using an explicit `RefValue` call in the relevant places.

A second problem we run into is that `module`s nested under our new `Base`, get a default
import of the old `Base` (unless we declare the new Base to be the global top module, but
that would break the REPL subsequent to loading the new Base, which breaks reason 2 above).
I suggest (and implement in this PR) to have the default import be the next topmodule along
the parent link chain (as we already do for syntax defined in Base), which makes this work.
A small related detail is that in all such modules `import Base: x`, needs to be replaced by
`import .Base: x`, to make sure we resolve the identifier `Base` (as imported from our
new top module) rather than the global name `Base` (which still refers to the old module).

I changed sysimg.jl to avoid loading stdlibs in second Base mode, to avoid having to implement
the same changes there. Since the stdlibs are already decoupled from Base, they can already
be developed separately fairly easily, so there's not much reason to include them in this trick.

For completeness, there's a couple of ways to use this trick, but perhaps the simplest is:
```
cd("base")
baremodule NotBase
    Core.include(NotBase, "sysimg.jl")
end
```
from the REPL.
Keno added a commit that referenced this pull request Feb 20, 2018
As was mentioned in #25988, there is a handy trick where you can load
a second copy of Base on top of an existing copy. This is useful for
at least two reasons:
1. Base printing is available, so things like MethodErrors print nicely
2. Even if the load fails, the resulting (broken) copy of base is inspectable
   by standard introspection tools from the REPL, as long as you're a bit
   careful not to mix types from the two copies of Base.

However, as I mentioned in #26079, this only actually works until about version.jl,
at which point things crash. This is because at that point it tries to use PCRE
which uses `Ref(0)`, which is actually an abstract type in Core, even though the
type of the constructed object (`RefValue`) is in Base. As a result, the new Base
gets the wrong kind of `RefValue` (the one from the original `Base`) and things break.

Luckily this is easily fixed by using an explicit `RefValue` call in the relevant places.

A second problem we run into is that `module`s nested under our new `Base`, get a default
import of the old `Base` (unless we declare the new Base to be the global top module, but
that would break the REPL subsequent to loading the new Base, which breaks reason 2 above).
I suggest (and implement in this PR) to have the default import be the next topmodule along
the parent link chain (as we already do for syntax defined in Base), which makes this work.
A small related detail is that in all such modules `import Base: x`, needs to be replaced by
`import .Base: x`, to make sure we resolve the identifier `Base` (as imported from our
new top module) rather than the global name `Base` (which still refers to the old module).

I changed sysimg.jl to avoid loading stdlibs in second Base mode, to avoid having to implement
the same changes there. Since the stdlibs are already decoupled from Base, they can already
be developed separately fairly easily, so there's not much reason to include them in this trick.

For completeness, there's a couple of ways to use this trick, but perhaps the simplest is:
```
cd("base")
baremodule NotBase
    Core.include(NotBase, "sysimg.jl")
end
```
from the REPL.
@vtjnash
Copy link
Member

vtjnash commented Feb 24, 2018

From discussion with Keno, we're thinking of merging this soon behind a feature-flag. He thought he might have time today to remove all of the debug statements, re-enable the old optimization passes, and squash/rebase the PR.

@Keno
Copy link
Member Author

Keno commented Feb 25, 2018

Rebased, removed disabled optimization passes, squashed, flag added to inference to turn on the new optimizer.

base/client.jl Outdated
@@ -298,11 +298,6 @@ function exec_options(opts)
# load ~/.julia/config/startup.jl file
startup && load_julia_startup()

if repl || is_interactive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I assume this was broken at some point, so carelessly deleted.

@KristofferC
Copy link
Member

Worth to run nanosoldier? Just to make sure.

@Keno
Copy link
Member Author

Keno commented Feb 25, 2018

Sure, it is disabled by default, so in theory, we should be fine, but who knows:
@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vtjnash
Copy link
Member

vtjnash commented Feb 26, 2018

The tests compiler and cmdlineargs needs some minor fixup, then this seems good to go.

@Keno Keno force-pushed the kf/phinode branch 2 times, most recently from e550803 to 644a375 Compare February 27, 2018 01:09
Keno and others added 2 commits February 27, 2018 14:35
This introduces a new framework for middle-end optimizations in julia, based on
SSA form IR. Much work is still left to be done, but thanks to Jameson's work
this is in a bootstrappable form, so we're merging it behind a feature flag
to allow for quick iteration.

Documentation of the new IR format is available in the devdocs.

Co-authored-by: Jameson Nash <[email protected]>
@Keno
Copy link
Member Author

Keno commented Feb 28, 2018

FreeBSD and OS X hit their respective intermittent kernel bugs. AppVeyor is as on master. Other tests went through though. Merging. As a reminder, this is just getting the code in, the actual functionality is still behind an (off by default) flag.

@Keno Keno merged commit 2345371 into master Feb 28, 2018
@martinholters martinholters deleted the kf/phinode branch February 28, 2018 07:04
@StefanKarpinski
Copy link
Member

Implicated in dot-call global scope slow down: #26767. I thought this was hidden behind a feature flag—is there somewhat it may affect performance in global scope even when the feature flag is turned off? If so, can we mitigate that somehow?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants