Implement a real InstCombine MIR pass #105808

Closed

saethlin wants to merge 1 commit into rust-lang:master from saethlin:fold

Member

saethlin commented Dec 17, 2022 •

edited

This pass is built around the observation that in optimized MIR, after inlining, we often end up with many temporary locals that do not even escape their block. When a temporary is mentioned in exactly two statements (its definition and its only use) and those two statements are adjacent, we can fold it away.

This happens with place projections:

_2 = _1.a; // First use of _2
_3 = _2.b; // Last use of _2
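A toy sketch of the projection case, assuming a hypothetical `Place`/`Assign` model (not rustc's real MIR types) in which a place is a local plus a list of field names. It only shows the rewrite itself; the caller is assumed to have already checked that the temporary has no other uses.

```rust
/// Hypothetical toy place: `_1.a` is `Place { local: 1, projection: vec!["a"] }`.
#[derive(Clone, Debug, PartialEq)]
pub struct Place {
    pub local: usize,
    pub projection: Vec<&'static str>,
}

/// Hypothetical toy statement `dest = src;` where both sides are places.
#[derive(Clone, Debug, PartialEq)]
pub struct Assign {
    pub dest: Place,
    pub src: Place,
}

/// Folds `_t = P; _d = _t.rest;` into `_d = P.rest;`.
/// Assumes the caller already verified `_t` is used nowhere else.
pub fn fold_projection(first: &Assign, second: &Assign) -> Option<Assign> {
    // The first statement must define a bare temporary local.
    if !first.dest.projection.is_empty() {
        return None;
    }
    // The second statement must read through that same temporary.
    if second.src.local != first.dest.local {
        return None;
    }
    // Append the second projection to the first: `_1.a` + `.b` = `_1.a.b`.
    let mut projection = first.src.projection.clone();
    projection.extend(second.src.projection.iter().copied());
    Some(Assign {
        dest: second.dest.clone(),
        src: Place { local: first.src.local, projection },
    })
}
```

Folding `_2 = _1.a; _3 = _2.b;` with this sketch yields `_3 = _1.a.b;`.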

With pointer casts (and also numeric casts, but those are more complicated):

_2 = move _1 as *mut T (PtrToPtr);
_3 = move _2 as *mut U (PtrToPtr);
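The pointer case can be checked in plain Rust: for thin pointers, chaining two pointer-to-pointer casts yields the same address as a single cast. The pointee types `u32`/`u16`/`u8` are picked only for illustration. The last function hints at why numeric casts are more complicated: a narrowing cast in the middle changes the value, so the chain cannot simply be collapsed.

```rust
/// Two chained pointer-to-pointer casts, shaped like the MIR above.
pub fn cast_twice(p: *mut u32) -> *mut u8 {
    let tmp = p as *mut u16; // _2 = move _1 as *mut T (PtrToPtr)
    tmp as *mut u8 // _3 = move _2 as *mut U (PtrToPtr)
}

/// The folded form: one cast straight from the original pointer.
pub fn cast_once(p: *mut u32) -> *mut u8 {
    p as *mut u8
}

/// Numeric casts do not fold this simply: the `u8` in the middle truncates.
pub fn numeric_twice(x: i32) -> i32 {
    (x as u8) as i32
}
```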

And also with temporary (including raw) references:

_2 = &_1;
_3 = *_2;
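The reference case, again as a toy sketch over a hypothetical three-variant `Rvalue` (not rustc's). Turning a borrow followed by a deref into a plain use is only valid if the temporary pointer is not used anywhere else; that check is assumed to happen before this function is called.

```rust
/// Hypothetical toy right-hand sides, each naming a local by index.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Rvalue {
    Use(usize),      // _d = _x
    Ref(usize),      // _d = &_x (or a raw borrow)
    DerefUse(usize), // _d = *_x
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Stmt {
    pub dest: usize,
    pub rvalue: Rvalue,
}

/// Folds `_t = &_x; _d = *_t;` into `_d = _x;`.
pub fn fold_ref_deref(first: Stmt, second: Stmt) -> Option<Stmt> {
    match (first.rvalue, second.rvalue) {
        // The deref must go through exactly the temporary we just borrowed into.
        (Rvalue::Ref(pointee), Rvalue::DerefUse(ptr)) if ptr == first.dest => {
            Some(Stmt { dest: second.dest, rvalue: Rvalue::Use(pointee) })
        }
        _ => None,
    }
}
```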

rustbot added S-waiting-on-review T-compiler labels

Member Author

saethlin commented Dec 17, 2022

@bors try @rust-timer queue

rustbot added the S-waiting-on-perf label

Contributor

bors commented Dec 17, 2022

⌛ Trying commit 6b9e5d1d28e19dbdd14c6b1993cec3e0ee2283df with merge 50bd766f65a66a9b9b92fccecf21d94135864fc2...

Member Author

saethlin commented Dec 17, 2022

@bors try @rust-timer queue

Contributor

bors commented Dec 17, 2022

⌛ Trying commit 9a6c0dc50c5886bd7906450bc0a7001ac1b2bf1d with merge 186ff00597beff5caa4a628567a8b026333c6a8f...

Contributor

bors commented Dec 17, 2022

💔 Test failed - checks-actions

bors added S-waiting-on-author and removed S-waiting-on-review labels

Member Author

saethlin commented Dec 17, 2022

@bors try @rust-timer queue

Contributor

bors commented Dec 17, 2022

⌛ Trying commit 9db861ff637d5add29e5102d4ca82d83f57ab2da with merge 139f427fcbd48a5c8f71a159f3d1be058bf7fffd...

Contributor

bors commented Dec 17, 2022

💔 Test failed - checks-actions

Contributor

bors commented Dec 18, 2022

☔ The latest upstream changes (presumably #105876) made this pull request unmergeable. Please resolve the merge conflicts.

saethlin force-pushed the fold branch from 9db861f to 84d6b4f

December 18, 2022 23:28

cjgillot reviewed

compiler/rustc_mir_transform/src/fold.rs Outdated

cjgillot added the A-mir-opt label

saethlin force-pushed the fold branch from 84d6b4f to cdec020

February 19, 2023 04:22

Member Author

saethlin commented Feb 19, 2023

@bors try @rust-timer queue

Contributor

bors commented Feb 19, 2023

⌛ Trying commit cdec0209cc3a9ff8b0fb0296e908481459fabc39 with merge 995f7db06ddfd96031de6e2fcf91247b1416dcbf...

Member Author

saethlin commented May 12, 2023

I opened #111518 for the niche problem. I just want to land this instead of letting it languish. I think even with a little hack it stands on its own.

saethlin marked this pull request as ready for review

May 12, 2023 22:45

Collaborator

rustbot commented May 12, 2023

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Contributor

bors commented May 13, 2023

☔ The latest upstream changes (presumably #111447) made this pull request unmergeable. Please resolve the merge conflicts.

cjgillot reviewed

compiler/rustc_mir_transform/src/instcombine.rs

compiler/rustc_mir_transform/src/instcombine.rs

    
    // _2 = &_1;
    // _3 = *_2;
    // Into:
    // _3 = _1;

Contributor

cjgillot May 13, 2023

Could you add an explanation of why this is correct?
This boils down to the second statement being the only use of _2, but that needs to be explicit.

Could you specify in the comment which among _1/_2/_3 is temp_place, temp_rvalue...
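One way to spell out the condition being asked about, as a toy sketch: each statement is summarized by a hypothetical `StmtInfo` (the local it assigns and the locals it reads), and a temporary is foldable only when it is defined at one statement, read exactly once by the next, and mentioned nowhere else. This only looks at one list of statements; a real pass would also have to account for mentions elsewhere, for example in terminators or other blocks.

```rust
/// Hypothetical summary of one statement.
pub struct StmtInfo {
    pub defines: Option<usize>, // local assigned by the statement, if any
    pub reads: Vec<usize>,      // locals read by the statement
}

/// True if `temp` is defined at `stmts[i]`, read once at `stmts[i + 1]`,
/// and mentioned nowhere else in `stmts`.
pub fn is_foldable_temp(stmts: &[StmtInfo], i: usize, temp: usize) -> bool {
    // There must be a next statement to fold into.
    if i + 1 >= stmts.len() {
        return false;
    }
    // How many times one statement mentions `temp`, as destination or operand.
    let mentions = |s: &StmtInfo| {
        usize::from(s.defines == Some(temp)) + s.reads.iter().filter(|&&l| l == temp).count()
    };
    let total: usize = stmts.iter().map(mentions).sum();
    // Defined at `i`, read once at `i + 1`, and those are the only two mentions.
    stmts[i].defines == Some(temp)
        && stmts[i + 1].reads.iter().filter(|&&l| l == temp).count() == 1
        && total == 2
}
```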

compiler/rustc_mir_transform/src/instcombine.rs

    
    let Some(second_place) = final_operand.place() else { return None; };
    if second_place.projection.get(0) != Some(&ProjectionElem::Deref) {
        return None;
    }

Contributor

cjgillot May 13, 2023

Could you assert that Some(second_place.local) == _temp_place.as_local()?
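A toy version of the suggested assertion, assuming a hypothetical `Place` whose `as_local` returns `Some(local)` only for a bare local (mirroring the idea of the method named in the review, not its real signature). `deref_of_temp` is likewise a made-up helper for illustration.

```rust
/// Hypothetical projection elements.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Elem {
    Deref,
    Field(usize),
}

/// Hypothetical toy place: a local plus its projections.
pub struct Place {
    pub local: usize,
    pub projection: Vec<Elem>,
}

impl Place {
    /// `Some(local)` only when there are no projections.
    pub fn as_local(&self) -> Option<usize> {
        if self.projection.is_empty() { Some(self.local) } else { None }
    }
}

/// If `second_place` starts with a deref, asserts that the dereferenced local
/// is the temporary and returns the projections that follow the deref.
pub fn deref_of_temp(temp_place: &Place, second_place: &Place) -> Option<Vec<Elem>> {
    if second_place.projection.first() != Some(&Elem::Deref) {
        return None;
    }
    // The suggested assertion: the deref must go through the temporary itself.
    assert_eq!(Some(second_place.local), temp_place.as_local());
    Some(second_place.projection[1..].to_vec())
}
```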

compiler/rustc_mir_transform/src/instcombine.rs Outdated

compiler/rustc_mir_transform/src/instcombine.rs

    
    if second_place.projection.get(0) != Some(&ProjectionElem::Deref) {
        return None;
    }

Contributor

cjgillot May 13, 2023

Likewise, an assertion on second_place.local could be useful.

compiler/rustc_mir_transform/src/instcombine.rs Outdated

compiler/rustc_mir_transform/src/instcombine.rs

    
    // our StorageDead statements into the last slots, and our new statement somewhere in
    // the middle.
    // This ensures that we do not change the location of any statements that we have not
    // optimized, which minimizes the amount of our analysis that we have invalidated.

Contributor

cjgillot May 13, 2023

What happens if:

_2 = _1 as *const u8;
StorageDead(_4)
StorageLive(_4)
_3 = _2 as *const ();

Member Author

saethlin May 13, 2023

_4 isn't mentioned in the combined statement, so those statements would be ignored. Did you mean _3?

Contributor

cjgillot May 13, 2023

Indeed. My question is about the StorageDead-then-StorageLive getting swapped, so if that's the one triggering the behaviour, _3 it is.
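A sketch of how the swap described here could at least be detected, over a hypothetical `Stmt` enum: compare the order of each affected local's StorageLive/StorageDead markers before and after the rewrite. It is only a check; it does not say where the rewrite should put the markers.

```rust
/// Hypothetical toy statements.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Assign(usize), // `_n = ...`, right-hand side elided
    Nop,
}

/// The storage markers that mention `local`, in program order.
fn markers_for(stmts: &[Stmt], local: usize) -> Vec<Stmt> {
    stmts
        .iter()
        .copied()
        .filter(|s| matches!(s, Stmt::StorageLive(l) | Stmt::StorageDead(l) if *l == local))
        .collect()
}

/// A rewrite must not change the relative order of any listed local's markers.
pub fn storage_order_preserved(before: &[Stmt], after: &[Stmt], locals: &[usize]) -> bool {
    locals.iter().all(|&l| markers_for(before, l) == markers_for(after, l))
}
```

With the example above, naively moving `StorageDead(_3)` into the last slot places it after `StorageLive(_3)` and the combined assignment, so `_3` would be dead where it is written; the check rejects that ordering.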

compiler/rustc_mir_transform/src/instcombine.rs

    
    for (slot, statement) in slots.iter().rev().zip(storage_dead.into_iter().rev()) {
        assert!(matches!(statements[*slot].kind, StatementKind::Nop));
        statements[*slot] = statement;
    }

Contributor

cjgillot May 13, 2023

Could you add a comment explaining why we don't add those storage statements to invalidated_statements?
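A toy reconstruction of the slot-filling described in the code comment earlier in this review, assuming the freed slots are sorted ascending and that the combined statement goes into the freed slot just before the storage markers (the comment only says "somewhere in the middle"). `refill_slots` and this `Stmt` enum are hypothetical.

```rust
/// Hypothetical toy statements.
#[derive(Clone, Debug, PartialEq)]
pub enum Stmt {
    Nop,
    StorageDead(usize),
    Other(String),
}

/// Writes the combined statement and the StorageDead markers back into slots
/// that the combine turned into `Nop`. Statements outside `slots` keep their index.
pub fn refill_slots(statements: &mut [Stmt], slots: &[usize], new_stmt: Stmt, storage_dead: Vec<Stmt>) {
    assert!(slots.len() > storage_dead.len(), "need a free slot for the new statement");
    let n = storage_dead.len();
    // StorageDead markers take the last slots, filled from the back.
    for (slot, statement) in slots.iter().rev().zip(storage_dead.into_iter().rev()) {
        assert!(matches!(statements[*slot], Stmt::Nop));
        statements[*slot] = statement;
    }
    // The combined statement takes the freed slot right before them.
    let new_slot = slots[slots.len() - n - 1];
    assert!(matches!(statements[new_slot], Stmt::Nop));
    statements[new_slot] = new_stmt;
}
```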

compiler/rustc_mir_transform/src/instcombine.rs Outdated

    
    }
    PlaceContext::MutatingUse(
        MutatingUseContext::Borrow
        | MutatingUseContext::Projection

Contributor

cjgillot May 13, 2023

Projection cannot happen in visit_place.

compiler/rustc_mir_transform/src/instcombine.rs Outdated

Comment on lines 375 to 416

    
    PlaceContext::MutatingUse(
        MutatingUseContext::Borrow
        | MutatingUseContext::Projection
        | MutatingUseContext::AddressOf,
    ) => {
        self.analysis[place.local].read.insert(location);
    }

Contributor

cjgillot May 13, 2023

This branch needs an explanation.
We usually consider all refs and addressofs, either mutable or immutable, to be writes in optimizations.
Why are they all reads here?
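A minimal sketch of the conservative rule described in this comment, over a hypothetical, simplified `Use` enum standing in for `PlaceContext`. The second function shows why even a shared borrow is treated as a potential write: with interior mutability, `&x` is enough to modify `x`.

```rust
use std::cell::Cell;

/// Hypothetical, simplified stand-in for rustc's `PlaceContext`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Use {
    ValueRead,    // `_d = _x`
    SharedBorrow, // `&_x`
    MutBorrow,    // `&mut _x`
    AddressOf,    // a raw borrow of `_x`
    Store,        // `_x = ...`
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Access {
    Read,
    Write,
}

/// Any borrow or address-of may later be used to write through the
/// resulting pointer, so it is conservatively classified as a write.
pub fn classify(u: Use) -> Access {
    match u {
        Use::ValueRead => Access::Read,
        Use::SharedBorrow | Use::MutBorrow | Use::AddressOf | Use::Store => Access::Write,
    }
}

/// A shared borrow that still modifies its referent.
pub fn write_through_shared_borrow() -> i32 {
    let x = Cell::new(1);
    let r = &x; // only a shared borrow of `x`...
    r.set(2); // ...yet `x` is modified through it
    x.get()
}
```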

scottmcm mentioned this pull request

EXPERIMENT [MIR-OPT] Add a pass that replicates simple branches into predecessors #111574

Closed

Member

scottmcm commented May 17, 2023

If I could propose a test here, https://rust.godbolt.org/z/zh8EP9KcT

pub fn simple_swap<T>(x: &mut T, y: &mut T) {
    use std::ptr::{read, write};
    unsafe {
        let temp = read(x);
        write(x, read(y));
        write(y, temp);
    }
}
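A small harness for the proposed test function, checking that it behaves like `std::mem::swap` on an owned type (`String`), so the MIR before and after any folding can be validated against the same observable result.

```rust
use std::ptr::{read, write};

/// The proposed test function, unchanged apart from the hoisted `use`.
pub fn simple_swap<T>(x: &mut T, y: &mut T) {
    unsafe {
        // `x` and `y` are distinct `&mut`, so they cannot overlap; each value
        // is read out exactly once and written back exactly once (no double drop).
        let temp = read(x);
        write(x, read(y));
        write(y, temp);
    }
}
```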

Today that's

    bb0: {
        _4 = &raw const (*_1);           // scope 1 at /app/example.rs:4:25: 4:26
        _3 = (*_4);                      // scope 4 at /rustc/2c41369acc445d04129db40ba998dd7a89fb0d2e/library/core/src/ptr/mod.rs:1172:9: 1172:46
        _5 = &raw mut (*_1);             // scope 2 at /app/example.rs:5:15: 5:16
        _7 = &raw const (*_2);           // scope 2 at /app/example.rs:5:23: 5:24
        _6 = (*_7);                      // scope 7 at /rustc/2c41369acc445d04129db40ba998dd7a89fb0d2e/library/core/src/ptr/mod.rs:1172:9: 1172:46
        (*_5) = move _6;                 // scope 10 at /rustc/2c41369acc445d04129db40ba998dd7a89fb0d2e/library/core/src/ptr/mod.rs:1370:9: 1370:45
        _8 = &raw mut (*_2);             // scope 2 at /app/example.rs:6:15: 6:16
        (*_8) = move _3;                 // scope 13 at /rustc/2c41369acc445d04129db40ba998dd7a89fb0d2e/library/core/src/ptr/mod.rs:1370:9: 1370:45
        return;                          // scope 0 at /app/example.rs:8:2: 8:2
    }

but by only combining adjacent statements, I think that could improve all the way down to

    bb0: {
        _3 = (*_1);                      // scope 4 at /rustc/2c41369acc445d04129db40ba998dd7a89fb0d2e/library/core/src/ptr/mod.rs:1172:9: 1172:46
        (*_1) = (*_2);                   // scope 10 at /rustc/2c41369acc445d04129db40ba998dd7a89fb0d2e/library/core/src/ptr/mod.rs:1370:9: 1370:45
        (*_2) = move _3;                 // scope 13 at /rustc/2c41369acc445d04129db40ba998dd7a89fb0d2e/library/core/src/ptr/mod.rs:1370:9: 1370:45
        return;                          // scope 0 at /app/example.rs:8:2: 8:2
    }

using mostly things that it looks like you might have already implemented here.

Member

Dylan-DPC commented May 18, 2023

@saethlin any updates on this?

Member Author

saethlin commented May 18, 2023

This has only been idle for 5 days. I'm having a busy week and haven't had much time for volunteering.

Member

Dylan-DPC commented May 18, 2023

My bad :P overlooked that part :P no worries

saethlin force-pushed the fold branch from 9306b8d to 9949d31

June 16, 2023 21:51

Member Author

saethlin commented Jun 16, 2023

Just rebasing, don't get excited just yet

Contributor

bors commented Jun 19, 2023

☔ The latest upstream changes (presumably #112238) made this pull request unmergeable. Please resolve the merge conflicts.


          Implement a few basic InstCombine optimizations

d308f02

saethlin force-pushed the fold branch from 9949d31 to d308f02

October 31, 2023 12:35

Collaborator

rust-log-analyzer commented Oct 31, 2023

The job mingw-check-tidy failed! Check out the build log.

Possible cause of the failure (guessed by this bot):

Prepare all required actions
Getting action download info
Download action repository 'actions/checkout@v4' (SHA:b4ffde65f46336ab88eb53be808477a3936bae11)
Download action repository 'actions/upload-artifact@v3' (SHA:a8a3f3ad30e3422c9c7b888a15615d19a852ae32)
Complete job name: PR - mingw-check-tidy
git config --global core.autocrlf false
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
---
Removing intermediate container be00e418118b
 ---> b7b02b74592c
Step 6/10 : COPY host-x86_64/mingw-check/reuse-requirements.txt /tmp/
 ---> 491731421d92
Step 7/10 : RUN pip3 install --no-deps --no-cache-dir --require-hashes -r /tmp/reuse-requirements.txt     && pip3 install virtualenv
Collecting binaryornot==0.4.4
  Downloading binaryornot-0.4.4-py2.py3-none-any.whl (9.0 kB)
Collecting boolean-py==4.0
  Downloading boolean.py-4.0-py3-none-any.whl (25 kB)
---
Building wheels for collected packages: reuse
  Building wheel for reuse (pyproject.toml): started
  Building wheel for reuse (pyproject.toml): finished with status 'done'
  Created wheel for reuse: filename=reuse-1.1.0-cp310-cp310-manylinux_2_35_x86_64.whl size=180117 sha256=2196c9034bf565528bbb1ee6dad4f753eb813f58822363e6b768f09c73e4d4ff
  Stored in directory: /tmp/pip-ephem-wheel-cache-oamd25x6/wheels/c2/3c/b9/1120c2ab4bd82694f7e6f0537dc5b9a085c13e2c69a8d0c76d
Installing collected packages: boolean-py, binaryornot, setuptools, reuse, python-debian, markupsafe, license-expression, jinja2, chardet
  Attempting uninstall: setuptools
    Found existing installation: setuptools 59.6.0
    Not uninstalling setuptools at /usr/lib/python3/dist-packages, outside environment /usr
    Can't uninstall 'setuptools'. No files were found to uninstall.
Successfully installed binaryornot-0.4.4 boolean-py-4.0 chardet-5.1.0 jinja2-3.1.2 license-expression-30.0.0 markupsafe-2.1.1 python-debian-0.1.49 reuse-1.1.0 setuptools-66.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting virtualenv
  Downloading virtualenv-20.24.6-py3-none-any.whl (3.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 62.4 MB/s eta 0:00:00
Collecting platformdirs<4,>=3.9.1
  Downloading platformdirs-3.11.0-py3-none-any.whl (17 kB)
Collecting filelock<4,>=3.12.2
  Downloading filelock-3.13.1-py3-none-any.whl (11 kB)
Collecting distlib<1,>=0.3.7
  Downloading distlib-0.3.7-py2.py3-none-any.whl (468 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.9/468.9 KB 127.4 MB/s eta 0:00:00
Installing collected packages: distlib, platformdirs, filelock, virtualenv
Successfully installed distlib-0.3.7 filelock-3.13.1 platformdirs-3.11.0 virtualenv-20.24.6
Removing intermediate container 3f9a3a5770c8
 ---> 15ae60e6f0a5
Step 8/10 : COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
 ---> 48cc9780f505
Step 9/10 : COPY host-x86_64/mingw-check/validate-error-codes.sh /scripts/
 ---> cec34b512e31
Step 10/10 : ENV SCRIPT TIDY_PRINT_DIFF=1 python2.7 ../x.py test            --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint
Removing intermediate container 93629dd9b190
 ---> a16921979805
Successfully built a16921979805
Successfully tagged rust-ci:latest
##[endgroup]
Built container sha256:a16921979805e0c075820a6b186dc881555a2dc9aece59984be206fdfb36f80e
Uploading finished image sha256:a16921979805e0c075820a6b186dc881555a2dc9aece59984be206fdfb36f80e to https://ci-caches.rust-lang.org/docker/8849b25aebb63c7041ab10114da59fac9c6c89ff409673e53f6251b7e63c69daeaca7298d30885d05004ab27b231421908523f297222d07a53450f37e4691d72
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
a16921979805   1 second ago     /bin/sh -c #(nop)  ENV SCRIPT=TIDY_PRINT_DIF…   0B        
48cc9780f505   2 seconds ago    /bin/sh -c #(nop) COPY file:078ea1d11e7b7cda…   367B      
15ae60e6f0a5   3 seconds ago    |1 DEBIAN_FRONTEND=noninteractive /bin/sh -c…   23.9MB    
491731421d92   9 seconds ago    /bin/sh -c #(nop) COPY file:ac591dd6bc5afa66…   5.33kB    
b7b02b74592c   10 seconds ago   |1 DEBIAN_FRONTEND=noninteractive /bin/sh -c…   23.1MB    
---
<missing>      3 weeks ago      /bin/sh -c #(nop)  LABEL org.opencontainers.…   0B        
<missing>      3 weeks ago      /bin/sh -c #(nop)  ARG LAUNCHPAD_BUILD_ARCH     0B        
<missing>      3 weeks ago      /bin/sh -c #(nop)  ARG RELEASE                  0B        

<botocore.awsrequest.AWSRequest object at 0x7ff75b3eef10>
gzip: stdout: Broken pipe
xargs: docker: terminated by signal 13
https://ci-caches.rust-lang.org/docker/8849b25aebb63c7041ab10114da59fac9c6c89ff409673e53f6251b7e63c69daeaca7298d30885d05004ab27b231421908523f297222d07a53450f37e4691d72
sha256:a16921979805e0c075820a6b186dc881555a2dc9aece59984be206fdfb36f80e
---
DirectMap4k:      182208 kB
DirectMap2M:     5060608 kB
DirectMap1G:    13631488 kB
##[endgroup]
Executing TIDY_PRINT_DIFF=1 python2.7 ../x.py test            --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint
+ TIDY_PRINT_DIFF=1 python2.7 ../x.py test --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint
    Finished dev [unoptimized] target(s) in 0.03s
##[endgroup]
downloading https://ci-artifacts.rust-lang.org/rustc-builds-alt/ffb7ed9fa420e9bcd98d84b431b2009445b7b967/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
extracting /checkout/obj/build/cache/llvm-ffb7ed9fa420e9bcd98d84b431b2009445b7b967-true/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz to /checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm
---
    Finished release [optimized] target(s) in 25.67s
##[endgroup]
fmt check
tidy check
tidy error: the following output file is not associated with any mir-opt test, you can remove it: /checkout/tests/mir-opt/nested_getter.outer_get.InstCombine.diff
tidy error: the following output file is not associated with any mir-opt test, you can remove it: /checkout/tests/mir-opt/casts.roundtrip.PreCodegen.after.mir
removing old virtual environment
creating virtual environment at '/checkout/obj/build/venv' using 'python3.10'
Requirement already satisfied: pip in ./build/venv/lib/python3.10/site-packages (23.3.1)
Collecting black==23.3.0 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 7))
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 32.8 MB/s eta 0:00:00
Collecting click==8.1.3 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 34))
  Downloading click-8.1.3-py3-none-any.whl (96 kB)
Collecting importlib-metadata==6.7.0 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 38))
  Downloading importlib_metadata-6.7.0-py3-none-any.whl (22 kB)
Collecting mypy-extensions==1.0.0 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 42))
  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Collecting packaging==23.1 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 46))
  Downloading packaging-23.1-py3-none-any.whl (48 kB)
Collecting pathspec==0.11.1 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 50))
  Downloading pathspec-0.11.1-py3-none-any.whl (29 kB)
Collecting platformdirs==3.6.0 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 54))
  Downloading platformdirs-3.6.0-py3-none-any.whl (16 kB)
Collecting ruff==0.0.272 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 58))
  Downloading ruff-0.0.272-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.9 MB)
Collecting tomli==2.0.1 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 77))
  Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting typed-ast==1.5.4 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 81))
  Downloading typed_ast-1.5.4-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (877 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 877.7/877.7 kB 81.9 MB/s eta 0:00:00
Collecting typing-extensions==4.6.3 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 107))
  Downloading typing_extensions-4.6.3-py3-none-any.whl (31 kB)
Collecting zipp==3.15.0 (from -r /checkout/src/tools/tidy/config/requirements.txt (line 114))
  Downloading zipp-3.15.0-py3-none-any.whl (6.8 kB)
Installing collected packages: zipp, typing-extensions, typed-ast, tomli, ruff, platformdirs, pathspec, packaging, mypy-extensions, click, importlib-metadata, black
Successfully installed black-23.3.0 click-8.1.3 importlib-metadata-6.7.0 mypy-extensions-1.0.0 packaging-23.1 pathspec-0.11.1 platformdirs-3.6.0 ruff-0.0.272 tomli-2.0.1 typed-ast-1.5.4 typing-extensions-4.6.3 zipp-3.15.0
some tidy checks failed
Build completed unsuccessfully in 0:00:49
  local time: Tue Oct 31 12:40:30 UTC 2023
  network time: Tue, 31 Oct 2023 12:40:30 GMT

scottmcm mentioned this pull request

Add Ord::cmp for primitives as a BinOp in MIR #118310

Merged

bors added a commit to rust-lang-ci/rust that referenced this pull request


          Auto merge of rust-lang#118310 - scottmcm:three-way-compare, r=<try>

d5cf045

Add `Ord::cmp` for primitives as a `BinOp` in MIR

There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches.  Those differences are irrelevant at the Rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:

1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and Cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.

Having extremely careful `if` ordering to μoptimize resource usage on broadwell (rust-lang#63767) is great, but it really feels to me like libcore is the wrong place to put that logic.  Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (rust-lang#105840) is arguably even nicer, but depends on the optimizer understanding it (llvm/llvm-project#73417) to be practical.  Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)?  But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)?  And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers.  Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
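For concreteness, two of the many integer three-way compares being described, both checked against `Ord::cmp`: the branchy `if`-chain shape and the `(a > b) - (a < b)` subtraction trick. These are illustrations only, not the implementations used in core; which shape is best is exactly what the PR proposes to leave to each backend.

```rust
use std::cmp::Ordering;

/// Branchy shape: `if a < b { Less } else if a == b { Equal } else { Greater }`.
pub fn cmp_branchy(a: i32, b: i32) -> Ordering {
    if a < b {
        Ordering::Less
    } else if a == b {
        Ordering::Equal
    } else {
        Ordering::Greater
    }
}

/// Subtraction trick: `(a > b) - (a < b)` is -1, 0 or 1, with no overflow
/// because it subtracts two booleans rather than `a` and `b`.
pub fn cmp_sub(a: i32, b: i32) -> Ordering {
    match (a > b) as i8 - (a < b) as i8 {
        -1 => Ordering::Less,
        0 => Ordering::Equal,
        _ => Ordering::Greater,
    }
}
```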

As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR.  The best way to see that is with rust-lang@8811efa#diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113 -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR changes it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks.  Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (rust-lang#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues.  (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
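In source terms, the simplification mentioned here is the identity below, which holds for primitive integers because their `Ord` agrees with `==` (and `Ordering::Equal` has discriminant `0`, hence `Cmp == 0`). The `>=` pair applies the same idea to `!= Less` and is added only for illustration.

```rust
use std::cmp::Ordering;

/// Before: test the result of a three-way compare.
pub fn eq_via_cmp(a: u64, b: u64) -> bool {
    a.cmp(&b) == Ordering::Equal
}

/// After: what an InstCombine-style rewrite could reduce it to.
pub fn eq_direct(a: u64, b: u64) -> bool {
    a == b
}

/// Same idea for `!= Less`, which is just `>=`.
pub fn ge_via_cmp(a: u64, b: u64) -> bool {
    a.cmp(&b) != Ordering::Less
}

pub fn ge_direct(a: u64, b: u64) -> bool {
    a >= b
}
```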

---

r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~

bors added a commit to rust-lang-ci/rust that referenced this pull request


          Auto merge of rust-lang#118310 - scottmcm:three-way-compare, r=<try>

3d40b1d

Member Author

saethlin commented Feb 14, 2024

Nearly all the ideas in this PR have been better-implemented in GVN, which is for the best.

I'm closing this because while I think GVN still doesn't handle &*ref, I don't know how to fix the storage marker issue above and I don't intend to figure that out just to land what value remains in this PR.

saethlin closed this

saethlin deleted the fold branch

February 14, 2024 21:11

bors added a commit to rust-lang-ci/rust that referenced this pull request


          Auto merge of rust-lang#118310 - scottmcm:three-way-compare, r=<try>

646278d

bors added a commit to rust-lang-ci/rust that referenced this pull request


          Auto merge of rust-lang#118310 - scottmcm:three-way-compare, r=<try>

8eaebbb

scottmcm mentioned this pull request

Shrink from_raw_parts's MIR so that Vec::deref MIR-inlines again #123190

Closed

bors added a commit to rust-lang-ci/rust that referenced this pull request

Auto merge of rust-lang#118310 - scottmcm:three-way-compare, r=davidtwco

a77322c

Add `Ord::cmp` for primitives as a `BinOp` in MIR

bjorn3 pushed a commit to bjorn3/rust that referenced this pull request

Auto merge of rust-lang#118310 - scottmcm:three-way-compare, r=davidtwco

79a1bdd

Add `Ord::cmp` for primitives as a `BinOp` in MIR

GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this pull request

Auto merge of rust-lang#118310 - scottmcm:three-way-compare, r=davidtwco

27ba1ba

Add `Ord::cmp` for primitives as a `BinOp` in MIR

Labels

A-mir-opt S-waiting-on-author T-compiler