Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM and SPIRV-LLVM-Translator pulldown (WW26 2024) #14327

Merged
merged 3,091 commits into from
Jul 5, 2024
Merged

Conversation

iclsrc
Copy link
Contributor

@iclsrc iclsrc commented Jun 27, 2024

jiahanxie353 and others added 30 commits June 24, 2024 13:48
Support IR translation for scalable vector store
…ontraction ops (#95710)

This patch introduces pattern rewrites for reducing the rank of named
linalg contraction ops with unit spatial dim(s) to other named
contraction ops. For example `linalg.batch_matmul` with batch size 1 ->
`linalg.matmul` and `linalg.matmul` with unit LHS spatial dim ->
`linalg.vecmat`, etc. These patterns don't support reducing the rank
along reduction dimension as those don't convert to other named
contraction ops.
This PR fixes #55781 by adding the `--no-wasm-opt` and `--wasm-opt`
flags in clang to disable/enable the `wasm-opt` optimizations. The
default is to enable `wasm-opt` as before in order to not break existing
workflows.

I think that adding a warning when no flag or the `--wasm-opt` flag is
given but `wasm-opt` wasn't found in the path may be relevant here. It
allows people using `wasm-opt` to be aware of if it have been used on
their produced binary or not. The only downside I see to this is that
people already using the toolchain with the `-O` and `-Werror` flags but
without `wasm-opt` in the path will see their toolchain break (with an
easy fix: either adding `--no-wasm-opt` or add `wasm-opt` to the path).
I haven't implemented this here because I haven't figured out how to add
such a warning, and I don't know if this warning should be added here or
in another PR.

CC @sunfishcode that proposed in the associated issue to review this
patch.
When printing JSON output with --dynamic-table I noticed that the output
is invalid JSON. This patch overrides the printDynamicTable() function
in the JSONELFDumper to return a list of dictionaries for the
DynamicSection value.
Before the output was:
```
 {
    "FileSummary": {
      "File": "bin/llvm-readelf",
      "Format": "elf64-x86-64",
      "Arch": "x86_64",
      "AddressSize": "64bit",
      "LoadName": "<Not found>"
    }DynamicSection [ (35 entries)
  Tag                Type         Name/Value
  0x000000000000001D RUNPATH      Library runpath: [$ORIGIN/../lib:]
  0x0000000000000001 NEEDED       Shared library: [libm.so.6]
  0x0000000000000001 NEEDED       Shared library: [libz.so.1]
  0x0000000000000001 NEEDED       Shared library: [libzstd.so.1]
```
Now the output looks like:
```
 "DynamicSection": [
      {
        "Tag": 29,
        "Type": "RUNPATH",
        "Value": 6322,
        "Path": [
          "$ORIGIN/../lib"
        ]
      },
      {
        "Tag": 1,
        "Type": "NEEDED",
        "Value": 6109,
        "Library": "libm.so.6"
      },
```
This is to unblock #95007. Will investigate why the assertion is failing
on some arch.
This updates Clang's extension criteria to explicitly mention impacts on
other projects within the monorepo.

These changes were discussed in the following RFC:
https://discourse.llvm.org/t/rfc-require-discussion-of-impact-to-monorepo-stakeholders-when-adding-new-clang-extensions/79613
This paper clarified the lifetime of compound literal objects in odd
scopes, such as use at function prototype scope.

We do not currently implement this paper, as the new test demonstrates.
This also fixes up some asserts in copyPhysReg, loadRegFromStackSlot and
storeRegToStackSlot.
…(#96482)

One reason to want to split this up is to simplify the code added in
#93802, where it checks the SME streaming-mode requirements for a
builtin by checking for the absence of SVE. If the target guards are
separate, we can generate a table and make the Sema code to verify the
runtime mode simpler.

Another reason is to avoid an issue with a check in SveEmitter.cpp where
it ensures that the 'VerifyRuntimeMode' is set correctly for functions
that have both SVE and SME target guards:

if (!Def->isFlagSet(VerifyRuntimeMode) &&
Def->getGuard().contains("sve") &&
      Def->getGuard().contains("sme"))
    llvm_unreachable("Missing VerifyRuntimeMode flag");

However, if we ever add a new feature with "sme" in the name, even
though it is unrelated to FEAT_SME, then this code no longer works.

Note that the arm_sve.td and arm_sme.td files could do with a bit of
restructuring after this but it seems better to follow that up in an NFC
patch.
…390)

Detected with ASAN. `Operation::getLoc()` was called after erasing the
operation.

Reverts 48cf6b6, which attempted to fix
the use-after-free. (But the use-after-free is still there when the
`hasFailed` branch is taken.)
…rt (#85466)

Add support to the runtime for 6.0 spec features that allow num_threads
clause to take a list, and also make use of the strict modifier.
Provides new compiler interface functions for these features.
The passed SCC is the current SCC we're working on.
If you're building and vendoring lldb, you might need to also vendor
these files.
The static verifier flagged dead code in the check since the loop will
only execute once and never reach the iterator increment.

The loop needs to iterate twice to correctly diagnose when a statement
is after the teams.

Since there are two iterations again, reset the iterator to the first
teams directive when the double teams case is seen so the diagnostic can
report both locations.
This applies to the AOT case where we embed models in the compiler. The
change adds support for multiple models for the same agent, and allows
the user select one via a command line flag. "agent" refers to e.g. the
inline advisor or the register allocator eviction advisor.

To avoid build setup complexity, the support is delegated to the saved
model. Since saved models define computational graphs, we can generate a
composite model (this happens prior to building and embedding it in LLVM
and is not shown in this change) that exposes an extra feature with a
predefined name: `_model_selector`. The model, then, delegates
internally to contained models based on that feature value.

Model selection is expected to happen at model instantiation, there is
no current scenario for switching them afterwards.

If the model doesn't expose such a feature but the user passes one, we
report error.

If the model exposes such a feature but the user doesn't pass one, we
also report an error.

Invalid model selector values are expected to be handled by the saved
model.

Internally, the model uses a pair of uint64 values - the high and low of
the MD5 hash of the name.

A tool composing models would, then, need to:
- expose the extra feature, `_model_selector`, shape (2,), uint64 data
type
- test its value (`tf.cond` or `tf.case` in Tensorflow) against the MD5
hash, in the [high, low] order, of contained models based on a
user-specified name (which the user will then use as flag value to the
compiler)

Agents just need to add a flag to capture the name of a model and pass
it to `ReleaseModeModelRunner` at construction. This can be passed in
all cases without checking - the case where the model is not composite
and we pass an empty name, everything works as before.

This change also factors out the string flags we pass to the
`ReleaseModeModelRunner` for better maintainability (we risk confusing
parameters that are strings otherwise)
Static verifier noticed the current code has logically dead code parsing
the clause where IsComma is assigned. Fix this and improve the error
message received when a bad adjust-op is specified.

This will now be handled like 'map' where a nice diagnostic is given
with the correct values, then parsing continues on the next clause
reducing unhelpful diagnostics.
…165)

For unbuffered smem loads, it is illegal for the immediate offset to be
negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is
negative.

New PR of llvm/llvm-project#79553.
This patch makes libc++ build with -fsized-deallocation. That flag is
enabled by default in recent versions of Clang, so this patch will make
libc++ forward-compatible with ToT Clang.
This is necessary for 32b platforms such as ARM and i386.

Link: #94128
mig is a tool vendored with Xcode. Using apple_genrule makes sure that
the bazel selected version of Xcode is preferred, and that the action is
invalidated when that version changes.
- Adds a helper function for checking whether an argument is a
[grid_constant](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties).
- Adds support for cvta.param using changes from
llvm/llvm-project#95289
- Supports escaped grid_constant pointers conservatively, by casting all
uses to the generic address space with cvta.param.
733b8b2 ([LAA] Simplify identification of speculatable strides [nfc])
refactored getStrideFromPointer() to compute directly on SCEVs, and
return an SCEV expression instead of a Value. However, it left behind a
call to getUniqueCastUse(), which is completely unnecessary. Remove
this, showing a positive test update, and simplify the surrounding
program logic.
…rounding modes. (#95736)

- Algorithm:
- Step 1 - Range reduction: for a double precision input `x`, return `k`
and `u` such that
    - k is an integer
    - u = x - k * pi / 128, and |u| < pi/256
- Step 2 - Calculate `sin(u)` and `cos(u)` in double-double using Taylor
polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
- Step 3 - Calculate `sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) *
sin(u)` using look-up table for `sin(k*pi/128)` and `cos(k*pi/128)`.
- Step 4 - Use Ziv's rounding test to decide if the result is correctly
rounded.
- Step 4' - If the Ziv's rounding test failed, redo step 1-3 using
128-bit precision.
- Currently, without FMA instructions, the large range reduction only
works correctly for the default rounding mode (FE_TONEAREST).
- Provide `LIBC_MATH` flag so that users can set `LIBC_MATH =
LIBC_MATH_SKIP_ACCURATE_PASS` to build the `sin` function without step 4
and 4'.
Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for tools

@@ -50,7 +50,6 @@
#include "llvm/IR/DebugProgramInstruction.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
#include <bit>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is approved in KhronosGroup/SPIRV-LLVM-Translator#2628 as well, so we will pulldown it later when it is merged.

@@ -3,8 +3,8 @@
; RUN: llvm-spirv --spirv-ext=+SPV_INTEL_cache_controls %t.bc -o %t.spv
; RUN: llvm-spirv -r %t.spv --spirv-target-env=SPV-IR -o - | llvm-dis -o - | FileCheck %s --check-prefix=CHECK-LLVM

; CHECK-SPIRV-DAG: Name [[#Func:]] "test"
; CHECK-SPIRV-DAG: Name [[#FuncGEP:]] "test_gep"
; CHECK-SPIRV-DAG: EntryPoint [[#]] [[#Func:]] "test"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @MrSidims here, because this looks like a plain bug in the test, i.e. this change should be upstreamed later.
Both test and test_gep are kernels and therefore should be represented as EntryPoint instruction in SPIR-V. There could be an OpName instruction referencing them as well, but it is a part of debug info and it can be safely removed from the module. It is just a coincidence that the test works using it.

I assume that the same change will be applied in the upstream translator and therefore it looks ok to me to proceed with it at intel/llvm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a bug in the test, it's a missing patch in intel/llvm (AFAIK it's being restored, it was missing due to lack of support in IGC).
Basically the missing patch was doing to following: for every kernel function it is creating a wrapper entry point that calls the function with the code copied form the original kernel. It was done because OpenCL allows one kernel to call another kernel and to comply the requirement from SPIR-V spec that one EntryPoint must not call another EntryPoint, we had to create such wrapper (actually, translator's community has done it for us).

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both SPIR-V translator patches LGTM, but as I said in a separate comment, having an issue submitted and linked to an XFAILed test (2515f6f) would be very appreciated

@jsji
Copy link
Contributor

jsji commented Jul 3, 2024

Both SPIR-V translator patches LGTM, but as I said in a separate comment, having an issue submitted and linked to an XFAILed test (2515f6f) would be very appreciated

Thanks. #14427 opened for tracking.

@jsji jsji temporarily deployed to WindowsCILock July 3, 2024 18:24 — with GitHub Actions Inactive
@jsji
Copy link
Contributor

jsji commented Jul 3, 2024

  • Failure in IGC DEV CI Containers / Build and Push IGC Dev Docker Images (Intel Drivers Ubuntu 22.04 Docker image with dev IGC, ubunt... (pull_request)

This is known limitation that we won't be able to download the artifacts if the version is more than 1 weeks' old. The recent dev igc update #14362 was blocked by mismatched igc and compute runtime, hence depending on #14376. @sarnex has been working on #14376.

So this is irrelevant to pulldown, should NOT block the pulldown. @intel/dpcpp-devops-reviewers Can you comments as well? Thanks.

Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFE changes LGTM

@jsji
Copy link
Contributor

jsji commented Jul 3, 2024

@intel/llvm-gatekeepers I think this is ready for merge.
Last CI run are green:
Pre-commit: https://github.com/intel/llvm/actions/runs/9767449029
Post-commit: https://github.com/intel/llvm/actions/runs/9767449033

The only change in this round is an include that won't affect any functional tests.

@elizabethandrews
Copy link
Contributor

@intel/dpcpp-cfe-reviewers Would you please have a look at follow testcase updates and template arg fix.

Looks ok

Can you share the PR which made this required?

@mdtoguchi can you take a look?

@jsji
Copy link
Contributor

jsji commented Jul 3, 2024

Can you share the PR which made this required?

llvm/llvm-project@f46d146

@jsji
Copy link
Contributor

jsji commented Jul 4, 2024

XPASS in SYCL Post Commit / e2e-lin (Intel Arc A-Series Graphics with Level Zero, ["Linux", "arc"], --param matrix-xmx8=True ... / Intel Arc A-Series Graphics with Level Zero (pull_request) is common to others.

@jsji
Copy link
Contributor

jsji commented Jul 4, 2024

@intel/llvm-gatekeepers Can someone help to issue a /merge. Thanks.

@steffenlarsen
Copy link
Contributor

@jsji - There seem to be a conflict that needs to be resolved. The Arc failure should also disappear with a rerun of testing.

@jsji jsji temporarily deployed to WindowsCILock July 4, 2024 15:39 — with GitHub Actions Inactive
@jsji jsji temporarily deployed to WindowsCILock July 4, 2024 15:40 — with GitHub Actions Inactive
@jsji jsji temporarily deployed to WindowsCILock July 4, 2024 18:55 — with GitHub Actions Inactive
@jsji jsji temporarily deployed to WindowsCILock July 4, 2024 19:40 — with GitHub Actions Inactive
@jsji
Copy link
Contributor

jsji commented Jul 5, 2024

@jsji - There seem to be a conflict that needs to be resolved. The Arc failure should also disappear with a rerun of testing.

Arc failure fixed now. Also pick up fix from @vmaksimo for #14427.

@steffenlarsen @intel/llvm-gatekeepers Can you help to issue a /merge . Thanks.

@steffenlarsen
Copy link
Contributor

/merge

@bb-sycl
Copy link
Contributor

bb-sycl commented Jul 5, 2024

Fri 05 Jul 2024 04:52:20 AM UTC --- Start to merge the commit into sycl branch. It will take several minutes.

@bb-sycl
Copy link
Contributor

bb-sycl commented Jul 5, 2024

Fri 05 Jul 2024 04:56:43 AM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.

@bb-sycl bb-sycl merged commit 7a7619d into sycl Jul 5, 2024
37 of 39 checks passed
@bb-sycl bb-sycl temporarily deployed to WindowsCILock July 5, 2024 04:57 — with GitHub Actions Inactive
@bb-sycl bb-sycl temporarily deployed to WindowsCILock July 5, 2024 05:41 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs
Projects
None yet
Development

Successfully merging this pull request may close these issues.