From def1123625ff26900e41dae81603c028a828d11a Mon Sep 17 00:00:00 2001
From: thor
Date: Wed, 24 Jan 2024 15:25:12 +0000
Subject: [PATCH] devel/py-llvmlite: un-break at least on Linux, update to 0.41.1 with static LLVM
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This now builds a patched LLVM that is statically linked, with llvmlite
patches, as upstream wants and supports as the only variant. This has not
been tested widely, but it has been unconditionally BROKEN before.
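Not part of the patch, but for reference: what "working" is taken to mean
here is that a minimal smoke test of the resulting binding, along the
lines of the MCJIT example in the llvmlite documentation, runs through.
The snippet below is illustrative only:

    import ctypes
    import llvmlite.binding as llvm
    import llvmlite.ir as ir

    # One-time initialization of the binding for the host target.
    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    # IR for: double fadd(double a, double b) { return a + b; }
    module = ir.Module(name="smoke")
    fnty = ir.FunctionType(ir.DoubleType(), (ir.DoubleType(), ir.DoubleType()))
    func = ir.Function(module, fnty, name="fadd")
    builder = ir.IRBuilder(func.append_basic_block(name="entry"))
    builder.ret(builder.fadd(func.args[0], func.args[1]))

    # JIT-compile via MCJIT (backed by the statically linked LLVM) and
    # call the compiled function through ctypes.
    engine = llvm.create_mcjit_compiler(
        llvm.parse_assembly(str(module)),
        llvm.Target.from_default_triple().create_target_machine())
    engine.finalize_object()
    fadd = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double,
                            ctypes.c_double)(engine.get_function_address("fadd"))
    assert fadd(1.5, 2.25) == 3.75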
Upstream changes since 0.38.1:

v0.41.1 (Oct 17, 2023)

This is a maintenance release that includes a workaround in the test suite
for ORCJit issues on the aarch64 platform. Also, this is the last release
to support the Windows 32-bit platform (win32).

Pull-Requests:

* PR #996: fix typos found by codespell (esc)
* PR #997: Fix issue #880 by ensuring all sources are compiled under FreeBSD. (ke6jjj)
* PR #998: adding sphinx_rtd_theme to RTD build to fix build (esc)
* PR #1001: Fix / workaround for OrcJIT blocking issues (gmarkall)

Authors: esc, ke6jjj, gmarkall

v0.41.0 (Sept 20, 2023)

Pull-Requests:

* PR #871: Refactor native library loading (folded sklam)
* PR #896: drop upper limit on Python for conda recipe (esc)
* PR #904: Create GitHub Action for llvmlite release (apmasell)
* PR #934: Expose TargetLibraryInfo pass (sklam)
* PR #935: Disable zlib for LLVM on Windows (apmasell)
* PR #936: Enable querying constants and value kinds (tbennun)
* PR #939: Bump llvmdev build number to include the nozlib change for windows (sklam)
* PR #940: Update CHANGE_LOG for 0.40.0 final. (stuartarchibald)
* PR #942: Add ORCJITv2 support (apmasell)
* PR #951: Add a type hint for IntType.width (apmasell)
* PR #952: Fix CI failing due to unsupported target triple on non-x86 platforms. (sklam)
* PR #958: fixup LLVM versions in version compat table (esc)
* PR #959: Remove support for LLVM < 14 (apmasell)
* PR #960: add various bullets to release checklists and sync (esc)
* PR #963: Allow adding comments to generated IR (apmasell)
* PR #966: build: support building on GNU/Hurd (pinotree)
* PR #967: Expose library name in OrcJIT tracker (apmasell)
* PR #968: Update LLVM manual build instructions (apmasell)
* PR #969: update changelog on main for v0.40.1 (esc)
* PR #983: adding RTD conf file V2 as per request (esc)
* PR #985: Update release checklist post 0.41.0rc1 (esc)
* PR #988: Fix FreeBsd build (sklam)

Authors: apmasell, esc, folded, pinotree, sklam, stuartarchibald, tbennun

v0.40.1 (June 21, 2023)

Pull-Requests:

* PR #945: Fix #944. Add .argtypes to prevent errors in pypy. (Siu Kwan Lam)
* PR #947: Update SVML patch for LLVM 14 (Andre Masella)
* PR #949: Handle PowerPC synonyms (Andre Masella)
* PR #950: Fix incorrect byval and other attributes on LLVM 14 (Andre Masella)

Authors: Andre Masella, Siu Kwan Lam

v0.40.0 (May 1, 2023)

This release predominantly upgrades to LLVM 14 and Python 3.11. Bindings
to a large number of passes are added. The minimum supported Python
version is now Python 3.8.

Note: A bug was discovered in LLVM’s RuntimeDyldELF on the Aarch64
platform that can cause segfaults when cross-module symbols are linked.
It is necessary for JIT users to build LLVM with the patch added in
PR #926.

Pull-Requests:

* PR #827: Add more LLVM pass bindings (apmasell)
* PR #830: Add LLVM 14 support (apmasell)
* PR #860: the git tag for the RC needs an rc1 suffix (esc)
* PR #869: bump max Python version to 3.11 (esc sklam)
* PR #876: Remove llvmlite.llvmpy after deprecation (apmasell)
* PR #883: Adds support for calling functions with ‘tail’, ‘notail’, or ‘musttail’ markers. (bslatkin)
* PR #886: Simplify setup.py Python version guard (mbargull)
* PR #892: Bump minimum supported Python version to 3.8 (jamesobutler)
* PR #893: Upgrade to ubuntu-20.04 for azure pipeline CI (jamesobutler)
* PR #899: Run Minconda install with bash (gmarkall)
* PR #903: Fix flake8 config and style for flake8 6 (gmarkall)
* PR #905: Add YouCompleteMe configuration file and ignore vim swap files (gmarkall)
* PR #906: Replace importlib-resources legacy API use (sklam)
* PR #910: Aarch64 split build for LLVM14 (sklam)
* PR #921: Setup AzureCI to use py311 and llvm14 (sklam)
* PR #922: Fix AzureCI not using llvm14 on windows (sklam)
* PR #926: llvmdev recipe: Add patch that clears GOTOffsetMap (apmasell gmarkall sklam)
* PR #930: Update changelog for 0.40.0rc1 (sklam stuartarchibald)
* PR #931: Remove maximum Python version limit (sklam apmasell)
* PR #932: Fix wheel builds (sklam)
* PR #935: Disable zlib for LLVM on Windows (apmasell)
* PR #939: Bump llvmdev build number to include the nozlib change for windows (sklam)
* PR #940: Update CHANGE_LOG for 0.40.0 final. (stuartarchibald)

Authors: apmasell, bslatkin, esc, gmarkall, jamesobutler, mbargull, sklam, stuartarchibald

v0.39.1 (September 1, 2022)

This is a maintenance release to fix build issues on MacOS.

Pull-Requests:

* PR #752: Skip test if libm is not found (Siu Kwan Lam)
* PR #865: Move Azure to use macos-11 (stuartarchibald)
* PR #874: Add zlib as a dependency for aarch64 (esc)
* PR #878: Update changelog (Andre Masella)

v0.39.0 (July 25, 2022)

This release predominantly adds new features and improves functionality.
It’s now possible to directly set LLVM metadata on global variables.
Functions and global variables now support the specification of a section
in which they should be placed. The attribute source_file has been added
to the ModuleRef class; it returns the module’s original file name. The
FFI library binding to LLVM is now loaded with importlib to increase
compatibility with other projects and improve start-up times. Linux
builds now use the parallel option to make to speed up building the FFI
library. Preliminary work to expose LLVM’s optimization-remarks interface
has been undertaken. The bindings are exposed and tested, but not yet
documented for general use (additional work is needed).

Deprecations: The llvmlite.llvmpy module has been deprecated, as the
functionality it provides is available through the llvmlite.ir module.
See the deprecation guide in the user documentation for details and
recommendations regarding replacement.

Pull-Requests:

* PR #328: Build C files separately on Linux and support parallel make (Michał Górny)
* PR #754: manylinux2014 aarch64 wheels with system compilers (esc)
* PR #760: add support for attaching metadata to global variables (Graham Markall John Törnblom)
* PR #786: Update Windows and OSX CI images. (stuartarchibald)
* PR #801: Update ffi.py (franzhaas)
* PR #803: llvm::Module::GetSourceFileName (J. Aaron Pendergrass)
* PR #806: Bump to v0.39.0dev (esc)
* PR #807: Exclude ExecutionEngine tests on linux 32 (esc)
* PR #809: Update CHANGE_LOG for 0.38.0 (stuartarchibald)
* PR #813: Add m1 support to conda build scripts (esc Stan Seibert)
* PR #815: update local references (esc)
* PR #816: remove configuration landscape service as it is no longer used (esc)
* PR #817: remove uppper limit on Python requires (esc)
* PR #819: adding rc and final release checklist templates (esc)
* PR #823: Add section to globals (Andreas Wrisley)
* PR #824: add GitHub URL for PyPi (Andrii Oriekhov)
* PR #825: Add flag handling to more instructions. (stuartarchibald Andre Masella)
* PR #826: Deprecated llvmlite.llvmpy (Andre Masella)
* PR #831: Format C++ code (Andre Masella)
* PR #832: DOC: Fix the syntax for the llvmlite discourse topic link. (stuartarchibald)
* PR #835: Add pre-commit hooks for clang-format (Andre Masella)
* PR #837: Add support for optimization remarks in pass managers (Siu Kwan Lam Andre Masella)
* PR #846: Cherry-Pick: #842 –> main :Changelog for 0.38.1 (esc)
* PR #851: adding the llvm_11_consecutive_registers.patch (esc)
* PR #857: Delegate passmanager remarks methods (Andre Masella)
* PR #858: Update CHANGE_LOG for 0.39.0 (esc Graham Markall stuartarchibald)
* PR #863: Update changelog for 0.39.0 release (Siu Kwan Lam)
* PR #864: Update release date for 0.39.0 release. (stuartarchibald)
* PR #867: Update CHANGE_LOG 0.39.0 final. (stuartarchibald)

Authors: Andrii Oriekhov, Andreas Wrisley, Andre Masella, esc, franzhaas, Graham Markall, J. Aaron Pendergrass, John Törnblom, Michał Górny, Stan Seibert, Siu Kwan Lam, stuartarchibald
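Because the FFI library now embeds its own LLVM, the installed binding
should report the pinned LLVM version no matter which (if any) LLVM
package is present on the system. Again illustrative and not part of the
patch; llvm_version_info is standard llvmlite.binding API:

    import llvmlite.binding as llvm
    # Expect the bundled toolchain, i.e. (14, 0, 6), not a system LLVM.
    print(llvm.llvm_version_info)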
---
 devel/py-llvmlite/Makefile                    |   84 +-
 devel/py-llvmlite/PLIST                       |   17 +-
 devel/py-llvmlite/distinfo                    |   20 +-
 .../files/llvm14-clear-gotoffsetmap.patch     |   31 +
 .../llvm14-remove-use-of-clonefile.patch      |   54 +
 devel/py-llvmlite/files/llvm14-svml.patch     | 2194 +++++++++++++++++
 .../patches/patch-ffi_Makefile.freebsd        |   23 -
 .../patches/patch-ffi_Makefile.linux          |   13 -
 .../py-llvmlite/patches/patch-ffi_targets.cpp |   17 -
 9 files changed, 2372 insertions(+), 81 deletions(-)
 create mode 100644 devel/py-llvmlite/files/llvm14-clear-gotoffsetmap.patch
 create mode 100644 devel/py-llvmlite/files/llvm14-remove-use-of-clonefile.patch
 create mode 100644 devel/py-llvmlite/files/llvm14-svml.patch
 delete mode 100644 devel/py-llvmlite/patches/patch-ffi_Makefile.freebsd
 delete mode 100644 devel/py-llvmlite/patches/patch-ffi_Makefile.linux
 delete mode 100644 devel/py-llvmlite/patches/patch-ffi_targets.cpp

diff --git a/devel/py-llvmlite/Makefile b/devel/py-llvmlite/Makefile
index ecbbdbede6e1..2815915f3e7a 100644
--- a/devel/py-llvmlite/Makefile
+++ b/devel/py-llvmlite/Makefile
@@ -1,6 +1,6 @@
-# $NetBSD: Makefile,v 1.24 2022/08/15 19:14:43 wiz Exp $
+# $NetBSD: Makefile,v 1.25 2024/01/24 15:25:12 thor Exp $
 
-DISTNAME= llvmlite-0.38.1
+DISTNAME= llvmlite-0.41.1
 PKGNAME= ${PYPKGPREFIX}-${DISTNAME}
 CATEGORIES= devel python
 MASTER_SITES= ${MASTER_SITE_PYPI:=l/llvmlite/}
@@ -10,20 +10,89 @@ HOMEPAGE= https://llvmlite.readthedocs.io/
 COMMENT= Lightweight LLVM Python binding for writing JIT compilers
 LICENSE= 2-clause-bsd
 
-USE_LANGUAGES= c++14
+# Statically linking in a purpose-built LLVM, as upstream urges us to do.
+# They support only a certain version of LLVM per release, and that
+# with patches.
+LLVM_VERSION= 14.0.6
+DISTFILES= ${DEFAULT_DISTFILES}
+DISTFILES+= llvm-${LLVM_VERSION}.src.tar.xz
+DISTFILES+= lld-${LLVM_VERSION}.src.tar.xz
+DISTFILES+= libunwind-${LLVM_VERSION}.src.tar.xz
 
-# https://github.com/numba/llvmlite/pull/802
-BROKEN= "No support for llvm 14 yet."
+LLVM_SITE= https://github.com/llvm/llvm-project/releases/download/llvmorg-${LLVM_VERSION}/
+SITES.llvm-${LLVM_VERSION}.src.tar.xz= ${LLVM_SITE}
+SITES.lld-${LLVM_VERSION}.src.tar.xz= ${LLVM_SITE}
+SITES.libunwind-${LLVM_VERSION}.src.tar.xz= ${LLVM_SITE}
 
-# officially supports llvm 11 as of 0.37.0
-MAKE_ENV+= LLVMLITE_SKIP_LLVM_VERSION_CHECK=1
+USE_LANGUAGES= c c++
+USE_CXX_FEATURES= c++14
+# Just for the LLVM build.
+USE_TOOLS= cmake
+
+# See
+# https://github.com/numba/llvmlite/blob/main/conda-recipes/llvmdev/build.sh
+# for the procedure. This is what
+# https://llvmlite.readthedocs.io/en/latest/admin-guide/install.html
+# points to. Need to match this up with the correct llvmlite release, as
+# they do not include this in the tarball. Python people think building
+# stuff from source is hard and keep it so :-/
+# I kept some upstream comments inline.
+
+LLVM_CMAKE_ARGS= -DCMAKE_INSTALL_PREFIX=${WRKDIR}/llvm-inst
+LLVM_CMAKE_ARGS+= -DCMAKE_BUILD_TYPE:STRING=Release
+LLVM_CMAKE_ARGS+= -DLLVM_ENABLE_PROJECTS:STRING=lld
+# We explicitly want static linking.
+LLVM_CMAKE_ARGS+= -DBUILD_SHARED_LIBS:BOOL=OFF
+LLVM_CMAKE_ARGS+= -DLLVM_ENABLE_ASSERTIONS:BOOL=ON
+LLVM_CMAKE_ARGS+= -DLINK_POLLY_INTO_TOOLS:BOOL=ON
+# Don't really require libxml2. Turn it off explicitly to avoid accidentally linking to system libs.
+LLVM_CMAKE_ARGS+= -DLLVM_ENABLE_LIBXML2:BOOL=OFF
+# Urgh, llvm *really* wants to link to ncurses / terminfo and we *really* do not want it to.
+LLVM_CMAKE_ARGS+= -DHAVE_TERMINFO_CURSES=OFF
+LLVM_CMAKE_ARGS+= -DLLVM_ENABLE_TERMINFO=OFF
+# Sometimes these are reported as unused. Whatever.
+LLVM_CMAKE_ARGS+= -DHAVE_TERMINFO_NCURSES=OFF
+LLVM_CMAKE_ARGS+= -DHAVE_TERMINFO_NCURSESW=OFF
+LLVM_CMAKE_ARGS+= -DHAVE_TERMINFO_TERMINFO=OFF
+LLVM_CMAKE_ARGS+= -DHAVE_TERMINFO_TINFO=OFF
+LLVM_CMAKE_ARGS+= -DHAVE_TERMIOS_H=OFF
+LLVM_CMAKE_ARGS+= -DCLANG_ENABLE_LIBXML=OFF
+LLVM_CMAKE_ARGS+= -DLIBOMP_INSTALL_ALIASES=OFF
+LLVM_CMAKE_ARGS+= -DLLVM_ENABLE_RTTI=OFF
+# Not sure if this should be adapted for pkgsrc.
+LLVM_CMAKE_ARGS+= -DLLVM_TARGETS_TO_BUILD=all
+LLVM_CMAKE_ARGS+= -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly
+# for llvm-lit
+LLVM_CMAKE_ARGS+= -DLLVM_INCLUDE_UTILS=ON
+# doesn't build without the rest of the LLVM project
+LLVM_CMAKE_ARGS+= -DLLVM_INCLUDE_BENCHMARKS:BOOL=OFF
+LLVM_CMAKE_ARGS+= -DLLVM_INCLUDE_DOCS=OFF
+LLVM_CMAKE_ARGS+= -DLLVM_INCLUDE_EXAMPLES=OFF
+
+
+MAKE_ENV+= LLVM_CONFIG=${WRKDIR}/llvm-inst/bin/llvm-config
 
 # unable to pass LLVM bit-code files to linker
 MAKE_ENV.NetBSD+= CXX_FLTO_FLAGS=
 MAKE_ENV.NetBSD+= LD_FLTO_FLAGS=
 
+# From 3.8 on is fine.
 PYTHON_VERSIONS_INCOMPATIBLE= 27
 
+# The llvm build detects lots of stuff outside the build sandbox ...
+# a python it likes, git ... just hoping that this does not matter
+# much for the static lib being used by llvmlite.
+
 pre-configure:
+	cd ${WRKDIR}/llvm-${LLVM_VERSION}.src && \
+	for f in ${FILESDIR}/llvm*.patch; do patch -Np2 < $$f; done
+	${LN} -s llvm-${LLVM_VERSION}.src ${WRKDIR}/llvm
+	${LN} -s lld-${LLVM_VERSION}.src ${WRKDIR}/lld
+	${LN} -s libunwind-${LLVM_VERSION}.src ${WRKDIR}/libunwind
+	cd ${WRKDIR} && mkdir build && cd build && \
+	cmake -G'Unix Makefiles' ${LLVM_CMAKE_ARGS} ../llvm && \
+	${MAKE} -j${MAKE_JOBS} && \
+	${MAKE} -j${MAKE_JOBS} check-llvm-unit && \
+	${MAKE} install
 	${SED} -e 's/ -stdlib=libc++//' ${WRKSRC}/ffi/Makefile.freebsd > ${WRKSRC}/ffi/Makefile.netbsd
 
 .include "../../mk/bsd.prefs.mk"
@@ -34,6 +103,5 @@ post-install:
 	${DESTDIR}${PREFIX}/${PYSITELIB}/llvmlite/binding/libllvmlite.dylib
 .endif
 
-.include "../../lang/llvm/buildlink3.mk"
 .include "../../lang/python/egg.mk"
 .include "../../mk/bsd.pkg.mk"

diff --git a/devel/py-llvmlite/PLIST b/devel/py-llvmlite/PLIST
index 045a7b96d108..984aa1a8a626 100644
--- a/devel/py-llvmlite/PLIST
+++ b/devel/py-llvmlite/PLIST
@@ -1,4 +1,4 @@
-@comment $NetBSD: PLIST,v 1.6 2022/01/12 21:13:50 wiz Exp $
+@comment $NetBSD: PLIST,v 1.7 2024/01/24 15:25:12 thor Exp $
 ${PYSITELIB}/${EGG_INFODIR}/PKG-INFO
 ${PYSITELIB}/${EGG_INFODIR}/SOURCES.txt
 ${PYSITELIB}/${EGG_INFODIR}/dependency_links.txt
@@ -46,6 +46,9 @@ ${PYSITELIB}/llvmlite/binding/object_file.pyo
 ${PYSITELIB}/llvmlite/binding/options.py
 ${PYSITELIB}/llvmlite/binding/options.pyc
 ${PYSITELIB}/llvmlite/binding/options.pyo
+${PYSITELIB}/llvmlite/binding/orcjit.py
+${PYSITELIB}/llvmlite/binding/orcjit.pyc
+${PYSITELIB}/llvmlite/binding/orcjit.pyo
 ${PYSITELIB}/llvmlite/binding/passmanagers.py
 ${PYSITELIB}/llvmlite/binding/passmanagers.pyc
 ${PYSITELIB}/llvmlite/binding/passmanagers.pyo
@@ -85,15 +88,6 @@ ${PYSITELIB}/llvmlite/ir/types.pyo
 ${PYSITELIB}/llvmlite/ir/values.py
 ${PYSITELIB}/llvmlite/ir/values.pyc
 ${PYSITELIB}/llvmlite/ir/values.pyo
-${PYSITELIB}/llvmlite/llvmpy/__init__.py
-${PYSITELIB}/llvmlite/llvmpy/__init__.pyc
-${PYSITELIB}/llvmlite/llvmpy/__init__.pyo
-${PYSITELIB}/llvmlite/llvmpy/core.py
-${PYSITELIB}/llvmlite/llvmpy/core.pyc
-${PYSITELIB}/llvmlite/llvmpy/core.pyo
-${PYSITELIB}/llvmlite/llvmpy/passes.py
-${PYSITELIB}/llvmlite/llvmpy/passes.pyc
-${PYSITELIB}/llvmlite/llvmpy/passes.pyo
 ${PYSITELIB}/llvmlite/tests/__init__.py
 ${PYSITELIB}/llvmlite/tests/__init__.pyc
 ${PYSITELIB}/llvmlite/tests/__init__.pyo
@@ -112,9 +106,6 @@ ${PYSITELIB}/llvmlite/tests/test_binding.pyo
 ${PYSITELIB}/llvmlite/tests/test_ir.py
 ${PYSITELIB}/llvmlite/tests/test_ir.pyc
 ${PYSITELIB}/llvmlite/tests/test_ir.pyo
-${PYSITELIB}/llvmlite/tests/test_llvmpy.py
-${PYSITELIB}/llvmlite/tests/test_llvmpy.pyc
-${PYSITELIB}/llvmlite/tests/test_llvmpy.pyo
 ${PYSITELIB}/llvmlite/tests/test_refprune.py
 ${PYSITELIB}/llvmlite/tests/test_refprune.pyc
 ${PYSITELIB}/llvmlite/tests/test_refprune.pyo

diff --git a/devel/py-llvmlite/distinfo b/devel/py-llvmlite/distinfo
index 6152e91fe071..6a07843135b3 100644
--- a/devel/py-llvmlite/distinfo
+++ b/devel/py-llvmlite/distinfo
@@ -1,9 +1,15 @@
-$NetBSD: distinfo,v 1.21 2022/05/22 12:16:59 adam Exp $
+$NetBSD: distinfo,v 1.22 2024/01/24 15:25:12 thor Exp $
 
-BLAKE2s (llvmlite-0.38.1.tar.gz) = ebc28cc09fccd56c5e0c02398c61a564945c279f3951e6769743538f5153b06b
-SHA512 (llvmlite-0.38.1.tar.gz) = a872a8535173426feaf8af01824a22e0a439a99e67801d8e78397137aebec82ebd53aeb16d797da86f9570f90c3362d00c2180e4d3b6c564d0d490c37b2c4ed6
-Size (llvmlite-0.38.1.tar.gz) = 129131 bytes
-SHA1 (patch-ffi_Makefile.freebsd) = 39a533f17952c73ef7cbfe910bc58166a106448c
-SHA1 (patch-ffi_Makefile.linux) = 64fe000e738b61f0ece5c3b6cb86a1d548955c70
+BLAKE2s (libunwind-14.0.6.src.tar.xz) = 21da632762db6524a46c1f721908b233265afe83728c1de5dd7757c662db0d99
+SHA512 (libunwind-14.0.6.src.tar.xz) = c8f3804c47ac33273238899e5682f9cb52465dcceff0e0ecf9925469620c6c9a62cc2c708a35a0e156b666e1198df52c5fff1da9d5ee3194605dfd62c296b058
+Size (libunwind-14.0.6.src.tar.xz) = 108680 bytes
+BLAKE2s (lld-14.0.6.src.tar.xz) = 2fc265b616bbdbaeecc8385fda204dbc28b1d871d98f4b3b3cd5183c4d6eefc8
+SHA512 (lld-14.0.6.src.tar.xz) = fad97b441f9642b73edd240af2c026259de0951d5ace42779e9e0fcf5e417252a1d744e2fc51e754a45016621ba0c70088177f88695af1c6ce290dd26873b094
+Size (lld-14.0.6.src.tar.xz) = 1366180 bytes
+BLAKE2s (llvm-14.0.6.src.tar.xz) = 2d44946453add45426569fd4187654f83881341c5c0109e4ffacc60e8f73af60
+SHA512 (llvm-14.0.6.src.tar.xz) = 6461bdde27aac17fa44c3e99a85ec47ffb181d0d4e5c3ef1c4286a59583e3b0c51af3c8081a300f45b99524340773a3011380059e3b3a571c3b0a8733e96fc1d
+Size (llvm-14.0.6.src.tar.xz) = 49660136 bytes
+BLAKE2s (llvmlite-0.41.1.tar.gz) = 2da761d269e0be534391778303456a1f71033e65c8e51a6719c70dab07e1ae48
+SHA512 (llvmlite-0.41.1.tar.gz) = f344c49dae8494fc3e7c1b30a516f046d718d7d1aab69bab8d9f636dce3136d3970de40f0c6fd5dc48cd7292699f0afdf1e41264820d4d421ee2d1e14e321e71
+Size (llvmlite-0.41.1.tar.gz) = 146564 bytes
 SHA1 (patch-ffi_build.py) = 9a992dd33f624055d5c8bea3986c4243c87b4ccf
-SHA1 (patch-ffi_targets.cpp) = 99f888839916fa42848f9dad2f28468b70cf668f

diff --git a/devel/py-llvmlite/files/llvm14-clear-gotoffsetmap.patch b/devel/py-llvmlite/files/llvm14-clear-gotoffsetmap.patch
new file mode 100644
index 000000000000..239f4ab20c1b
--- /dev/null
+++ b/devel/py-llvmlite/files/llvm14-clear-gotoffsetmap.patch
@@ -0,0 +1,31 @@
+From 322c79fff224389b4df9f24ac22965867007c2fa Mon Sep 17 00:00:00 2001
+From: Graham Markall
+Date: Mon, 13 Mar 2023 21:35:11 +0000
+Subject: [PATCH] RuntimeDyldELF: Clear the GOTOffsetMap when finalizing the
+ load
+
+This needs resetting so that stale entries are not left behind when the
+GOT section and index are reset.
+
+See llvm/llvm#61402: RuntimeDyldELF doesn't clear GOTOffsetMap in
+finalizeLoad(), leading to invalid GOT relocations on AArch64 -
+https://github.com/llvm/llvm-project/issues/61402.
+--- + llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp b/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp +index f92618afdff6..eb3c27a9406a 100644 +--- a/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp ++++ b/llvm-14.0.6.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp +@@ -2345,6 +2345,7 @@ Error RuntimeDyldELF::finalizeLoad(const ObjectFile &Obj, + } + } + ++ GOTOffsetMap.clear(); + GOTSectionID = 0; + CurrentGOTIndex = 0; + +-- +2.34.1 + diff --git a/devel/py-llvmlite/files/llvm14-remove-use-of-clonefile.patch b/devel/py-llvmlite/files/llvm14-remove-use-of-clonefile.patch new file mode 100644 index 000000000000..6ef9c9d61b23 --- /dev/null +++ b/devel/py-llvmlite/files/llvm14-remove-use-of-clonefile.patch @@ -0,0 +1,54 @@ +diff -ur a/llvm-14.0.6.src/lib/Support/Unix/Path.inc b/llvm-14.0.6.src/lib/Support/Unix/Path.inc +--- a/llvm-14.0.6.src/lib/Support/Unix/Path.inc 2022-03-14 05:44:55.000000000 -0400 ++++ b/llvm-14.0.6.src/lib/Support/Unix/Path.inc 2022-09-19 11:30:59.000000000 -0400 +@@ -1462,6 +1462,7 @@ + std::error_code copy_file(const Twine &From, const Twine &To) { + std::string FromS = From.str(); + std::string ToS = To.str(); ++ /* + #if __has_builtin(__builtin_available) + if (__builtin_available(macos 10.12, *)) { + // Optimistically try to use clonefile() and handle errors, rather than +@@ -1490,6 +1491,7 @@ + // cheaper. + } + #endif ++ */ + if (!copyfile(FromS.c_str(), ToS.c_str(), /*State=*/NULL, COPYFILE_DATA)) + return std::error_code(); + return std::error_code(errno, std::generic_category()); +diff -ur a/llvm-14.0.6.src/unittests/Support/Path.cpp b/llvm-14.0.6.src/unittests/Support/Path.cpp +--- a/llvm-14.0.6.src/unittests/Support/Path.cpp 2022-03-14 05:44:55.000000000 -0400 ++++ b/llvm-14.0.6.src/unittests/Support/Path.cpp 2022-09-19 11:33:07.000000000 -0400 +@@ -2267,15 +2267,15 @@ + + EXPECT_EQ(fs::setPermissions(TempPath, fs::set_uid_on_exe), NoError); + EXPECT_TRUE(CheckPermissions(fs::set_uid_on_exe)); +- ++#if !defined(__APPLE__) + EXPECT_EQ(fs::setPermissions(TempPath, fs::set_gid_on_exe), NoError); + EXPECT_TRUE(CheckPermissions(fs::set_gid_on_exe)); +- ++#endif + // Modern BSDs require root to set the sticky bit on files. + // AIX and Solaris without root will mask off (i.e., lose) the sticky bit + // on files. 
+ #if !defined(__FreeBSD__) && !defined(__NetBSD__) && !defined(__OpenBSD__) && \ +- !defined(_AIX) && !(defined(__sun__) && defined(__svr4__)) ++ !defined(_AIX) && !(defined(__sun__) && defined(__svr4__)) && !defined(__APPLE__) + EXPECT_EQ(fs::setPermissions(TempPath, fs::sticky_bit), NoError); + EXPECT_TRUE(CheckPermissions(fs::sticky_bit)); + +@@ -2297,10 +2297,12 @@ + EXPECT_TRUE(CheckPermissions(fs::all_perms)); + #endif // !FreeBSD && !NetBSD && !OpenBSD && !AIX + ++#if !defined(__APPLE__) + EXPECT_EQ(fs::setPermissions(TempPath, fs::all_perms & ~fs::sticky_bit), + NoError); + EXPECT_TRUE(CheckPermissions(fs::all_perms & ~fs::sticky_bit)); + #endif ++#endif + } + + #ifdef _WIN32 diff --git a/devel/py-llvmlite/files/llvm14-svml.patch b/devel/py-llvmlite/files/llvm14-svml.patch new file mode 100644 index 000000000000..c753d3f5971a --- /dev/null +++ b/devel/py-llvmlite/files/llvm14-svml.patch @@ -0,0 +1,2194 @@ +From 9de32f5474f1f78990b399214bdbb6c21f8f098e Mon Sep 17 00:00:00 2001 +From: Ivan Butygin +Date: Sun, 24 Jul 2022 20:31:29 +0200 +Subject: [PATCH] Fixes vectorizer and extends SVML support + +Fixes vectorizer and extends SVML support +Patch was updated to fix SVML calling convention issues uncovered by llvm 10. +In previous versions of patch SVML calling convention was selected based on +compilation settings. So if you try to call 256bit vector function from avx512 +code function will be called with avx512 cc which is incorrect. To fix this +SVML cc was separated into 3 different cc for 128, 256 and 512bit vector lengths +which are selected based on actual input vector length. + +Original patch merged several fixes: + +1. https://reviews.llvm.org/D47188 patch fixes the problem with improper calls +to SVML library as it has non-standard calling conventions. So accordingly it +has SVML calling conventions definitions and code to set CC to the vectorized +calls. As SVML provides several implementations for the math functions we also +took into consideration fast attribute and select more fast implementation in +such case. This work is based on original Matt Masten's work. +Author: Denis Nagorny + +2. https://reviews.llvm.org/D53035 patch implements support to legalize SVML +calls by breaking down the illegal vector call instruction into multiple legal +vector call instructions during code generation. Currently the vectorizer does +not check legality of the generated SVML (or any VECLIB) call instructions, and +this can lead to potential problems even during vector type legalization. This +patch addresses this issue by adding a legality check during code generation and +replaces the illegal SVML call with corresponding legalized instructions. +(RFC: http://lists.llvm.org/pipermail/llvm-dev/2018-June/124357.html) +Author: Karthik Senthil + +diff --git a/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h b/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h +index 17d1e3f770c14..110ff08189867 100644 +--- a/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h ++++ b/llvm-14.0.6.src/include/llvm/Analysis/TargetLibraryInfo.h +@@ -39,6 +39,12 @@ struct VecDesc { + NotLibFunc + }; + ++enum SVMLAccuracy { ++ SVML_DEFAULT, ++ SVML_HA, ++ SVML_EP ++}; ++ + /// Implementation of the target library information. + /// + /// This class constructs tables that hold the target library information and +@@ -157,7 +163,7 @@ class TargetLibraryInfoImpl { + /// Return true if the function F has a vector equivalent with vectorization + /// factor VF. 
+ bool isFunctionVectorizable(StringRef F, const ElementCount &VF) const { +- return !getVectorizedFunction(F, VF).empty(); ++ return !getVectorizedFunction(F, VF, false).empty(); + } + + /// Return true if the function F has a vector equivalent with any +@@ -166,7 +172,10 @@ class TargetLibraryInfoImpl { + + /// Return the name of the equivalent of F, vectorized with factor VF. If no + /// such mapping exists, return the empty string. +- StringRef getVectorizedFunction(StringRef F, const ElementCount &VF) const; ++ std::string getVectorizedFunction(StringRef F, const ElementCount &VF, bool IsFast) const; ++ ++ Optional getVectorizedFunctionCallingConv( ++ StringRef F, const FunctionType &FTy, const DataLayout &DL) const; + + /// Set to true iff i32 parameters to library functions should have signext + /// or zeroext attributes if they correspond to C-level int or unsigned int, +@@ -326,8 +335,13 @@ class TargetLibraryInfo { + bool isFunctionVectorizable(StringRef F) const { + return Impl->isFunctionVectorizable(F); + } +- StringRef getVectorizedFunction(StringRef F, const ElementCount &VF) const { +- return Impl->getVectorizedFunction(F, VF); ++ std::string getVectorizedFunction(StringRef F, const ElementCount &VF, bool IsFast) const { ++ return Impl->getVectorizedFunction(F, VF, IsFast); ++ } ++ ++ Optional getVectorizedFunctionCallingConv( ++ StringRef F, const FunctionType &FTy, const DataLayout &DL) const { ++ return Impl->getVectorizedFunctionCallingConv(F, FTy, DL); + } + + /// Tests if the function is both available and a candidate for optimized code +diff --git a/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h b/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h +index 78ebb35e0ea4d..3ffb57db8b18b 100644 +--- a/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h ++++ b/llvm-14.0.6.src/include/llvm/AsmParser/LLToken.h +@@ -133,6 +133,9 @@ enum Kind { + kw_fastcc, + kw_coldcc, + kw_intel_ocl_bicc, ++ kw_intel_svmlcc128, ++ kw_intel_svmlcc256, ++ kw_intel_svmlcc512, + kw_cfguard_checkcc, + kw_x86_stdcallcc, + kw_x86_fastcallcc, +diff --git a/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt b/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt +index 0498fc269b634..23bb3de41bc1a 100644 +--- a/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt ++++ b/llvm-14.0.6.src/include/llvm/IR/CMakeLists.txt +@@ -20,3 +20,7 @@ tablegen(LLVM IntrinsicsX86.h -gen-intrinsic-enums -intrinsic-prefix=x86) + tablegen(LLVM IntrinsicsXCore.h -gen-intrinsic-enums -intrinsic-prefix=xcore) + tablegen(LLVM IntrinsicsVE.h -gen-intrinsic-enums -intrinsic-prefix=ve) + add_public_tablegen_target(intrinsics_gen) ++ ++set(LLVM_TARGET_DEFINITIONS SVML.td) ++tablegen(LLVM SVML.inc -gen-svml) ++add_public_tablegen_target(svml_gen) +diff --git a/llvm-14.0.6.src/include/llvm/IR/CallingConv.h b/llvm-14.0.6.src/include/llvm/IR/CallingConv.h +index fd28542465225..096eea1a8e19b 100644 +--- a/llvm-14.0.6.src/include/llvm/IR/CallingConv.h ++++ b/llvm-14.0.6.src/include/llvm/IR/CallingConv.h +@@ -252,6 +252,11 @@ namespace CallingConv { + /// M68k_INTR - Calling convention used for M68k interrupt routines. + M68k_INTR = 101, + ++ /// Intel_SVML - Calling conventions for Intel Short Math Vector Library ++ Intel_SVML128 = 102, ++ Intel_SVML256 = 103, ++ Intel_SVML512 = 104, ++ + /// The highest possible calling convention ID. Must be some 2^k - 1. 
+ MaxID = 1023 + }; +diff --git a/llvm-14.0.6.src/include/llvm/IR/SVML.td b/llvm-14.0.6.src/include/llvm/IR/SVML.td +new file mode 100644 +index 0000000000000..5af710404c9d9 +--- /dev/null ++++ b/llvm-14.0.6.src/include/llvm/IR/SVML.td +@@ -0,0 +1,62 @@ ++//===-- Intel_SVML.td - Defines SVML call variants ---------*- tablegen -*-===// ++// ++// The LLVM Compiler Infrastructure ++// ++// This file is distributed under the University of Illinois Open Source ++// License. See LICENSE.TXT for details. ++// ++//===----------------------------------------------------------------------===// ++// ++// This file is used by TableGen to define the different typs of SVML function ++// variants used with -fveclib=SVML. ++// ++//===----------------------------------------------------------------------===// ++ ++class SvmlVariant; ++ ++def sin : SvmlVariant; ++def cos : SvmlVariant; ++def pow : SvmlVariant; ++def exp : SvmlVariant; ++def log : SvmlVariant; ++def acos : SvmlVariant; ++def acosh : SvmlVariant; ++def asin : SvmlVariant; ++def asinh : SvmlVariant; ++def atan2 : SvmlVariant; ++def atan : SvmlVariant; ++def atanh : SvmlVariant; ++def cbrt : SvmlVariant; ++def cdfnorm : SvmlVariant; ++def cdfnorminv : SvmlVariant; ++def cosd : SvmlVariant; ++def cosh : SvmlVariant; ++def erf : SvmlVariant; ++def erfc : SvmlVariant; ++def erfcinv : SvmlVariant; ++def erfinv : SvmlVariant; ++def exp10 : SvmlVariant; ++def exp2 : SvmlVariant; ++def expm1 : SvmlVariant; ++def hypot : SvmlVariant; ++def invsqrt : SvmlVariant; ++def log10 : SvmlVariant; ++def log1p : SvmlVariant; ++def log2 : SvmlVariant; ++def sind : SvmlVariant; ++def sinh : SvmlVariant; ++def sqrt : SvmlVariant; ++def tan : SvmlVariant; ++def tanh : SvmlVariant; ++ ++// TODO: SVML does not currently provide _ha and _ep variants of these fucnctions. ++// We should call the default variant of these functions in all cases instead. ++ ++// def nearbyint : SvmlVariant; ++// def logb : SvmlVariant; ++// def floor : SvmlVariant; ++// def fmod : SvmlVariant; ++// def ceil : SvmlVariant; ++// def trunc : SvmlVariant; ++// def rint : SvmlVariant; ++// def round : SvmlVariant; +diff --git a/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt b/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt +index aec84124129f4..98286e166fbe2 100644 +--- a/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt ++++ b/llvm-14.0.6.src/lib/Analysis/CMakeLists.txt +@@ -150,6 +150,7 @@ add_llvm_component_library(LLVMAnalysis + DEPENDS + intrinsics_gen + ${MLDeps} ++ svml_gen + + LINK_LIBS + ${MLLinkDeps} +diff --git a/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp b/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp +index 02923c2c7eb14..83abde28a62a4 100644 +--- a/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp ++++ b/llvm-14.0.6.src/lib/Analysis/TargetLibraryInfo.cpp +@@ -110,6 +110,11 @@ bool TargetLibraryInfoImpl::isCallingConvCCompatible(Function *F) { + F->getFunctionType()); + } + ++static std::string svmlMangle(StringRef FnName, const bool IsFast) { ++ std::string FullName = FnName.str(); ++ return IsFast ? FullName : FullName + "_ha"; ++} ++ + /// Initialize the set of available library functions based on the specified + /// target triple. This should be carefully written so that a missing target + /// triple gets a sane set of defaults. 
+@@ -1876,8 +1881,9 @@ void TargetLibraryInfoImpl::addVectorizableFunctionsFromVecLib( + } + case SVML: { + const VecDesc VecFuncs[] = { +- #define TLI_DEFINE_SVML_VECFUNCS +- #include "llvm/Analysis/VecFuncs.def" ++ #define GET_SVML_VARIANTS ++ #include "llvm/IR/SVML.inc" ++ #undef GET_SVML_VARIANTS + }; + addVectorizableFunctions(VecFuncs); + break; +@@ -1897,20 +1903,51 @@ bool TargetLibraryInfoImpl::isFunctionVectorizable(StringRef funcName) const { + return I != VectorDescs.end() && StringRef(I->ScalarFnName) == funcName; + } + +-StringRef +-TargetLibraryInfoImpl::getVectorizedFunction(StringRef F, +- const ElementCount &VF) const { ++std::string TargetLibraryInfoImpl::getVectorizedFunction(StringRef F, ++ const ElementCount &VF, ++ bool IsFast) const { ++ bool FromSVML = ClVectorLibrary == SVML; + F = sanitizeFunctionName(F); + if (F.empty()) +- return F; ++ return F.str(); + std::vector::const_iterator I = + llvm::lower_bound(VectorDescs, F, compareWithScalarFnName); + while (I != VectorDescs.end() && StringRef(I->ScalarFnName) == F) { +- if (I->VectorizationFactor == VF) +- return I->VectorFnName; ++ if (I->VectorizationFactor == VF) { ++ if (FromSVML) { ++ return svmlMangle(I->VectorFnName, IsFast); ++ } ++ return I->VectorFnName.str(); ++ } + ++I; + } +- return StringRef(); ++ return std::string(); ++} ++ ++static CallingConv::ID getSVMLCallingConv(const DataLayout &DL, const FunctionType &FType) ++{ ++ assert(isa(FType.getReturnType())); ++ auto *VecCallRetType = cast(FType.getReturnType()); ++ auto TypeBitWidth = DL.getTypeSizeInBits(VecCallRetType); ++ if (TypeBitWidth == 128) { ++ return CallingConv::Intel_SVML128; ++ } else if (TypeBitWidth == 256) { ++ return CallingConv::Intel_SVML256; ++ } else if (TypeBitWidth == 512) { ++ return CallingConv::Intel_SVML512; ++ } else { ++ llvm_unreachable("Invalid vector width"); ++ } ++ return 0; // not reachable ++} ++ ++Optional ++TargetLibraryInfoImpl::getVectorizedFunctionCallingConv( ++ StringRef F, const FunctionType &FTy, const DataLayout &DL) const { ++ if (F.startswith("__svml")) { ++ return getSVMLCallingConv(DL, FTy); ++ } ++ return {}; + } + + TargetLibraryInfo TargetLibraryAnalysis::run(const Function &F, +diff --git a/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp b/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp +index e3bf41c9721b6..4f9dccd4e0724 100644 +--- a/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp ++++ b/llvm-14.0.6.src/lib/AsmParser/LLLexer.cpp +@@ -603,6 +603,9 @@ lltok::Kind LLLexer::LexIdentifier() { + KEYWORD(spir_kernel); + KEYWORD(spir_func); + KEYWORD(intel_ocl_bicc); ++ KEYWORD(intel_svmlcc128); ++ KEYWORD(intel_svmlcc256); ++ KEYWORD(intel_svmlcc512); + KEYWORD(x86_64_sysvcc); + KEYWORD(win64cc); + KEYWORD(x86_regcallcc); +diff --git a/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp b/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp +index 432ec151cf8ae..3bd6ee61024b8 100644 +--- a/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp ++++ b/llvm-14.0.6.src/lib/AsmParser/LLParser.cpp +@@ -1781,6 +1781,9 @@ void LLParser::parseOptionalDLLStorageClass(unsigned &Res) { + /// ::= 'ccc' + /// ::= 'fastcc' + /// ::= 'intel_ocl_bicc' ++/// ::= 'intel_svmlcc128' ++/// ::= 'intel_svmlcc256' ++/// ::= 'intel_svmlcc512' + /// ::= 'coldcc' + /// ::= 'cfguard_checkcc' + /// ::= 'x86_stdcallcc' +@@ -1850,6 +1853,9 @@ bool LLParser::parseOptionalCallingConv(unsigned &CC) { + case lltok::kw_spir_kernel: CC = CallingConv::SPIR_KERNEL; break; + case lltok::kw_spir_func: CC = CallingConv::SPIR_FUNC; break; + case lltok::kw_intel_ocl_bicc: CC = 
CallingConv::Intel_OCL_BI; break; ++ case lltok::kw_intel_svmlcc128:CC = CallingConv::Intel_SVML128; break; ++ case lltok::kw_intel_svmlcc256:CC = CallingConv::Intel_SVML256; break; ++ case lltok::kw_intel_svmlcc512:CC = CallingConv::Intel_SVML512; break; + case lltok::kw_x86_64_sysvcc: CC = CallingConv::X86_64_SysV; break; + case lltok::kw_win64cc: CC = CallingConv::Win64; break; + case lltok::kw_webkit_jscc: CC = CallingConv::WebKit_JS; break; +diff --git a/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp b/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp +index 0ff045fa787e8..175651949ef85 100644 +--- a/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp ++++ b/llvm-14.0.6.src/lib/CodeGen/ReplaceWithVeclib.cpp +@@ -157,7 +157,7 @@ static bool replaceWithCallToVeclib(const TargetLibraryInfo &TLI, + // and the exact vector width of the call operands in the + // TargetLibraryInfo. + const std::string TLIName = +- std::string(TLI.getVectorizedFunction(ScalarName, VF)); ++ std::string(TLI.getVectorizedFunction(ScalarName, VF, CI.getFastMathFlags().isFast())); + + LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Looking up TLI mapping for `" + << ScalarName << "` and vector width " << VF << ".\n"); +diff --git a/llvm-14.0.6.src/lib/IR/AsmWriter.cpp b/llvm-14.0.6.src/lib/IR/AsmWriter.cpp +index 179754e275b03..c4e95752c97e8 100644 +--- a/llvm-14.0.6.src/lib/IR/AsmWriter.cpp ++++ b/llvm-14.0.6.src/lib/IR/AsmWriter.cpp +@@ -306,6 +306,9 @@ static void PrintCallingConv(unsigned cc, raw_ostream &Out) { + case CallingConv::X86_RegCall: Out << "x86_regcallcc"; break; + case CallingConv::X86_VectorCall:Out << "x86_vectorcallcc"; break; + case CallingConv::Intel_OCL_BI: Out << "intel_ocl_bicc"; break; ++ case CallingConv::Intel_SVML128: Out << "intel_svmlcc128"; break; ++ case CallingConv::Intel_SVML256: Out << "intel_svmlcc256"; break; ++ case CallingConv::Intel_SVML512: Out << "intel_svmlcc512"; break; + case CallingConv::ARM_APCS: Out << "arm_apcscc"; break; + case CallingConv::ARM_AAPCS: Out << "arm_aapcscc"; break; + case CallingConv::ARM_AAPCS_VFP: Out << "arm_aapcs_vfpcc"; break; +diff --git a/llvm-14.0.6.src/lib/IR/Verifier.cpp b/llvm-14.0.6.src/lib/IR/Verifier.cpp +index 989d01e2e3950..bae7382a36e13 100644 +--- a/llvm-14.0.6.src/lib/IR/Verifier.cpp ++++ b/llvm-14.0.6.src/lib/IR/Verifier.cpp +@@ -2457,6 +2457,9 @@ void Verifier::visitFunction(const Function &F) { + case CallingConv::Fast: + case CallingConv::Cold: + case CallingConv::Intel_OCL_BI: ++ case CallingConv::Intel_SVML128: ++ case CallingConv::Intel_SVML256: ++ case CallingConv::Intel_SVML512: + case CallingConv::PTX_Kernel: + case CallingConv::PTX_Device: + Assert(!F.isVarArg(), "Calling convention does not support varargs or " +diff --git a/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td b/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td +index 4dd8a6cdd8982..12e65521215e4 100644 +--- a/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td ++++ b/llvm-14.0.6.src/lib/Target/X86/X86CallingConv.td +@@ -498,6 +498,21 @@ def RetCC_X86_64 : CallingConv<[ + CCDelegateTo + ]>; + ++// Intel_SVML return-value convention. ++def RetCC_Intel_SVML : CallingConv<[ ++ // Vector types are returned in XMM0,XMM1 ++ CCIfType<[v4f32, v2f64], ++ CCAssignToReg<[XMM0,XMM1]>>, ++ ++ // 256-bit FP vectors ++ CCIfType<[v8f32, v4f64], ++ CCAssignToReg<[YMM0,YMM1]>>, ++ ++ // 512-bit FP vectors ++ CCIfType<[v16f32, v8f64], ++ CCAssignToReg<[ZMM0,ZMM1]>> ++]>; ++ + // This is the return-value convention used for the entire X86 backend. 
+ let Entry = 1 in + def RetCC_X86 : CallingConv<[ +@@ -505,6 +520,10 @@ def RetCC_X86 : CallingConv<[ + // Check if this is the Intel OpenCL built-ins calling convention + CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo>, + ++ CCIfCC<"CallingConv::Intel_SVML128", CCDelegateTo>, ++ CCIfCC<"CallingConv::Intel_SVML256", CCDelegateTo>, ++ CCIfCC<"CallingConv::Intel_SVML512", CCDelegateTo>, ++ + CCIfSubtarget<"is64Bit()", CCDelegateTo>, + CCDelegateTo + ]>; +@@ -1064,6 +1083,30 @@ def CC_Intel_OCL_BI : CallingConv<[ + CCDelegateTo + ]>; + ++// X86-64 Intel Short Vector Math Library calling convention. ++def CC_Intel_SVML : CallingConv<[ ++ ++ // The SSE vector arguments are passed in XMM registers. ++ CCIfType<[v4f32, v2f64], ++ CCAssignToReg<[XMM0, XMM1, XMM2]>>, ++ ++ // The 256-bit vector arguments are passed in YMM registers. ++ CCIfType<[v8f32, v4f64], ++ CCAssignToReg<[YMM0, YMM1, YMM2]>>, ++ ++ // The 512-bit vector arguments are passed in ZMM registers. ++ CCIfType<[v16f32, v8f64], ++ CCAssignToReg<[ZMM0, ZMM1, ZMM2]>> ++]>; ++ ++def CC_X86_32_Intr : CallingConv<[ ++ CCAssignToStack<4, 4> ++]>; ++ ++def CC_X86_64_Intr : CallingConv<[ ++ CCAssignToStack<8, 8> ++]>; ++ + //===----------------------------------------------------------------------===// + // X86 Root Argument Calling Conventions + //===----------------------------------------------------------------------===// +@@ -1115,6 +1158,9 @@ def CC_X86_64 : CallingConv<[ + let Entry = 1 in + def CC_X86 : CallingConv<[ + CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo>, ++ CCIfCC<"CallingConv::Intel_SVML128", CCDelegateTo>, ++ CCIfCC<"CallingConv::Intel_SVML256", CCDelegateTo>, ++ CCIfCC<"CallingConv::Intel_SVML512", CCDelegateTo>, + CCIfSubtarget<"is64Bit()", CCDelegateTo>, + CCDelegateTo + ]>; +@@ -1227,3 +1273,27 @@ def CSR_SysV64_RegCall_NoSSE : CalleeSavedRegs<(add RBX, RBP, + (sequence "R%u", 12, 15))>; + def CSR_SysV64_RegCall : CalleeSavedRegs<(add CSR_SysV64_RegCall_NoSSE, + (sequence "XMM%u", 8, 15))>; ++ ++// SVML calling convention ++def CSR_32_Intel_SVML : CalleeSavedRegs<(add CSR_32_RegCall_NoSSE)>; ++def CSR_32_Intel_SVML_AVX512 : CalleeSavedRegs<(add CSR_32_Intel_SVML, ++ K4, K5, K6, K7)>; ++ ++def CSR_64_Intel_SVML_NoSSE : CalleeSavedRegs<(add RBX, RSI, RDI, RBP, RSP, R12, R13, R14, R15)>; ++ ++def CSR_64_Intel_SVML : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE, ++ (sequence "XMM%u", 8, 15))>; ++def CSR_Win64_Intel_SVML : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE, ++ (sequence "XMM%u", 6, 15))>; ++ ++def CSR_64_Intel_SVML_AVX : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE, ++ (sequence "YMM%u", 8, 15))>; ++def CSR_Win64_Intel_SVML_AVX : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE, ++ (sequence "YMM%u", 6, 15))>; ++ ++def CSR_64_Intel_SVML_AVX512 : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE, ++ (sequence "ZMM%u", 16, 31), ++ K4, K5, K6, K7)>; ++def CSR_Win64_Intel_SVML_AVX512 : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE, ++ (sequence "ZMM%u", 6, 21), ++ K4, K5, K6, K7)>; +diff --git a/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp b/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp +index 8bb7e81e19bbd..1780ce3fc6467 100644 +--- a/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp ++++ b/llvm-14.0.6.src/lib/Target/X86/X86ISelLowering.cpp +@@ -3788,7 +3788,8 @@ void VarArgsLoweringHelper::forwardMustTailParameters(SDValue &Chain) { + // FIXME: Only some x86_32 calling conventions support AVX512. 
+ if (Subtarget.useAVX512Regs() && + (is64Bit() || (CallConv == CallingConv::X86_VectorCall || +- CallConv == CallingConv::Intel_OCL_BI))) ++ CallConv == CallingConv::Intel_OCL_BI || ++ CallConv == CallingConv::Intel_SVML512))) + VecVT = MVT::v16f32; + else if (Subtarget.hasAVX()) + VecVT = MVT::v8f32; +diff --git a/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp b/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp +index 130cb61cdde24..9eec3b25ca9f2 100644 +--- a/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp ++++ b/llvm-14.0.6.src/lib/Target/X86/X86RegisterInfo.cpp +@@ -272,6 +272,42 @@ X86RegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC, + } + } + ++namespace { ++std::pair getSVMLRegMaskAndSaveList( ++ bool Is64Bit, bool IsWin64, CallingConv::ID CC) { ++ assert(CC >= CallingConv::Intel_SVML128 && CC <= CallingConv::Intel_SVML512); ++ unsigned Abi = CC - CallingConv::Intel_SVML128 ; // 0 - 128, 1 - 256, 2 - 512 ++ ++ const std::pair Abi64[] = { ++ std::make_pair(CSR_64_Intel_SVML_RegMask, CSR_64_Intel_SVML_SaveList), ++ std::make_pair(CSR_64_Intel_SVML_AVX_RegMask, CSR_64_Intel_SVML_AVX_SaveList), ++ std::make_pair(CSR_64_Intel_SVML_AVX512_RegMask, CSR_64_Intel_SVML_AVX512_SaveList), ++ }; ++ ++ const std::pair AbiWin64[] = { ++ std::make_pair(CSR_Win64_Intel_SVML_RegMask, CSR_Win64_Intel_SVML_SaveList), ++ std::make_pair(CSR_Win64_Intel_SVML_AVX_RegMask, CSR_Win64_Intel_SVML_AVX_SaveList), ++ std::make_pair(CSR_Win64_Intel_SVML_AVX512_RegMask, CSR_Win64_Intel_SVML_AVX512_SaveList), ++ }; ++ ++ const std::pair Abi32[] = { ++ std::make_pair(CSR_32_Intel_SVML_RegMask, CSR_32_Intel_SVML_SaveList), ++ std::make_pair(CSR_32_Intel_SVML_RegMask, CSR_32_Intel_SVML_SaveList), ++ std::make_pair(CSR_32_Intel_SVML_AVX512_RegMask, CSR_32_Intel_SVML_AVX512_SaveList), ++ }; ++ ++ if (Is64Bit) { ++ if (IsWin64) { ++ return AbiWin64[Abi]; ++ } else { ++ return Abi64[Abi]; ++ } ++ } else { ++ return Abi32[Abi]; ++ } ++} ++} ++ + const MCPhysReg * + X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const { + assert(MF && "MachineFunction required"); +@@ -327,6 +363,11 @@ X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const { + return CSR_64_Intel_OCL_BI_SaveList; + break; + } ++ case CallingConv::Intel_SVML128: ++ case CallingConv::Intel_SVML256: ++ case CallingConv::Intel_SVML512: { ++ return getSVMLRegMaskAndSaveList(Is64Bit, IsWin64, CC).second; ++ } + case CallingConv::HHVM: + return CSR_64_HHVM_SaveList; + case CallingConv::X86_RegCall: +@@ -449,6 +490,11 @@ X86RegisterInfo::getCallPreservedMask(const MachineFunction &MF, + return CSR_64_Intel_OCL_BI_RegMask; + break; + } ++ case CallingConv::Intel_SVML128: ++ case CallingConv::Intel_SVML256: ++ case CallingConv::Intel_SVML512: { ++ return getSVMLRegMaskAndSaveList(Is64Bit, IsWin64, CC).first; ++ } + case CallingConv::HHVM: + return CSR_64_HHVM_RegMask; + case CallingConv::X86_RegCall: +diff --git a/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h b/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h +index 5d773f0c57dfb..6bdf5bc6f3fe9 100644 +--- a/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h ++++ b/llvm-14.0.6.src/lib/Target/X86/X86Subtarget.h +@@ -916,6 +916,9 @@ class X86Subtarget final : public X86GenSubtargetInfo { + case CallingConv::X86_ThisCall: + case CallingConv::X86_VectorCall: + case CallingConv::Intel_OCL_BI: ++ case CallingConv::Intel_SVML128: ++ case CallingConv::Intel_SVML256: ++ case CallingConv::Intel_SVML512: + return isTargetWin64(); + // This convention allows using the Win64 convention 
on other targets. + case CallingConv::Win64: +diff --git a/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp b/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp +index 047bf5569ded3..59897785f156c 100644 +--- a/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp ++++ b/llvm-14.0.6.src/lib/Transforms/Utils/InjectTLIMappings.cpp +@@ -92,7 +92,7 @@ static void addMappingsFromTLI(const TargetLibraryInfo &TLI, CallInst &CI) { + + auto AddVariantDecl = [&](const ElementCount &VF) { + const std::string TLIName = +- std::string(TLI.getVectorizedFunction(ScalarName, VF)); ++ std::string(TLI.getVectorizedFunction(ScalarName, VF, CI.getFastMathFlags().isFast())); + if (!TLIName.empty()) { + std::string MangledName = + VFABI::mangleTLIVectorName(TLIName, ScalarName, CI.arg_size(), VF); +diff --git a/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp +index 46ff0994e04e7..f472af5e1a835 100644 +--- a/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp ++++ b/llvm-14.0.6.src/lib/Transforms/Vectorize/LoopVectorize.cpp +@@ -712,6 +712,27 @@ class InnerLoopVectorizer { + virtual void printDebugTracesAtStart(){}; + virtual void printDebugTracesAtEnd(){}; + ++ /// Check legality of given SVML call instruction \p VecCall generated for ++ /// scalar call \p Call. If illegal then the appropriate legal instruction ++ /// is returned. ++ Value *legalizeSVMLCall(CallInst *VecCall, CallInst *Call); ++ ++ /// Returns the legal VF for a call instruction \p CI using TTI information ++ /// and vector type. ++ ElementCount getLegalVFForCall(CallInst *CI); ++ ++ /// Partially vectorize a given call \p Call by breaking it down into multiple ++ /// calls of \p LegalCall, decided by the variant VF \p LegalVF. ++ Value *partialVectorizeCall(CallInst *Call, CallInst *LegalCall, ++ unsigned LegalVF); ++ ++ /// Generate shufflevector instruction for a vector value \p V based on the ++ /// current \p Part and a smaller VF \p LegalVF. ++ Value *generateShuffleValue(Value *V, unsigned LegalVF, unsigned Part); ++ ++ /// Combine partially vectorized calls stored in \p CallResults. ++ Value *combinePartialVecCalls(SmallVectorImpl &CallResults); ++ + /// The original loop. 
+ Loop *OrigLoop; + +@@ -4596,6 +4617,17 @@ static bool mayDivideByZero(Instruction &I) { + return !CInt || CInt->isZero(); + } + ++static void setVectorFunctionCallingConv(CallInst &CI, const DataLayout &DL, ++ const TargetLibraryInfo &TLI) { ++ Function *VectorF = CI.getCalledFunction(); ++ FunctionType *FTy = VectorF->getFunctionType(); ++ StringRef VFName = VectorF->getName(); ++ auto CC = TLI.getVectorizedFunctionCallingConv(VFName, *FTy, DL); ++ if (CC) { ++ CI.setCallingConv(*CC); ++ } ++} ++ + void InnerLoopVectorizer::widenCallInstruction(CallInst &I, VPValue *Def, + VPUser &ArgOperands, + VPTransformState &State) { +@@ -4664,9 +4696,246 @@ void InnerLoopVectorizer::widenCallInstruction(CallInst &I, VPValue *Def, + if (isa(V)) + V->copyFastMathFlags(CI); + ++ const DataLayout &DL = V->getModule()->getDataLayout(); ++ setVectorFunctionCallingConv(*V, DL, *TLI); ++ ++ // Perform legalization of SVML call instruction only if original call ++ // was not Intrinsic ++ if (!UseVectorIntrinsic && ++ (V->getCalledFunction()->getName()).startswith("__svml")) { ++ // assert((V->getCalledFunction()->getName()).startswith("__svml")); ++ LLVM_DEBUG(dbgs() << "LV(SVML): Vector call inst:"; V->dump()); ++ auto *LegalV = cast(legalizeSVMLCall(V, CI)); ++ LLVM_DEBUG(dbgs() << "LV: Completed SVML legalization.\n LegalV: "; ++ LegalV->dump()); ++ State.set(Def, LegalV, Part); ++ addMetadata(LegalV, &I); ++ } else { + State.set(Def, V, Part); + addMetadata(V, &I); ++ } ++ } ++} ++ ++//===----------------------------------------------------------------------===// ++// Implementation of functions for SVML vector call legalization. ++//===----------------------------------------------------------------------===// ++// ++// Unlike other VECLIBs, SVML needs to be used with target-legal ++// vector types. Otherwise, link failures and/or runtime failures ++// will occur. A motivating example could be - ++// ++// double *a; ++// float *b; ++// #pragma clang loop vectorize_width(8) ++// for(i = 0; i < N; ++i) { ++// a[i] = sin(i); // Legal SVML VF must be 4 or below on AVX ++// b[i] = cosf(i); // VF can be 8 on AVX since 8 floats can fit in YMM ++// } ++// ++// Current implementation of vector code generation in LV is ++// driven based on a single VF (in InnerLoopVectorizer::VF). This ++// inhibits the flexibility of adjusting/choosing different VF ++// for different instructions. ++// ++// Due to this limitation it is much more straightforward to ++// first generate the illegal sin8 (svml_sin8 for SVML vector ++// library) call and then legalize it than trying to avoid ++// generating illegal code from the beginning. ++// ++// A solution for this problem is to check legality of the ++// call instruction right after generating it in vectorizer and ++// if it is illegal we split the call arguments and issue multiple ++// calls to match the legal VF. This is demonstrated currently for ++// the SVML vector library calls (non-intrinsic version only). ++// ++// Future directions and extensions: ++// 1) This legalization example shows us that a good direction ++// for the VPlan framework would be to model the vector call ++// instructions in a way that legal VF for each call is chosen ++// correctly within vectorizer and illegal code generation is ++// avoided. ++// 2) This logic can also be extended to general vector functions ++// i.e. legalization OpenMP decalre simd functions. The ++// requirements needed for this will be documented soon. 
++ ++Value *InnerLoopVectorizer::legalizeSVMLCall(CallInst *VecCall, ++ CallInst *Call) { ++ ElementCount LegalVF = getLegalVFForCall(VecCall); ++ ++ assert(LegalVF.getKnownMinValue() > 1 && ++ "Legal VF for SVML call must be greater than 1 to vectorize"); ++ ++ if (LegalVF == VF) ++ return VecCall; ++ else if (LegalVF.getKnownMinValue() > VF.getKnownMinValue()) ++ // TODO: handle case when we are underfilling vectors ++ return VecCall; ++ ++ // Legal VF for this SVML call is smaller than chosen VF, break it down into ++ // smaller call instructions ++ ++ // Convert args, types and return type to match legal VF ++ SmallVector NewTys; ++ SmallVector NewArgs; ++ ++ for (Value *ArgOperand : Call->args()) { ++ Type *Ty = ToVectorTy(ArgOperand->getType(), LegalVF); ++ NewTys.push_back(Ty); ++ NewArgs.push_back(UndefValue::get(Ty)); + } ++ ++ // Construct legal vector function ++ const VFShape Shape = ++ VFShape::get(*Call, LegalVF /*EC*/, false /*HasGlobalPred*/); ++ Function *LegalVectorF = VFDatabase(*Call).getVectorizedFunction(Shape); ++ assert(LegalVectorF != nullptr && "Can't create legal vector function."); ++ ++ LLVM_DEBUG(dbgs() << "LV(SVML): LegalVectorF: "; LegalVectorF->dump()); ++ ++ SmallVector OpBundles; ++ Call->getOperandBundlesAsDefs(OpBundles); ++ auto LegalV = std::unique_ptr(CallInst::Create(LegalVectorF, NewArgs, OpBundles)); ++ ++ if (isa(LegalV)) ++ LegalV->copyFastMathFlags(Call); ++ ++ const DataLayout &DL = VecCall->getModule()->getDataLayout(); ++ // Set SVML calling conventions ++ setVectorFunctionCallingConv(*LegalV, DL, *TLI); ++ ++ LLVM_DEBUG(dbgs() << "LV(SVML): LegalV: "; LegalV->dump()); ++ ++ Value *LegalizedCall = partialVectorizeCall(VecCall, LegalV.get(), LegalVF.getKnownMinValue()); ++ ++ LLVM_DEBUG(dbgs() << "LV(SVML): LegalizedCall: "; LegalizedCall->dump()); ++ ++ // Remove the illegal call from Builder ++ VecCall->eraseFromParent(); ++ ++ return LegalizedCall; ++} ++ ++ElementCount InnerLoopVectorizer::getLegalVFForCall(CallInst *CI) { ++ const DataLayout DL = CI->getModule()->getDataLayout(); ++ FunctionType *CallFT = CI->getFunctionType(); ++ // All functions that need legalization should have a vector return type. ++ // This is true for all SVML functions that are currently supported. ++ assert(isa(CallFT->getReturnType()) && ++ "Return type of call that needs legalization is not a vector."); ++ auto *VecCallRetType = cast(CallFT->getReturnType()); ++ Type *ElemType = VecCallRetType->getElementType(); ++ ++ unsigned TypeBitWidth = DL.getTypeSizeInBits(ElemType); ++ unsigned VectorBitWidth = TTI->getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector); ++ unsigned LegalVF = VectorBitWidth / TypeBitWidth; ++ ++ LLVM_DEBUG(dbgs() << "LV(SVML): Type Bit Width: " << TypeBitWidth << "\n"); ++ LLVM_DEBUG(dbgs() << "LV(SVML): Current VL: " << VF << "\n"); ++ LLVM_DEBUG(dbgs() << "LV(SVML): Vector Bit Width: " << VectorBitWidth ++ << "\n"); ++ LLVM_DEBUG(dbgs() << "LV(SVML): Legal Target VL: " << LegalVF << "\n"); ++ ++ return ElementCount::getFixed(LegalVF); ++} ++ ++// Partial vectorization of a call instruction is achieved by making clones of ++// \p LegalCall and overwriting its argument operands with shufflevector ++// equivalent decided based on \p LegalVF and current Part being filled. 
++Value *InnerLoopVectorizer::partialVectorizeCall(CallInst *Call, ++ CallInst *LegalCall, ++ unsigned LegalVF) { ++ unsigned NumParts = VF.getKnownMinValue() / LegalVF; ++ LLVM_DEBUG(dbgs() << "LV(SVML): NumParts: " << NumParts << "\n"); ++ SmallVector CallResults; ++ ++ for (unsigned Part = 0; Part < NumParts; ++Part) { ++ auto *ClonedCall = cast(LegalCall->clone()); ++ ++ // Update the arg operand of cloned call to shufflevector ++ for (unsigned i = 0, ie = Call->arg_size(); i != ie; ++i) { ++ auto *NewOp = generateShuffleValue(Call->getArgOperand(i), LegalVF, Part); ++ ClonedCall->setArgOperand(i, NewOp); ++ } ++ ++ LLVM_DEBUG(dbgs() << "LV(SVML): ClonedCall: "; ClonedCall->dump()); ++ ++ auto *PartialVecCall = Builder.Insert(ClonedCall); ++ CallResults.push_back(PartialVecCall); ++ } ++ ++ return combinePartialVecCalls(CallResults); ++} ++ ++Value *InnerLoopVectorizer::generateShuffleValue(Value *V, unsigned LegalVF, ++ unsigned Part) { ++ // Example: ++ // Consider the following vector code - ++ // %1 = sitofp <4 x i32> %0 to <4 x double> ++ // %2 = call <4 x double> @__svml_sin4(<4 x double> %1) ++ // ++ // If the LegalVF is 2, we partially vectorize the sin4 call by invoking ++ // generateShuffleValue on the operand %1 ++ // If Part = 1, output value is - ++ // %shuffle = shufflevector <4 x double> %1, <4 x double> undef, <2 x i32> ++ // and if Part = 2, output is - ++ // %shuffle7 =shufflevector <4 x double> %1, <4 x double> undef, <2 x i32> ++ ++ assert(isa(V->getType()) && ++ "Cannot generate shuffles for non-vector values."); ++ SmallVector ShuffleMask; ++ Value *Undef = UndefValue::get(V->getType()); ++ ++ unsigned ElemIdx = Part * LegalVF; ++ ++ for (unsigned K = 0; K < LegalVF; K++) ++ ShuffleMask.push_back(static_cast(ElemIdx + K)); ++ ++ auto *ShuffleInst = ++ Builder.CreateShuffleVector(V, Undef, ShuffleMask, "shuffle"); ++ ++ return ShuffleInst; ++} ++ ++// Results of the calls executed by smaller legal call instructions must be ++// combined to match the original VF for later use. This is done by constructing ++// shufflevector instructions in a cumulative fashion. 
++Value *InnerLoopVectorizer::combinePartialVecCalls( ++ SmallVectorImpl &CallResults) { ++ assert(isa(CallResults[0]->getType()) && ++ "Cannot combine calls with non-vector results."); ++ auto *CallType = cast(CallResults[0]->getType()); ++ ++ Value *CombinedShuffle; ++ unsigned NumElems = CallType->getElementCount().getKnownMinValue() * 2; ++ unsigned NumRegs = CallResults.size(); ++ ++ assert(NumRegs >= 2 && isPowerOf2_32(NumRegs) && ++ "Number of partial vector calls to combine must be a power of 2 " ++ "(atleast 2^1)"); ++ ++ while (NumRegs > 1) { ++ for (unsigned I = 0; I < NumRegs; I += 2) { ++ SmallVector ShuffleMask; ++ for (unsigned J = 0; J < NumElems; J++) ++ ShuffleMask.push_back(static_cast(J)); ++ ++ CombinedShuffle = Builder.CreateShuffleVector( ++ CallResults[I], CallResults[I + 1], ShuffleMask, "combined"); ++ LLVM_DEBUG(dbgs() << "LV(SVML): CombinedShuffle:"; ++ CombinedShuffle->dump()); ++ CallResults.push_back(CombinedShuffle); ++ } ++ ++ SmallVector::iterator Start = CallResults.begin(); ++ SmallVector::iterator End = Start + NumRegs; ++ CallResults.erase(Start, End); ++ ++ NumElems *= 2; ++ NumRegs /= 2; ++ } ++ ++ return CombinedShuffle; + } + + void LoopVectorizationCostModel::collectLoopScalars(ElementCount VF) { +diff --git a/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp +index 644372483edde..342f018b92184 100644 +--- a/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp ++++ b/llvm-14.0.6.src/lib/Transforms/Vectorize/SLPVectorizer.cpp +@@ -6322,6 +6322,17 @@ Value *BoUpSLP::vectorizeTree(ArrayRef VL) { + return Vec; + } + ++static void setVectorFunctionCallingConv(CallInst &CI, const DataLayout &DL, ++ const TargetLibraryInfo &TLI) { ++ Function *VectorF = CI.getCalledFunction(); ++ FunctionType *FTy = VectorF->getFunctionType(); ++ StringRef VFName = VectorF->getName(); ++ auto CC = TLI.getVectorizedFunctionCallingConv(VFName, *FTy, DL); ++ if (CC) { ++ CI.setCallingConv(*CC); ++ } ++} ++ + Value *BoUpSLP::vectorizeTree(TreeEntry *E) { + IRBuilder<>::InsertPointGuard Guard(Builder); + +@@ -6794,7 +6805,12 @@ Value *BoUpSLP::vectorizeTree(TreeEntry *E) { + + SmallVector OpBundles; + CI->getOperandBundlesAsDefs(OpBundles); +- Value *V = Builder.CreateCall(CF, OpVecs, OpBundles); ++ ++ CallInst *NewCall = Builder.CreateCall(CF, OpVecs, OpBundles); ++ const DataLayout &DL = NewCall->getModule()->getDataLayout(); ++ setVectorFunctionCallingConv(*NewCall, DL, *TLI); ++ ++ Value *V = NewCall; + + // The scalar argument uses an in-tree scalar so we add the new vectorized + // call to ExternalUses list to make sure that an extract will be +diff --git a/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll b/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll +index df8b7c498bd00..63a36549f18fd 100644 +--- a/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll ++++ b/llvm-14.0.6.src/test/CodeGen/Generic/replace-intrinsics-with-veclib.ll +@@ -10,7 +10,7 @@ target triple = "x86_64-unknown-linux-gnu" + define <4 x double> @exp_v4(<4 x double> %in) { + ; SVML-LABEL: define {{[^@]+}}@exp_v4 + ; SVML-SAME: (<4 x double> [[IN:%.*]]) { +-; SVML-NEXT: [[TMP1:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[IN]]) ++; SVML-NEXT: [[TMP1:%.*]] = call <4 x double> @__svml_exp4_ha(<4 x double> [[IN]]) + ; SVML-NEXT: ret <4 x double> [[TMP1]] + ; + ; LIBMVEC-X86-LABEL: define {{[^@]+}}@exp_v4 +@@ -37,7 +37,7 @@ declare <4 x double> 
@llvm.exp.v4f64(<4 x double>) #0 + define <4 x float> @exp_f32(<4 x float> %in) { + ; SVML-LABEL: define {{[^@]+}}@exp_f32 + ; SVML-SAME: (<4 x float> [[IN:%.*]]) { +-; SVML-NEXT: [[TMP1:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[IN]]) ++; SVML-NEXT: [[TMP1:%.*]] = call <4 x float> @__svml_expf4_ha(<4 x float> [[IN]]) + ; SVML-NEXT: ret <4 x float> [[TMP1]] + ; + ; LIBMVEC-X86-LABEL: define {{[^@]+}}@exp_f32 +diff --git a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll +index a6e191c3d6923..d6e2e11106949 100644 +--- a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll ++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll +@@ -39,7 +39,8 @@ for.end: ; preds = %for.body + declare double @__exp_finite(double) #0 + + ; CHECK-LABEL: @exp_f64 +-; CHECK: <4 x double> @__svml_exp4 ++; CHECK: <2 x double> @__svml_exp2 ++; CHECK: <2 x double> @__svml_exp2 + ; CHECK: ret + define void @exp_f64(double* nocapture %varray) { + entry: +@@ -99,7 +100,8 @@ for.end: ; preds = %for.body + declare double @__log_finite(double) #0 + + ; CHECK-LABEL: @log_f64 +-; CHECK: <4 x double> @__svml_log4 ++; CHECK: <2 x double> @__svml_log2 ++; CHECK: <2 x double> @__svml_log2 + ; CHECK: ret + define void @log_f64(double* nocapture %varray) { + entry: +@@ -159,7 +161,8 @@ for.end: ; preds = %for.body + declare double @__pow_finite(double, double) #0 + + ; CHECK-LABEL: @pow_f64 +-; CHECK: <4 x double> @__svml_pow4 ++; CHECK: <2 x double> @__svml_pow2 ++; CHECK: <2 x double> @__svml_pow2 + ; CHECK: ret + define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) { + entry: +@@ -190,7 +193,8 @@ declare float @__exp2f_finite(float) #0 + + define void @exp2f_finite(float* nocapture %varray) { + ; CHECK-LABEL: @exp2f_finite( +-; CHECK: call <4 x float> @__svml_exp2f4(<4 x float> %{{.*}}) ++; CHECK: call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> %{{.*}}) ++; CHECK: call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> %{{.*}}) + ; CHECK: ret void + ; + entry: +@@ -219,7 +223,8 @@ declare double @__exp2_finite(double) #0 + + define void @exp2_finite(double* nocapture %varray) { + ; CHECK-LABEL: @exp2_finite( +-; CHECK: call <4 x double> @__svml_exp24(<4 x double> {{.*}}) ++; CHECK: call intel_svmlcc128 <2 x double> @__svml_exp22_ha(<2 x double> {{.*}}) ++; CHECK: call intel_svmlcc128 <2 x double> @__svml_exp22_ha(<2 x double> {{.*}}) + ; CHECK: ret void + ; + entry: +@@ -276,7 +281,8 @@ for.end: ; preds = %for.body + declare double @__log2_finite(double) #0 + + ; CHECK-LABEL: @log2_f64 +-; CHECK: <4 x double> @__svml_log24 ++; CHECK: <2 x double> @__svml_log22 ++; CHECK: <2 x double> @__svml_log22 + ; CHECK: ret + define void @log2_f64(double* nocapture %varray) { + entry: +@@ -333,7 +339,8 @@ for.end: ; preds = %for.body + declare double @__log10_finite(double) #0 + + ; CHECK-LABEL: @log10_f64 +-; CHECK: <4 x double> @__svml_log104 ++; CHECK: <2 x double> @__svml_log102 ++; CHECK: <2 x double> @__svml_log102 + ; CHECK: ret + define void @log10_f64(double* nocapture %varray) { + entry: +@@ -390,7 +397,8 @@ for.end: ; preds = %for.body + declare double @__sqrt_finite(double) #0 + + ; CHECK-LABEL: @sqrt_f64 +-; CHECK: <4 x double> @__svml_sqrt4 ++; CHECK: <2 x double> @__svml_sqrt2 ++; CHECK: <2 x double> @__svml_sqrt2 + ; CHECK: ret + define void @sqrt_f64(double* nocapture %varray) { + entry: +diff --git 
a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll +index 42c280df6ad02..088bbdcf1aa4a 100644 +--- a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll ++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-calls.ll +@@ -48,7 +48,7 @@ declare float @llvm.exp2.f32(float) #0 + + define void @sin_f64(double* nocapture %varray) { + ; CHECK-LABEL: @sin_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_sin4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -71,7 +71,7 @@ for.end: + + define void @sin_f32(float* nocapture %varray) { + ; CHECK-LABEL: @sin_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_sinf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_sinf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -94,7 +94,7 @@ for.end: + + define void @sin_f64_intrinsic(double* nocapture %varray) { + ; CHECK-LABEL: @sin_f64_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_sin4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -117,7 +117,7 @@ for.end: + + define void @sin_f32_intrinsic(float* nocapture %varray) { + ; CHECK-LABEL: @sin_f32_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_sinf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_sinf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -140,7 +140,7 @@ for.end: + + define void @cos_f64(double* nocapture %varray) { + ; CHECK-LABEL: @cos_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_cos4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -163,7 +163,7 @@ for.end: + + define void @cos_f32(float* nocapture %varray) { + ; CHECK-LABEL: @cos_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_cosf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_cosf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -186,7 +186,7 @@ for.end: + + define void @cos_f64_intrinsic(double* nocapture %varray) { + ; CHECK-LABEL: @cos_f64_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_cos4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -209,7 +209,7 @@ for.end: + + define void @cos_f32_intrinsic(float* nocapture %varray) { + ; CHECK-LABEL: @cos_f32_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_cosf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_cosf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -232,7 +232,7 @@ for.end: + + define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) { + ; CHECK-LABEL: @pow_f64( +-; CHECK: [[TMP8:%.*]] = call <4 x double> @__svml_pow4(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]]) ++; CHECK: [[TMP8:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -257,7 +257,7 @@ for.end: + + define void 
@pow_f64_intrinsic(double* nocapture %varray, double* nocapture readonly %exp) { + ; CHECK-LABEL: @pow_f64_intrinsic( +-; CHECK: [[TMP8:%.*]] = call <4 x double> @__svml_pow4(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]]) ++; CHECK: [[TMP8:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -282,7 +282,7 @@ for.end: + + define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) { + ; CHECK-LABEL: @pow_f32( +-; CHECK: [[TMP8:%.*]] = call <4 x float> @__svml_powf4(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]]) ++; CHECK: [[TMP8:%.*]] = call intel_svmlcc128 <4 x float> @__svml_powf4_ha(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -307,7 +307,7 @@ for.end: + + define void @pow_f32_intrinsic(float* nocapture %varray, float* nocapture readonly %exp) { + ; CHECK-LABEL: @pow_f32_intrinsic( +-; CHECK: [[TMP8:%.*]] = call <4 x float> @__svml_powf4(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]]) ++; CHECK: [[TMP8:%.*]] = call intel_svmlcc128 <4 x float> @__svml_powf4_ha(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -332,7 +332,7 @@ for.end: + + define void @exp_f64(double* nocapture %varray) { + ; CHECK-LABEL: @exp_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -355,7 +355,7 @@ for.end: + + define void @exp_f32(float* nocapture %varray) { + ; CHECK-LABEL: @exp_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_expf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -378,7 +378,7 @@ for.end: + + define void @exp_f64_intrinsic(double* nocapture %varray) { + ; CHECK-LABEL: @exp_f64_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -401,7 +401,7 @@ for.end: + + define void @exp_f32_intrinsic(float* nocapture %varray) { + ; CHECK-LABEL: @exp_f32_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_expf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -424,7 +424,7 @@ for.end: + + define void @log_f64(double* nocapture %varray) { + ; CHECK-LABEL: @log_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -447,7 +447,7 @@ for.end: + + define void @log_f32(float* nocapture %varray) { + ; CHECK-LABEL: @log_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_logf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_logf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -470,7 +470,7 @@ for.end: + + define void @log_f64_intrinsic(double* nocapture %varray) { + ; CHECK-LABEL: @log_f64_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x 
double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -493,7 +493,7 @@ for.end: + + define void @log_f32_intrinsic(float* nocapture %varray) { + ; CHECK-LABEL: @log_f32_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_logf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_logf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -516,7 +516,7 @@ for.end: + + define void @log2_f64(double* nocapture %varray) { + ; CHECK-LABEL: @log2_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log24(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log24_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -539,7 +539,7 @@ for.end: + + define void @log2_f32(float* nocapture %varray) { + ; CHECK-LABEL: @log2_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_log2f4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log2f4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -562,7 +562,7 @@ for.end: + + define void @log2_f64_intrinsic(double* nocapture %varray) { + ; CHECK-LABEL: @log2_f64_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log24(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log24_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -585,7 +585,7 @@ for.end: + + define void @log2_f32_intrinsic(float* nocapture %varray) { + ; CHECK-LABEL: @log2_f32_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_log2f4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log2f4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -608,7 +608,7 @@ for.end: + + define void @log10_f64(double* nocapture %varray) { + ; CHECK-LABEL: @log10_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log104(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log104_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -631,7 +631,7 @@ for.end: + + define void @log10_f32(float* nocapture %varray) { + ; CHECK-LABEL: @log10_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_log10f4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log10f4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -654,7 +654,7 @@ for.end: + + define void @log10_f64_intrinsic(double* nocapture %varray) { + ; CHECK-LABEL: @log10_f64_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log104(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log104_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -677,7 +677,7 @@ for.end: + + define void @log10_f32_intrinsic(float* nocapture %varray) { + ; CHECK-LABEL: @log10_f32_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_log10f4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_log10f4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -700,7 +700,7 @@ for.end: + + define void @sqrt_f64(double* nocapture %varray) { + ; CHECK-LABEL: @sqrt_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_sqrt4(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sqrt4_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -723,7 +723,7 @@ 
for.end: + + define void @sqrt_f32(float* nocapture %varray) { + ; CHECK-LABEL: @sqrt_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_sqrtf4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_sqrtf4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -746,7 +746,7 @@ for.end: + + define void @exp2_f64(double* nocapture %varray) { + ; CHECK-LABEL: @exp2_f64( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_exp24(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp24_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -769,7 +769,7 @@ for.end: + + define void @exp2_f32(float* nocapture %varray) { + ; CHECK-LABEL: @exp2_f32( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_exp2f4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -792,7 +792,7 @@ for.end: + + define void @exp2_f64_intrinsic(double* nocapture %varray) { + ; CHECK-LABEL: @exp2_f64_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_exp24(<4 x double> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp24_ha(<4 x double> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -815,7 +815,7 @@ for.end: + + define void @exp2_f32_intrinsic(float* nocapture %varray) { + ; CHECK-LABEL: @exp2_f32_intrinsic( +-; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_exp2f4(<4 x float> [[TMP4:%.*]]) ++; CHECK: [[TMP5:%.*]] = call intel_svmlcc128 <4 x float> @__svml_exp2f4_ha(<4 x float> [[TMP4:%.*]]) + ; CHECK: ret void + ; + entry: +@@ -836,4 +836,44 @@ for.end: + ret void + } + ++; CHECK-LABEL: @atan2_finite ++; CHECK: intel_svmlcc256 <4 x double> @__svml_atan24( ++; CHECK: intel_svmlcc256 <4 x double> @__svml_atan24( ++; CHECK: ret ++ ++declare double @__atan2_finite(double, double) local_unnamed_addr #0 ++ ++define void @atan2_finite([100 x double]* nocapture %varray) local_unnamed_addr #0 { ++entry: ++ br label %for.cond1.preheader ++ ++for.cond1.preheader: ; preds = %for.inc7, %entry ++ %indvars.iv19 = phi i64 [ 0, %entry ], [ %indvars.iv.next20, %for.inc7 ] ++ %0 = trunc i64 %indvars.iv19 to i32 ++ %conv = sitofp i32 %0 to double ++ br label %for.body3 ++ ++for.body3: ; preds = %for.body3, %for.cond1.preheader ++ %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ] ++ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 ++ %1 = trunc i64 %indvars.iv.next to i32 ++ %conv4 = sitofp i32 %1 to double ++ %call = tail call fast double @__atan2_finite(double %conv, double %conv4) ++ %arrayidx6 = getelementptr inbounds [100 x double], [100 x double]* %varray, i64 %indvars.iv19, i64 %indvars.iv ++ store double %call, double* %arrayidx6, align 8 ++ %exitcond = icmp eq i64 %indvars.iv.next, 100 ++ br i1 %exitcond, label %for.inc7, label %for.body3, !llvm.loop !5 ++ ++for.inc7: ; preds = %for.body3 ++ %indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1 ++ %exitcond21 = icmp eq i64 %indvars.iv.next20, 100 ++ br i1 %exitcond21, label %for.end9, label %for.cond1.preheader ++ ++for.end9: ; preds = %for.inc7 ++ ret void ++} ++ + attributes #0 = { nounwind readnone } ++!5 = distinct !{!5, !6, !7} ++!6 = !{!"llvm.loop.vectorize.width", i32 8} ++!7 = !{!"llvm.loop.vectorize.enable", i1 true} +diff --git a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-calls.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-calls.ll +new 
file mode 100644 +index 0000000000000..326c763994343 +--- /dev/null ++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-calls.ll +@@ -0,0 +1,513 @@ ++; Check legalization of SVML calls, including intrinsic versions (like @llvm..). ++ ++; RUN: opt -vector-library=SVML -inject-tli-mappings -loop-vectorize -force-vector-width=8 -force-vector-interleave=1 -mattr=avx -S < %s | FileCheck %s ++ ++target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" ++target triple = "x86_64-unknown-linux-gnu" ++ ++declare double @sin(double) #0 ++declare float @sinf(float) #0 ++declare double @llvm.sin.f64(double) #0 ++declare float @llvm.sin.f32(float) #0 ++ ++declare double @cos(double) #0 ++declare float @cosf(float) #0 ++declare double @llvm.cos.f64(double) #0 ++declare float @llvm.cos.f32(float) #0 ++ ++declare double @pow(double, double) #0 ++declare float @powf(float, float) #0 ++declare double @llvm.pow.f64(double, double) #0 ++declare float @llvm.pow.f32(float, float) #0 ++ ++declare double @exp(double) #0 ++declare float @expf(float) #0 ++declare double @llvm.exp.f64(double) #0 ++declare float @llvm.exp.f32(float) #0 ++ ++declare double @log(double) #0 ++declare float @logf(float) #0 ++declare double @llvm.log.f64(double) #0 ++declare float @llvm.log.f32(float) #0 ++ ++ ++define void @sin_f64(double* nocapture %varray) { ++; CHECK-LABEL: @sin_f64( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @sin(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @sin_f32(float* nocapture %varray) { ++; CHECK-LABEL: @sin_f32( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_sinf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %call = tail call float @sinf(float %conv) ++ %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @sin_f64_intrinsic(double* nocapture %varray) { ++; CHECK-LABEL: @sin_f64_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @llvm.sin.f64(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 
%iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @sin_f32_intrinsic(float* nocapture %varray) { ++; CHECK-LABEL: @sin_f32_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_sinf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %call = tail call float @llvm.sin.f32(float %conv) ++ %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @cos_f64(double* nocapture %varray) { ++; CHECK-LABEL: @cos_f64( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @cos(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @cos_f32(float* nocapture %varray) { ++; CHECK-LABEL: @cos_f32( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_cosf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %call = tail call float @cosf(float %conv) ++ %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @cos_f64_intrinsic(double* nocapture %varray) { ++; CHECK-LABEL: @cos_f64_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @llvm.cos.f64(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @cos_f32_intrinsic(float* nocapture %varray) { ++; CHECK-LABEL: @cos_f32_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_cosf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp 
i32 %tmp to float ++ %call = tail call float @llvm.cos.f32(float %conv) ++ %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) { ++; CHECK-LABEL: @pow_f64( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP2:%.*]], <4 x double> [[TMP3:%.*]]) ++; CHECK: [[TMP4:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP5:%.*]], <4 x double> [[TMP6:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %arrayidx = getelementptr inbounds double, double* %exp, i64 %iv ++ %tmp1 = load double, double* %arrayidx, align 4 ++ %tmp2 = tail call double @pow(double %conv, double %tmp1) ++ %arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %tmp2, double* %arrayidx2, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @pow_f64_intrinsic(double* nocapture %varray, double* nocapture readonly %exp) { ++; CHECK-LABEL: @pow_f64_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP2:%.*]], <4 x double> [[TMP3:%.*]]) ++; CHECK: [[TMP4:%.*]] = call intel_svmlcc256 <4 x double> @__svml_pow4_ha(<4 x double> [[TMP5:%.*]], <4 x double> [[TMP6:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %arrayidx = getelementptr inbounds double, double* %exp, i64 %iv ++ %tmp1 = load double, double* %arrayidx, align 4 ++ %tmp2 = tail call double @llvm.pow.f64(double %conv, double %tmp1) ++ %arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %tmp2, double* %arrayidx2, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) { ++; CHECK-LABEL: @pow_f32( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_powf8_ha(<8 x float> [[TMP2:%.*]], <8 x float> [[WIDE_LOAD:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %arrayidx = getelementptr inbounds float, float* %exp, i64 %iv ++ %tmp1 = load float, float* %arrayidx, align 4 ++ %tmp2 = tail call float @powf(float %conv, float %tmp1) ++ %arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %tmp2, float* %arrayidx2, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @pow_f32_intrinsic(float* nocapture %varray, float* nocapture readonly %exp) { ++; CHECK-LABEL: @pow_f32_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_powf8_ha(<8 x 
float> [[TMP2:%.*]], <8 x float> [[TMP3:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %arrayidx = getelementptr inbounds float, float* %exp, i64 %iv ++ %tmp1 = load float, float* %arrayidx, align 4 ++ %tmp2 = tail call float @llvm.pow.f32(float %conv, float %tmp1) ++ %arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %tmp2, float* %arrayidx2, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @exp_f64(double* nocapture %varray) { ++; CHECK-LABEL: @exp_f64( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @exp(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @exp_f32(float* nocapture %varray) { ++; CHECK-LABEL: @exp_f32( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_expf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %call = tail call float @expf(float %conv) ++ %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @exp_f64_intrinsic(double* nocapture %varray) { ++; CHECK-LABEL: @exp_f64_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @llvm.exp.f64(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @exp_f32_intrinsic(float* nocapture %varray) { ++; CHECK-LABEL: @exp_f32_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_expf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %call = tail call float @llvm.exp.f32(float %conv) ++ %arrayidx = getelementptr inbounds float, float* 
%varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @log_f64(double* nocapture %varray) { ++; CHECK-LABEL: @log_f64( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @log(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @log_f32(float* nocapture %varray) { ++; CHECK-LABEL: @log_f32( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_logf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %call = tail call float @logf(float %conv) ++ %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @log_f64_intrinsic(double* nocapture %varray) { ++; CHECK-LABEL: @log_f64_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP2:%.*]]) ++; CHECK: [[TMP3:%.*]] = call intel_svmlcc256 <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to double ++ %call = tail call double @llvm.log.f64(double %conv) ++ %arrayidx = getelementptr inbounds double, double* %varray, i64 %iv ++ store double %call, double* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++define void @log_f32_intrinsic(float* nocapture %varray) { ++; CHECK-LABEL: @log_f32_intrinsic( ++; CHECK: [[TMP1:%.*]] = call intel_svmlcc256 <8 x float> @__svml_logf8_ha(<8 x float> [[TMP2:%.*]]) ++; CHECK: ret void ++; ++entry: ++ br label %for.body ++ ++for.body: ++ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] ++ %tmp = trunc i64 %iv to i32 ++ %conv = sitofp i32 %tmp to float ++ %call = tail call float @llvm.log.f32(float %conv) ++ %arrayidx = getelementptr inbounds float, float* %varray, i64 %iv ++ store float %call, float* %arrayidx, align 4 ++ %iv.next = add nuw nsw i64 %iv, 1 ++ %exitcond = icmp eq i64 %iv.next, 1000 ++ br i1 %exitcond, label %for.end, label %for.body ++ ++for.end: ++ ret void ++} ++ ++attributes #0 = { nounwind readnone } ++ +diff --git a/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-codegen.ll b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-codegen.ll +new file mode 100644 +index 
0000000000000..9422653445dc2
+--- /dev/null
++++ b/llvm-14.0.6.src/test/Transforms/LoopVectorize/X86/svml-legal-codegen.ll
+@@ -0,0 +1,61 @@
++; Check that vector codegen splits an illegal sin8 call into two sin4 calls on AVX for the double datatype.
++; The C code used to generate this test:
++
++; #include <math.h>
++;
++; void foo(double *a, int N){
++;   int i;
++; #pragma clang loop vectorize_width(8)
++;   for (i=0;i<N;i++){
++;     a[i] = sin(i);
++;   }
++; }
++
++; RUN: opt -vector-library=SVML -inject-tli-mappings -loop-vectorize -mattr=avx -S < %s | FileCheck %s
++; CHECK-LABEL: @foo
++; CHECK: [[I1:%.*]] = sitofp <8 x i32> [[I0:%.*]] to <8 x double>
++; CHECK-NEXT: [[S1:%shuffle.*]] = shufflevector <8 x double> [[I1]], <8 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
++; CHECK-NEXT: [[I2:%.*]] = call fast intel_svmlcc256 <4 x double> @__svml_sin4(<4 x double> [[S1]])
++; CHECK-NEXT: [[S2:%shuffle.*]] = shufflevector <8 x double> [[I1]], <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
++; CHECK-NEXT: [[I3:%.*]] = call fast intel_svmlcc256 <4 x double> @__svml_sin4(<4 x double> [[S2]])
++; CHECK-NEXT: [[comb:%combined.*]] = shufflevector <4 x double> [[I2]], <4 x double> [[I3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
++; CHECK: store <8 x double> [[comb]], <8 x double>* [[TMP:%.*]], align 8
++
++
++target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
++target triple = "x86_64-unknown-linux-gnu"
++
++; Function Attrs: nounwind uwtable
++define dso_local void @foo(double* nocapture %a, i32 %N) local_unnamed_addr #0 {
++entry:
++  %cmp5 = icmp sgt i32 %N, 0
++  br i1 %cmp5, label %for.body.preheader, label %for.end
++
++for.body.preheader:                               ; preds = %entry
++  %wide.trip.count = zext i32 %N to i64
++  br label %for.body
++
++for.body:                                         ; preds = %for.body, %for.body.preheader
++  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
++  %0 = trunc i64 %indvars.iv to i32
++  %conv = sitofp i32 %0 to double
++  %call = tail call fast double @sin(double %conv) #2
++  %arrayidx = getelementptr inbounds double, double* %a, i64 %indvars.iv
++  store double %call, double* %arrayidx, align 8, !tbaa !2
++  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
++  %exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
++  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6
++
++for.end:                                          ; preds = %for.body, %entry
++  ret void
++}
++
++; Function Attrs: nounwind
++declare dso_local double @sin(double) local_unnamed_addr #1
++
++!2 = !{!3, !3, i64 0}
++!3 = !{!"double", !4, i64 0}
++!4 = !{!"omnipotent char", !5, i64 0}
++!5 = !{!"Simple C/C++ TBAA"}
++!6 = distinct !{!6, !7}
++!7 = !{!"llvm.loop.vectorize.width", i32 8}
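The test above exercises exactly the split/combine path added to LoopVectorize.cpp: the pragma requests VF=8, AVX legalizes at most <4 x double>, so one sin8 call becomes two sin4 calls plus a combining shuffle. For orientation, a sketch of how a transform queries the TLI mappings that the next test checks (API shape as in LLVM 14; illustrative, not part of the patch):

#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Support/TypeSize.h"

// Looks up the vector variant registered for scalar "sin" at VF = 4; with
// -vector-library=SVML and this patch applied the result is the
// high-accuracy entry "__svml_sin4_ha" (empty StringRef if no mapping).
llvm::StringRef lookupSinVariant(const llvm::TargetLibraryInfo &TLI) {
  return TLI.getVectorizedFunction("sin", llvm::ElementCount::getFixed(4));
}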
+diff --git a/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll b/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll
+index e8c83c4d9bd1f..615fdc29176a2 100644
+--- a/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll
++++ b/llvm-14.0.6.src/test/Transforms/Util/add-TLI-mappings.ll
+@@ -12,12 +12,12 @@ target triple = "x86_64-unknown-linux-gnu"
+ 
+ ; COMMON-LABEL: @llvm.compiler.used = appending global
+ ; SVML-SAME: [6 x i8*] [
+-; SVML-SAME: i8* bitcast (<2 x double> (<2 x double>)* @__svml_sin2 to i8*),
+-; SVML-SAME: i8* bitcast (<4 x double> (<4 x double>)* @__svml_sin4 to i8*),
+-; SVML-SAME: i8* bitcast (<8 x double> (<8 x double>)* @__svml_sin8 to i8*),
+-; SVML-SAME: i8* bitcast (<4 x float> (<4 x float>)* @__svml_log10f4 to i8*),
+-; SVML-SAME: i8* bitcast (<8 x float> (<8 x float>)* @__svml_log10f8 to i8*),
+-; SVML-SAME: i8* bitcast (<16 x float> (<16 x float>)* @__svml_log10f16 to i8*)
++; SVML-SAME: i8* bitcast (<2 x double> (<2 x double>)* @__svml_sin2_ha to i8*),
++; SVML-SAME: i8* bitcast (<4 x double> (<4 x double>)* @__svml_sin4_ha to i8*),
++; SVML-SAME: i8* bitcast (<8 x double> (<8 x double>)* @__svml_sin8_ha to i8*),
++; SVML-SAME: i8* bitcast (<4 x float> (<4 x float>)* @__svml_log10f4_ha to i8*),
++; SVML-SAME: i8* bitcast (<8 x float> (<8 x float>)* @__svml_log10f8_ha to i8*),
++; SVML-SAME: i8* bitcast (<16 x float> (<16 x float>)* @__svml_log10f16_ha to i8*)
+ ; MASSV-SAME: [2 x i8*] [
+ ; MASSV-SAME: i8* bitcast (<2 x double> (<2 x double>)* @__sind2 to i8*),
+ ; MASSV-SAME: i8* bitcast (<4 x float> (<4 x float>)* @__log10f4 to i8*)
+@@ -59,9 +59,9 @@ declare float @llvm.log10.f32(float) #0
+ attributes #0 = { nounwind readnone }
+ 
+ ; SVML: attributes #[[SIN]] = { "vector-function-abi-variant"=
+-; SVML-SAME: "_ZGV_LLVM_N2v_sin(__svml_sin2),
+-; SVML-SAME: _ZGV_LLVM_N4v_sin(__svml_sin4),
+-; SVML-SAME: _ZGV_LLVM_N8v_sin(__svml_sin8)" }
++; SVML-SAME: "_ZGV_LLVM_N2v_sin(__svml_sin2_ha),
++; SVML-SAME: _ZGV_LLVM_N4v_sin(__svml_sin4_ha),
++; SVML-SAME: _ZGV_LLVM_N8v_sin(__svml_sin8_ha)" }
+ 
+ ; MASSV: attributes #[[SIN]] = { "vector-function-abi-variant"=
+ ; MASSV-SAME: "_ZGV_LLVM_N2v_sin(__sind2)" }
+diff --git a/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt b/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt
+index 97df6a55d1b59..199e0285c9e5d 100644
+--- a/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt
++++ b/llvm-14.0.6.src/utils/TableGen/CMakeLists.txt
+@@ -47,6 +47,7 @@ add_tablegen(llvm-tblgen LLVM
+   SearchableTableEmitter.cpp
+   SubtargetEmitter.cpp
+   SubtargetFeatureInfo.cpp
++  SVMLEmitter.cpp
+   TableGen.cpp
+   Types.cpp
+   X86DisassemblerTables.cpp
+diff --git a/llvm-14.0.6.src/utils/TableGen/SVMLEmitter.cpp b/llvm-14.0.6.src/utils/TableGen/SVMLEmitter.cpp
+new file mode 100644
+index 0000000000000..a5aeea48db28b
+--- /dev/null
++++ b/llvm-14.0.6.src/utils/TableGen/SVMLEmitter.cpp
+@@ -0,0 +1,110 @@
++//===------ SVMLEmitter.cpp - Generate SVML function variants -------------===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++//
++// This tablegen backend emits the scalar-to-SVML function map for TLI.
++//
++//===----------------------------------------------------------------------===//
++
++#include "CodeGenTarget.h"
++#include "llvm/Support/Format.h"
++#include "llvm/TableGen/Error.h"
++#include "llvm/TableGen/Record.h"
++#include "llvm/TableGen/TableGenBackend.h"
++#include <map>
++#include <string>
++
++using namespace llvm;
++
++#define DEBUG_TYPE "SVMLVariants"
++#include "llvm/Support/Debug.h"
++
++namespace {
++
++class SVMLVariantsEmitter {
++
++  RecordKeeper &Records;
++
++private:
++  void emitSVMLVariants(raw_ostream &OS);
++
++public:
++  SVMLVariantsEmitter(RecordKeeper &R) : Records(R) {}
++
++  void run(raw_ostream &OS);
++};
++} // End anonymous namespace
++
++/// \brief Emit the set of SVML variant function names.
++// The default is to emit the high accuracy SVML variants until a mechanism is
++// introduced to allow a selection of different variants through precision
++// requirements specified by the user. This code generates mappings to svml
++// that are in the scalar form of llvm intrinsics, math library calls, or the
++// finite variants of math library calls.
++void SVMLVariantsEmitter::emitSVMLVariants(raw_ostream &OS) {
++
++  const unsigned MinSinglePrecVL = 4;
++  const unsigned MaxSinglePrecVL = 16;
++  const unsigned MinDoublePrecVL = 2;
++  const unsigned MaxDoublePrecVL = 8;
++
++  OS << "#ifdef GET_SVML_VARIANTS\n";
++
++  for (const auto &D : Records.getAllDerivedDefinitions("SvmlVariant")) {
++    StringRef SvmlVariantNameStr = D->getName();
++    // Single Precision SVML
++    for (unsigned VL = MinSinglePrecVL; VL <= MaxSinglePrecVL; VL *= 2) {
++      // Emit the scalar math library function to svml function entry.
++      OS << "{\"" << SvmlVariantNameStr << "f" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++
++      // Emit the scalar intrinsic to svml function entry.
++      OS << "{\"" << "llvm." << SvmlVariantNameStr << ".f32" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++
++      // Emit the finite math library function to svml function entry.
++      OS << "{\"__" << SvmlVariantNameStr << "f_finite" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++    }
++
++    // Double Precision SVML
++    for (unsigned VL = MinDoublePrecVL; VL <= MaxDoublePrecVL; VL *= 2) {
++      // Emit the scalar math library function to svml function entry.
++      OS << "{\"" << SvmlVariantNameStr << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", " << "ElementCount::getFixed(" << VL
++         << ")},\n";
++
++      // Emit the scalar intrinsic to svml function entry.
++      OS << "{\"" << "llvm." << SvmlVariantNameStr << ".f64" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", " << "ElementCount::getFixed(" << VL
++         << ")},\n";
++
++      // Emit the finite math library function to svml function entry.
++      OS << "{\"__" << SvmlVariantNameStr << "_finite" << "\", ";
++      OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", "
++         << "ElementCount::getFixed(" << VL << ")},\n";
++    }
++  }
++
++  OS << "#endif // GET_SVML_VARIANTS\n\n";
++}
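For a hypothetical SvmlVariant record named sin, the loops above expand to entries of the following shape (reconstructed from the OS << statements; the surrounding array and any _ha suffixing live in the TLI code that includes this file):

#ifdef GET_SVML_VARIANTS
// Single precision, VL = 4 (VL = 8 and 16 follow the same pattern):
{"sinf", "__svml_sinf4", ElementCount::getFixed(4)},
{"llvm.sin.f32", "__svml_sinf4", ElementCount::getFixed(4)},
{"__sinf_finite", "__svml_sinf4", ElementCount::getFixed(4)},
// Double precision, VL = 2 (VL = 4 and 8 follow the same pattern):
{"sin", "__svml_sin2", ElementCount::getFixed(2)},
{"llvm.sin.f64", "__svml_sin2", ElementCount::getFixed(2)},
{"__sin_finite", "__svml_sin2", ElementCount::getFixed(2)},
#endif // GET_SVML_VARIANTS

The backend is registered as the -gen-svml action in TableGen.cpp below.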
++ OS << "{\"__" << SvmlVariantNameStr << "_finite" << "\", "; ++ OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", " ++ << "ElementCount::getFixed(" << VL << ")},\n"; ++ } ++ } ++ ++ OS << "#endif // GET_SVML_VARIANTS\n\n"; ++} ++ ++void SVMLVariantsEmitter::run(raw_ostream &OS) { ++ emitSVMLVariants(OS); ++} ++ ++namespace llvm { ++ ++void EmitSVMLVariants(RecordKeeper &RK, raw_ostream &OS) { ++ SVMLVariantsEmitter(RK).run(OS); ++} ++ ++} // End llvm namespace +diff --git a/llvm-14.0.6.src/utils/TableGen/TableGen.cpp b/llvm-14.0.6.src/utils/TableGen/TableGen.cpp +index 2d4a45f889be6..603d0c223b33a 100644 +--- a/llvm-14.0.6.src/utils/TableGen/TableGen.cpp ++++ b/llvm-14.0.6.src/utils/TableGen/TableGen.cpp +@@ -57,6 +57,7 @@ enum ActionType { + GenAutomata, + GenDirectivesEnumDecl, + GenDirectivesEnumImpl, ++ GenSVMLVariants, + }; + + namespace llvm { +@@ -138,7 +139,9 @@ cl::opt Action( + clEnumValN(GenDirectivesEnumDecl, "gen-directive-decl", + "Generate directive related declaration code (header file)"), + clEnumValN(GenDirectivesEnumImpl, "gen-directive-impl", +- "Generate directive related implementation code"))); ++ "Generate directive related implementation code"), ++ clEnumValN(GenSVMLVariants, "gen-svml", ++ "Generate SVML variant function names"))); + + cl::OptionCategory PrintEnumsCat("Options for -print-enums"); + cl::opt Class("class", cl::desc("Print Enum list for this class"), +@@ -272,6 +275,9 @@ bool LLVMTableGenMain(raw_ostream &OS, RecordKeeper &Records) { + case GenDirectivesEnumImpl: + EmitDirectivesImpl(Records, OS); + break; ++ case GenSVMLVariants: ++ EmitSVMLVariants(Records, OS); ++ break; + } + + return false; +diff --git a/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h b/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h +index 71db8dc77b052..86c3a3068c2dc 100644 +--- a/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h ++++ b/llvm-14.0.6.src/utils/TableGen/TableGenBackends.h +@@ -93,6 +93,7 @@ void EmitExegesis(RecordKeeper &RK, raw_ostream &OS); + void EmitAutomata(RecordKeeper &RK, raw_ostream &OS); + void EmitDirectivesDecl(RecordKeeper &RK, raw_ostream &OS); + void EmitDirectivesImpl(RecordKeeper &RK, raw_ostream &OS); ++void EmitSVMLVariants(RecordKeeper &RK, raw_ostream &OS); + + } // End llvm namespace + +diff --git a/llvm-14.0.6.src/utils/vim/syntax/llvm.vim b/llvm-14.0.6.src/utils/vim/syntax/llvm.vim +index 205db16b7d8cd..2572ab5a59e1b 100644 +--- a/llvm-14.0.6.src/utils/vim/syntax/llvm.vim ++++ b/llvm-14.0.6.src/utils/vim/syntax/llvm.vim +@@ -104,6 +104,7 @@ syn keyword llvmKeyword + \ inreg + \ intel_ocl_bicc + \ inteldialect ++ \ intel_svmlcc + \ internal + \ jumptable + \ linkonce diff --git a/devel/py-llvmlite/patches/patch-ffi_Makefile.freebsd b/devel/py-llvmlite/patches/patch-ffi_Makefile.freebsd deleted file mode 100644 index b34b412ae09b..000000000000 --- a/devel/py-llvmlite/patches/patch-ffi_Makefile.freebsd +++ /dev/null @@ -1,23 +0,0 @@ -$NetBSD: patch-ffi_Makefile.freebsd,v 1.2 2022/01/14 19:49:10 adam Exp $ - -Add missing source code. -Add -fPIC for linking. 
- ---- ffi/Makefile.freebsd.orig 2021-03-25 14:26:22.000477300 +0000 -+++ ffi/Makefile.freebsd -@@ -11,13 +11,13 @@ LIBS = $(LLVM_LIBS) - INCLUDE = core.h - SRC = assembly.cpp bitcode.cpp core.cpp initfini.cpp module.cpp value.cpp \ - executionengine.cpp transforms.cpp passmanagers.cpp targets.cpp dylib.cpp \ -- linker.cpp object_file.cpp -+ linker.cpp object_file.cpp custom_passes.cpp - OUTPUT = libllvmlite.so - - all: $(OUTPUT) - - $(OUTPUT): $(SRC) $(INCLUDE) -- $(CXX) -shared $(CXXFLAGS) $(SRC) -o $(OUTPUT) $(LDFLAGS) $(LIBS) -+ $(CXX) -shared $(CXXFLAGS) $(SRC) -o $(OUTPUT) $(LDFLAGS) $(LIBS) -fPIC - - clean: - rm -rf test diff --git a/devel/py-llvmlite/patches/patch-ffi_Makefile.linux b/devel/py-llvmlite/patches/patch-ffi_Makefile.linux deleted file mode 100644 index f0d9b8cc612a..000000000000 --- a/devel/py-llvmlite/patches/patch-ffi_Makefile.linux +++ /dev/null @@ -1,13 +0,0 @@ -$NetBSD: patch-ffi_Makefile.linux,v 1.1 2019/12/19 22:12:43 joerg Exp $ - ---- ffi/Makefile.linux.orig 2019-12-19 19:40:48.890888990 +0000 -+++ ffi/Makefile.linux -@@ -19,7 +19,7 @@ all: $(OUTPUT) - $(OUTPUT): $(SRC) $(INCLUDE) - # static-libstdc++ avoids runtime dependencies on a - # particular libstdc++ version. -- $(CXX) $(CXX_STATIC_LINK) -shared $(CXXFLAGS) $(SRC) -o $(OUTPUT) $(LDFLAGS) $(LIBS) -+ $(CXX) $(CXX_STATIC_LINK) -shared $(CXXFLAGS) $(SRC) -o $(OUTPUT) $(LDFLAGS) $(LIBS) -fPIC - - clean: - rm -rf test $(OUTPUT) diff --git a/devel/py-llvmlite/patches/patch-ffi_targets.cpp b/devel/py-llvmlite/patches/patch-ffi_targets.cpp deleted file mode 100644 index 3734d44c8bda..000000000000 --- a/devel/py-llvmlite/patches/patch-ffi_targets.cpp +++ /dev/null @@ -1,17 +0,0 @@ -$NetBSD: patch-ffi_targets.cpp,v 1.2 2022/01/14 19:49:10 adam Exp $ - -Stopgap fix for llvm-12+ -https://github.com/numba/llvmlite/pull/802/files - ---- ffi/targets.cpp.orig 2022-01-14 14:39:38.000000000 +0000 -+++ ffi/targets.cpp -@@ -233,7 +233,9 @@ LLVMPY_CreateTargetMachine(LLVMTargetRef - rm = Reloc::DynamicNoPIC; - - TargetOptions opt; -+#if LLVM_VERSION_MAJOR < 12 - opt.PrintMachineCode = PrintMC; -+#endif - opt.MCOptions.ABIName = ABIName; - - bool jit = JIT;
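The deleted patch-ffi_targets.cpp above worked by compiling the guarded assignment out on newer LLVM, since TargetOptions::PrintMachineCode was dropped in LLVM 12. A minimal sketch of that guard idiom (illustration only, not part of any patch here):

#include "llvm/Config/llvm-config.h" // defines LLVM_VERSION_MAJOR
#include "llvm/Target/TargetOptions.h"

void setPrintMC(llvm::TargetOptions &Opt, bool PrintMC) {
#if LLVM_VERSION_MAJOR < 12
  Opt.PrintMachineCode = PrintMC;
#else
  (void)Opt;
  (void)PrintMC; // no equivalent TargetOptions field in LLVM >= 12
#endif
}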