Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build/pkgs/openblas: Stop openblas from using explicit make -j N; but use make -j 1 on ubuntu-trusty #36671

Merged
merged 4 commits into from
Nov 12, 2023

Conversation

mkoeppe
Copy link
Contributor

@mkoeppe mkoeppe commented Nov 6, 2023

OpenMathLib/OpenBLAS#828

Fixes part of #34899 (comment)

Tests at https://github.com/mkoeppe/sage/actions/runs/6779033300: openblas finishes successfully in https://github.com/mkoeppe/sage/actions/runs/6779033300/job/18425453802#step:11:3863

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

@mkoeppe mkoeppe added this to the sage-10.2 milestone Nov 6, 2023
@mkoeppe mkoeppe self-assigned this Nov 6, 2023
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 7, 2023

This gets rid of the excessive make -j, but doesn't solve the problem on ubuntu-trusty yet.
It hangs at

[openblas-0.3.23] ar  -ru ../../libopenblas_haswellp-r0.3.23.a strtrs_UNU_single.o strtrs_UNN_single.o strtrs_UTU_single.o strtrs_UTN_single.o strtrs_LNU_single.o strtrs_LNN_single.o strtrs_LTU_single.o strtrs_LTN_single.o strtrs_UNU_parallel.o strtrs_UNN_parallel.o strtrs_UTU_parallel.o strtrs_UTN_parallel.o strtrs_LNU_parallel.o strtrs_LNN_parallel.o strtrs_LTU_parallel.o strtrs_LTN_parallel.o dtrtrs_UNU_single.o dtrtrs_UNN_single.o dtrtrs_UTU_single.o dtrtrs_UTN_single.o dtrtrs_LNU_single.o dtrtrs_LNN_single.o dtrtrs_LTU_single.o dtrtrs_LTN_single.o dtrtrs_UNU_parallel.o dtrtrs_UNN_parallel.o dtrtrs_UTU_parallel.o dtrtrs_UTN_parallel.o dtrtrs_LNU_parallel.o dtrtrs_LNN_parallel.o dtrtrs_LTU_parallel.o dtrtrs_LTN_parallel.o ctrtrs_UNU_single.o ctrtrs_UNN_single.o ctrtrs_UTU_single.o ctrtrs_UTN_single.o ctrtrs_URU_single.o ctrtrs_URN_single.o ctrtrs_UCU_single.o ctrtrs_UCN_single.o ctrtrs_LNU_single.o ctrtrs_LNN_single.o ctrtrs_LTU_single.o ctrtrs_LTN_single.o ctrtrs_LRU_single.o ctrtrs_LRN_single.o ctrtrs_LCU_single.o ctrtrs_LCN_single.o ctrtrs_UNU_parallel.o ctrtrs_UNN_parallel.o ctrtrs_UTU_parallel.o ctrtrs_UTN_parallel.o ctrtrs_URU_parallel.o ctrtrs_URN_parallel.o ctrtrs_UCU_parallel.o ctrtrs_UCN_parallel.o ctrtrs_LNU_parallel.o ctrtrs_LNN_parallel.o ctrtrs_LTU_parallel.o ctrtrs_LTN_parallel.o ctrtrs_LRU_parallel.o ctrtrs_LRN_parallel.o ctrtrs_LCU_parallel.o ctrtrs_LCN_parallel.o ztrtrs_UNU_single.o ztrtrs_UNN_single.o ztrtrs_UTU_single.o ztrtrs_UTN_single.o ztrtrs_URU_single.o ztrtrs_URN_single.o ztrtrs_UCU_single.o ztrtrs_UCN_single.o ztrtrs_LNU_single.o ztrtrs_LNN_single.o ztrtrs_LTU_single.o ztrtrs_LTN_single.o ztrtrs_LRU_single.o ztrtrs_LRN_single.o ztrtrs_LCU_single.o ztrtrs_LCN_single.o ztrtrs_UNU_parallel.o ztrtrs_UNN_parallel.o ztrtrs_UTU_parallel.o ztrtrs_UTN_parallel.o ztrtrs_URU_parallel.o ztrtrs_URN_parallel.o ztrtrs_UCU_parallel.o ztrtrs_UCN_parallel.o ztrtrs_LNU_parallel.o ztrtrs_LNN_parallel.o ztrtrs_LTU_parallel.o ztrtrs_LTN_parallel.o ztrtrs_LRU_parallel.o ztrtrs_LRN_parallel.o ztrtrs_LCU_parallel.o ztrtrs_LCN_parallel.o
[openblas-0.3.23] ar: `u' modifier ignored since `D' is the default (see `U')
[openblas-0.3.23] make -C SRC

Entering the build container shows:

$ docker exec 9f14c71998a5 ps -ef                                                                                                                                                                      git:sage_conf_pypi_fix
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:10 ?        00:00:00 /bin/sh -c export PATH=/usr/lib/binutils-2.26/bin:$PATH &&  make SAGE_SPKG="sage-spkg -y -o" ${USE_MAKEFLAGS} ${TARGETS_PRE}
root         7     1  0 01:10 ?        00:00:00 make SAGE_SPKG=sage-spkg -y -o -k V=0 V=1 openblas
root        10     7  0 01:10 ?        00:00:00 /bin/sh -c build/bin/sage-logger \ ."cd build/make && ./install 'openblas'" logs/install.log
root        11    10  0 01:10 ?        00:00:00 bash build/bin/sage-logger cd build/make && ./install 'openblas' logs/install.log
root        15    11  0 01:10 ?        00:00:00 bash build/bin/sage-logger cd build/make && ./install 'openblas' logs/install.log
root        16    11  0 01:10 ?        00:00:00 bash build/bin/sage-logger cd build/make && ./install 'openblas' logs/install.log
root        17    15  0 01:10 ?        00:00:00 bash ./install openblas
root        18    16  0 01:10 ?        00:00:00 tee -a logs/install.log
root        19    16  0 01:10 ?        00:00:00 cat
root        51    17  0 01:10 ?        00:00:00 make -j8 openblas
root        52    51  0 01:10 ?        00:00:00 make --no-print-directory openblas-SAGE_LOCAL-no-deps
root        53    52  0 01:10 ?        00:00:00 /bin/bash -c if [ -z '/sage/local' ]; then echo "Error: The installation tree SAGE_LOCAL has been disabled" 2>&1; echo "" 2>&1; exit 1; else sage-logger -p 'SAGE_CHECK=warn PATH=/sage/src/bin:/sage/local/bin:$PATH sage-spkg -y -o -y -o  openblas-0.3.23 /sage/local' '/sage/logs/pkgs/openblas-0.3.23.log'; fi
root        54    53  0 01:10 ?        00:00:00 bash /sage/build/bin/sage-logger -p SAGE_CHECK=warn PATH=/sage/src/bin:/sage/local/bin:$PATH sage-spkg -y -o -y -o  openblas-0.3.23 /sage/local /sage/logs/pkgs/openblas-0.3.23.log
root        59    54  0 01:10 ?        00:00:00 bash /sage/build/bin/sage-logger -p SAGE_CHECK=warn PATH=/sage/src/bin:/sage/local/bin:$PATH sage-spkg -y -o -y -o  openblas-0.3.23 /sage/local /sage/logs/pkgs/openblas-0.3.23.log
root        60    54  0 01:10 ?        00:00:00 bash /sage/build/bin/sage-logger -p SAGE_CHECK=warn PATH=/sage/src/bin:/sage/local/bin:$PATH sage-spkg -y -o -y -o  openblas-0.3.23 /sage/local /sage/logs/pkgs/openblas-0.3.23.log
root        61    59  0 01:10 ?        00:00:00 bash /sage/build/bin/sage-spkg -y -o -y -o openblas-0.3.23 /sage/local
root        62    60  0 01:10 ?        00:00:00 tee -a /sage/logs/pkgs/openblas-0.3.23.log
root        63    60  0 01:10 ?        00:00:00 bash /sage/build/bin/sage-logger -p SAGE_CHECK=warn PATH=/sage/src/bin:/sage/local/bin:$PATH sage-spkg -y -o -y -o  openblas-0.3.23 /sage/local /sage/logs/pkgs/openblas-0.3.23.log
root        64    63  0 01:10 ?        00:00:05 sed --unbuffered s/^/[openblas-0.3.23] /
root     29285    61  0 01:15 ?        00:00:00 bash ./spkg-check
root     29299 29285  0 01:15 ?        00:00:00 make tests MAKE_NB_JOBS=0 USE_TLS=1
root     32834 29299  0 01:16 ?        00:00:00 make -C ./lapack-netlib lapacklib
root     32835 32834 99 01:16 ?        00:35:46 make -C SRC
root     32867     0  0 01:51 ?        00:00:00 ps -ef

Looks like we are running into the ancient bug https://savannah.gnu.org/bugs/?15919 ("Make-3.81 rc1 hangs with -j 2 but not with -j 1")

@mkoeppe mkoeppe changed the title build/pkgs/openblas: Stop openblas from using explicit 'make -j' build/pkgs/openblas: Stop openblas from using explicit 'make -j'; but use make -j1 on ubuntu-trusty Nov 7, 2023
@mkoeppe mkoeppe changed the title build/pkgs/openblas: Stop openblas from using explicit 'make -j'; but use make -j1 on ubuntu-trusty build/pkgs/openblas: Stop openblas from using explicit make -j N; but use make -j 1 on ubuntu-trusty Nov 7, 2023
@mkoeppe mkoeppe requested a review from dimpase November 7, 2023 15:58
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 7, 2023

Doesn't by itself succeed in fixing the concurrency bugs described in #34899. On fedora-38-minimal (https://github.com/mkoeppe/sage/actions/runs/6779033300/job/18425466208) I see

2023-11-07T07:38:16.8377013Z ar  -ru ../../libopenblasp-r0.3.23.a strtri_UU_single.o strtri_UN_single.o strtri_LU_single.o strtri_LN_single.o strtri_UU_parallel.o strtri_UN_parallel.o strtri_LU_parallel.o strtri_LN_parallel.o dtrtri_UU_single.o dtrtri_UN_single.o dtrtri_LU_single.o dtrtri_LN_single.o dtrtri_UU_parallel.o dtrtri_UN_parallel.o dtrtri_LU_parallel.o dtrtri_LN_parallel.o ctrtri_UU_single.o ctrtri_UN_single.o ctrtri_LU_single.o ctrtri_LN_single.o ctrtri_UU_parallel.o ctrtri_UN_parallel.o ctrtri_LU_parallel.o ctrtri_LN_parallel.o ztrtri_UU_single.o ztrtri_UN_single.o ztrtri_LU_single.o ztrtri_LN_single.o ztrtri_UU_parallel.o ztrtri_UN_parallel.o ztrtri_LU_parallel.o ztrtri_LN_parallel.o
2023-11-07T07:38:16.8377139Z make -C TESTING/MATGEN
2023-11-07T07:38:16.8377645Z gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fPIC -fno-tree-vectorize -c -o slatms.o slatms.f
2023-11-07T07:38:16.8377931Z ar: ../../libopenblasp-r0.3.23.a: error reading zlatzm.o: file truncated
2023-11-07T07:38:16.8378072Z make[5]: *** [../../Makefile.tail:46: libs] Error 1
2023-11-07T07:38:16.8378189Z make[4]: *** [Makefile:10: libs] Error 1
2023-11-07T07:38:16.8378303Z make[3]: *** [Makefile:184: libs] Error 1

Also failing on fedora-39-minimal

  [openblas-0.3.23]   /usr/bin/ld: ../libopenblasp-r0.3.23.a: member ../libopenblasp-r0.3.23.a(saxpy.o) in archive is not an object
  [openblas-0.3.23]   collect2: error: ld returned 1 exit status
  [openblas-0.3.23]   make[4]: *** [Makefile:195: ../libopenblasp-r0.3.23.so] Error 1
  [openblas-0.3.23]   make[4]: Target 'so' not remade because of errors.
  [openblas-0.3.23]   make[3]: *** [Makefile:132: shared] Error 2

Copy link

github-actions bot commented Nov 7, 2023

Documentation preview for this PR (built with commit 6d1a957; changes) is ready! 🎉

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 8, 2023

Let's get this in please.

@jhpalmieri
Copy link
Member

On OS X, this slows down the openblas build considerably: from 1.5 minutes to 8.5 minutes on a fast machine. How much should I care?

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 9, 2023

Is this with setting MAKE="make -jSOMETHING" for building Sage?

@jhpalmieri
Copy link
Member

Is this with setting MAKE="make -jSOMETHING" for building Sage?

Yes, MAKE='make -j8' and MAKEFLAGS=-j8.

# Ensure USE_TLS=1 ; see https://github.com/sagemath/sage/issues/27256
OPENBLAS_CONFIGURE="$OPENBLAS_CONFIGURE USE_TLS=1"

if ! (sdh_make libs netlib shared $OPENBLAS_CONFIGURE); then
if ! (sdh_make libs $OPENBLAS_CONFIGURE && sdh_make netlib $OPENBLAS_CONFIGURE && sdh_make shared $OPENBLAS_CONFIGURE); then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to separate the make targets like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change appears to fix the build failures mentioned in #36671 (comment)

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 9, 2023

And does -j8 saturate your fast machine during the openblas build? Without this PR, openblas was effectively building with "-j INFINITY"

@jhpalmieri
Copy link
Member

And does -j8 saturate your fast machine during the openblas build? Without this PR, openblas was effectively building with "-j INFINITY"

The machine doesn't act sluggish, but "Activity Monitor" shows a lot of CPU activity. Without this PR, I get the same behavior (high load) regardless of whether MAKE is set, and that's suspicious.

Let's merge it.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Nov 9, 2023

Thank you!

vbraun pushed a commit to vbraun/sage that referenced this pull request Nov 11, 2023
…cit `make -j N`; but use `make -j 1` on `ubuntu-trusty`

<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->
OpenMathLib/OpenBLAS#828

<!-- Why is this change required? What problem does it solve? -->
Fixes part of
sagemath#34899 (comment)

Tests at https://github.com/mkoeppe/sage/actions/runs/6779033300:
openblas finishes successfully in https://github.com/mkoeppe/sage/action
s/runs/6779033300/job/18425453802#step:11:3863
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [ ] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->

URL: sagemath#36671
Reported by: Matthias Köppe
Reviewer(s): John H. Palmieri, Matthias Köppe
@vbraun vbraun merged commit a0872be into sagemath:develop Nov 12, 2023
27 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants