Sayef/mds profiling dev #91

Draft: wants to merge 50 commits into develop
50 commits
ea335f5
Convert incorrect string datatypes
nkoukpaizan Mar 21, 2023
ec5837c
Update build environment based on amd compilers
nkoukpaizan Mar 21, 2023
14cac87
Pull some changes from 'crusher-dev' branch
nkoukpaizan Apr 10, 2023
a1550e5
Merge branch 'develop' into nicholson/develop-crusher-dev
nkoukpaizan Apr 10, 2023
a83ea98
Pull changes from more recent 'crusher-dev' branch
nkoukpaizan Apr 10, 2023
c451f3b
Update spack
nkoukpaizan Apr 10, 2023
8064a2c
Update modules for Crusher amd/5.2.0 pointing to project home directory.
nkoukpaizan Apr 10, 2023
d9e72aa
Remove 'optimized' modules on Crusher for now
nkoukpaizan Apr 10, 2023
ee29be5
Load default libfabric on Crusher
nkoukpaizan Apr 10, 2023
bf12571
Manually pull more recent changes from 'crusher-dev-suppress-print' b…
nkoukpaizan Apr 12, 2023
cf8a7d0
Missed a spot cary-mpich/8.1.17->8.1.23
nkoukpaizan Apr 12, 2023
57c4b29
Upgrade to cray-mpich/8.1.25
nkoukpaizan Apr 12, 2023
8b77b37
New dependencies after cray-mpich/8.1.25 upgrade
nkoukpaizan Apr 12, 2023
a292b9e
Do not create SOPFLOW vectors and matrices when solver is HIOP.
abhyshr Apr 29, 2023
a601a38
Updated hiop options file
May 30, 2023
132d6cb
Add scripts from incline-dev branch.
cameronrutherford Apr 17, 2023
532d0f6
Fix opflow format string to using c_str.
cameronrutherford Jun 21, 2023
9a195aa
Update SLURM args for incline ci
Jun 22, 2023
89c770f
Use srun for incline tests
Jun 24, 2023
24ee09d
Copy sepecific coinhsl in bsub script
Jun 29, 2023
1da764d
Added scenario and contingency files.
abhyshr Jul 4, 2023
b73ddc7
Update spack repo link
Jul 4, 2023
e2d94d4
Add frontier build scripts
Jul 4, 2023
73ef9b3
Merge branch 'kpp2-branch' into sayef/mds_profiling_dev
Jul 4, 2023
c650057
Add frontier buildsystem scripts [skip ci]
Jul 5, 2023
ce84277
Add inital opflow performance data [skip-ci]
Jul 13, 2023
6783f2e
New profile data [skip-ci]
Jul 13, 2023
aa0d4a2
Use srun only for gitlab ci
Jul 25, 2023
7f47bf2
Sayef sbatch scripts [skip-ci]
Jul 27, 2023
a8e487c
opflow frontier sbatch script [skip-ci]
Jul 27, 2023
d2502fc
Solving merge conflict with mds to incline_dev [skip-ci]
Jul 27, 2023
db83377
Additional incline tests with result output to json [skip-ci]
Jul 28, 2023
e12303b
Update Perf anaysis termination cases [skip-ci]
Jul 28, 2023
a9faeea
Add JSON pretty print for profiling [skip-ci]
Jul 28, 2023
4865c1f
Profile visualizer json parsing [skip-ci]
Jul 28, 2023
f78d5af
Add profiling results [skip-ci]
Jul 28, 2023
4c9a88b
Add frontier profile dumps [skip-ci]
Aug 2, 2023
b861c37
Update profile visualizer [skip-ci]
Aug 3, 2023
35cbc49
Add srun in perf_pipeline with start end condition [skip-ci]
Aug 8, 2023
f4fda6f
Add profling data, toml, and scripts [skip-ci]
Aug 10, 2023
3546d20
Adding notebooks for the proflier [skip-ci]
Aug 10, 2023
e84f83b
App path added in profiler [skip-ci]
Aug 11, 2023
3ad6086
Update readme for auto profiler [skip-ci]
Aug 11, 2023
af2d4f2
Update scrolling fix for auto profiler readme [skip-ci]
Aug 11, 2023
742b6de
Fix expgo_profiler notebook output [skip-ci]
Aug 11, 2023
3971aa0
Update profiler readme [skip-ci]
Aug 11, 2023
b79f31e
Move toml file to a directory [skip-ci]
Aug 11, 2023
1319433
Update clang-hip readme
Aug 11, 2023
512853f
Remove incline testing stuffs
Aug 11, 2023
ab14065
Revert gitlab pipeline removing incline
Aug 11, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -6,5 +6,6 @@ install/*
 *__pycache__*
 *nvblas.conf
 spack-*
 spack_*
 .vscode/
+coinhsl-*
105 changes: 105 additions & 0 deletions .gitlab/ci/crusher.gitlab-ci.yml
@@ -0,0 +1,105 @@
# Crusher Variables
stages:
  - build
  - test

workflow:
  rules:
    # Only run if we are running manually or through a schedule
    - if: $CI_PIPELINE_SOURCE == "web"
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - when: never

.crusher_build_variables:
  variables:
    WORKDIR: /gpfs/alpine/csc359/proj-shared/ci/${CI_PIPELINE_ID}

.crusher_test_variables:
  variables:
    # Don't clone for test jobs
    GIT_STRATEGY: none
    # Only for slurm tagged jobs...
    SCHEDULER_PARAMETERS: "-N 1 -A CSC359_crusher --time=60"
    WORKDIR: /gpfs/alpine/csc359/proj-shared/ci/${CI_PIPELINE_ID}

# Crusher Jobs
Crusher Build:
  stage: build
  tags: [crusher, shell]
  script:
    - mkdir -p "$WORKDIR"
    - cp -r . "$WORKDIR"
    - cd "$WORKDIR"
    - export srcdir=$WORKDIR builddir=$WORKDIR/build installdir=$WORKDIR/install
    - MY_CLUSTER=crusher ./buildsystem/build.sh --build-only --job=clang-hip
    - res=$?
    - exit $res
  extends: .crusher_build_variables

Crusher Test:
  stage: test
  dependencies: ["Crusher Build"]
  tags: [crusher, slurm]
  script:
    - cd "$WORKDIR"
    - export srcdir=$WORKDIR builddir=$WORKDIR/build installdir=$WORKDIR/install
    # Logger test failing on Crusher
    - export CTESTARGS="--output-on-failure -E Python"
    - MY_CLUSTER=crusher ./buildsystem/build.sh --test-only --job=clang-hip
    - res=$?
    - exit $res
  after_script:
    - cd "$WORKDIR/.."
    - rm -rf "$WORKDIR"
  extends: .crusher_test_variables

Crusher Python Test:
  stage: test
  dependencies: ["Crusher Build"]
  variables:
    # Don't clone for test jobs
    GIT_STRATEGY: none
  allow_failure: true
  tags: [crusher, slurm]
  script:
    - cd "$WORKDIR"
    - export srcdir=$WORKDIR builddir=$WORKDIR/build installdir=$WORKDIR/install
    - export CTESTARGS="--output-on-failure -R Python"
    - MY_CLUSTER=crusher ./buildsystem/build.sh --test-only --job=clang-hip
    - res=$?
    - exit $res
  extends: .crusher_test_variables
# ---

# Reporting Crusher Status to PNNL
.report-status:
  variables:
    GIT_STRATEGY: none
  tags: [crusher, shell]
  script:
    # For complete details on the GitLab API please see:
    # https://docs.gitlab.com/ee/api/commits.html#post-the-build-status-to-a-commit
    # Make sure to create the token with Developer level access and API scope
    - curl -X POST -H @${GITLAB_CURL_HEADERS} https://gitlab.pnnl.gov/api/v4/projects/251/statuses/${CI_COMMIT_SHA}?state=${CI_JOB_NAME}\&name=Crusher\&target_url=${CI_PIPELINE_URL}
  environment:
    name: reporting-gitlab

pending:
  extends: .report-status
  stage: .pre

# Post running status to show builds at least passed, tests are running
running:
  extends: .report-status
  stage: test

success:
  extends: .report-status
  stage: .post

failed:
  stage: .post
  extends: .report-status
  rules:
    - when: on_failure
# ---
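The `.report-status` template above reuses each job's name (`pending`, `running`, `success`, `failed`) as the `state` query parameter of a GitLab commit-status request. The bash sketch below shows how that URL is put together; `status_url` is a hypothetical helper that is not part of this PR, and it deliberately accepts only the four job names this pipeline defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): assemble the commit-status URL
# that the .report-status jobs POST to. The job name doubles as the state.
status_url() {
  local api=$1 project=$2 sha=$3 state=$4 name=$5 target=$6
  # Only the four job names defined in this pipeline are accepted here.
  case "$state" in
    pending|running|success|failed) ;;
    *) echo "unexpected state: $state" >&2; return 1 ;;
  esac
  printf '%s/projects/%s/statuses/%s?state=%s&name=%s&target_url=%s\n' \
    "$api" "$project" "$sha" "$state" "$name" "$target"
}

# Usage inside a job (quoting the whole URL makes the \& escapes unnecessary):
# curl -X POST -H @"${GITLAB_CURL_HEADERS}" \
#   "$(status_url https://gitlab.pnnl.gov/api/v4 251 "$CI_COMMIT_SHA" \
#       "$CI_JOB_NAME" Crusher "$CI_PIPELINE_URL")"
```

Quoting the URL keeps the shell from treating `&` as the background operator, which is what the `\&` escapes in the original `curl` line guard against.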
2 changes: 1 addition & 1 deletion .gitmodules
@@ -9,4 +9,4 @@
 url = https://github.com/pybind/pybind11.git
 [submodule "tpl/spack"]
 path = tpl/spack
-url = https://github.com/CameronRutherford/spack.git
+url = https://github.com/spack/spack.git
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -99,7 +99,7 @@ set(EXAGO_CTEST_LAUNCH_COMMAND
 CACHE STRING "Command to use when launching tests"
 )
 
-option(EXAGO_ENABLE_LOGGING "Enable internal logging" OFF)
+option(EXAGO_ENABLE_LOGGING "Enable internal logging" ON)
 
 # When building with HIP support, OpenMP is not supported.
 if(EXAGO_ENABLE_RAJA AND EXAGO_ENABLE_HIP)
1 change: 1 addition & 0 deletions applications/sopflow_main.cpp
@@ -120,6 +120,7 @@ int main(int argc, char **argv) {
 ierr = SOPFLOWDestroy(&sopflow);
 CHKERRQ(ierr);
 
+MPI_Barrier(PETSC_COMM_WORLD);
 ExaGOFinalize();
 // PetscFinalize();
 return 0;
2 changes: 2 additions & 0 deletions buildsystem/README.md
@@ -18,6 +18,8 @@ Each folder which builds a configuration of ExaGO should have a following:
 Platforms:
 
 - Crusher
+- Frontier
+- Incline
 
 Description:
 
33 changes: 27 additions & 6 deletions buildsystem/build.sh
@@ -197,29 +197,50 @@ then
fi

# Correctly identify clusters based on hostname
# Valid cluster suggestions are spack based,
# however you can have any job with a valid:
# buildsystem/<job-name>/<platform>Variables.sh
case $MY_CLUSTER in
newell*)
export MY_CLUSTER=newell
;;
incline*|dmi*)
export MY_CLUSTER=incline
;;
dl*|deception|*fat*)
export MY_CLUSTER=deception
;;
crusher*)
export MY_CLUSTER=crusher
;;
ascent*)
export MY_CLUSTER=ascent
;;
summit*)
export MY_CLUSTER=summit
;;
frontier*)
export MY_CLUSTER=frontier
;;
*)
echo "Cluster $MY_CLUSTER not identified - you'll have to set relevant variables manually."
echo "${MY_CLUSTER} did not match any directories in /buildsystem/spack/"
echo "Try one of the following platforms: "
echo $(ls -d ./buildsystem/spack/*/ | tr '\n' '\0' | xargs -0 -n 1 basename )
exit 1
;;
esac

ulimit -s unlimited || echo 'Could not set stack size to unlimited.'
ulimit -l unlimited || echo 'Could not set max locked memory to unlimited.'

. /etc/profile.d/modules.sh
module purge

varfile="$SRCDIR/buildsystem/$JOB/$(echo $MY_CLUSTER)Variables.sh"

if [[ -f "$varfile" ]]; then
source "$varfile"
echo Sourced system-specific variables for $MY_CLUSTER
# source varfile without stderr or stout if it exists, error if failure
set -xv
source $varfile || { echo "Could not source $varfile"; exit 1; }
# source $varfile 2>/dev/null || { echo "Could not source $varfile"; exit 1; }
set +xv
fi

# module list
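The `case` statement in `build.sh` above normalizes a hostname-style `MY_CLUSTER` value to a canonical cluster name, and the script then looks for `buildsystem/<job>/<cluster>Variables.sh`. The sketch below reproduces that lookup so it can be checked in isolation. `normalize_cluster` and `varfile_for` are hypothetical helpers, and treating `$JOB` as the `--job` value (for example `clang-hip`) is an assumption; unlike `build.sh`, the sketch returns 1 on an unknown cluster instead of printing suggestions and exiting.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the case statement in build.sh:
# map a hostname-style MY_CLUSTER value to a canonical cluster name.
# Patterns are tried in order, so the first match wins (dmi* -> incline).
normalize_cluster() {
  case "$1" in
    newell*) echo newell ;;
    incline*|dmi*) echo incline ;;
    dl*|deception|*fat*) echo deception ;;
    crusher*) echo crusher ;;
    ascent*) echo ascent ;;
    summit*) echo summit ;;
    frontier*) echo frontier ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: compose the variables-file path the same way
# build.sh does, i.e. $SRCDIR/buildsystem/<job>/<cluster>Variables.sh
varfile_for() {
  local srcdir=$1 job=$2 cluster
  cluster=$(normalize_cluster "$3") || return 1
  echo "${srcdir}/buildsystem/${job}/${cluster}Variables.sh"
}
```

For example, `varfile_for /src clang-hip frontier04` yields `/src/buildsystem/clang-hip/frontierVariables.sh`, which is the path of the `frontierVariables.sh` file added in this PR.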
20 changes: 13 additions & 7 deletions buildsystem/clang-hip/cache.cmake
@@ -1,4 +1,4 @@
-message(STATUS "Loading CMake cache for a GCC+CUDA+MPI build")
+message(STATUS "Loading CMake cache for a GCC+HIP+MPI build")
 
 set(prefix ${CMAKE_SOURCE_DIR}/install)
 message(STATUS "Setting initial installation prefix to ${prefix}")
@@ -7,14 +7,14 @@ set(CMAKE_INSTALL_PREFIX
 CACHE PATH ""
 )
 
-set(arch 60)
-message(STATUS "Using initial cuda architecture of ${arch}")
-set(CMAKE_CUDA_ARCHITECTURES
+set(arch gfx90a)
+message(STATUS "Using initial AMD GPU target architecture of ${arch}")
+set(AMDGPU_TARGETS
 ${arch}
 CACHE STRING ""
 )
 
-message(STATUS "Building static libraries only since HiOp is static.")
+message(STATUS "Just building static libraries for now.")
 set(EXAGO_BUILD_SHARED
 OFF
 CACHE BOOL ""
@@ -30,7 +30,7 @@ set(CMAKE_BUILD_TYPE
 CACHE STRING ""
 )
 
-message(STATUS "Enabling GPU, HiOp, MPI, PETSC, and RAJA")
+message(STATUS "Enabling GPU (HIP), HiOp, MPI, PETSC, and RAJA")
 set(EXAGO_ENABLE_GPU
 ON
 CACHE BOOL ""
@@ -72,10 +72,16 @@ set(EXAGO_RUN_TESTS
 CACHE BOOL ""
 )
 
-message(STATUS "Enabling Python when building without Ipopt")
+message(STATUS "Disabling Python when building with Ipopt on Crusher")
 set(EXAGO_ENABLE_PYTHON
 OFF
 CACHE BOOL ""
 )
 
+message(STATUS "Enabling Logging")
+set(EXAGO_ENABLE_LOGGING
+ON
+CACHE BOOL ""
+)
+
 message(STATUS "Done setting initial CMake cache")
18 changes: 12 additions & 6 deletions buildsystem/clang-hip/crusher/base.sh
@@ -3,15 +3,21 @@
export MY_CLUSTER=crusher
export PROJ_DIR=/autofs/nccs-svm1_proj/csc359

module purge
module reset

# System modules
module load rocm/5.2.0
module load libfabric/1.15.0.0
module load PrgEnv-amd
module load craype-x86-trento
module load craype-accel-amd-gfx90a
module load amd/5.2.0
module load cray-mpich/8.1.25
module load libfabric

# Consider changing to $(which clang) as for deception
export CC=/opt/rocm-5.2.0/llvm/bin/clang
export CXX=/opt/rocm-5.2.0/llvm/bin/clang++
export FC=/opt/rocm-5.2.0/llvm/bin/flang
export CC=/opt/rocm-5.2.0/llvm/bin/amdclang
export CXX=/opt/rocm-5.2.0/llvm/bin/amdclang++
export FC=/opt/rocm-5.2.0/llvm/bin/amdflang

export EXTRA_CMAKE_ARGS="$EXTRA_CMAKE_ARGS -DEXAGO_CTEST_LAUNCH_COMMAND='srun'"
export EXTRA_CMAKE_ARGS="$EXTRA_CMAKE_ARGS -DAMDGPU_TARGETS='gfx90a'"

21 changes: 21 additions & 0 deletions buildsystem/clang-hip/frontier/base.sh
@@ -0,0 +1,21 @@
#!/bin/bash

export MY_CLUSTER=frontier
export PROJ_DIR=/autofs/nccs-svm1_proj/csc359

module reset

# System modules
module load PrgEnv-amd
module load craype-x86-trento
module load craype-accel-amd-gfx90a
module load amd/5.2.0
module load cray-mpich/8.1.25
module load libfabric

# Consider changing to $(which clang) as for deception
export CC=/opt/rocm-5.2.0/llvm/bin/amdclang
export CXX=/opt/rocm-5.2.0/llvm/bin/amdclang++
export FC=/opt/rocm-5.2.0/llvm/bin/amdflang

export EXTRA_CMAKE_ARGS="$EXTRA_CMAKE_ARGS -DEXAGO_CTEST_LAUNCH_COMMAND='srun'"
10 changes: 10 additions & 0 deletions buildsystem/clang-hip/frontier/frontierExago.sh
@@ -0,0 +1,10 @@
#!/bin/bash

export SRCDIR=${SRCDIR:-$PWD}

# Platform specific configuration
source $SRCDIR/buildsystem/clang-hip/frontier/base.sh

# Spack modules
source $SRCDIR/buildsystem/spack/frontier/modules/exago.sh

10 changes: 10 additions & 0 deletions buildsystem/clang-hip/frontier/frontierOptimizedExago.sh
@@ -0,0 +1,10 @@
#!/bin/bash

export SRCDIR=${SRCDIR:-$PWD}

# Platform specific configuration
source $SRCDIR/buildsystem/clang-hip/frontier/base.sh

# Spack modules
source $SRCDIR/buildsystem/spack/frontier/modules/exago-optimized.sh

10 changes: 10 additions & 0 deletions buildsystem/clang-hip/frontier/frontierOptimizedVariables.sh
@@ -0,0 +1,10 @@
#!/bin/bash

export SRCDIR=${SRCDIR:-$PWD}

# Platform specific configuration
source $SRCDIR/buildsystem/clang-hip/frontier/base.sh

# Spack modules
source $SRCDIR/buildsystem/spack/frontier/modules/optimized-dependencies.sh

11 changes: 11 additions & 0 deletions buildsystem/clang-hip/frontierVariables.sh
@@ -0,0 +1,11 @@
#!/bin/bash

export SRCDIR=${SRCDIR:-$PWD}

# Platform specific configuration
source $SRCDIR/buildsystem/clang-hip/frontier/base.sh

# Spack modules
source $SRCDIR/buildsystem/spack/frontier/modules/dependencies.sh
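Each of the Frontier entry scripts above sources `frontier/base.sh` and then exactly one Spack module file, and the script's name determines which one. The incline scripts in this PR follow the same pattern. The sketch below encodes that naming convention; `spack_modules_for` is a hypothetical helper, and the mapping is inferred only from the scripts shown in this diff, not from any documented rule in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a per-platform entry script such as
# frontierOptimizedExago.sh, print the Spack module file it sources after
# <platform>/base.sh. Mapping inferred from the scripts in this PR:
#   <platform>Variables.sh          -> dependencies.sh
#   <platform>Exago.sh              -> exago.sh
#   <platform>OptimizedVariables.sh -> optimized-dependencies.sh
#   <platform>OptimizedExago.sh     -> exago-optimized.sh
spack_modules_for() {
  local script platform variant
  script=$(basename "$1" .sh)
  # The Optimized* patterns must be tried before the plain suffixes.
  case "$script" in
    *OptimizedVariables) variant=optimized-dependencies; platform=${script%OptimizedVariables} ;;
    *OptimizedExago)     variant=exago-optimized;        platform=${script%OptimizedExago} ;;
    *Variables)          variant=dependencies;           platform=${script%Variables} ;;
    *Exago)              variant=exago;                  platform=${script%Exago} ;;
    *) return 1 ;;
  esac
  echo "buildsystem/spack/${platform}/modules/${variant}.sh"
}
```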


28 changes: 28 additions & 0 deletions buildsystem/clang-hip/incline/base.sh
@@ -0,0 +1,28 @@
#!/bin/bash

. /etc/profile.d/modules.sh

module purge

# MPI module is finnicky on incline
modules=$(module list 2>&1)
if echo $modules | grep -q 'openmpi'; then
module load gcc/8.4.0
module rm openmpi
fi

# System modules
module load gcc/8.4.0
module load openmpi/4.1.4
module load rocm/5.3.0
module load cmake/3.21.4
# Must load python module!
module load python/3.7.0

# Consider changing to $(which clang) as for deception
export CC=$(which clang)
export CXX=$(which clang++)
export FC=$(which gfortran)

export EXTRA_CMAKE_ARGS="$EXTRA_CMAKE_ARGS -DAMDGPU_TARGETS='gfx908'"

9 changes: 9 additions & 0 deletions buildsystem/clang-hip/incline/inclineExago.sh
@@ -0,0 +1,9 @@
#!/bin/bash

export SRCDIR=${SRCDIR:-$PWD}

# Platform specific configuration
source $SRCDIR/buildsystem/clang-hip/incline/base.sh

# Spack modules
source $SRCDIR/buildsystem/spack/incline/modules/exago.sh
9 changes: 9 additions & 0 deletions buildsystem/clang-hip/incline/inclineOptimizedExago.sh
@@ -0,0 +1,9 @@
#!/bin/bash

export SRCDIR=${SRCDIR:-$PWD}

# Platform specific configuration
source $SRCDIR/buildsystem/clang-hip/incline/base.sh

# Spack modules
source $SRCDIR/buildsystem/spack/incline/modules/exago-optimized.sh