diff --git a/assignments/shell-slurm-practice.md b/assignments/shell-slurm-practice.md
index 54e5b8a..460f0ea 100644
--- a/assignments/shell-slurm-practice.md
+++ b/assignments/shell-slurm-practice.md
@@ -24,6 +24,8 @@ The second script, `analyze.sh`, will be a bash script that gets submitted as a
 The third script, `submit.sh`, will be run on the login nodes (with `bash`, not `sbatch`) and submit a job array of size 10 that runs `energize.sh`, once per array task. It will then parse the job id from running sbatch (search the `sbatch` man page for "parsable") and submit a second job that is dependent on the first completing successfully; if the first job fails, the second job should not run. The second job will run `analyze.sh`. You will need to pass the job id from the first job to the `analyze.sh` either as a command line argument or an environment variable.
 
+**Make sure not to run `submit.sh` from a node-local directory like `/tmp` or `/dev/shm`**--if you do so, no output file will ever show up: the job will write it to the compute node's local storage rather than to storage that is visible from the node you submitted from.
+
 Put these scripts into a tar file. Compress the tar file using gzip. Submit this `.tar.gz` file through [Canvas](https://byu.instructure.com/courses/21221/assignments).
 
 ## Grading
diff --git a/readings/jthread.md b/readings/jthread.md
new file mode 100644
index 0000000..433bcbb
--- /dev/null
+++ b/readings/jthread.md
@@ -0,0 +1,155 @@
+# C++ Threading
+
+[`std::jthread`](https://en.cppreference.com/w/cpp/thread/jthread)s are C++20's flagship mechanism for [threading](threading.md). Unlike [OpenMP](openmp.md), adding them to a program means that the program is squarely parallel and must be compiled as such.
+
+
+
+## Compiling with C++ Threads
+
+`g++ -pthread ...` will compile with C++ threads; it's usually easy to find the required flag on the man page or the internet for any compiler.
+`CMakeLists.txt` requires two things: finding the threading package and linking it to a target:
+
+```cmake
+# At the top, under the `project` call:
+find_package(Threads REQUIRED)
+# With other compilation calls
+add_executable(blah blah.cpp)
+target_link_libraries(blah PRIVATE Threads::Threads)
+```
+
+
+
+## Using C++ Threads
+
+`std::jthread`s are simple in principle--they spawn, run a function while the main program is executing, then [join](https://en.cppreference.com/w/cpp/thread/jthread/join) automatically. For example, the following program prints Arabic numerals in the main thread, Latin in the other:
+
+```c++
+#include <chrono>
+#include <iostream>
+#include <thread>
+
+int main() {
+    // Latin numerals print in a spawned thread
+    auto latin_numeral_thread = std::jthread([]{ // using lambdas to spawn threads is the convention
+        for (const auto &numeral: {"I", "II", "III", "IV", "V"}) {
+            std::this_thread::sleep_for(std::chrono::milliseconds(50));
+            std::cout << numeral << std::endl;
+        }
+    });
+    // Arabic numerals print in the main thread
+    for (int i=1; i<=5; i++) {
+        std::this_thread::sleep_for(std::chrono::milliseconds(45));
+        std::cout << i << " is written as ";
+    }
+    return 0; // latin_numeral_thread is automatically joined
+}
+
+/* Compile with `g++ -std=c++20 -pthread -o numerals numerals.cpp`
+ * Output looks like:
+ * 1 is written as I
+ * 2 is written as II
+ * 3 is written as III
+ * 4 is written as IV
+ * 5 is written as V
+ */
+```
+
+This is simple enough when no coordination is required, or when clock-based coordination is sufficient as above, but getting threads to synchronize or communicate correctly is challenging.
+Even getting two threads to interleave the printing of "PING" and "PONG" requires 3 variables dedicated solely to coordination:
+
+```c++
+#include <condition_variable>
+#include <iostream>
+#include <mutex>
+#include <thread>
+
+int main() {
+    // Simulation parameters
+    const int n_volleys = 4;
+    bool pinging = true; // start with PING
+    std::mutex mtx;
+    std::condition_variable cv;
+    // PONG in worker thread
+    auto pong_thread = std::jthread([&]{
+        for (int i=0; i<n_volleys; i++) {
+            std::unique_lock lock(mtx);
+            cv.wait(lock, [&]{ return !pinging; }); // sleep until it's PONG's turn
+            std::cout << "PONG" << std::endl;
+            pinging = true;
+            cv.notify_one(); // wake the PING thread
+        }
+    });
+    // PING in the main thread
+    for (int i=0; i<n_volleys; i++) {
+        std::unique_lock lock(mtx);
+        cv.wait(lock, [&]{ return pinging; }); // sleep until it's PING's turn
+        std::cout << "PING" << std::endl;
+        pinging = false;
+        cv.notify_one(); // wake the PONG thread
+    }
+    return 0; // pong_thread is automatically joined
+}
+
+/* Compile with `g++ -std=c++20 -pthread -o pingpong pingpong.cpp`
+ * Output alternates "PING" and "PONG", one per line, for n_volleys rounds
+ */
+```
+
+### C++ Atomics
+
+[`std::atomic`](https://en.cppreference.com/w/cpp/atomic/atomic) wraps a variable so that reads and writes to it happen [atomically](threading.md#atomics). For example, making the shared counter from the [OpenMP](openmp.md) race condition example atomic fixes the race (compile with `-fopenmp` since it still uses the OpenMP pragma):
+
+```c++
+#include <atomic>
+#include <iostream>
+
+int main() {
+    std::atomic<size_t> counter = 0;
+    #pragma omp parallel for
+    for (size_t i = 0; i < 20000000; ++i) {
+        counter += 1;
+    }
+    std::cout << counter << std::endl;
+    return 0;
+}
+```
+
+### C++ Mutexes
+
+[`std::mutex`](https://en.cppreference.com/w/cpp/thread/mutex) and [`std::counting_semaphore`](https://en.cppreference.com/w/cpp/thread/counting_semaphore) provide [mutexes and semaphores](threading.md#mutexes-and-semaphores) in C++. It's best to [use](#using-c-threads) [`std::unique_lock`](https://en.cppreference.com/w/cpp/thread/unique_lock) and [`std::lock_guard`](https://en.cppreference.com/w/cpp/thread/lock_guard) rather than locking and unlocking mutexes manually.
+
+### C++ Condition Variables
+
+[Condition variables](https://www.cplusplus.com/reference/condition_variable/condition_variable/) encapsulate the idea of waiting until some condition is true. They have three main methods:
+
+- `wait`
+- `notify_one`
+- `notify_all`
+
+The example below shows a simple thread-safe queue with `push` and `pop` methods. Safety is ensured by the `condition_variable` and `mutex`.
+**Take a bit of time to make sure you understand how this works and why it's safe**:
+
+```c++
+#include <condition_variable>
+#include <mutex>
+#include <queue>
+
+std::mutex mtx;
+std::condition_variable cv;
+
+template <typename T>
+T pop(std::queue<T> &queue) {
+    std::unique_lock lock(mtx);
+    while (queue.empty()) {
+        cv.wait(lock); // releases the lock while sleeping, reacquires on wake
+    }
+    T val = queue.front();
+    queue.pop(); // std::queue::pop() returns void, so read front() first
+    return val;
+}
+
+template <typename T>
+void push(std::queue<T> &queue, T val) {
+    std::unique_lock lock(mtx);
+    queue.push(val);
+    cv.notify_one(); // wake one thread waiting in pop
+}
+```
+
+### C++ Barriers
+
+Use [`std::barrier`](https://en.cppreference.com/w/cpp/thread/barrier) for [barriers](threading.md#barriers) in C++20.
diff --git a/readings/openmp.md b/readings/openmp.md
index fcdd318..ea1331b 100644
--- a/readings/openmp.md
+++ b/readings/openmp.md
@@ -73,8 +73,15 @@ OpenMP comes with these built-in reduction operators in C and C++:
 
 The race condition in the [first example](#openmp-threading), which results from multiple threads trying to simultaneously modify `counter`, can be fixed with a sum reduction:
 
 ```c++
-#pragma omp parallel for reduction(+:counter)
-for (long i{0}; i < 20'000'000l; ++i) {
-    counter += 1;
+#include <iostream>
+
+int main() {
+    size_t counter = 0;
+    #pragma omp parallel for reduction(+:counter)
+    for (size_t i = 0; i < 20000000; ++i) {
+        counter += 1;
+    }
+    std::cout << counter << std::endl;
+    return 0;
+}
+```