
multi-threaded model initialization #737

Status: Open. Wants to merge 1 commit into master.

Conversation
@ngc92 (Contributor) commented Aug 12, 2024

A simple demonstration of initializing the model from multiple threads.
The parallel part is factored out into a separate function; this makes it less likely that we accidentally mutate shared state by capturing variables from the outer scope.

@@ -504,6 +504,47 @@ void gpt2_build_from_checkpoint(GPT2 *model, const char* checkpoint_path, bool w
cudaCheck(cudaDeviceSynchronize());
}

void gpt2_init_layer(GPT2 *model, int l, mt19937_state* rng, floatX* params) {
int offset = 0;
Review comment:
This should be size_t for larger model sizes.

@ademeure (Contributor) commented:

Looks good to me, it improves startup time for -e "d72" from ~100s to ~15s on a 1xH100 node with 26 CPU cores! :)
