Train mem usage and other improvements #2439

Merged (104 commits, Aug 28, 2023)

Commits
5d124d0
fix track_max_mem in forward_batch_wo_cache_flash_attn_train
xaedes Jun 15, 2023
d39c8e6
remove unnecessary Adam(W) optimizer tensors.
xaedes Jun 15, 2023
d395b19
add gradient clipping to AdamW
xaedes Jun 15, 2023
d7003a9
Fix reset of unused g->nodes and g->grads to NULL
xaedes Jun 17, 2023
6e3f95b
implement gradient checkpointing for training
xaedes Jul 28, 2023
e05e441
remove unused compute buffer 3
xaedes Jun 27, 2023
ed4319e
add and use function ggml_build_backward_expand to avoid stack overfl…
xaedes Jul 28, 2023
a80f184
change AdamW decay parameter to work like the torch AdamW decay param…
xaedes Jun 29, 2023
f175ead
change default AdamW weight decay parameter used in training to 0.1 a…
xaedes Jun 29, 2023
97964a4
change default AdamW weight decay parameter defined in ggml to 0.0, m…
xaedes Jun 29, 2023
2c6985f
bug fixes for cross entropy loss
xaedes Jul 2, 2023
2d1e6e0
fix test-grad0 for cross_entropy_loss
xaedes Jul 2, 2023
864e7e3
fix test-grad0 for soft_max
xaedes Jul 2, 2023
87febee
improve finite differences of test-grad0 by using double instead of f…
xaedes Jul 2, 2023
51dc770
change cross_entropy_loss to output average over all rows
xaedes Jul 2, 2023
3744a9b
improve gradient checkpointing
xaedes Jul 2, 2023
fc379a2
disable gradient checkpointing debug output
xaedes Jul 2, 2023
d0fbb7d
llama : fix rope usage in train-text-from-scratch after ChatGLM change
xaedes Jul 28, 2023
c6a18e1
add more training parameters:
xaedes Jul 2, 2023
ce937bc
replace memcpy with reshape operation so that the graph is not cut at…
xaedes Jul 2, 2023
ff759d9
remove unused function argument from get_example_targets_batch
xaedes Jul 2, 2023
e843d6e
measure and print total training time
xaedes Jul 2, 2023
bfc3119
add optimization callback to ggml_opt_resume_g
xaedes Jul 2, 2023
d7aa4d9
use optimization callback in training
xaedes Jul 2, 2023
e6ff072
add minimum number of tensor dimensions to apply weight decay (defaul…
xaedes Jul 2, 2023
58024d3
rename training parameter cos-decay-alpha to cos-decay-min and clarif…
xaedes Jul 3, 2023
17a0898
fix increase of model.train_samples and model.train_tokens
xaedes Jul 3, 2023
24a4b09
change sampling parameters for prediction after training to defaults …
xaedes Jul 3, 2023
1065c3b
tighten abs error bounds for cross_entropy_loss in test-grad0
xaedes Jul 3, 2023
dbbc263
add conditional compilation of using F16 exp in flash attention
xaedes Jul 3, 2023
47055c9
tighten abs error bounds for flash_attn in test-grad0
xaedes Jul 3, 2023
0f6a8ab
tighten abs error bounds for sqrt in test-grad0
xaedes Jul 3, 2023
87035b9
remove out-commented vectorized code of opt_adam
xaedes Jul 3, 2023
ecdc161
ggml : update ggml_rms_norm_back with configurable eps
xaedes Jul 28, 2023
c1a5e11
llama training : fix ggml_rms_norm_back calls to pass configurable eps
xaedes Jul 28, 2023
22cb368
remove trailing whitespace
xaedes Jul 28, 2023
d43af4b
Merge branch 'master' into pr-train-mem-usage-improvements
xaedes Aug 6, 2023
2bf422e
add train function using automatic gradient checkpointing backward pa…
xaedes Aug 6, 2023
fc826c8
in train function replace add_inplace by regular add
xaedes Aug 14, 2023
d437415
don't use allocate hash_map on context
xaedes Aug 14, 2023
cfddc36
correctly clone reshape and permute operations by also cloning tensor…
xaedes Aug 14, 2023
0dd496c
fix variable name and add missing type cast
xaedes Aug 14, 2023
52c92c0
terminate recursive tensor cloning when reaching tensor without src t…
xaedes Aug 14, 2023
345f516
correctly clone view tensors by setting data pointers
xaedes Aug 14, 2023
5a11b75
fix variable names
xaedes Aug 14, 2023
b2f1310
swap arguments to commutative ops to be the same as in `forward_batch…
xaedes Aug 14, 2023
5884b43
add input tensors as checkpoints
xaedes Aug 14, 2023
9716eb8
fix variable name and add missing boolean negation
xaedes Aug 14, 2023
38f4438
make sure some tensors are not reallocated by inserting new temporary…
xaedes Aug 14, 2023
d6c5b03
fix ASSERT to work with zero layers
xaedes Aug 14, 2023
4ed096c
add training options whether to use allocator and/or unified training…
xaedes Aug 14, 2023
865c4cd
integrate unified training function which may use memory allocator
xaedes Aug 14, 2023
3e99a8d
format name of cloned tensors with " (clone)" suffix
xaedes Aug 14, 2023
75baed2
set names for tensors in unified train function for easier debugging
xaedes Aug 14, 2023
fe788a1
allocate graph on context using ggml_new_graph
xaedes Aug 14, 2023
c954f41
remove handwritten training functions
xaedes Aug 14, 2023
271e4d6
remove unused training parameters "use_scratch" and "use_unified"
xaedes Aug 14, 2023
6f161c7
remove trailing whitespace
xaedes Aug 14, 2023
3794dce
remove unused train params: mem_compute1_gb & mem_compute2_gb
xaedes Aug 14, 2023
6e280b2
remove unused forward_batch function
xaedes Aug 14, 2023
faf3e21
add debug asserts in ggml_allocr_alloc to some common pitfalls when u…
xaedes Aug 14, 2023
098654c
only use ggml_allocr_alloc when tensor has NULL data and is no view
xaedes Aug 14, 2023
3e6468b
fix test when to create temporary backward graph
xaedes Aug 14, 2023
5622846
fix memory "leak" in optimizers
xaedes Aug 14, 2023
3b5515b
reverse order of for loop in ggml_build_backward_expand to save memor…
xaedes Aug 14, 2023
0c52c65
Merge branch 'master' into pr-train-mem-usage-improvements
xaedes Aug 24, 2023
4072f20
add missing lctx argument to get_example_targets_batch
xaedes Aug 24, 2023
f51c5d7
implement llama model file saving using gguf
xaedes Aug 24, 2023
5407981
implement loading/saving of checkpointing files using GGUF
xaedes Aug 24, 2023
6a20f7a
bug fixes
xaedes Aug 25, 2023
167dd2d
add checkpoint file version for future compatibility
xaedes Aug 26, 2023
2978e03
update readme with gguf filenames
xaedes Aug 26, 2023
0c494cc
save & load opt->just_initialized value
xaedes Aug 27, 2023
3a91c97
add first draft for checkpoint conversion script
xaedes Aug 27, 2023
a6f3a47
Merge branch 'master' into pr-train-mem-usage-improvements
xaedes Aug 27, 2023
cb42324
add gguf arch and ftype
xaedes Aug 27, 2023
495a62a
save opt parameter counter as uint64
xaedes Aug 27, 2023
ef899fb
add gguf key and tensor names for optimizer and training
xaedes Aug 27, 2023
d71069c
add layer_norm_rms_eps to checkpoint convert script
xaedes Aug 27, 2023
91a4cca
use same GGUF_GET_KEY macro as in llama.cpp
xaedes Aug 27, 2023
0b2c85b
use norm_rms_eps, and rope parameters and command line options to set…
xaedes Aug 27, 2023
ca5b344
fix memory corruption bug in gguf
xaedes Aug 27, 2023
5d94997
add gguf example cmake file
xaedes Aug 27, 2023
76d2794
bug fixes in tokenize_file
xaedes Aug 27, 2023
4882ff0
bug fixes in load_llama_model_gguf
xaedes Aug 27, 2023
152cfaa
bug fix: init model when no checkpoint was loaded
xaedes Aug 27, 2023
1f83343
bug fix in read_tensor_by_name
xaedes Aug 28, 2023
3d8d884
bug fix in load_opt_context_gguf
xaedes Aug 28, 2023
e86b3e3
avoid printing lots of spaced on the unusual case that loss gets nan
xaedes Aug 28, 2023
daa0b6c
set name of tensors with empty name from what was read from gguf
xaedes Aug 28, 2023
f97f92b
remove trailing whitespace
xaedes Aug 28, 2023
c690c20
print data checksums before saving and after loading to verify correc…
xaedes Aug 28, 2023
5f27ade
bug fixes for convert-train-checkpoint-to-gguf
xaedes Aug 28, 2023
e8df9e6
temporarily add code to write old checkpoint files
xaedes Aug 28, 2023
31c093c
bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints…
xaedes Aug 28, 2023
63bf200
remove code used to verify correctness of checkpoint file conversion
xaedes Aug 28, 2023
3155019
remove trailing whitespace
xaedes Aug 28, 2023
3e7dfd0
remove prediction related code
xaedes Aug 28, 2023
17ab46d
update train-text-from-scratch README.md
xaedes Aug 28, 2023
12c4e5b
Merge branch 'master' into pr-train-mem-usage-improvements
xaedes Aug 28, 2023
a925e93
fix non-windows GGML_ALIGNED_REALLOC
xaedes Aug 28, 2023
440d221
add missing blank line at end of file
xaedes Aug 28, 2023
f6828cb
remove GGML_ALIGNED_REALLOC and use normal malloc/realloc/free for gg…
xaedes Aug 28, 2023
93535a4
train : fix compile warnings
ggerganov Aug 28, 2023
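
Most of the memory savings these commits deliver come from gradient checkpointing (e.g. 6e3f95b, 3744a9b, 2bf422e): the forward pass keeps only a small set of checkpointed activations, and the backward pass recomputes the activations between checkpoints on demand, trading extra compute for a much smaller activation footprint. The sketch below is a conceptual illustration only, written in plain C++ with a toy scalar "layer"; the helper names are invented for the example and it is not the ggml graph machinery the PR actually uses.

```cpp
// Conceptual sketch of gradient checkpointing (not the ggml implementation).
// Forward keeps one activation per segment boundary; backward recomputes the
// activations inside each segment from its checkpoint before backpropagating.
#include <cstdio>
#include <vector>

// Toy "layer": y = x * x, so dL/dx = 2 * x * dL/dy.
static double layer_forward(double x)             { return x * x; }
static double layer_backward(double x, double dy) { return 2.0 * x * dy; }

// Run one segment of k layers, returning only its final activation.
static double segment_forward(double x, int k) {
    for (int i = 0; i < k; ++i) x = layer_forward(x);
    return x;
}

// Backward through one segment: recompute its k intermediate activations from
// the checkpoint, then backpropagate through them in reverse order.
static double segment_backward(double checkpoint, int k, double dy) {
    std::vector<double> acts(k + 1);
    acts[0] = checkpoint;
    for (int i = 0; i < k; ++i) acts[i + 1] = layer_forward(acts[i]);
    for (int i = k - 1; i >= 0; --i) dy = layer_backward(acts[i], dy);
    return dy;
}

int main() {
    const int    n_segments     = 4;
    const int    layers_per_seg = 4;
    const double x0             = 1.01;

    // Forward pass: store only the segment boundaries, not every activation.
    std::vector<double> checkpoints(n_segments + 1);
    checkpoints[0] = x0;
    for (int s = 0; s < n_segments; ++s) {
        checkpoints[s + 1] = segment_forward(checkpoints[s], layers_per_seg);
    }

    // Backward pass: recompute each segment from its checkpoint as needed.
    double grad = 1.0; // dL/d(output) for L = output
    for (int s = n_segments - 1; s >= 0; --s) {
        grad = segment_backward(checkpoints[s], layers_per_seg, grad);
    }

    std::printf("output = %g, d(output)/dx0 = %g\n", checkpoints[n_segments], grad);
    return 0;
}
```

With the number of segments chosen near the square root of the layer count, peak activation memory scales roughly with sqrt(n_layers) instead of n_layers, at the cost of roughly one extra forward pass.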

Files changed

5 changes: 3 additions & 2 deletions common/common.cpp
@@ -15,6 +15,7 @@
 #include <string>
 #include <unordered_set>
 #include <vector>
+#include <cinttypes>

 #if defined(__APPLE__) && defined(__MACH__)
 #include <sys/types.h>
@@ -938,8 +939,8 @@ std::string get_sortable_timestamp() {

     const int64_t ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
         current_time.time_since_epoch() % 1000000000).count();
-    char timestamp_ns[10];
-    snprintf(timestamp_ns, 11, "%09ld", ns);
+    char timestamp_ns[11];
+    snprintf(timestamp_ns, 11, "%09" PRId64, ns);

     return std::string(timestamp_no_ns) + "." + std::string(timestamp_ns);
 }
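
For context on the common.cpp change above: the old code declared a 10-byte buffer but passed 11 as the size to snprintf, and it formatted an int64_t with "%ld", which is only correct on platforms where long is 64 bits. The snippet below is a standalone illustration of the fixed pattern, not code from the PR: the buffer holds nine zero-padded digits plus the terminating NUL, the size argument is derived from the buffer itself, and PRId64 from <cinttypes> supplies a portable conversion specifier for int64_t.

```cpp
#include <cinttypes> // PRId64
#include <cstdint>
#include <cstdio>

int main() {
    // Sub-second nanoseconds are always in [0, 999999999]: at most nine digits.
    const int64_t ns = 123456789;

    // Nine digits + '\0' need 10 bytes; 11 leaves headroom and ensures the size
    // given to snprintf never exceeds the destination array.
    char timestamp_ns[11];
    std::snprintf(timestamp_ns, sizeof(timestamp_ns), "%09" PRId64, ns);

    std::printf("%s\n", timestamp_ns); // prints: 123456789
    return 0;
}
```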
examples/train-text-from-scratch/train-text-from-scratch.cpp
@@ -681,7 +681,6 @@ void save_as_llama_model(struct llama_vocab * vocab, struct my_llama_model * mod

     // for rms-att-weight
     int row_length = model->hparams.n_embd;
-    const auto & hparams = model->hparams;
     int n_ff = model->hparams.n_ff;

     for (uint32_t i = 0; i < model->hparams.n_layer; ++i){
5 changes: 5 additions & 0 deletions examples/gguf/CMakeLists.txt
@@ -0,0 +1,5 @@
+set(TARGET gguf)
+add_executable(${TARGET} gguf.cpp)
+install(TARGETS ${TARGET} RUNTIME)
+target_link_libraries(${TARGET} PRIVATE llama ${CMAKE_THREAD_LIBS_INIT})
+target_compile_features(${TARGET} PRIVATE cxx_std_11)
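
The new examples/gguf target builds a small program that exercises GGUF reading and writing, the same container format the training checkpoints now use. Below is a rough sketch of writing key/value metadata, assuming the gguf_* API declared in ggml.h at the time of this PR (gguf_init_empty, gguf_set_val_str/gguf_set_val_u32/gguf_set_val_f32, gguf_write_to_file, gguf_free); the exact signatures and the key names chosen here are illustrative rather than taken from this diff.

```cpp
// Minimal GGUF metadata writer sketch (assumed gguf_* API, see note above).
#include "ggml.h"

int main() {
    struct gguf_context * ctx = gguf_init_empty();

    // GGUF stores typed key/value pairs; the keys below follow the usual
    // llama.cpp naming style but are only examples.
    gguf_set_val_str(ctx, "general.architecture", "llama");
    gguf_set_val_u32(ctx, "llama.context_length", 64);
    gguf_set_val_f32(ctx, "llama.attention.layer_norm_rms_epsilon", 1e-5f);

    // false = write tensor data as well as metadata (there are no tensors here).
    gguf_write_to_file(ctx, "example.gguf", false);

    gguf_free(ctx);
    return 0;
}
```

The checkpoint files saved by train-text-from-scratch after this PR use the same mechanism, with optimizer state and training counters stored under dedicated keys (see the "add gguf key and tensor names for optimizer and training" commit).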
14 changes: 7 additions & 7 deletions examples/train-text-from-scratch/README.md
@@ -8,15 +8,15 @@ wget https://raw.githubusercontent.com/brunoklein99/deep-learning-notes/master/s

 # train
 ./bin/train-text-from-scratch \
-        --vocab-model ../models/ggml-vocab.bin \
+        --vocab-model ../models/ggml-vocab-llama.gguf \
         --ctx 64 --embd 256 --head 8 --layer 16 \
-        --checkpoint-in chk-shakespeare-256x16.bin \
-        --checkpoint-out chk-shakespeare-256x16.bin \
-        --model-out ggml-shakespeare-256x16-f32.bin \
+        --checkpoint-in chk-shakespeare-256x16.gguf \
+        --checkpoint-out chk-shakespeare-256x16.gguf \
+        --model-out ggml-shakespeare-256x16-f32.gguf \
         --train-data "shakespeare.txt" \
-        -t 6 -b 16 -n 32 --seed 1 --adam-iter 16 \
-        --print-details-interval 0 --predict 16 --use-flash
+        -t 6 -b 16 --seed 1 --adam-iter 256 \
+        --no-checkpointing

 # predict
-./bin/main -m ggml-shakespeare-256x16-f32.bin
+./bin/main -m ggml-shakespeare-256x16-f32.gguf
```
492 changes: 492 additions & 0 deletions examples/train-text-from-scratch/convert-train-checkpoint-to-gguf.py

Large diffs are not rendered by default.
