From 25c80b93a377efe151d7d24a678e444bd4ae8dec Mon Sep 17 00:00:00 2001
From: Holden
Date: Tue, 2 Apr 2024 00:13:46 +0800
Subject: [PATCH] Squash commits until Mixtral support for merging
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

llama : restore prefix space in llama tokenizer (#4081)

gguf : fix potential infinite loops while parsing (#4100)
Co-authored-by: Bernhard Gstrein

Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)
* gguf-py: gguf-dump: Respect --no-tensor flag in JSON mode.
* Respect add_bos_token GGUF metadata value
* gguf-py: Try to fix SpecialVocab giving up too easily for the Nth time

llama : fix data units (#4101)
* llama : fix data units
ggml-ci
* Revert "llama : fix data units"
This reverts commit f5feac831fe225ed7f3db938d115732a49dccfc4.
* llama : disambiguate data units
ggml-ci

cuda : get_row_rounding F32 (#4095)
* Fix #4017
* Update ggml-cuda.cu
Co-authored-by: Jared Van Bortel
* Update ggml-cuda.cu
Co-authored-by: Jared Van Bortel
---------
Co-authored-by: Jared Van Bortel

finetune : zero the loraB initial vectors (#4082)
* finetune : zero the loraB initial vectors
Without this, the first iteration is starting out far from the base model, instead of exactly on it. Zeroing loraB is what the paper recommends. loralib also zeroes at least one of the init vector pairs (though it departs from the paper in using a different distribution for the other vector, in some cases).
* tabs to spaces
* Use ggml_set_zero instead of adding a new function
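For context on why the zeroing matters, here is a minimal sketch (illustrative only, not the finetune example's actual code; shapes and the normal-distribution init are assumptions): with B zeroed, the adapted weights W' = W + B*A equal the base weights W exactly, so training starts from the base model.

```cpp
// Minimal sketch of LoRA pair initialization: A is random, B is zeroed,
// so the adapter contribution B*A is zero at step 0.
#include <random>
#include <vector>

struct lora_pair {
    std::vector<float> A; // r x n, random init
    std::vector<float> B; // m x r, zero init
};

static lora_pair lora_init(int m, int n, int r, std::mt19937 & rng) {
    std::normal_distribution<float> dist(0.0f, 0.02f); // stddev is illustrative
    lora_pair p;
    p.A.resize((size_t) r * n);
    for (auto & a : p.A) { a = dist(rng); }
    p.B.assign((size_t) m * r, 0.0f); // zeroed: W + B*A == W exactly
    return p;
}
```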
* Revert "ggml-cuda.cu: Move static items into anonymous namespace" This reverts commit e29757e0f7535d1ac314300f0324684cc785e06c. scripts : Remove missed baichuan convert script (#4127) tokenize example: Respect normal add BOS token behavior (#4126) Allow building with Makefile gguf-py : export chat templates (#4125) * gguf-py : export chat templates * llama.cpp : escape new lines in gguf kv info prints * gguf-py : bump version * gguf-py : check chat_template type * gguf-py : initialize chat_template gitignore : tokenize common : comma should be semicolon (#4137) server : relay error messages (#4131) finetune : add --n-gpu-layers flag info to --help (#4128) Revert "finetune : add --n-gpu-layers flag info to --help (#4128)" This reverts commit 05e8301e4593e2a67b4bae24f093dd12ce5cc7c2. speculative : fix prompt tokenization in speculative example (#4025) * Support special tokens and not adding BOS to prompt in speculative * Adapt to new should_add_bos function * Ensure tgt and dft have same add_bos setting ci : add flake8 to github actions (python linting) (#4129) Disabled rules: * E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned * E211 Whitespace before '(' (E211) - disabled because we often use 'C' Style where values are aligned * E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned * E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard * E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned * E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned * E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard * E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard * E266 Too many leading '#' for block comment - sometimes used as "section" separator * E501 Line too long - disabled because it's broken so often it seems like a standard * E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use# noqa instead) * E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use# noqa instead) main : Add ChatML functionality to main example (#4046) Co-authored-by: Sebastian Cramond readme : update ROCm Windows instructions (#4122) * Update README.md * Update README.md Co-authored-by: Jared Van Bortel --------- Co-authored-by: Jared Van Bortel finetune - update readme to mention llama support only (#4148) stablelm : simplify + speedup generation (#4153) docs : add llama-star arch idea examples : fix typo in parallel example doc comment (#4181) Signed-off-by: Daniel Bevenius readme : update hot topics llama : KV cache view API + better KV cache management (#4170) * llama : keep track of used KV cells + better KV cache management * llama : zero KV cache used upon clear ggml-ci * llama : allow exporting a view of the KV cache (#4180) * Allow exporting a view of the KV cache * Allow dumping the sequences per cell in common * Track max contiguous cells value and position as well * Fix max contiguous empty cells index calculation Make dump functions deal with lengths or sequences counts > 10 better * Fix off by one error in dump_kv_cache_view * Add doc comments for KV cache view functions Eliminate cell sequence struct; use llama_seq_id 
Fix incorrect format strings and uninitialized variables. (#4133)
* Fix incorrect format strings and uninitialized variables.
* Address comments
* Add the missing include statement

readme : use PATH for Windows ROCm (#4195)
* Update README.md to use PATH for Windows ROCm
* Update README.md
* Update README.md

main.swift : fix eos checking (#4197)
llama_token_eos(const struct llama_model *) was being passed a variable of type struct llama_context as its parameter.

convert : fix tensors using grad in some models (#4173)

ggml-cuda : support stablelm rope (#4156)
* ggml-cuda : support stablelm rope
* remove unused freq_base kernel parameter
* add n_dims parameter to llm_build_k_shift, default to n_rot via overload
* llama : fix llm_build_k_shift args
---------
Co-authored-by: Georgi Gerganov

llama : set metal log callback correctly (#4204)

server : OAI API compatibility (#4198)
* Add openai-compatible POST /v1/chat/completions API endpoint to server example
* fix code style
* Update server README.md
* Improve server README.md
* Fix server.cpp code style according to review
* server : some style changes
* server : indentation
* server : enable special tokens during tokenization by default
* server : minor code style
* server : change random string generator
* straightforward /v1/models endpoint
---------
Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com>
Co-authored-by: Tobi Lütke

readme : update hot topics

Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#4189)

llama : grammar `reserve` space in `decode_utf8` (#4210)
* reserve space for codepoints
* improvement for the appended 0

scripts : Use mmap in torch load (#4202)
* Use mmap in torch load, prefer .bin files when loading
* Revert .bin > .safetensors preference

metal : fix yarn (#4220)
get the correct n_orig_ctx in metal

lookahead : add example for lookahead decoding (#4207)
* lookahead : init
* lookahead : generate and store n-grams
* lookahead : use loop instead recursion to generate n-grams
* lookahead : initial working implementation
* lookahead : filter repeating n-grams
* lookahead : use deterministic init
* lookahead : add to Makefile
* lookahead : fix a bug in the seq_id of the lookahead tokens
* lookahead : add comments
---------
Co-authored-by: slaren
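The n-gram bookkeeping is the heart of the lookahead example; below is a conceptual sketch of the idea only (the data structures are illustrative, not the example's actual code): remember n-grams seen during generation, then reuse the continuation of a matching n-gram as a cheap draft to verify in one batch.

```cpp
// Conceptual n-gram pool for lookahead decoding: store observed n-grams keyed
// by their first token; a match on the last generated token yields a draft.
#include <cstdint>
#include <unordered_map>
#include <vector>

using llama_token = int32_t;

struct ngram_pool {
    // key: first token of the n-gram; value: observed continuations
    std::unordered_map<llama_token, std::vector<std::vector<llama_token>>> m;

    void add(const std::vector<llama_token> & ngram) {
        // assumes ngram.size() >= 2
        m[ngram[0]].emplace_back(ngram.begin() + 1, ngram.end());
    }

    // propose a draft continuation for the last generated token, if any
    const std::vector<llama_token> * draft(llama_token last) const {
        auto it = m.find(last);
        return (it == m.end() || it->second.empty()) ? nullptr
                                                     : &it->second.front();
    }
};
```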
readme : update hot topics

lookahead : support `-n -1` infinite generation

ggml : fix -Warray-bounds warning with gcc (#4231)

examples : iOS example with swift ui (#4159)
* copy to llama.cpp as subdir
* attempt enabling metal, fails
* ggml metal compiles!
* Update README.md
* initial conversion to new format, utf8 errors?
* bug fixes, but now has an invalid memory access :(
* added O3, now has insufficient memory access
* begin sync with master
* update to match latest code, new errors
* fixed it!
* fix for loop conditionals, increase result size
* fix current workflow errors
* attempt a llama.swiftui workflow
* Update .github/workflows/build.yml
Co-authored-by: Georgi Gerganov
---------
Co-authored-by: Georgi Gerganov

readme : add Amica to UI list (#4230)

cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970)
* Split CPP generation from build-info query
* Remove blank lines
* Add BUILD_SHARED_LIBS option

ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offload checks in llama.cpp (#4240)
* ggml : use blas even if src0 is not F32
* llama : use n_threads_batch only when n_tokens >= 32
ggml-ci
* llama : revert n_threads_batch logic
ggml-ci

ggml : restore abort() in GGML_ASSERT (#4242)

readme : add FreeChat (#4248)

examples : add readme files

py : fix oai proxy (#3972)
* fix oai proxy
fix generation not stopped while bot stops talking in chat mode
fix possible `slot_id` not exist response for cors (and pre flight)
* oai proxy: workaround for some client (such as Chatbox)
* use stop as separator to replace hardcoded `\n`

llama : fix typical sampling (#4261)
Typical sampling was broken because after copying new_candidates into candidates, the "sorted" bool is left at "true", but the new data is no longer sorted according to probability. Patch to set "sorted" to false.
Test: Generating with temp=0.0001 (approx. argmax) should generate the same sequence at typical>=1.0 and typical=0.9999 (approx. disabled, but enters the typical sampling codepath).
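The bug class described in #4261 is easy to reproduce in isolation. A self-contained sketch (the field names mirror llama.h's llama_token_data_array; the rest is illustrative):

```cpp
// Illustration of the #4261 bug class: a container caching a `sorted` flag
// must clear it when its contents are replaced by data in a different order.
#include <cstdio>
#include <vector>

struct token_data { int id; float p; };

struct token_data_array {
    std::vector<token_data> data;
    bool sorted = false; // "is data sorted by p, descending?"
};

int main() {
    token_data_array cand;
    cand.data   = {{0, 0.7f}, {1, 0.2f}, {2, 0.1f}}; // sorted by p, descending
    cand.sorted = true;

    // typical sampling reorders candidates by closeness to the entropy,
    // not by probability:
    std::vector<token_data> by_typicality = {{1, 0.2f}, {0, 0.7f}, {2, 0.1f}};

    cand.data   = by_typicality;
    cand.sorted = false; // the fix: without this, a later sampler that trusts
                         // `sorted` would treat data[0] as the argmax

    printf("first candidate: id=%d p=%.1f\n", cand.data[0].id, cand.data[0].p);
    return 0;
}
```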
convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258)

llama : fix alignment of general.name in print meta (#4254)
* llama: fix alignment of general.name in print meta
This commit fixes the alignment of the general.name field in the llm_load_print_meta function. Currently the output looks like this:
```console
llm_load_print_meta: model ftype      = mostly Q4_0
llm_load_print_meta: model params     = 13.02 B
llm_load_print_meta: model size       = 6.86 GiB (4.53 BPW)
llm_load_print_meta: general.name   = LLaMA v2
```
And with this commit it looks like this:
```console
llm_load_print_meta: model ftype      = mostly Q4_0
llm_load_print_meta: model params     = 13.02 B
llm_load_print_meta: model size       = 6.86 GiB (4.53 BPW)
llm_load_print_meta: general.name     = LLaMA v2
```
Signed-off-by: Daniel Bevenius
* llama: fix alignment of special tokens
Signed-off-by: Daniel Bevenius
---------
Signed-off-by: Daniel Bevenius

readme : fix typo (#4253)
llama.cpp uses GitHub Actions, not Gitlab Actions.

cmake : fix the metal file folder path (#4217)

batched.swift : update README.md (#4214)
docs: update how to run

docker : add finetune option (#4211)

readme : fix (#4135)
* fix: readme
* chore: resolve comments
* chore: resolve comments

main : pass LOG_TEE callback to llama.cpp log (#4033)
* main : Call llama_log_set to use LOG_TEE
* tabs to spaces

llava : ShareGPT4V compatibility (vision encoder only loading) (#4172)
* ShareGPT4 compatibility (vision encoder only loading)
Load only a CLIP vision encoder (as supplied by ShareGPT finetunes)
Corrects the argument parsing for --img_mean and --img_std (which were previously not parsed but attempted to access)
Defines defaults for img_mean and img_std which are equal to the llava 1.5 CLIP encoder, so you do not have to provide them
* Update convert-image-encoder-to-gguf.py

build : fix build info generation and cleanup Makefile (#3920)
* cmake : fix joining of REAL_GIT_DIR
* fix includes with help from include-what-you-use
* make : remove unneeded deps and add test-rope target
* fix C includes in C++ source files
* Revert "fix includes with help from include-what-you-use"
This reverts commit 635e9fadfd516d4604a0fecf4a854bfb25ad17ae.

make : fix Apple clang determination bug (#4272)
Co-authored-by: Will Findley

server : add single-client multi-prompt support (#4232)
* add multiprompt support
* cleanup
* more cleanup
* remove atomicity of id_gen, and change lock_guard to unique_lock on completion requests
* remove all references to mutex_multitasks
* Update examples/server/server.cpp
Co-authored-by: Jared Van Bortel
* Update examples/server/server.cpp
Co-authored-by: Jared Van Bortel
* Update examples/server/server.cpp
Co-authored-by: Jared Van Bortel
* Update examples/server/server.cpp
Co-authored-by: Jared Van Bortel
* change to set
---------
Co-authored-by: Jared Van Bortel

server : add --log-disable to disable logging to file (#4260)
* add --log-disable to disable logging to file in the server example
* typo fix

ggml : add ggml_soft_max_ext (#4256)
* metal : implement soft_max_ext
* cuda : implement soft_max_ext
* ggml : implement soft_max_ext (CPU)
* batched-bench : print threads
ggml-ci
* metal : simplify soft_max encoding
ggml-ci
* cuda : use 512 threads for soft_max instead of 32
* ggml : update soft max cpu
* cuda : do warp-based block reduce
* cuda : increase max block size to 1024
* cuda : fix warp reduction initialization of shared mem
* metal : warp-based reduction for soft max kernel
* metal : warp-based reduce for rms_norm
* metal : simplify soft max kernel
ggml-ci
* alloc : fix build with debug
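A sketch of what the new fused op computes, assuming the signature this PR introduces (`ggml_soft_max_ext(ctx, a, mask, scale)`); tensor shapes in the comments are illustrative:

```cpp
// ggml_soft_max_ext fuses softmax(kq*scale + kq_mask) into one op, replacing
// the separate ggml_scale + ggml_add + ggml_soft_max chain in attention.
#include "ggml.h"

struct ggml_tensor * attn_probs(
        struct ggml_context * ctx,
        struct ggml_tensor  * kq,      // [n_kv, n_tokens, n_head]
        struct ggml_tensor  * kq_mask, // additive mask, -INF for masked cells
        float                 kq_scale /* typically 1/sqrt(n_embd_head) */) {
    return ggml_soft_max_ext(ctx, kq, kq_mask, kq_scale);
}
```

Fusing keeps the intermediate scaled/masked logits out of memory and lets the Metal/CUDA backends do the reduction per warp, which is what the kernel bullets above are about.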
py : add requirements file for convert-hf-to-gguf.py (#4277)
This commit adds a requirements file for the convert-hf-to-gguf.py script, and also adds the torch and transformers packages to it.
The motivation for this is that currently running convert-hf-to-gguf.py will produce the following error:
```console
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
Collecting numpy==1.24.4
Collecting sentencepiece==0.1.98
Collecting gguf>=0.1.0
Installing collected packages: sentencepiece, numpy, gguf
Successfully installed gguf-0.5.1 numpy-1.24.4 sentencepiece-0.1.98
(venv) $ python convert-hf-to-gguf.py --help
Traceback (most recent call last):
  File "llama.cpp/convert-hf-to-gguf.py", line 16, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'
```
With this commit, and using requirements-hf-to-gguf.txt instead of requirements.txt, the script can be run and shows the help output.
Signed-off-by: Daniel Bevenius

llama : fix integer overflow during quantization (#4284)
happens with multi-threaded quantization of Qwen-72B
ggml-ci

llama : add Qwen support (#4281)
* enable qwen to llama.cpp
* llama : do not GPU split bias tensors
---------
Co-authored-by: Georgi Gerganov

llama : support attention bias on LLaMA architecture (#4283)
* Support attention_bias on LLaMA architecture
QKVO bias, should fix InternLM (https://github.com/ggerganov/llama.cpp/issues/3133) and works for LLaMAfied Qwen models (https://github.com/ggerganov/llama.cpp/pull/3743#issuecomment-1825923608).
* check existence of qkvo bias while loading llama models
Tested on LLaMA2, CUDA and CPU.
* Update llama.cpp
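A conceptual sketch of how optional Q/K/V/O bias tensors fold into graph building (names here are hypothetical; the real logic lives in llama.cpp's build functions): the bias is added only if the tensor was found at load time, so one code path serves both biased and bias-free checkpoints.

```cpp
// Sketch: optional bias in the attention projection. bq is NULL for vanilla
// LLaMA; InternLM and LLaMAfied Qwen models ship a bias tensor and take the
// ggml_add branch.
#include "ggml.h"

struct ggml_tensor * build_q_proj(
        struct ggml_context * ctx,
        struct ggml_tensor  * wq,   // always present
        struct ggml_tensor  * bq,   // may be NULL
        struct ggml_tensor  * cur) {
    cur = ggml_mul_mat(ctx, wq, cur);
    if (bq) {
        cur = ggml_add(ctx, cur, bq);
    }
    return cur;
}
```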
build : enable libstdc++ assertions for debug builds (#4275)

swift : fix token_to_piece implementation (#4278)
* Fix token_to_piece implementation in Swift
* Fix errors

llama : support optional tensors (#4283)

llama : avoid using "optional" keyword (#4283)

llama : pad KV cache size (#4280)
* llama : pad KV cache size to 32
* metal : try to improve batched decoding

py : add grammar to oai like api (#4294)

server : fix OpenAI API `stop` field to be optional (#4299)
(cherry picked from commit Mozilla-Ocho/llamafile@e8c92bcb84ae3bcbf0d617b7ee6a5413bcbd58af)

ggml : fix soft max out-of-bounds access (#4307)
ggml-ci

ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308)
* ggml : fix soft max out-of-bounds access
ggml-ci
* ggml : reuse ggml_get_n_tasks() in ggml_graph_plan()
ggml-ci

grammar-parser : fix typo (#4318)
preceeding -> preceding

swift : fix prompt tokenization logic (#4321)

swift : fix concatenation method to avoid invalid UTF8 stringification (#4325)

simple : update error message for KV cache check (#4324)
This commit updates the error message that is printed when the KV cache is not big enough to hold all the prompt and generated tokens. Specifically it removes the reference to n_parallel and replaces it with n_len.
Signed-off-by: Daniel Bevenius

swift : revert compiler checks for swift package (#4332)

sampling : custom samplers order (#4285)
* Samplers sequence order w parameter
* Cleaned commented code
* Fixed formatting
* Rewrote with unordered_map
* Revert and rewrite, too many problems and safeguards would be needed
* Fixed code style
* Code style fixes according to review
* More readable samplers input string, fixed help
* Style fix in sampler_queue
* Formatting fixes
* Fixing whitespaces

llama : allow overriding GGUF metadata when loading model (#4092)
* feat: Allow overriding GGUF metadata when loading model
* Fix the one time GCC is stricter than clang about something
* Step1
* Refactor... basically everything!
* Nuke obsolete GetArrayLen struct
* simplify std::string specialization
* Various cleanups
Add informational output when overrides are applied
Warn user when an override with the wrong type is specified
* Fix broken logic for parsing bool KV overrides
Fix issue where overrides didn't apply when key missing in GGUF metadata
Resolve merge changes
* llama : rearrange model params
* Update new GET_KEY call
Add note that metadata KV overrides aren't reflected in initial metadata KV info dump
---------
Co-authored-by: cebtenzzre
Co-authored-by: Georgi Gerganov

grammar : pre-computed pieces + reserve mem + less string copies (#4330)
* reserve space for codepoints
* improvement for the appended 0
* used precomputed token text for grammar sample
* reserve candidates_decoded
* reserve candidates_grammar
* remove candidates_decoded
* Revert "remove candidates_decoded"
This reverts commit 3773328080e6a139ee83198329a13cf4ff61d707.
* changed decode_utf8 to take src by ref

speculative : support `--color` (#4343)
* speculative: add some colors
* minor : add braces
---------
Co-authored-by: Georgi Gerganov

common : fix compile warning

server : recognize cache_prompt parameter in OAI API (#4347)

train : fix #4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (#4351)
On commit b1108 (44c117f4) xaedes added

    ggml_allocr * alloc = NULL;
    ... (many lines in between)
    if (alloc) {
        ggml_allocr_free(alloc);
    }

Which is correct, but it's easy to lose context after many lines in between. On commit b1287 (0e76a899) xaedes made a big change. From here on, alloc is freed eagerly.

    alloc = ggml_allocr_new(...)
    ... (short lines of code)
    ggml_allocr_free(alloc)

This happens a few times, but alloc is never set to NULL, and many lines below, we still have

    if (alloc) {
        ggml_allocr_free(alloc);
    }

which causes a double-free.
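The fix pattern, reduced to a self-contained example (plain malloc/free stands in for ggml_allocr_new/ggml_allocr_free):

```cpp
// Minimal reproduction of the double-free pattern and its fix: after an eager
// free, the pointer must be reset so a later guarded cleanup is a no-op.
#include <stdlib.h>

int main(void) {
    void * alloc = malloc(64);

    // ... use alloc ...
    free(alloc);
    alloc = NULL; // the fix: without this, the guard below frees again

    // many lines later, the original defensive cleanup:
    if (alloc) {
        free(alloc);
    }
    return 0;
}
```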
llama : per-layer KV cache + quantum K cache (#4309)
* per-layer KV
* remove unnecessary copies
* less code duplication, offload k and v separately
* llama : offload KV cache per-layer
* llama : offload K shift tensors
* llama : offload for rest of the model arches
* llama : enable offload debug temporarily
* llama : keep the KV related layers on the device
* llama : remove mirrors, perform Device -> Host when partial offload
* common : add command-line arg to disable KV cache offloading
* llama : update session save/load
* llama : support quantum K cache (#4312)
* llama : support quantum K cache (wip)
* metal : add F32 -> Q8_0 copy kernel
* cuda : add F32 -> Q8_0 copy kernel
ggml-ci
* cuda : use mmv kernel for quantum cache ops
* llama : pass KV cache type through API
* llama : fix build
ggml-ci
* metal : add F32 -> Q4_0 copy kernel
* metal : add F32 -> Q4_1 copy kernel
* cuda : wip
* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
* llama-bench : support type_k/type_v
* metal : use mm kernel only for quantum KV cache
* cuda : add comment
* llama : remove memory_f16 and kv_f16 flags
---------
Co-authored-by: slaren
* readme : add API change notice
---------
Co-authored-by: slaren

sync : ggml (new ops, tests, backend, etc.) (#4359)
* sync : ggml (part 1)
* sync : ggml (part 2, CUDA)
* sync : ggml (part 3, Metal)
* ggml : build fixes
ggml-ci
* cuda : restore lost changes
* cuda : restore lost changes (StableLM rope)
* cmake : enable separable compilation for CUDA
ggml-ci
* ggml-cuda : remove device side dequantize
* Revert "cmake : enable separable compilation for CUDA"
This reverts commit 09e35d04b1c4ca67f9685690160b35bc885a89ac.
* cuda : remove assert for rope
* tests : add test-backend-ops
* ggml : fix bug in ggml_concat
* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`
* ci : try to fix macOS
* ggml-backend : remove backend self-registration
* ci : disable Metal for macOS cmake build
ggml-ci
* metal : fix "supports family" call
* metal : fix assert
* metal : print resource path
ggml-ci
---------
Co-authored-by: slaren

grammar : revert the replacement of llama_token_to_piece with id_to_token (#4396)

Update README.md (#4388)
Fix small typo.

ggml : increased GGML_MAX_PARAMS to allow finetuning of 70b models (#4424)

server : fix local model name in server (#4420)

llama : document logits_all deprecation (#4418)
llama_context_params.logits_all is a parameter for controlling llama_eval. This documents that logits_all should not be used with llama_decode and llama_batch.

build : target Windows 8 for standard mingw-w64 (#4405)
* build : target Windows 8 for standard mingw-w64
* make : fix missing console.o deps
This was causing a link error with `make all` on Windows.

english : use `typos` to fix comments and logs (#4354)

server : tweak default sampling parameters (#4367)
* Set a more typical Top P setting as the default
* Update temp max
llama : add Mixtral support (#4406)
* convert : support Mixtral as LLAMA arch
* convert : fix n_ff typo
* llama : model loading
* ggml : sync latest ggml_mul_mat_id
* llama : update graph to support MoE
* llama : fix cur -> cur_expert
* llama : first working version
* llama : fix expert weighting in the FFN
* ggml : ggml_get_rows support 2D indexing [n_tokens, n_experts] (cpu only)
* ggml : add n_as argument to ggml_mul_mat_id
* ggml : fix ggml_get_rows to take into account ne02 / ne11
* metal : add more general support for ggml_get_rows + tests
* llama : add basic support for offloading moe with CUDA
* metal : add/mul/div use general kernel when src1 not cont
* metal : reduce the kernel launches for ggml_mul_mat_id
* ggml : get_rows : support non-contiguous tensors with gaps, generalize up to 3D
* ggml : update get_rows f16 and q
* cuda : support non-contiguous src1 in get_rows
* llama : offload missing ffn_moe_silu
* metal : fix ggml_get_rows to work with non-cont src1
* metal : add indirect mat-vec kernels for all quantization types
* llama : do not quantize expert gating tensors
* llama : add n_expert and n_expert_used to hparams + change quants
* test-backend-ops : add moe test
* cuda : fix get_rows when ncols is odd
* convert : determine n_ctx correctly
* metal : fix ggml_mul_mat_id for F32
* test-backend-ops : make experts more evenly probable (test_moe)
* test-backend-ops : cleanup, add moe test for batches
* test-backend-ops : add cpy from f32 -> all types test
* test-backend-ops : fix dequantize block offset
* llama : fix hard-coded number of experts
* test-backend-ops : simplify and disable slow tests to avoid CI timeout
* test-backend-ops : disable MOE test with thread sanitizer
* cuda : fix mul_mat_id with multi gpu
* convert : use 1e6 rope_freq_base for mixtral
* convert : fix style
* convert : support safetensors format
* gguf-py : bump version
* metal : add cpy f16 -> f32 kernel
* metal : fix binary ops for ne10 % 4 != 0
* test-backend-ops : add one more sum_rows test
* ggml : do not use BLAS with ggml_mul_mat_id
* convert-hf : support for mixtral-instruct (#4428)
* convert : typo fix, add additional hyperparameters, use LLaMA arch for Mixtral-instruct
* convert : use sentencepiece tokenizer for Mixtral-instruct
* convert : make flake8 happy
* metal : fix soft_max kernels
ref: https://github.com/ggerganov/ggml/pull/621/commits/1914017863d2f9ab8ecc0281cc2a56d683668b92
* metal : limit kernels to not use more than the allowed threads
---------
Co-authored-by: Georgi Gerganov
Co-authored-by: Radek Pilar
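A conceptual sketch of the MoE feed-forward routing this PR wires up with ggml_mul_mat_id and the generalized ggml_get_rows, written as plain C++ rather than graph code (shapes and names are illustrative; n_expert_used is 2 for Mixtral): a gating layer scores n_expert experts per token, the top n_expert_used are evaluated, and their outputs are combined with softmax-normalized gate weights.

```cpp
// Conceptual top-k expert routing for one token (not the ggml graph code).
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

std::vector<float> moe_ffn(
        const std::vector<float> & gate_logits,              // [n_expert]
        const std::vector<std::vector<float>> & expert_out,  // [n_expert][n_embd]
        int n_expert_used /* 2 for Mixtral */) {
    const int n_expert = (int) gate_logits.size();

    // pick the indices of the top n_expert_used gate scores
    std::vector<int> idx(n_expert);
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + n_expert_used, idx.end(),
        [&](int a, int b) { return gate_logits[a] > gate_logits[b]; });

    // softmax over the selected experts' scores only (idx[0] is the max)
    float sum = 0.0f;
    std::vector<float> w(n_expert_used);
    for (int i = 0; i < n_expert_used; ++i) {
        w[i] = std::exp(gate_logits[idx[i]] - gate_logits[idx[0]]);
        sum += w[i];
    }

    // weighted sum of the selected experts' outputs
    std::vector<float> out(expert_out[0].size(), 0.0f);
    for (int i = 0; i < n_expert_used; ++i) {
        for (size_t d = 0; d < out.size(); ++d) {
            out[d] += (w[i] / sum) * expert_out[idx[i]][d];
        }
    }
    return out;
}
```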
---
 .devops/tools.sh | 4 +
 .github/workflows/build.yml | 26 +-
 .github/workflows/python-lint.yml | 20 +
 .gitignore | 28 +-
 CMakeLists.txt | 38 +-
 Makefile | 71 +-
 Package.swift | 46 +-
 README.md | 24 +-
 common/CMakeLists.txt | 9 +-
 common/common.cpp | 247 +-
 common/common.h | 31 +-
 common/grammar-parser.cpp | 2 +-
 common/log.h | 8 +-
 common/sampling.cpp | 62 +-
 common/sampling.h | 36 +-
 common/train.cpp | 12 +
 convert-baichuan-hf-to-gguf.py | 317 --
 convert-hf-to-gguf.py | 186 +-
 convert-llama-ggml-to-gguf.py | 54 +-
 convert-persimmon-to-gguf.py | 4 +-
 convert.py | 139 +-
 docs/llama-star/idea-arch.key | Bin 0 -> 488591 bytes
 docs/llama-star/idea-arch.pdf | Bin 0 -> 42334 bytes
 examples/CMakeLists.txt | 2 +
 examples/batched-bench/batched-bench.cpp | 2 +-
 examples/batched.swift/README.md | 2 +-
 examples/batched.swift/Sources/main.swift | 17 +-
 examples/finetune/README.md | 2 +-
 .../convert-finetune-checkpoint-to-gguf.py | 2 -
 examples/finetune/finetune.cpp | 35 +-
 examples/infill/infill.cpp | 9 +-
 examples/llama-bench/llama-bench.cpp | 111 +-
 examples/llama.swiftui/.gitignore | 1 +
 examples/llama.swiftui/README.md | 7 +
 .../llama.cpp.swift/LibLlama.swift | 208 ++
 .../llama.cpp.swift/bridging-header.h | 5 +
 .../llama.swiftui.xcodeproj/project.pbxproj | 481 ++++
 .../contents.xcworkspacedata | 7 +
 .../xcshareddata/IDEWorkspaceChecks.plist | 8 +
 .../AccentColor.colorset/Contents.json | 11 +
 .../AppIcon.appiconset/Contents.json | 13 +
 .../Assets.xcassets/Contents.json | 6 +
 .../llama.swiftui/Models/LlamaState.swift | 45 +
 .../Preview Assets.xcassets/Contents.json | 6 +
 .../llama.swiftui/Resources/models/.gitignore | 0
 .../llama.swiftui/UI/ContentView.swift | 42 +
 .../llama.swiftui/llama_swiftuiApp.swift | 10 +
 examples/llava/clip.cpp | 2 +-
 .../llava/convert-image-encoder-to-gguf.py | 54 +-
 examples/llava/llava-cli.cpp | 3 +-
 examples/llava/llava.cpp | 9 +-
 examples/lookahead/CMakeLists.txt | 5 +
 examples/lookahead/README.md | 7 +
 examples/lookahead/lookahead.cpp | 487 ++++
 examples/main/main.cpp | 46 +-
 examples/parallel/parallel.cpp | 11 +-
 examples/perplexity/perplexity.cpp | 8 +-
 examples/quantize-stats/quantize-stats.cpp | 1 -
 examples/server/README.md | 51 +-
 examples/server/api_like_OAI.py | 47 +-
 examples/server/json.hpp | 2 +-
 examples/server/public/completion.js | 6 +-
 examples/server/public/index.html | 10 +-
 examples/server/server.cpp | 538 +++-
 examples/simple/simple.cpp | 2 +-
 examples/speculative/README.md | 8 +
 examples/speculative/speculative.cpp | 34 +-
 examples/tokenize/CMakeLists.txt | 5 +
 examples/tokenize/tokenize.cpp | 44 +
 .../train-text-from-scratch.cpp | 4 -
 ggml-alloc.c | 51 +-
 ggml-alloc.h | 9 +-
 ggml-backend-impl.h | 67 +-
 ggml-backend.c | 719 ++++-
 ggml-backend.h | 79 +-
 ggml-cuda.cu | 1945 ++++++++++---
 ggml-cuda.h | 10 +-
 ggml-impl.h | 2 +-
 ggml-metal.h | 6 +
 ggml-metal.m | 990 +++++--
 ggml-metal.metal | 2541 ++++++++++++++---
 ggml-opencl.cpp | 12 +-
 ggml-quants.c | 6 +-
 ggml.c | 774 +++--
 ggml.h | 73 +-
 gguf-py/README.md | 2 +-
 gguf-py/gguf/constants.py | 65 +-
 gguf-py/gguf/gguf_writer.py | 11 +-
 gguf-py/gguf/tensor_mapping.py | 55 +-
 gguf-py/gguf/vocab.py | 39 +-
 gguf-py/pyproject.toml | 2 +-
 gguf-py/scripts/gguf-dump.py | 15 +-
 llama.cpp | 1819 ++++++++----
 llama.h | 115 +-
 prompts/chat-with-qwen.txt | 1 +
 requirements-hf-to-gguf.txt | 3 +
 scripts/build-info.cmake | 22 -
 scripts/gen-build-info-cpp.cmake | 24 +
 scripts/sync-ggml.sh | 5 +-
 tests/CMakeLists.txt | 28 +-
 tests/test-backend-ops.cpp | 1490 ++++++++++
 tests/test-grad0.cpp | 2 +-
 tests/test-quantize-perf.cpp | 4 +-
 tests/test-tokenizer-0-falcon.py | 58 +-
 tests/test-tokenizer-0-llama.py | 54 +-
 105 files changed, 12091 insertions(+), 2787 deletions(-)
 create mode 100644 .github/workflows/python-lint.yml
 delete mode 100755 convert-baichuan-hf-to-gguf.py
 create mode 100755 docs/llama-star/idea-arch.key
 create mode 100644 docs/llama-star/idea-arch.pdf
 create mode 100644 examples/llama.swiftui/.gitignore
 create mode 100644 examples/llama.swiftui/README.md
 create mode 100644 examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
 create mode 100644 examples/llama.swiftui/llama.cpp.swift/bridging-header.h
 create mode 100644 examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
 create mode 100644 examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/contents.xcworkspacedata
 create mode 100644 examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/xcshareddata/IDEWorkspaceChecks.plist
 create mode 100644 examples/llama.swiftui/llama.swiftui/Assets.xcassets/AccentColor.colorset/Contents.json
 create mode 100644 examples/llama.swiftui/llama.swiftui/Assets.xcassets/AppIcon.appiconset/Contents.json
 create mode 100644 examples/llama.swiftui/llama.swiftui/Assets.xcassets/Contents.json
 create mode 100644 examples/llama.swiftui/llama.swiftui/Models/LlamaState.swift
 create mode 100644 examples/llama.swiftui/llama.swiftui/Preview Content/Preview Assets.xcassets/Contents.json
 create mode 100644 examples/llama.swiftui/llama.swiftui/Resources/models/.gitignore
 create mode 100644 examples/llama.swiftui/llama.swiftui/UI/ContentView.swift
 create mode 100644 examples/llama.swiftui/llama.swiftui/llama_swiftuiApp.swift
 create mode 100644 examples/lookahead/CMakeLists.txt
 create mode 100644 examples/lookahead/README.md
 create mode 100644 examples/lookahead/lookahead.cpp
 create mode 100644 examples/speculative/README.md
 create mode 100644 examples/tokenize/CMakeLists.txt
 create mode 100644 examples/tokenize/tokenize.cpp
 create mode 100644 prompts/chat-with-qwen.txt
 create mode 100644 requirements-hf-to-gguf.txt
 create mode 100644 scripts/gen-build-info-cpp.cmake
 create mode 100644 tests/test-backend-ops.cpp

diff --git a/.devops/tools.sh b/.devops/tools.sh index 9d999315f3887..3a7d274e46619 100755 --- a/.devops/tools.sh +++ b/.devops/tools.sh @@ -13,6 +13,8 @@ elif [[ "$arg1" == '--quantize' || "$arg1" == '-q' ]]; then ./quantize "$@" elif [[ "$arg1" == '--run' || "$arg1" == '-r' ]]; then ./main "$@" +elif [[ "$arg1" == '--finetune' || "$arg1" == '-f' ]]; then + ./finetune "$@" elif [[ "$arg1" == '--all-in-one' || "$arg1" == '-a' ]]; then echo "Converting PTH to GGML..."
for i in `ls $1/$2/ggml-model-f16.bin*`; do @@ -34,6 +36,8 @@ else echo " ex: --outtype f16 \"/models/7B/\" " echo " --quantize (-q): Optimize with quantization process ggml" echo " ex: \"/models/7B/ggml-model-f16.bin\" \"/models/7B/ggml-model-q4_0.bin\" 2" + echo " --finetune (-f): Run finetune command to create a lora finetune of the model" + echo " See documentation for finetune for command-line parameters" echo " --all-in-one (-a): Execute --convert & --quantize" echo " ex: \"/models/\" 7B" echo " --server (-s): Run a model on the server" diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index bc295d52d2d5d..a5090e398c1cc 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -143,6 +143,9 @@ jobs: cd build ctest --verbose + # TODO: build with LLAMA_NO_METAL because test-backend-ops fail on "Apple Paravirtual device" and I don't know + # how to debug it. + # ref: https://github.com/ggerganov/llama.cpp/actions/runs/7131777249/job/19420981052#step:5:1124 macOS-latest-make: runs-on: macos-latest @@ -160,14 +163,18 @@ jobs: - name: Build id: make_build run: | - make -j $(sysctl -n hw.logicalcpu) + LLAMA_NO_METAL=1 make -j $(sysctl -n hw.logicalcpu) - name: Test id: make_test run: | - make tests -j $(sysctl -n hw.logicalcpu) - make test -j $(sysctl -n hw.logicalcpu) + LLAMA_NO_METAL=1 make tests -j $(sysctl -n hw.logicalcpu) + LLAMA_NO_METAL=1 make test -j $(sysctl -n hw.logicalcpu) + # TODO: build with LLAMA_METAL=OFF because test-backend-ops fail on "Apple Paravirtual device" and I don't know + # how to debug it. + # ref: https://github.com/ggerganov/llama.cpp/actions/runs/7132125951/job/19422043567?pr=4359#step:5:6584 + # would be great if we fix these macOS-latest-cmake: runs-on: macos-latest @@ -188,7 +195,7 @@ jobs: sysctl -a mkdir build cd build - cmake .. + cmake -DLLAMA_METAL=OFF .. cmake --build . 
--config Release -j $(sysctl -n hw.logicalcpu) - name: Test @@ -498,6 +505,17 @@ jobs: path: | cudart-llama-bin-win-cu${{ matrix.cuda }}-x64.zip + ios-xcode-build: + runs-on: macos-latest + + steps: + - name: Checkout code + uses: actions/checkout@v3 + + - name: Build Xcode project + run: xcodebuild -project examples/llama.swiftui/llama.swiftui.xcodeproj -scheme llama.swiftui -sdk iphoneos CODE_SIGNING_REQUIRED=NO CODE_SIGN_IDENTITY= -destination 'generic/platform=iOS' build + + # freeBSD-latest: # runs-on: macos-12 # steps: diff --git a/.github/workflows/python-lint.yml b/.github/workflows/python-lint.yml new file mode 100644 index 0000000000000..56d17b66cecf1 --- /dev/null +++ b/.github/workflows/python-lint.yml @@ -0,0 +1,20 @@ +name: flake8 Lint + +on: [push, pull_request] + +jobs: + flake8-lint: + runs-on: ubuntu-latest + name: Lint + steps: + - name: Check out source repository + uses: actions/checkout@v3 + - name: Set up Python environment + uses: actions/setup-python@v4 + with: + python-version: "3.11" + - name: flake8 Lint + uses: py-actions/flake8@v2 + with: + ignore: "E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704" + exclude: "examples/*,examples/*/**,*/**/__init__.py" diff --git a/.gitignore b/.gitignore index 708e8582e16c4..76b3d2861826e 100644 --- a/.gitignore +++ b/.gitignore @@ -47,6 +47,7 @@ models-mnt /libllama.so /llama-bench /llava-cli +/lookahead /main /metal /perplexity @@ -64,6 +65,7 @@ models-mnt /speculative /parallel /train-text-from-scratch +/tokenize /vdot /common/build-info.cpp arm_neon.h @@ -86,15 +88,17 @@ poetry.lock poetry.toml # Test binaries -tests/test-grammar-parser -tests/test-llama-grammar -tests/test-double-float -tests/test-grad0 -tests/test-opt -tests/test-quantize-fns -tests/test-quantize-perf -tests/test-sampling -tests/test-tokenizer-0-llama -tests/test-tokenizer-0-falcon -tests/test-tokenizer-1-llama -tests/test-tokenizer-1-bpe +/tests/test-grammar-parser +/tests/test-llama-grammar +/tests/test-double-float +/tests/test-grad0 +/tests/test-opt +/tests/test-quantize-fns +/tests/test-quantize-perf +/tests/test-sampling +/tests/test-tokenizer-0-llama +/tests/test-tokenizer-0-falcon +/tests/test-tokenizer-1-llama +/tests/test-tokenizer-1-bpe +/tests/test-rope +/tests/test-backend-ops diff --git a/CMakeLists.txt b/CMakeLists.txt index db1f42f1eda6a..eea4673d18496 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -43,6 +43,7 @@ else() endif() # general +option(BUILD_SHARED_LIBS "build shared libraries" OFF) option(LLAMA_STATIC "llama: static link libraries" OFF) option(LLAMA_NATIVE "llama: enable -march=native flag" ON) option(LLAMA_LTO "llama: enable link time optimization" OFF) @@ -96,9 +97,12 @@ option(LLAMA_METAL_NDEBUG "llama: disable Metal debugging" option(LLAMA_MPI "llama: use MPI" OFF) option(LLAMA_QKK_64 "llama: use super-block size of 64 for k-quants" OFF) -option(LLAMA_BUILD_TESTS "llama: build tests" ${LLAMA_STANDALONE}) -option(LLAMA_BUILD_EXAMPLES "llama: build examples" ${LLAMA_STANDALONE}) -option(LLAMA_BUILD_SERVER "llama: build server example" ON) +option(LLAMA_BUILD_TESTS "llama: build tests" ${LLAMA_STANDALONE}) +option(LLAMA_BUILD_EXAMPLES "llama: build examples" ${LLAMA_STANDALONE}) +option(LLAMA_BUILD_SERVER "llama: build server example" ON) + +# Required for relocatable CMake package +include(${CMAKE_CURRENT_SOURCE_DIR}/scripts/build-info.cmake) # # Compile flags @@ -112,6 +116,11 @@ set(THREADS_PREFER_PTHREAD_FLAG ON) find_package(Threads REQUIRED) include(CheckCXXCompilerFlag) +# enable libstdc++ assertions 
for debug builds +if (CMAKE_SYSTEM_NAME MATCHES "Linux") + add_compile_definitions($<$<CONFIG:Debug>:_GLIBCXX_ASSERTIONS>) +endif() + if (NOT MSVC) if (LLAMA_SANITIZE_THREAD) add_compile_options(-fsanitize=thread) @@ -161,7 +170,7 @@ if (LLAMA_METAL) #add_compile_definitions(GGML_METAL_DIR_KERNELS="${CMAKE_CURRENT_SOURCE_DIR}/") # copy ggml-metal.metal to bin directory - configure_file(ggml-metal.metal bin/ggml-metal.metal COPYONLY) + configure_file(ggml-metal.metal ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/ggml-metal.metal COPYONLY) set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} ${FOUNDATION_LIBRARY} @@ -574,12 +583,21 @@ elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "^(x86_64|i686|AMD64)$" OR "${CMAKE_GE endif() elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64") message(STATUS "PowerPC detected") - add_compile_options(-mcpu=native -mtune=native) - #TODO: Add targets for Power8/Power9 (Altivec/VSX) and Power10(MMA) and query for big endian systems (ppc64/le/be) + if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64le") + add_compile_options(-mcpu=powerpc64le) + else() + add_compile_options(-mcpu=native -mtune=native) + #TODO: Add targets for Power8/Power9 (Altivec/VSX) and Power10(MMA) and query for big endian systems (ppc64/le/be) + endif() else() message(STATUS "Unknown architecture") endif() +if (MINGW) + # Target Windows 8 for PrefetchVirtualMemory + add_compile_definitions(_WIN32_WINNT=0x602) +endif() + # # POSIX conformance # @@ -649,11 +667,11 @@ add_library(ggml OBJECT ggml-backend.h ggml-quants.c ggml-quants.h - ${GGML_SOURCES_CUDA} ${GGML_HEADERS_CUDA} + ${GGML_SOURCES_CUDA} ${GGML_HEADERS_CUDA} ${GGML_SOURCES_OPENCL} ${GGML_HEADERS_OPENCL} - ${GGML_SOURCES_METAL} ${GGML_HEADERS_METAL} - ${GGML_SOURCES_MPI} ${GGML_HEADERS_MPI} - ${GGML_SOURCES_EXTRA} ${GGML_HEADERS_EXTRA} + ${GGML_SOURCES_METAL} ${GGML_HEADERS_METAL} + ${GGML_SOURCES_MPI} ${GGML_HEADERS_MPI} + ${GGML_SOURCES_EXTRA} ${GGML_HEADERS_EXTRA} ) target_include_directories(ggml PUBLIC . ${LLAMA_EXTRA_INCLUDES}) diff --git a/Makefile b/Makefile index 36d08811e32b6..b7afda2b570e5 100644 --- a/Makefile +++ b/Makefile @@ -2,13 +2,14 @@ BUILD_TARGETS = \ main quantize quantize-stats perplexity embedding vdot q8dot train-text-from-scratch convert-llama2c-to-ggml \ simple batched batched-bench save-load-state server gguf llama-bench libllava.a llava-cli baby-llama beam-search \ - speculative infill benchmark-matmult parallel finetune export-lora tests/test-c.o + speculative infill tokenize benchmark-matmult parallel finetune export-lora lookahead tests/test-c.o # Binaries only useful for tests TEST_TARGETS = \ tests/test-llama-grammar tests/test-grammar-parser tests/test-double-float tests/test-grad0 tests/test-opt \ tests/test-quantize-fns tests/test-quantize-perf tests/test-sampling tests/test-tokenizer-0-llama \ - tests/test-tokenizer-0-falcon tests/test-tokenizer-1-llama tests/test-tokenizer-1-bpe + tests/test-tokenizer-0-falcon tests/test-tokenizer-1-llama tests/test-tokenizer-1-bpe tests/test-rope \ + tests/test-backend-ops # Code coverage output files COV_TARGETS = *.gcno tests/*.gcno *.gcda tests/*.gcda *.gcov tests/*.gcov lcov-report gcovr-report @@ -30,7 +31,7 @@ ifeq '' '$(findstring clang,$(shell $(CC) --version))' CC_VER := $(shell $(CC) -dumpfullversion -dumpversion | awk -F.
'{ printf("%02d%02d%02d", $$1, $$2, $$3) }') else CC_IS_CLANG=1 - ifeq '' '$(findstring Apple LLVM,$(shell $(CC) --version))' + ifeq '' '$(findstring Apple,$(shell $(CC) --version))' CC_IS_LLVM_CLANG=1 else CC_IS_APPLE_CLANG=1 @@ -174,6 +175,10 @@ ifdef LLAMA_DEBUG MK_CFLAGS += -O0 -g MK_CXXFLAGS += -O0 -g MK_LDFLAGS += -g + + ifeq ($(UNAME_S),Linux) + MK_CXXFLAGS += -Wp,-D_GLIBCXX_ASSERTIONS + endif else MK_CPPFLAGS += -DNDEBUG endif @@ -301,12 +306,15 @@ ifeq ($(UNAME_M),$(filter $(UNAME_M),x86_64 i686 amd64)) #MK_CXXFLAGS += -mssse3 endif -# The stack is only 16-byte aligned on Windows, so don't let gcc emit aligned moves. -# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 -# https://github.com/ggerganov/llama.cpp/issues/2922 ifneq '' '$(findstring mingw,$(shell $(CC) -dumpmachine))' + # The stack is only 16-byte aligned on Windows, so don't let gcc emit aligned moves. + # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 + # https://github.com/ggerganov/llama.cpp/issues/2922 MK_CFLAGS += -Xassembler -muse-unaligned-vector-move MK_CXXFLAGS += -Xassembler -muse-unaligned-vector-move + + # Target Windows 8 for PrefetchVirtualMemory + MK_CPPFLAGS += -D_WIN32_WINNT=0x602 endif ifneq ($(filter aarch64%,$(UNAME_M)),) @@ -342,6 +350,12 @@ ifneq ($(filter ppc64%,$(UNAME_M)),) endif endif +ifneq ($(filter ppc64le%,$(UNAME_M)),) + MK_CFLAGS += -mcpu=powerpc64le + MK_CXXFLAGS += -mcpu=powerpc64le + CUDA_POWER_ARCH = 1 +endif + else MK_CFLAGS += -march=rv64gcv -mabi=lp64d MK_CXXFLAGS += -march=rv64gcv -mabi=lp64d @@ -385,6 +399,11 @@ ifdef LLAMA_CUBLAS MK_LDFLAGS += -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/x86_64-linux/lib OBJS += ggml-cuda.o NVCCFLAGS = --forward-unknown-to-host-compiler -use_fast_math + +ifdef LLAMA_DEBUG + NVCCFLAGS += -lineinfo +endif + ifdef LLAMA_CUDA_NVCC NVCC = $(LLAMA_CUDA_NVCC) else @@ -392,6 +411,8 @@ else endif #LLAMA_CUDA_NVCC ifdef CUDA_DOCKER_ARCH NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH) +else ifdef CUDA_POWER_ARCH + NVCCFLAGS += else NVCCFLAGS += -arch=native endif # CUDA_DOCKER_ARCH @@ -586,6 +607,9 @@ infill: examples/infill/infill.cpp ggml.o llama.o $(C simple: examples/simple/simple.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) +tokenize: examples/tokenize/tokenize.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) + $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) + batched: examples/batched/batched.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) @@ -637,7 +661,7 @@ beam-search: examples/beam-search/beam-search.cpp ggml.o llama.o $(COMMON_DEPS) finetune: examples/finetune/finetune.cpp ggml.o llama.o $(COMMON_DEPS) train.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -export-lora: examples/export-lora/export-lora.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +export-lora: examples/export-lora/export-lora.cpp ggml.o common/common.h $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) speculative: examples/speculative/speculative.cpp ggml.o llama.o $(COMMON_DEPS) grammar-parser.o $(OBJS) @@ -646,6 +670,9 @@ speculative: examples/speculative/speculative.cpp ggml.o llama.o $(COMMON_DEPS) parallel: examples/parallel/parallel.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) +lookahead: examples/lookahead/lookahead.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) + $(CXX) $(CXXFLAGS) 
$(filter-out %.h,$^) -o $@ $(LDFLAGS) + ifdef LLAMA_METAL metal: examples/metal/metal.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $^ -o $@ $(LDFLAGS) @@ -687,41 +714,47 @@ vdot: pocs/vdot/vdot.cpp ggml.o $(OBJS) q8dot: pocs/vdot/q8dot.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $^ -o $@ $(LDFLAGS) -tests/test-llama-grammar: tests/test-llama-grammar.cpp ggml.o $(COMMON_DEPS) grammar-parser.o $(OBJS) +tests/test-llama-grammar: tests/test-llama-grammar.cpp ggml.o grammar-parser.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-grammar-parser: tests/test-grammar-parser.cpp ggml.o llama.o $(COMMON_DEPS) grammar-parser.o $(OBJS) +tests/test-grammar-parser: tests/test-grammar-parser.cpp ggml.o llama.o grammar-parser.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-double-float: tests/test-double-float.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-double-float: tests/test-double-float.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-grad0: tests/test-grad0.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-grad0: tests/test-grad0.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-opt: tests/test-opt.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-opt: tests/test-opt.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-quantize-fns: tests/test-quantize-fns.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-quantize-fns: tests/test-quantize-fns.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-quantize-perf: tests/test-quantize-perf.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-quantize-perf: tests/test-quantize-perf.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-sampling: tests/test-sampling.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-sampling: tests/test-sampling.cpp ggml.o llama.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-tokenizer-0-falcon: tests/test-tokenizer-0-falcon.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-tokenizer-0-falcon: tests/test-tokenizer-0-falcon.cpp ggml.o llama.o $(COMMON_DEPS) console.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-tokenizer-0-llama: tests/test-tokenizer-0-llama.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-tokenizer-0-llama: tests/test-tokenizer-0-llama.cpp ggml.o llama.o $(COMMON_DEPS) console.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-tokenizer-1-bpe: tests/test-tokenizer-1-bpe.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-tokenizer-1-bpe: tests/test-tokenizer-1-bpe.cpp ggml.o llama.o $(COMMON_DEPS) console.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) -tests/test-tokenizer-1-llama: tests/test-tokenizer-1-llama.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS) +tests/test-tokenizer-1-llama: tests/test-tokenizer-1-llama.cpp ggml.o llama.o $(COMMON_DEPS) console.o $(OBJS) + $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) + +tests/test-rope: tests/test-rope.cpp ggml.o $(OBJS) $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) tests/test-c.o: tests/test-c.c llama.h $(CC) $(CFLAGS) -c $(filter-out %.h,$^) -o $@ + +tests/test-backend-ops: tests/test-backend-ops.cpp ggml.o $(OBJS) + $(CXX) $(CXXFLAGS) $(filter-out %.h,$^) -o $@ $(LDFLAGS) diff --git a/Package.swift b/Package.swift index 5b3bd72cafe19..18d610d6941d2 100644 --- a/Package.swift 
+++ b/Package.swift @@ -2,33 +2,14 @@ import PackageDescription -#if arch(arm) || arch(arm64) -let platforms: [SupportedPlatform]? = [ - .macOS(.v12), - .iOS(.v14), - .watchOS(.v4), - .tvOS(.v14) -] -let exclude: [String] = [] -let resources: [Resource] = [ - .process("ggml-metal.metal") -] -let additionalSources: [String] = ["ggml-metal.m"] -let additionalSettings: [CSetting] = [ - .unsafeFlags(["-fno-objc-arc"]), - .define("GGML_USE_METAL") -] -#else -let platforms: [SupportedPlatform]? = nil -let exclude: [String] = ["ggml-metal.metal"] -let resources: [Resource] = [] -let additionalSources: [String] = [] -let additionalSettings: [CSetting] = [] -#endif - let package = Package( name: "llama", - platforms: platforms, + platforms: [ + .macOS(.v12), + .iOS(.v14), + .watchOS(.v4), + .tvOS(.v14) + ], products: [ .library(name: "llama", targets: ["llama"]), ], @@ -36,25 +17,30 @@ let package = Package( .target( name: "llama", path: ".", - exclude: exclude, + exclude: [], sources: [ "ggml.c", "llama.cpp", "ggml-alloc.c", "ggml-backend.c", "ggml-quants.c", - ] + additionalSources, - resources: resources, + "ggml-metal.m", + ], + resources: [ + .process("ggml-metal.metal") + ], publicHeadersPath: "spm-headers", cSettings: [ .unsafeFlags(["-Wno-shorten-64-to-32", "-O3", "-DNDEBUG"]), - .define("GGML_USE_ACCELERATE") + .define("GGML_USE_ACCELERATE"), + .unsafeFlags(["-fno-objc-arc"]), + .define("GGML_USE_METAL"), // NOTE: NEW_LAPACK will required iOS version 16.4+ // We should consider add this in the future when we drop support for iOS 14 // (ref: ref: https://developer.apple.com/documentation/accelerate/1513264-cblas_sgemm?language=objc) // .define("ACCELERATE_NEW_LAPACK"), // .define("ACCELERATE_LAPACK_ILP64") - ] + additionalSettings, + ], linkerSettings: [ .linkedFramework("Accelerate") ] diff --git a/README.md b/README.md index 4de06476569f9..ce026b8d1d851 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,10 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++ ### Hot topics -- *No hot topics atm. Open to suggestions about what is hot today* +- **llama.h API change for handling KV cache offloading and data type: https://github.com/ggerganov/llama.cpp/pull/4309** +- Using `llama.cpp` with AWS instances: https://github.com/ggerganov/llama.cpp/discussions/4225 +- Looking for contributions to improve and maintain the `server` example: https://github.com/ggerganov/llama.cpp/issues/4216 +- Collecting Apple Silicon performance stats: https://github.com/ggerganov/llama.cpp/discussions/4167 ---- @@ -114,6 +117,8 @@ as the main playground for developing new features for the [ggml](https://github - [nat/openplayground](https://github.com/nat/openplayground) - [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) - [withcatai/catai](https://github.com/withcatai/catai) +- [semperai/amica](https://github.com/semperai/amica) +- [psugihara/FreeChat](https://github.com/psugihara/FreeChat) --- @@ -320,7 +325,7 @@ mpirun -hostfile hostfile -n 3 ./main -m ./models/7B/ggml-model-q4_0.gguf -n 128 ### BLAS Build -Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). BLAS doesn't affect the normal generation performance. There are currently three different implementations of it: +Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). 
Support with CPU-only BLAS implementations doesn't affect the normal generation performance. We may see generation performance improvements with GPU-involved BLAS implementations, e.g. cuBLAS, hipBLAS and CLBlast. There are currently several different BLAS implementations available for build and use: - #### Accelerate Framework: @@ -410,19 +415,28 @@ Building the program with BLAS support may lead to some performance improvements This provides BLAS acceleration on HIP-supported AMD GPUs. Make sure to have ROCm installed. You can download it from your Linux distro's package manager or from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html). - Windows support is coming soon... - Using `make`: ```bash make LLAMA_HIPBLAS=1 ``` - - Using `CMake`: + - Using `CMake` for Linux: ```bash mkdir build cd build CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DLLAMA_HIPBLAS=ON cmake --build . ``` + - Using `CMake` for Windows (using x64 Native Tools Command Prompt for VS): + ```bash + set PATH=%HIP_PATH%\bin;%PATH% + mkdir build + cd build + cmake -G Ninja -DAMDGPU_TARGETS=gfx1100 -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ .. + cmake --build . + ``` + Make sure that `AMDGPU_TARGETS` is set to the GPU arch you want to compile for. The above example uses `gfx1100` that corresponds to Radeon RX 7900XTX/XT/GRE. You can find a list of targets [here](https://llvm.org/docs/AMDGPUUsage.html#processors) + The environment variable [`HIP_VISIBLE_DEVICES`](https://rocm.docs.amd.com/en/latest/understand/gpu_isolation.html#hip-visible-devices) can be used to specify which GPU(s) will be used. If your GPU is not officially supported you can use the environment variable [`HSA_OVERRIDE_GFX_VERSION`] set to a similar GPU, for example 10.3.0 on RDNA2 or 11.0.0 on RDNA3. @@ -883,7 +897,7 @@ Additionally, there the following images, similar to the above: - `ghcr.io/ggerganov/llama.cpp:full-rocm`: Same as `full` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`) - `ghcr.io/ggerganov/llama.cpp:light-rocm`: Same as `light` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`) -The GPU enabled images are not currently tested by CI beyond being built. They are not built with any variation from the ones in the Dockerfiles defined in [.devops/](.devops/) and the Gitlab Action defined in [.github/workflows/docker.yml](.github/workflows/docker.yml). If you need different settings (for example, a different CUDA or ROCm library, you'll need to build the images locally for now). +The GPU enabled images are not currently tested by CI beyond being built. They are not built with any variation from the ones in the Dockerfiles defined in [.devops/](.devops/) and the GitHub Action defined in [.github/workflows/docker.yml](.github/workflows/docker.yml). If you need different settings (for example, a different CUDA or ROCm library, you'll need to build the images locally for now). 
#### Usage diff --git a/common/CMakeLists.txt b/common/CMakeLists.txt index 4f930bdc59059..b5d5453d2d357 100644 --- a/common/CMakeLists.txt +++ b/common/CMakeLists.txt @@ -11,7 +11,12 @@ if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/../.git") if(NOT IS_DIRECTORY "${GIT_DIR}") file(READ ${GIT_DIR} REAL_GIT_DIR_LINK) string(REGEX REPLACE "gitdir: (.*)\n$" "\\1" REAL_GIT_DIR ${REAL_GIT_DIR_LINK}) - set(GIT_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../${REAL_GIT_DIR}") + string(FIND "${REAL_GIT_DIR}" "/" SLASH_POS) + if (SLASH_POS EQUAL 0) + set(GIT_DIR "${REAL_GIT_DIR}") + else() + set(GIT_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../${REAL_GIT_DIR}") + endif() endif() set(GIT_INDEX "${GIT_DIR}/index") @@ -26,7 +31,7 @@ add_custom_command( COMMENT "Generating build details from Git" COMMAND ${CMAKE_COMMAND} -DMSVC=${MSVC} -DCMAKE_C_COMPILER_VERSION=${CMAKE_C_COMPILER_VERSION} -DCMAKE_C_COMPILER_ID=${CMAKE_C_COMPILER_ID} -DCMAKE_VS_PLATFORM_NAME=${CMAKE_VS_PLATFORM_NAME} - -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER} -P "${CMAKE_CURRENT_SOURCE_DIR}/../scripts/build-info.cmake" + -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER} -P "${CMAKE_CURRENT_SOURCE_DIR}/../scripts/gen-build-info-cpp.cmake" WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/.." DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/build-info.cpp.in" ${GIT_INDEX} VERBATIM diff --git a/common/common.cpp b/common/common.cpp index 6a711420004b4..4a61ae5937f64 100644 --- a/common/common.cpp +++ b/common/common.cpp @@ -12,6 +12,7 @@ #include #include #include +#include <unordered_map> #include #include #include @@ -277,8 +278,18 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) { break; } params.yarn_beta_slow = std::stof(argv[i]); - } else if (arg == "--memory-f32") { - params.memory_f16 = false; + } else if (arg == "--samplers") { + if (++i >= argc) { + invalid_param = true; + break; + } + sparams.samplers_sequence = parse_samplers_input(argv[i]); + } else if (arg == "--sampling-seq") { + if (++i >= argc) { + invalid_param = true; + break; + } + sparams.samplers_sequence = argv[i]; } else if (arg == "--top-p") { if (++i >= argc) { invalid_param = true; @@ -491,8 +502,18 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) { params.interactive_first = true; } else if (arg == "-ins" || arg == "--instruct") { params.instruct = true; + } else if (arg == "-cml" || arg == "--chatml") { + params.chatml = true; } else if (arg == "--infill") { params.infill = true; + } else if (arg == "-dkvc" || arg == "--dump-kv-cache") { + params.dump_kv_cache = true; + } else if (arg == "-nkvo" || arg == "--no-kv-offload") { + params.no_kv_offload = true; + } else if (arg == "-ctk" || arg == "--cache-type-k") { + params.cache_type_k = argv[++i]; + } else if (arg == "-ctv" || arg == "--cache-type-v") { + params.cache_type_v = argv[++i]; } else if (arg == "--multiline-input") { params.multiline_input = true; } else if (arg == "--simple-io") { @@ -673,6 +694,47 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) { std::istreambuf_iterator<char>(), std::back_inserter(sparams.grammar) ); + } else if (arg == "--override-kv") { + if (++i >= argc) { + invalid_param = true; + break; + } + char * sep = strchr(argv[i], '='); + if (sep == nullptr || sep - argv[i] >= 128) { + fprintf(stderr, "error: Malformed KV override: %s\n", argv[i]); + invalid_param = true; + break; + } + struct llama_model_kv_override kvo; + std::strncpy(kvo.key, argv[i], sep - argv[i]); + kvo.key[sep - argv[i]] = 0; + sep++; + if (strncmp(sep, "int:", 4) == 0) { + sep += 4; + kvo.tag = LLAMA_KV_OVERRIDE_INT;
+ kvo.int_value = std::atol(sep); + } else if (strncmp(sep, "float:", 6) == 0) { + sep += 6; + kvo.tag = LLAMA_KV_OVERRIDE_FLOAT; + kvo.float_value = std::atof(sep); + } else if (strncmp(sep, "bool:", 5) == 0) { + sep += 5; + kvo.tag = LLAMA_KV_OVERRIDE_BOOL; + if (std::strcmp(sep, "true") == 0) { + kvo.bool_value = true; + } else if (std::strcmp(sep, "false") == 0) { + kvo.bool_value = false; + } else { + fprintf(stderr, "error: Invalid boolean value for KV override: %s\n", argv[i]); + invalid_param = true; + break; + } + } else { + fprintf(stderr, "error: Invalid type for KV override: %s\n", argv[i]); + invalid_param = true; + break; + } + params.kv_overrides.push_back(kvo); #ifndef LOG_DISABLE_LOGS // Parse args for logging parameters } else if ( log_param_single_parse( argv[i] ) ) { @@ -716,6 +778,11 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) { } } + if (!params.kv_overrides.empty()) { + params.kv_overrides.emplace_back(llama_model_kv_override()); + params.kv_overrides.back().key[0] = 0; + } + return true; } @@ -730,6 +797,7 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) { printf(" -i, --interactive run in interactive mode\n"); printf(" --interactive-first run in interactive mode and wait for input right away\n"); printf(" -ins, --instruct run in instruction mode (use with Alpaca models)\n"); + printf(" -cml, --chatml run in chatml mode (use with ChatML-compatible models)\n"); printf(" --multiline-input allows you to write or paste multiple lines without ending each in '\\'\n"); printf(" -r PROMPT, --reverse-prompt PROMPT\n"); printf(" halt generation at PROMPT, return control in interactive mode\n"); @@ -755,6 +823,8 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) { printf(" -n N, --n-predict N number of tokens to predict (default: %d, -1 = infinity, -2 = until context filled)\n", params.n_predict); printf(" -c N, --ctx-size N size of the prompt context (default: %d, 0 = loaded from model)\n", params.n_ctx); printf(" -b N, --batch-size N batch size for prompt processing (default: %d)\n", params.n_batch); + printf(" --samplers samplers that will be used for generation in the order, separated by \';\', for example: \"top_k;tfs;typical;top_p;min_p;temp\"\n"); + printf(" --sampling-seq simplified sequence for samplers that will be used (default: %s)\n", sparams.samplers_sequence.c_str()); printf(" --top-k N top-k sampling (default: %d, 0 = disabled)\n", sparams.top_k); printf(" --top-p N top-p sampling (default: %.1f, 1.0 = disabled)\n", (double)sparams.top_p); printf(" --min-p N min-p sampling (default: %.1f, 0.0 = disabled)\n", (double)sparams.min_p); @@ -792,8 +862,6 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) { printf(" --yarn-beta-fast N YaRN: low correction dim or beta (default: %.1f)\n", params.yarn_beta_fast); printf(" --ignore-eos ignore end of stream token and continue generating (implies --logit-bias 2-inf)\n"); printf(" --no-penalize-nl do not penalize newline token\n"); - printf(" --memory-f32 use f32 instead of f16 for memory key+value (default: disabled)\n"); - printf(" not recommended: doubles context memory required and no measurable increase in quality\n"); printf(" --temp N temperature (default: %.1f)\n", (double)sparams.temp); printf(" --logits-all return logits for all tokens in the batch (default: disabled)\n"); printf(" --hellaswag compute HellaSwag score over random tasks from datafile supplied with -f\n"); @@ -832,6 +900,14 @@ void 
gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) { #endif // GGML_USE_CUBLAS #endif printf(" --verbose-prompt print prompt before generation\n"); + printf(" -dkvc, --dump-kv-cache\n"); + printf(" verbose print of the KV cache\n"); + printf(" -nkvo, --no-kv-offload\n"); + printf(" disable KV offload\n"); + printf(" -ctk TYPE, --cache-type-k TYPE\n"); + printf(" KV cache data type for K (default: %s)\n", params.cache_type_k.c_str()); + printf(" -ctv TYPE, --cache-type-v TYPE\n"); + printf(" KV cache data type for V (default: %s)\n", params.cache_type_v.c_str()); printf(" --simple-io use basic IO for better compatibility in subprocesses and limited consoles\n"); printf(" --lora FNAME apply LoRA adapter (implies --no-mmap)\n"); printf(" --lora-scaled FNAME S apply LoRA adapter with user defined scaling S (implies --no-mmap)\n"); @@ -842,6 +918,9 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) { printf(" draft model for speculative decoding (default: %s)\n", params.model.c_str()); printf(" -ld LOGDIR, --logdir LOGDIR\n"); printf(" path under which to save YAML logs (no logging if unset)\n"); + printf(" --override-kv KEY=TYPE:VALUE\n"); + printf(" advanced option to override model metadata by key. may be specified multiple times.\n"); + printf(" types: int, float, bool. example: --override-kv tokenizer.ggml.add_bos_token=bool:false\n"); printf("\n"); #ifndef LOG_DISABLE_LOGS log_print_usage(); @@ -878,6 +957,48 @@ std::string gpt_random_prompt(std::mt19937 & rng) { GGML_UNREACHABLE(); } +// +// String parsing +// + +std::string parse_samplers_input(std::string input) { + std::string output = ""; + // since samplers names are written multiple ways + // make it ready for both system names and input names + std::unordered_map samplers_symbols { + {"top_k", 'k'}, + {"top-k", 'k'}, + {"top_p", 'p'}, + {"top-p", 'p'}, + {"nucleus", 'p'}, + {"typical_p", 'y'}, + {"typical-p", 'y'}, + {"typical", 'y'}, + {"min_p", 'm'}, + {"min-p", 'm'}, + {"tfs_z", 'f'}, + {"tfs-z", 'f'}, + {"tfs", 'f'}, + {"temp", 't'}, + {"temperature",'t'} + }; + // expected format example: "temp;top_k;tfs_z;typical_p;top_p;min_p" + size_t separator = input.find(';'); + while (separator != input.npos) { + std::string name = input.substr(0,separator); + input = input.substr(separator+1); + separator = input.find(';'); + + if (samplers_symbols.find(name) != samplers_symbols.end()) { + output += samplers_symbols[name]; + } + } + if (samplers_symbols.find(input) != samplers_symbols.end()) { + output += samplers_symbols[input]; + } + return output; +} + // // Model utils // @@ -892,10 +1013,39 @@ struct llama_model_params llama_model_params_from_gpt_params(const gpt_params & mparams.tensor_split = params.tensor_split; mparams.use_mmap = params.use_mmap; mparams.use_mlock = params.use_mlock; + if (params.kv_overrides.empty()) { + mparams.kv_overrides = NULL; + } else { + GGML_ASSERT(params.kv_overrides.back().key[0] == 0 && "KV overrides not terminated with empty key"); + mparams.kv_overrides = params.kv_overrides.data(); + } return mparams; } +static ggml_type kv_cache_type_from_str(const std::string & s) { + if (s == "f16") { + return GGML_TYPE_F16; + } + if (s == "q8_0") { + return GGML_TYPE_Q8_0; + } + if (s == "q4_0") { + return GGML_TYPE_Q4_0; + } + if (s == "q4_1") { + return GGML_TYPE_Q4_1; + } + if (s == "q5_0") { + return GGML_TYPE_Q5_0; + } + if (s == "q5_1") { + return GGML_TYPE_Q5_1; + } + + throw std::runtime_error("Invalid cache type: " + s); +} + struct 
llama_context_params llama_context_params_from_gpt_params(const gpt_params & params) { auto cparams = llama_context_default_params(); @@ -905,7 +1055,6 @@ struct llama_context_params llama_context_params_from_gpt_params(const gpt_param cparams.n_threads_batch = params.n_threads_batch == -1 ? params.n_threads : params.n_threads_batch; cparams.mul_mat_q = params.mul_mat_q; cparams.seed = params.seed; - cparams.f16_kv = params.memory_f16; cparams.logits_all = params.logits_all; cparams.embedding = params.embedding; cparams.rope_scaling_type = params.rope_scaling_type; @@ -916,6 +1065,10 @@ struct llama_context_params llama_context_params_from_gpt_params(const gpt_param cparams.yarn_beta_fast = params.yarn_beta_fast; cparams.yarn_beta_slow = params.yarn_beta_slow; cparams.yarn_orig_ctx = params.yarn_orig_ctx; + cparams.offload_kqv = !params.no_kv_offload; + + cparams.type_k = kv_cache_type_from_str(params.cache_type_k); + cparams.type_v = kv_cache_type_from_str(params.cache_type_v); return cparams; } @@ -931,7 +1084,7 @@ void llama_batch_add( const std::vector & seq_ids, bool logits) { batch.token [batch.n_tokens] = id; - batch.pos [batch.n_tokens] = pos, + batch.pos [batch.n_tokens] = pos; batch.n_seq_id[batch.n_tokens] = seq_ids.size(); for (size_t i = 0; i < seq_ids.size(); ++i) { batch.seq_id[batch.n_tokens][i] = seq_ids[i]; @@ -1072,6 +1225,12 @@ std::string llama_detokenize_bpe(llama_context * ctx, const std::vector= 0) { seq_count++; } + } + putchar(slot_chars[std::min(sizeof(slot_chars) - 2, size_t(seq_count))]); + } + + printf("\n=== Done dumping\n"); +} + +void dump_kv_cache_view_seqs(const llama_kv_cache_view & view, int row_size) { + static const char slot_chars[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"; + + printf("=== Dumping KV cache. total cells %d, max sequences per cell %d, populated cells %d, total tokens in cache %d, largest empty slot=%d @ %d\n", + view.n_cells, view.n_max_seq, view.used_cells, view.token_count, view.max_contiguous, view.max_contiguous_idx); + + std::unordered_map seqs; + llama_kv_cache_view_cell * c_curr = view.cells; + llama_seq_id * cs_curr = view.cells_sequences; + + for (int i = 0; i < view.n_cells; i++, c_curr++, cs_curr += view.n_max_seq) { + for (int j = 0; j < view.n_max_seq; j++) { + if (cs_curr[j] < 0) { continue; } + if (seqs.find(cs_curr[j]) == seqs.end()) { + if (seqs.size() + 1 >= sizeof(slot_chars)) { break; } + seqs[cs_curr[j]] = seqs.size(); + } + } + if (seqs.size() + 1 >= sizeof(slot_chars)) { break; } + } + + printf("=== Sequence legend: "); + for (const auto & it : seqs) { + printf("%zu=%d, ", it.second, it.first); + } + printf("'+'=other sequence ids"); + + c_curr = view.cells; + cs_curr = view.cells_sequences; + for (int i = 0; i < view.n_cells; i++, c_curr++, cs_curr += view.n_max_seq) { + if (i % row_size == 0) { + printf("\n%5d: ", i); + } + for (int j = 0; j < view.n_max_seq; j++) { + if (cs_curr[j] >= 0) { + const auto & it = seqs.find(cs_curr[j]); + putchar(it != seqs.end() ? 
int(slot_chars[it->second]) : '+'); + } else { + putchar('.'); + } + } + putchar(' '); + } + + printf("\n=== Done dumping\n"); +} diff --git a/common/common.h b/common/common.h index dd6b002eb94ba..e87ce113398b3 100644 --- a/common/common.h +++ b/common/common.h @@ -86,6 +86,8 @@ struct gpt_params { std::vector antiprompt; // string upon seeing which more user input is prompted std::string logdir = ""; // directory in which to save YAML log files + std::vector kv_overrides; + // TODO: avoid tuple, use struct std::vector> lora_adapter; // lora adapter path with user defined scale std::string lora_base = ""; // base model path for the lora adapter @@ -98,10 +100,10 @@ struct gpt_params { size_t hellaswag_tasks = 400; // number of tasks to use when computing the HellaSwag score bool mul_mat_q = true; // if true, use mul_mat_q kernels instead of cuBLAS - bool memory_f16 = true; // use f16 instead of f32 for memory kv bool random_prompt = false; // do not randomize prompt if none provided bool use_color = false; // use color to distinguish generations and inputs bool interactive = false; // interactive mode + bool chatml = false; // chatml mode (used for models trained on chatml syntax) bool prompt_cache_all = false; // save user input and generations to prompt cache bool prompt_cache_ro = false; // open the prompt cache read-only and do not update it @@ -121,10 +123,15 @@ struct gpt_params { bool numa = false; // attempt optimizations that help on some NUMA systems bool verbose_prompt = false; // print prompt tokens before generation bool infill = false; // use infill mode + bool dump_kv_cache = false; // dump the KV cache contents for debugging purposes + bool no_kv_offload = false; // disable KV offloading + + std::string cache_type_k = "f16"; // KV cache data type for the K + std::string cache_type_v = "f16"; // KV cache data type for the V // multimodal models (see examples/llava) std::string mmproj = ""; // path to multimodal projector - std::string image = ""; // path to an image file + std::string image = ""; // path to an image file }; bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params); @@ -139,6 +146,12 @@ std::string gpt_random_prompt(std::mt19937 & rng); void process_escapes(std::string& input); +// +// String parsing +// + +std::string parse_samplers_input(std::string input); + // // Model utils // @@ -200,6 +213,10 @@ std::string llama_detokenize_bpe( llama_context * ctx, const std::vector & tokens); +// Uses the value from the model metadata if possible, otherwise +// defaults to true when model type is SPM, otherwise false. +bool llama_should_add_bos_token(const llama_model * model); + // // YAML utils // @@ -213,3 +230,13 @@ std::string get_sortable_timestamp(); void dump_non_result_info_yaml( FILE * stream, const gpt_params & params, const llama_context * lctx, const std::string & timestamp, const std::vector & prompt_tokens, const char * model_desc); + +// +// KV cache utils +// + +// Dump the KV cache view with the number of sequences per cell. +void dump_kv_cache_view(const llama_kv_cache_view & view, int row_size = 80); + +// Dump the KV cache view showing individual sequences in each cell (long output). 
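+// Minimal usage sketch (assuming a llama_context * ctx serving up to 4 sequences):
+//   llama_kv_cache_view view = llama_kv_cache_view_init(ctx, 4);
+//   llama_kv_cache_view_update(ctx, &view);
+//   dump_kv_cache_view_seqs(view, 40);
+//   llama_kv_cache_view_free(&view);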
+void dump_kv_cache_view_seqs(const llama_kv_cache_view & view, int row_size = 40); diff --git a/common/grammar-parser.cpp b/common/grammar-parser.cpp index ff51cc8034c8b..bf89a96f3617f 100644 --- a/common/grammar-parser.cpp +++ b/common/grammar-parser.cpp @@ -190,7 +190,7 @@ namespace grammar_parser { pos = parse_space(pos + 1, is_nested); } else if (*pos == '*' || *pos == '+' || *pos == '?') { // repetition operator if (last_sym_start == out_elements.size()) { - throw std::runtime_error(std::string("expecting preceeding item to */+/? at ") + pos); + throw std::runtime_error(std::string("expecting preceding item to */+/? at ") + pos); } // apply transformation to previous symbol (last_sym_start to end) according to diff --git a/common/log.h b/common/log.h index c0e814861e0c6..e4e1b9f4f01aa 100644 --- a/common/log.h +++ b/common/log.h @@ -61,13 +61,13 @@ // #define LOG_TARGET stderr // #include "log.h" // -// The log target can also be redirected to a diffrent function +// The log target can also be redirected to a different function // like so: // -// #define LOG_TARGET log_handler_diffrent() +// #define LOG_TARGET log_handler_different() // #include "log.h" // -// FILE* log_handler_diffrent() +// FILE* log_handler_different() // { // return stderr; // } @@ -421,7 +421,7 @@ inline FILE *log_handler2_impl(bool change = false, LogTriState append = LogTriS // Disables logs entirely at runtime. // Makes LOG() and LOG_TEE() produce no output, -// untill enabled back. +// until enabled back. #define log_disable() log_disable_impl() // INTERNAL, DO NOT USE diff --git a/common/sampling.cpp b/common/sampling.cpp index 1317024c2c11c..f4e76df31bee3 100644 --- a/common/sampling.cpp +++ b/common/sampling.cpp @@ -99,6 +99,56 @@ std::string llama_sampling_print(const llama_sampling_params & params) { return std::string(result); } +std::string llama_sampling_order_print(const llama_sampling_params & params) { + std::string result = "CFG -> Penalties "; + if (params.mirostat == 0) { + for (auto s : params.samplers_sequence) { + switch (s) { + case 'k': result += "-> top_k "; break; + case 'f': result += "-> tfs_z "; break; + case 'y': result += "-> typical_p "; break; + case 'p': result += "-> top_p "; break; + case 'm': result += "-> min_p "; break; + case 't': result += "-> temp "; break; + default : break; + } + } + } else { + result += "-> mirostat "; + } + + return result; +} + +// no reasons to expose this function in header +static void sampler_queue( + struct llama_context * ctx_main, + const llama_sampling_params & params, + llama_token_data_array & cur_p, + size_t & min_keep) { + const int n_vocab = llama_n_vocab(llama_get_model(ctx_main)); + + const float temp = params.temp; + const int32_t top_k = params.top_k <= 0 ? 
n_vocab : params.top_k; + const float top_p = params.top_p; + const float min_p = params.min_p; + const float tfs_z = params.tfs_z; + const float typical_p = params.typical_p; + const std::string & samplers_sequence = params.samplers_sequence; + + for (auto s : samplers_sequence) { + switch (s){ + case 'k': llama_sample_top_k (ctx_main, &cur_p, top_k, min_keep); break; + case 'f': llama_sample_tail_free(ctx_main, &cur_p, tfs_z, min_keep); break; + case 'y': llama_sample_typical (ctx_main, &cur_p, typical_p, min_keep); break; + case 'p': llama_sample_top_p (ctx_main, &cur_p, top_p, min_keep); break; + case 'm': llama_sample_min_p (ctx_main, &cur_p, min_p, min_keep); break; + case 't': llama_sample_temp (ctx_main, &cur_p, temp); break; + default : break; + } + } +} + llama_token llama_sampling_sample( struct llama_sampling_context * ctx_sampling, struct llama_context * ctx_main, @@ -109,11 +159,6 @@ llama_token llama_sampling_sample( const int n_vocab = llama_n_vocab(llama_get_model(ctx_main)); const float temp = params.temp; - const int32_t top_k = params.top_k <= 0 ? n_vocab : params.top_k; - const float top_p = params.top_p; - const float min_p = params.min_p; - const float tfs_z = params.tfs_z; - const float typical_p = params.typical_p; const int32_t penalty_last_n = params.penalty_last_n < 0 ? params.n_prev : params.penalty_last_n; const float penalty_repeat = params.penalty_repeat; const float penalty_freq = params.penalty_freq; @@ -188,12 +233,7 @@ llama_token llama_sampling_sample( // temperature sampling size_t min_keep = std::max(1, params.n_probs); - llama_sample_top_k (ctx_main, &cur_p, top_k, min_keep); - llama_sample_tail_free(ctx_main, &cur_p, tfs_z, min_keep); - llama_sample_typical (ctx_main, &cur_p, typical_p, min_keep); - llama_sample_top_p (ctx_main, &cur_p, top_p, min_keep); - llama_sample_min_p (ctx_main, &cur_p, min_p, min_keep); - llama_sample_temp (ctx_main, &cur_p, temp); + sampler_queue(ctx_main, params, cur_p, min_keep); id = llama_sample_token(ctx_main, &cur_p); diff --git a/common/sampling.h b/common/sampling.h index 7c9b8dcf23bcb..fdfa9eed1467b 100644 --- a/common/sampling.h +++ b/common/sampling.h @@ -10,22 +10,23 @@ // sampling parameters typedef struct llama_sampling_params { - int32_t n_prev = 64; // number of previous tokens to remember - int32_t n_probs = 0; // if greater than 0, output the probabilities of top n_probs tokens. - int32_t top_k = 40; // <= 0 to use vocab size - float top_p = 0.95f; // 1.0 = disabled - float min_p = 0.05f; // 0.0 = disabled - float tfs_z = 1.00f; // 1.0 = disabled - float typical_p = 1.00f; // 1.0 = disabled - float temp = 0.80f; // 1.0 = disabled - int32_t penalty_last_n = 64; // last n tokens to penalize (0 = disable penalty, -1 = context size) - float penalty_repeat = 1.10f; // 1.0 = disabled - float penalty_freq = 0.00f; // 0.0 = disabled - float penalty_present = 0.00f; // 0.0 = disabled - int32_t mirostat = 0; // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0 - float mirostat_tau = 5.00f; // target entropy - float mirostat_eta = 0.10f; // learning rate - bool penalize_nl = true; // consider newlines as a repeatable token + int32_t n_prev = 64; // number of previous tokens to remember + int32_t n_probs = 0; // if greater than 0, output the probabilities of top n_probs tokens. 
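+    // when mirostat is disabled, the samplers below run in the order given by
+    // samplers_sequence; the default "kfypmt" yields
+    // top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp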
+ int32_t top_k = 40; // <= 0 to use vocab size + float top_p = 0.95f; // 1.0 = disabled + float min_p = 0.05f; // 0.0 = disabled + float tfs_z = 1.00f; // 1.0 = disabled + float typical_p = 1.00f; // 1.0 = disabled + float temp = 0.80f; // 1.0 = disabled + int32_t penalty_last_n = 64; // last n tokens to penalize (0 = disable penalty, -1 = context size) + float penalty_repeat = 1.10f; // 1.0 = disabled + float penalty_freq = 0.00f; // 0.0 = disabled + float penalty_present = 0.00f; // 0.0 = disabled + int32_t mirostat = 0; // 0 = disabled, 1 = mirostat, 2 = mirostat 2.0 + float mirostat_tau = 5.00f; // target entropy + float mirostat_eta = 0.10f; // learning rate + bool penalize_nl = true; // consider newlines as a repeatable token + std::string samplers_sequence = "kfypmt"; // top_k, tail_free, typical_p, top_p, min_p, temp std::string grammar; // optional BNF-like grammar to constrain sampling @@ -80,6 +81,9 @@ std::string llama_sampling_prev_str(llama_sampling_context * ctx_sampling, llama // Print sampling parameters into a string std::string llama_sampling_print(const llama_sampling_params & params); +// Print sampling order into a string +std::string llama_sampling_order_print(const llama_sampling_params & params); + // this is a common sampling function used across the examples for convenience // it can serve as a starting point for implementing your own sampling function // Note: When using multiple sequences, it is the caller's responsibility to call diff --git a/common/train.cpp b/common/train.cpp index 964b156b5abe4..773e2c59cc669 100644 --- a/common/train.cpp +++ b/common/train.cpp @@ -1136,6 +1136,7 @@ void print_common_train_usage(int /*argc*/, char ** /*argv*/, const struct train fprintf(stderr, " --adam-beta2 N AdamW beta2 in interval [0,1). How much to smooth the second moment of gradients. (default %f)\n", params->adam_beta2); fprintf(stderr, " --adam-gclip N AdamW gradient clipping. Disabled when zero. (default %f)\n", params->adam_gclip); fprintf(stderr, " --adam-epsf N AdamW epsilon for convergence test. Disabled when <= zero. 
(default %f)\n", params->adam_eps_f); + fprintf(stderr, " -ngl N, --n-gpu-layers N Number of model layers to offload to GPU (default %d)", params->n_gpu_layers); fprintf(stderr, "\n"); } @@ -1355,6 +1356,17 @@ bool consume_common_train_arg( return true; } params->adam_gclip = std::stof(argv[i]); + } else if (arg == "-ngl" || arg == "--n-gpu-layers") { + if (++i >= argc) { + *invalid_param = true; + return true; + } +#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD + params->n_gpu_layers = std::stoi(argv[i]); +#else + fprintf(stderr, "warning: not compiled with GPU offload support, --n-gpu-layers option will be ignored\n"); + fprintf(stderr, "warning: see main README.md for information on enabling GPU BLAS support\n"); +#endif } else if (arg == "-h" || arg == "--help") { params->print_usage = true; return true; diff --git a/convert-baichuan-hf-to-gguf.py b/convert-baichuan-hf-to-gguf.py deleted file mode 100755 index 789602351ca9d..0000000000000 --- a/convert-baichuan-hf-to-gguf.py +++ /dev/null @@ -1,317 +0,0 @@ -#!/usr/bin/env python3 -# HF baichuan --> gguf conversion - -from __future__ import annotations - -import argparse -import json -import os -import struct -import sys -from pathlib import Path -from typing import TYPE_CHECKING, Any -import itertools -import numpy as np -import torch -from sentencepiece import SentencePieceProcessor # type: ignore[import] - -if 'NO_LOCAL_GGUF' not in os.environ: - sys.path.insert(1, str(Path(__file__).parent / 'gguf-py')) -import gguf - - -if TYPE_CHECKING: - from typing import TypeAlias - -NDArray: TypeAlias = 'np.ndarray[Any, Any]' - -# reverse HF permute back to original pth layout - - -def reverse_hf_permute(weights: NDArray, n_head: int, n_kv_head: int | None = None) -> NDArray: - if n_kv_head is not None and n_head != n_kv_head: - n_head //= n_kv_head - - return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:]) - .swapaxes(1, 2) - .reshape(weights.shape)) - -def reverse_hf_permute_part(weights: NDArray, n_part: int, n_head: int, n_head_kv: int| None = None) -> NDArray: - r = weights.shape[0] // 3 - return (reverse_hf_permute(weights[r * n_part : r * n_part + r, ...], n_head, n_head_kv)) - -def reverse_hf_part(weights: NDArray, n_part: int) -> NDArray: - r = weights.shape[0] // 3 - return weights[r * n_part : r * n_part + r, ...] 
- -def count_model_parts(dir_model: str) -> int: - num_parts = 0 - - for filename in os.listdir(dir_model): - if filename.startswith("pytorch_model-"): - num_parts += 1 - - if num_parts > 0: - print("gguf: found " + str(num_parts) + " model parts") - - return num_parts - - - -def parse_args() -> argparse.Namespace: - parser = argparse.ArgumentParser(description="Convert a HuggingFace LLaMA model to a GGML compatible file") - parser.add_argument( - "--vocab-only", action="store_true", - help="extract only the vocab", - ) - parser.add_argument( - "--outfile", type=Path, - help="path to write to; default: based on input", - ) - parser.add_argument( - "model", type=Path, - help="directory containing model file, or model file itself (*.bin)", - ) - parser.add_argument( - "ftype", type=int, choices=[0, 1], default=1, nargs='?', - help="output format - use 0 for float32, 1 for float16", - ) - parser.add_argument("--bigendian", action="store_true", help="model is executed on big endian machine") - return parser.parse_args() - -args = parse_args() - -dir_model = args.model -ftype = args.ftype -if not dir_model.is_dir(): - print(f'Error: {args.model} is not a directory', file = sys.stderr) - sys.exit(1) - -endianess = gguf.GGUFEndian.LITTLE -if args.bigendian: - endianess = gguf.GGUFEndian.BIG -endianess_str = "Big Endian" if args.bigendian else "Little Endian" -print(f"gguf: Conversion Endianess {endianess}") -# possible tensor data types -# ftype == 0 -> float32 -# ftype == 1 -> float16 - -# map from ftype to string -ftype_str = ["f32", "f16"] - -if args.outfile is not None: - fname_out = args.outfile -else: - # output in the same directory as the model by default - fname_out = dir_model / f'ggml-model-{ftype_str[ftype]}.gguf' - -print("gguf: loading model "+dir_model.name) - -with open(dir_model / "config.json", "r", encoding="utf-8") as f: - hparams = json.load(f) -print("hello print: ",hparams["architectures"][0]) -if hparams["architectures"][0] != "BaichuanForCausalLM" and hparams["architectures"][0] != "BaiChuanForCausalLM": - print("Model architecture not supported: " + hparams["architectures"][0]) - - sys.exit() - -# get number of model parts -num_parts = count_model_parts(dir_model) -print(f"num_parts:{num_parts}\n") -ARCH=gguf.MODEL_ARCH.BAICHUAN -gguf_writer = gguf.GGUFWriter(fname_out, gguf.MODEL_ARCH_NAMES[ARCH], endianess=endianess) - -print("gguf: get model metadata") - -block_count = hparams["num_hidden_layers"] -head_count = hparams["num_attention_heads"] - -if "num_key_value_heads" in hparams: - head_count_kv = hparams["num_key_value_heads"] -else: - head_count_kv = head_count - -if "_name_or_path" in hparams: - hf_repo = hparams["_name_or_path"] -else: - hf_repo = "" - -if "max_sequence_length" in hparams: - ctx_length = hparams["max_sequence_length"] -elif "max_position_embeddings" in hparams: - ctx_length = hparams["max_position_embeddings"] -elif "model_max_length" in hparams: - ctx_length = hparams["model_max_length"] -else: - print("gguf: can not find ctx length parameter.") - - sys.exit() - - -gguf_writer.add_name(dir_model.name) -gguf_writer.add_source_hf_repo(hf_repo) -gguf_writer.add_tensor_data_layout("Meta AI original pth") -gguf_writer.add_context_length(ctx_length) -gguf_writer.add_embedding_length(hparams["hidden_size"]) -gguf_writer.add_block_count(block_count) -gguf_writer.add_feed_forward_length(hparams["intermediate_size"]) -gguf_writer.add_rope_dimension_count(hparams["hidden_size"] // hparams["num_attention_heads"]) -gguf_writer.add_head_count(head_count) 
-gguf_writer.add_head_count_kv(head_count_kv) -gguf_writer.add_layer_norm_rms_eps(hparams["rms_norm_eps"]) - -if "rope_scaling" in hparams and hparams["rope_scaling"] != None and "factor" in hparams["rope_scaling"]: - if "type" in hparams["rope_scaling"]: - if hparams["rope_scaling"]["type"] == "linear": - gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.LINEAR) - gguf_writer.add_rope_scaling_factor(hparams["rope_scaling"]["factor"]) - - -# TOKENIZATION - -print("gguf: get tokenizer metadata") - -tokens: list[bytes] = [] -scores: list[float] = [] -toktypes: list[int] = [] - -tokenizer_model_file = dir_model / 'tokenizer.model' -if not tokenizer_model_file.is_file(): - print(f'Error: Missing {tokenizer_model_file}', file = sys.stderr) - sys.exit(1) - -# vocab type sentencepiece -print("gguf: get sentencepiece tokenizer vocab, scores and token types") - -tokenizer = SentencePieceProcessor(str(tokenizer_model_file)) -vocab_size = hparams.get('vocab_size') -if vocab_size is None: - vocab_size = tokenizer.vocab_size() - -for i in range(vocab_size): - text: bytes - score: float - - piece = tokenizer.id_to_piece(i) - text = piece.encode("utf-8") - score = tokenizer.get_score(i) - - toktype = 1 # defualt to normal token type - if tokenizer.is_unknown(i): - toktype = 2 - if tokenizer.is_control(i): - toktype = 3 - - # toktype = 4 is user-defined = tokens from added_tokens.json - - if tokenizer.is_unused(i): - toktype = 5 - if tokenizer.is_byte(i): - toktype = 6 - - tokens.append(text) - scores.append(score) - toktypes.append(toktype) - -added_tokens_file = dir_model / 'added_tokens.json' -if added_tokens_file.is_file(): - with open(added_tokens_file, "r", encoding="utf-8") as f: - addtokens_json = json.load(f) - - print("gguf: get added tokens") - - for key in addtokens_json: - tokens.append( key.encode("utf-8") ) - scores.append(-1000.0) - toktypes.append(4) # user-defined token type - - -gguf_writer.add_tokenizer_model("llama") -gguf_writer.add_token_list(tokens) -gguf_writer.add_token_scores(scores) -gguf_writer.add_token_types(toktypes) - -special_vocab = gguf.SpecialVocab(dir_model, n_vocab = len(tokens)) -special_vocab.add_to_gguf(gguf_writer) - -# TENSORS - -tensor_map = gguf.get_tensor_name_map(ARCH,block_count) - -# tensor info -print("gguf: get tensor metadata") - -if num_parts == 0: - part_names = iter(("pytorch_model.bin",)) -else: - part_names = ( - f"pytorch_model-{n:05}-of-{num_parts:05}.bin" for n in range(1, num_parts + 1) - ) - - -for part_name in part_names: - if args.vocab_only: - break - print("gguf: loading model part '" + part_name + "'") - model_part = torch.load(f"{dir_model}/{part_name}", map_location="cpu") - - tmp=model_part - for i in range(block_count): - if f"model.layers.{i}.self_attn.W_pack.weight" in model_part: - print(f"Unpacking and permuting layer {i}") - tmp[f"model.layers.{i}.self_attn.q_proj.weight"]=reverse_hf_permute_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"],0,head_count,head_count) - tmp[f"model.layers.{i}.self_attn.k_proj.weight"]=reverse_hf_permute_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"],1,head_count,head_count_kv) - tmp[f"model.layers.{i}.self_attn.v_proj.weight"]=reverse_hf_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"],2) - del tmp[f"model.layers.{i}.self_attn.W_pack.weight"] - - for name in model_part.keys(): - data = model_part[name] - # we don't need these - if name.endswith(".rotary_emb.inv_freq"): - continue - - old_dtype = data.dtype - - # convert any unsupported data types to float32 
- if data.dtype != torch.float16 and data.dtype != torch.float32: - data = data.to(torch.float32) - - data = data.squeeze().numpy() - - # map tensor names - new_name = tensor_map.get_name(name, try_suffixes = (".weight", ".bias")) - if new_name is None: - print("Can not map tensor '" + name + "'") - sys.exit() - - n_dims = len(data.shape) - data_dtype = data.dtype - - # if f32 desired, convert any float16 to float32 - if ftype == 0 and data_dtype == np.float16: - data = data.astype(np.float32) - - # TODO: Why cant we use these float16 as-is? There should be not reason to store float16 as float32 - if ftype == 1 and data_dtype == np.float16 and n_dims == 1: - data = data.astype(np.float32) - - # if f16 desired, convert any float32 2-dim weight tensors to float16 - if ftype == 1 and data_dtype == np.float32 and name.endswith(".weight") and n_dims == 2: - data = data.astype(np.float16) - - print(name + " -> " + new_name + ", n_dims = " + str(n_dims) + ", " + str(old_dtype) + " --> " + str(data.dtype)) - gguf_writer.add_tensor(new_name, data) - - -print("gguf: write header") -gguf_writer.write_header_to_file() -print("gguf: write metadata") -gguf_writer.write_kv_data_to_file() -if not args.vocab_only: - print("gguf: write tensors") - gguf_writer.write_tensors_to_file() - -gguf_writer.close() - -print(f"gguf: model successfully exported to '{fname_out}'") -print("") diff --git a/convert-hf-to-gguf.py b/convert-hf-to-gguf.py index e7db7591260af..e46a7813a78e9 100755 --- a/convert-hf-to-gguf.py +++ b/convert-hf-to-gguf.py @@ -10,7 +10,7 @@ import sys from enum import IntEnum from pathlib import Path -from typing import TYPE_CHECKING, Any, ContextManager, Iterator, cast +from typing import TYPE_CHECKING, Any, ContextManager, Iterator, cast, Optional import numpy as np import torch @@ -59,7 +59,7 @@ def get_tensors(self) -> Iterator[tuple[str, Tensor]]: from safetensors import safe_open ctx = cast(ContextManager[Any], safe_open(self.dir_model / part_name, framework="pt", device="cpu")) else: - ctx = contextlib.nullcontext(torch.load(self.dir_model / part_name, map_location="cpu")) + ctx = contextlib.nullcontext(torch.load(str(self.dir_model / part_name), map_location="cpu", mmap=True, weights_only=True)) with ctx as model_part: for name in model_part.keys(): @@ -77,8 +77,18 @@ def set_gguf_parameters(self): self.gguf_writer.add_embedding_length(n_embd) if (n_ff := self.hparams.get("intermediate_size")) is not None: self.gguf_writer.add_feed_forward_length(n_ff) - if (n_head := self.hparams.get("num_attention_head")) is not None: + if (n_head := self.hparams.get("num_attention_heads")) is not None: self.gguf_writer.add_head_count(n_head) + if (n_head_kv := self.hparams.get("num_key_value_heads")) is not None: + self.gguf_writer.add_head_count_kv(n_head_kv) + + if (n_rms_eps := self.hparams.get("rms_norm_eps")) is not None: + self.gguf_writer.add_layer_norm_rms_eps(n_rms_eps) + if (n_experts := self.hparams.get("num_local_experts")) is not None: + self.gguf_writer.add_expert_count(n_experts) + if (n_experts_used := self.hparams.get("num_experts_per_tok")) is not None: + self.gguf_writer.add_expert_used_count(n_experts_used) + self.gguf_writer.add_parallel_residual(self.hparams.get("use_parallel_residual", True)) def write_tensors(self): @@ -168,6 +178,10 @@ def from_model_architecture(model_architecture): return PersimmonModel if model_architecture in ("StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"): return StableLMModel + if model_architecture == "QWenLMHeadModel": + return QwenModel + 
if model_architecture == "MixtralForCausalLM": + return MixtralModel return Model def _is_model_safetensors(self) -> bool: @@ -193,7 +207,7 @@ def _get_model_architecture(self) -> gguf.MODEL_ARCH: return gguf.MODEL_ARCH.MPT if arch in ("BaichuanForCausalLM", "BaiChuanForCausalLM"): return gguf.MODEL_ARCH.BAICHUAN - if arch == "FalconForCausalLM": + if arch in ("FalconForCausalLM", "RWForCausalLM"): return gguf.MODEL_ARCH.FALCON if arch == "GPTBigCodeForCausalLM": return gguf.MODEL_ARCH.STARCODER @@ -203,6 +217,10 @@ def _get_model_architecture(self) -> gguf.MODEL_ARCH: return gguf.MODEL_ARCH.PERSIMMON if arch in ("StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"): return gguf.MODEL_ARCH.STABLELM + if arch == "QWenLMHeadModel": + return gguf.MODEL_ARCH.QWEN + if arch == "MixtralForCausalLM": + return gguf.MODEL_ARCH.LLAMA raise NotImplementedError(f'Architecture "{arch}" not supported!') @@ -827,13 +845,144 @@ def set_gguf_parameters(self): self.gguf_writer.add_embedding_length(hparams["hidden_size"]) self.gguf_writer.add_block_count(block_count) self.gguf_writer.add_feed_forward_length(hparams["intermediate_size"]) - self.gguf_writer.add_rope_dimension_count(int(hparams["rope_pct"]*(hparams["hidden_size"] // hparams["num_attention_heads"]))) + self.gguf_writer.add_rope_dimension_count(int(hparams["rope_pct"] * (hparams["hidden_size"] // hparams["num_attention_heads"]))) self.gguf_writer.add_head_count(hparams["num_attention_heads"]) self.gguf_writer.add_parallel_residual(hparams["use_parallel_residual"] if "use_parallel_residual" in hparams else True) self.gguf_writer.add_layer_norm_eps(1e-5) + +class MixtralModel(Model): + def set_vocab(self): + self._set_vocab_sentencepiece() + + +class QwenModel(Model): + @staticmethod + def token_bytes_to_string(b): + from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode + byte_encoder = bytes_to_unicode() + return ''.join([byte_encoder[ord(char)] for char in b.decode('latin-1')]) + + @staticmethod + def bpe(mergeable_ranks: dict[bytes, int], token: bytes, max_rank: Optional[int] = None) -> list[bytes]: + parts = [bytes([b]) for b in token] + while True: + min_idx = None + min_rank = None + for i, pair in enumerate(zip(parts[:-1], parts[1:])): + rank = mergeable_ranks.get(pair[0] + pair[1]) + if rank is not None and (min_rank is None or rank < min_rank): + min_idx = i + min_rank = rank + if min_rank is None or (max_rank is not None and min_rank >= max_rank): + break + assert min_idx is not None + parts = parts[:min_idx] + [parts[min_idx] + parts[min_idx + 1]] + parts[min_idx + 2:] + return parts + + def set_vocab(self): + dir_model = self.dir_model + hparams = self.hparams + tokens: list[bytearray] = [] + toktypes: list[int] = [] + + from transformers import AutoTokenizer # type: ignore[attr-defined] + tokenizer = AutoTokenizer.from_pretrained(dir_model, trust_remote_code=True) + vocab_size = hparams["vocab_size"] + assert max(tokenizer.get_vocab().values()) < vocab_size + + merges = [] + vocab = {} + mergeable_ranks = tokenizer.mergeable_ranks + for token, rank in mergeable_ranks.items(): + vocab[self.token_bytes_to_string(token)] = rank + if len(token) == 1: + continue + merged = QwenModel.bpe(mergeable_ranks, token, max_rank=rank) + assert len(merged) == 2 + merges.append(' '.join(map(self.token_bytes_to_string, merged))) + + reverse_vocab = {id_ : encoded_tok for encoded_tok, id_ in vocab.items()} + added_vocab = tokenizer.special_tokens + + for i in range(vocab_size): + if i not in reverse_vocab: + pad_token = 
f"[PAD{i}]".encode("utf-8") + tokens.append(bytearray(pad_token)) + toktypes.append(gguf.TokenType.USER_DEFINED) + elif reverse_vocab[i] in added_vocab: + tokens.append(reverse_vocab[i]) + toktypes.append(gguf.TokenType.CONTROL) + else: + tokens.append(reverse_vocab[i]) + toktypes.append(gguf.TokenType.NORMAL) + + self.gguf_writer.add_tokenizer_model("gpt2") + self.gguf_writer.add_token_list(tokens) + self.gguf_writer.add_token_types(toktypes) + + special_vocab = gguf.SpecialVocab(dir_model, load_merges=False) + special_vocab.merges = merges + special_vocab._set_special_token("bos", tokenizer.special_tokens["<|endoftext|>"]) + special_vocab._set_special_token("eos", tokenizer.special_tokens["<|endoftext|>"]) + special_vocab._set_special_token("unk", tokenizer.special_tokens["<|endoftext|>"]) + special_vocab.add_to_gguf(self.gguf_writer) + + def set_gguf_parameters(self): + self.gguf_writer.add_name("Qwen") + self.gguf_writer.add_context_length(self.hparams["max_position_embeddings"]) + self.gguf_writer.add_block_count(self.hparams["num_hidden_layers"]) + self.gguf_writer.add_embedding_length(self.hparams["hidden_size"]) + self.gguf_writer.add_feed_forward_length(self.hparams["intermediate_size"]) + self.gguf_writer.add_rope_freq_base(self.hparams["rotary_emb_base"]) + self.gguf_writer.add_rope_dimension_count(self.hparams["hidden_size"] // self.hparams["num_attention_heads"]) + self.gguf_writer.add_head_count(self.hparams["num_attention_heads"]) + self.gguf_writer.add_layer_norm_rms_eps(self.hparams["layer_norm_epsilon"]) + + def write_tensors(self): + block_count = self.hparams["num_hidden_layers"] + model_kv = dict(self.get_tensors()) + tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count) + for name, data_torch in model_kv.items(): + # we don't need these + if name.endswith(".rotary_emb.inv_freq"): + continue + + old_dtype = data_torch.dtype + + # convert any unsupported data types to float32 + if data_torch.dtype not in (torch.float16, torch.float32): + data_torch = data_torch.to(torch.float32) + + data = data_torch.squeeze().numpy() + + # map tensor names + new_name = tensor_map.get_name(name, try_suffixes=(".weight", ".bias")) + if new_name is None: + print(f"Can not map tensor {name!r}") + sys.exit() + + n_dims = len(data.shape) + data_dtype = data.dtype + + # if f32 desired, convert any float16 to float32 + if self.ftype == 0 and data_dtype == np.float16: + data = data.astype(np.float32) + + # TODO: Why cant we use these float16 as-is? 
There should be not reason to store float16 as float32 + if self.ftype == 1 and data_dtype == np.float16 and n_dims == 1: + data = data.astype(np.float32) + + # if f16 desired, convert any float32 2-dim weight tensors to float16 + if self.ftype == 1 and data_dtype == np.float32 and name.endswith(".weight") and n_dims == 2: + data = data.astype(np.float16) + + print(f"{new_name}, n_dims = {n_dims}, {old_dtype} --> {data.dtype}") + self.gguf_writer.add_tensor(new_name, data) + ###### CONVERSION LOGIC ###### + def parse_args() -> argparse.Namespace: parser = argparse.ArgumentParser(description="Convert a huggingface model to a GGML compatible file") parser.add_argument( @@ -879,20 +1028,21 @@ def parse_args() -> argparse.Namespace: hparams = Model.load_hparams(dir_model) -model_class = Model.from_model_architecture(hparams["architectures"][0]) -model_instance = model_class(dir_model, ftype_map[args.outtype], fname_out, args.bigendian) +with torch.inference_mode(): + model_class = Model.from_model_architecture(hparams["architectures"][0]) + model_instance = model_class(dir_model, ftype_map[args.outtype], fname_out, args.bigendian) -print("Set model parameters") -model_instance.set_gguf_parameters() + print("Set model parameters") + model_instance.set_gguf_parameters() -print("Set model tokenizer") -model_instance.set_vocab() + print("Set model tokenizer") + model_instance.set_vocab() -if args.vocab_only: - print(f"Exporting model vocab to '{fname_out}'") - model_instance.write_vocab() -else: - print(f"Exporting model to '{fname_out}'") - model_instance.write() + if args.vocab_only: + print(f"Exporting model vocab to '{fname_out}'") + model_instance.write_vocab() + else: + print(f"Exporting model to '{fname_out}'") + model_instance.write() -print(f"Model successfully exported to '{fname_out}'") + print(f"Model successfully exported to '{fname_out}'") diff --git a/convert-llama-ggml-to-gguf.py b/convert-llama-ggml-to-gguf.py index d898d81c4c445..e359330afc51f 100755 --- a/convert-llama-ggml-to-gguf.py +++ b/convert-llama-ggml-to-gguf.py @@ -2,7 +2,6 @@ from __future__ import annotations import argparse -import math import struct import sys from enum import IntEnum @@ -15,11 +14,13 @@ sys.path.insert(1, str(Path(__file__).parent / 'gguf-py')) import gguf + class GGMLFormat(IntEnum): GGML = 0 GGMF = 1 GGJT = 2 + class GGMLFType(IntEnum): ALL_F32 = 0 MOSTLY_F16 = 1 @@ -39,6 +40,7 @@ class GGMLFType(IntEnum): MOSTLY_Q5_K_M = 17 MOSTLY_Q6_K = 18 + class Hyperparameters: def __init__(self): self.n_vocab = self.n_embd = self.n_mult = self.n_head = 0 @@ -70,6 +72,7 @@ def load(self, data, offset): def __str__(self): return f'' + class Vocab: def __init__(self, load_scores = True): self.items = [] @@ -91,6 +94,7 @@ def load(self, data, offset, n_vocab): self.items.append((item_text, item_score)) return offset - orig_offset + class Tensor: def __init__(self, use_padding = True): self.name = None @@ -124,6 +128,7 @@ def load(self, data, offset): # print(n_dims, name_len, dtype, self.dims, self.name, pad) return offset - orig_offset + class GGMLModel: def __init__(self): self.hyperparameters = None @@ -160,8 +165,8 @@ def validate_conversion(self, ftype): if ftype not in (GGMLFType.ALL_F32, GGMLFType.MOSTLY_F16): err = 'Quantizations changed in GGJTv2. Can only convert unquantized GGML files older than GGJTv2.' 
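# e.g. a GGJTv2 MOSTLY_Q4_0 file is rejected below because the Q4/Q8 block layouts
# changed again in GGJTv3, while ALL_F32 / MOSTLY_F16 files of the same version pass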
elif (self.file_format == GGMLFormat.GGJT and self.format_version == 2): - if ftype in ( GGMLFType.MOSTLY_Q4_0, GGMLFType.MOSTLY_Q4_1, - GGMLFType.MOSTLY_Q4_1_SOME_F16, GGMLFType.MOSTLY_Q8_0): + if ftype in (GGMLFType.MOSTLY_Q4_0, GGMLFType.MOSTLY_Q4_1, + GGMLFType.MOSTLY_Q4_1_SOME_F16, GGMLFType.MOSTLY_Q8_0): err = 'Q4 and Q8 quantizations changed in GGJTv3.' if len(err) > 0: raise ValueError(f'{err} Sorry, your {self.file_format.name}v{self.format_version} file of type {ftype.name} is not eligible for conversion.') @@ -188,6 +193,7 @@ def load(self, data, offset): hp.set_n_ff(self) return offset + class GGMLToGGUF: def __init__(self, ggml_model, data, cfg, params_override = None, vocab_override = None, special_vocab = None): hp = ggml_model.hyperparameters @@ -218,7 +224,7 @@ def save(self): gguf_writer = gguf.GGUFWriter( self.cfg.output, gguf.MODEL_ARCH_NAMES[gguf.MODEL_ARCH.LLAMA], - use_temp_file = False ) + use_temp_file = False) self.add_params(gguf_writer) self.add_vocab(gguf_writer) if self.special_vocab is not None: @@ -342,7 +348,8 @@ def add_tensors(self, gguf_writer): mapped_name, data[tensor.start_offset:tensor.start_offset + tensor.len_bytes], raw_shape = tempdims, - raw_dtype = tensor.dtype ) + raw_dtype = tensor.dtype) + def handle_metadata(cfg, hp): import convert @@ -366,38 +373,40 @@ def handle_metadata(cfg, hp): raise ValueError('Unable to load metadata') vocab = convert.load_vocab( cfg.vocab_dir if cfg.vocab_dir is not None else cfg.model_metadata_dir, - cfg.vocabtype ) + cfg.vocabtype) # FIXME: Respect cfg.vocab_dir? svocab = gguf.SpecialVocab(cfg.model_metadata_dir, - load_merges = cfg.vocabtype == 'bpe', - n_vocab = vocab.vocab_size) + load_merges = cfg.vocabtype == 'bpe', + n_vocab = vocab.vocab_size) convert.check_vocab_size(params, vocab) return (params, vocab, svocab) + def handle_args(): parser = argparse.ArgumentParser(description = 'Convert GGML models to GGUF') parser.add_argument('--input', '-i', type = Path, required = True, - help = 'Input GGMLv3 filename') + help = 'Input GGMLv3 filename') parser.add_argument('--output', '-o', type = Path, required = True, - help ='Output GGUF filename') + help ='Output GGUF filename') parser.add_argument('--name', - help = 'Set model name') + help = 'Set model name') parser.add_argument('--desc', - help = 'Set model description') + help = 'Set model description') parser.add_argument('--gqa', type = int, default = 1, - help = 'grouped-query attention factor (use 8 for LLaMA2 70B)') + help = 'grouped-query attention factor (use 8 for LLaMA2 70B)') parser.add_argument('--eps', default = '5.0e-06', - help = 'RMS norm eps: Use 1e-6 for LLaMA1 and OpenLLaMA, use 1e-5 for LLaMA2') + help = 'RMS norm eps: Use 1e-6 for LLaMA1 and OpenLLaMA, use 1e-5 for LLaMA2') parser.add_argument('--context-length', '-c', type=int, default = 2048, - help = 'Default max context length: LLaMA1 is typically 2048, LLaMA2 is typically 4096') + help = 'Default max context length: LLaMA1 is typically 2048, LLaMA2 is typically 4096') parser.add_argument('--model-metadata-dir', '-m', type = Path, - help ='Load HuggingFace/.pth vocab and metadata from the specified directory') + help ='Load HuggingFace/.pth vocab and metadata from the specified directory') parser.add_argument("--vocab-dir", type=Path, - help="directory containing tokenizer.model, if separate from model file - only meaningful with --model-metadata-dir") + help="directory containing tokenizer.model, if separate from model file - only meaningful with --model-metadata-dir") 
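# typical invocation (illustrative paths):
#   python convert-llama-ggml-to-gguf.py --input model.ggmlv3.bin --output model.gguf --model-metadata-dir ./hf-model-dir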
parser.add_argument("--vocabtype", choices=["spm", "bpe"], default="spm", - help="vocab format - only meaningful with --model-metadata-dir and/or --vocab-dir (default: spm)") + help="vocab format - only meaningful with --model-metadata-dir and/or --vocab-dir (default: spm)") return parser.parse_args() + def main(): cfg = handle_args() print(f'* Using config: {cfg}') @@ -407,7 +416,7 @@ def main(): data = np.memmap(cfg.input, mode = 'r') model = GGMLModel() print('* Scanning GGML input file') - offset = model.load(data, 0) + offset = model.load(data, 0) # noqa print(f'* GGML model hyperparameters: {model.hyperparameters}') vocab_override = None params_override = None @@ -422,12 +431,15 @@ def main(): print('\n=== WARNING === Special tokens may not be converted correctly. Use --model-metadata-dir if possible === WARNING ===\n') if model.file_format == GGMLFormat.GGML: print('! This is a very old GGML file that does not contain vocab scores. Strongly recommend using model metadata!') - converter = GGMLToGGUF(model, data, cfg, + converter = GGMLToGGUF( + model, data, cfg, params_override = params_override, vocab_override = vocab_override, - special_vocab = special_vocab ) + special_vocab = special_vocab + ) converter.save() print(f'* Successful completion. Output saved to: {cfg.output}') + if __name__ == '__main__': main() diff --git a/convert-persimmon-to-gguf.py b/convert-persimmon-to-gguf.py index 240f87306e578..206b7d5ff9e31 100644 --- a/convert-persimmon-to-gguf.py +++ b/convert-persimmon-to-gguf.py @@ -9,6 +9,7 @@ sys.path.insert(1, str(Path(__file__).parent / 'gguf-py')) import gguf + def _flatten_dict(dct, tensors, prefix=None): assert isinstance(dct, dict) for key in dct.keys(): @@ -21,6 +22,7 @@ def _flatten_dict(dct, tensors, prefix=None): raise ValueError(type(dct[key])) return None + def _get_sentencepiece_tokenizer_info(dir_model: Path): tokenizer_path = dir_model / 'adept_vocab.model' print('gguf: getting sentencepiece tokenizer from', tokenizer_path) @@ -54,6 +56,7 @@ def _get_sentencepiece_tokenizer_info(dir_model: Path): pass return tokens, scores, toktypes + def main(): parser = argparse.ArgumentParser(description="Convert a Persimmon model from Adept (e.g. 
Persimmon 8b chat) to a GGML compatible file") parser.add_argument("--outfile", type=Path, help="path to write to; default: based on input") @@ -125,6 +128,5 @@ def main(): print("") - if __name__ == '__main__': main() diff --git a/convert.py b/convert.py index 3d6216f1d4e7a..e4b69d172f728 100755 --- a/convert.py +++ b/convert.py @@ -42,10 +42,12 @@ ARCH = gguf.MODEL_ARCH.LLAMA DEFAULT_CONCURRENCY = 8 + # # data types # + @dataclass(frozen=True) class DataType: name: str @@ -55,14 +57,17 @@ class DataType: def elements_to_bytes(self, n_elements: int) -> int: return n_elements * self.dtype.itemsize + @dataclass(frozen=True) class UnquantizedDataType(DataType): pass -DT_F16 = UnquantizedDataType('F16', dtype = np.dtype(np.float16), valid_conversions = ['F32', 'Q8_0']) -DT_F32 = UnquantizedDataType('F32', dtype = np.dtype(np.float32), valid_conversions = ['F16', 'Q8_0']) -DT_I32 = UnquantizedDataType('I32', dtype = np.dtype(np.int16), valid_conversions = []) -DT_BF16 = UnquantizedDataType('BF16', dtype = np.dtype(np.uint16), valid_conversions = ['F32', 'F16', 'Q8_0']) + +DT_F16 = UnquantizedDataType('F16', dtype = np.dtype(np.float16), valid_conversions = ['F32', 'Q8_0']) +DT_F32 = UnquantizedDataType('F32', dtype = np.dtype(np.float32), valid_conversions = ['F16', 'Q8_0']) +DT_I32 = UnquantizedDataType('I32', dtype = np.dtype(np.int16), valid_conversions = []) +DT_BF16 = UnquantizedDataType('BF16', dtype = np.dtype(np.uint16), valid_conversions = ['F32', 'F16', 'Q8_0']) + @dataclass(frozen=True) class QuantizedDataType(DataType): @@ -77,6 +82,7 @@ def elements_to_bytes(self, n_elements: int) -> int: assert n_elements % self.block_size == 0, f'Invalid number of elements {n_elements} for {self.name} with block size {self.block_size}' return self.quantized_dtype.itemsize * (n_elements // self.block_size) + @dataclass(frozen=True) class Q8_0QuantizedDataType(QuantizedDataType): # Mini Q8_0 quantization in Python! @@ -86,6 +92,7 @@ def quantize(self, arr: NDArray) -> NDArray: n_blocks = arr.size // self.block_size blocks = arr.reshape((n_blocks, self.block_size)) # Much faster implementation of block quantization contributed by @Cebtenzzre + def quantize_blocks_q8_0(blocks: NDArray) -> Iterable[tuple[Any, Any]]: d = abs(blocks).max(axis = 1) / np.float32(127) with np.errstate(divide = 'ignore'): @@ -94,10 +101,11 @@ def quantize_blocks_q8_0(blocks: NDArray) -> Iterable[tuple[Any, Any]]: yield from zip(d, qs) return np.fromiter(quantize_blocks_q8_0(blocks), count = n_blocks, dtype = self.quantized_dtype) + DT_Q8_0 = Q8_0QuantizedDataType('Q8_0', - dtype = np.dtype(np.float32), valid_conversions = [], - ggml_type = gguf.GGMLQuantizationType.Q8_0, block_size = 32, - quantized_dtype = np.dtype([('d', ' Iterable[tuple[Any, Any]]: # TODO: match this with `llama_ftype` # TODO: rename to LLAMAFileType # TODO: move to `gguf.py` + + class GGMLFileType(enum.IntEnum): AllF32 = 0 MostlyF16 = 1 # except 1d tensors @@ -128,6 +138,7 @@ def type_for_tensor(self, name: str, tensor: LazyTensor) -> DataType: # 1D tensors are always F32. 
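# (norm weights and biases, for example; keeping such tiny 1-D tensors in F32
# costs almost nothing and avoids precision loss)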
return dt if len(tensor.shape) > 1 else DT_F32 + GGML_FILE_TYPE_TO_DATA_TYPE: dict[GGMLFileType, DataType] = { GGMLFileType.AllF32 : DT_F32, GGMLFileType.MostlyF16 : DT_F16, @@ -138,16 +149,19 @@ def type_for_tensor(self, name: str, tensor: LazyTensor) -> DataType: # hparams loading # + @dataclass class Params: - n_vocab: int - n_embd: int - n_layer: int - n_ctx: int - n_ff: int - n_head: int - n_head_kv: int - f_norm_eps: float + n_vocab: int + n_embd: int + n_layer: int + n_ctx: int + n_ff: int + n_head: int + n_head_kv: int + n_experts: int | None = None + n_experts_used: int | None = None + f_norm_eps: float | None = None rope_scaling_type: gguf.RopeScalingType | None = None f_rope_freq_base: float | None = None @@ -167,11 +181,11 @@ def guessed(model: LazyModel) -> Params: # try transformer naming first if "model.layers.0.self_attn.q_proj.weight" in model: - n_layer=next(i for i in itertools.count() if f"model.layers.{i}.self_attn.q_proj.weight" not in model) + n_layer = next(i for i in itertools.count() if f"model.layers.{i}.self_attn.q_proj.weight" not in model) elif "model.layers.0.self_attn.W_pack.weight" in model: # next: try baichuan naming - n_layer=next(i for i in itertools.count() if f"model.layers.{i}.self_attn.W_pack.weight" not in model) + n_layer = next(i for i in itertools.count() if f"model.layers.{i}.self_attn.W_pack.weight" not in model) else: - n_layer=next(i for i in itertools.count() if f"layers.{i}.attention.wq.weight" not in model) + n_layer = next(i for i in itertools.count() if f"layers.{i}.attention.wq.weight" not in model) if n_layer < 1: raise Exception("failed to guess 'n_layer'. This model is unknown or unsupported.\n" @@ -222,6 +236,13 @@ def loadHFTransformerJson(model: LazyModel, config_path: Path) -> Params: raise Exception("failed to guess 'n_ctx'. 
This model is unknown or unsupported.\n" "Suggestion: provide 'config.json' of the model in the same directory containing model files.") + n_experts = None + n_experts_used = None + + if "num_local_experts" in config: + n_experts = config["num_local_experts"] + n_experts_used = config["num_experts_per_tok"] + return Params( n_vocab = config["vocab_size"], n_embd = config["hidden_size"], @@ -230,6 +251,8 @@ def loadHFTransformerJson(model: LazyModel, config_path: Path) -> Params: n_ff = config["intermediate_size"], n_head = (n_head := config["num_attention_heads"]), n_head_kv = config.get("num_key_value_heads", n_head), + n_experts = n_experts, + n_experts_used = n_experts_used, f_norm_eps = config["rms_norm_eps"], f_rope_freq_base = config.get("rope_theta"), rope_scaling_type = rope_scaling_type, @@ -244,8 +267,15 @@ def loadHFTransformerJson(model: LazyModel, config_path: Path) -> Params: def loadOriginalParamsJson(model: LazyModel, config_path: Path) -> Params: config = json.load(open(config_path)) + n_experts = None + n_experts_used = None + f_rope_freq_base = None + # hack to determine LLaMA v1 vs v2 vs CodeLlama - if config.get("rope_theta") == 1000000: + if config.get("moe"): + # Mixtral + n_ctx = 32768 + elif config.get("rope_theta") == 1000000: # CodeLlama n_ctx = 16384 elif config["norm_eps"] == 1e-05: @@ -255,16 +285,27 @@ def loadOriginalParamsJson(model: LazyModel, config_path: Path) -> Params: # LLaMA v1 n_ctx = 2048 + if "layers.0.feed_forward.w1.weight" in model: + n_ff = model["layers.0.feed_forward.w1.weight"].shape[0] + + if config.get("moe"): + n_ff = model["layers.0.feed_forward.experts.0.w1.weight"].shape[0] + n_experts = config["moe"]["num_experts"] + n_experts_used = config["moe"]["num_experts_per_tok"] + f_rope_freq_base = 1e6 + return Params( - n_vocab = config.get("vocab_size", model["tok_embeddings.weight"].shape[0]), + n_vocab = model["tok_embeddings.weight"].shape[0], n_embd = config["dim"], n_layer = config["n_layers"], n_ctx = n_ctx, - n_ff = model["layers.0.feed_forward.w1.weight"].shape[0], + n_ff = n_ff, n_head = (n_head := config["n_heads"]), n_head_kv = config.get("n_kv_heads", n_head), + n_experts = n_experts, + n_experts_used = n_experts_used, f_norm_eps = config["norm_eps"], - f_rope_freq_base = config.get("rope_theta"), + f_rope_freq_base = config.get("rope_theta", f_rope_freq_base), ) @staticmethod @@ -308,7 +349,7 @@ def __init__(self, fname_tokenizer: Path, fname_added_tokens: Path | None) -> No (item['content'], item['id']) for item in tokenizer_json.get('added_tokens', []) # Added tokens here can be duplicates of the main vocabulary. - if item['content'] not in self.bpe_tokenizer ) + if item['content'] not in self.bpe_tokenizer) vocab_size: int = len(self.bpe_tokenizer) expected_ids = list(range(vocab_size, vocab_size + len(added_tokens))) @@ -326,7 +367,6 @@ def __init__(self, fname_tokenizer: Path, fname_added_tokens: Path | None) -> No def bpe_tokens(self) -> Iterable[tuple[bytes, float, gguf.TokenType]]: tokenizer = self.bpe_tokenizer - from transformers.models.gpt2 import tokenization_gpt2 reverse_vocab = {id: encoded_tok for encoded_tok, id in tokenizer.items()} for i, _ in enumerate(tokenizer): @@ -406,6 +446,7 @@ def all_tokens(self) -> Iterable[tuple[bytes, float, gguf.TokenType]]: def __repr__(self) -> str: return f"" + Vocab: TypeAlias = 'BpeVocab | SentencePieceVocab' # @@ -413,13 +454,14 @@ def __repr__(self) -> str: # TODO: reuse (probably move to gguf.py?) 
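# permute() below repacks Q/K projection weights between the HF checkpoint layout
# and the layout expected here: per attention head it regroups the two rotary
# halves via reshape/swapaxes, a pure permutation that changes no values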
@@ -413,13 +454,14 @@ def __repr__(self) -> str:
 # TODO: reuse (probably move to gguf.py?)
 #

+
 def permute(weights: NDArray, n_head: int, n_head_kv: int) -> NDArray:
-    #print( "permute debug " + str(weights.shape[0]) + " x " + str(weights.shape[1]) + " nhead " + str(n_head) + " nheadkv " + str(n_kv_head) )
+    # print( "permute debug " + str(weights.shape[0]) + " x " + str(weights.shape[1]) + " nhead " + str(n_head) + " nheadkv " + str(n_kv_head) )
     if n_head_kv is not None and n_head != n_head_kv:
         n_head = n_head_kv
     return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
-                   .swapaxes(1, 2)
-                   .reshape(weights.shape))
+            .swapaxes(1, 2)
+            .reshape(weights.shape))


 class Tensor(metaclass=ABCMeta):
@@ -500,7 +542,7 @@ def load(self) -> Tensor:
         ret = self._load()
         # Should be okay if it maps to the same numpy type?
         assert ret.data_type == self.data_type or (self.data_type.dtype == ret.data_type.dtype), \
-                (self.data_type, ret.data_type, self.description)
+            (self.data_type, ret.data_type, self.description)
         return ret

     def astype(self, data_type: DataType) -> LazyTensor:
@@ -573,7 +615,7 @@ def merge_multifile_models(models_plus: list[ModelPlus]) -> ModelPlus:

     if any("model.embed_tokens.weight" in mp.model for mp in models_plus):
         # Transformers models put different tensors in different files, but
-        # don't split indivdual tensors between files.
+        # don't split individual tensors between files.
         model: LazyModel = {}
         for mp in models_plus:
             model.update(mp.model)
@@ -588,6 +630,7 @@ def load() -> Tensor:
         return lazy_tensor.load().permute(n_head, n_head_kv)
     return LazyTensor(load, lazy_tensor.shape, lazy_tensor.data_type, f'permute({n_head}, {n_head_kv}) ' + lazy_tensor.description)

+
 def permute_part_lazy(lazy_tensor: LazyTensor, n_part: int, n_head: int, n_head_kv: int) -> LazyTensor:
     def load() -> Tensor:
         return lazy_tensor.load().permute_part(n_part, n_head, n_head_kv)
@@ -595,6 +638,7 @@ def load() -> Tensor:
     s[0] = s[0] // 3
     return LazyTensor(load, s, lazy_tensor.data_type, f'permute({n_head}, {n_head_kv}) ' + lazy_tensor.description)

+
 def part_lazy(lazy_tensor: LazyTensor, n_part: int) -> LazyTensor:
     def load() -> Tensor:
         return lazy_tensor.load().part(n_part)
@@ -664,7 +708,7 @@ def rebuild_from_type_v2(func, new_type, args, state):
         return func(*args)

     CLASSES: dict[tuple[str, str], Any] = {
-        # getattr used here as a workaround for mypy not being smart enough to detrmine
+        # getattr used here as a workaround for mypy not being smart enough to determine
         # the staticmethods have a __func__ attribute.
         ('torch._tensor', '_rebuild_from_type_v2'): getattr(rebuild_from_type_v2, '__func__'),
         ('torch._utils', '_rebuild_tensor_v2'): getattr(lazy_rebuild_tensor_v2, '__func__'),
@@ -690,6 +734,7 @@ def lazy_load_torch_file(outer_fp: IO[bytes], path: Path) -> ModelPlus:
                              data_base_path=pickle_paths[0][:-4],
                              zip_file=zf)
     model = unpickler.load()
+    if 'model' in model: model = model['model']
     as_dict = dict(model.items())
     return ModelPlus(model=as_dict, paths=[path], format='torch', vocab=None)
@@ -743,6 +788,7 @@ def lazy_load_file(path: Path) -> ModelPlus:

 In = TypeVar('In')
 Out = TypeVar('Out')

+
 def bounded_parallel_map(func: Callable[[In], Out], iterable: Iterable[In], concurrency: int, max_workers: int | None = None, use_processpool_executor: bool = False) -> Iterable[Out]:
     '''Parallel map, but with backpressure. If the caller doesn't call `next`
     fast enough, this will stop calling `func` at some point rather than
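The backpressure idea behind bounded_parallel_map can be shown with plain concurrent.futures: keep at most `concurrency` tasks in flight and submit a new one only after the consumer pulls a result. This is an illustrative sketch of the concept under those assumptions, not the implementation above:

    import concurrent.futures
    import itertools
    from collections import deque
    from typing import Callable, Iterable, Iterator, TypeVar

    T = TypeVar('T')
    U = TypeVar('U')

    def bounded_map_sketch(func: Callable[[T], U], items: Iterable[T], concurrency: int) -> Iterator[U]:
        it = iter(items)
        with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
            # Prime at most `concurrency` tasks.
            futures = deque(executor.submit(func, x) for x in itertools.islice(it, concurrency))
            while futures:
                result = futures.popleft().result()  # oldest first, so order is preserved
                for x in itertools.islice(it, 1):    # refill one slot if input remains
                    futures.append(executor.submit(func, x))
                # Suspending at yield is the backpressure: nothing new is
                # submitted until the caller asks for the next value.
                yield result

    print(list(bounded_map_sketch(lambda x: x * x, range(8), concurrency=2)))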
@@ -777,6 +823,7 @@ def bounded_parallel_map(func: Callable[[In], Out], iterable: Iterable[In], conc
             break
         yield result

+
 def check_vocab_size(params: Params, vocab: Vocab) -> None:
     if params.n_vocab != vocab.vocab_size:
         assert isinstance(vocab, BpeVocab) or isinstance(vocab, SentencePieceVocab)
@@ -795,7 +842,7 @@ def check_vocab_size(params: Params, vocab: Vocab) -> None:

 class OutputFile:
-    def __init__(self, fname_out: Path, endianess:gguf.GGUFEndian=gguf.GGUFEndian.LITTLE) -> None:
+    def __init__(self, fname_out: Path, endianess:gguf.GGUFEndian = gguf.GGUFEndian.LITTLE) -> None:
         self.gguf = gguf.GGUFWriter(fname_out, gguf.MODEL_ARCH_NAMES[ARCH], endianess=endianess)

     def add_meta_arch(self, params: Params) -> None:
@@ -815,7 +862,17 @@ def add_meta_arch(self, params: Params) -> None:
         self.gguf.add_rope_dimension_count(params.n_embd // params.n_head)
         self.gguf.add_head_count          (params.n_head)
         self.gguf.add_head_count_kv       (params.n_head_kv)
-        self.gguf.add_layer_norm_rms_eps  (params.f_norm_eps)
+
+        if params.n_experts:
+            self.gguf.add_expert_count(params.n_experts)
+
+        if params.n_experts_used:
+            self.gguf.add_expert_used_count(params.n_experts_used)
+
+        if params.f_norm_eps:
+            self.gguf.add_layer_norm_rms_eps(params.f_norm_eps)
+        else:
+            raise ValueError('f_norm_eps is None')

         if params.f_rope_freq_base is not None:
             self.gguf.add_rope_freq_base(params.f_rope_freq_base)
@@ -875,7 +932,7 @@ def close(self) -> None:
         self.gguf.close()

     @staticmethod
-    def write_vocab_only(fname_out: Path, params: Params, vocab: Vocab, svocab: gguf.SpecialVocab, endianess:gguf.GGUFEndian=gguf.GGUFEndian.LITTLE) -> None:
+    def write_vocab_only(fname_out: Path, params: Params, vocab: Vocab, svocab: gguf.SpecialVocab, endianess:gguf.GGUFEndian = gguf.GGUFEndian.LITTLE) -> None:
         check_vocab_size(params, vocab)

         of = OutputFile(fname_out, endianess=endianess)
@@ -937,8 +994,9 @@ def write_all(fname_out: Path, ftype: GGMLFileType, params: Params, model: LazyM

     of.close()

+
 def pick_output_type(model: LazyModel, output_type_str: str | None) -> GGMLFileType:
-    wq_type = model[gguf.TENSOR_NAMES[gguf.MODEL_TENSOR.ATTN_Q].format(bid=0)+".weight"].data_type
+    wq_type = model[gguf.TENSOR_NAMES[gguf.MODEL_TENSOR.ATTN_Q].format(bid=0) + ".weight"].data_type

     if output_type_str == "f32" or (output_type_str is None and wq_type == DT_F32):
         return GGMLFileType.AllF32
@@ -951,10 +1009,12 @@ def pick_output_type(model: LazyModel, output_type_str: str | None) -> GGMLFileT

     raise Exception(f"Unexpected combination of types: {name_to_type}")

+
 def convert_to_output_type(model: LazyModel, output_type: GGMLFileType) -> LazyModel:
     return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
             for (name, tensor) in model.items()}

+
 def convert_model_names(model: LazyModel, params: Params) -> LazyModel:
     tmap = gguf.TensorNameMap(ARCH, params.n_layer)
     should_skip: set[gguf.MODEL_TENSOR] = set(gguf.MODEL_TENSOR_SKIP.get(ARCH, []))
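add_meta_arch() above records the expert counts as ordinary GGUF key/value metadata. A minimal sketch of emitting just those keys with gguf-py, assuming the gguf package from this tree; the file name and the values are arbitrary:

    import gguf

    # Metadata-only writer; "llama" is the architecture name Mixtral uses here.
    writer = gguf.GGUFWriter("mixtral-meta-only.gguf", "llama")
    writer.add_expert_count(8)         # total experts per layer
    writer.add_expert_used_count(2)    # experts evaluated per token
    writer.add_layer_norm_rms_eps(1e-5)
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.close()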
@@ -967,7 +1027,7 @@ def convert_model_names(model: LazyModel, params: Params) -> LazyModel:
             print(f"Permuting layer {i}")
             tmp[f"model.layers.{i}.self_attn.q_proj.weight"] = permute_lazy(model[f"model.layers.{i}.self_attn.q_proj.weight"], params.n_head, params.n_head)
             tmp[f"model.layers.{i}.self_attn.k_proj.weight"] = permute_lazy(model[f"model.layers.{i}.self_attn.k_proj.weight"], params.n_head, params.n_head_kv)
-            #tmp[f"model.layers.{i}.self_attn.v_proj.weight"] =              model[f"model.layers.{i}.self_attn.v_proj.weight"]
+            # tmp[f"model.layers.{i}.self_attn.v_proj.weight"] = model[f"model.layers.{i}.self_attn.v_proj.weight"]
         elif f"model.layers.{i}.self_attn.W_pack.weight" in model:
             print(f"Unpacking and permuting layer {i}")
             tmp[f"model.layers.{i}.self_attn.q_proj.weight"] = permute_part_lazy(model[f"model.layers.{i}.self_attn.W_pack.weight"], 0, params.n_head, params.n_head)
@@ -992,6 +1052,7 @@ def convert_model_names(model: LazyModel, params: Params) -> LazyModel:

     return out

+
 def nth_multifile_path(path: Path, n: int) -> Path | None:
     '''Given any path belonging to a multi-file model (e.g. foo.bin.1), return
     the nth path in the model.
@@ -1173,8 +1234,8 @@ def main(args_in: list[str] | None = None) -> None:
         # FIXME: Try to respect vocab_dir somehow?
         vocab = load_vocab(args.vocab_dir or args.model, args.vocabtype)
         special_vocab = gguf.SpecialVocab(model_plus.paths[0].parent,
-            load_merges = args.vocabtype == 'bpe',
-            n_vocab = vocab.vocab_size)
+                                          load_merges = args.vocabtype == 'bpe',
+                                          n_vocab = vocab.vocab_size)
         outfile = args.outfile
         OutputFile.write_vocab_only(outfile, params, vocab, special_vocab)
         print(f"Wrote {outfile}")
@@ -1187,8 +1248,8 @@ def main(args_in: list[str] | None = None) -> None:
     vocab = load_vocab(vocab_dir, args.vocabtype)

     # FIXME: Try to respect vocab_dir somehow?
     special_vocab = gguf.SpecialVocab(model_plus.paths[0].parent,
-        load_merges = args.vocabtype == 'bpe',
-        n_vocab = vocab.vocab_size)
+                                      load_merges = args.vocabtype == 'bpe',
+                                      n_vocab = vocab.vocab_size)

     model = model_plus.model
     model = convert_model_names(model, params)
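The W_pack branch above splits Baichuan's fused attention matrix into thirds and applies the same head permutation to the Q and K parts that separate q_proj/k_proj weights receive. A toy numpy sketch of that slicing and permutation, with tiny hypothetical dimensions and the simplifying assumption n_head_kv == n_head:

    import numpy as np

    n_head, n_embd = 4, 32   # hypothetical tiny model
    w_pack = np.arange(3 * n_embd * n_embd, dtype=np.float32).reshape(3 * n_embd, n_embd)

    def part(w: np.ndarray, n: int) -> np.ndarray:
        # The n-th third of the rows: 0 -> Q, 1 -> K, 2 -> V.
        rows = w.shape[0] // 3
        return w[rows * n:rows * (n + 1), ...]

    def permute(w: np.ndarray, n_head: int) -> np.ndarray:
        # Reorder the interleaved rotary halves into the layout llama.cpp expects.
        return (w.reshape(n_head, 2, w.shape[0] // n_head // 2, *w.shape[1:])
                .swapaxes(1, 2)
                .reshape(w.shape))

    q_proj = permute(part(w_pack, 0), n_head)   # permuted like q_proj
    k_proj = permute(part(w_pack, 1), n_head)   # permuted like k_proj
    v_proj = part(w_pack, 2)                    # V is taken as-is
    assert q_proj.shape == k_proj.shape == v_proj.shape == (n_embd, n_embd)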
diff --git a/docs/llama-star/idea-arch.key b/docs/llama-star/idea-arch.key
new file mode 100755
index 0000000000000000000000000000000000000000..3e068e7075c2ebb53a270b57c51e9281618d8f29
GIT binary patch
[488591 bytes of base85-encoded binary data omitted: new Keynote file docs/llama-star/idea-arch.key]
z`S&&nF#YaD#;h{*_%@AyX;JX;6h8X-wmxM@L%h^OGvTd1Q(efM=6P0sTqCj-c!v-R zsvYelL~0j9T-@y$_cMxgmSAP+g-M40NK)cfJ0dSWZU~XEVF`YJS26m*qCh#SmFv&Z z?f!OB=*(5}fnG)8lD5MOkw5;|eVE_;o3gS)F(?%7B57Shap<8(bo#^GfP&Sg<3hs0 zQOdRD?^jnsQuG2~aDYoo%J``ak!O#rgI&Z~a{2!DP8ajPc9#Z;!q&FJ!U}goBeD^# zhB)2II;I(}So8O(P9_|x1R>z91unEsVSrf8YjmCB%>sV-H*l=Hq>t!7nVT(`5gU2` z!MWC!594klm`SeJk>G$6T)*}`Kbg{xRSvMFG=+V>>CP7HM+>?qhm0oz`nyg!Wm;V7 zM%+iNs(XT|O?TMv)Jh(X@Q2TgkH0}{VSk3H02^*6TVK-C!|KbA6C>9b=8 zokxQN|0epL`ngqauWfz$xs+-fX@N*oZojUCy;vI$gFmXlC2dq{vru!G_|GmJttooC zXYhy}m|fUkQR8;v&Mw=r@#pl{x&O;wE&YJyftaf+HkvL5Kf_fQ`P$RGxtgbn{Q_c7 ziYYXP9)@jJi*@v$Q~o4&H<8CG(&yn2J-$v8=GNW5!>%`Pxtq$||D3*pH79qzA+h6o zCzN8lmBsWr2Nnd|>AK>HEnf5w1 zzu^^#Y}QF0tJL(z%`^gk#@Cu82bxp_N zlCo|Gg)%}=(U%EU@Uxd?#_t_26U?7QBUS`s;N4jag2Gag%gRrnP&a(RiwPCacw&1X zp#&$2jU-6nU+zixBh*s*JKpk|`)DP*OT52tv1q zwO13Hv0m!Uq3c8{8P_MAbgcDyZPo(DABMLR8hknwv9|HynYyXy>(*Ce2Ioh5M|($X z!Ye1uzqJpSTWP0*6TjvEKCJul&rHXODXy>CaebhhN)0Z0#rjKQbu# zQk;Fs{qk3{&Cc5nN1=(-;N0oNz>qmPjxr(%E?Kk(e8YKRbo$S-)K;FR@0)~=2F+xQ zivt_A?EF}pyD#%Yy7aVJ>zj0K+TtDas?P!sko})~40?H~mIxWwm9^uG8akNIGpU^j zrp5jO*SicSDE(hrJ4-S$^DCOnLVbJcY{m>!wUJM%F`$El^`eLk_Siz0DyDS?8GV>U z z_%5KMulkszRQ$GAu82_GdgU1Gcn6pQmul96q2D@~{42f(acQV_UeUBq7dO_NsHXQb z0X>mh0B|fC8>mjmO*QrHpIwUbJ-&16mpzxyC2b0Cz5JT@Y=6mkxW_vgTc1D&}`)tS_Hj zaN9n&S&m(6Dj;&#V(H_uwmRPgK+DxU8`N`uczXDfj&?e^;bZdZ^#zL+sqiN$Z0u|YH*(ueHB#BE-U`0afP+a; zN41&dh5nA|Y57aj;oF`|Pf^@E=Je z5z>y)UpAybK*wY158Z;eczZ2Bnnk@$NaR|WXCw*nnnrcmvSHar7M|s#fBmRRA5bk#Z%{lNaSBA&;^_R$WJXb^^i`bRaq>}tfcnFaeRBJ@3iqo0 zB^`XSo#~Az@+_n;bGjBy^fcT(K=Q=hu8CU4E75_^CZ|rp7Hdcb|KaYGiPpo`9xI3V zu>g^b03q5s2FLk+^senV2HFRxsV047YrZq~d-*7hm{ACCB;B=g=H7hr`vsAK9k+jZ z>{_z-;d?<_Q-h(jhzSbHD<@zOtTcWGG(>K=1}e3MkFq0$UitN?h+V3ir`~zvuJocW z;QZ(3i0J573$hU-@n;K|*7&fqoDf@%-?$jzu`KDN;_DwaO$BN(2FZ7O%68z-#(7$X zjesU97=5U?y3gBfDxHwY#M2VaH@(A}@{Mo14RMW>%5Zx`q|Cqf%g1FQN~6#}xTR{J z?)|XysrzeVH~doE+LfBJr~ws;K_RMrEptzA^*)X_=PJhH8nM%=%2NqsDr>KqA#^yd z5HZSGVR;YGPJ1mPCb=!4xSuXXle1$pE3Xt80fpI=%;Q|UO>dlB5w{gwgZ#fTai^G4 zvbrc#KO+g6H7rRrDwF>QT8QY9cRb);moo!nq;t`_T05)E{q`k`V`Fwkf+3|6_f1HB z|LJS3Hz|NAi2W$v?W^7vB`$-nuM(EKjM?+Kv_7H$JAh&iPNw=?A6Je63|FLi{`u>D znX}A#$px+tc4ZO=Dhs9auv7{ShlOS*1r2dL-S+6wF(Ah_67rE=-@+t-L`N%iDG><@yTLR9$&J%5U_GNr%t-wT+ceC;H=C_)D*exB?hARV=xG1SY*RO7-^{e_9ZnDGnnr_d_Wx=1 zpAbje#BrXEx$rTdDP)IgaV#iE^bb^W1v9_<%i#BUG8ba7j~Fh##cR;NmMYWCck0!*`q0qTM)cUq%~l6J zt3WTKTj_I@lDb?|S$i!5@_W<+G*~rW&aXpm(nFeKZO1vTlnXxfKrnjECW6p)=&i?c zF;T`ov(z#Dh9dlImuf6KX_-G zZ6NB`X)v*bK_obDcmD}r`1+CTC-E~f?@wHQNNgI84r|wks2=apRakfIRw%A&uS%CQ z`SDa{#H~>~d#pHFNuu+X4%zhv>>;iYlSI68jAvRIEm>OJ&6chXl}yylef!Cfq_S`? z9G{@4+WUBLF^!Y!d#7?&#u02L;TrBQhLgp?c`&SXQ_f+(Vh_12-tzMLGUsJZUu!>*Wq#uIeP6K2~9q+GD`$z(7CtivmfosoqMbx7@gzM!|U9I+tY*>a_sfA(u z7v9>&lIdOlUwF?H*6IKOg?} zaPLoSa{9#2CPdm8s?`Buw2A@e#2DHVf&uGv=ovqk4v0>}FaJ0w>{;V<9*xL-}&Gk9T%N@6r9LPgbYVDq%WS_={%)PWa4KHPE>Lb z-CUjSoE!Ht`wr9FDg1=@ZNFf*I?7ZgT{Q>mNk+U3o~Hrb_R{Q8Zj-43^%CN~dnKK&K%9pu-Yf{#t+ zvj6zlYzR@4+FcH%hw-+4S8J_EJ>-P0+#$8Nll`MaAzYyZ`4gxqlG|&}6E=|dbTB&o zS3hd7m1TqH46h#@V${h;>Fl3f4vBHad>B6cjGrU&V#&LZCELccRId~_3tb6a(dbElLAuIwL%6+gsHXlS z#?D?iDM17{4qq4p?gcidmtA;9(Y{i%shPL1;x+lJLIr-T%)>QU85tEcWb`VOs`5Ft6v)OIG+m5!fvDlmav(oD? 
z0-MVv8T?RM6YzdRNCwIRCXb2paJ_`L*`=*LYsi#xE))C4&A)0@$k*&z22FEnnQi9y z@m73pjD0d4Hm>F{x$l?1s4;iR(Pw(WF)weQ({m^ycOaF;89e$={IQnEhGRW9=hjS7 z?9Htc-)iwt&-62zxoR}D8%$2nslsWktx|L10w?=qqULW<2)(PjGw(z^0$S_hYxbEt ziPW%CyfzGUT}%`pjs&JgD^tuno9?2ApE(vTjC%-&_Bv5fe1SeQHUG2r=yhS+>6zUQ z2%`aBfUMVdfaYY(t_u;a`?~7DNP~HK^D4{~tCF-jTG%aC_obf|!FRbl)c;lc=aHA~ zF{QnK?;VRip1fF!nP`2)!=G6NKNd{`3oDZkLS>oOZRCyH_?!98nvUV=3x_$=Z1{PV z4qd1h?+EELo}7)946BU_v`{;4;@a7E3uM3>#$Q-YtlA?rs&$6WzO50fQKp8_8|sw{w|u9?PVi^zq!vUk;JEx!uNbU82lSaKFw{7G8l5K^|-~E3Q0@eN)pG zH{+OOI@QHj(@M?fMttYb3#3LV&C@4hN*iltU66=&;E{7YB6zK?4^PsCy6_3V;}d&( z%WYDOjIP=JV54djYTMjp?~2zGFfM9Eu{27a=_=7T5zNev;6d8-xv_c?JvlZ?W~T5= zM5m@QEzAg4j@Kf9u&AzBLxw4D!FW!Sl$AZ&EfVo^H!~q@R-YoRDr+h0C&;VaK+>re zUdKHD9i6tCV5fd(sL#2|+mwJd{X1=c$}{hesCZ5L(ASDC#B0^#G&sWBnEV4*U!dF% zN0?Lc9TIx5ac0sP>YE9un{6K=j@C{gIlunE{}oS5jag_yu^L zXUx0Lg$L{`v3t^9vc3E8>E|B`SB9#7f2($#f=(ydxufcI~b~FS<;njZ-)}R?e1g+tv6m=+MW< zr!b}T$j!E8|BcfWlKv6JeC>Iv&gG)c8AuXX21V)8ejhH|iRXsB!m{}9Eus$#0xP%6 z&$)lk2aZMu`561&virP<9Fei(z1?pAzs_{D=_)B84qyG!&fuV+_X)i}|K?00fcve% zk#&e58t8Fz7I$T7fN^%~@e5SF-6JzC0p+>@+g07cX^a)m5!KW2@6EbbF<$lmLuG;I zP601C0p#%uh$pmXYNtuQ6ji^j55nR#l7?UTjUs-qKfv^O1gLXRGZqSDpZ>1zIsSdJ z`v)f^P`d*__1q&cbnF==LiG>!X+I}!4LA7R88P3?Dm0TjmnyZ-UszRP9vPz4A7uW` z$K}ns!>L@J9F&f5|2@v|?jNIs7E`Kaj6*7J1HQ20FLhE9dW) zU@J?KykK)p4*;sb-K$si&={-2!yXUG zVp;OnAC|Dah$3Y`Py*zLAYO3}FjZL}uz~2&3XV*XG)HZj=x;WT zA|(9r-3?1!4>uxuJ{U&1yPQO~#cF9QRhIcpw9-JIY_!}nuo$YaO0PE6n)hiXp1DEo z@^DJ|#6biwyLNy?LVj=O1i4CAui2I2{H#u2c_{5`2u0c>D3)7C3dJWyZ`J?wpoZUs zcUZ1=wSUaHtH>=Ct>_cW%%y$SfdVPl>u8QHLkc4?E1$anf<`t9wl8KBj7{ zE2EbK$$do6VQT*)3F0>_<With*fSosd+nFekOyy3!&8LXmtbp%5FjWnDat(`L(=GwDhz?;`-QvS`^<~2Lr`= zl#Cb*t0o!qag+Z}PmD2~$v?vIx-$OI4Gjth{ff&f^G`Lz)Of7A%9u`EiuXSeY+`A& zS>{wWc$dhwWfAv~uEuQnKaxjnd<2q=wRb51XR;Q74z`>OcoLwfi1Gu# z)y=C9yG7h^l^NS#)8fru*ET3m=$E9%fQVmwc3`Waqm+-6t%-a zykJp`t6D=!Ql!Qkqn4Vv1`G3u;mP7Lwt;`=XLoX?b?DV zgDtJ1B*VJHHdm#YO88aq|0u_^*d9QkLj%7<_<)JtzoO~c*BJ$OV@9D&HPe=qZ!Km8 z^DKGkIP(RnJexh;4+Kx+I}z2qeL0cr$|xx)y^DN(xOdM{}xEt-iFAjG~7pZLYmYCe{zPgH;mEM*4c5bMh z(K5I7Oi`j>E7GtsL=)OK{&|(91<;<~yu>a3B;3a?ANGIw7tg+=`snk|GwG8Gckg4V z5F0obRAJ=QP%`1ENBP6Tla4Utn4-L1oY8ewXw>$n96?u$7vmXAep)3(3%bfzu26*B zu;a&MEz=7-1c6^m zpu;s_sq>C!Cfe=vjixrkyRS66qTPV=GA9~AL-0hqiY`R-+Ojv-<02ol&ZYw1^%3t# z5wc->dEHiSpQ)<>qC!t zab+Hczq>2LuV8*_lNmpP?6DrE`eN*9>OJYElQM>Fav4t9y{OjF2&O^XO7z?Pu6XaX z!Bxz!WHYro>T{G`ws#Inh-dT{>;v5;xRnqTu_-xTc4f|ql18eYcdP0iuE|%F#TYb| z+ui&ii<~^haa|lnl)y(IHEM-%jhs5@Q4gh`xo)w5d-*y3?s>wKEghS8zwCXy0U?B7aU3^=y9oSq3@P`C%_Ab$7~dYEL8CLOpbdG6pg1m3d16$=93dlNWpK!nz*_3-qM|Nf%Hz!uUDQb!>)O1CHLV)G32a)ZN{kt0Df# zMvE&f<7*&?>{}I$n~6cF=`$SgHQZ#vMF(Szh?H&^RG*)kCDTT0-1Mi!Dm6a=E+}zY zR_t_;f+p}f3|Xh-kr%~B_8;p1=Goe{$nf-wkD#@B1P8z*hsOLG4hEV~0~oJ8OGswp zN1s?0y|@+ssn9mZ`C0Wd&SByqf+SlIXBU7m+ki$AX>e`au7Fw5r9M!1sK&-}(jlgv zVrQGGfz?}zIJ%;U)4{Cvvx@?Kf_O!%+yxi;@vx(XQ;pU%((EDUC;nXikkG=sgoy|h zmh8FzNQA*S;hpeGWjaLX>o#HMh4&6p)1rqyLLw?D&9#wBXZ^y7{ztC!=z}pS&-;?;K`WtAjS=4rf>^3l{Cw_5f)DBZYx+@alH$%sQtBQq%^z2 zb%p~DVoyFV+QzDBIZKjld&o>w*?9Y9!`jLwSb1I z)as*A=nMYHYPJhH*utAYh(H&TNwWP5-GYs+^L~bncF)rvO;bl8KgGvPAxrv4DjN67 z=GI%=c)Nsb=_U*=Qp#r#rcXcfjvW1S{0AD44cy*G`p|-6V-C*2zc*qQc$Vmh<=Ou&cYNQ526D zl42*c4(8C)ClzEdY__mfI|%T2pqn0K6KoT%H0U}FQ`vreCR#Ex2Ivk$wg;d2uvi2cUm#Wp}7++ z3bM7B@HDofG|nQ^S@r1Q#&^kFLzh2bi_@FloxGqje`{3$mlCj>_aBK&aFcTd2%~cxoh^#x{X+1>i@;t_6-`0ie-J08 zI9pSevv&qY%-%c=x6tb-l4$tK_#o!gHXf6Y=Iw%hW>=mf2xGMu*4;Qd{k*InvTSNq zWXrwsXd&a#RF!@D?CYHnd9#lHNPdI64)q|58l9_5-U#Ww@vgjMvvyaBEkPjYhRadl 
z*Zq7{7zvj^lsi4I!&T?wcEIhpSj@uog8AP(lZF|0bK!zF7GDv0Ar%pBpCu$f;y`!X*P?)bEpw>m>2u!66LU~LzNbyK7#66&XqG+KmWOo?Tp*`*W-1>&XdiWg;^Gk z+j0b37n_5s|BB8x6Db$bh&e|+K#>cH(kk6=<3o9$=KjnrbTIiw~#daQP%(Z>t3ZqY^Xc{I+mV3?=sG4?#k{e_m9Z$_yO2~iBWZQ1;L4Kv}$MH^9k zI9rYJ)5Y?JBi~m_57k#`<|gV}sC7AdfTWj72gX^im~k|Q*Uh&w$SfFzbS|)^t-XL7wcaitN_p_&jq(F0nu1^s+Nx zxEm{M3T%9Sf*=1(ln3gnUns)Ea&UD{r{TPR&;~ek^HkCbO!vIhpW#E^`i$SUoaw6@ zv3C>QzTURJF^_|ELaOWgIj9M5YbT^)756l(Tg-)XqrDhK^j>{T=(Dpav(7xUX3s+7 zCh*S!h$80hul)kjCLCjbgk$nyim#w0Iq|bXWRxVPVg! z`07;^BWhTpZldj_p}x+{H$qRH)~PtbjdGmoW1-Q77wo67>%-+$K}$iU`+ZM(TI8=2Kw^BW&3EzSo(yXHg7EOTpM&NM z60<^!c@*s!HUyL!P4%IB-cjUWye75%;F8+J*(afv!<}IUvEcij3j9|F+p@877Gsf2zi!doS(f_OlcjlSN zuS{vGcMLY2wVL0L>y`IUQy17%s^4wA&Rr%rq>pz33LQ+TcQnh636nGO#t-OG$S2`` zJ$0}P0goGAZ(k7HDDH#Mz(;=J92Q?q!Aw{NOZIgHH92d;kU7wsZ{p^yX`2f+1>cFN zYmBa9bfGvN$8K;q`@_0vnCGpd@)G5_Ghhad|4#|JsOzC=W)6n5~E&DCO@u=?j>DhrV1o#4<`#Q zH~S<_!6V>;Os2-7qCTyOi0lX*%zj5YInZmebnHGPWmp-1Rd4>y*W|?@?-}wGND~3I z$MOo_^-sjwj}`6~jDYY?7>pT&W}nm&vioNB>~x3ZM!Vu-X!eI6%tWB}TWbPg^hI%Y zqVtBwUADKftom0#aW*nqk=BBXi&cwWzCj7}43*b7A}XU}4{_iPQxn%4I_LI*=0X1+c@wkeu)>)D2Bs%c0Nk z_bPK1L4-$>Mvn+Ai|c9cjF^};!YGXfA-FW>7LGy$3Fvp~K}P=SeiM0Qo0NZ`t{-?t z^!>{DXiQb7bD`$n$&nLvJ1>6Ku~{m{wXJxLzjQ$Rh2%eyd>|mV+lrjTNjqoLEd*0c z|65`|BMc!rrP`eqk&9%#SEAjzH%@gYYZ-o@iVDpoJxh74_;(|=?)X%1S9VO_?$`J_ z!8y&W3!P=jKW}^9N~lwYeA!2R<~|5Kw(9{dRAv8YBIV2)gpoMlwK7~Zx=yWDTS?Ya zK{<8&uem1{9qRfGvSeFea7&=`!)+ruSz z^3bi9(~WOTV?hlH<)U$-EfqQM~?HO#w#GTrec<_oyAmIryH z>^bFN-^t3j0tTfCYbzlu)yFC3Db^3E8cGN$$Om$RW<4+Fd4nofBFo*pCuh&`aE&idhUUii_WWVwS;T^##0@~?dI@tD3$O3*XE&=JvFiTXxPCuB(&7i|6bZ0Ux^TrDY zB+zxanClvP!F$zS9(8qo7Ysjc zx>}h}Oe^v;K29^4SB45@q8bsMvdRJk0W9yQe>7j}S7~lnulqDNPo0f3uc+OwFSUdW z*Sh_5{|QaEc#`UERT&xcoJp|(n<%(*x<{MLo!-Yfk(l_+&ROa0wENc-n#=e0XRZvBO@zQzguaH?`e$%|AA zj~uc-nyq&&ewCz@^)0Tj>g(X-iChR`ou=M}khcB^I`4Nr0h9+H0qh=3Gu@};{@#Qt z*CV>#a?P;TP5lV!JL5OLYzbM47miI*a{Sc72jQaia3hLZ*pnk_vo6zZ1@jgySBFSH zPxsA-QE!cPnIG%+(XpUEr6e)g{jCBd;4+7Rk>r*YP8Ab>?pRfu@e1RfRqC?x6#MXt zU4%h(OJw%S7?1x%aU>U5OowHV(OJA$q1S&&LZwbVd5>q((6Uv^*2d=0{Yvrx#P0QK z=T>YS-SB*GpRnw*^f9va-38>{u=@D_6q4(NXIKa@M}l-C?i1+T3UM-K)3q5&qWbMo zdsfkl%@&oLPKWPedIko5W_-U43rW-0k|LDBRIZV2oX$iNj2;qQsqQMuV7_Phc{)Au zRwGGMKdZy6!uL7LuNek}Jo?b$3-#3nC{84`ukX4V@jjjsK8&a?jy!|XTlza_Ji1a! 
zqLjod*-UG+&Au+1lGNU5!y7 z?S7|dsurg)O3>?%kosCb>3q{kb{5kt+_hGfP;N-6EtA~T%u`=izk2kCa2oDkOlPCR zUv4r)&6VV5aFNv1T#slu2hHRAR%@Y$bNwA{T*0cnep95bZkz=|_RVDDu3_oh9RCb| zp&fi)XJ}_K^dX$#T<~rXBb>pXLvOa-VZi$v7I=Lu_Z)l0ey$!EfjQTny zlPRyx_acX(s&~)3+AWEc#NHfofQ=IY!u)Ex_g6TmBCj9n=WTz=Bi(HB{YQ)dpM#0t z+>fM-CeN+aTI!$8cr}6o7TQJgxk+tA`7xF~&97+J%s~5f1Ky->Rbb7nCvr>ta>Y*U z1pD=a>m^Gp!G5$V(@2A)D#70+vq{Q#y2f4PT7hrxD98SqMzxbQReq57OWrpZBf}Py zz>KT2;Mu&w%EEr>DfA<76Xvg0)gSDvje^kqv1hs(oeQ(6Z{GV*b6VB8NpJJ>>Yt2Z zl&7=YbtKzhZld4y4wBL@CHevLK+b%Bzp^xO38U)#+j%nPQG~P2-m~p;9vhi6IWEUq zxG50Ib$|+;W&weQbXeI?oi|`_#Ye#)=tHZ6^QNk>;I?3S9S?nL8xNoaziJ5?!dCIL z=R_;8#4>Bh`;62b4H=AxOr(&PaBB`Ihqsqt+9iS^2(6N#mNC~kQ;8y3< z@kS`VK({KKzrdyW(KoFOa`%5eCW(FhtMPkE5WfXmSEu()E!8t;b z)Mwth2CThJR435B=rqEiuvVD(PQR`#ALzmHAy601EQtO^G|Uy53*;j_2C1n zH7y7ZMbToIFs_z!mnc>j`F7&Ws`M#;nM$UHo8p(l=x?@kt2a!#Qp9O8HocZ|CsI*j ztLG=zh>@K>eG~7!I^}#^Ge(Da{~wgi|GtKaW*Mafikajm@#6m<5m~o#d zG2!XqqptC>P(f%mK;VhXIG9mFY9fzOxJ72z#gC zI6KoKnOg_^&0GNYxh-Sw0yRy%2UIT#jctz!Hq(k)>$2>?)ryyD`a0BLkhn9K6oWVW z2ekNapAy9W)FMdyk?Y63{uAC{a>;UMA75WfiqIOAgo0Q{+_G0vCQU}e&qqD&)j45s zewa%3cu$~BQQm@2i55NW4~>|w&&>t|tP^FGKD~R}mI2^h6>)e^{C(J4oYW6o=Xw=3 zJR^U%$)vS5Q@Uz5Wfmo|wA8n1Z$5)==+b!1#}vM%bu;~HMHEXTI&uPF)wfH^yAGEZ z_{&cUYJJ_>Tm>P1c4(X(wM=A8xjF`L7*&F8Fgt`tb{*-fy>6;#(7{EJ8(NrpVtvx5 z^|_YkI}~4!22Y_~_X|6ZRo*PBhzNlLt6Q$S9G^fKqvKdxCEPcV!|H<7+G^y#We4+a zE#zOw?gC5ENc87cVdgJfWFH2QPY*eHSy5PFFFZ30juDM=uQ)5WDm}|^8lwF7g9!c# ze!~4k=Kk{`n&#E?e^d}s_#b9y$QH)7bX_=*ja+}gP2NrY%#FkOGWqakw#6p2|+Nc!+Z7rADYEw5GD_fxVKFV*{r~A6hOXbRT)oXGk)@mE! zOeDW9MZBhoWtW;hgz;}gBWU+5>G!RMsuq@BwZTBAV=pMfO>V+|t6P8_kbh`y5dikg z*?u&HMolrM6xkit`7tS^L!ZlEhiTNA}0V!Tn!^+mxf3c8I#O(uR7JpxsEOjjrq&5NR zp(@J}a;Nd1K8U>r{s+WPWF?Lv?)U&8So&Iuf6G*=AB)m8UQ;Dpa6Si*9C?)GC+fb3 zS>w8a$G+Ad;sc6of9T^H{NVZI6X&C!N7hfDB42||m2Sjot(jdOBRZ`>YenfmXTq`H z85pWl+k%S-5`Q**r@Si*q^8R}M2+Hd5*SxuwU-fepoV-J^8W$PIgP(>U#E~FB zg-0OAOQZ_w;vRe9bH!M|KmB=k<3d}>wK46V3?HZZ{aR4=O*;3z)cTLa*8|MvS~UeD zEdWE$H(O7pQ>b|7Gwn6tj>su~KBK5GZ%vn^z#JJD26k~lspKCJVe8w`kDqR?@!#{H z3iRQeGgV88z}{d9!?S0dNPB302{n`JZ~*X_{Vw>21i?R2ksAzGNwyTz!;%RLp>_sY zJ-5H_khPIdo3#)WaBnb;Qr+H2gd$B<_v32o0raoWE8HZvnTAf|d^eJs%IbfK=l94>Nra<>21+Rtnu?QmNE?#99QB8j61<X4M4^8r5FMzhfr&87@bDv%JJbqRCu`=6f`CMdv@OrOwl2-;`|75PzP(L^mYdE?S4^Q+T2rpzr|34dsBXUPX|84d zQ*|@94-1P@n@2fF*Y-agkqIz<+(aI4Hvpe++Kw&pzjvXN9TXi@j=OtK*DItF^xYwK zs3B@?c$?hpYsqUGfT5pAvk-A!^9ddCsZyDqz�&)M51K=Ei@q(tL#m!J=s?3IFgJ zhEtXSK`+wCB+b;a<|*;*)ewySI2~~q^b6DxY}AdYjvZmh=AAjy2{toq(n<0uvxacR z%zl*|Gp}Hn|k%n;Jmk)mMs>Xx@@@RX5));w^hVO(-wO{P7~FhohFJD;5N2 ztqSqgS5L3ATeLPt!Dx+$V5`;PoO`X# z@QHG{Qa7r*!Amk(v(#CzTq)_$eS~NSTH6$*an{8vTHrpet2c%_vNVZ@43I8?BIFp) zlmd`zcXrag<56e8P9pM0M3i)!49X2)u>@ir>uMg4*uCz|*a{%+>?6BL_9vVGN{qBt zU~sR!%DLBN(Ak|)xjI-kaXBWfHRB7Kjv!LI|3~so2KNB;j}c2l%yJjJVn*&%6RExH z8tQ?sR0el*>B$S8%JoODE{w{S)&N1(pR6E9J_EIfr2l`x?d5+Y=c%~%AH?x?Qi3bC zzi0Bxdl-Mdt&8C^CHOd%JaXnciP9({8ui78<{?Z2`?GqZG6=d*TRwlh6TsxTy{tJo z;P>q9y=~+_86cMY{*T1{7LEze?gUKNpr71l6m_wt|5(0v4r>&c)29Fi#Gc=mo-r6W zR2!I^1UXA7nHfK`)pqdh6xWn_Ojg%%BfhT{&e|nwl%;W=vlfXoN_X~Ea!~?-r?}&e zt(zh@`ZZs`96x4#G;NE%Ξ$x52WeM_JmqIFki__4Jy5xOBg^l{GYo2vfvYqFRD)ni{e5o}0>^?lO z!}`E>Yd!wKUXxA!-lK(>IxQxW>!zIpY+j5<)xx3ThOFOBuN+k=F~k2y+gkv|6?}Vx zgKN-`pn(Jk1b26W2ZsQ`fHi}Yptn*jyTZ6uDHu-cff~+k>XNp}*wBb54!X#N#{m=FxK5cXL zPVB%IduYS;ce&DtrOl=!T<7lRzLF#9bgOS&+G`TxFi5Kgy4M@d`oN2(DBJ5;UM+5I z!ncG)*-+cR4Mc5`)r;v7NxS<#ff9naLi68ZHcE#sO>$Y<+^j?mlm+3od*IJ0K6ANc zOOHuf-xwlV%LBpEYwg?HI^8GY13*+p$E#c{^aq2+i}&HRcJZgE$cda7E;l210)y29 z)PdsX24{aL36rIKBs#WNBCE<%i@{P-6I}J=}c(=C|nHUDb=#phegy0o`SxF~e~C+3_&01wS&!vj9^s@D1bn 
z*$D--5jZWhY-yXW*n1F-X_wMoh(tgBz65gdAV@1peNuBs8hz z`r$LT+AG<}bCtL(idjA0*%bA+Z(s;d&Ko0V_@5Lj!ahNC z3n5Hi>$rl6q%g#t$Gy&F98`K690i1BsMdh)_qOmV%YACqxLjF#&-T*==X|PJSb_oO zRPc+my1d(I)%yuVv1$mcr@Dm97+D?s|7r4YfN?toF_R0dWGU*&cZgx&-yeyaZmhzN z_k~F$1yc4w-qQqJ3d4DuvM#Foh;N&55lhMIi*(axW|m|~t}Wl4^aV8|r^2?;e$Bp# zJW}rDJlA~jQ-LqoHK>mAK9+eFvgxI7Q(M$GR#juREB!6uI6*?V87Z4qpqs4w(f07* zbJm+Y6+T|=*WmE&=`I9gUGoEPFEQQ1F!Pz&G(+GM$Y_9f%FC)pobaUHGSSj^;dk5P zMw=w-CBevFE}2KGs6GqDA2AH*)4=(F%D*%DhAVf~#D|AV0RGnixFj-6AmGxpd~XGk%~ z%}mBhp?QTgK{@|N3?HTq0-A?GP&5c+aLpADqJ7)-0_t{$h{B>NUN9s#{ zy!lG@3EGr4Pp?JA8rFx^0njeqAJ7oZR2q9&3!zE_og@kdfPy`&{ay?>jQ} zG-o>S?;f#y2-m2~f~k#=i@jr+hxMg%go@Ev$NXLHxt|jCTRC4y zQAbu2wy9goOoh?_rx^iZeL0JA`wtXa+Q@Uq#iSe4$24G*b{4os2Kb(|bElxvF$kst zjFZ$7^=M;VeHoSt{NQ9P#$A_^;{+1cg_%QRGB7IbI2c z(o|IS60fM72IJIHAAxXt^AxQ}63HSdy|h8T2bF3P%jq%L-X2Yt@69b)`^iwox;d-3 zlguuu&R_fbwNLD~-CRE=2av$|CPi$v9T&?D+y0!gITWM^+=1Q2P{2Ul>8VO{7= zlEiuoV@;gu0H-Hb6HC%6z$oBiaGml5|Xqb`L;4dKXb3OJF#Ta7^A%t$`|H& zI%|rg8@$lt0zOv;Os{hC3Ib-tcpt+X+6r6FZ_-Ov;&4A4g{E|er_jaBGY%5|`sUSj zEl@oHJNmk2`?aUtyfFD4Q>x;FKuIY>yXHraS%!i-tCLPD=aVm_K6h{5z588g_r3bG zc(h|+vaQ4arP*e6l=!)%>T+gZC;k+I&ui~iz6~yy+H>~?0e(zSS2D!-76m9O0=}v^2c#4x(gMc8fDzte z+>I72p#*LOQ#E5C=0gKvys zf>3;e#~<)#n-4HQldYh=FF3r{t0(n#^VCa3(&Pq2HH^BDvfCg9#e0UHIm*f7(9wEp#~+b zm{pSimcaGJC;Gs`)ea`=425jh$rTM0RVSM#l`0Jp&gzfIFJ991maHK4drfE>?+ZQv zGh?@W=bM0O1nqn~@5T|JyT|jw|JY%>C?e8SJ7#^0yCwI%oB(7amr(;E;$!#{tn6=$ zxA|p{P=#Dm)g!~+!*lN8`grX|Oo^{oDy|v9o}6^&2YEFB42)p|&wRHhdLLD{@H*HI z*o$m9K-gcFn;a}klj!z91E7Y$Q!nZ z+Fr*Yu!tXC9zuYlUJCR1n;5olMO-63y8+Tf91-rN)sitBN?s(2nO%>5b<@I+gub9C*e|9}FU?lR(k z%%yFEIg2k{oe1m9@-#LZx3Kobtfbtci{D$;7Q8WwDdWs(xJl-~X3P&1|L@SdPat71 zj9mv#bcw(|cyI@IEBcv1^*4)upJ*DNJEW52ByE--`IeJz+G5(9B`Qa-PBpW-(=A8Q z47~~@h5?oj1aP49Elt~$rPeo8nky)7e;6HnZXS3H_&k4qJZbcuTd<_0u%33YeB}wm z7>5V$hCt3gR;uR<4Rre(%~K-ok3I;?@5eOR&p@yDATj~O;O>{;jlw0oxH@&1Ug;-8 zZad*Ko*N{Olc6^oDFa4A^Y@(UhFY$N@k@-QRpwwEBcLOK)evjIQBIcT=4qaFqpKW4 z@=Yd7O+Ca+6TRmLPJe;q>>;%Lt zf2>qP_S)kSaQsYcUDeL;a=t)wiBSfUdzYruC0i&+VRct;1n#B;H0vez-^Pk6#4)a! 
zaX67HCY~>ueW21sxy+19i!XY_@_U7jc7_V>rV-yL3&79zGtTp4oU4yf8OBKt7S?}w z@3K)GKgOdIuJBr3U|{WNfck(~#$!w1lK7)YT`~0leEgFjuYhpBDxkxsg_*;Y&mjmr zKA3tzkCn|&Sh*Hq&6{T$+lmm0%6Fd&S0=cjWJ@5E;N~%Z{f)RN>T_q-NPL)L0M5bv zSEt`W8Tz)EKT|%^()(r0XfZ|a(+aoZ>RhkN_>D7A%a%K6y}tMePd}6*id(X}7_x`< zAaMFhe$|ECfTt_4`i~-VR?05KchhDR97MZrtHq)F(cq1Z4$HCW9%H=FbSHka$fe~E z);s+iAL@*_X+%ZkjBM|h^IM*mFu2Q#N5L11dD?P&0k88wP&yo5=uuB>E}4k<)=i?+ z_zho1U%eBD=t}|vDG(J%xNc>BFQ5f@7Q}&PAua8vkn!(FbCUSG&#MPhEgw$NR`=23 z=qM9?Z~=VXq|rCrEv>53Zo0SCEeP`opq3F(>*vAE1XWyjw#_aBYG$?oI_W>R{1~N_ zbFpzi30t5<;nHtSO&D{T!I{$s$*a{!ek*a_g>g{z)nI-rHNgx*8_+7HefOKVbQeFD z&xO59sc|aMf>BkyGhpIrYQ&A~pg`iqEn$d9f|mlzHb60sCa513adq}JBdYl#g|69_ zk^?pMZoJ^)a-L?#6f*THGHw6*#iViEU2*m}!`@7s9NC*ZZ!&(i8TDDiG?~TReh?jt*-2rDN)XQcaGU=f z7F_qS7SU-MZ3o8qWDFq^Ud+HvHF!7Bv zNRxjV*kaA^ZD)P2(I*wQ%3ON+yM@YS%ACcCSD|f>(g-Z=2iFzd!-OTTSBJ|dt>5~( zAKn%wIG}0pekKptbiqR1Fq?ujVlx%x&N>{>14jaB-+gnb{d{4xm3}&pv1|xDIbtg9 z`Y7H=KAI$Vt6jyC-Ylw#S5vO*yl_p#GUQyEdfD|R;T#wzV(Rh3Zq;DLa@H})%_(C6 zi*}n$1MMqjW8pzD%UY#fJJ?Sk$qeSf7|SF%Q9C*-Ug5*I3+YtEGeF-T3|T=OZf}TH zx44iX=8K?L;&undRslWK=ZfnvHvD5r=zPyQ84r7~&uMj`_3inZ(cgrlkBv5NWk--N zJptht-k(eKTq&&ZFb@#fGdiL^PURRX`K=a8`0kch;={B9kaIb47(2MvF`5r}$Ysux z`uOKRy(s0q`Q})1*y{eLCRnMlRmt44bBjYCl?~$u3dnyQnuNdtHj8L{(|(SdkR{C< zdl+r`LC%{CqKh*X+atP+VK~7!3h#RIn=p~*I2~H2)SL{DZy2kYUnXoGDRL#k9NIH{ zf3_B+k|b-CeJ0vnR zwPeyYDwcn~pZ~j)LjXM{CW(_s|DiUKKxa|tBpkCk=3gM=9rcxC5(*rRM#1I7|`d4vZ?)?-WwmuOroG@40qX|Z1)!F=L2;&H%yGUl_Q z8!!s2D^jY9^Yl=Rj#5AA3s_=m7t{wsn+T+F6*#D{H2;7V>vi4>Sm8f`*eO!Wr%yk= z_5LQ;7LyBTA_%_e`wWn@8nLMVaLlwe zYR}5C-S8IQ2wXqc-G2gQ;8%h-EK{FALk)-~H=RFHS*9(f6|ry6S`bQN&QzfqahnrPOkdq(y82o>OfA|P@gfk4PVkpDXe$sG8F{P)$te;1!l zKmuykj&6>w){agz+-w{m0VxG#)PH3F`VUY4AC_Q-Dta*vREv5Lh?RB>lL~m+LS|Hu zmNr(?P?c5qAoHJ_LMAnFa4aOdDH~;PjAQLlJCrLH65C8c8 z{`|-P58cfDSJ`RSe{CH<7Q_mClk2gs?@;wwB4z)da{sp+wz-9?8E^#-Bn~qt7ofwD zSO8wl)6MB0?+5VDU4WYc`1pUk)qnAY|9I2?;+y|dMoU8)C^HK1G-lQ&<^X>T@GPeP zr+lmb6K`+l{_lJL(LYIXEgZBpfzRhaq6EDM*??R@P9Qsw3CJ5n15yKV|38%H`A>Oi zkRwpq8{`7ivjSOz+<-cgK+eAwEC5~sqL8Pjr?&seQ-pv(HT)nDcFX_dF$#b{gg7A3V2hK9i^+eQLjm5AErBQOq8J2v zrUwEM{{iagyIOK{{pUIot`rD_w)XUN`4$AiNCJT#IG2)J%xt!ANo&2z#9@W3Mv{p1|}9Z4p5+h0ECQ$f`W{S zf`<0*fkg5Lu7gks(THAiN}?01n_#?ge$Ew?kcUYtRntwPF?mkMZR!$?h5h0sDH%Ea zTL#8=Ogy}N`~reP((h$t<>VCHy-JnOj&|S=)SZb#wRd^z!xz`T8w1EIcAI zG3k49N@`kqMt(tI(a++N(z4pR`i91)=9bpqJ-vPX1A{}uQ`0lEbMp&}OPgEUJG*=P ze-93!7nfJpH@C36`+ssFfl&TSEa3gWB>TU}MF_}+jEahait$e_BxKKjf)k>mz2-zG zl2peqaVCDl6@>X*Dj~0?8;h1(NB>Ufj{f}JB zAZg$i{*RD>7YZ^m@UKDv2o(?kAT)He{}B5BB8>kK=0C#v-}D4D@}Cxvkx_vc7CIXG z|GxJBnt56U2C$b;OCVenB;a8}Aq0to5JQEn`YsxmE={C^G+9?0U?!K9H^XWx}qvSCL7x^@Xx#GATNvA z5|ndU3E-uaCDSf2i}vxj-pfzK*>>Dr3s@nqgnJ`miFeVGrtI~Be1@vo z-zzzO!R56b=|7g2^454$U|i^a?tQ0!_V5G}-Vdt5Rge3Sqg-R3t7rJCQR>F9du@#t z8a5TqX@*!ta@IOq5ncMA5aOIeR4ZEJHwUI@8lrtJH{7&gdXMm6qed2m#q-7Lfa8b# z$l=EC%B0_MF}k}?mg=Lcs@?`>mnT!sByy)Ca|~D0ftg~4^T=+ua>>?(@Y=q|4U{0X z7Q&b+Vx|TD3S%eRZ{wK{(0#r>xcY(nib;kJKeO_E8WD5yRu-mfC<)#M(|fzzFQb?v z;e{acEOX}rH5odnUZ0&oa)d{8tUMK9S`Uda~U}#OI-TYQo z(4mDly;SCT4H+G3e{=jcEgGKdz}*q-FwNZkPLO%9Y_j5~e;GGUF~($=DS+)S0==Is zw;9(8#wH$65iBF6ey!POq&u}jVJc?4dH-rwii?jxPQ!WP3n))mn&c2G)7(Pe)l<4V zX2!Dz&sF0KvrEm}{GQuid@4dz5Z_lnuAkSvPvzfR)4(o#Pa;a;3R>={3C8egAR}y# zf62vN^SQ&DHsK&{$A;#e@%vCsCYMCN?Y}FCU9Q>)F&h}c73{5sb^U<2CjZVC_Nmm+t0txCr&(xpMsxPaph^1Za zmo$ssuNk6rtFHIsFVtOxH7T%=&`woqcWPj=lp!O<;*SC=QEigTl-y0d?aqwx)766J zrg%E2(B{;8Is!>h=<4$QeAO57{TrhOwhY}fb+UVVwdS)yUiiE2G%r=5p6(NhHXl~; z7hWxI8kCkDVGM_!T_--1;j`=(x_QzqJ095K!YQT_C@b@eVn|Txe*C2ngQQLNIY}o4t2`dJ@-$e z-+2Og3do4!0g7InM{_tX%ze`_L>#XIPJG@YtM6qh!#!MDH<8AO&z#%d7x&?c5`^Zv 
z_S&B&RrEtKW9e`~;a=6wy6*2~{R%*9Md9H86aU!>Gh@TiE&y4$zYsHq(F3d6S{T-O z{IXa1mf_e$BwT?qcy~WVM_$84%tH-tf4@LpVdp;UKCEvYcVKg!;9!jo=7rPeEu^=y zh&oKP*jxTugc6r>hq}^V*mDCy2@5snzkP>n&Y|Yx`xy-MV$nu6*YFU{%+7;Eq>Oz{ zcAohMj{Pd{{e^jG96^7TsN(ZqC$hJDDY1{I>-@fJx19rjNhuo%{EomKkNevg1Nbny zmw;)7pXaZ~qIRmP!vG?vcc;=rQ$yOAqZnwN;{850oqFiP$a-|fNhderH#M5r8?XE` zqnWEH#}oZ>=gg7rz;tzBUp73E02P+@OBp)pVl@Yq5V2 zgxz0P!QYh%MoG42CEjRqQ>9c4gy5Xl1gi|ZmKzQ*{uAD=aehMmg@9=%(E~nE8|ZpA zopHeqW5VbfVSf%I+BU3@Xp+H?V0FM^A@=)H%qzCuE!6=^+v!}=<&@BobtUiFvHjJ(^n5~Gh z!7jL|7f)mVESRjvf>V7E-eLj=XVvR#u(KX(t6Y_~=3LW2zavlAJz3G=n309#R}J^C zp$uC;xa&1m$hI5x_+JR#VGhSt+W{y!TSKopd6@Tks4`ln^HiReob2h3O%vLHD^~ZI=?O z4=xZuFT8}C0ZwckpFn4Tf^^fEUL5`JrLd`Zb^9YzQ>dVYV6zfw(kDAGVDd3US#P3g zg3?swJM)r2qAiPk)EPiUe9AXCQ5X_1S-BQv8c0F?ywOx$jX`q5zzMs3FaUSUioHXO ztvB>1pRfNmLhpd`2VQlM6+%_=?9JR?N9oRNw5CeeMPBfFz2+LSt{JA%i(A@aj;X*O z0n{hwy6iFQ1Y-2vKLf_s@vlb4@DP`BjWwAXZEZw{E~+I9`sW z!@2ZLG2}6&o&9`jyYJ=AQE@lP;rfsb#xncvbO8?E(L0egH$!R9oKQ8tbaWKtmp4=n zmS$nBAp@G~yxq;3Un>0Koi^OSW+z8Sa&(o=r*XaFi$Ks|Zz%0e3T$(8{oB1~5XM)x zf%CP7(nEqTL?h0nIBmkBe20r)O))iu-_f;aFj}&mxR|)szWGX>fdKaT-Hu}XUi=$r zuvDmL&qr~hH|^6`!VQ@{H=eEJdS5)}Qs;eg!e~{RYLvZAQxcMyCNa3nIPj30t{NSv zQui!r5@v3B8FHN}_q?P1@l<4#iWzQ#k7{5^55hp(S^;nU3?F~gQ!iEztmEv~&0PB{ z>DTA#PFJPr>&cjTmqhXF*QEShIg~jIkNf1W z>Spe6B_at0JVW!H3n>Cs z@pxV^<4)k>KwilzJOU-I9ZxjY7o2l-kWb8!@rq)uw1zO5$)yv}r;<~pfevQHOs**r z&BX+N3n8Kyoi<%0Z~4fUIBX1&EK<#0sHPD%Q=2KXNZ`uKQCE{pdATSv=u9mQAN*dC z6-i!G++n9BL#aWoxmJ=tzU_M+^%kA0SOh^5Cf4|_w?ZnKQd4rMt0YC+9Ty+Hu|AuL z`3ZEDtpJaboViU}@b#TeYTS}Hq?T)$#&kOlSC=Y_1r0Q~y2zJ(2>V8v`{irZY5xGN%ud#=NSx#W z!q?vhw{a86NT-=(qP~LOhF*YZHyM0m_J~pZHu0TDd|`HeeGPrvO?YA%XROYjT*bbK z3@^`WG*|e&_DEZxx$U!gVc(`}p@>M^D)OW_HeFGA_&t`>V6=f@$4`4% zN%m%?-36R%B0p+&i6`}IBTmtr?`3BDe)rIws9BAhr<46)pTy^?P6L*WAWxI5CDQEE|gbu$|hqoL1qk zG~n&%HWVoDusiKh7Su~`oGcssxL=n_>y2c6npxZXTJ zKP`nmY!ydBtj}KdTeZycGU(C@m^Vcd5b?xS@fDBcaQA^<`qPzKw2Oy%MYpmER48wn zU#WNO(zWt8*&8+pIceGRYY@)36|Gf7W1Ro}?Y@03XwW+J)OcxX=cy?k?bdy&MlkPe3^zkc<5ZW$%72Jsd z#Sa18oi`=HatE3$-2U!5{yG@xQ2Vwc`iBlH>jv-#FT{CQen(6{;_M368cB3*SO%v* zP-C8V@Z$N5Pr{c)ZF>Qrj^j5EBlf2AxX^<3SvE`h!nRN>_oP zTY&X80JJnemkTQj>CZUM5E)7Tkda;4*G_+*Wxz#>L6srlwbibpf6f<#mob+DB~wAp zGcZfO3qA)teWOt7^ivb@Og^g|&lHVl3)S2mK`m|NN@vt6_uq^&n!OUh!ykP&Bu7jy ze6#cz;HZ2WkQ_1mONF{s8Eianc9SQfS7e*v#_z#R7r|o7^!@Yj@F6M5XD3Mvnvf6! 
zt(WTAmfkZvXzEvI#-X`bog6lhTIpm$kIb?2mCL<4E-@BAa_0DZnM%}CAG%}DG8q^BHll>K?}CR#M) zJkraQlXbPX!YpQ4;F3N-g3Yx?f`N?2$A5lP97Z?a#~$5DQzK5+N#kGlp{W<#qW3d* zn7S*+6*W3nv+b|#z33yL&9+ee3a%Rnvny07XG04Q=H<2SWn1 z)k$Q4Spy&tV(ImYEOt9r*~{lgA29b?0yL!`V*{qzmOk3Z70uok`TEq|U4UV4jr;WJ zm6Y@!Ead+%>gL%;#cdk=qQlpCmwSF<90j=|h(f?~5p0lh06r(pv$xX)i7!A9RPrOd z*muPH+r+nscJXyEZXuWA+^)*21pp$sAO3gY`;ua;AXPxtBfrmGc0LAFn>1@3`b2k|`;#yAJ*T zlD*a=Nmn|>NI1!vrR3WIaSxazMf&#A3;e}*Uh$cUEJ-LC_qsm&fF)H|g7k~~kFR#G z-MK1nG#mEC%e+ z6kG<3eo~wZ+EQcW#55ZZGZ?<|WOyLb>#WzX$6zimUw# zs;A0VXpcP_?7zvncw-zHvR-EzC>0-gu$f!4{t?2p|48#>mcxnwBkKDUS7~;zaN)rkkfQZquZt>&H2QeQMmF zgDq5FN4>Pea}l7tI#B6SlaF2wlbl~FZGUn8F579M4FlpJP$?6~$&zu=?)a=r^sv3= z*+V?Z0OamSJAG<`+A^l8nU=oxhuenFw30s7v2i-Q0aM7WDu7-cx$oUjzmxwEFI|!l z=&k7hD)cXB?!KOjVLI`sla@v5`mVslpPNe42n&0Sy)~9{JYSu(FZW-2gagz>1*=BX zzqfm-a2nZ{b5NuTZg=QKi8F&Yip9idc>p-+3u>GzqRDmGI9x9b^^`ygwe}U}A>vc+ zjY4$;*v$A_t7G?Bh>uYWE*46-b{4xgsYR?$qQcNy2rt;qx3t~t!5%IEb1S0@{XNSr zzoeI3+??c_^Mh^KiOYG2%17t;2^6Lfr5yV9fIOP$ zA*7@3@Pd--`zAFreA)Iwz6+LM@4nR*WX00YtS{LTcsTCe1w?=ab~s-y+vHEKD6mc} zHMSoOzEmbgHKb?l=Ri`6d`0W$<|j(HgBr<5`8a1lt}wjW{rj<)d^uV>St%TjQ>|DP zU)Qew$A0g&C|m4GKuvbNFCWX|ASmFP4$~fU!@(djQD0x^y-o>S6Z(OiS;miJ`p)J3H(d}qLMR6|?t^eY zkBi`a)az&l_8@)>%NI6$UFW;2Q{&Z-_ zndj-)`BT;<7IWXF1-H1Qi}#-TfexYso3drhG+YfqE>Y`SRpAL2cPDu-*^lz>MZGH=YqOCi2Eigam#$E8SzqvSBSm=ar)^aT3gt$P zbg=k_VD);o#+_q*W;jJq{Z;k;&3fP5y&U{G34BnckfvsaUMoJnoGH_5GAj%IBdLoF z5%+g{LNoHbirv9+5YEq4>NEI2+Y83#f#ATnC>{-0-F?$OL=n=yo)|5#KITB$fu2S7 zo*Ts>uXo)})(-Xs=z&O0F-FD(G2$xS$8OdM45(CfTYbwLsgJU(k|uDvP$p$T)gR7p zGVNIs`~!pz$qXUwP$*lunaW*!4`)(fITu z!;N9``q^HN^}>xpq0DwHh$i4W*yI3mdF>21MhtM+%Xm7*EpEv^T)lU{KZ3Z%G8w6?%4qIiB*%?fUqW?uJF@0GIRyF8}keA zXtVs+mo#*jRv6>5`(m93P?08#@Ef>VQGo5nl=2Rv> zQgaiCf;27uAN71ey&tzkHjl`fO`LQDJXn7@+n1T&kZb|$}lSN!(8Ie7-R+z<(x!8{3 zMZs1rmIBDt9O{l27pomZyc#Ya&{2hh6NXM#72x-3XajF?S;8E~F=?|hlEwn%+JD_p zxjuo8N{!@Nzbo@|;o-b0etzU;k=CB}u=4o^-a2Rh7sGqgR`+B#_fxYa*Yg3I5}cWh zEURMUFZ&|WKJ%X{60Zz!eQ=TVsh4U-&88~1iOUCT1RP>=^e?FA>UjTFo)lTnJYRC+ z;LMeb+%4`{ER40tS72af7wDHfIQhi%r~AcT7j4tsi58owznCU{_YiB=z1wveKRaJx zbRUK8>Fb`FMa60$p z=n8Bkc`5IgGH*^MZtE^z!JUK7mYU#;4wXIqJ9qElKg<(`fPhu{e`F1>;qm$k?E#lI zIO|!?S{3dXH9>Zj6KE2y>r5DTd&v&=)zo%$Tqz78xjnSg9cKDx&scjU;A2TiIj*7o zsxY;!43wX$jRuQ+r7RAwS--mWJI{lzD;|dI4WxMsHfFn_MXK0%~vyyENUE2_sa^WjU{1cWbf=edW?^%zsv5o`EX z(A3Ueedww$KmoCQPvoV!&CoLuqfrro>u_(eG8vbQtgzop`fB+@A-L_TdX^n^<i(*(J~ewQ7mlBBvm&l8RV3I^dB^_*0^RBKi&N-t zGV~o>QK{EiRlILvGVIdBq@W{1Tx3z_y=v(9g50G3Ij%UZDoPR9=Bycl6%d&6Q1{tv0QPYVT0KBBLmHcPLOj5F>Rzy(tSR^W%7|Zs#wvfqr^V zUk^o(q3X{oHkK`ZG>Nkva@YVV3OX-)i|7n_IS;ryle!007UucCx;bsDrms3(uDfR< zak~6!rOdaa4^B9K^#+z{1tspiv`*o7jxg02AAa~nfg<80xM4*(-%m}}DP-e=c3QJJ zhT(d!^k(4syj2M6_>_4h`&n7yk0vSC17k(lz~;d3N8?Slye-GYHvPt#u_cpJmNnjX z-7syh9*5|V+Ym<;g=^4~iFQ`aM&bE6Ut{@%sH< z>ZsQ2rp3vRRIg@UeRbd-?P+GXH7holKjJ?z*R=3<|T=v9sq+}SlkiAQS*F9IQb z69R`mIz!ax?X129u(#v;#v#Kq)lG)0=nun}RLd0AjG1bW%(wV?fZ}+9Fh_gzJJ?f^ z)wsRLaq1XuMR50X|Ir%}R{Gb+YuZHeHh5#_A|UzjlL_$$hR@wkp!UFEA0PJ(w<=HN zi?o5=&kiR+hWp<(zuPC}rc08+YvARNucNpwMIqg+ub0V*f0+r_Qd}IO`Sa)L=h91< z)-<8%H0|de7>mMG#=7(@#5qN*nkxON^PTNzspKa%4JVIHCnTD3guQxoF29)zn>Km9R0sW&Z*NkVwJX%-zhn*1b2^oiA%&RFyjZK0_mM!9$20w>M0m z^r5kARCKW!^&vNHR-E{Q@>w=F8un)M{^`|L!&M7wJ%iBu8t){oyE^5d@FA2hoJUzY zxnJ1#hFE(~MZOzk<&Io4Tvoj6zpyLil-VKw{aA|d>a;_d7QH+g>{Gj>iRqMh4iP}1 z0+c)c?ZqlFYW_2%SAzRRxV@dfbNPWa!+hnLeK^~Sf@n?nZzm1~(o@d3edR`(R-zSy zm|f%zt14CS0D9u_kZ5ii-<}A{GO$dAj|ofS@GgHEcixj-K#7&bhs`L-e@`j$j2>&l ziT}ZgA_DTsusBv{k~c+RCNqeo@O1F`Oc^<4dZQ^~d#g_o_KP=0sDiI@w|bl;n49P& z>rqZ~{u@i)PJ32bJqaVs?Vn2Ad>|^pGM(Bx!qYf({sB?%LF&0meYw_EoGJ~{fz-BY 
zoSBE4iu}G4Be@+17GF09bY@#5OXwoUD0fbvlM8s_OOXwy<0?N}#X!>hwD@o%gPO}g zb;%V~Sv$w6D}Z6>V-B}3tku_78E^K$B;78>5W$1dUhK8#_op20#)6_hzv5^UZ&esD zcF2wKa0z|Cl(mh*u-xI|>2cUg%eawZ#C_ZPD1<-!Z3WNxW9TrBH+>Royvj=kr`3=^ z!Mm32>U|^SEwmZdDmL>7CCm^J6AXHOG;AyZ%k6XYj#k9D*NI}^L%**t=QKT_DGViv+U_MU7S-4aFe2QN6mQv#|}qb@o@9=j`iP`;ZkP~j$5tG zxU|9g7VLNw)c%FPJB4KRTS&HZYl=_s$@L&T;uKAbyrO(^p*(4uafyB{rgqRh*`r_# z^-wbO&dV7%uT7<%-ONYbXA|Sl7i73AC$tIO`zW-cuIrchH^q-JBKEe=Ci>1onqFBp zSt9HjBa2>LaBzqekI%M5NNBBoO9shWFg3s-SMle*gt;bp8M_IcZxb6J8N$o6qB_0} z9dY_B{fyOC3=X9?0RgI37ykHjVVyOz73~Y=s}#}OA|KJt!g%auyDQysy!IkeZOcoT zx*o^@J64p3fb~VA;$Q@w3~ToF_xv@m0BH_?7n0?--|cp)q$cx40rg(eTb&_W{=5vq zPVdWvD1k_tls7+_LK%{XF|aDh7am_SZsl6ATNt`Pj!jayjyO`vbQWn5!{!YOymF=oaW7eZM4P{o zKbb3>X_v966Z?!(tv=P^CyJ@~^5>ORUDTFkK}A(0w~a^qFWetFuMFAXs643R2 z{AFLARE6`3%Y#=KpUv9a&6lPEdqtcQInHA7xJt7i0@Bxh)W6eKmPu=mYjZ^vtk>nM zDmxOOf}xH~x6V_{)v7S|kqfy{1t>Ae_L zQJ_o8I6L;kNa;pJphITa@dR*idb|y|R`@#1?}t@A@7Za0zs&D>DoGExJ&J7%U07i+ z8o@;3cASkCH|^0~n?fRwc-Bz{h?(s+=1cz?QOL_LbpS@Ft}pq-B_{WyaC?w!SAZ|p z#sxGN_vL_}qMw(q)FKH%WDfhCqFn>~hPz*xS}ul+e05wb?dD#jpjyhrF;s0KA+#W7 z)8H2TJ@MaF%U#G+h3;pXcMLP&+~1X4EE7*Nh~8dJ+BTw=)j*0I;q8TK*0;4^qdr;) zx8TI83NQcz%OSb%_@^7x@s-dIeuKScAl*?l>=#MA^QJ4n#k` zetshNQAoK~!ELyz_ZZ{M53-6;x}zf;)nZdl(L|0OVP?gXqccAEax^|E-X}I7+uwG4 zo6pB7WodTlAm1Y99NjaX1MA>VtgqHB({0h|+2U$;#N(!8M;9teSJxFO+}4C=BnfF1 zi(oq|+T2&W+vT*kmlH#hnqofp@<@AW_iK>c4zrS}|M-SBQU@iAK|(ijJzX*Lcr9Zz zsOS#Zi2ERTEi$x5{mdU%e45A}zmS<*YV;`r2<2HvW~OvmOq>BY4jc^^CuCP50Y6#x zxi5CC0?Et0@yP^u*$>Ixz?BQtPv*`qxS5FZa+v`;>_`d2+SZJ9*LOXG3#0=3zkaT+ zuZXU=x{OA*tcic6%vEVn!^-#&4f+}syGXt9MIftaCNL3>|KlX+x@frCiHB~U?S*$! z<86D7ypM~3kyw^LI#c#0dX<6|VYO ztDL1GrJJwX>_-V)+V7!}k;ZR`JCRj@Szqw}-beFWe!A4qcl@lR2?sPmw>y&ew#gqU z33FCUg$>?IZ9cC_@O0?vcOKKLv#d{NC&2ybCAJZFr1yJmv*^VJa_=@n(EZFjr@>-^ zgVq^FGOD?AjIl0--7-I=^;sr^sN@k~!_O4fu;CBew8+&`L{WTCYWOlOPZ_Npj-ojs^z7Yh1J>LHr+mhd@ zfj-M1J$4_V*J=yWN9fg9yb}>T!VOAo{qC3n;c|y8M%^sIsNNI@*{}DeAb%+MTPN#j zwMT!#P8TchI!!plR3#@+{TEqDkIuC0WlligBn$0OEa?p8Wzsb5eI&d(4*p^jK^HL1 zS#e#kF(J zbFt;FS`0+eA&jRt%7EOjyT8P4qHF*#Q>L{;}}R( zv3|ce8c)X|Zk;Qx^U>_2)MY0oG_KQLv#3iPz2o0pU0Z|%V&~RCSRT``lBLQhg6x+@ zd4bkedAEfI%W6HR)MB;+8)AIHCt1Qtz2Af#DpzsxmFm}J=@ot-ANRW?FLT~TJT&9u z2*|lLPISi)T(XypVy_K&}`Zjee-o4(^Kj8i#ORktlJ7=Dwgi(m41UV{edr)hn+443`{pU@*@`Z5tLX|>epe);0Q9bA5wYzy51faD z?LBG3aJDkAv;A_Xoig*$S1}nGz&h2TgaD1&feBos|o41)Y zvY7m8vvNsp7id2;k6_;nlD(;S{HHh6Yf=rjsak@!5pP2qE3 zmX#5JT;`$%2f=5H{aRe5P~)Wi%<{cTB;mblA)5%g*#IVNs;hI}8o2akmbJcveh%QD!e7o><1TR5P!sW;yM0eq@mN0fs56%>BvfU}j155P)#dcaS+EvY;B!J*!5# z8tVN*qD!9Dh1&Pkh>U1aimzkq2*Yb^;dIv(&3vc!svY;c2A_~^+w#=x`*Y<{^$d5}^XpmJ1w!_#uDaYYt{qtiyLU9|7;%fU+aF%eV_f3th&!# zd+{T$fHIh+Zhx`n7IRio@((rYaD|@A!l`SWkVsl5uG388ZwpeO>~|sh0}!rbmkdg; zYfFw6a=x^RzV1R&i7}pY#BAMB#Y5%FCz~Bxn$mO>DrZPkvGvQGSS~gYBUw#s^YyfI zowEWX5mC-#M5#N$3ukvr!zXpplG2r9Colo1oLEgymBpKoFFANb3$uKS{f}tuafAaD z^V`j{R*>EcZ)z3uG1_YP5`(2~4kNReM%z`()5NK)FGe0fob6!H)_)Z%iGxWe zcXJLWHrbv@l?@sQRz#EVg(FrFe?|W>BUwfa1enbPibqLb@|E4$TNJL2UJo3QQ-Y&-3*PlbPSP#_CV3O`1^;7Hu3ZmH>KRAr_v zJkSFBI5@U0pN@>Dw{O-3ym4lXrIZ7gn>?-YMDIp2iCGfhFic<6AYZo$D&P9HmBMOz+uKKoU#}17iiAxRMF|Ey$14Tf|qh z=2hwHft^-cnTEALEgdytu%XX3owFIhpDFXs_?E`l>Jk zROT8lf4aWj1=hj$FBI${shW4D*gWI9|r zM^7oKT;k8geiayhc5SMD3S&XB#Wq;7)S#dQVjV_hWg)^gbXfT>5&QiPS{W) zCT|m}xZ~Z_k-j}Q!X|ilgt!Ddd2b6ks1N;>0z9#VihssZKR4j`JTqvI|HAKS<+uI* zVQ%cdiUuR100$!XvStTlhzZJVuwbRsfEkZVHla*>q ze>TMi17gp0F1v3X8CYle0KXpqavU&te$1j{@=(0p265pG=q$spnTse{f|D%0 za;rw)Zc9s?rMx+PZQ!+xtD+f7+Or7XvT~x2!oEk;)Ju0dI~g?nyl2CL1=$;#V!PGy zVwbyn_*z*;uhwgf$XH$qcMWg}cyn(jY}_9(yRDNHZ5w`aO_%Qj-~p`~_;PPJyEOJ@EjQaj@KAxAC>LDx;#d+&GHvEB~S 
zsnwoqtA}7azZU4CQO*;Y2J3~*)`y?&bm|&gT@=;JNiX(bx~J4hmIi)vi@Y2_$Ut~v zJqH*o{I{n})7IMOQii_sd%cf$x>fb^{zDbVmoNEuYBH?fb6_+RKkm;C-7L3@=>jmu zf=d5p1@ML;+dZ5ikdro7M7#A<`>JAjTfyTOp0h5W!|-o=v&BVZ9sKx9Sr@|(>}fiKHp77z9|{_iAfBA`y9inMkUn7E_rLcnRcBIRcwskY@X3Mm6T>wL0USK>SW>8I@xH zlD|}WjOb?gyH&q_B=)>)FluP1sME}h&2eY4t~@`+@{>~kL?fy^!WzIElX69`of#xD zM3=-zKfa}Q+)L^9erCNLK4nu=lVPOBmvkl`$_G{qdJW7&q3jVKcdsW?ll>`J&QB_j zxj%g*jgf4b*(O0Tz*+dbq{>lWSM@?K$(sAl!T*7dza*PaWHjxOPD6??qlsdMR`lb@HSakq*46K8ls``J^?(szve6;N}iv z&LbO)aO>gzyskqkv(xh899P!Nyc`OJPw?8WF;Tt!=T>0i^}5gXe?)>Fgh$JcS8pRe z??4?%e&jt`O?cquy}X@YxF0H@WF=94si6I>_@v3VnJ4E^s;I@=_>Znfb8E)$_3YuX zib;c|Tl+g~AtD@qP(!&}PjlGfu#*amHI1QS!@?hvxhQ_93$wOmzl-BZOaL4HF`{8RE@k z^MIZyP8>R2;2+Z~+XV8?Gvj9(wB zt}sy-Z9VNWNEu}v+VtqohI$0UVbF`jvjMN2&(E4fUyM-&|9+t%_~l{jQ7e}YpLWL2 zrxo*;4x{;}TL$>qT|0%ro&F=Du{-;6jffSS z5KXKfIa8UoWHr9~gouihEe^gmzcUF7snQs7&Bj_196+7ttUAwSbQ7WumsgjW>d!!b z{x$A33~%SKJ$?J&>_HQ!{7&4dx7SX%@9S^Mf8LVEGT#}Ly$zylhoY2k`q)rfabdF? zJId&gEgaoduSC{|mG!}e54}o@(NrCx`+V790H}qHwN0kv0Y4@R^;6r>;ZEgw9n>S{ z+TrB~&bNdu8?beyUZ{UOjz6`Y&3>;Dx4)VQi;4Cx*gU$+b5?tknY|P@PsM_ry#MO` z5=BW%_Q}J)R8H@(iwFpHfl=AQUmaJ`Ea_`o+|68M&y6O~)zr1V=N@-!-(|Irb+hSX z2k+75AQN|BWSb+oU0QTu?^v|r`<2RP+eBN#mRv+4e$qbZeJJK#YK!~R672*R z@8vIP>3LJNAk_-@OqJd&5>z`rrp!b40QySsk(XZVzV@NB1ZBxzWj_rlSk<6#>g`FP zh}<9Evq>P?%IDISi*!MjY{u_(R>pxWPvWkrVhK;MrR&={oUJooBSI?+`X@z#A7j-+ z7}?zQNhOIB6HQnf(a_!~FJ}SnhAG=PqB1W#0}Xj|PH|#akvGJvjcl7?;ZiK`3neR+ z+ZJce=RQbR@|U}R_!2jDYE^1($(>&ZQrt$H9RDgb{PLu)m(A*V412R#(I-nTdE`rE zhI#n;5yS|qf_S;_FWQVcoLE@dJ?mqcO>s9%NLJ*aUD%v4FWm{j#t@I2z zx+vEMi_AxrkFK|GM)>dcuiYKISyW&?mDZbQBP9rWR(7|{78dW$-S@C|u&?gAbtm@? z;`6_N3|g%Kz3Av4(8=|so_l%ursC0=%aaayzi#R~(hC|--Z|3iHeZZIY>p?)n!mdL zBGTih1&1CHhTUq|hguFAfTqhW+tr;|3a5RVeVgI8^Ir+u=cDRY#Ph0TVnzh&*3tv0 z(E+dSp^5L{wwZ75gPNp1Uvbw{ z@O6l17$I1Zfcsz?CKTR5`)=Eyp4WVNf;&Q;ve$~zYfOv{bJ-2PzMp+_4(e`BhA1A% zbP%Rsy}OvP#x)pV2Rn`f(`Ej7?Rdf!xj^)qT82ci^*Eq6aLmcDPkF(${owY0L~kY> z-TC>;wI_s}4!M$E9CP>J)9IKRdJDHy5KQP+r(4y1##!4NQbQcP^Ra3>IApuuSDJ=` zI|il;K8-;lZ{sunGJ*Q0Sm1_rHDXOqtL|oOW?14bO10m`wyy5V7u4?P{P0R!u0v7r zUl7k7r2oEm`Dh?Oxzoy#yV;^6fHDLBMf?YtJoeS+Ira`G4P*tW&ou0FvVA-8yO8rm)0hhrVR3w3vWKzH~d^Ix6Uz#KJR#yI1L^laL%2T7YcjLKSM zIb&xz3CqPy1Lj@Ye9yP6hg6rI?||OL-gV(g_H((Kgs44v9wDNi#FcG~$#Fi z$$Lnq={NEzGgzESM#(V#%%JF~J?Ul~9PjWi@lfVtSw!IQ4)acANl4b~5^qu-`G1JB z0`;#|MsmyRr+;G9Td7|@c2&bk<>cndv(G#ZM>YwgAhD`C`>?{<=5j5;6J4V}hjSER z32ILtbq9Tkc4?eXCcM_(I5Rg-42f{nB9|L=b%_Png9SF@ZDg&oj2h(ba6fKfil<%5 zQ|FBN8@c!?K~2VU%rtwaUCG+j@hQ#g-rqYU4_DtXk?5pmB#kZwzYSy){AhRR%{G}v z&#&f65?Mh;=0ebkv6iK$_n^2TrxA%+Ix-^~81r@YZz>JFs2fh2o_E1~;|?|peNE}- z)-@*Ar$1wLzi2=~yYkGp2_cKwNV=#py-sp`<~dA0`fk&QS`t^oFW}pndO1eJ|7fgn z{Ic{%ZRU@nmBj{k;>|88+{-)GnQ!?2`7Tn(8ZiD~=HYP~fHn4hx>P}+=~2nhma{qfvMnuoheX~hDqSxY>x)-8KV)66@HH&*~>*&fS~jD zz|Njhgnh#C^@`Sy&PuIazk}@dcAcotlKd03Q@c=c{?L2&7%yQ&Qz?ky^?m1DE(haS zL~PO-qSq<1MX!e~tmQwVJ_xv}Hsd5g5(~q@VqXq$G`!g|;%_H*y zMyDhdhjG!MwLBO~GX$U726953tT!yKhD)Eb?JlbF=Px8Ew57e~;R<5csOXIEtQ-`Y zr^2Y9CHldSh5-aueAe;twWXVjo2&kq5BmYvRs@%=%)^Z5d3XS)h+V~WV7E$fuGl)? 
zt!c;Tis4JEREN`eFsLa|`0*E++ZyUm2odtszG73pd>mDcF&~K?i4Y2XuoQv}77;a8 z9Y+@Ipcz%u2m$F~9hR;$NDZ{6F|S&DwQ-ZUU#na2RIJe&*>_(yFLMXlbe`jX$~jg- zPpf&Rv!HuCsE#Dt!arP407B?8Vv5%OkEnNlmdV$GVHCNSXbcW-b1$&FykD@igm@e+ zdM0TcxR*|V_F5LNbM1;1Upo?q*zwsZcxSn9z$onu%T_cdBlZI4_Dq?OPQk3SXwodq znPDE1@2c3ROp`9bGI+lp?c?4p8!#qsZ5T>fGQX4Q`u>ffgIq&km?r@MfF z{pz#OzbP%0UbxW)SvMlNe_Nohmj_ytWdKIiWLV_}<;99;K~JLqeM%n-OuZ@b1DfIG zvfu_KxdD;nfVP^pV3|F~6*y|sQ22hDA&%|!cML=Cx zZSOzqR{K?rB!KQG^AB#*@2mTWp)TB+M1P3D;iLbi%H)#XpI z5B$Rjoxp(+!GB81q_N{%~5sQ2X!izuqFMPh&4~D_o@xZpLCvnO?-?qD&am>!*oit_{#1SWsc6(g*kmyk z9q{_V^a{18)eO5UmK9HraZ5gpj{ZkUTuX1E3=0=>%s8)1k`>Nd@cgA3)hB)D+@F1d zOGkX}n+W4?7V#Z38FJHmO&_oH{>k5;te11^9&%?$CyO;}Z6ML6<~Dwce0sqn_W^cS zg()H*P^R{4>DKCcx4Z8(B=1dMBNzsYRX&XwpyI+8`XadwoI|y(p7LFZu-_urxJ4m4 zu$h%BIi|NkGAqPkXB8?Dxsw;Dqc#3R<0ybgg-jlJFg8CC>;ZV(9Ar+v zLVRg$DQ>a69t9gt0zF*rC=Wu|0|~+nz*;^)32nHa9Esg1dXAoKctpvn^Mq`+vJ2F; zDp7Nn*8mlv=!P{3BCq>Dq^*!*ov`jq?CO6+O4R=m5%5$5KDbK{p2;n3Dmsf!I=^ia zCT459Y_ixYt7eW4wP&H$R^8*-SDx&5Hvp6Um9Ia(7BCTTng{7ujRP;)e?;NeWP!)-p6Nc%-An(K zTQ$Xs-#cM&mEfx;iXWd7MtNXx_;48TxyL|SofyTmG4~y3Qc?3Ne(&0`)In~TkD^*t z2iH&J`mMMR_6&mhI`SV;j|^;a2cthN|B6FxZC6pgt%kovJhi31xs9UQ9~~I%|Hc>b z=PVKj$c%%+pxG$(>=sn2hQik z=+MbtMY|yGyf=zYp8~hl)8M&TwQ~!B+RXYpVYr7A3{|_ zK)xo4)mx8a^i1ixp4m!H^5YmJ(bbjq>rs@OxdD4HQyQnnQ(}LL`ppd(ZUgC$pdCQx zstmI`q4!LAgO92MW_IPtls}{X;^IMyzw}J`(X-hL&vg(q?tQG$0R6BX^5>f(WBY%m zX2t^c=pXd_plUBoz8^hVJqy1>Xf2F0cd7yVgiWK3I2`lTh9!U?7(lHafl;lU3%(%g zdhlcQwPoE&{C0}Ul#sxY?oma5znXQdXXblh##`E46r-G19ohJB_*wx3rMV12zVg;P zQZPrIY=FCY>ttepAHPA?bAQ4Gnxo?6B-R7_{PwIaKq#?#;M6d)$dS6#vylfdE z?P=}pB``nj3vy%2>+9LaM`>!Nq&{=#{9;%a3=SM@Oun>8(83J{GTuSA3nIrcy1#Pv z^y!ms?tb9J30#VX=p;7Ss#-H`lqK z(N2$Ga58Tn7qb*K=__4;^cDUh7tBrI(zJ0UDQ9xjE)UuKRAzP0+4dsia-vUg(BGJ{ zVrdxBBqa{njzEr9As=7PCjx^<(c>O_VA9aGfTb6}c`&xS&L3IrEf=3Dgbt?PAKrTa57Ya`YF81AV9SgT zN;}cJXjv18gRKB2s{Xtu*0I|UxdA@Y|B+Hej zH}8hl90#2_WfUdjK39eS__QiWUG|aBjWO;UJ`Qd-EJxhqEf|cHm~w}NefuCgrI8*z zxfkO_2@o%ypVDh~_fCXZlJRWsiMdo8=1ETZur0I19ZB8Do8laQVJsWa1@S_?>QGp{ z+WJh3V)2XH!V>ATS1a_xW>*NsbWHQ6i0_1Z*Ukz&-U--B8l1xAsesY8#VK0fe8I_~ zp2Z{C$bH3H&}31*ChG@2^I}A^>3k+559%hDQdfkYlPA5G3GY50J->P-_-KMERrz<=pwd+f`0 zBk4cNR(En3zlIn31df_ZugEUj^{z@citl#o<88vI}rMSo3g=bp^>5M;=M5Kdm zt;CE!(Qh(coy~Q&Jy$}nV*X(7-iUlo_J4)Xm4)PC`8B319+ciBoD;?WZMt7=I#NkJeUg5`Lzkeg)%Oer^}mmA zgjorN%S!-&z8<05zUnAU^#&EK0nG06fg=*uMnGy zbm|7*#exj6!5iSmcDmkFp&ND=P)uuhm-Q3;c9XjWCvnH2-svzJZkUL%0hYr6P7V(M zZ_L5pO^L#K7eGCevMlA$gCaFA>!(%IHb%CsH|GBlsYu4FeV)g(K3EC@QNzvoD;F6- z7GsMRt4ejKVNX&cf!Ej-15@*2l_GLS!cdDvaP@D|O||XYZ@x;EWeF78v_{P@v+ahU z4v=I$R52^JYOtB8M3ZVXD70RIy>7a0PuIt8R^oHs>85e$!v2uUco2VS;I}vC7W#!m z*RiYlFvitd7>-sHz-$Bo`me4h08kYk0w;f^PW8xF6PqSpYxhT?bX1^jQ8LA-C8Xkn z%QoVhR|4O2J4;B%@w#R(#`e%1n2un4F;3vd&C?CThKD0jthEwx0^hUP&_H*=8~)+v z72%22^%Id^<0+s2`d&H4YW~<=46W+xn+I`tu_<0V#2Qs{Z#Fpb=_%H7a!E`b0N5&5 z^iabirTlp+K?Y%jW+yKLNn$M6zi!Vd$st;c&YE%`^4mh=(CXA#^7@tmy<1&Q3X{f5 zCdqzpDpW;pYub}D>AI1JW?h~y3WE>gzU(+vsNiPhvdQ0eng1w zK}gy_sL@dsYR@q3=*Nn~2%F#CS^v(mII)2JZQ;I$-a^L5E<2h5@fz_?Bc@QPPBAPK zFiK$bM7i}c!8{(U@~nIt zmy$@~JIJs#NO6X^}$$vzO7vy?h_#RDg zmIJnN$o?lxy~Uz6VGPgjZGj7@_8<5M?6ugy|0v!2(pO=YN+q}!m&tUK;C&;p^AY!W z!lAIXp(#P^X}~vDM7!UF^_qj=CHQ4{kIM6Y@(M%aJK+`i5HcD06kK*9PC`o9AW7_4 zHv-j;dk+Y%=*ikWVZCs`h&xO*{qDF(lB_>1hO9U4KHL2jFZ(9u5KH{hT;PWe%U7aq zS!9!KL)z&n)#Xg;Hm#y-*f_rLQ&1ZiI|!u(n=nOpl z$`s@O!Pt}TnpggqY!Y4p!#>yRW`aBd_`2WXl1R2*+bn2M-ojMwG^(Pw8-6=BCf=Sr zEpKzZN1bQ6{{Z$oTUmoA3&fiRFr<`|=0nkcC(NPkdvU*2wxGvGChs zbj<)!RZdaqtRcn7>t61o4C-bC?F|eCA>4Dgu^w~+oQNGP7JKQCWluu#TuB-B+K7g+ zk042g9)a3RY0P(YPNvQ(O9+%$|9L^AU8s)NpzJ+7a8Ewr`OD1{-P5mO)c)(LRAfev 
z#p2Mv!+`LQtqv07ForQU1c&*K6RTTVl?!#urglv2z8>i8GeC{?M3q~bB=%=LZ*H2Y z_v+_rYl|;Pja?JG9Qtf1rSeh)&qy-cd`G7 zTHrgV@=O&ZRd*)G?!N<8_Q~=>)9PXOdsWvlYjOQWz+L|g3ouO-H2?x};158*ICPz6 z^Y827cE=eppm*FQvS@!g_naj2cAMxuNFk)M1g*(vM65t z-sgy*t&oc3?|&1*ozh0Q}p9hl6a`!Xe}Ywh<(V1gn8r@+`17YAqPE91bT8zAR9!j_OnV)zp5k~M3x)E9`spq zN&AQ0=}kT}elJ*YwA|JZovc5zNMc5l;SCFcujOY~c|_raoHiCgQLK#Bu_hcWu0Bxw zyOr7F$^OsfTLyyVn=V{D4%3X3WFk3?T(((!j}kqm%fW*HWPt6^_U64W1#paX`_Dq^ zV103<%ex&E=!>nTES4EtPecCQ%^Cmk4*ghV=O-`Owsnw|vUUz63{uWV(c-VyiKYK_)L`6KWJ(^rgO-PO{Fh8lZdXx> z7&73;#rv`Yf1w8RI9V(&s^+9O>nU0k%K8#(Ntx&**504Otc9?N04NY>dN$xZGn$ z@m+P$h4~Nr!;cR&t+<$Z2}QC{;QaWA50s^p6_EB-pAZ{{(R*gfazo|gd>nipbo|-1 zGXI7i4mXc>IOCG)ikFnB)$<25?OkBFxloWi1sWdRrhw*uJH3hZdz5H6c^PVxm!ozY1(&d*O&hh@pf(eEc)p8?VNTV=ZyotA7tY{$5!3=y-n{%5DUk!jmyh> zraQb@$BCk;)e$$;J<=u0bHXnM%9)jX`(jlBf7bFR5Q$w<^{ryLK}S))eU0&9f^EDg zK;iByGs~ujM&Q`d2!~ud8WSj8qbncRCB?~B7uyNfo343=qORNU-0TOiE+vrFvQdvb zH#Te|n|_z@1eF~IBqO-R+QB+g$l8vC>mepAD}eX=cI!{7|FKT2_T#D#sg`^4ocKnd zHbjoiRY5@_I`}+4f@fskG<>uZul+JxDsX5R4zl!oJb2UFwBTD{T#5{SC;86Kk9UDdE-aN<2s8;kN(83#3DyrVrqmqwzY7m;Q$%LdL0dlTgwwwtzmqV`nr8Vl)+=nW z!r}Yr373k4)3yedi>I>jpFhxQRC^3xA832q@w{e=%Q8>V#kzQ)!J|RX``k|Om#vS3 zuz^5D#oI{Epw}#ML{H}aDGmEMxq6kwt&=GOlWOTd?;Uo?Q|UiS2dlQ;TxGCocFx2O8*itM^^J_!b~px1C<31WF*xW{DrcmSPZ!A6VY!b06D zJbTt&ya2XYiZ+yr$79s8;gVwf_K|0vc%E=f+E0-TSC^Zubs_AF>_hGi_$>!4zgt?oCX={uLJpF;$6v6OC?w zeJ7b#b|q~;WTnrg+;C4@N`K4sOa#^ac&qcA3)g&E981Z&OuaAh8{9812&|cqx?f{7 z*vMVOJ#VqH6c4=znc>Ra^9vh=PCn@Z5#6c$-X=j)_igJa zx1}YCwu&oli-Pu7#SD?q`vei#H_%!G%eKD9vslWWIuyU}NX>!vi_FF!B9|(&A6#;008M*B(Gi7fKhz zq(*weya#q%Vp~2X-%HwLZHH(2KLqY|3g+f=S%r3dU~AE~FGF4wpNLqziw6;M{K4q3 zUK*bb{TD-ZE~4QR5ddQApR1f?WuFK^V$6BQNuxHwgu7y;rSMEv6PK}Yib|^`IPjk?3%ltABM96U59?T|8U@QF6A5>wz5Sr)wHw*r%LqK4N@apf_Gk-r57T0f(=w`A_+J8N*Jxsr zl#n-51?E2@vFhtFqHR8EjQguyT-sS2MrJ`e6*FNrAp7ZGK^AjH_L$8+zQlf+M;H72d!R5=!6}o;F$T#i)gAenK1J!A1HJ_R1CW8I zn(I89voG)XM!k~iM@h-b2ArrPy0=;cm*WgRfbe?W-slqGa9G!m@00~>5#y+!FI8;R ztL!Or3-~Vu8T-wjPhIXf77p3GD*3P*cORd7#7P)wY<^Fw%jb70isxS!W%4WCIY_@z zM*_ux@>R81fgU%t(@NLJt}sJv#hY%wt9~lbJDh~X2rp9fOZ_fR0urPu$}CzK7H!>p zmJKAXS{>XvFP&>HVt!_K&YeBVv9n?EF+$A@il`MVA5nVOv;+g^{Ot#j8Cd`Gjg)Q= za<3~`K1`&0bare59{GohBLXQU>uRW&acveLpM*C$x9D#%R^j8f>mocMgl!sR^8rw( zVo>wq1y>EKVJ+|0nDMn?Q2chw311K8ZrvSZFGC3T;Cb z1fx8b)gscHwgvJ`Rc~A=jwu;p_k&;fr&gXnsaMqsF8%nnUQ)0OIQYjFO4lKW6YF(# z=;sMH#=@2J=l}HST*gHQqd7ddL&0Xy(>Y|=Q9argnS?olYNddumeF*XHb>0&vX`A- zTFIAPVQzOk?Lcc3Zrm>x13g-noBi#&xRtDJuw0vP{ge zC;T0bK2kD{HqDgU@ztEt+#I}MMI|o0WXtDsoKH4ld@)b8i!s`)+Mah5Y(L|K)gS+l z$W`cO7Z&457LLV^;moKCA^S7X<|czuSb zAkRLDw2VCaGxk3mP20uytxXqVtC}?{OZhdoF8LWn&Dz$Fd=#W)Et{Brm5toNJxct5jnDJd za3VU828Ug+(P1xNqbowh+M}xke^G_mKdot)w1pbvG(nY5^UNIbL;lz(D>U<8CU39_ z=x{Q-0mNUXj)(Yc(RpDEQ>LCdRUG#T!p6Cxea10;6Ys0dGY<(cN{ znl6Nk_Q&`Xn@6!(xj!&1-S`F0lg)l{F0+qH&kioC+-oZBa3 zU*^}%26sN=1OYU*#L*ofo5fam#8iMi>z?ve1_U)(&qMDoQ~JbFR=1DnoJ3>_MT`sUTuB&}b{aQLAV#3sKNQKE0Pobx37Bdk%ois99G};Xt4it`<7Tj7A zg}mqk0%$09FUK>K2}Fs5)V8DQQUDjei*jeEwDF~3ALjQO!+Oxu6zJDtW)GLX_Nctr z^1y_gWI~`#lMcJu0=WER)><8#?R(On$vpok{J(C6MtRuyPh^j(1OfFJpe#az=;i|K zNrg5v0tK^wWAFD8D8?PX`jM&|B?AUX9NW-xt<4sfuMLMeZ;Lp&GJ8XrZ%8wNr%J=F zL(7bC8qH`RL}%)!>ySsmW3^J0n+Sh-UXPDPh!uZd_R2BlJ&XUI%}(^tJ8t=zB?rZW zwjdEVEV5o#`E;4x`My<%Ei%ga5fv07_LIpbp&$Wj=azUWhOa+PgZ_?Gn_(YuKp)mFzj{%qj@M#+=( zWZt)vcl__;n)anOiUC3lr*-&?P3EVY85Ah#EdD1e*XxM1y0wf`;Yf9@!6ys-*#)ah zM(RaIO0RzWCO_qY? 
zY-djs?Bz6YMoNC@hAXj&Xwt9K+^bluni^R1cF^S_!1JBwYItV~o6T>!|8QRrS^qkF z*1+8n$w;(x7@n^y_A%B-4kfL};^AFzQIQujwYBDqIOmy7>Ycv=ax{1J@tHWK8!f`f zoIeT5x1^;%Q`Rs!n%#1(^7+idi0tjj?kZ9_|N3Ec#flG6)vj+aO}NY%pq> zbyrb&VApwQ$XS7e^$#2liv-aT81_Q-Weqk=gg&HXCv0D@@EFqsXg0h?FwbqI9M#`E z^B;vb5I07&(Qaf4fzfp?N3qwVp0<%kG=EIq#F|u(tHPyY0caE?!ZWJ-aqYd1#Rp{jv4Q?=7 zMTcjy1Dd>}K9_&y;U&S;*cKMOd4@Qz9N61(N7Z2Tn@j|6078;tR#Yj~w?8dsgo+L& z81;c6TeOfZnl#C+#xuaFmMuXOwk(@nRH^){H(4UB=4E%9 z`aB`May0L&+d>zWqTeU{xRPx+#$AwBSx@5Bb9#@M%Q$|Ledb~6;&LaGf}Hhv3lDvs z&3~tvS$4v7;y7pOefUk~fpXr*@8t{s5$yo{B|JGk)n8%>$Ai|=%_i*%j>!v6U#i)$ zV_f>=7Q7gEg8$_I7gDbh)`Sy!h8IV^&aN1NCh&MhyghbjeWWpxJyCm7-^ldQTuBBj z#&-uOmEH9Ss_(0BYlvrruqSS7i!QV~t!4L*-SyR(z%AcK+k6f>?iyg~-(uVwpM$Hf zKM>Nt4L#0fklF=qTQoMta@rqSv}*A%#Yh9ZtK- zsy5yYpBPicg1xJ+F)4o$3`p5)%8S#uusZ!$MwYAzLz555Mu~47i%@Y+>y=~mdJW49 z!`CAjH}qM!pf%z2mqibsj`{@6BV_eTlQunPH_&2o%bAVPdLYIXZ$7qgE0Y1wbEaY^ z3ihcz6Zqj$Su66vJVYQV;zFnHwwB$u7H5{5S@V@;m0#6iG_TPsu0%F_%l)7v-z4OH z7*@GnJ@c97YvSOe*qL_bOf&R@kEWZ8(19y^#7p;_>8zrwRiu0?OJ%4K^TXH5QsCTKW^s@p%+P`p9a!|_R*$r36C2ahl z;?R9rgifAniap&!ivLAE=U6cjEW}ss#un?bfq|or$NE?~>Z?Pil$xlX)(IN;o-7A> zP?M7cW_#m#CE!*8f2E#_c%D}#f*gdxfpp(or<|T*YZ7H{d1_9`Ew<;>FQPZSiQs9K zC$h)F$<%q)^YKrB`+Pr({Ph+`>$D)b5IHBXyuY&Y8!64ye{i=%6$U}*ZGqgEpNrFjdP z3|6F|p$Rh{&nFgLcOKlg>3V0V~agX+=l&7&0XdvaQCrd?4TaT^&Ne!GIXd5H4NZ zm#__MgL!R?wk9NbdJLigWEY!@!lr)enUE zQq6W6pT3WytCF<(&^xwW&4#YC75QlJ*P`cEj21;;b!+@V=jM!52k?2<|0U6jz}JAG$$;iv<3Ek6 z!Izd>%yH-fe4R)v7AZReT zOKdYSYrPe4*0Q05@FcTD{}FuzA}0u*L{nXwtSR87#lNHYF|IYL(c0r#7Nu^{$De1q z_Q~A>7n{0@Ab4Zg#jFsXp)Qy3h8}bQL<^z{y6o!1zeFrkQRG($DdOq^v6=K2arQ_O z=Chdg5edzk!UN+v2>Qs2Z~)`{!2Y$lwp3a0kT&0vF<$YQ?2c*|@##T&*3?K_)L=mJ zj}T5`Su0nSy|Hq|Qs9RQ!WEeKGEOA6SJf3TQ=wvc&&zV7na6 zPB&9(hpzOEyGSlGs+Lg7@{J8-g6d~EUI1O)`z z5Mg4K)?I9ns}%owfQPK8QWOhZZaFi6$3Ot#))R*@T%w=lp51>$xw`PA>iGu2WY9u) zS>jy9uWt)0y4B~Os?O85Ty)Q+mSGq-lt-U~j7Qh91+v?FWY&Ky|BHl>SN`8#->T5W z%O8%iydMLnNa6PgLO8ieUl6igrE8VeSBCD_|FQR7QB8i|wxA*+O{GakK@b5^dNZ-n zL_k5h5D^d}gepx!qIBs+K#D}A2?$8gaAo?_xqhO&KPHmdtc7O zefvLrgSU@lXYaMwnrp5(%b~lU%DyA@Yi5)t zAVs?I4@1fZX@TZj_YcE}FAcgMyd>Lsd!@r+m2eeh3(Maptfe+5poV@b6&YzoV?jI# zxYf>*tUu(mt*2Qpp7aXEE@4k^+M#n1!*q5lahD$-o8I_jF#p+>tk6=V*NPq!Xzanp z_~C=6!AwZo7Yc4S{GW}}_>M$03c2^SKP!@5h#nu_s1oT71>IiGejxj5cph8q#r_

J_*7D-w2(y$MT)Mqbd208 zhdiZdmLKhx7e-h`elJy%{NXD5@CfF!duT({?L|%+ba^b=Ui&+)#p!9G@wx0-PuDx9 zu^V23LIgIDP8&n$E*3CP;+z@o5K+UJ%$PKpHQfKcHhW=mRs1_uJCKiX@DGEe^CIMM z3%&nUX19)P`42-<+6lcCO@0b#|A#>@hjF0tw`pmJkzC2Pm+?g{4YMLml`X8XjsHHl zc!wQsOcqrryT>2k|L}m0z47Fqi zNMaL!5*-KrZLuq~cRpIKbmaJ~MO=x==fA@RXcvH2hdbPgBn}PtLU??QZwMP-nARzF z%v@X=kYDZ!P1Agq)5ei!u>$mmftM6RssQ_;*F}ZT(UHo3xlY>`xkX!Dx|OE^d&P!U zVOCeu@`3;$bo*QI-*C7HAVIv3I=*)e30kN`QnSs0Eo_%A`paUoS-I`gV>@k&+okG0 zB~uV>*Zn%X#$(Hq=r`Yp(IxZ12}JbMM@2=HcH%OrNVZm?xtNUdS1x&(S_)l7zhkFx zXY6c5S_VKZBq-^Kdz$ z-zbvjb^JM&S7BW&+fMDjM2gQH^6X(Fkxu`{pG=Y>APjxkMy>Om+<@bKjd#zu;x+!R zw3^Tokbi&P1EM3)9=p#&6Qd6;_&);z{67pQV&#+HO^=spC78-T`-+!;+<2OgEjL|W zh(HmI44UfFn8t)nFm5lkV;D24x&m?E*E0_RO?sA^CC_=ItVM7YvUXyFKoNR>Y_W#y z@Hk>d)80Hf20!|T0YVREvX92e7BtcbGSr15)D`p0lrieyC>oav3v{g76j2(>@!NQ5 zIpVdYf?Bl(6E}oa0VcBz5Q~A?{>MZ0KOU<8@la*?e|o6W|9?1P|MxqAiIV?%KNWx}LjaT$;MWV&JN{wd zf%BcHP(0e~q$zbING}=7cdQ#mz2=r;nkFRh=g*ALC>+ryJkZi;r{gejOD1WoaykcM26$1hkv+WH>lV^23rT&XX8H*4m*PK(;6yNgUZ^7s;X{MIi5}`8* zb(4*cj%wge)`pR19MA7AL_&y9J;D~a(6}%2a=?lOP&^FKDG*|MxGdH7X0-q_<|a&D zgj)~yCMnPPtdqn?*7o*w6t0sTlIBbj?X9KMPv^p!ATyVqk{&}N;6O>So$fr17(N-U zT>U+la0FdmI?S;RMD~7Y<+G;$0oOi85g$s8f$?X7cD2>FsH_L7dF(PL5CZ``hQ0UL zcuIHYOcuwfqZ800phSEU%-Rj&C*3!cil`B^)rMtvOV?ybUaa3J{rS0cE3mvBO}So2 z5)Big->!y$WZpVjc?}j!aEbsBvCr=0fq_z`AQjy(kH@0B|c?JKYNQ=Lq+cY zeEGHQiPhdQ00J=kWh2|g|6$k;_yf+~XC(J-0Z9;33>GL{AbbI)oo`WFSdv9s8bWg1 zU1>@g%mq!foQ9I_xut{M8k|v{b-^b*e^h_x30bI@A=}#^6`mAEy!^B#o{TCFNO?f= z=!ydIYqLGM=i%#7G}uJDYFUtt{1(@&kvy2g7n%@;##MqCfSLzQkp`LzOL&W}m~#h6 z&VcyUZUIET{cVpME7MDc)%q_JrO%W|i{TE-mql^zNhdNCP4YvcMdvUZl-vmEf$@>^ z{}7rDUK3}vgLpp(VWkUnZzrCkR(~`qF7^-Mz>mHIk6SNwgDzKNuMoTEb6t7!z4I5V z1CyMSbWJK!THf_Ok$H>m_gS0>U4n2?Z7IE8CyxS_IuY!&e0xff4Aft>5g8+oxY9$K z)1_rMo?73U4uTGa{kU;eEg{j4QnIi=_?weZ41K=Uw{4jN79mRTA*J4IjnA_iQ8W%ZfkM}tt#kW#u2PO z?Rsei(=VbugpI!n@?G5#0)%N&@5j%6AARGPL=%T1iGElrk5d_{d#1? 
za+(mgPWOVu%-^Pn=UY;5k+i!in5HWgggCTnKZOeix@GULPM|G80>C>o=`xgrPV>VY zZesp;%@3cRmrdhZEgG1CU$X;pGdkt7)U0};^XuKXpKrtmpGq|UE;2G15#JxiA5wXj z$V8HG9(yseqP@o0*S6u~#pQF8nxk>oqepFCk^ z+ZKPzNy6)32XYZELwi3Qp;Y0#WXuCSwe&L5=3Lp%!Uum9OI^0-Wei<;&RCxPw2Tu) z>WvX<`^}o-=Sr~#TGqbJjy*R1X$JFu-@gZ%Jv)%wrr`do)7}tY0PXNMa`mU6-T)OF z6FcREmSL~t?ecALnSai(`UhGzu^J3~+?tt)Gujw8rA?TRb=?W8IvbgHh$ZR#8=Il2Vg7ZS53zb!?n@4Q{RJuJD?Z0U+1O?09yCP_V49oZIJsX>tHdyeEDrx^4`|f`NUPki()((A3vRLopS*w_=3l?PnPj+=CpX(ihTaFzb*Ah;TEfWL>$d zMao658*5*FDKHO}88B#G2w2aNS#vS%d={E5JKo53W~{-A_5s)jtY}=J2naazsq|ls zkzL?egTe<(3DExv&G^`)GzJ5D|HxP?B@G`p?E#N=-x1Bd^aP+76xfL&vfUG)Q7wpSDHo}J zpet-@7*&5#&FQf!8&48?5x0L>FXcAfL+wQgCd3YPfL7$!l2YY45x|UsT49dg-}Mv zEL7BW%1?`SZJvJ(QX~>0%DrpXM!K#whC9B9>W(fmW=Kgeg;-!$!PHBl1`Q+Ne}7BH z>X^yjCNlvt`f&hK%wpqv#$v5IfZq3D)`*NT)fiV_S#^bcn|jN{{q$Z2Faq#j`67jr z3}lHR?+^ghYQ7HWF_7Cxei;R}8xyE$aiivn`Z06OEAlDif$F-tdIKAX($jUQ%Bwfh5$8QP*5_{!cVajr4m1tW)duoY z@Ow^gtJaTuA|A&`CH~~ETDkJeNqmu$4>%1Xzzz^-qI`hM|5(rDi&8Fy^!kr(W}1If zI)OS)PX+(T`yYOFsvjZTyvz`aR2v$QTRX~!TSxYk$G(lo!Y#16 zBx#>H9Of9;PScLQEFRAdm4IIJ)x&?_|NQHGUMBrc^8pfh8wVnE<1KozZ*pV8A>br~0Q zdMrV{lY{TE%+d5`bRlFr7>xUgVuw59%j09d7%UD1%RDCqw?sy0=`U{bQy;h%1SE8l zj~)4}ThUCw46Aa>?6p6)G6TMLlvJFw3KLQEbEUw|z)I|mR{G$(z&IrpT@Z=a@Cf65AW1VCd(7ju#Wx1XZ?*BGW#B z;Qw>>GjS1PNi@r;bGR~bl{lYOJhh=hcWbEf5}q4V{UAJaCEcrU;qn(wg3)S+9ZxgX zo}|5!0liu1mVc!wzyZ2sIwq0t{~h!Wdk^w2X;~S;M75@6qV0zxk*Dcu8+_IYADY31 zN_>Vvh{kMUiJ?T|%$<$y^TT8pt1BGhr_V9SGB_M6GjKoE{=Xh1+XB(bn2qvwk9n z{v~FyuL^wQr+>bZfRSz8o+#dP6C>5@tVS7(Sw^+&Qg%jvc)a2jn`)8ZQmTxss%nme z)#xIRTJnS*#bpToCs_=a+GjS?*9O=w=GwcqKZy>S5=u)>#ek)>d{1Q6My_R4}^<=CvnEVdX=_d@KF|^TeI%O{?5+tUSw}zzWwl- z=Qbmxr4U1XusC-gW?_}1UyxcQJDRLd?xR1*hu4C^*&7;`a>y1sWX1cXS7Mk+5Wsg; z>}%zLOHi&6VbN46;;cbX@^HzwP5GrbQjwan6t3 zx#uF4zn8O*sY3=9vD62&FLWXJ%|N9VSy09I?!A6X&N|lJn=yNSy(-))ydia$T*QQ2*julsoHj&vy+*+Kix>+b|WDm(S_~wuO*5tQf zAlD@Fw#W{#W!8+wAv4x1;Wp#G9zwmigmay2>6dQBDpkqqjn=WAuMRt#lyEPRnIS3j zim|`vUe!ufIoG8}80#v2*Gfhj#$;ZV>qskuL$n9gsGI zUya`0=7#dVS~TVUjTwwo`_v4>dd|h$BxPIdI}st|O^WD=36+z0kkCEADrdcTJgQV* zHHNb=u6_94>C!0<7O)x{<6qIjJtNg#xGj^Gp=Hl4kGbciQ?7aWy(S%1AAU}C!_M=9c&j`05CVnDuQ33WVqhUDp~hAQh1;`nV`M|wT77-!pdbDJ+~eVkM*4Fo5@UBF*v6GK!E5|XrjJHw8)vtF}f7|>_-Z~ zo+ba0>LEmS1G1Z^)Mc!QdoKeT% zF07+azaevyLF|nMjc94~MEYOL`Rd-Gm-ivoJzj&ysoiv2<~?b6Yl(;?hmXA)`|+bx zOKvHv;i``opbs->t+%W|$W%V1>DtHt>BF;guN?NmeCp}wWYWy)v)jiYnrOKL_NyJw zo7%!%KIW}Mk~EdyL)N@6CMID1N{DFcTr4jBGkc;FOdT_{}j2LP`eC3UOCeSOiu zHe~)3W%K|i`T#QF1gO+Z=3GEy_RB9=wUkrE$?tsy1HZjE8Gm#`i25ahSq7^{fJgxo ztY?(|r9K6_Zw}0gRmQ1STY6}WzLBA7aS7_@bf9%|&5U)&CVnLA?iu4D@9aS19?q2* z*mOYdOv5;Ga#qu~8xesR6s^Kn{?QsV0-&Ffqww55M#4>rGeYqk?b!IY*!9CsNJ!(^ z&&5L_T!MqbKRYhytln=BJ3Z?#Ln?neN@urrfw1k*#01nG4Z9S#8xg*BY}zxAIWJ5! zy;kcnk`_uzsjQ&oAv&hrG{9TI#MMW2wnZLaXLL$pU);d41zC(A?5=JZXwD(}z?|;c z7aT|AAk4LY2ZMDX7tO2<1)c-=$Zt-74*kDES?NF{6IXORPC!w4xE9$2+(w>4opR!R3OBN&#plhw+{Jp-UxO0TcaSwVj7-Y|Pkts6Tz#IqWB z$o)uX5g&#kp+e8O3cQJ6{y1l6c=9edfP_jPY@EZsBB{?h(l`diABekxrK&Lf6e|nyW z75_S`!{&l)yww%F2qFYClWgGzB!3I~?~%69yH#!-;UR%P(7RLZ;&%|*ch0gLC5VV+ zN_7~j$X{@wtCEy^-rr1jUClai5pOhrOC)RdvmA|Fy3=)2|JUhcCd8HA97!j2KanS? 
zZ;~--a_W8i=GJ-SiGfzK+RslKnD1YW*a5{O2y#C`77zhZ?j*%57Np`nzmlvOwX;3M zs@(=?)FC_ECZ23V#rKPo`xRChEcW?eMN|c%QHnjToCBJc=px|TIAuL!MPT9fvVVH- z4DHXYU}LEqH0#u&<+L(P-_<=47go3_4)bQOHO4Qqwy(`->$<4l^vt@Y`$Nj$V z?!K0#u6c5tHa@;}9g&DZNufxG_?9Oo?`Ng|*u|2bz7>>HV#5(u{qqSDw`0 zochIlnDuWbA>=*P2(XNtliWP<`YP|1sO4`UIxQ}usjam+bHFr&%+YzgQah@h`SM9&eUE0n_I|d=m)K?(def)Ca_p zBh${Resox|J!k}aeJ|GN*6DDWSe}=+A%%y4XpTh)LXdj1nY7x&c~L5y0Fw-WmDPwP z*Q~VN>N~@_a@%e9djtPc4P2%Kev{%rsv`fS*q=BDaBBawwVvMNn;uG$`BYeaQK4t6 z-Xx6WT(UHu6hq>es(;$(#w5waJOb$m1o>;P^*Ib^;llfOop$~ zAtZ9{Sah&8A7`*LB}!iUbBR8>*u6i{z|@6RkV;G*%G9+rhhMCrmG%U$-2#RMuw=xR&p#qBe*l-muBOl|>+d^=g zG0~rr8h^g$J}T%Ctbb_n)aS4)& z#9!Br)@1Ef*GR2q2HSs1`Ms8Lq7|9`+*DPa7p2~pw*uPrSUh?R_A=v_{Z=giIF;NExl(h9+W?({n#OZZXtR_ zR4AQ!QMiVb&gs<1D8iDaxB(%-i)ZERIX=hRVqPl|!k5p}vvEe^*{@}rF3Oo#1Y;cv zJ~k{N3j0BqpvkOFdj-FfmD0T8@H*#9oZ?y5JdK4*7G&*gb?=Q<%U$z3O19S040tTa zk}K@f&0wqAeK|dgG@XQ2R4_LCmP)(SiG%>;;v)TEU4u7j7My zGZb?KPgVS)xsspTCVQ(D>Dq{nmv8yLXoPH)5>Jeu=%#$Ny2<%GDqVR$6~1sHlIAhx zfTzaJ#H^VcreIe-7$219$t=pi-R1y1%IZD95&f^5`>4DtN1paw4*VL)PLb~v;T+uo zU7A~~^B2HIEstrn=wQrryKAadG?78v9|rVu6ud25IFH7G`tmi&`2$tGG(dg}q0mC? zNmCm1wx|}!DaQ{OoOIiGR9&ta6qa)FmGN3p{+vJFJaKsC@5|Uf?;$lxPnh?(aPl({ zauu;IzNU$0XXx5_tWy${*X>&x&_6^zf-5=%K^;d2eeGe_TE6;b2eLU>CQENy%|<2m zT8mbRQn~raOU4(ek`$-!miHEg!CmQBT|XO}RyTzNy9*n1P1lsp*4|(AsGiX+nC$dW zda7jV7Wc#{UDg1jIDx|2v^bMrh!T*95Esm+vx>!39;B+;rbmXq1S4ef1@v1ka9UaCje+87Q?xTL7}%|7MaDdlFQ zcEbTNlaNUI0$m9Jb(th}`mJe3RZ06p(lF*q}mk_T5o*2CVpK_N)wt25Ap^AY!mN%((RjTl=Ap(zV5Ar=ZLVKWu)?(mp4v{CaJC{E7 z7hy_`9yk`i3%r)jtmQ4!hQ!5z8NX6;PrTRWWn09bjx=*R`;@f|4An8MypvTalH_LV z)#~p{dwJ@`isbQrJSFX6eC=kIc7lkf5ndjZi!)=to+oWou_ma$SFu4;L2HXm6>#&)S2 zw|D*aS2O8nNeMl&@*Gbrp?k($HZI+^d2~!mYWX+g#qpb^z{DO$iQ$mp_1uYO2O=-F zE$4haC2PqB1_*5Kv#RH%7>zEm2YNN6>D!7wuL@@zYTr}wY-W%lE|9|j7WpB5qQPeg z=!xMYTY?>r0a+j1&zQxAfXR8tVSVA<4OAe$+h2ElQKCX@)dGkwIUuFItKm)GQ zCF>n>({_I?D6-GAi(}>w%7x&!vwDsXxAsVyT~!;T0Ka3&gcHzTT`}m(J$O^OS=veSRDX1n?=yB5&cMCDjo#MArCQa6FQ+BB>p};j0h)kL$^LVTtD? z4$2=W;Nc=N9LwRk8fG6&^FDeE3%gg3*h)^s)uJ-?k(0&o!hMBmzUfBp(RRh;{pns? z61W4&uKf8FP(UaUH|91nLnGppzp|b1p4qj(I0G;bbJ^~{PfhqVo9x*YO+6;564>!b z=73j=K}~uDx2aox-%+5*vR$?VN!vH!9Tb%)?^U%+1eO>`5dv;%P4#~iXL5MDWP_&k_>r=BfekXb~z(!~9bGd9pvXpoJ@34EfGPrpi z2DD(Cp?|1-OO^;Odas?}IM|sp?5*CYmelJsKnCW}doVbd?DU%{dn@jWM9SZ9ViNA2 zMv9oOdacS@{qVLY`dFh!UUB&%HtWH#LEfFSWrr;oOel{9Uj%%7rV_DTKDgEdYLW{= zwHzl9?tFn-KW=)}RawBTGZW*!_N-idaty>sYl7`VeFuHI$f~B$x5~xMwdB{ygQ

[base85 binary patch payload elided]

literal 0
HcmV?d00001

diff --git a/docs/llama-star/idea-arch.pdf b/docs/llama-star/idea-arch.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..4fa92c71dc4c511378c628113b7817e583053758
GIT binary patch
literal 42334

[base85 binary patch payload elided]
zp!(Y_g)B4ON&3q@z2ydw)vABN-SPv-qSwFRZg~S_?d%_L;NW_z@BYyrz)Og?aDN>L z{ye0BgY569Tg&*ruZ912Rr~I{3J$fmaDN>O{(`&h3^z-Ycix6O8hYCiZdNz%ME%9G z+um@qNO~vkkA4LXwYR$LuNlw37lWP-Zu`Scy~3TmJ8b(uR(t=)(((1KG){ls zXXd&@%jIUj8WtH#6Q}EulU|v1yY~EqHgNwWNG&l zzyf4u2VdFH(OT-MncYp%)%6?jpZJ1{qD-&REdNJ&)UQgA-)f`&lLOQSKr8~VXlQDI zExg_&2mI^5eN@m%x8h5yqn8V&r_YjgKnWHpqB&{pZM#4~4J)6J->3 zv^2C)dun;TCy^YuF6;Wk$EHrkj+XY$c8(~(UnNv5s&>Cl0vDA*YHYx}8aX;Si&_{u z0>CR8EV72b-hh)cEKe*=oGq?+r0%W;V0{ICAm@v1#QGm|2;z*kf@qD}nQkYjsj8>tB5bX3s9fL1%}HE36oA z`kGi&JZ0Z(1;5oEG5yGF>nO*IZv38JhpT!kSX0}xY>ml$Z~vL*=o#xtJY!#f?GHpz z<}F7jSA{QQc`Q~J@0~3PJLBsvj-OX2F1cj*6gUw*S*t2`)ZI+kEzM|}SX*`^_V8IR z#N67+P`dK?QU9suEv2~4+=nrF-H-h)uM$1@@=J~y8T9qvFhzdx$=9flwm6S2>v+u& zczU5>YO0jAa9NgwfHNdcq_pR4J^m0y^+WNlqR^uaQuyi^fh^x?k!U(MeK$30oa8Ay zE5$tlI5lk?#aG3mesA9uBf}RHTSmj2`yvd84AmyYsZ~T$QNIEFl$TZ)R@*|TL{SAH zR-$kg`TCXkZ|1%%@h^HA(t%$59=#HwuWsfI_$+g36cgM+wH_j-Z*%X3Ck&u@gm~}E zSD>p=F2bSG8&IVQCD7fIcT=F0=5ASBU#D1v+hP!*O3SK>q$VL#PFK5r>Fr9DwP0}E z2mUbKmN+0BFGf%-!QS5`CCu~e6lODQlio}3^5i4atcR!F3BY+c80aO?XF@4#wP^pgPk1NI7H*wEl0^M zd4+uw@um@q<>&)ZazX%NOb@m4d}Y}fD8x!k6``~HpD3o_1jne-177Ii(3ZJ9uy4dz z=x@+jV`RU)I+04=-8#LRTl36d*_gKe=5PJmFd%O}CtytijG<|jYIzvnB~Q6o zFsPnI|Diu)BQ)QMDS%*Yj@S0bvc^YyAbD^5{)aCP)=XdUk1M0NqKId(Wa6EgJ0iP7 z@-0~wN;-Q$Yn>y0nSP=l?>z3$>ompEUaD3_cFDFQhV8yM zbF6TAVk=>rBU%S9RZ929+xH!&bG?jY&dvwBt34WCX1-8-z!!fHX$#&5LZMLtEMiEp zdQ`&GBvQ_Z{yWt?ZVl@A-@DAE2F6_H#8|(TQMrDpFFM5QUs^dVOYd6!lJH|D zgJv}n;uT(kUyf&AU|-mUYI>c_H0n&b1;vsbkpagJUpOTfFY(^ev4=Tt+V>Q;31i4Y znxn^X74yCItYI#Xz0lGGw~$NI>pB`>|7FHAIpxp0BHp2$q28B7Nu7j=KiU|bB~>cn z7CnwbC6*$N=#aAfL2;|0i;q2gVL`@ZHG zj>legUjc>GZnN6OiV8uI#4g&Scv!?`(1FyuiNS*&jiUI)U3kARsz~vlvgz4)8j3>D zc%NLMavz=PY`d?SAK+u>B7wxy4pk4A{B^o)wC*CLNO7z~cPoH3~AhaOuhd|S-Sd>?@;^3|wQJWK~T&9&`X;yRx z6(P#PT1YC+5j*6;&Ek5v!_p^_(byg^AD{|Rz#jyKP!bwZ+My@G2{Z6eeF)>}(Oz=% zA}J4FQg?^1i9mVNis1_7Pg@?Qv6B<{4m+HG=qjrouewgu8|H<#h#sszdKqiCb+<(` z6%;Ko_Q=4}!Dh%5iZDmFp=ymnz`^JNN`mjqn7mLqZfu-y)?`=+t;*3`wg`d81&o2! z#0<*I=1T6>NGp*CG-oV;*~>!3uQ(~8wkcCM67+WOujxjj-krU zqNv265L<9Bo=+}Az8{Xhl60>IWwiOMYsE6}0loOH2>1OW(`MqaH5Jw@YwS0$vM{s$ z1^0lIiO_qn=P-{4gerBPJ|-4+Hy|)BA)ttkv5S%LM}=MXbL6PWij^>U8OH~83>P7! 
zPub1sE2}5*oz$@jan-`MJ&^4MHd!R?ar2?CYZ0se`tsX@PX`Yd)-29wYVrEMJd6%n zkEVeZ;t*5R>1y<`>WS%EXR-)k7eS~7n9(My> zd>$E7v9h!WjH(1DCb`OZBO+BjQ?Ww&Eg{<#UHHw8oAjPi8~pUv$Gk#n<)mFFzlm)g6f8Z-%aop2a_;kMrzl0FpJ#KFFPvGVe|w88o3^ zVAPcGhVRbfQdaXt_w7>p%Cr1a;=OUGXP!Oj=SvBfk>$~>p!da-j9%O)o1a1S(s9qIXOzWwOpZlz2X zr8ab=Oeofk`H|=mg?2hGaxePy07+eHFJW=$Ov89xo~e z2J^UDH5bx}q~yXSA`*qfo@^LzP-d6qN9QG}{m4E_3{#M7&DodnNqQDOEM?cRuS8!m zumtBe_69sq%50=Vr_NZL4EhNa7z zKYH%*&I>DBg-2NGo`F)l0Wd^-ZWtP9#&#qhWa`Lu@^RJkig045Lqd}cr%a%Vp>;fz z&KV~@ij9ap=YcM2rQsLZ_ML<#gKk7jgSQ9?u;=!5Ax$I=rTPeyh!T86M~GqpV*@9- z%-Xu-n+dHmO<_yZvu+I%B<3__DwK>RrRjry1w9Cb0X+w2p=fsBk;_k2b40(p!Pgb4 z{;;N+IR^wUp1a9wQr_k^5etF}CCEc>NB13xpNIiuk+Px>!NugOOCe*POfxSv_S@IyL@b!0sE922ubq-$SUv%)slZJ4Q!XO)bc|msukQNBR zBcnQm?t>yyba!~1NdUk#vjkd6Dx+>-5@2Ycb0f#(cjc!qmPA%-3$!Wg}u>8quZmz&FhLrdtRwdXpcR)hW`> zH_~AEgrWEgW6GFSIBO&`;cA$UC1>#`@yx20{i$V;FWOXUpwd8Vyyn8yGs~#nD_aLX zOzsD|^MqZl?fW{G^4$Enf&q3I;sO#6dP(}%JS`bSU`4o^F&8vmjZ%CJXwC~G+IYr> z+1zbliD1F`A`Kx8UNUBZ(82;?6$r~m_V9~14KzWX2TGonjlpB^=1|E28ZX#Mzw(m4 zb;Zh1HO|e~R$5YDo)0^B8G%DHt{#&-fi`fJI*l#z{RCwrjC2Ui2o(%<1=S5`eppzA zy~yl4^pH!UpWcs=;UJH9M5eeaYfGreqglgQEx2gR@SaPdG9vWLyjQHDuPTV1QfTLE z?N#8tXhH`6uUd=Tm1;Q>x!z0}8MYqb7a5$Vv~Gbjg}H#0UT?3L1f^=so1BguW)5Wz z<>`Cq*U${|t@5jbx$v!nT7x3?bMoux{SG?I-pmv^j^7xMIB*qs7sL;Dl0YSU(zR+rxreg`n7scfqw>h@G?F#Vns?~=E&6(n&Lo40fgu4&0>aZwM zYD1p)D~R`VDhB!xYz!+bOi~u5_ywWx3kD8nP((DfUonUW^7^3HjgI$9wEo-;A(4n0 z4*TTqgG07)(QK4w*R`+xgMJ(ZzEue0z}!tTe;O1}iM`lMViy>t+ZJTxbv9-2EHdm3 zWx~dsL7<=73yVu^48#%yArwuQ$L2`S^$ks6hguaUG7X|dQ7=0(Cfw7GG3yNU$d$1t z48NB6ALDw`v-vOD0QIq@q*1G3)mA-bG~MC4DOpuVydz$dq`!a5*nuJ62-;_Fe0Vc@&&n~y{`-TG1TYRN2(g;=+&bt zvF0^`8riw}S#MUHaa(z@EAhueKU8^PH_#i*4YsFHbWgV?@6QdiReMtL@W<#I=6{j8 zzW@@V6`~*wAa5pwl8}7C!~R)nD_1+GD>qYY@bQX&)Oi%{gWyM0j%5ovm$kTSc}|xH z^$4=AR1UE3VWy$(5#tqY?;by)sx(}G&fY2o6%-5`OFB?S3xckLN<+sM6ZR#?b}}-s z4~$0_#ug+FV3m2;NVa$i)mdh&Ci3(1yJiE9LQQG{7hl{eJq8ZXD!r`N(9d@Qp^rrd z8-|=%j>f{fw^7s)>bAGN)e#tQ-?rK(&g%}pz|ARWv-*s>XwWPxbRPg0)0617O|_>9 z^^#izGYgtk?h{X+M@|Al00`O^O5VSoe4sB#G9n4xaIZ2wI*P=EATKH3Fvb!P9$X96 zvk-?GBO3&R315a-hM0lChpwUE{hD@o(CnN#`$*>8of~L;=zY#}Phq&gFqc_Sa&pl< z9jRs01-lfQt^Gv88IK&Yj6m3R;V3jA4PfY8_9;MU!4ac*YW$;)9`!2N% z$*ZL^nx}yu4RAM@+B1~84$wTc64DB$RIJYFX@0^}CJD>5xWdmc=@FYBbeGe@`+7hHN94tOLTf{lWbf%uVV@wf zL+!##9wB;Y6_R*AAj=uBp{q!qdgt(5jzs@K*NZcNOXri0(QzG;c+E#&@b&1=W-L7k zURV_%wuKW6`Po+qU9_Q^WHl_FRW^wzkY%BgOL=<;+A8lGO~c64KNYm~BY8J;W>K*V zzzq_sMIx8D-?qz6g0)dvt;7VHmbANDC) z$GEB_=_?ojU&DZZesJ$%cYW~-q&r`#Vf;p8R`uZ!jl#Z_bb&deWWw2Tk*$L?lSE1i zvm)$q*;1G^^1})X?fYGheEMI*7+%{b40MHT}-BsVN4CIMvABx4mzY=PY!xn z1K*E~2CBLcjL;a2HK*1xZhumq?AJBaRyJVKl?rS+VC^kyRxAjOnUh-!_g?=YS+!km zQvhm(*UOqQw)iMCIBHq(ydgbvcmvy0*_R}1G!!0cD^ng-Tdc2Y{UHH7-p@}KH97(7 zKNFGg1SSl;s0PUj->?pQ@Xa!bOhY{Yc$7wN`M+vif3Wm13WJjM^?E{e1Ur&b+6l5g zziPT@roEUc-^l$8K|Tgl%vu4`qIw3rQ0E{I6T_x zYDGq-J16v!B{AJ(JUGiWA<9CofT@7Tf+0a5L9}>_Xd~KS`x)_ltf;RKG%qv}-&7dq z!ymw5<_tlEiv{2|R1bS(J;i#!9E#@~SSJY-85l-Dl+hqy3|+m}Fuj)C_36+LG?5k( zS8fg(;@yz(z)?5)12HKQ@*;E~@%AI)&ClI3B47K;l+Wn8+0|uw58?Ktj~$l?);DY{NP<_J3*4(QScM)0gd0E>|Sk` zAIU5~pOL*}fkniG!$iQ`7eze!`N9`=hyjyn^dbo=k%MtFYDhH-XkcyuqXyrIxT)BY z01Jm9Uq@Rmv5*lM=R=M#1Xn}MN`|Fle;LvYQ(Z~$<%1CEu5IFv@PIk1AVm%i71z)< zwQc*OXH+MS$>AiOBEe+^qACjuXBZ=GvGLZueH^aGuRflAN)b>=98AkFJTOWAgo}Ix zlGUm=Swwo^jI1kNIYwQ{9`}+2N!?jgod5bRVZn4CiSwo=uiNwbH;Z;+_&HL5|BO9E zoM|lV$8#&_a;Aj@__23FWwK*Ty(;_tW1Zj0FuY@ahM<_QH^p8moR^!+32P@2$T8`D zEM)IV`sj?A{7^j?*kUrafi51I!{I2!DS}BKJ^6E8p&C8c?!5t|vbW`_(@P-w?8nC&!HN`~2=+CpLE|bj;@VAKs z{%j%N85-uOqxZ7MaTK+2cvNjKsWm&`dymnMM0Zp^zGKPkR8JBO8NJNT+hFXe_<7*^ 
z*O+hTM1KC0;{kl(GY?JNK0Ot`s1+T7%@vPCNkAhFW8dKF&BU`Jh{%z%%oPcwsHCvP z*e`Tc9l&Pn8`ZYJjYUNO_zgc4W z`z2`jfus3AYG6AP2fkG7_p0c^Jdq!Z{TeO3sSO(PsFe2J28?A9CT8%vGvN=#A`qMp z0H_5R`*hp;a6f)3;m_cqhe3y6cbFB+E8E60#?Ddaq~7b5PFxM#3mRlb z&++CS#?6BaED<{+)FLcVNID68G7Y!21vUP+=vC zzX{EY3z4+;xu=8G1qnEbj=r?d(dNLHn^m9XvGYBLJe zq$J5D(EYxXRxOnz!Gjnv<2S60&1(F4Bf8aIMam+&39%+YS6^F05>&^N>B_y$h6q8Y z2b_&{#c$IE)i?}od~n|#+K&e)+E0B}tY}P^cz!v|FC30_A4Q)AMTUlKY&u_Sy)tMR zn2JX}hM(@2910y(pL`-qVTODj(sFhxZJo!F(?+UxzbmW}C_YSr z5>A*4_fv7{UQb8AnGP?JFidPV-jm>nsrFTt45ZeuM)Tc-0crEcPxOldunKk0&nd|B z1fO35sh`6&8Ie`zDr@y)%Nft+z_y517v0l;Nce`PxR4KLg?0UNh!J`yTJwqy4qPby zqxoztE$4i1A!9#tKZ_jS>Dq3^&?so3VUh15Ws1j?nlHUSvo&OR zS?ue`3q-3AcDR3b0eK@pU zHz3J7Z(BotZ8B-~-aT6!9*Q5vP*RDFG|1b51DQk8tt%@(x#%4b8J}Tp`KX*8^>_*Y|7+|lz@qxveyyUQfPgeeN;ktWL#K2~r*wCB zNK1EjH%NnYOE=Puv~)JkCY_w4N#jy~b-Zj5XP{?pU0k+oVpxTi+TtTz8Lw3sR~YW9BLF!#Tyzw58b5yXr|C&oaLl4# z6XA3w{UCSw($tiLc02f72@EH(|5G@R+?my=;Naef#?qRq3YQlO3_oKi16@pJVA#qD!*82 zgWR6KA3dfYf%UZ5o-AUp!%;yykRp#r+<-M^8e2D*o*vfBK6hg}_ra>T`7#g4n{XmN zhgFdVM;LGLf$aA?)XRB)!`i$u@O+QcWv)kY$Ln4j$ysRQ_7bUm4T!N!wz8{XjIdOP zX2b1Cn;1U%hIi2~A=*jQ5w&f;B)N~X^2NBrpL}E-UW=FAgpHhPlIEl%UV9^aMFg*= zrdMvZ(ntF8H#|IOaQa6gm#P_h;(F7*=vuwNX}?)lu1`fIx0SJ{ZHAHBB3+}f)HE_8 zsRyPnjQw%m;<0`y+I49ekwdjX)|Q(tiG4?X62Tr{xS~O9aA%mF3>Yzy4J8SFW6?&R zp-jBoue92zkS1u6hqEE)At2V1veyM1Ja7vcMcz*493;%|-3hP?z3idsO3Oj!_@QAi z;-loLg``xa)RyX}mqcFjW&-0xJyhAWZtV{(Q_?~pxK2#>hX&Z6wxj*RSQU$K%`ah6 z3ij04j_^T%y{BhlCWwkuKjRY&>Bl3>VAUG6d9PQ0xRSZAqUPL{pz2OpQe$Tw!liAh&EW9xh$oKb$vU z&27Tk&^7EtA<_SO+ZOcIJugy?NXc9FcvgqoGmhoiWWzgVu5jTmQV*Y{M^`<^{ceqy z(x>jKnnF3fIZy_iUdoPP3@7YF_-kh@!(93AyhA0;9e*1&8t~9Kk{VG|m;+m^&1fJW z?z{|x`<8!rGkuOHanUnedeVq@G0s(9!3Rp*rM<4oL?*?OGo%9ug1qGyQ(xCuDqT_S`?z+{~T|B@&J%o z_=9WUp}XaP8e9H>`-j{cp?dtZ73z{=y86`5BJt6OMqI9|3;o-0892liqlTys8LD~f z0xn(;UsTLw49@_I!f_pUAJ)xVB0b^nYU3qhZpLFbxQ4jd3C_!i6YbC-KSR`BYV~RX z<;Zq~GMu0#cBXYSieaUGG!aEp5#rub6G6ofyxX!gd!62XxSFn+Lb9R7Vt4fxyPMJG z+1{N(WdPzjzL6!39)s^*#g&~098KmcH^NYLZR-U={lRSy#kT8@ri7$E%AjO~KepK`DG!IMXV81g+h4N2uninQnKnbf)C> zrZ=ZOetIDlFZBZ=t&KPWVm@4I)c6g&=<^<+)1d`r%oB;6aE!XCy(9E-XH0AjLXh<3 z?irX#O_*mXU`Lnk>D{W0^z?52jjd-y{L2d2WF+I@R@6mal#F52@P3lMzU1=Z()Sm( zVkWjGS_uqIUuRQZdn_v^=d#F+7-1(r@hEC?CYf&aA;VlqxFx8k8bjU_mCV0FzVbb1 z5<*%j%pyToXtJo@SISa#68TpD66)Q(&7QcfW5a z1?P(kVQbr)v8ND?dIPQ)w)I&x7CJrN!UlX+a2H(W_8N~`TWSu1cg5Ve++G@5t`XO; zkaCD)nAqzRkrublCv8j-MQWrQGXW%{hQKyU6&vXV0&jJ`eCIFgPbvz`+~~g^eWvM| z-i?N9HIzvFhd65c;<_X)>sDrQJ(s_dN5IVIqXX-E(QkJ0BI^%_AAb5e)ouziP$OHr z7V*o1@7;WQz`iH`Q%>%2mkU^F;rvyZG5xsZKk;>DuNvXQk64UK@`?3ug4{Si*_Oda zW`bVKakNqjl2e_B@w1kY$9QnRA>2+!v|*rhx$0mH{me?mhb(+r#+BHl!k?ppeDyTRY( zu-^7*MM5L8IfAn;zAg6!F3uhw*Xvz~NvoHB5{}r#g6NhyEu1<{Rg3r)J&lNpkeg0( zk{L>EB#*vsjOWs&jbb{rGdS;J=o7st`@P61UhKC0HdH}!pMo{c)62YwR%0rI2@dIl zbivNAzzm%Go+VF>>t=Df%;&8x?f6GKbVN|rsWRw6v{PgrsC#I z$$=_bpUJ$dlorak2n8vGwj_`U9iY7Ab5FZlwr^hgRH-Si{=HfZ(b^0~h=J0lRj;0p zF+}+VO~*k#wzNJatGb|GeLu_@Hr~9aJSUCDGilPJ=fmIbvxNdhyRlJZE7dj#xJP8i z3Zn2UZ|IgtoE++1>)Ja_imnlM1Vj~8Y`cx)ZGCxUfI); z9i`kR@xETI*hDDmW(&_cJ~1ioRsXG9z|qE%;juIRoO|R;?zwnxts{7G|5Vvc+!2|e z$xce#y6{MZKcAbQWLKpivFw^6$So z?L8)g)eiU(OjbL20z+=&=2UeG%^DGjSLO#RN54 zRBq}d^SMT%)C7l9b@E%S$y(#H=Y?qhBQ3x zhb{m=r!o7(2LHJW;migqgM+;Fl_#npRo3Z`>*tg^EFE;jb6NJm|Q!>yY!>Ik5E^!o%syz*R9}Z8Aq@| zv6K-h*sMQPpgyChP~)$r6^QN%`Y@VLL{i;cg^tW1eNgy|$FUus^(XzL-Hp=lBGJ%FI zH@L*0NKi2F$g8~(h|Kske|7gLT95rCP2fA>t+wWi{^u+&EOZGabjD# zc3fdS3T$IIsh4>zI;!-al?$+Y3f+UgkfD|kuJC=ybTW8OKy_+gSp*qLyg?rY;%0D-bHJ-X&`JF<(s_!2@e7;D-bR--sC!xwFum<+63TYrRz z*8Rn14Yv|OF%2K9yGrXJcWq$`1CwCsR|m|R*Eyxd+nd0|*RG;He?%Kr%F&flDLRn8 
z2h!)3k9&U8yLjQr`(4uI(>1>aeJizRu;!h6DR%4&{V#E*N~Rq4G-qIUAL8Coy%_^M z{ghZO(S+0^FXN;g!Bd;l{5Tpq1~o!)vo1uf!nUZ6BSv_?dLLLnv6vs~0orQZT>d;! zns&4g$|Enzr%Z>0cxA4_$rDpB5{SGLhl7!H&rjyD9}2WGiY1)W(#GEk^`mD`B$#f< z^SPCV$(Bmp0!saV85eXEd>E5=dRe1ds+}0)d7V05={T_R_O}XSvx7_ORWVwIMiXQ$8Snn^zb z{rExjQn`T{cDmeBSUntdf6kAE&rf4x zT{|5Y^Ocyg1$}OwUz3L?!f~^1IdIoA*aK_G&BtJjOLIJ$RZ5rn?$6LZObEH(EydN> zsC@~8pl)Mr#c2+&i@X?)nua-4I!EJ^BU4!$Y;jLx*Zlg{Wxt;j@P1H_jCu!mLKUTFJHC5)L(xcyLT&8ZKO^VW4AL=s1QBV2 zQv2U`{Y|kL8n697@vHWq^oQj8FHDvwpLFZb?rfyXYFQH5szl2xQa`lFty(;aIU+1)eMj}XPrnX%gB?+-&t8)2ry%e6i3i1buNsYe4f9@c z<$dmAp`BC(Szk@$IyXM?ypgD1pir0Z7(Bkc2lTf+0^!_pKg!#O#F!+BRDt^Ri37-_ z4f~>^WvQafEP=A!yVXnx6KWy9){fnf3~f*yMf;|GU*5SCM>PRN&}#Z7e>jlJoS31e zMpuK4g;5Q~WJSjI$RmG9q(OSH@1d>%AU>OJa^#L(b-i0_$Hl_G3+hwJh?l2X!F>+(y&k$b;2wUgyW z13z^gs9ILrJ=qEBTGmPO6gZL=vBjDzuHO)wYg7tjkYi*jEa|i;c{=R4t%L&xro8VTxGjD8CzP6?|E~i8P0=p*Bg$oa-DO8>rU6e zH&Qlh^?_u93thnJ&!Xwz^Y zFj=^_be7bcT}qAox~||_R&>^C*s4!aTUL7C<2PQNdgpbmaSovV{v21RnTO!8?g(i8 z{ULRk<1K2hF}DSt~2tS7DpQm;>k?`NFJ4& zGod;vizr*-s1C~n)rMAyOQ~yV+e<5PA$undzX04zeDBc;YJ-c#?XiZZz0rb3cHVEV zmsRgpHyAJM;BBX@Zqy$8HH~{IvFTVl3s9p)qDRUiPKFd0<;9R) zWN;bPV9ndcF(OakDJN3(|Qq>$2S1?eN1U ztUF`#iQZ*S&+QvD%P_S{vxTdhxG=S&E}$dB+%>CoS)hjpkyszXKp2JSoix{VyJ7M& zp=^WIK3(wPdoE|b0N1@NL5}qAw(Zg1Onn{W!QyM&<@}s{H2Vfsp0`~zX$ zhSGV(q|;-WU=+<16|L)D9VO`+^052Ivf;-WYTky(fBXdpx6UH6k3zpHr_75QkJN|T zNNcL&?>B>cCL1fE5B@5Pt$-Vpnk3gz@fDR)7@^YFKKS8FW}63$Af3LxejJm%dxT!e zYr{wcZ5a>qcX52;$doC6gjHSYU+$`8WH%oR=eBGGv2kE+mRI~4QzNNn_?;nS*+aKL zdGQt3nb5!_Pqh7vn;rh3JYEt-#OGV{_-hIHBA~4(6OI)wV%Xp*Rygp}F7rMB59DSZ zfk>`wh!E)g;Fgs@3VG*;4STfzae()yL)FsR5IVo$_wh_aWHTVKA5Qgm;T?A+WA6|g z-rbgceyw9!Dse6_ELV`l8k^n!Mv!`JRQ(nXamJl~h$iV06y=v_cLn!OK$mePMR%J5 zyeC^MXB0fQQ&U6KoiHbhl5Hn(aP}dJK}10V){Dl6&ijrNfYJt-w@k8Rd$BJ|(xv(m z^jk?!eD;OY3FXfSHfmy-Djt1eS4Iy1+MT&a5(5lUlu(X4j%NaI}i>bX@xI7!k zlZFfXfZ~wjlJU-I)x7vHOoPnsRB_)Pflmb=2YWDDsUkyByEqDAn_68#dQuFX;zgcH z#><)xKXW3vM2$q!nc%eUmI)aHXPoBv_)cosOX9D%W<~)fej*4XsXC1uW$uxlAKquX zr~Nh(5Z<}w_&1+Q`_enUkqOX`L!FtPMNaQuGgIQN97v5DF6SIEfFG5zxjfM%e zjt%5+5kZE0h+Oi?@j@zP*$ek0{R&JB3@+;$O`M#)G5hH@8 zU*D3hiX@DiJ}YE2T%g|Y-iR~`N$)#sQ3umWCKtt{GW|H-k6C|{w2pJJUaMu%As?sP zc;L;Ct~Se&???4aR(Hai_KkY5IRd$F(Vh-5qObU%PeNFT@ked7Q37Tx^VnU~QzuA~JGebU2%D5-A#RZYdFO25{CSX?}vuvSEH?&v%X z&WHy|1N&+l+(?<%f>SPymG?zwt=n2pSxowmyxiHTS zExRqIvCkq9SB2%E6iZJC0}XQHeM>%(75fcSIywg#=_)gG%nZIYdC+X+Cy5W5DUmv^ z6sQ`6NTQ5(-#*7ptVL;XC79^KpOUBM$+)5(mPQguGJ2C9dZBk#c`pB1yDxVPYv5l# ztsMiZMSV0jy!W6S@4iN5Wz=^&-ShitKG$%z^trI_SI!=FapNN6z@M4v9kt*0$I&%* zDuNz;{l&XrgnowSn|{OJ+JM%q^~#50)_`Vi!4VBHcb3>4IXRRS3`U@p61BOs2U#N9 zXLOTh`HpGfLh;Olej0e&o#mhF%_{WZ05ky`w$qecso&MVRK-h(h^=dxf|ursYRE`U z4kim~8u$11IgW3PQbvyF`ddV#0GB{8#Y^DE8(OaCC%AHs_!QyFQo7IA81f`xHuGUN zy-mE+CUjq-7ZI4E6T*wbP}t1`H*ga#D!PmfisV;ocDCrY3Q5-;TE;lkj`{uxFKxI? 
z8@fySGck~Tm>TFoc(<}@;H=p@vK5xm6_N5r@~#ZZ;(1j)2wGuft$Ax+(8ZcHAd_Q2+xv$x4` zqKX(00HT<|GWITfgO;FXD-+`TtkBl)Wh!@I>x=;4QWaaNrhW^uLD?*xYm(Y*^7aym zIr6-|ce>iMrQu0wJXgD%_|4H%vRLf_`8gd;>h<;Wq=(+*2b{|;@MX^Y5{KKy;xB_E zcNBwj{ggvt?_k(GS;e)i>X%m-y*EfgjpVXvrnCSgU#u&NjnL=dUiBZXwB>K`pDQ)|| zJXnqMwW#!Y3v)CpV`#OmJ&)CBF=N+%gfrlI4SRNHbzJWpbY2r%VPIgeSBQS@alPM^ zGTx7D6&ks}o@-ajsWobyRr|T%`EWuku2ws`)XE>xSf-c9qv3R%i`9DVHqz|cHG3=rq4Kd4cS}%>EDNJf%MDnD6TN`LNFAhtaS<>%r$mzM-A6{T_W5Sd`1O~Fpzk}I9WTo?Bf5?}G%$>1UF7rTs1>)7BOU@@B`Xs?Vl1$vMB@24ae|z7o^n zb~q!ok^5FYlnum?Yv~~-+g76snoUs7&TUYIhm$~{{3VM>9-PTGqHN*|!}xPXM;-ER znV7(_ZJQtxLaC+v0gc)im8D{?55HuQIGUQ)^=b8a6uKhWLou~y*XA23$v4be8taw1 z&wIma%x9xoj#@HL5Rmu8$ysPQ9BG!nne~jil4tyUKBm2UEkWiiPv|DKm)q*Psy>iG zqJAbcATW@M`CBAo9o2yBo8lrBJ}FD`eVk;h`n4K;idTnbq==`u0P{PU=@-G?Y(QHe zPK}BiV`s`v!%oPoIyct(R0}TQ`vfpkXoaTiHlabkoLi|>@VuFnX-!A94Ye(L8AZS1 z$KsfxjW2a^t%{vM_}Fm&+sweO8#0K^eHUU7^|E#bMk9HTB<$oi)5 zh_y}pSYxPg6MD_ss=w1s$NxvTfWkdo9LkR+mA1X zlm7ZPkE#}ywNySlKZq=<<9UEH1z@IXE0kS z+x7;5qg@W$si}x?vbWnh6i8;UmC^2<9)@mo*^VF5ty5o2sMH7j+|?~xLvu-$vK-sP zFk6JNF5?I~xG~Mbel0qB^2%nEGQAGJI}o;-ez13}#)hRw&J@nzx@AIqFZEVOC2!fkD)(!&cQtSfzxA_{8*Ao0RxsSkc1>TlQ?!#<`K_e^#y0q) z4WnA&(0mc-eo2bv9i_XWP0p%!uO;v=RLIL#tO@la;DpXZ1$ni;F=zw)79;sJjV@`* zDA;FZuwh_DGVqOc&;dqN0_{ZI%odBrc({6jIgG>}$QGMt2_~IQ^0|yHZ`LBsj4Z>wPEQE6x*4#(?)D!-g6M;*;@vB7~#mH@2r9=qZ_re<>iU>ndbWiiKXV@Vs*_RQ}bW@b+zWY z$jC`ni@OK-d&*8brE1)}SG@ficY5XrE#Iz2GzPkoV5d?x(pHY*@nYz@jZfSMSTOyu zBAe+HFzM&JdF@Uf39NJYm6g;<%Zs3-W~_wIs({O9$s4^y-TI@p>(hkPd~joy-0>0u7xUL0Ak`tW!3pAAT$6AdsKF8O&`gs8;|g zY#SPB+o#j_dW06DqZWS?7IZ2c34SH@$??oGKr^&OdSZF$+ve;qvdph`f@jG_%r4Ax zUZ(LHP0f|5jVBu|+2#1Icc3Oht+kK8*f%yC8GtK0+|c%V^)gPioGksqcD z48RK%K;1^2|7--)o9l{BhkH4v#v|l;zgKo_Gq1SY;yVAUF{zeokcA7?eJ|fg_-v>f z{~;2YmhCdUZs~S!xpn?BYP3I?w=$lW4w#6?Y+P}-p$$_Mr^lsGNBCX>ryKHyP9vjF zD~xKz*PHzLc9pva-!lu=aSG76J2H>E0@s)0w;#-UCs+HYsK}eSL>>dzYFeub^mOU2uz3@Om=rnkKAif{v)-qGLPh7 zvGL=1>}Gevy@xG1_tje2=ih~L7C6eI6m*SFojJcK)Q%sHzm}g0$I)1RS>*VjOpCEX zmM&=QCk)DCR}F~*@L5V#mTQA;LG+YNS)$*-rf=aDrwN?IK}o0Nma_vhY&?bTiEZO_ z3>=DT^TZdto?!p*cL;J=hI-l;LD}wa2JlsxZoogY1i#TY#Ws9S*u2i0pQRfpaImqP ztqfE6NK@0XWa71u5>;uvw^Ft;wT@b7)(-~I)VUzO>DFg0c%vP#?lYITHog;#fe=Ff z%d5RxuW`vtCR}e4Hn>;?%MLzu-&aa%{BvHkwjY7a9rn@pG?scDhLnDps>?Osl$Ee4 zX?k`+D;X~)0+~c3Rara3d>zTH*lWwThJzNnkx821Yjsz%+;2_;xd-FX20f^S@f z?kaw#@5-11R~hV&Pvet;r@FB(vPGn|#mhQh14QShEUJ19%?xmBXAElR%?zRyvp<_L z&xQ&`mld}g8IA2}3G9x-&pwajELxS<9Zi;W&{@*uFwq6oHn6sC&5D^%fv^0{slc_p zBBmjsJq<#`>O$5*#o&&y(CFVcqabj$l)Ch{U@RXJZK~g&%8Oomr+pj4Rm%>YMO1A@ z7;X@{lWHzeTBbHw(J3Kdd7mO=KrR$LYZT3l8%`n$)9cBxbibyMzI)#r90Ax(M$uM> z4?=K>pT;aaqo+nqb5oG);&1n@WZIaX@sFU*$7hzLZF44k?X3B4nft!e{^DPiR*l$3g+#MM32NFq zb>_9v961NU$)LW(QJq{6YOXMu34r4`9SaSyK2t{5gt?uWnQRx)l)13uwKmKU^UtQv zre3#>5SAOOSY2mC0l|tw0CsQ3Wi#4g%ThqL!j`W=fUmr+TUQK`C>>bUJX`g>R5ci1 zGi0b6@z}w8d?LR%$A3)rYf1tT75kzABDEEJw=czKiG5u^Rn*u*o)EKQSBvrnL5km2 zG!FAe%&ZwI4ZrKs5*CAz(3onsd>&U87ixCH?rqoa@)Ij|Jk<&Ndb|%vg@4eC%Ii1Q zOxVy;MU2x!2M~%-T`|CjR9~e$j>B{GsMs#m7}pfY>)b6oJDBwSV&<(CtQH2}n0eJl z3tA_OshfCLAUBIpcjb?3THp9q)P*K6t;Q9;UH4a)zDI7k8~6GTNR6k^rT<@`jR1{^ zzL|r*ow2UAls*K=^Mq&fzwKW{41yLA;*GHdg!3b*tt)G3u5Iz($-jYZAmEJu0^2}Y z7XJ>mfroN%{{961!{7fq``05VuI67~|M%Zuh#m~g3i=Pv1+8cO2M+z;)&KXeKoHZ& z-vFDZB&`1dOY*eu{-KgUdhFk^By13Al4XUBfjieRg=nF{mS zR%^%%PqO9N`{k+Jj3f~9E?(hx9*O2Wwe4ka3r!QA)1ZP4QWdU&)h@2V;g^n=xyA|C zg=jFY0Xz6-DeOi?kw}DxR(QW9oV`bw!oy#DrM7cBL1Kk*{KsJ07NZu?Pgn~(idgjBvY8cA-`hQK)|W=>TCIcE}S7#u0v zyGo_dLYi5MC`}NxdSKvRhumki<%P)6%Gp_lX&JHqLM=W?`%)=Jdn8-0{w9)XVM51+ zs8hr0_dA^5=K4HiZer*czopr~SrjBnSvHW$v@@r2nW@ 
zKXc|&&E-7Idp?Por%(xI8Vw$GXTEEgg9V!W#NrxKb^WF!eC-3P0E{?B+0!PEHc825 za|v9^^Q?q#>~+m~{!Ii`*K?&+*cl`%I;kELh;8ygt1{DK0Y;}0n{qK=sy6tVn)H+) z6J;3vQ>%`KcC`dL$VUVFEC&r1pPWchV)P?FT`Qq9<1D@7oIc0A@ogkobg07p>GgnH ze%j_Cu^fRz@K7eb<0u-aKv_xNR!#EmM9V=~v4rNw^ElyVkyDT=TIS3s8f?lBT4@lbdL~&8Z?#^mYc@Gs zRGbW?e=S|PNnblhU5oi4_S|WLn1G9@3H5@X_T%g1VEGR%P8|bLsi@+4`i@bH4a4c4 zRmaC-X@XeO?;o^?9rBJ%FHGET?=DxPSL~nP*Ac6UG}Q($1fMTN$f4}nTyfG3b{Lva6DFki0Rtr z(cP|33dts?n{n;EYCfJ1SmEBWh9XsuaTP@pwtg4j_0a1JI`TF1lgO#A{P`+za^F5L zD%*WwlbLdmK3wW(QkgL(Nh>Q*ENF7qeb4c3hj^r~Dl}XfU=Lp8;7FCAK%f7i+oH+5OA+3P*BiM@Vqf>s z;8PO%ii|?6t$=S6?4fl?TX2pa$mX$T;6mTuTDWtLv{L-q0(6G{*>y+BQfN8RZRPd)jrjb0*KKeqBg4+m#W*~i?FzqfP{bqIfp4#ZB zcdu2}y6&ExK?^_Shu58Sz27{Q)WklO4A4Uy2d}~5gcsL)!(X0^#X&3_f1)>UzrY{I z%)sBfjg=9>4Wa&}`YQ=>Rxksv6VA%a9U{Cxa*Y92>y<#a^~Z2i*;v%0&ZmH-&HZ&z>u(LYG^7DB zxz?Kklk@Sc#55FZ&Gc;hcegDkc|LldH|6+ zCBMC~nH~`nke`KFor*}tUfagbh3J!j&_6m~Cp#*5hL1M-+7RX;5v91kvxTLdKIA$7 zi6GoZ0jPPGQh=R_kqHQZNIW47L z|Mfux27w?RI|HJBWso+A{`prTi~o}WSfNyYHh6^rY`0tUsuo_E8=9>FdkL0`&tws$+%Pj2~re zP=e^=4CG?7JWAlAf2Y`b3A7xBXKg^>H z2o=da%9x>o(?=NyYASx5K@B61GB8wq^(bS3S{ff^tWY!Y;|ywFew496UD1DMKt`zM z?@{)TvG`HO1hpj2FE*q{GCKW4zwamvgD)o?$m13n$65G3c5zGr5BI**t^|Ka?= zPsb@U3)DLOs9lz)V$vve88vY3}$|MJ%ZdedeYC}r`J^gWGXyugXQVm2eUj~{~(u?Cu<29di&?m zIkP}^7f-T(oE3ln-yqFCc`aF(o?cT}fKY4hqhleH>uL71pIJap_DBF0@YD5*g#}6| zf7Ax+)Afpl6{>@JRLA!8It@AWNk6j!pjP<5kA>htpJu?PYc?A*)Ry_^Sm-^fM;YW2 z^|)Vvkm&I_16i1!j29pq)4z{NI~#2X>emJyy1SM$cF~7!P>C31EiEB*Vd$8IvitvG zo&)}|H$xu@KmgyI2D(hTtZa;I%#1n)dSG2WHbxMDNe_ZD*V8cofLVdu@c(nk|Gsl+ YYX{kr|Fb&=vN5v&;K|4YrG?=C7q_CA!~g&Q literal 0 HcmV?d00001 diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt index 75b8df676c52b..6744944fd8b99 100644 --- a/examples/CMakeLists.txt +++ b/examples/CMakeLists.txt @@ -24,6 +24,7 @@ else() add_subdirectory(llama-bench) add_subdirectory(llava) add_subdirectory(main) + add_subdirectory(tokenize) add_subdirectory(parallel) add_subdirectory(perplexity) add_subdirectory(quantize) @@ -31,6 +32,7 @@ else() add_subdirectory(save-load-state) add_subdirectory(simple) add_subdirectory(speculative) + add_subdirectory(lookahead) add_subdirectory(train-text-from-scratch) if (LLAMA_METAL) add_subdirectory(metal) diff --git a/examples/batched-bench/batched-bench.cpp b/examples/batched-bench/batched-bench.cpp index 533c55c17aad1..57596ed986050 100644 --- a/examples/batched-bench/batched-bench.cpp +++ b/examples/batched-bench/batched-bench.cpp @@ -155,7 +155,7 @@ int main(int argc, char ** argv) { } LOG_TEE("\n"); - LOG_TEE("%s: n_kv_max = %d, is_pp_shared = %d, n_gpu_layers = %d, mmq = %d\n", __func__, n_kv_max, is_pp_shared, n_gpu_layers, mmq); + LOG_TEE("%s: n_kv_max = %d, is_pp_shared = %d, n_gpu_layers = %d, mmq = %d, n_threads = %d, n_threads_batch = %d\n", __func__, n_kv_max, is_pp_shared, n_gpu_layers, mmq, ctx_params.n_threads, ctx_params.n_threads_batch); LOG_TEE("\n"); LOG_TEE("|%6s | %6s | %4s | %6s | %8s | %8s | %8s | %8s | %8s | %8s |\n", "PP", "TG", "B", "N_KV", "T_PP s", "S_PP t/s", "T_TG s", "S_TG t/s", "T s", "S t/s"); diff --git a/examples/batched.swift/README.md b/examples/batched.swift/README.md index 464c9079c4660..4c2721fe85b00 100644 --- a/examples/batched.swift/README.md +++ b/examples/batched.swift/README.md @@ -1,4 +1,4 @@ This is a swift clone of `examples/batched`. 
diff --git a/examples/batched.swift/README.md b/examples/batched.swift/README.md
index 464c9079c4660..4c2721fe85b00 100644
--- a/examples/batched.swift/README.md
+++ b/examples/batched.swift/README.md
@@ -1,4 +1,4 @@
 This is a swift clone of `examples/batched`.

 $ `make`
-$ `./swift MODEL_PATH [PROMPT] [PARALLEL]`
+$ `./batched_swift MODEL_PATH [PROMPT] [PARALLEL]`
diff --git a/examples/batched.swift/Sources/main.swift b/examples/batched.swift/Sources/main.swift
index 772730382ebe0..4d000534900af 100644
--- a/examples/batched.swift/Sources/main.swift
+++ b/examples/batched.swift/Sources/main.swift
@@ -153,7 +153,7 @@ while n_cur <= n_len {
     //    const llama_token new_token_id = llama_sample_token_greedy(ctx, &candidates_p);

     // is it an end of stream? -> mark the stream as finished
-    if new_token_id == llama_token_eos(context) || n_cur == n_len {
+    if new_token_id == llama_token_eos(model) || n_cur == n_len {
         i_batch[i] = -1
         // print("")
         if n_parallel > 1 {
@@ -215,9 +215,10 @@ print("decoded \(n_decode) tokens in \(String(format: "%.2f", Double(t_main_end
 llama_print_timings(context)

 private func tokenize(text: String, add_bos: Bool) -> [llama_token] {
-    let n_tokens = text.count + (add_bos ? 1 : 0)
+    let utf8Count = text.utf8.count
+    let n_tokens = utf8Count + (add_bos ? 1 : 0)
     let tokens = UnsafeMutablePointer<llama_token>.allocate(capacity: n_tokens)
-    let tokenCount = llama_tokenize(model, text, Int32(text.count), tokens, Int32(n_tokens), add_bos, /*special tokens*/ false)
+    let tokenCount = llama_tokenize(model, text, Int32(utf8Count), tokens, Int32(n_tokens), add_bos, /*special tokens*/ false)
     var swiftTokens: [llama_token] = []
     for i in 0 ..< tokenCount {
         swiftTokens.append(tokens[Int(i)])
@@ -230,18 +231,15 @@ private func token_to_piece(token: llama_token, buffer: inout [CChar]) -> String
     var result = [CChar](repeating: 0, count: 8)
     let nTokens = llama_token_to_piece(model, token, &result, Int32(result.count))
     if nTokens < 0 {
-        if result.count >= -Int(nTokens) {
-            result.removeLast(-Int(nTokens))
-        } else {
-            result.removeAll()
-        }
+        let actualTokensCount = -Int(nTokens)
+        result = .init(repeating: 0, count: actualTokensCount)
         let check = llama_token_to_piece(
             model,
             token,
             &result,
             Int32(result.count)
         )
-        assert(check == nTokens)
+        assert(check == actualTokensCount)
     } else {
         result.removeLast(result.count - Int(nTokens))
     }
@@ -259,5 +257,4 @@ private func token_to_piece(token: llama_token, buffer: inout [CChar]) -> String
         buffer = []
         return bufferString
     }
-    return nil
 }
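The tokenize change above works because `llama_tokenize` expects the input length in bytes: Swift's `text.count` counts grapheme clusters and undercounts multi-byte UTF-8 text, while `text.utf8.count` is the byte count. In C++ the same contract is met by `std::string::size()`, which already measures bytes. A minimal sketch of a byte-correct wrapper; the helper name and worst-case buffer sizing are illustrative:

```cpp
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

// Illustrative wrapper: size the buffer from the byte length, not a
// character count, and pass the byte length to llama_tokenize.
static std::vector<llama_token> tokenize_bytes(const llama_model * model, const std::string & text, bool add_bos) {
    std::vector<llama_token> tokens(text.size() + (add_bos ? 1 : 0)); // worst case: one token per byte
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(), add_bos, /*special*/ false);
    tokens.resize(std::max(n, 0)); // a negative n signals a too-small buffer
    return tokens;
}
```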
diff --git a/examples/finetune/README.md b/examples/finetune/README.md
index 36e62578c9527..a2a2c12814bdd 100644
--- a/examples/finetune/README.md
+++ b/examples/finetune/README.md
@@ -21,7 +21,7 @@ wget https://raw.githubusercontent.com/brunoklein99/deep-learning-notes/master/s
 ./bin/main -m open-llama-3b-v2-q8_0.gguf --lora lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin
 ```
-Finetune output files will be saved every N iterations (config with `--save-every N`).
+**Only llama based models are supported!** The output files will be saved every N iterations (config with `--save-every N`).
 The pattern 'ITERATION' in the output filenames will be replaced with the iteration number and with 'LATEST' for the latest output.
 So in the above example, after 10 iterations these files will be written:
 - chk-lora-open-llama-3b-v2-q8_0-shakespeare-10.gguf
diff --git a/examples/finetune/convert-finetune-checkpoint-to-gguf.py b/examples/finetune/convert-finetune-checkpoint-to-gguf.py
index c8e14da87e9e8..c89090918da97 100644
--- a/examples/finetune/convert-finetune-checkpoint-to-gguf.py
+++ b/examples/finetune/convert-finetune-checkpoint-to-gguf.py
@@ -3,9 +3,7 @@

 import argparse
 import gguf
-import os
 import struct
-import sys
 import numpy as np
 from pathlib import Path
diff --git a/examples/finetune/finetune.cpp b/examples/finetune/finetune.cpp
index 5a6cf22ce1b95..af46e44a6e216 100644
--- a/examples/finetune/finetune.cpp
+++ b/examples/finetune/finetune.cpp
@@ -548,35 +548,35 @@ static void randomize_lora(struct my_llama_lora * lora, int seed, float mean, fl
     struct random_normal_distribution * rnd = init_random_normal_distribution(seed, mean, std, min, max);

     randomize_tensor_normal(lora->tok_embeddings_a, rnd);
-    randomize_tensor_normal(lora->tok_embeddings_b, rnd);
+    ggml_set_zero(lora->tok_embeddings_b);
     randomize_tensor_normal(lora->norm_a, rnd);
-    randomize_tensor_normal(lora->norm_b, rnd);
+    ggml_set_zero(lora->norm_b);
     randomize_tensor_normal(lora->output_a, rnd);
-    randomize_tensor_normal(lora->output_b, rnd);
+    ggml_set_zero(lora->output_b);

     for (uint32_t i = 0; i < n_layer; ++i) {
         auto & layer = lora->layers[i];
         randomize_tensor_normal(layer.attention_norm_a, rnd);
-        randomize_tensor_normal(layer.attention_norm_b, rnd);
+        ggml_set_zero(layer.attention_norm_b);

         randomize_tensor_normal(layer.wq_a, rnd);
-        randomize_tensor_normal(layer.wq_b, rnd);
+        ggml_set_zero(layer.wq_b);
         randomize_tensor_normal(layer.wk_a, rnd);
-        randomize_tensor_normal(layer.wk_b, rnd);
+        ggml_set_zero(layer.wk_b);
         randomize_tensor_normal(layer.wv_a, rnd);
-        randomize_tensor_normal(layer.wv_b, rnd);
+        ggml_set_zero(layer.wv_b);
         randomize_tensor_normal(layer.wo_a, rnd);
-        randomize_tensor_normal(layer.wo_b, rnd);
+        ggml_set_zero(layer.wo_b);

         randomize_tensor_normal(layer.ffn_norm_a, rnd);
-        randomize_tensor_normal(layer.ffn_norm_b, rnd);
+        ggml_set_zero(layer.ffn_norm_b);

         randomize_tensor_normal(layer.w1_a, rnd);
-        randomize_tensor_normal(layer.w1_b, rnd);
+        ggml_set_zero(layer.w1_b);
         randomize_tensor_normal(layer.w2_a, rnd);
-        randomize_tensor_normal(layer.w2_b, rnd);
+        ggml_set_zero(layer.w2_b);
         randomize_tensor_normal(layer.w3_a, rnd);
-        randomize_tensor_normal(layer.w3_b, rnd);
+        ggml_set_zero(layer.w3_b);
     }

     free_random_normal_distribution(rnd);
@@ -1460,17 +1460,6 @@ static bool train_params_parse(int argc, char ** argv, struct train_params * par
             }
             params->n_rank_w3 = std::stoi(argv[i]);
             params->custom_n_rank_w3 = true;
-        } else if (arg == "--gpu-layers" || arg == "-ngl" || arg == "--n-gpu-layers") {
-            if (++i >= argc) {
-                invalid_param = true;
-                break;
-            }
-#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
-            params->common.n_gpu_layers = std::stoi(argv[i]);
-#else
-            fprintf(stderr, "warning: not compiled with GPU offload support, --n-gpu-layers option will be ignored\n");
-            fprintf(stderr, "warning: see main README.md for information on enabling GPU BLAS support\n");
-#endif
         } else {
             fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
             train_print_usage(argc, argv, &default_params);
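The randomize_lora change above keeps the A tensors randomly initialized but zeroes every B tensor. Since a LoRA adapter contributes the product B·A on top of the base weight, B = 0 makes that delta exactly zero, so the first training step starts precisely at the base model. A tiny self-contained C++ check of that identity; the dimensions and values are illustrative:

```cpp
#include <cstdio>

// With B zeroed, delta = B * A is the zero matrix, so W + delta == W.
int main() {
    const int n = 2, r = 1;                 // n x n weight, rank-r adapter
    const float A[1][2] = {{0.3f, -0.7f}};  // A stays randomly initialized
    const float B[2][1] = {{0.0f}, {0.0f}}; // B is zeroed (ggml_set_zero)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float delta = 0.0f;
            for (int k = 0; k < r; ++k) {
                delta += B[i][k] * A[k][j];
            }
            printf("delta[%d][%d] = %.1f\n", i, j, delta); // always 0.0
        }
    }
    return 0;
}
```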
diff --git a/examples/infill/infill.cpp b/examples/infill/infill.cpp
index 62f5ce3c16a32..4a7827876e215 100644
--- a/examples/infill/infill.cpp
+++ b/examples/infill/infill.cpp
@@ -146,6 +146,13 @@ int main(int argc, char ** argv) {
         return 0;
     }
+    if (params.chatml) {
+        printf("\n************\n");
+        printf("%s: please use the 'main' tool for chatml mode\n", __func__);
+        printf("************\n\n");
+
+        return 0;
+    }
     if (!params.antiprompt.empty()) {
         printf("\n************\n");
         printf("%s: please use the 'main' tool for antiprompt mode\n", __func__);
@@ -230,7 +237,7 @@ int main(int argc, char ** argv) {
         LOG_TEE("\n");
         LOG_TEE("%s\n", get_system_info(params).c_str());
     }
-    const bool add_bos = llama_vocab_type(model) == LLAMA_VOCAB_TYPE_SPM;
+    const bool add_bos = llama_should_add_bos_token(model);
     LOG("add_bos: %d\n", add_bos);

     bool suff_rm_leading_spc = params.escape;
diff --git a/examples/llama-bench/llama-bench.cpp b/examples/llama-bench/llama-bench.cpp
index 9bd82d565834a..6617c050ddfec 100644
--- a/examples/llama-bench/llama-bench.cpp
+++ b/examples/llama-bench/llama-bench.cpp
@@ -53,6 +53,13 @@ static std::vector<T> split(const std::string & str, char delim) {
     return values;
 }

+template<typename T, typename F>
+static std::vector<std::string> transform_to_str(const std::vector<T> & values, F f) {
+    std::vector<std::string> str_values;
+    std::transform(values.begin(), values.end(), std::back_inserter(str_values), f);
+    return str_values;
+}
+
 template<typename T>
 static T avg(const std::vector<T> & v) {
     if (v.empty()) {
@@ -126,7 +133,8 @@ struct cmd_params {
     std::vector<int> n_prompt;
     std::vector<int> n_gen;
     std::vector<int> n_batch;
-    std::vector<bool> f32_kv;
+    std::vector<ggml_type> type_k;
+    std::vector<ggml_type> type_v;
     std::vector<int> n_threads;
     std::vector<int> n_gpu_layers;
     std::vector<int> main_gpu;
@@ -142,7 +150,8 @@ static const cmd_params cmd_params_defaults = {
     /* n_prompt     */ {512},
     /* n_gen        */ {128},
     /* n_batch      */ {512},
-    /* f32_kv       */ {false},
+    /* type_k       */ {GGML_TYPE_F16},
+    /* type_v       */ {GGML_TYPE_F16},
     /* n_threads    */ {get_num_physical_cores()},
     /* n_gpu_layers */ {99},
     /* main_gpu     */ {0},
@@ -162,7 +171,8 @@ static void print_usage(int /* argc */, char ** argv) {
     printf("  -p, --n-prompt <n>            (default: %s)\n", join(cmd_params_defaults.n_prompt, ",").c_str());
     printf("  -n, --n-gen <n>               (default: %s)\n", join(cmd_params_defaults.n_gen, ",").c_str());
     printf("  -b, --batch-size <n>          (default: %s)\n", join(cmd_params_defaults.n_batch, ",").c_str());
-    printf("  --memory-f32 <0|1>            (default: %s)\n", join(cmd_params_defaults.f32_kv, ",").c_str());
+    printf("  -ctk <t>, --cache-type-k <t>  (default: %s)\n", join(transform_to_str(cmd_params_defaults.type_k, ggml_type_name), ",").c_str());
+    printf("  -ctv <t>, --cache-type-v <t>  (default: %s)\n", join(transform_to_str(cmd_params_defaults.type_v, ggml_type_name), ",").c_str());
     printf("  -t, --threads <n>             (default: %s)\n", join(cmd_params_defaults.n_threads, ",").c_str());
     printf("  -ngl, --n-gpu-layers <n>      (default: %s)\n", join(cmd_params_defaults.n_gpu_layers, ",").c_str());
     printf("  -mg, --main-gpu <i>           (default: %s)\n", join(cmd_params_defaults.main_gpu, ",").c_str());
@@ -173,9 +183,32 @@ static void print_usage(int /* argc */, char ** argv) {
     printf("  -v, --verbose                 (default: %s)\n", cmd_params_defaults.verbose ? "1" : "0");
"1" : "0"); printf("\n"); printf("Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.\n"); +} +static ggml_type ggml_type_from_name(const std::string & s) { + if (s == "f16") { + return GGML_TYPE_F16; + } + if (s == "q8_0") { + return GGML_TYPE_Q8_0; + } + if (s == "q4_0") { + return GGML_TYPE_Q4_0; + } + if (s == "q4_1") { + return GGML_TYPE_Q4_1; + } + if (s == "q5_0") { + return GGML_TYPE_Q5_0; + } + if (s == "q5_1") { + return GGML_TYPE_Q5_1; + } + + return GGML_TYPE_COUNT; } + static cmd_params parse_cmd_params(int argc, char ** argv) { cmd_params params; std::string arg; @@ -224,13 +257,38 @@ static cmd_params parse_cmd_params(int argc, char ** argv) { } auto p = split(argv[i], split_delim); params.n_batch.insert(params.n_batch.end(), p.begin(), p.end()); - } else if (arg == "--memory-f32") { + } else if (arg == "-ctk" || arg == "--cache-type-k") { if (++i >= argc) { invalid_param = true; break; } - auto p = split(argv[i], split_delim); - params.f32_kv.insert(params.f32_kv.end(), p.begin(), p.end()); + auto p = split(argv[i], split_delim); + std::vector types; + for (const auto & t : p) { + ggml_type gt = ggml_type_from_name(t); + if (gt == GGML_TYPE_COUNT) { + invalid_param = true; + break; + } + types.push_back(gt); + } + params.type_k.insert(params.type_k.end(), types.begin(), types.end()); + } else if (arg == "-ctv" || arg == "--cache-type-v") { + if (++i >= argc) { + invalid_param = true; + break; + } + auto p = split(argv[i], split_delim); + std::vector types; + for (const auto & t : p) { + ggml_type gt = ggml_type_from_name(t); + if (gt == GGML_TYPE_COUNT) { + invalid_param = true; + break; + } + types.push_back(gt); + } + params.type_v.insert(params.type_v.end(), types.begin(), types.end()); } else if (arg == "-t" || arg == "--threads") { if (++i >= argc) { invalid_param = true; @@ -321,7 +379,8 @@ static cmd_params parse_cmd_params(int argc, char ** argv) { if (params.n_prompt.empty()) { params.n_prompt = cmd_params_defaults.n_prompt; } if (params.n_gen.empty()) { params.n_gen = cmd_params_defaults.n_gen; } if (params.n_batch.empty()) { params.n_batch = cmd_params_defaults.n_batch; } - if (params.f32_kv.empty()) { params.f32_kv = cmd_params_defaults.f32_kv; } + if (params.type_k.empty()) { params.type_k = cmd_params_defaults.type_k; } + if (params.type_v.empty()) { params.type_v = cmd_params_defaults.type_v; } if (params.n_gpu_layers.empty()) { params.n_gpu_layers = cmd_params_defaults.n_gpu_layers; } if (params.main_gpu.empty()) { params.main_gpu = cmd_params_defaults.main_gpu; } if (params.mul_mat_q.empty()) { params.mul_mat_q = cmd_params_defaults.mul_mat_q; } @@ -336,7 +395,8 @@ struct cmd_params_instance { int n_prompt; int n_gen; int n_batch; - bool f32_kv; + ggml_type type_k; + ggml_type type_v; int n_threads; int n_gpu_layers; int main_gpu; @@ -365,7 +425,8 @@ struct cmd_params_instance { cparams.n_ctx = n_prompt + n_gen; cparams.n_batch = n_batch; - cparams.f16_kv = !f32_kv; + cparams.type_k = type_k; + cparams.type_v = type_v; cparams.mul_mat_q = mul_mat_q; return cparams; @@ -380,7 +441,8 @@ static std::vector get_cmd_params_instances_int(const cmd_p for (const auto & mg : params.main_gpu) for (const auto & ts : params.tensor_split) for (const auto & nb : params.n_batch) - for (const auto & fk : params.f32_kv) + for (const auto & tk : params.type_k) + for (const auto & tv : params.type_v) for (const auto & mmq : params.mul_mat_q) for (const auto & nt : params.n_threads) { cmd_params_instance 
@@ -380,7 +441,8 @@ static std::vector<cmd_params_instance> get_cmd_params_instances_int(const cmd_p
     for (const auto & mg : params.main_gpu)
     for (const auto & ts : params.tensor_split)
     for (const auto & nb : params.n_batch)
-    for (const auto & fk : params.f32_kv)
+    for (const auto & tk : params.type_k)
+    for (const auto & tv : params.type_v)
     for (const auto & mmq : params.mul_mat_q)
     for (const auto & nt : params.n_threads) {
         cmd_params_instance instance = {
@@ -388,7 +450,8 @@ static std::vector<cmd_params_instance> get_cmd_params_instances_int(const cmd_p
             /* .n_prompt     = */ n_prompt,
             /* .n_gen        = */ n_gen,
             /* .n_batch      = */ nb,
-            /* .f32_kv       = */ fk,
+            /* .type_k       = */ tk,
+            /* .type_v       = */ tv,
             /* .n_threads    = */ nt,
             /* .n_gpu_layers = */ nl,
             /* .main_gpu     = */ mg,
@@ -410,7 +473,8 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
     for (const auto & mg : params.main_gpu)
     for (const auto & ts : params.tensor_split)
     for (const auto & nb : params.n_batch)
-    for (const auto & fk : params.f32_kv)
+    for (const auto & tk : params.type_k)
+    for (const auto & tv : params.type_v)
     for (const auto & mmq : params.mul_mat_q)
     for (const auto & nt : params.n_threads) {
         for (const auto & n_prompt : params.n_prompt) {
@@ -422,7 +486,8 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .n_prompt     = */ n_prompt,
                 /* .n_gen        = */ 0,
                 /* .n_batch      = */ nb,
-                /* .f32_kv       = */ fk,
+                /* .type_k       = */ tk,
+                /* .type_v       = */ tv,
                 /* .n_threads    = */ nt,
                 /* .n_gpu_layers = */ nl,
                 /* .main_gpu     = */ mg,
@@ -441,7 +506,8 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 /* .n_prompt     = */ 0,
                 /* .n_gen        = */ n_gen,
                 /* .n_batch      = */ nb,
-                /* .f32_kv       = */ fk,
+                /* .type_k       = */ tk,
+                /* .type_v       = */ tv,
                 /* .n_threads    = */ nt,
                 /* .n_gpu_layers = */ nl,
                 /* .main_gpu     = */ mg,
@@ -489,7 +555,8 @@ struct test {
     uint64_t model_n_params;
     int n_batch;
     int n_threads;
-    bool f32_kv;
+    ggml_type type_k;
+    ggml_type type_v;
     int n_gpu_layers;
     int main_gpu;
     bool mul_mat_q;
@@ -508,7 +575,8 @@ struct test {
         model_n_params = llama_model_n_params(lmodel);
         n_batch = inst.n_batch;
         n_threads = inst.n_threads;
-        f32_kv = inst.f32_kv;
+        type_k = inst.type_k;
+        type_v = inst.type_v;
         n_gpu_layers = inst.n_gpu_layers;
         main_gpu = inst.main_gpu;
         mul_mat_q = inst.mul_mat_q;
@@ -571,7 +639,7 @@ struct test {
             "cuda", "opencl", "metal", "gpu_blas", "blas",
             "cpu_info", "gpu_info",
             "model_filename", "model_type", "model_size", "model_n_params",
-            "n_batch", "n_threads", "f16_kv",
+            "n_batch", "n_threads", "type_k", "type_v",
             "n_gpu_layers", "main_gpu", "mul_mat_q", "tensor_split",
             "n_prompt", "n_gen", "test_time",
             "avg_ns", "stddev_ns",
@@ -621,7 +689,7 @@ struct test {
             std::to_string(cuda), std::to_string(opencl), std::to_string(metal), std::to_string(gpu_blas), std::to_string(blas),
             cpu_info, gpu_info,
             model_filename, model_type, std::to_string(model_size), std::to_string(model_n_params),
-            std::to_string(n_batch), std::to_string(n_threads), std::to_string(!f32_kv),
+            std::to_string(n_batch), std::to_string(n_threads), ggml_type_name(type_k), ggml_type_name(type_v),
            std::to_string(n_gpu_layers), std::to_string(main_gpu), std::to_string(mul_mat_q), tensor_split_str,
            std::to_string(n_prompt), std::to_string(n_gen), test_time,
            std::to_string(avg_ns()), std::to_string(stdev_ns()),
@@ -805,8 +873,11 @@ struct markdown_printer : public printer {
         if (params.n_batch.size() > 1 || params.n_batch != cmd_params_defaults.n_batch) {
             fields.push_back("n_batch");
         }
-        if (params.f32_kv.size() > 1 || params.f32_kv != cmd_params_defaults.f32_kv) {
-            fields.push_back("f16_kv");
+        if (params.type_k.size() > 1 || params.type_k != cmd_params_defaults.type_k) {
+            fields.push_back("type_k");
+        }
+        if (params.type_v.size() > 1 || params.type_v != cmd_params_defaults.type_v) {
+            fields.push_back("type_v");
         }
         if (params.main_gpu.size() > 1 || params.main_gpu != cmd_params_defaults.main_gpu) {
             fields.push_back("main_gpu");
diff --git a/examples/llama.swiftui/.gitignore b/examples/llama.swiftui/.gitignore
new file mode 100644
index 0000000000000..9bce6af399ba9
--- /dev/null
+++ b/examples/llama.swiftui/.gitignore
@@ -0,0 +1 @@
+xcuserdata
diff --git a/examples/llama.swiftui/README.md b/examples/llama.swiftui/README.md
new file mode 100644
index 0000000000000..fa68e6ed8e34d
--- /dev/null
+++ b/examples/llama.swiftui/README.md
@@ -0,0 +1,7 @@
+# llama.swiftui
+
+Local inference of llama.cpp on an iPhone.
+So far I have only tested with the starcoder 1B model, but it can most likely handle 7B models as well.
+
+https://github.com/bachittle/llama.cpp/assets/39804642/e290827a-4edb-4093-9642-2a5e399ec545
+
diff --git a/examples/llama.swiftui/llama.cpp.swift/LibLlama.swift b/examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
new file mode 100644
index 0000000000000..3754f055163ea
--- /dev/null
+++ b/examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
@@ -0,0 +1,208 @@
+import Foundation
+
+// import llama
+
+enum LlamaError: Error {
+    case couldNotInitializeContext
+}
+
+actor LlamaContext {
+    private var model: OpaquePointer
+    private var context: OpaquePointer
+    private var batch: llama_batch
+    private var tokens_list: [llama_token]
+    /// This variable is used to store temporarily invalid cchars
+    private var temporary_invalid_cchars: [CChar]
+
+    var n_len: Int32 = 512
+    var n_cur: Int32 = 0
+    var n_decode: Int32 = 0
+
+    init(model: OpaquePointer, context: OpaquePointer) {
+        self.model = model
+        self.context = context
+        self.tokens_list = []
+        self.batch = llama_batch_init(512, 0, 1)
+        self.temporary_invalid_cchars = []
+    }
+
+    deinit {
+        llama_free(context)
+        llama_free_model(model)
+        llama_backend_free()
+    }
+
+    static func createContext(path: String) throws -> LlamaContext {
+        llama_backend_init(false)
+        let model_params = llama_model_default_params()
+
+        let model = llama_load_model_from_file(path, model_params)
+        guard let model else {
+            print("Could not load model at \(path)")
+            throw LlamaError.couldNotInitializeContext
+        }
+        var ctx_params = llama_context_default_params()
+        ctx_params.seed = 1234
+        ctx_params.n_ctx = 2048
+        ctx_params.n_threads = 8
+        ctx_params.n_threads_batch = 8
+
+        let context = llama_new_context_with_model(model, ctx_params)
+        guard let context else {
+            print("Could not load context!")
+            throw LlamaError.couldNotInitializeContext
+        }
+
+        return LlamaContext(model: model, context: context)
+    }
+
+    func get_n_tokens() -> Int32 {
+        return batch.n_tokens;
+    }
+
+    func completion_init(text: String) {
+        print("attempting to complete \"\(text)\"")
+
+        tokens_list = tokenize(text: text, add_bos: true)
+        temporary_invalid_cchars = []
+
+        let n_ctx = llama_n_ctx(context)
+        let n_kv_req = tokens_list.count + (Int(n_len) - tokens_list.count)
+
+        print("\n n_len = \(n_len), n_ctx = \(n_ctx), n_kv_req = \(n_kv_req)")
+
+        if n_kv_req > n_ctx {
+            print("error: n_kv_req > n_ctx, the required KV cache size is not big enough")
+        }
+
+        for id in tokens_list {
+            print(String(cString: token_to_piece(token: id) + [0]))
+        }
+
+        // batch = llama_batch_init(512, 0) // done in init()
+        batch.n_tokens = Int32(tokens_list.count)
+
+        for i1 in 0.. String {
+        var new_token_id: llama_token = 0
+
+        let n_vocab = llama_n_vocab(model)
+        let logits = llama_get_logits_ith(context, batch.n_tokens - 1)
+
+        var candidates = Array<llama_token_data>()
+        candidates.reserveCapacity(Int(n_vocab))
+
+        for token_id in 0.. [llama_token] {
+        let utf8Count = text.utf8.count
+        let n_tokens = utf8Count + (add_bos ? 1 : 0)
+        let tokens = UnsafeMutablePointer<llama_token>.allocate(capacity: n_tokens)
+        let tokenCount = llama_tokenize(model, text, Int32(utf8Count), tokens, Int32(n_tokens), add_bos, false)
+
+        var swiftTokens: [llama_token] = []
+        for i in 0.. [CChar] {
+        let result = UnsafeMutablePointer<Int8>.allocate(capacity: 8)
+        result.initialize(repeating: Int8(0), count: 8)
+        defer {
+            result.deallocate()
+        }
+        let nTokens = llama_token_to_piece(model, token, result, 8)
+
+        if nTokens < 0 {
+            let newResult = UnsafeMutablePointer<Int8>.allocate(capacity: Int(-nTokens))
+            newResult.initialize(repeating: Int8(0), count: Int(-nTokens))
+            defer {
+                newResult.deallocate()
+            }
+            let nNewTokens = llama_token_to_piece(model, token, newResult, -nTokens)
+            let bufferPointer = UnsafeBufferPointer(start: newResult, count: Int(nNewTokens))
+            return Array(bufferPointer)
+        } else {
+            let bufferPointer = UnsafeBufferPointer(start: result, count: Int(nTokens))
+            return Array(bufferPointer)
+        }
+    }
+}
diff --git a/examples/llama.swiftui/llama.cpp.swift/bridging-header.h b/examples/llama.swiftui/llama.cpp.swift/bridging-header.h
new file mode 100644
index 0000000000000..6cd72c97919ea
--- /dev/null
+++ b/examples/llama.swiftui/llama.cpp.swift/bridging-header.h
@@ -0,0 +1,5 @@
+//
+// Use this file to import your target's public headers that you would like to expose to Swift.
+//
+
+#import "llama.h"
diff --git a/examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj b/examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
new file mode 100644
index 0000000000000..bc1fd15cebb31
--- /dev/null
+++ b/examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
@@ -0,0 +1,481 @@
+// !$*UTF8*$!
+{
+	archiveVersion = 1;
+	classes = {
+	};
+	objectVersion = 56;
+	objects = {
+
+/* Begin PBXBuildFile section */
+		542376082B0D9BFB008E6A1C /* ggml-quants.c in Sources */ = {isa = PBXBuildFile; fileRef = 542376072B0D9BFB008E6A1C /* ggml-quants.c */; };
+		5423760B2B0D9C4B008E6A1C /* ggml-backend.c in Sources */ = {isa = PBXBuildFile; fileRef = 5423760A2B0D9C4B008E6A1C /* ggml-backend.c */; };
+		542378792ACE3F3500834A7B /* ggml-metal.metal in Resources */ = {isa = PBXBuildFile; fileRef = 549479C82AC9E10B00E0F78B /* ggml-metal.metal */; };
+		542EA09D2AC8723900A8AEE9 /* ggml.c in Sources */ = {isa = PBXBuildFile; fileRef = 542EA09B2AC8723900A8AEE9 /* ggml.c */; settings = {COMPILER_FLAGS = "-DGGML_USE_ACCELERATE -DGGML_USE_METAL -DGGML_USE_K_QUANTS -O3"; }; };
+		542EA0A02AC8725700A8AEE9 /* ggml-alloc.c in Sources */ = {isa = PBXBuildFile; fileRef = 542EA09F2AC8725700A8AEE9 /* ggml-alloc.c */; };
+		542EA0A32AC8729100A8AEE9 /* llama.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 542EA0A12AC8729100A8AEE9 /* llama.cpp */; settings = {COMPILER_FLAGS = "-DGGML_USE_K_QUANTS -DGGML_USE_METAL -O3"; }; };
+		549479CB2AC9E16000E0F78B /* Metal.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 549479CA2AC9E16000E0F78B /* Metal.framework */; };
+		549479CD2AC9E42A00E0F78B /* ggml-metal.m in Sources */ = {isa = PBXBuildFile; fileRef = 549479C52AC9E0F200E0F78B /* ggml-metal.m */; settings = {COMPILER_FLAGS = "-fno-objc-arc -DGGML_SWIFT -DGGML_USE_METAL -O3"; }; };
+		8A1C83772AC328BD0096AF73 /* llama_swiftuiApp.swift in Sources */ = {isa = PBXBuildFile; fileRef = 8A1C83762AC328BD0096AF73 /* llama_swiftuiApp.swift */; };
+		8A1C83792AC328BD0096AF73 /* ContentView.swift in Sources */ = {isa = PBXBuildFile; fileRef = 8A1C83782AC328BD0096AF73 /* ContentView.swift */; };
+		8A1C837B2AC328BE0096AF73 /* Assets.xcassets in Resources */ = {isa = 
PBXBuildFile; fileRef = 8A1C837A2AC328BE0096AF73 /* Assets.xcassets */; }; + 8A1C837E2AC328BE0096AF73 /* Preview Assets.xcassets in Resources */ = {isa = PBXBuildFile; fileRef = 8A1C837D2AC328BE0096AF73 /* Preview Assets.xcassets */; }; + 8A39BE0A2AC7601100BFEB40 /* Accelerate.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 8A39BE092AC7601000BFEB40 /* Accelerate.framework */; }; + 8A3F84242AC4C891005E2EE8 /* models in Resources */ = {isa = PBXBuildFile; fileRef = 8A3F84232AC4C891005E2EE8 /* models */; }; + 8A907F332AC7138A006146EA /* LibLlama.swift in Sources */ = {isa = PBXBuildFile; fileRef = 8A907F322AC7134E006146EA /* LibLlama.swift */; }; + 8A9F7C4D2AC332EE008AE1EA /* LlamaState.swift in Sources */ = {isa = PBXBuildFile; fileRef = 8A9F7C4C2AC332EE008AE1EA /* LlamaState.swift */; }; +/* End PBXBuildFile section */ + +/* Begin PBXFileReference section */ + 542376062B0D9BEA008E6A1C /* ggml-quants.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = "ggml-quants.h"; path = "../../ggml-quants.h"; sourceTree = ""; }; + 542376072B0D9BFB008E6A1C /* ggml-quants.c */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.c; name = "ggml-quants.c"; path = "../../ggml-quants.c"; sourceTree = ""; }; + 542376092B0D9C40008E6A1C /* ggml-backend.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = "ggml-backend.h"; path = "../../ggml-backend.h"; sourceTree = ""; }; + 5423760A2B0D9C4B008E6A1C /* ggml-backend.c */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.c; name = "ggml-backend.c"; path = "../../ggml-backend.c"; sourceTree = ""; }; + 542EA09B2AC8723900A8AEE9 /* ggml.c */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.c; name = ggml.c; path = ../../ggml.c; sourceTree = ""; }; + 542EA09C2AC8723900A8AEE9 /* ggml.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = ggml.h; path = ../../ggml.h; sourceTree = ""; }; + 542EA09E2AC8725700A8AEE9 /* ggml-alloc.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = "ggml-alloc.h"; path = "../../ggml-alloc.h"; sourceTree = ""; }; + 542EA09F2AC8725700A8AEE9 /* ggml-alloc.c */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.c; name = "ggml-alloc.c"; path = "../../ggml-alloc.c"; sourceTree = ""; }; + 542EA0A12AC8729100A8AEE9 /* llama.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = llama.cpp; path = ../../llama.cpp; sourceTree = ""; }; + 542EA0A22AC8729100A8AEE9 /* llama.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = llama.h; path = ../../llama.h; sourceTree = ""; }; + 549479C52AC9E0F200E0F78B /* ggml-metal.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = "ggml-metal.m"; path = "../../ggml-metal.m"; sourceTree = ""; }; + 549479C62AC9E0F200E0F78B /* ggml-metal.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = "ggml-metal.h"; path = "../../ggml-metal.h"; sourceTree = ""; }; + 549479C82AC9E10B00E0F78B /* ggml-metal.metal */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.metal; name = "ggml-metal.metal"; path = "../../ggml-metal.metal"; sourceTree = ""; }; + 549479CA2AC9E16000E0F78B /* Metal.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = Metal.framework; 
path = System/Library/Frameworks/Metal.framework; sourceTree = SDKROOT; }; + 8A08D20A2AC73B1500FE6CD4 /* bridging-header.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = "bridging-header.h"; sourceTree = ""; }; + 8A1C83732AC328BD0096AF73 /* llama.swiftui.app */ = {isa = PBXFileReference; explicitFileType = wrapper.application; includeInIndex = 0; path = llama.swiftui.app; sourceTree = BUILT_PRODUCTS_DIR; }; + 8A1C83762AC328BD0096AF73 /* llama_swiftuiApp.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = llama_swiftuiApp.swift; sourceTree = ""; }; + 8A1C83782AC328BD0096AF73 /* ContentView.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ContentView.swift; sourceTree = ""; }; + 8A1C837A2AC328BE0096AF73 /* Assets.xcassets */ = {isa = PBXFileReference; lastKnownFileType = folder.assetcatalog; path = Assets.xcassets; sourceTree = ""; }; + 8A1C837D2AC328BE0096AF73 /* Preview Assets.xcassets */ = {isa = PBXFileReference; lastKnownFileType = folder.assetcatalog; path = "Preview Assets.xcassets"; sourceTree = ""; }; + 8A39BE092AC7601000BFEB40 /* Accelerate.framework */ = {isa = PBXFileReference; lastKnownFileType = wrapper.framework; name = Accelerate.framework; path = System/Library/Frameworks/Accelerate.framework; sourceTree = SDKROOT; }; + 8A3F841F2AC4C824005E2EE8 /* llama-2-7b-chat.Q2_K.gguf */ = {isa = PBXFileReference; lastKnownFileType = file; path = "llama-2-7b-chat.Q2_K.gguf"; sourceTree = ""; }; + 8A3F84232AC4C891005E2EE8 /* models */ = {isa = PBXFileReference; lastKnownFileType = folder; name = models; path = llama.swiftui/Resources/models; sourceTree = ""; }; + 8A907F322AC7134E006146EA /* LibLlama.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = LibLlama.swift; sourceTree = ""; }; + 8A9F7C4C2AC332EE008AE1EA /* LlamaState.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = LlamaState.swift; sourceTree = ""; }; +/* End PBXFileReference section */ + +/* Begin PBXFrameworksBuildPhase section */ + 8A1C83702AC328BD0096AF73 /* Frameworks */ = { + isa = PBXFrameworksBuildPhase; + buildActionMask = 2147483647; + files = ( + 549479CB2AC9E16000E0F78B /* Metal.framework in Frameworks */, + 8A39BE0A2AC7601100BFEB40 /* Accelerate.framework in Frameworks */, + ); + runOnlyForDeploymentPostprocessing = 0; + }; +/* End PBXFrameworksBuildPhase section */ + +/* Begin PBXGroup section */ + 8A08D1F62AC7383900FE6CD4 /* llama.cpp */ = { + isa = PBXGroup; + children = ( + 5423760A2B0D9C4B008E6A1C /* ggml-backend.c */, + 542376092B0D9C40008E6A1C /* ggml-backend.h */, + 542376062B0D9BEA008E6A1C /* ggml-quants.h */, + 542376072B0D9BFB008E6A1C /* ggml-quants.c */, + 549479C82AC9E10B00E0F78B /* ggml-metal.metal */, + 549479C62AC9E0F200E0F78B /* ggml-metal.h */, + 549479C52AC9E0F200E0F78B /* ggml-metal.m */, + 542EA09B2AC8723900A8AEE9 /* ggml.c */, + 542EA09C2AC8723900A8AEE9 /* ggml.h */, + 542EA09F2AC8725700A8AEE9 /* ggml-alloc.c */, + 542EA09E2AC8725700A8AEE9 /* ggml-alloc.h */, + 542EA0A12AC8729100A8AEE9 /* llama.cpp */, + 542EA0A22AC8729100A8AEE9 /* llama.h */, + ); + name = llama.cpp; + sourceTree = ""; + }; + 8A1C836A2AC328BD0096AF73 = { + isa = PBXGroup; + children = ( + 8A08D1F62AC7383900FE6CD4 /* llama.cpp */, + 8A907F312AC7134E006146EA /* llama.cpp.swift */, + 8A3F84232AC4C891005E2EE8 /* models */, + 8A1C83752AC328BD0096AF73 /* llama.swiftui */, + 8A1C83742AC328BD0096AF73 /* Products */, + 8A39BE082AC7601000BFEB40 /* Frameworks */, + ); + sourceTree = ""; + }; 
+ 8A1C83742AC328BD0096AF73 /* Products */ = { + isa = PBXGroup; + children = ( + 8A1C83732AC328BD0096AF73 /* llama.swiftui.app */, + ); + name = Products; + sourceTree = ""; + }; + 8A1C83752AC328BD0096AF73 /* llama.swiftui */ = { + isa = PBXGroup; + children = ( + 8A3F84102AC4BD85005E2EE8 /* Resources */, + 8A9F7C4B2AC332DC008AE1EA /* Models */, + 8A9F7C4A2AC332BF008AE1EA /* UI */, + 8A1C83762AC328BD0096AF73 /* llama_swiftuiApp.swift */, + 8A1C837A2AC328BE0096AF73 /* Assets.xcassets */, + 8A1C837C2AC328BE0096AF73 /* Preview Content */, + ); + path = llama.swiftui; + sourceTree = ""; + }; + 8A1C837C2AC328BE0096AF73 /* Preview Content */ = { + isa = PBXGroup; + children = ( + 8A1C837D2AC328BE0096AF73 /* Preview Assets.xcassets */, + ); + path = "Preview Content"; + sourceTree = ""; + }; + 8A39BE082AC7601000BFEB40 /* Frameworks */ = { + isa = PBXGroup; + children = ( + 549479CA2AC9E16000E0F78B /* Metal.framework */, + 8A39BE092AC7601000BFEB40 /* Accelerate.framework */, + ); + name = Frameworks; + sourceTree = ""; + }; + 8A3F84102AC4BD85005E2EE8 /* Resources */ = { + isa = PBXGroup; + children = ( + 8A3F84112AC4BD8C005E2EE8 /* models */, + ); + path = Resources; + sourceTree = ""; + }; + 8A3F84112AC4BD8C005E2EE8 /* models */ = { + isa = PBXGroup; + children = ( + 8A3F841F2AC4C824005E2EE8 /* llama-2-7b-chat.Q2_K.gguf */, + ); + path = models; + sourceTree = ""; + }; + 8A907F312AC7134E006146EA /* llama.cpp.swift */ = { + isa = PBXGroup; + children = ( + 8A08D20A2AC73B1500FE6CD4 /* bridging-header.h */, + 8A907F322AC7134E006146EA /* LibLlama.swift */, + ); + path = llama.cpp.swift; + sourceTree = ""; + }; + 8A9F7C4A2AC332BF008AE1EA /* UI */ = { + isa = PBXGroup; + children = ( + 8A1C83782AC328BD0096AF73 /* ContentView.swift */, + ); + path = UI; + sourceTree = ""; + }; + 8A9F7C4B2AC332DC008AE1EA /* Models */ = { + isa = PBXGroup; + children = ( + 8A9F7C4C2AC332EE008AE1EA /* LlamaState.swift */, + ); + path = Models; + sourceTree = ""; + }; +/* End PBXGroup section */ + +/* Begin PBXNativeTarget section */ + 8A1C83722AC328BD0096AF73 /* llama.swiftui */ = { + isa = PBXNativeTarget; + buildConfigurationList = 8A1C83812AC328BE0096AF73 /* Build configuration list for PBXNativeTarget "llama.swiftui" */; + buildPhases = ( + 8A1C836F2AC328BD0096AF73 /* Sources */, + 8A1C83702AC328BD0096AF73 /* Frameworks */, + 8A1C83712AC328BD0096AF73 /* Resources */, + ); + buildRules = ( + ); + dependencies = ( + ); + name = llama.swiftui; + packageProductDependencies = ( + ); + productName = llama.swiftui; + productReference = 8A1C83732AC328BD0096AF73 /* llama.swiftui.app */; + productType = "com.apple.product-type.application"; + }; +/* End PBXNativeTarget section */ + +/* Begin PBXProject section */ + 8A1C836B2AC328BD0096AF73 /* Project object */ = { + isa = PBXProject; + attributes = { + BuildIndependentTargetsInParallel = 1; + LastSwiftUpdateCheck = 1500; + LastUpgradeCheck = 1500; + TargetAttributes = { + 8A1C83722AC328BD0096AF73 = { + CreatedOnToolsVersion = 15.0; + LastSwiftMigration = 1500; + }; + }; + }; + buildConfigurationList = 8A1C836E2AC328BD0096AF73 /* Build configuration list for PBXProject "llama.swiftui" */; + compatibilityVersion = "Xcode 14.0"; + developmentRegion = en; + hasScannedForEncodings = 0; + knownRegions = ( + en, + Base, + ); + mainGroup = 8A1C836A2AC328BD0096AF73; + packageReferences = ( + ); + productRefGroup = 8A1C83742AC328BD0096AF73 /* Products */; + projectDirPath = ""; + projectRoot = ""; + targets = ( + 8A1C83722AC328BD0096AF73 /* llama.swiftui */, + ); + }; +/* End PBXProject 
section */ + +/* Begin PBXResourcesBuildPhase section */ + 8A1C83712AC328BD0096AF73 /* Resources */ = { + isa = PBXResourcesBuildPhase; + buildActionMask = 2147483647; + files = ( + 542378792ACE3F3500834A7B /* ggml-metal.metal in Resources */, + 8A3F84242AC4C891005E2EE8 /* models in Resources */, + 8A1C837E2AC328BE0096AF73 /* Preview Assets.xcassets in Resources */, + 8A1C837B2AC328BE0096AF73 /* Assets.xcassets in Resources */, + ); + runOnlyForDeploymentPostprocessing = 0; + }; +/* End PBXResourcesBuildPhase section */ + +/* Begin PBXSourcesBuildPhase section */ + 8A1C836F2AC328BD0096AF73 /* Sources */ = { + isa = PBXSourcesBuildPhase; + buildActionMask = 2147483647; + files = ( + 542376082B0D9BFB008E6A1C /* ggml-quants.c in Sources */, + 549479CD2AC9E42A00E0F78B /* ggml-metal.m in Sources */, + 542EA09D2AC8723900A8AEE9 /* ggml.c in Sources */, + 8A907F332AC7138A006146EA /* LibLlama.swift in Sources */, + 542EA0A32AC8729100A8AEE9 /* llama.cpp in Sources */, + 8A9F7C4D2AC332EE008AE1EA /* LlamaState.swift in Sources */, + 8A1C83792AC328BD0096AF73 /* ContentView.swift in Sources */, + 8A1C83772AC328BD0096AF73 /* llama_swiftuiApp.swift in Sources */, + 542EA0A02AC8725700A8AEE9 /* ggml-alloc.c in Sources */, + 5423760B2B0D9C4B008E6A1C /* ggml-backend.c in Sources */, + ); + runOnlyForDeploymentPostprocessing = 0; + }; +/* End PBXSourcesBuildPhase section */ + +/* Begin XCBuildConfiguration section */ + 8A1C837F2AC328BE0096AF73 /* Debug */ = { + isa = XCBuildConfiguration; + buildSettings = { + ALWAYS_SEARCH_USER_PATHS = NO; + ASSETCATALOG_COMPILER_GENERATE_SWIFT_ASSET_SYMBOL_EXTENSIONS = YES; + CLANG_ANALYZER_NONNULL = YES; + CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE; + CLANG_CXX_LANGUAGE_STANDARD = "gnu++20"; + CLANG_ENABLE_MODULES = YES; + CLANG_ENABLE_OBJC_ARC = YES; + CLANG_ENABLE_OBJC_WEAK = YES; + CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES; + CLANG_WARN_BOOL_CONVERSION = YES; + CLANG_WARN_COMMA = YES; + CLANG_WARN_CONSTANT_CONVERSION = YES; + CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES; + CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; + CLANG_WARN_DOCUMENTATION_COMMENTS = YES; + CLANG_WARN_EMPTY_BODY = YES; + CLANG_WARN_ENUM_CONVERSION = YES; + CLANG_WARN_INFINITE_RECURSION = YES; + CLANG_WARN_INT_CONVERSION = YES; + CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES; + CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES; + CLANG_WARN_OBJC_LITERAL_CONVERSION = YES; + CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; + CLANG_WARN_QUOTED_INCLUDE_IN_FRAMEWORK_HEADER = YES; + CLANG_WARN_RANGE_LOOP_ANALYSIS = YES; + CLANG_WARN_STRICT_PROTOTYPES = YES; + CLANG_WARN_SUSPICIOUS_MOVE = YES; + CLANG_WARN_UNGUARDED_AVAILABILITY = YES_AGGRESSIVE; + CLANG_WARN_UNREACHABLE_CODE = YES; + CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; + COPY_PHASE_STRIP = NO; + DEBUG_INFORMATION_FORMAT = dwarf; + ENABLE_STRICT_OBJC_MSGSEND = YES; + ENABLE_TESTABILITY = YES; + ENABLE_USER_SCRIPT_SANDBOXING = YES; + GCC_C_LANGUAGE_STANDARD = gnu17; + GCC_DYNAMIC_NO_PIC = NO; + GCC_NO_COMMON_BLOCKS = YES; + GCC_OPTIMIZATION_LEVEL = 0; + GCC_PREPROCESSOR_DEFINITIONS = ( + "DEBUG=1", + "$(inherited)", + ); + GCC_WARN_64_TO_32_BIT_CONVERSION = YES; + GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; + GCC_WARN_UNDECLARED_SELECTOR = YES; + GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; + GCC_WARN_UNUSED_FUNCTION = YES; + GCC_WARN_UNUSED_VARIABLE = YES; + IPHONEOS_DEPLOYMENT_TARGET = 17.0; + LOCALIZATION_PREFERS_STRING_CATALOGS = YES; + MTL_ENABLE_DEBUG_INFO = INCLUDE_SOURCE; + MTL_FAST_MATH = YES; + ONLY_ACTIVE_ARCH = YES; + SDKROOT = iphoneos; 
+ SWIFT_ACTIVE_COMPILATION_CONDITIONS = "DEBUG $(inherited)"; + SWIFT_OPTIMIZATION_LEVEL = "-Onone"; + }; + name = Debug; + }; + 8A1C83802AC328BE0096AF73 /* Release */ = { + isa = XCBuildConfiguration; + buildSettings = { + ALWAYS_SEARCH_USER_PATHS = NO; + ASSETCATALOG_COMPILER_GENERATE_SWIFT_ASSET_SYMBOL_EXTENSIONS = YES; + CLANG_ANALYZER_NONNULL = YES; + CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE; + CLANG_CXX_LANGUAGE_STANDARD = "gnu++20"; + CLANG_ENABLE_MODULES = YES; + CLANG_ENABLE_OBJC_ARC = YES; + CLANG_ENABLE_OBJC_WEAK = YES; + CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES; + CLANG_WARN_BOOL_CONVERSION = YES; + CLANG_WARN_COMMA = YES; + CLANG_WARN_CONSTANT_CONVERSION = YES; + CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES; + CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; + CLANG_WARN_DOCUMENTATION_COMMENTS = YES; + CLANG_WARN_EMPTY_BODY = YES; + CLANG_WARN_ENUM_CONVERSION = YES; + CLANG_WARN_INFINITE_RECURSION = YES; + CLANG_WARN_INT_CONVERSION = YES; + CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES; + CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES; + CLANG_WARN_OBJC_LITERAL_CONVERSION = YES; + CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; + CLANG_WARN_QUOTED_INCLUDE_IN_FRAMEWORK_HEADER = YES; + CLANG_WARN_RANGE_LOOP_ANALYSIS = YES; + CLANG_WARN_STRICT_PROTOTYPES = YES; + CLANG_WARN_SUSPICIOUS_MOVE = YES; + CLANG_WARN_UNGUARDED_AVAILABILITY = YES_AGGRESSIVE; + CLANG_WARN_UNREACHABLE_CODE = YES; + CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; + COPY_PHASE_STRIP = NO; + DEBUG_INFORMATION_FORMAT = "dwarf-with-dsym"; + ENABLE_NS_ASSERTIONS = NO; + ENABLE_STRICT_OBJC_MSGSEND = YES; + ENABLE_USER_SCRIPT_SANDBOXING = YES; + GCC_C_LANGUAGE_STANDARD = gnu17; + GCC_NO_COMMON_BLOCKS = YES; + GCC_WARN_64_TO_32_BIT_CONVERSION = YES; + GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; + GCC_WARN_UNDECLARED_SELECTOR = YES; + GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; + GCC_WARN_UNUSED_FUNCTION = YES; + GCC_WARN_UNUSED_VARIABLE = YES; + IPHONEOS_DEPLOYMENT_TARGET = 17.0; + LOCALIZATION_PREFERS_STRING_CATALOGS = YES; + MTL_ENABLE_DEBUG_INFO = NO; + MTL_FAST_MATH = YES; + SDKROOT = iphoneos; + SWIFT_COMPILATION_MODE = wholemodule; + VALIDATE_PRODUCT = YES; + }; + name = Release; + }; + 8A1C83822AC328BE0096AF73 /* Debug */ = { + isa = XCBuildConfiguration; + buildSettings = { + ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon; + ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor; + CLANG_ENABLE_MODULES = YES; + CODE_SIGN_STYLE = Automatic; + CURRENT_PROJECT_VERSION = 1; + DEVELOPMENT_ASSET_PATHS = "\"llama.swiftui/Preview Content\""; + DEVELOPMENT_TEAM = STLSG3FG8Q; + ENABLE_PREVIEWS = YES; + GENERATE_INFOPLIST_FILE = YES; + INFOPLIST_KEY_UIApplicationSceneManifest_Generation = YES; + INFOPLIST_KEY_UIApplicationSupportsIndirectInputEvents = YES; + INFOPLIST_KEY_UILaunchScreen_Generation = YES; + INFOPLIST_KEY_UISupportedInterfaceOrientations_iPad = "UIInterfaceOrientationPortrait UIInterfaceOrientationPortraitUpsideDown UIInterfaceOrientationLandscapeLeft UIInterfaceOrientationLandscapeRight"; + INFOPLIST_KEY_UISupportedInterfaceOrientations_iPhone = "UIInterfaceOrientationPortrait UIInterfaceOrientationLandscapeLeft UIInterfaceOrientationLandscapeRight"; + IPHONEOS_DEPLOYMENT_TARGET = 16.0; + LD_RUNPATH_SEARCH_PATHS = ( + "$(inherited)", + "@executable_path/Frameworks", + ); + MARKETING_VERSION = 1.0; + PRODUCT_BUNDLE_IDENTIFIER = "com.bachittle.llama-swift"; + PRODUCT_NAME = "$(TARGET_NAME)"; + SWIFT_EMIT_LOC_STRINGS = YES; + SWIFT_OBJC_BRIDGING_HEADER = "llama.cpp.swift/bridging-header.h"; + 
SWIFT_OPTIMIZATION_LEVEL = "-Onone"; + SWIFT_VERSION = 5.0; + TARGETED_DEVICE_FAMILY = "1,2"; + }; + name = Debug; + }; + 8A1C83832AC328BE0096AF73 /* Release */ = { + isa = XCBuildConfiguration; + buildSettings = { + ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon; + ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor; + CLANG_ENABLE_MODULES = YES; + CODE_SIGN_STYLE = Automatic; + CURRENT_PROJECT_VERSION = 1; + DEVELOPMENT_ASSET_PATHS = "\"llama.swiftui/Preview Content\""; + DEVELOPMENT_TEAM = STLSG3FG8Q; + ENABLE_PREVIEWS = YES; + GENERATE_INFOPLIST_FILE = YES; + INFOPLIST_KEY_UIApplicationSceneManifest_Generation = YES; + INFOPLIST_KEY_UIApplicationSupportsIndirectInputEvents = YES; + INFOPLIST_KEY_UILaunchScreen_Generation = YES; + INFOPLIST_KEY_UISupportedInterfaceOrientations_iPad = "UIInterfaceOrientationPortrait UIInterfaceOrientationPortraitUpsideDown UIInterfaceOrientationLandscapeLeft UIInterfaceOrientationLandscapeRight"; + INFOPLIST_KEY_UISupportedInterfaceOrientations_iPhone = "UIInterfaceOrientationPortrait UIInterfaceOrientationLandscapeLeft UIInterfaceOrientationLandscapeRight"; + IPHONEOS_DEPLOYMENT_TARGET = 16.0; + LD_RUNPATH_SEARCH_PATHS = ( + "$(inherited)", + "@executable_path/Frameworks", + ); + MARKETING_VERSION = 1.0; + PRODUCT_BUNDLE_IDENTIFIER = "com.bachittle.llama-swift"; + PRODUCT_NAME = "$(TARGET_NAME)"; + SWIFT_EMIT_LOC_STRINGS = YES; + SWIFT_OBJC_BRIDGING_HEADER = "llama.cpp.swift/bridging-header.h"; + SWIFT_VERSION = 5.0; + TARGETED_DEVICE_FAMILY = "1,2"; + }; + name = Release; + }; +/* End XCBuildConfiguration section */ + +/* Begin XCConfigurationList section */ + 8A1C836E2AC328BD0096AF73 /* Build configuration list for PBXProject "llama.swiftui" */ = { + isa = XCConfigurationList; + buildConfigurations = ( + 8A1C837F2AC328BE0096AF73 /* Debug */, + 8A1C83802AC328BE0096AF73 /* Release */, + ); + defaultConfigurationIsVisible = 0; + defaultConfigurationName = Release; + }; + 8A1C83812AC328BE0096AF73 /* Build configuration list for PBXNativeTarget "llama.swiftui" */ = { + isa = XCConfigurationList; + buildConfigurations = ( + 8A1C83822AC328BE0096AF73 /* Debug */, + 8A1C83832AC328BE0096AF73 /* Release */, + ); + defaultConfigurationIsVisible = 0; + defaultConfigurationName = Release; + }; +/* End XCConfigurationList section */ + }; + rootObject = 8A1C836B2AC328BD0096AF73 /* Project object */; +} diff --git a/examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/contents.xcworkspacedata b/examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/contents.xcworkspacedata new file mode 100644 index 0000000000000..919434a6254f0 --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/contents.xcworkspacedata @@ -0,0 +1,7 @@ +<?xml version="1.0" encoding="UTF-8"?> +<Workspace + version = "1.0"> + <FileRef + location = "self:"> + </FileRef> +</Workspace> diff --git a/examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/xcshareddata/IDEWorkspaceChecks.plist b/examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/xcshareddata/IDEWorkspaceChecks.plist new file mode 100644 index 0000000000000..3d4c1e55259fe --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui.xcodeproj/project.xcworkspace/xcshareddata/IDEWorkspaceChecks.plist @@ -0,0 +1,8 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> +<plist version="1.0"> +<dict> + <key>IDEDidComputeMac32BitWarning</key> + <true/> +</dict> +</plist> diff --git a/examples/llama.swiftui/llama.swiftui/Assets.xcassets/AccentColor.colorset/Contents.json b/examples/llama.swiftui/llama.swiftui/Assets.xcassets/AccentColor.colorset/Contents.json new file mode 100644 index 0000000000000..eb87897008164 --- /dev/null +++
b/examples/llama.swiftui/llama.swiftui/Assets.xcassets/AccentColor.colorset/Contents.json @@ -0,0 +1,11 @@ +{ + "colors" : [ + { + "idiom" : "universal" + } + ], + "info" : { + "author" : "xcode", + "version" : 1 + } +} diff --git a/examples/llama.swiftui/llama.swiftui/Assets.xcassets/AppIcon.appiconset/Contents.json b/examples/llama.swiftui/llama.swiftui/Assets.xcassets/AppIcon.appiconset/Contents.json new file mode 100644 index 0000000000000..13613e3ee1a93 --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui/Assets.xcassets/AppIcon.appiconset/Contents.json @@ -0,0 +1,13 @@ +{ + "images" : [ + { + "idiom" : "universal", + "platform" : "ios", + "size" : "1024x1024" + } + ], + "info" : { + "author" : "xcode", + "version" : 1 + } +} diff --git a/examples/llama.swiftui/llama.swiftui/Assets.xcassets/Contents.json b/examples/llama.swiftui/llama.swiftui/Assets.xcassets/Contents.json new file mode 100644 index 0000000000000..73c00596a7fca --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui/Assets.xcassets/Contents.json @@ -0,0 +1,6 @@ +{ + "info" : { + "author" : "xcode", + "version" : 1 + } +} diff --git a/examples/llama.swiftui/llama.swiftui/Models/LlamaState.swift b/examples/llama.swiftui/llama.swiftui/Models/LlamaState.swift new file mode 100644 index 0000000000000..babc60cdcc9dc --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui/Models/LlamaState.swift @@ -0,0 +1,45 @@ +import Foundation + +@MainActor +class LlamaState: ObservableObject { + @Published var messageLog = "" + + private var llamaContext: LlamaContext? + private var modelUrl: URL? { + Bundle.main.url(forResource: "q8_0", withExtension: "gguf", subdirectory: "models") + // Bundle.main.url(forResource: "llama-2-7b-chat", withExtension: "Q2_K.gguf", subdirectory: "models") + } + init() { + do { + try loadModel() + } catch { + messageLog += "Error!\n" + } + } + + private func loadModel() throws { + messageLog += "Loading model...\n" + if let modelUrl { + llamaContext = try LlamaContext.createContext(path: modelUrl.path()) + messageLog += "Loaded model \(modelUrl.lastPathComponent)\n" + } else { + messageLog += "Could not locate model\n" + } + } + + func complete(text: String) async { + guard let llamaContext else { + return + } + messageLog += "Attempting to complete text...\n" + await llamaContext.completion_init(text: text) + messageLog += "\(text)" + + while await llamaContext.n_cur <= llamaContext.n_len { + let result = await llamaContext.completion_loop() + messageLog += "\(result)" + } + await llamaContext.clear() + messageLog += "\n\ndone\n" + } +} diff --git a/examples/llama.swiftui/llama.swiftui/Preview Content/Preview Assets.xcassets/Contents.json b/examples/llama.swiftui/llama.swiftui/Preview Content/Preview Assets.xcassets/Contents.json new file mode 100644 index 0000000000000..73c00596a7fca --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui/Preview Content/Preview Assets.xcassets/Contents.json @@ -0,0 +1,6 @@ +{ + "info" : { + "author" : "xcode", + "version" : 1 + } +} diff --git a/examples/llama.swiftui/llama.swiftui/Resources/models/.gitignore b/examples/llama.swiftui/llama.swiftui/Resources/models/.gitignore new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/examples/llama.swiftui/llama.swiftui/UI/ContentView.swift b/examples/llama.swiftui/llama.swiftui/UI/ContentView.swift new file mode 100644 index 0000000000000..0bd16a806d10f --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui/UI/ContentView.swift @@ -0,0 +1,42 @@ +import SwiftUI + +struct ContentView: View { + 
@StateObject var llamaState = LlamaState() + + @State private var multiLineText = "" + + var body: some View { + VStack { + ScrollView(.vertical) { + Text(llamaState.messageLog) + } + + TextEditor(text: $multiLineText) + .frame(height: 200) + .padding() + .border(Color.gray, width: 0.5) + Button(action: { + sendText() + }) { + Text("Send") + .padding() + .background(Color.blue) + .foregroundColor(.white) + .cornerRadius(8) + } + } + .padding() + } + + func sendText() { + Task { + await llamaState.complete(text: multiLineText) + multiLineText = "" + } + } +} +/* +#Preview { + ContentView() +} +*/ diff --git a/examples/llama.swiftui/llama.swiftui/llama_swiftuiApp.swift b/examples/llama.swiftui/llama.swiftui/llama_swiftuiApp.swift new file mode 100644 index 0000000000000..cccda8a979f5e --- /dev/null +++ b/examples/llama.swiftui/llama.swiftui/llama_swiftuiApp.swift @@ -0,0 +1,10 @@ +import SwiftUI + +@main +struct llama_swiftuiApp: App { + var body: some Scene { + WindowGroup { + ContentView() + } + } +} diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp index fc0656c231a0c..4bb7b93b63440 100644 --- a/examples/llava/clip.cpp +++ b/examples/llava/clip.cpp @@ -739,7 +739,7 @@ bool clip_image_preprocess(const clip_ctx * ctx, const clip_image_u8 * img, clip temp->ny = longer_side; temp->size = 3 * longer_side * longer_side; temp->data = new uint8_t[temp->size](); - uint8_t bc[3] = {122, 116, 104}; // bakground color in RGB from LLaVA + uint8_t bc[3] = {122, 116, 104}; // background color in RGB from LLaVA // fill with background color for (size_t i = 0; i < temp->size; i++) { diff --git a/examples/llava/convert-image-encoder-to-gguf.py b/examples/llava/convert-image-encoder-to-gguf.py index 2f5eef1991955..03688e0ea1889 100644 --- a/examples/llava/convert-image-encoder-to-gguf.py +++ b/examples/llava/convert-image-encoder-to-gguf.py @@ -5,7 +5,7 @@ import torch import numpy as np from gguf import * -from transformers import CLIPModel, CLIPProcessor +from transformers import CLIPModel, CLIPProcessor, CLIPVisionModel TEXT = "clip.text" VISION = "clip.vision" @@ -51,7 +51,7 @@ def bytes_to_unicode(): The reversible bpe codes work on unicode strings. This means you need a large # of unicode characters in your vocab if you want to avoid UNKs. When you're at something like a 10B token dataset you end up needing around 5K for decent coverage. - This is a signficant percentage of your normal, say, 32K bpe vocab. + This is a significant percentage of your normal, say, 32K bpe vocab. To avoid that, we want lookup tables between utf-8 bytes and unicode strings. And avoids mapping to whitespace/control characters the bpe code barfs on. """ @@ -78,11 +78,19 @@ def bytes_to_unicode(): help="Save a text-only model. It can't be used to encode images") ap.add_argument("--vision-only", action="store_true", required=False, help="Save a vision-only model. It can't be used to encode texts") +ap.add_argument("--clip_model_is_vision", action="store_true", required=False, + help="The clip model is a pure vision model (ShareGPT4V vision extract for example)") ap.add_argument("--llava-projector", help="Path to llava.projector file. If specified, save an image encoder for LLaVA models.") ap.add_argument("--image-mean", nargs=3, type=float, required=False, help="Override image mean values") ap.add_argument("--image-std", nargs=3, type=float, required=False, help="Override image std values") ap.add_argument("-o", "--output-dir", help="Directory to save GGUF files. 
Default is the original model directory", default=None) +# Example --image_mean 0.48145466 0.4578275 0.40821073 --image_std 0.26862954 0.26130258 0.27577711 +default_image_mean = [0.48145466, 0.4578275, 0.40821073] +default_image_std = [0.26862954, 0.26130258, 0.27577711] +ap.add_argument('--image_mean', type=float, nargs='+', help='Mean of the images for normalization (overrides processor) ', default=None) +ap.add_argument('--image_std', type=float, nargs='+', help='Standard deviation of the images for normalization (overrides processor)', default=None) +# with proper args = ap.parse_args() @@ -96,15 +104,22 @@ def bytes_to_unicode(): # output in the same directory as the model if output_dir is None dir_model = args.model_dir - -with open(dir_model + "/vocab.json", "r", encoding="utf-8") as f: - vocab = json.load(f) - tokens = [key for key in vocab] +if args.clip_model_is_vision: + vocab = None + tokens = None +else: + with open(dir_model + "/vocab.json", "r", encoding="utf-8") as f: + vocab = json.load(f) + tokens = [key for key in vocab] with open(dir_model + "/config.json", "r", encoding="utf-8") as f: config = json.load(f) - v_hparams = config["vision_config"] - t_hparams = config["text_config"] + if args.clip_model_is_vision: + v_hparams = config + t_hparams = None + else: + v_hparams = config["vision_config"] + t_hparams = config["text_config"] # possible data types # ftype == 0 -> float32 @@ -117,9 +132,12 @@ def bytes_to_unicode(): if args.use_f32: ftype = 0 - -model = CLIPModel.from_pretrained(dir_model) -processor = CLIPProcessor.from_pretrained(dir_model) +if args.clip_model_is_vision: + model = CLIPVisionModel.from_pretrained(dir_model) + processor = None +else: + model = CLIPModel.from_pretrained(dir_model) + processor = CLIPProcessor.from_pretrained(dir_model) fname_middle = None has_text_encoder = True @@ -128,13 +146,13 @@ def bytes_to_unicode(): if args.text_only: fname_middle = "text-" has_vision_encoder = False -elif args.vision_only: - fname_middle = "vision-" - has_text_encoder = False elif args.llava_projector is not None: fname_middle = "mmproj-" has_text_encoder = False has_llava_projector = True +elif args.vision_only: + fname_middle = "vision-" + has_text_encoder = False else: fname_middle = "" @@ -182,8 +200,12 @@ def bytes_to_unicode(): block_count = v_hparams["num_hidden_layers"] - 1 if has_llava_projector else v_hparams["num_hidden_layers"] fout.add_uint32(k(KEY_BLOCK_COUNT, VISION), block_count) - image_mean = processor.image_processor.image_mean if args.image_mean is None else args.image_mean - image_std = processor.image_processor.image_std if args.image_std is None else args.image_std + if processor is not None: + image_mean = processor.image_processor.image_mean if args.image_mean is None or args.image_mean == default_image_mean else args.image_mean + image_std = processor.image_processor.image_std if args.image_std is None or args.image_std == default_image_std else args.image_std + else: + image_mean = args.image_mean if args.image_mean is not None else default_image_mean + image_std = args.image_std if args.image_std is not None else default_image_std fout.add_array("clip.vision.image_mean", image_mean) fout.add_array("clip.vision.image_std", image_std) diff --git a/examples/llava/llava-cli.cpp b/examples/llava/llava-cli.cpp index 633afd1dad1bf..31f8cd8e0ef7b 100644 --- a/examples/llava/llava-cli.cpp +++ b/examples/llava/llava-cli.cpp @@ -208,9 +208,10 @@ static void process_prompt(struct llava_context * ctx_llava, struct llava_image_ int n_past = 0; 
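The hunk below, like later changes in main, perplexity, lookahead, and the server, replaces a hard-coded "SPM vocab implies BOS" check with `llama_should_add_bos_token()`, which consults the model's GGUF metadata. A minimal sketch of the pattern, assuming the `common.h` tokenize overload used throughout this diff; the wrapper name is illustrative, not part of the patch:

```cpp
#include "common.h"
#include "llama.h"

#include <string>
#include <vector>

// Sketch: let the model's metadata (tokenizer.ggml.add_bos_token) decide
// whether to prepend BOS, instead of assuming it from the vocab type.
static std::vector<llama_token> tokenize_prompt(llama_context * ctx, const std::string & prompt) {
    const bool add_bos = llama_should_add_bos_token(llama_get_model(ctx));
    return ::llama_tokenize(ctx, prompt, add_bos, /*special =*/ true);
}
```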
const int max_tgt_len = params->n_predict < 0 ? 256 : params->n_predict; + const bool add_bos = llama_should_add_bos_token(llama_get_model(ctx_llava->ctx_llama)); // llava chat format is "<system_prompt>\nUSER:<image_embeddings>\n<textual_prompt>\nASSISTANT:" - eval_string(ctx_llava->ctx_llama, "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\nUSER:", params->n_batch, &n_past, true); + eval_string(ctx_llava->ctx_llama, "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\nUSER:", params->n_batch, &n_past, add_bos); llava_eval_image_embed(ctx_llava->ctx_llama, image_embed, params->n_batch, &n_past); eval_string(ctx_llava->ctx_llama, (prompt + "\nASSISTANT:").c_str(), params->n_batch, &n_past, false); diff --git a/examples/llava/llava.cpp b/examples/llava/llava.cpp index d10bcf2d22465..0cae8c4b10a3a 100644 --- a/examples/llava/llava.cpp +++ b/examples/llava/llava.cpp @@ -127,7 +127,14 @@ static bool load_file_to_bytes(const char* path, unsigned char** bytesOut, long fclose(file); return false; } - fread(buffer, 1, fileSize, file); // Read the file into the buffer + errno = 0; + size_t ret = fread(buffer, 1, fileSize, file); // Read the file into the buffer + if (ferror(file)) { + die_fmt("read error: %s", strerror(errno)); + } + if (ret != (size_t) fileSize) { + die("unexpectedly reached end of file"); + } fclose(file); // Close the file *bytesOut = buffer; diff --git a/examples/lookahead/CMakeLists.txt b/examples/lookahead/CMakeLists.txt new file mode 100644 index 0000000000000..8827e3f11ecd6 --- /dev/null +++ b/examples/lookahead/CMakeLists.txt @@ -0,0 +1,5 @@ +set(TARGET lookahead) +add_executable(${TARGET} lookahead.cpp) +install(TARGETS ${TARGET} RUNTIME) +target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT}) +target_compile_features(${TARGET} PRIVATE cxx_std_11) diff --git a/examples/lookahead/README.md b/examples/lookahead/README.md new file mode 100644 index 0000000000000..a69a471b47d39 --- /dev/null +++ b/examples/lookahead/README.md @@ -0,0 +1,7 @@ +# llama.cpp/examples/lookahead + +Demonstration of the lookahead decoding technique: + +https://lmsys.org/blog/2023-11-21-lookahead-decoding/ + +More info: https://github.com/ggerganov/llama.cpp/pull/4207 diff --git a/examples/lookahead/lookahead.cpp b/examples/lookahead/lookahead.cpp new file mode 100644 index 0000000000000..e55a15a1bf054 --- /dev/null +++ b/examples/lookahead/lookahead.cpp @@ -0,0 +1,487 @@ +#include "common.h" +#include "llama.h" + +#include <cmath> +#include <cstdio> +#include <string> +#include <vector> + +struct ngram_data { + bool active = false; + + llama_seq_id seq_id = -1; + + std::vector<int> i_batch; + + std::vector<llama_token> tokens; +}; + +// n-gram container +struct ngram_container { + ngram_container(int n_vocab, int N, int G) { + cnt.resize(n_vocab); + head.resize(n_vocab); + tokens.resize(n_vocab * G * (N - 1)); + } + + int n_total = 0; + + std::vector<int> cnt; + std::vector<int> head; + + // [n_vocab][G][N - 1] + // for each token of the vocab, keep a ring-buffer of capacity G of n-grams of size N - 1 + std::vector<llama_token> tokens; +}; + +int main(int argc, char ** argv) { + gpt_params params; + + if (gpt_params_parse(argc, argv, params) == false) { + return 1; + } + + const int W = 15; // lookahead window + const int N = 5; // n-gram size + const int G = 15; // max verification n-grams + + const bool dump_kv_cache = params.dump_kv_cache; +
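A note before the rest of the listing: `ngram_container` flattens a per-token ring buffer of G candidate n-grams (each holding the N - 1 continuation tokens) into a single vector, addressed as `first_token*(N - 1)*G + slot*(N - 1) + j`. A self-contained toy sketch of that bookkeeping (toy sizes and plain `int` tokens are assumptions, not the example's real configuration):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int n_vocab = 8; // toy vocabulary; real models have tens of thousands of tokens
    const int N = 4;       // n-gram size
    const int G = 2;       // ring-buffer capacity per first token

    std::vector<int> tokens(n_vocab * G * (N - 1), -1); // flat [n_vocab][G][N - 1]
    std::vector<int> cnt (n_vocab, 0);                  // n-grams stored per first token
    std::vector<int> head(n_vocab, 0);                  // next ring slot per first token

    // store the continuation {5, 6, 7} for n-grams that start with token 3
    const int first    = 3;
    const int ngram[3] = {5, 6, 7}; // the N - 1 continuation tokens
    const int idx      = first*(N - 1)*G + head[first]*(N - 1);
    for (int j = 0; j < N - 1; ++j) {
        tokens[idx + j] = ngram[j];
    }
    cnt [first] = std::min(G, cnt[first] + 1); // saturates at capacity
    head[first] = (head[first] + 1) % G;       // advances the ring, overwriting the oldest entry

    printf("stored at idx = %d, cnt = %d, head = %d\n", idx, cnt[first], head[first]);
    return 0;
}
```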
+#ifndef LOG_DISABLE_LOGS + log_set_target(log_filename_generator("lookahead", "log")); + LOG_TEE("Log start\n"); + log_dump_cmdline(argc, argv); +#endif // LOG_DISABLE_LOGS + + // init llama.cpp + llama_backend_init(params.numa); + + llama_model * model = NULL; + llama_context * ctx = NULL; + + // load the target model + std::tie(model, ctx) = llama_init_from_gpt_params(params); + + // Tokenize the prompt + const bool add_bos = llama_should_add_bos_token(model); + LOG("add_bos tgt: %d\n", add_bos); + + std::vector<llama_token> inp; + std::vector<llama_token> all; + + inp = ::llama_tokenize(ctx, params.prompt, add_bos, true); + all = inp; + + const int max_context_size = llama_n_ctx(ctx); + const int max_tokens_list_size = max_context_size - 4; + + if ((int) inp.size() > max_tokens_list_size) { + fprintf(stderr, "%s: error: prompt too long (%d tokens, max %d)\n", __func__, (int) inp.size(), max_tokens_list_size); + return 1; + } + + fprintf(stderr, "\n\n"); + + for (auto id : inp) { + fprintf(stderr, "%s", llama_token_to_piece(ctx, id).c_str()); + } + + fflush(stderr); + + const int n_input = inp.size(); + + const auto t_enc_start = ggml_time_us(); + + // eval the prompt + llama_decode(ctx, llama_batch_get_one( inp.data(), n_input - 1, 0, 0)); + llama_decode(ctx, llama_batch_get_one(&inp.back(), 1, n_input - 1, 0)); + + for (int s = 1; s < W + G + 1; ++s) { + llama_kv_cache_seq_cp(ctx, 0, s, -1, -1); + } + + const auto t_enc_end = ggml_time_us(); + + int n_predict = 0; + int n_accept = 0; + + int n_past = inp.size(); + + llama_token id = 0; + + // used to determine end of generation + bool has_eos = false; + + // for each decoded batch, we have at most W + G + 1 distinct sequences: + // seq_id == 0 : the current input token + // seq_id [1, W] : tokens from the past N - 1 Jacobi iterations + // seq_id [W + 1, W + G] : verification n-grams + llama_batch batch = llama_batch_init(params.n_ctx, 0, W + G + 1); + + // target model sampling context + struct llama_sampling_context * ctx_sampling = llama_sampling_init(params.sparams); + + // verification n-grams + std::vector<ngram_data> ngrams_cur(G); + + // tokens for the past N - 1 Jacobi iterations + std::vector<llama_token> tokens_j_prev(W); + std::vector<std::vector<llama_token>> tokens_j(N - 1); + for (int j = 0; j < N - 1; j++) { + tokens_j[j].resize(W); + + for (int i = 0; i < W; i++) { + // there are different ways to init these tokens + if (0) { + // initialize randomly from the prompt tokens + tokens_j[j][i] = all[1 + rand() % (all.size() - 1)]; + } else { + // initialize with a sequence of increasing numbers + tokens_j[j][i] = 100 + i; + } + } + } + + std::vector<llama_seq_id> seq_id_look; + + // the input token belongs to all sequences + std::vector<llama_seq_id> seq_id_all(W + G + 1); + for (int i = 0; i < W + G + 1; i++) { + seq_id_all[i] = i; + } + + // here we keep adding new n-grams as we go + ngram_container ngrams_observed(llama_n_vocab(model), N, G); + + // debug + struct llama_kv_cache_view kvc_view = llama_kv_cache_view_init(ctx, W + G + 1); + + const auto t_dec_start = ggml_time_us(); + + // sample first token + { + id = llama_sampling_sample(ctx_sampling, ctx, NULL, 0); + + llama_sampling_accept(ctx_sampling, ctx, id, true); + + { + const std::string token_str = llama_token_to_piece(ctx, id); + + printf("%s", token_str.c_str()); + fflush(stdout); + } + } + + while (true) { + // debug + if (dump_kv_cache) { + llama_kv_cache_view_update(ctx, &kvc_view); + dump_kv_cache_view_seqs(kvc_view, 40); + } + + // build the mask from https://lmsys.org/blog/2023-11-21-lookahead-decoding/ + // + // Example for W = 5, N = 4, G = 2: + // (I = input, L = lookahead, 
V = verification) + // + // Batch: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 + // T: -2 -2 -2 -2 -1 -1 -1 -1 -1 0 0 0 0 0 0 + // Info: I L L L L L L L L L L L L L L V V V V V V + // Pos: 0 1 2 3 4 1 2 3 4 5 2 3 4 5 6 1 2 3 1 2 3 (+ n_past) + // Logits: 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 + // --------------------------------------------------------------------- + // Seq: 0 + // 1 1 1 + // 2 2 2 2 + // 3 3 3 3 3 + // 4 4 4 4 4 4 + // 5 5 5 5 5 5 5 + // 6 6 6 6 + // 7 7 7 7 + // --------------------------------------------------------------------- + // | | | | | | | | | | | + // V V V V V | | | | | | + // j_tokens | | | | | | + // V V V V V V + // id + { + llama_batch_clear(batch); + + // current token - first token of the first level + llama_batch_add(batch, id, n_past, seq_id_all, true); + + // verification n-grams - queue this before the lookahead tokens for less KV cache fragmentation + { + const int g_cur = ngrams_observed.cnt[id]; + + ngrams_cur.resize(g_cur); + for (int g = 0; g < g_cur; g++) { + ngrams_cur[g].active = true; + ngrams_cur[g].tokens.resize(N); + ngrams_cur[g].i_batch.resize(N); + ngrams_cur[g].seq_id = W + 1 + g; + ngrams_cur[g].i_batch[0] = 0; + ngrams_cur[g].tokens [0] = id; + } + + for (int j = 0; j < N - 1; j++) { + for (int g = 0; g < g_cur; g++) { + const int idx = id*(N - 1)*G + g*(N - 1); + + const llama_token t = ngrams_observed.tokens[idx + j]; + + ngrams_cur[g].tokens [j + 1] = t; + ngrams_cur[g].i_batch[j + 1] = batch.n_tokens; + + llama_batch_add(batch, t, n_past + j + 1, { W + 1 + g }, true); + } + } + } + + // fill the remaining W - 1 tokens for the first level + for (int i = 1; i < W; i++) { + seq_id_look.resize(W - i); + for (int j = 0; j < W - i; j++) { + seq_id_look[j] = i + j + 1; + } + + llama_batch_add(batch, tokens_j[0][i], n_past + i, seq_id_look, false); + } + + // fill the rest of the levels + for (int j = 1; j < N - 1; j++) { + for (int i = 0; i < W; i++) { + llama_batch_add(batch, tokens_j[j][i], n_past + j + i, { i + 1 }, j == N - 2); + } + } + } + + if (llama_decode(ctx, batch) != 0) { + fprintf(stderr, "\n\n%s: error: llama_decode failed - increase KV cache size\n", __func__); + return 1; + } + + int seq_id_best = 0; + + for (int v = 0; v < N; ++v) { + int i_batch = 0; + + // if no active ngrams are left, it means the sampled token does not pass the verification + if (v > 0) { + for (int g = 0; g < (int) ngrams_cur.size(); g++) { + if (ngrams_cur[g].active) { + i_batch = ngrams_cur[g].i_batch[v]; + seq_id_best = ngrams_cur[g].seq_id; + + ++n_accept; + break; + } + } + + // no more matches -> create a new batch + if (i_batch == 0) { + break; + } + } + + // sample the next token + id = llama_sampling_sample(ctx_sampling, ctx, NULL, i_batch); + + llama_sampling_accept(ctx_sampling, ctx, id, true); + + // print + { + const std::string token_str = llama_token_to_piece(ctx, id); + + if (v == 0) { + printf("%s", token_str.c_str()); + } else { + // print light cyan + printf("\033[0;96m%s\033[0m", token_str.c_str()); + } + fflush(stdout); + + if (id == llama_token_eos(model)) { + has_eos = true; + } + + all.push_back(id); + } + + ++n_predict; + ++n_past; + + if ((params.n_predict >= 0 && n_predict > params.n_predict) || has_eos) { + break; + } + + // verify across active n-grams + for (int g = 0; g < (int) ngrams_cur.size(); g++) { + if (ngrams_cur[g].active) { + if (v == N - 1) { + ngrams_cur[g].active = false; + } else { + if (id != ngrams_cur[g].tokens[v + 1]) { + ngrams_cur[g].active = false; + } + } + } + } + + // print known 
n-grams starting with token id (debug) + if (0 && v == 0) { + if (ngrams_observed.cnt[id] > 0) { + printf("\n - %d n-grams starting with '%s'\n", ngrams_observed.cnt[id], llama_token_to_piece(ctx, id).c_str()); + } + + for (int i = 0; i < ngrams_observed.cnt[id]; i++) { + printf(" - ngram %2d: ", i); + + const int idx = id*(N - 1)*G + i*(N - 1); + + for (int j = 0; j < N - 1; j++) { + const std::string token_str = llama_token_to_piece(ctx, ngrams_observed.tokens[idx + j]); + + printf("%s", token_str.c_str()); + } + + printf("\n"); + } + } + + // update lookahead tokens + { + for (int i = 0; i < W; i++) { + tokens_j_prev[i] = tokens_j[0][i]; + } + + for (int j = 0; j < N - 2; j++) { + tokens_j[j] = tokens_j[j + 1]; + } + + if (v == 0) { + // sample from the last level + for (int i = 0; i < W; i++) { + tokens_j[N - 2][i] = llama_sampling_sample(ctx_sampling, ctx, NULL, ngrams_cur.size()*(N-1) + W*(N - 2) + i); + } + } else { + for (int i = 0; i < W; i++) { + // there are different ways to init these tokens + if (0) { + // random init + tokens_j[N - 2][i] = all[1 + rand() % (all.size() - 1)]; + } else { + // init from the previous level + tokens_j[N - 2][i] = tokens_j[0][i]; + } + } + } + } + + // update observed ngrams + if (v == 0) { + // the first token of the n-gram is determined by the index in the container so it is not stored + std::vector<llama_token> ngram(N - 1); + + // n-gram generation + // ref: https://github.com/hao-ai-lab/LookaheadDecoding/issues/14#issuecomment-1826198518 + for (int f = 0; f < W; ++f) { + const int ft = tokens_j_prev[f]; // first token of the n-gram + + for (int j = 0; j < N - 1; ++j) { + ngram[j] = tokens_j[j][f]; + } + + // filter-out repeating n-grams + { + bool is_unique = true; + + for (int k = 0; k < ngrams_observed.cnt[ft]; ++k) { + const int idx = ft*(N - 1)*G + k*(N - 1); + + bool is_match = true; + for (int j = 0; j < N - 1; ++j) { + if (ngrams_observed.tokens[idx + j] != ngram[j]) { + is_match = false; + break; + } + } + + if (is_match) { + is_unique = false; + break; + } + } + + if (!is_unique) { + continue; + } + } + + const int head = ngrams_observed.head[ft]; + const int idx = ft*(N - 1)*G + head*(N - 1); + + for (int i = 0; i < N - 1; i++) { + ngrams_observed.tokens[idx + i] = ngram[i]; + } + + ngrams_observed.cnt[ft] = std::min(G, ngrams_observed.cnt[ft] + 1); + ngrams_observed.head[ft] = (head + 1) % G; + + ngrams_observed.n_total++; + } + } + } + + if ((params.n_predict >= 0 && n_predict > params.n_predict) || has_eos) { + break; + } + + // KV cache management + // if no verification token matched, we simply remove all cells from this batch -> no fragmentation + llama_kv_cache_seq_rm(ctx, -1, n_past, -1); + + if (seq_id_best != 0) { + // if a verification token matched, we keep the best sequence and remove the rest + // this leads to some KV cache fragmentation + llama_kv_cache_seq_keep(ctx, seq_id_best); + llama_kv_cache_seq_cp (ctx, seq_id_best, 0, -1, -1); + llama_kv_cache_seq_rm (ctx, seq_id_best, -1, -1); + + for (int s = 1; s < W + G + 1; ++s) { + llama_kv_cache_seq_cp(ctx, 0, s, -1, -1); + } + } + }
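The KV-cache maintenance that closes the loop above is the heart of the bookkeeping: on a verification hit, only the winning sequence survives and is shared back out across all W + G + 1 sequence ids. The same call pattern in isolation, as a sketch against the sequence API used above (the helper name is illustrative):

```cpp
#include "llama.h"

// Sketch: keep one winning sequence, rename it to sequence 0, then fan
// sequence 0 back out so the next lookahead batch reuses the shared prefix.
static void keep_best_sequence(llama_context * ctx, llama_seq_id seq_id_best, int n_seq) {
    llama_kv_cache_seq_keep(ctx, seq_id_best);            // drop every other sequence
    llama_kv_cache_seq_cp  (ctx, seq_id_best, 0, -1, -1); // copy the winner onto sequence 0
    llama_kv_cache_seq_rm  (ctx, seq_id_best, -1, -1);    // remove the old id
    for (llama_seq_id s = 1; s < n_seq; ++s) {
        llama_kv_cache_seq_cp(ctx, 0, s, -1, -1);         // share sequence 0 with the rest
    }
}
```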
+ + auto t_dec_end = ggml_time_us(); + + LOG_TEE("\n\n"); + + LOG_TEE("encoded %4d tokens in %8.3f seconds, speed: %8.3f t/s\n", n_input, (t_enc_end - t_enc_start) / 1e6f, inp.size() / ((t_enc_end - t_enc_start) / 1e6f)); + LOG_TEE("decoded %4d tokens in %8.3f seconds, speed: %8.3f t/s\n", n_predict, (t_dec_end - t_dec_start) / 1e6f, n_predict / ((t_dec_end - t_dec_start) / 1e6f)); + + LOG_TEE("\n"); + LOG_TEE("W = %2d\n", W); + LOG_TEE("N = %2d\n", N); + LOG_TEE("G = %2d\n", G); + LOG_TEE("\n"); + LOG_TEE("n_predict = %d\n", n_predict); + LOG_TEE("n_accept = %d\n", n_accept); + + llama_print_timings(ctx); + + llama_kv_cache_view_free(&kvc_view); + llama_sampling_free(ctx_sampling); + + llama_batch_free(batch); + + llama_free(ctx); + llama_free_model(model); + + llama_backend_free(); + + fprintf(stderr, "\n\n"); + + return 0; +} diff --git a/examples/main/main.cpp b/examples/main/main.cpp index 8d985c82ac21a..c096f110b32c5 100644 --- a/examples/main/main.cpp +++ b/examples/main/main.cpp @@ -100,6 +100,12 @@ static void sigint_handler(int signo) { } #endif +static void llama_log_callback_logTee(ggml_log_level level, const char * text, void * user_data) { + (void) level; + (void) user_data; + LOG_TEE("%s", text); +} + int main(int argc, char ** argv) { gpt_params params; g_params = &params; @@ -113,6 +119,7 @@ int main(int argc, char ** argv) { log_set_target(log_filename_generator("main", "log")); LOG_TEE("Log start\n"); log_dump_cmdline(argc, argv); + llama_log_set(llama_log_callback_logTee, nullptr); #endif // LOG_DISABLE_LOGS // TODO: Dump params ? @@ -229,13 +236,16 @@ int main(int argc, char ** argv) { } } - const bool add_bos = llama_vocab_type(model) == LLAMA_VOCAB_TYPE_SPM; + const bool add_bos = llama_should_add_bos_token(model); LOG("add_bos: %d\n", add_bos); std::vector<llama_token> embd_inp; - if (params.interactive_first || params.instruct || !params.prompt.empty() || session_tokens.empty()) { + if (params.interactive_first || params.instruct || params.chatml || !params.prompt.empty() || session_tokens.empty()) { LOG("tokenize the prompt\n"); + if (params.chatml) { + params.prompt = "<|im_start|>system\n" + params.prompt + "<|im_end|>"; + } embd_inp = ::llama_tokenize(ctx, params.prompt, add_bos, true); } else { LOG("use session tokens\n"); @@ -313,7 +323,7 @@ int main(int argc, char ** argv) { } // number of tokens to keep when resetting context - if (params.n_keep < 0 || params.n_keep > (int) embd_inp.size() || params.instruct) { + if (params.n_keep < 0 || params.n_keep > (int) embd_inp.size() || params.instruct || params.chatml) { params.n_keep = (int)embd_inp.size(); } @@ -324,11 +334,23 @@ int main(int argc, char ** argv) { LOG("inp_pfx: %s\n", LOG_TOKENS_TOSTR_PRETTY(ctx, inp_pfx).c_str()); LOG("inp_sfx: %s\n", LOG_TOKENS_TOSTR_PRETTY(ctx, inp_sfx).c_str()); + // chatml prefix & suffix + const auto cml_pfx = ::llama_tokenize(ctx, "\n<|im_start|>user\n", add_bos, true); + const auto cml_sfx = ::llama_tokenize(ctx, "<|im_end|>\n<|im_start|>assistant\n", false, true); + + LOG("cml_pfx: %s\n", LOG_TOKENS_TOSTR_PRETTY(ctx, cml_pfx).c_str()); + LOG("cml_sfx: %s\n", LOG_TOKENS_TOSTR_PRETTY(ctx, cml_sfx).c_str()); + // in instruct mode, we inject a prefix and a suffix to each input by the user if (params.instruct) { params.interactive_first = true; params.antiprompt.push_back("### Instruction:\n\n"); } + // similar for chatml mode + else if (params.chatml) { + params.interactive_first = true; + params.antiprompt.push_back("<|im_start|>user\n"); + } // enable interactive mode if interactive start is specified if (params.interactive_first) { @@ -415,6 +437,7 @@ int main(int argc, char ** argv) { } } LOG_TEE("sampling: \n%s\n", llama_sampling_print(sparams).c_str()); + LOG_TEE("sampling order: \n%s\n", llama_sampling_order_print(sparams).c_str()); LOG_TEE("generate: n_ctx = %d, n_batch = %d, n_predict = %d, n_keep = %d\n", n_ctx, params.n_batch, params.n_predict, params.n_keep); LOG_TEE("\n\n"); @@ -705,7 +728,7 @@ int main(int argc, char ** 
argv) { is_interacting = true; printf("\n"); - } else if (params.instruct) { + } else if (params.instruct || params.chatml) { is_interacting = true; } } @@ -713,7 +736,7 @@ int main(int argc, char ** argv) { if (n_past > 0 && is_interacting) { LOG("waiting for user input\n"); - if (params.instruct) { + if (params.instruct || params.chatml) { printf("\n> "); } @@ -760,6 +783,12 @@ int main(int argc, char ** argv) { n_consumed = embd_inp.size(); embd_inp.insert(embd_inp.end(), inp_pfx.begin(), inp_pfx.end()); } + // chatml mode: insert user chat prefix + if (params.chatml && !is_antiprompt) { + LOG("inserting chatml prefix\n"); + n_consumed = embd_inp.size(); + embd_inp.insert(embd_inp.end(), cml_pfx.begin(), cml_pfx.end()); + } if (params.escape) { process_escapes(buffer); } @@ -778,6 +807,11 @@ int main(int argc, char ** argv) { LOG("inserting instruction suffix\n"); embd_inp.insert(embd_inp.end(), inp_sfx.begin(), inp_sfx.end()); } + // chatml mode: insert assistant chat suffix + if (params.chatml) { + LOG("inserting chatml suffix\n"); + embd_inp.insert(embd_inp.end(), cml_sfx.begin(), cml_sfx.end()); + } for (size_t i = original_size; i < embd_inp.size(); ++i) { const llama_token token = embd_inp[i]; @@ -803,7 +837,7 @@ int main(int argc, char ** argv) { } // end of text token - if (!embd.empty() && embd.back() == llama_token_eos(model) && !(params.instruct || params.interactive)) { + if (!embd.empty() && embd.back() == llama_token_eos(model) && !(params.instruct || params.interactive || params.chatml)) { LOG_TEE(" [end of text]\n"); break; } diff --git a/examples/parallel/parallel.cpp b/examples/parallel/parallel.cpp index a78df305f415c..d2e074d9e12b0 100644 --- a/examples/parallel/parallel.cpp +++ b/examples/parallel/parallel.cpp @@ -1,5 +1,5 @@ // A basic application simulating a server with multiple clients. -// The clients submite requests to the server and they are processed in parallel. +// The clients submit requests to the server and they are processed in parallel. 
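The parallel example below, like the lookahead example above, adopts the new KV-cache-view API purely as a debugging aid behind `params.dump_kv_cache`. The pattern, sketched with the helpers these diffs use (`dump_kv_cache_view_seqs` comes from common; the wrapper function is illustrative):

```cpp
#include "common.h"
#include "llama.h"

// Sketch: allocate a view sized for the number of sequences, refresh it after
// each decode step, and print the per-cell sequence occupancy.
static void watch_kv_cache(llama_context * ctx, int n_seq_max, int n_steps) {
    llama_kv_cache_view kvc_view = llama_kv_cache_view_init(ctx, n_seq_max);
    for (int i = 0; i < n_steps; ++i) {
        // ... llama_decode(...) for this step would go here ...
        llama_kv_cache_view_update(ctx, &kvc_view);
        dump_kv_cache_view_seqs(kvc_view, 40); // 40 cells per printed row
    }
    llama_kv_cache_view_free(&kvc_view);
}
```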
#include "common.h" #include "llama.h" @@ -113,6 +113,8 @@ int main(int argc, char ** argv) { // insert new requests as soon as the previous one is done const bool cont_batching = params.cont_batching; + const bool dump_kv_cache = params.dump_kv_cache; + #ifndef LOG_DISABLE_LOGS log_set_target(log_filename_generator("parallel", "log")); LOG_TEE("Log start\n"); @@ -172,6 +174,8 @@ int main(int argc, char ** argv) { int32_t n_total_gen = 0; int32_t n_cache_miss = 0; + struct llama_kv_cache_view kvc_view = llama_kv_cache_view_init(ctx, n_clients); + const auto t_main_start = ggml_time_us(); LOG_TEE("%s: Simulating parallel requests from clients:\n", __func__); @@ -201,6 +205,11 @@ int main(int argc, char ** argv) { LOG_TEE("Processing requests ...\n\n"); while (true) { + if (dump_kv_cache) { + llama_kv_cache_view_update(ctx, &kvc_view); + dump_kv_cache_view_seqs(kvc_view, 40); + } + llama_batch_clear(batch); // decode any currently ongoing sequences diff --git a/examples/perplexity/perplexity.cpp b/examples/perplexity/perplexity.cpp index de60c5227f7c1..9a77beca6df32 100644 --- a/examples/perplexity/perplexity.cpp +++ b/examples/perplexity/perplexity.cpp @@ -149,8 +149,7 @@ static results_perplexity perplexity_v2(llama_context * ctx, const gpt_params & // Output: `perplexity: 13.5106 [114/114]` // BOS tokens will be added for each chunk before eval - const bool is_spm = llama_vocab_type(llama_get_model(ctx)) == LLAMA_VOCAB_TYPE_SPM; - const bool add_bos = is_spm; + const bool add_bos = llama_should_add_bos_token(llama_get_model(ctx)); fprintf(stderr, "%s: tokenizing the input ..\n", __func__); @@ -288,8 +287,7 @@ static results_perplexity perplexity(llama_context * ctx, const gpt_params & par // Output: `perplexity: 13.5106 [114/114]` // BOS tokens will be added for each chunk before eval - const bool is_spm = llama_vocab_type(llama_get_model(ctx)) == LLAMA_VOCAB_TYPE_SPM; - const bool add_bos = is_spm; + const bool add_bos = llama_should_add_bos_token(llama_get_model(ctx)); const int n_ctx = llama_n_ctx(ctx); auto tim1 = std::chrono::high_resolution_clock::now(); @@ -481,7 +479,7 @@ static void hellaswag_score(llama_context * ctx, const gpt_params & params) { fprintf(stderr, "================================= is_spm = %d\n", is_spm); // This is needed as usual for LLaMA models - const bool add_bos = is_spm; + const bool add_bos = llama_should_add_bos_token(llama_get_model(ctx)); // Number of tasks to use when computing the score if ( params.hellaswag_tasks < hs_task_count ) { diff --git a/examples/quantize-stats/quantize-stats.cpp b/examples/quantize-stats/quantize-stats.cpp index 2712824774ae7..773024160f839 100644 --- a/examples/quantize-stats/quantize-stats.cpp +++ b/examples/quantize-stats/quantize-stats.cpp @@ -321,7 +321,6 @@ int main(int argc, char ** argv) { auto cparams = llama_context_default_params(); cparams.n_ctx = 256; cparams.seed = 1; - cparams.f16_kv = false; ctx = llama_new_context_with_model(model, cparams); diff --git a/examples/server/README.md b/examples/server/README.md index a6eda3b32d576..0751b9612f17a 100644 --- a/examples/server/README.md +++ b/examples/server/README.md @@ -222,7 +222,7 @@ node index.js `content`: Set the text to process. - **POST** `/infill`: For code infilling. Takes a prefix and a suffix and returns the predicted completion as stream. +- **POST** `/infill`: For code infilling. Takes a prefix and a suffix and returns the predicted completion as stream. 
*Options:* @@ -234,6 +234,55 @@ node index.js - **GET** `/props`: Return the required assistant name and anti-prompt to generate the prompt in case you have specified a system prompt for all slots. +- **POST** `/v1/chat/completions`: OpenAI-compatible Chat Completions API. Given a ChatML-formatted json description in `messages`, it returns the predicted completion. Both synchronous and streaming modes are supported, so scripted and interactive applications work fine. While no strong claims of compatibility with the OpenAI API spec are being made, in our experience it suffices to support many apps. Only ChatML-tuned models, such as Dolphin, OpenOrca, OpenHermes, OpenChat-3.5, etc. can be used with this endpoint. Compared to `api_like_OAI.py`, this API implementation does not require a wrapper to be served. + + *Options:* + + See [OpenAI Chat Completions API documentation](https://platform.openai.com/docs/api-reference/chat). While some OpenAI-specific features such as function calling aren't supported, llama.cpp `/completion`-specific features such as `mirostat` are supported. + + *Examples:* + + You can use either the Python `openai` library with appropriate checkpoints: + + ```python + import openai + + client = openai.OpenAI( + base_url="http://localhost:8080/v1", # "http://<Your api-server IP>:port" + api_key = "sk-no-key-required" + ) + + completion = client.chat.completions.create( + model="gpt-3.5-turbo", + messages=[ + {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."}, + {"role": "user", "content": "Write a limerick about python exceptions"} + ] + ) + + print(completion.choices[0].message) + ``` + ... or raw HTTP requests: + + ```shell + curl http://localhost:8080/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer no-key" \ -d '{ + "model": "gpt-3.5-turbo", + "messages": [ + { + "role": "system", + "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests." + }, + { + "role": "user", + "content": "Write a limerick about python exceptions" + } + ] + }' + ``` + ## More examples ### Change system prompt on runtime diff --git a/examples/server/api_like_OAI.py b/examples/server/api_like_OAI.py index 313e1a9652d14..607fe49d3ff15 100755 --- a/examples/server/api_like_OAI.py +++ b/examples/server/api_like_OAI.py @@ -11,10 +11,10 @@ slot_id = -1 parser = argparse.ArgumentParser(description="An example of using server.cpp with a similar API to OAI. It must be used together with server.cpp.") -parser.add_argument("--chat-prompt", type=str, help="the top prompt in chat completions(default: 'A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.\\n')", default='A chat between a curious user and an artificial intelligence assistant. 
The assistant follows the given rules no matter what.\\n') -parser.add_argument("--user-name", type=str, help="USER name in chat completions(default: '\\nUSER: ')", default="\\nUSER: ") -parser.add_argument("--ai-name", type=str, help="ASSISTANT name in chat completions(default: '\\nASSISTANT: ')", default="\\nASSISTANT: ") -parser.add_argument("--system-name", type=str, help="SYSTEM name in chat completions(default: '\\nASSISTANT's RULE: ')", default="\\nASSISTANT's RULE: ") +parser.add_argument("--chat-prompt", type=str, help="the top prompt in chat completions(default: 'A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.')", default='A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.') +parser.add_argument("--user-name", type=str, help="USER name in chat completions(default: 'USER: ')", default="USER: ") +parser.add_argument("--ai-name", type=str, help="ASSISTANT name in chat completions(default: 'ASSISTANT: ')", default="ASSISTANT: ") +parser.add_argument("--system-name", type=str, help="SYSTEM name in chat completions(default: 'ASSISTANT's RULE: ')", default="ASSISTANT's RULE: ") parser.add_argument("--stop", type=str, help="the end of response in chat completions(default: '')", default="") parser.add_argument("--llama-api", type=str, help="Set the address of server.cpp in llama.cpp(default: http://127.0.0.1:8080)", default='http://127.0.0.1:8080') parser.add_argument("--api-key", type=str, help="Set the api key to allow only few user(default: NULL)", default="") @@ -34,19 +34,19 @@ def is_present(json, key): #convert chat to prompt def convert_chat(messages): - prompt = "" + args.chat_prompt.replace("\\n", "\n") - system_n = args.system_name.replace("\\n", "\n") - user_n = args.user_name.replace("\\n", "\n") - ai_n = args.ai_name.replace("\\n", "\n") - stop = args.stop.replace("\\n", "\n") + system_n = args.system_name + user_n = args.user_name + ai_n = args.ai_name + stop = args.stop + prompt = "" + args.chat_prompt + stop for line in messages: if (line["role"] == "system"): - prompt += f"{system_n}{line['content']}" + prompt += f"{system_n}{line['content']}{stop}" if (line["role"] == "user"): - prompt += f"{user_n}{line['content']}" + prompt += f"{user_n}{line['content']}{stop}" if (line["role"] == "assistant"): prompt += f"{ai_n}{line['content']}{stop}" prompt += ai_n.rstrip() @@ -70,6 +70,7 @@ def make_postData(body, chat=False, stream=False): if(is_present(body, "mirostat_tau")): postData["mirostat_tau"] = body["mirostat_tau"] if(is_present(body, "mirostat_eta")): postData["mirostat_eta"] = body["mirostat_eta"] if(is_present(body, "seed")): postData["seed"] = body["seed"] + if(is_present(body, "grammar")): postData["grammar"] = body["grammar"] if(is_present(body, "logit_bias")): postData["logit_bias"] = [[int(token), body["logit_bias"][token]] for token in body["logit_bias"].keys()] if (args.stop != ""): postData["stop"] = [args.stop] @@ -130,7 +131,7 @@ def make_resData_stream(data, chat=False, time_now = 0, start=False): } ] } - slot_id = data["slot_id"] + slot_id = data.get("slot_id") if (chat): if (start): resData["choices"][0]["delta"] = { @@ -150,11 +151,13 @@ def make_resData_stream(data, chat=False, time_now = 0, start=False): return resData -@app.route('/chat/completions', methods=['POST']) -@app.route('/v1/chat/completions', methods=['POST']) +@app.route('/chat/completions', methods=['POST', 'OPTIONS']) 
+@app.route('/v1/chat/completions', methods=['POST', 'OPTIONS']) def chat_completions(): if (args.api_key != "" and request.headers["Authorization"].split()[1] != args.api_key): return Response(status=403) + if request.method == 'OPTIONS': + return Response(headers={"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "*"}) body = request.get_json() stream = False tokenize = False @@ -177,20 +180,22 @@ def generate(): data = requests.request("POST", urllib.parse.urljoin(args.llama_api, "/completion"), data=json.dumps(postData), stream=True) time_now = int(time.time()) resData = make_resData_stream({}, chat=True, time_now=time_now, start=True) - yield 'data: {}\n'.format(json.dumps(resData)) + yield 'data: {}\n\n'.format(json.dumps(resData)) for line in data.iter_lines(): if line: decoded_line = line.decode('utf-8') resData = make_resData_stream(json.loads(decoded_line[6:]), chat=True, time_now=time_now) - yield 'data: {}\n'.format(json.dumps(resData)) - return Response(generate(), mimetype='text/event-stream') + yield 'data: {}\n\n'.format(json.dumps(resData)) + return Response(generate(), mimetype='text/event-stream', headers={"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "*"}) -@app.route('/completions', methods=['POST']) -@app.route('/v1/completions', methods=['POST']) +@app.route('/completions', methods=['POST', 'OPTIONS']) +@app.route('/v1/completions', methods=['POST', 'OPTIONS']) def completion(): if (args.api_key != "" and request.headers["Authorization"].split()[1] != args.api_key): return Response(status=403) + if request.method == 'OPTIONS': + return Response(headers={"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "*"}) body = request.get_json() stream = False tokenize = False @@ -216,8 +221,8 @@ def generate(): if line: decoded_line = line.decode('utf-8') resData = make_resData_stream(json.loads(decoded_line[6:]), chat=False, time_now=time_now) - yield 'data: {}\n'.format(json.dumps(resData)) - return Response(generate(), mimetype='text/event-stream') + yield 'data: {}\n\n'.format(json.dumps(resData)) + return Response(generate(), mimetype='text/event-stream', headers={"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "*"}) if __name__ == '__main__': app.run(args.host, port=args.port) diff --git a/examples/server/json.hpp b/examples/server/json.hpp index 4d1a37ad7cb87..ea945f346d67b 100644 --- a/examples/server/json.hpp +++ b/examples/server/json.hpp @@ -11227,7 +11227,7 @@ class binary_reader } if (is_ndarray) // ndarray dimensional vector can only contain integers, and can not embed another array { - return sax->parse_error(chars_read, get_token_string(), parse_error::create(113, chars_read, exception_message(input_format, "ndarray dimentional vector is not allowed", "size"), nullptr)); + return sax->parse_error(chars_read, get_token_string(), parse_error::create(113, chars_read, exception_message(input_format, "ndarray dimensional vector is not allowed", "size"), nullptr)); } std::vector dim; if (JSON_HEDLEY_UNLIKELY(!get_ubjson_ndarray_size(dim))) diff --git a/examples/server/public/completion.js b/examples/server/public/completion.js index 0c9bd5f1021db..c281f0fbd5535 100644 --- a/examples/server/public/completion.js +++ b/examples/server/public/completion.js @@ -94,6 +94,10 @@ export async function* llama(prompt, params = {}, config = {}) { break; } } + if (result.error) { + result.error = JSON.parse(result.error); + console.error(`llama.cpp error: ${result.error.content}`); + } } } } @@ -110,7 
+114,7 @@ export async function* llama(prompt, params = {}, config = {}) { return content; } -// Call llama, return an event target that you can subcribe to +// Call llama, return an event target that you can subscribe to // // Example: // diff --git a/examples/server/public/index.html b/examples/server/public/index.html index 175c52478918a..451fd4a3be602 100644 --- a/examples/server/public/index.html +++ b/examples/server/public/index.html @@ -223,7 +223,7 @@ repeat_last_n: 256, // 0 = disable penalty, -1 = context size repeat_penalty: 1.18, // 1.0 = disabled top_k: 40, // <= 0 to use vocab size - top_p: 0.5, // 1.0 = disabled + top_p: 0.95, // 1.0 = disabled min_p: 0.05, // 0 = disabled tfs_z: 1.0, // 1.0 = disabled typical_p: 1.0, // 1.0 = disabled @@ -238,7 +238,7 @@ cache_prompt: true }) - /* START: Support for storing prompt templates and parameters in borwser LocalStorage */ + /* START: Support for storing prompt templates and parameters in browsers LocalStorage */ const local_storage_storageKey = "llamacpp_server_local_storage"; @@ -282,7 +282,7 @@ let importedTemplates = local_storage_getDataAsObject('user_templates') if (importedTemplates) { - // saved templates were successfuly imported. + // saved templates were successfully imported. console.log('Processing saved templates and updating default template') params.value = { ...params.value, image_data: [] }; @@ -303,7 +303,7 @@ } function userTemplateResetToDefault() { - console.log('Reseting themplate to default') + console.log('Resetting template to default') selectedUserTemplate.value.name = 'default'; selectedUserTemplate.value.data = savedUserTemplates.value['default']; } @@ -762,7 +762,7 @@

${IntField({ label: "Predictions", max: 2048, min: -1, name: "n_predict", value: params.value.n_predict })} - ${FloatField({ label: "Temperature", max: 1.5, min: 0.0, name: "temperature", step: 0.01, value: params.value.temperature })} + ${FloatField({ label: "Temperature", max: 2.0, min: 0.0, name: "temperature", step: 0.01, value: params.value.temperature })} ${FloatField({ label: "Penalize repeat sequence", max: 2.0, min: 0.0, name: "repeat_penalty", step: 0.01, value: params.value.repeat_penalty })} ${IntField({ label: "Consider N tokens for penalize", max: 2048, min: 0, name: "repeat_last_n", value: params.value.repeat_last_n })} ${IntField({ label: "Top-K sampling", max: 100, min: -1, name: "top_k", value: params.value.top_k })} diff --git a/examples/server/server.cpp b/examples/server/server.cpp index 46862a84b99da..d0cd8e1cdb211 100644 --- a/examples/server/server.cpp +++ b/examples/server/server.cpp @@ -29,6 +29,8 @@ #define SERVER_VERBOSE 1 #endif +#define DEFAULT_OAICOMPAT_MODEL "gpt-3.5-turbo-0613" + using json = nlohmann::json; struct server_params @@ -59,6 +61,10 @@ static bool server_verbose = false; #define LOG_WARNING(MSG, ...) server_log("WARNING", __func__, __LINE__, MSG, __VA_ARGS__) #define LOG_INFO( MSG, ...) server_log("INFO", __func__, __LINE__, MSG, __VA_ARGS__) +json oaicompat_completion_params_parse(const json &body); +std::string format_chatml(std::vector<json> messages); + + // // base64 utils (TODO: move to common in the future) // @@ -149,15 +155,23 @@ struct task_server { json data; bool infill_mode = false; bool embedding_mode = false; + int multitask_id = -1; }; struct task_result { int id; + int multitask_id = -1; bool stop; bool error; json result_json; }; +struct task_multi { + int id; + std::set<int> subtasks_remaining{}; + std::vector<task_result> results{}; +}; + // TODO: can become bool if we can't find use of more states enum slot_state { @@ -378,6 +392,9 @@ struct llama_client_slot bool stopped_word = false; bool stopped_limit = false; + bool oaicompat = false; + std::string oaicompat_model; + std::string stopping_word; // sampling @@ -397,6 +414,9 @@ struct llama_client_slot double t_prompt_processing; // ms double t_token_generation; // ms + // multitasks + int multitask_id = -1; + void reset() { num_prompt_tokens = 0; generated_text = ""; @@ -477,7 +497,7 @@ struct llama_client_slot }; } - void print_timings() { + void print_timings() const { LOG_TEE("\n"); LOG_TEE("%s: prompt eval time = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)\n", __func__, t_prompt_processing, num_prompt_tokens_processed, t_prompt_processing / num_prompt_tokens_processed, 1e3 / t_prompt_processing * num_prompt_tokens_processed); @@ -501,6 +521,7 @@ struct llama_server_context bool multimodal = false; bool clean_kv_cache = true; bool all_slots_are_idle = false; + bool add_bos_token = true; int32_t id_gen; int32_t n_ctx; // total context for all clients / slots @@ -519,7 +540,8 @@ struct llama_server_context std::vector<task_server> queue_tasks; std::vector<task_result> queue_results; - std::mutex mutex_tasks; + std::vector<task_multi> queue_multitasks; + std::mutex mutex_tasks; // also guards id_gen, and queue_multitasks std::mutex mutex_results; ~llama_server_context() @@ -573,6 +595,8 @@ struct llama_server_context n_ctx = llama_n_ctx(ctx); + + add_bos_token = llama_should_add_bos_token(model); + return true; }
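The tokenize change below is what makes ChatML prompts usable through the server: with special-token parsing off, a marker such as `<|im_start|>` is split into ordinary text pieces, while with it on the marker maps to its single control-token id. A sketch of the difference, assuming the `common.h` tokenize overload used in this diff and a ChatML-tuned vocab; the function name is illustrative:

```cpp
#include "common.h"
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

// Sketch: compare tokenization with and without special-token parsing.
static void compare_special(llama_context * ctx) {
    const std::string text = "<|im_start|>user\nhi<|im_end|>";
    std::vector<llama_token> with_special    = ::llama_tokenize(ctx, text, false, /*special =*/ true);
    std::vector<llama_token> without_special = ::llama_tokenize(ctx, text, false, /*special =*/ false);
    // With a ChatML-tuned vocab, the first result is shorter: each marker
    // collapses to one control token instead of several plain-text pieces.
    printf("special: %zu tokens, plain: %zu tokens\n", with_special.size(), without_special.size());
}
```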
always correct (see https://github.com/ggerganov/llama.cpp/pull/4160#issuecomment-1824826216) + // but it's better compared to completely ignoring ChatML and other chat templates + const bool TMP_FORCE_SPECIAL = true; + // If `add_bos` is true, we only add BOS, when json_prompt is a string, // or the first element of the json_prompt array is a string. std::vector<llama_token> prompt_tokens; @@ -621,12 +650,12 @@ struct llama_server_context std::vector<llama_token> p; if (first) { - p = ::llama_tokenize(ctx, s, add_bos); + p = ::llama_tokenize(ctx, s, add_bos, TMP_FORCE_SPECIAL); first = false; } else { - p = ::llama_tokenize(ctx, s, false); + p = ::llama_tokenize(ctx, s, false, TMP_FORCE_SPECIAL); } prompt_tokens.insert(prompt_tokens.end(), p.begin(), p.end()); } @@ -643,7 +672,7 @@ struct llama_server_context else { auto s = json_prompt.template get<std::string>(); - prompt_tokens = ::llama_tokenize(ctx, s, add_bos); + prompt_tokens = ::llama_tokenize(ctx, s, add_bos, TMP_FORCE_SPECIAL); } return prompt_tokens; @@ -674,6 +703,14 @@ struct llama_server_context slot_params default_params; llama_sampling_params default_sparams; + if (data.count("__oaicompat") != 0) { + slot->oaicompat = true; + slot->oaicompat_model = json_value(data, "model", std::string(DEFAULT_OAICOMPAT_MODEL)); + } else { + slot->oaicompat = false; + slot->oaicompat_model = ""; + } + slot->params.stream = json_value(data, "stream", false); slot->params.cache_prompt = json_value(data, "cache_prompt", false); slot->params.n_predict = json_value(data, "n_predict", default_params.n_predict); @@ -864,7 +901,7 @@ struct llama_server_context } void update_system_prompt() { - system_tokens = ::llama_tokenize(ctx, system_prompt, true); + system_tokens = ::llama_tokenize(ctx, system_prompt, add_bos_token); llama_batch_clear(batch); @@ -1087,16 +1124,40 @@ struct llama_server_context return slot.images.size() > 0; } - void send_error(int id, std::string error) + void send_error(task_server& task, std::string error) { std::lock_guard<std::mutex> lock(mutex_results); task_result res; - res.id = id; + res.id = task.id; + res.multitask_id = task.multitask_id; + res.stop = false; res.error = true; res.result_json = { { "content", error } }; queue_results.push_back(res); } + void add_multi_task(int id, std::vector<int>& sub_ids) + { + std::lock_guard<std::mutex> lock(mutex_tasks); + task_multi multi; + multi.id = id; + std::copy(sub_ids.begin(), sub_ids.end(), std::inserter(multi.subtasks_remaining, multi.subtasks_remaining.end())); + queue_multitasks.push_back(multi); + } + + void update_multi_task(int multitask_id, int subtask_id, task_result& result) + { + std::lock_guard<std::mutex> lock(mutex_tasks); + for (auto& multitask : queue_multitasks) + { + if (multitask.id == multitask_id) + { + multitask.subtasks_remaining.erase(subtask_id); + multitask.results.push_back(result); + } + } + } + json get_model_props() { return get_formated_generation(slots[0]); @@ -1141,6 +1202,7 @@ struct llama_server_context std::lock_guard<std::mutex> lock(mutex_results); task_result res; res.id = slot.task_id; + res.multitask_id = slot.multitask_id; res.error = false; res.stop = false; @@ -1166,6 +1228,12 @@ struct llama_server_context res.result_json["completion_probabilities"] = probs_vector_to_json(ctx, probs_output); } + if (slot.oaicompat) + { + res.result_json["oaicompat_token_ctr"] = slot.n_decoded; + res.result_json["model"] = slot.oaicompat_model; + } + queue_results.push_back(res); } @@ -1174,6 +1242,7 @@ struct llama_server_context std::lock_guard<std::mutex> lock(mutex_results); task_result res; res.id = slot.task_id; + res.multitask_id =
slot.multitask_id; res.error = false; res.stop = true; @@ -1213,6 +1282,18 @@ struct llama_server_context res.result_json["completion_probabilities"] = probs_vector_to_json(ctx, probs); } + if (slot.oaicompat) + { + res.result_json["oaicompat_token_ctr"] = slot.n_decoded; + res.result_json["model"] = slot.oaicompat_model; + } + + // parent multitask, if any, needs to be updated + if (slot.multitask_id != -1) + { + update_multi_task(slot.multitask_id, slot.task_id, res); + } + queue_results.push_back(res); } @@ -1221,6 +1302,7 @@ struct llama_server_context std::lock_guard<std::mutex> lock(mutex_results); task_result res; res.id = slot.task_id; + res.multitask_id = slot.multitask_id; res.error = false; res.stop = true; @@ -1247,15 +1329,26 @@ struct llama_server_context queue_results.push_back(res); } - int request_completion(json data, bool infill, bool embedding) + int request_completion(json data, bool infill, bool embedding, int multitask_id) { - std::lock_guard<std::mutex> lock(mutex_tasks); + std::unique_lock<std::mutex> lock(mutex_tasks); task_server task; task.id = id_gen++; - task.data = data; + task.target_id = 0; + task.data = std::move(data); task.infill_mode = infill; task.embedding_mode = embedding; task.type = COMPLETION_TASK; + task.multitask_id = multitask_id; + + // when a completion task's prompt array is not a singleton, we split it into multiple requests + if (task.data.at("prompt").size() > 1) + { + lock.unlock(); // entering new func scope + return split_multiprompt_task(task); + } + + // otherwise, it's a single-prompt task, so we actually queue it queue_tasks.push_back(task); return task.id; } @@ -1274,8 +1367,17 @@ struct llama_server_context for (int i = 0; i < (int) queue_results.size(); i++) { + // for now, tasks that have associated parent multitasks just get erased once multitask picks up the result + if (queue_results[i].multitask_id == task_id) + { + update_multi_task(task_id, queue_results[i].id, queue_results[i]); + queue_results.erase(queue_results.begin() + i); + continue; + } + if (queue_results[i].id == task_id) { + assert(queue_results[i].multitask_id == -1); task_result res = queue_results[i]; queue_results.erase(queue_results.begin() + i); return res; @@ -1365,6 +1467,27 @@ struct llama_server_context queue_tasks.push_back(task); } + int split_multiprompt_task(task_server& multiprompt_task) + { + int prompt_count = multiprompt_task.data.at("prompt").size(); + assert(prompt_count > 1); + + int multitask_id = id_gen++; + std::vector<int> subtask_ids(prompt_count); + for (int i = 0; i < prompt_count; i++) + { + json subtask_data = multiprompt_task.data; + subtask_data["prompt"] = subtask_data["prompt"][i]; + + // subtasks inherit everything else (infill mode, embedding mode, etc.)
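+ // (illustrative note, with a hypothetical payload: a request such as {"prompt": ["hello", "world"]}
+ // is split here into two single-prompt subtasks that share one multitask_id; process_tasks()
+ // later folds their outputs into a single aggregate {"results": [...]} response)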
+ subtask_ids[i] = request_completion(subtask_data, multiprompt_task.infill_mode, multiprompt_task.embedding_mode, multitask_id); + } + + // queue up the multitask so we can track its subtask progression + add_multi_task(multitask_id, subtask_ids); + return multitask_id; + } + void process_tasks() { std::lock_guard<std::mutex> lock(mutex_tasks); @@ -1380,7 +1503,7 @@ struct llama_server_context { LOG_TEE("slot unavailable\n"); // send error result - send_error(task.id, "slot unavailable"); + send_error(task, "slot unavailable"); return; } @@ -1394,11 +1517,12 @@ struct llama_server_context slot->infill = task.infill_mode; slot->embedding = task.embedding_mode; slot->task_id = task.id; + slot->multitask_id = task.multitask_id; if (!launch_slot_with_data(slot, task.data)) { // send error result - send_error(task.id, "internal_error"); + send_error(task, "internal_error"); break; } } break; @@ -1414,6 +1538,38 @@ struct llama_server_context } break; } } + + // remove finished multitasks from the queue of multitasks, and add the corresponding result to the result queue + auto queue_iterator = queue_multitasks.begin(); + while (queue_iterator != queue_multitasks.end()) + { + if (queue_iterator->subtasks_remaining.empty()) + { + // all subtasks done == multitask is done + task_result aggregate_result; + aggregate_result.id = queue_iterator->id; + aggregate_result.stop = true; + aggregate_result.error = false; + + // collect json results into one json result + std::vector<json> result_jsons; + for (auto& subres : queue_iterator->results) + { + result_jsons.push_back(subres.result_json); + aggregate_result.error = aggregate_result.error || subres.error; + } + aggregate_result.result_json = json{ "results", result_jsons }; + + std::lock_guard<std::mutex> lock(mutex_results); + queue_results.push_back(aggregate_result); + + queue_iterator = queue_multitasks.erase(queue_iterator); + } + else + { + ++queue_iterator; + } + } } bool update_slots() { @@ -1552,7 +1708,7 @@ struct llama_server_context } else { - prompt_tokens = tokenize(slot.prompt, system_prompt.empty()); // add BOS if there isn't system prompt + prompt_tokens = tokenize(slot.prompt, system_prompt.empty() && add_bos_token); // add BOS if there isn't system prompt } slot.num_prompt_tokens = prompt_tokens.size(); @@ -1629,7 +1785,7 @@ struct llama_server_context const bool has_images = process_images(slot); // process the prefix of first image - std::vector<llama_token> prefix_tokens = has_images ? tokenize(slot.images[0].prefix_prompt, true) : prompt_tokens; + std::vector<llama_token> prefix_tokens = has_images ?
tokenize(slot.images[0].prefix_prompt, add_bos_token) : prompt_tokens; for (; slot.n_past < (int) prefix_tokens.size(); ++slot.n_past) { llama_batch_add(batch, prefix_tokens[slot.n_past], system_tokens.size() + slot.n_past, { slot.id }, false); @@ -1805,6 +1961,7 @@ static void server_print_usage(const char *argv0, const gpt_params &params, printf(" -spf FNAME, --system-prompt-file FNAME\n"); printf(" Set a file to load a system prompt (initial prompt of all slots), this is useful for chat applications.\n"); printf(" --mmproj MMPROJ_FILE path to a multimodal projector file for LLaVA.\n"); + printf(" --log-disable disables logging to a file.\n"); printf("\n"); } @@ -1951,10 +2108,6 @@ static void server_params_parse(int argc, char **argv, server_params &sparams, } params.yarn_beta_slow = std::stof(argv[i]); } - else if (arg == "--memory-f32" || arg == "--memory_f32") - { - params.memory_f16 = false; - } else if (arg == "--threads" || arg == "-t") { if (++i >= argc) @@ -2159,6 +2312,11 @@ static void server_params_parse(int argc, char **argv, server_params &sparams, } params.mmproj = argv[i]; } + else if (arg == "--log-disable") + { + log_set_target(stdout); + LOG_INFO("logging to file is disabled.", {}); + } else { fprintf(stderr, "error: unknown argument: %s\n", arg.c_str()); @@ -2175,6 +2333,233 @@ static void server_params_parse(int argc, char **argv, server_params &sparams, } } + +static std::string random_string() +{ + static const std::string str("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); + + std::random_device rd; + std::mt19937 generator(rd()); + + std::string result(32, ' '); + + for (int i = 0; i < 32; ++i) { + result[i] = str[generator() % str.size()]; + } + + return result; +} + +static std::string gen_chatcmplid() +{ + std::stringstream chatcmplid; + chatcmplid << "chatcmpl-" << random_string(); + return chatcmplid.str(); +} + +std::string format_chatml(std::vector<json> messages) +{ + std::ostringstream chatml_msgs; + + for (auto it = messages.begin(); it != messages.end(); ++it) { + chatml_msgs << "<|im_start|>" + << json_value(*it, "role", std::string("user")) << '\n'; + chatml_msgs << json_value(*it, "content", std::string("")) + << "<|im_end|>\n"; + } + + chatml_msgs << "<|im_start|>assistant" << '\n'; + + return chatml_msgs.str(); } + +/* llama.cpp completion api semantics */ +json oaicompat_completion_params_parse( + const json &body /* openai api json semantics */) +{ + json llama_params; + + llama_params["__oaicompat"] = true; + + // Map OpenAI parameters to llama.cpp parameters + llama_params["model"] = json_value(body, "model", std::string("unknown")); + llama_params["prompt"] = format_chatml(body["messages"]); // OpenAI 'messages' to llama.cpp 'prompt' + llama_params["cache_prompt"] = json_value(body, "cache_prompt", false); + llama_params["temperature"] = json_value(body, "temperature", 0.8); + llama_params["top_k"] = json_value(body, "top_k", 40); + llama_params["top_p"] = json_value(body, "top_p", 0.95); + llama_params["n_predict"] = json_value(body, "max_tokens", -1); + llama_params["logit_bias"] = json_value(body, "logit_bias",json::object()); + llama_params["frequency_penalty"] = json_value(body, "frequency_penalty", 0.0); + llama_params["presence_penalty"] = json_value(body, "presence_penalty", 0.0); + llama_params["seed"] = json_value(body, "seed", 0); + llama_params["stream"] = json_value(body, "stream", false); + llama_params["mirostat"] = json_value(body, "mirostat", false); + llama_params["mirostat_tau"] = json_value(body, "mirostat_tau",
0.0); + llama_params["mirostat_eta"] = json_value(body, "mirostat_eta", 0.0); + llama_params["penalize_nl"] = json_value(body, "penalize_nl", false); + llama_params["typical_p"] = json_value(body, "typical_p", 0.0); + llama_params["repeat_last_n"] = json_value(body, "repeat_last_n", 0); + llama_params["ignore_eos"] = json_value(body, "ignore_eos", false); + llama_params["tfs_z"] = json_value(body, "tfs_z", 0.0); + + if (body.count("grammar") != 0) { + llama_params["grammar"] = json_value(body, "grammar", json::object()); + } + + // Handle 'stop' field + if (body.contains("stop") && body["stop"].is_string()) { + llama_params["stop"] = json::array({body["stop"].get<std::string>()}); + } else { + llama_params["stop"] = json_value(body, "stop", json::array()); + } + + // Ensure there is a ChatML-specific end sequence among the stop words + llama_params["stop"].push_back("<|im_end|>"); + + return llama_params; } + +static json format_final_response_oaicompat(const json &request, const task_result &response, bool streaming = false) +{ + json result = response.result_json; + + bool stopped_word = result.count("stopped_word") != 0; + bool stopped_eos = json_value(result, "stopped_eos", false); + int num_tokens_predicted = json_value(result, "tokens_predicted", 0); + int num_prompt_tokens = json_value(result, "tokens_evaluated", 0); + std::string content = json_value(result, "content", std::string("")); + + std::string finish_reason = "length"; + if (stopped_word || stopped_eos) { + finish_reason = "stop"; + } + + json choices = + streaming ? json::array({json{{"finish_reason", finish_reason}, + {"index", 0}, + {"delta", json::object()}}}) + : json::array({json{{"finish_reason", finish_reason}, + {"index", 0}, + {"message", json{{"content", content}, + {"role", "assistant"}}}}}); + + std::time_t t = std::time(0); + + json res = + json{{"choices", choices}, + {"created", t}, + {"model", + json_value(request, "model", std::string(DEFAULT_OAICOMPAT_MODEL))}, + {"object", streaming ?
"chat.completion.chunk" : "chat.completion"}, + {"usage", + json{{"completion_tokens", num_tokens_predicted}, + {"prompt_tokens", num_prompt_tokens}, + {"total_tokens", num_tokens_predicted + num_prompt_tokens}}}, + {"id", gen_chatcmplid()}}; + + if (server_verbose) { + res["__verbose"] = result; + } + + if (result.contains("completion_probabilities")) { + res["completion_probabilities"] = json_value(result, "completion_probabilities", json::array()); + } + + return res; +} + +// return value is vector as there is one case where we might need to generate two responses +static std::vector format_partial_response_oaicompat(const task_result &response) { + json result = response.result_json; + + if (!result.contains("model") || !result.contains("oaicompat_token_ctr")) { + return std::vector({response.result_json}); + } + + bool first = json_value(result, "oaicompat_token_ctr", 0) == 0; + std::string modelname = json_value(result, "model", std::string(DEFAULT_OAICOMPAT_MODEL)); + + bool stopped_word = json_value(result, "stopped_word", false); + bool stopped_eos = json_value(result, "stopped_eos", false); + bool stopped_limit = json_value(result, "stopped_limit", false); + std::string content = json_value(result, "content", std::string("")); + + std::string finish_reason; + if (stopped_word || stopped_eos) { + finish_reason = "stop"; + } + if (stopped_limit) { + finish_reason = "length"; + } + + std::time_t t = std::time(0); + + json choices; + + if (!finish_reason.empty()) { + choices = json::array({json{{"finish_reason", finish_reason}, + {"index", 0}, + {"delta", json::object()}}}); + } else { + if (first) { + if (content.empty()) { + choices = json::array({json{{"finish_reason", nullptr}, + {"index", 0}, + {"delta", json{{"role", "assistant"}}}}}); + } else { + // We have to send this as two updates to conform to openai behavior + json initial_ret = json{{"choices", json::array({json{ + {"finish_reason", nullptr}, + {"index", 0}, + {"delta", json{ + {"role", "assistant"} + }}}})}, + {"created", t}, + {"id", gen_chatcmplid()}, + {"model", modelname}, + {"object", "chat.completion.chunk"}}; + + json second_ret = json{ + {"choices", json::array({json{{"finish_reason", nullptr}, + {"index", 0}, + {"delta", json{ + {"content", content}}} + }})}, + {"created", t}, + {"id", gen_chatcmplid()}, + {"model", modelname}, + {"object", "chat.completion.chunk"}}; + + return std::vector({initial_ret, second_ret}); + } + } else { + // Some idiosyncrasy in task processing logic makes several trailing calls + // with empty content, we ignore these at the calee site. 
+ if (content.empty()) { + return std::vector<json>({json::object()}); + } + + choices = json::array({json{ + {"finish_reason", nullptr}, + {"index", 0}, + {"delta", + json{ + {"content", content}, + }}, + }}); + } + } + + json ret = json{{"choices", choices}, + {"created", t}, + {"id", gen_chatcmplid()}, + {"model", modelname}, + {"object", "chat.completion.chunk"}}; + + return std::vector<json>({ret}); +} + static json format_partial_response( llama_server_context &llama, llama_client_slot *slot, const std::string &content, const std::vector<completion_token_output> &probs ) { @@ -2330,7 +2715,7 @@ int main(int argc, char **argv) svr.Post("/completion", [&llama](const httplib::Request &req, httplib::Response &res) { json data = json::parse(req.body); - const int task_id = llama.request_completion(data, false, false); + const int task_id = llama.request_completion(data, false, false, -1); if (!json_value(data, "stream", false)) { std::string completion_text; task_result result = llama.next_result(task_id); @@ -2351,9 +2736,9 @@ int main(int argc, char **argv) task_result result = llama.next_result(task_id); if (!result.error) { const std::string str = - "data: " + - result.result_json.dump(-1, ' ', false, json::error_handler_t::replace) + - "\n\n"; + "data: " + + result.result_json.dump(-1, ' ', false, json::error_handler_t::replace) + + "\n\n"; LOG_VERBOSE("data stream", { { "to_send", str } }); @@ -2365,6 +2750,17 @@ int main(int argc, char **argv) break; } } else { + const std::string str = + "error: " + + result.result_json.dump(-1, ' ', false, json::error_handler_t::replace) + + "\n\n"; + LOG_VERBOSE("data stream", { + { "to_send", str } + }); + if (!sink.write(str.c_str(), str.size())) + { + return false; + } break; } } @@ -2382,10 +2778,102 @@ int main(int argc, char **argv) } }); + + + svr.Get("/v1/models", [&params](const httplib::Request&, httplib::Response& res) + { + std::time_t t = std::time(0); + + json models = { + {"object", "list"}, + {"data", { + { + {"id", params.model_alias}, + {"object", "model"}, + {"created", t}, + {"owned_by", "llamacpp"} + }, + }} + }; + + res.set_content(models.dump(), "application/json"); + }); + + // TODO: add mount point without "/v1" prefix -- how?
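+ // usage sketch (illustrative only; host, port and model name are assumptions here,
+ // the server actually listens on whatever --host/--port were configured):
+ //   curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
+ //        -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]}'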
+ svr.Post("/v1/chat/completions", [&llama](const httplib::Request &req, httplib::Response &res) + { + json data = oaicompat_completion_params_parse(json::parse(req.body)); + + const int task_id = llama.request_completion(data, false, false, -1); + + if (!json_value(data, "stream", false)) { + std::string completion_text; + task_result result = llama.next_result(task_id); + + if (!result.error && result.stop) { + json oaicompat_result = format_final_response_oaicompat(data, result); + + res.set_content(oaicompat_result.dump(-1, ' ', false, + json::error_handler_t::replace), + "application/json"); + } else { + res.status = 500; + res.set_content(result.result_json["content"], "text/plain"); + return; + } + } else { + const auto chunked_content_provider = [task_id, &llama](size_t, httplib::DataSink &sink) { + while (true) { + task_result llama_result = llama.next_result(task_id); + if (!llama_result.error) { + std::vector result_array = format_partial_response_oaicompat( llama_result); + + for (auto it = result_array.begin(); it != result_array.end(); ++it) + { + if (!it->empty()) { + const std::string str = + "data: " + + it->dump(-1, ' ', false, json::error_handler_t::replace) + + "\n\n"; + LOG_VERBOSE("data stream", {{"to_send", str}}); + if (!sink.write(str.c_str(), str.size())) { + return false; + } + } + } + if (llama_result.stop) { + break; + } + } else { + const std::string str = + "error: " + + llama_result.result_json.dump(-1, ' ', false, + json::error_handler_t::replace) + + "\n\n"; + LOG_VERBOSE("data stream", {{"to_send", str}}); + if (!sink.write(str.c_str(), str.size())) { + return false; + } + break; + } + } + sink.done(); + return true; + }; + + auto on_complete = [task_id, &llama](bool) { + // cancel request + llama.request_cancel(task_id); + }; + + res.set_chunked_content_provider("text/event-stream", chunked_content_provider, on_complete); + } + }); + svr.Post("/infill", [&llama](const httplib::Request &req, httplib::Response &res) { json data = json::parse(req.body); - const int task_id = llama.request_completion(data, true, false); + const int task_id = llama.request_completion(data, true, false, -1); if (!json_value(data, "stream", false)) { std::string completion_text; task_result result = llama.next_result(task_id); @@ -2489,7 +2977,7 @@ int main(int argc, char **argv) { prompt = ""; } - const int task_id = llama.request_completion({ {"prompt", prompt}, { "n_predict", 0} }, false, true); + const int task_id = llama.request_completion({ {"prompt", prompt}, { "n_predict", 0} }, false, true, -1); task_result result = llama.next_result(task_id); return res.set_content(result.result_json.dump(), "application/json"); }); diff --git a/examples/simple/simple.cpp b/examples/simple/simple.cpp index 374aef6f16189..9cfde8308f18f 100644 --- a/examples/simple/simple.cpp +++ b/examples/simple/simple.cpp @@ -75,7 +75,7 @@ int main(int argc, char ** argv) { // make sure the KV cache is big enough to hold all the prompt and generated tokens if (n_kv_req > n_ctx) { LOG_TEE("%s: error: n_kv_req > n_ctx, the required KV cache size is not big enough\n", __func__); - LOG_TEE("%s: either reduce n_parallel or increase n_ctx\n", __func__); + LOG_TEE("%s: either reduce n_len or increase n_ctx\n", __func__); return 1; } diff --git a/examples/speculative/README.md b/examples/speculative/README.md new file mode 100644 index 0000000000000..814efa592d94f --- /dev/null +++ b/examples/speculative/README.md @@ -0,0 +1,8 @@ +# llama.cpp/examples/speculative + +Demonstration of speculative decoding and 
tree-based speculative decoding techniques + +More info: + +- https://github.com/ggerganov/llama.cpp/pull/2926 +- https://github.com/ggerganov/llama.cpp/pull/3624 diff --git a/examples/speculative/speculative.cpp b/examples/speculative/speculative.cpp index 3a8e278110c20..20f1fb5bfcd99 100644 --- a/examples/speculative/speculative.cpp +++ b/examples/speculative/speculative.cpp @@ -94,9 +94,22 @@ int main(int argc, char ** argv) { } } - // tokenize the prompt + + // Tokenize the prompt + const bool add_bos_tgt = llama_should_add_bos_token(model_tgt); + LOG("add_bos tgt: %d\n", add_bos_tgt); + + const bool add_bos_dft = llama_should_add_bos_token(model_dft); + LOG("add_bos dft: %d\n", add_bos_dft); + + if (add_bos_tgt != add_bos_dft) { + fprintf(stderr, "%s: error: draft model add_bos must match target model to use speculation but ", __func__); + fprintf(stderr, "add_bos_dft = %d while add_bos_tgt = %d\n", add_bos_dft, add_bos_tgt); + return 1; + } + std::vector<llama_token> inp; - inp = ::llama_tokenize(ctx_tgt, params.prompt, true); + inp = ::llama_tokenize(ctx_tgt, params.prompt, add_bos_tgt, true); const int max_context_size = llama_n_ctx(ctx_tgt); const int max_tokens_list_size = max_context_size - 4; @@ -190,8 +203,9 @@ int main(int argc, char ** argv) { const std::string token_str = llama_token_to_piece(ctx_tgt, id); - printf("%s", token_str.c_str()); - fflush(stdout); + if (!params.use_color) { + printf("%s", token_str.c_str()); + } if (id == llama_token_eos(model_tgt)) { has_eos = true; @@ -223,10 +237,18 @@ int main(int argc, char ** argv) { ++n_past_tgt; ++n_past_dft; ++i_dft; - + if (params.use_color) { + // Color token according to its origin sequence + printf("\u001b[%dm%s\u001b[37m", (36 - s_keep % 6), token_str.c_str()); + fflush(stdout); + } continue; } } + if (params.use_color) { + printf("%s", token_str.c_str()); + } + fflush(stdout); LOG("the sampled target token (%d, '%s') did not match, or we ran out of drafted tokens\n", id, token_str.c_str()); @@ -406,7 +428,7 @@ int main(int argc, char ** argv) { ++n_past_tgt; } - // the first token is always proposed by the traget model before the speculation loop so we erase it here + // the first token is always proposed by the target model before the speculation loop so we erase it here for (int s = 0; s < n_seq_dft; ++s) { if (!drafts[s].active) { continue; diff --git a/examples/tokenize/CMakeLists.txt b/examples/tokenize/CMakeLists.txt new file mode 100644 index 0000000000000..5e6654d7e5988 --- /dev/null +++ b/examples/tokenize/CMakeLists.txt @@ -0,0 +1,5 @@ +set(TARGET tokenize) +add_executable(${TARGET} tokenize.cpp) +install(TARGETS ${TARGET} RUNTIME) +target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT}) +target_compile_features(${TARGET} PRIVATE cxx_std_11) diff --git a/examples/tokenize/tokenize.cpp b/examples/tokenize/tokenize.cpp new file mode 100644 index 0000000000000..4ff8e3fa72749 --- /dev/null +++ b/examples/tokenize/tokenize.cpp @@ -0,0 +1,44 @@ +#include "common.h" +#include "llama.h" + +#include <cmath> +#include <cstdio> +#include <string> +#include <vector> + +int main(int argc, char ** argv) { + if (argc < 3 || argv[1][0] == '-') { + printf("usage: %s MODEL_PATH PROMPT [--ids]\n", argv[0]); + return 1; + } + + const char * model_path = argv[1]; + const char * prompt = argv[2]; + + const bool printing_ids = argc > 3 && std::string(argv[3]) == "--ids"; + + llama_backend_init(false); + + llama_model_params model_params = llama_model_default_params(); + model_params.vocab_only = true; + llama_model * model =
llama_load_model_from_file(model_path, model_params); + + llama_context_params ctx_params = llama_context_default_params(); + llama_context * ctx = llama_new_context_with_model(model, ctx_params); + + const bool add_bos = llama_should_add_bos_token(model); + + std::vector<llama_token> tokens; + + tokens = ::llama_tokenize(model, prompt, add_bos, true); + + for (int i = 0; i < (int) tokens.size(); i++) { + if (printing_ids) { + printf("%d\n", tokens[i]); + } else { + printf("%6d -> '%s'\n", tokens[i], llama_token_to_piece(ctx, tokens[i]).c_str()); + } + } + + return 0; +} diff --git a/examples/train-text-from-scratch/train-text-from-scratch.cpp b/examples/train-text-from-scratch/train-text-from-scratch.cpp index f049a3923669b..f7ed63365211b 100644 --- a/examples/train-text-from-scratch/train-text-from-scratch.cpp +++ b/examples/train-text-from-scratch/train-text-from-scratch.cpp @@ -1295,10 +1295,6 @@ int main(int argc, char ** argv) { opt_cb_data.last_save_iter = opt->iter; } - if (alloc) { - ggml_allocr_free(alloc); - } - ggml_free(opt->ctx); free_train_state(train); ggml_free(model.ctx); diff --git a/ggml-alloc.c b/ggml-alloc.c index cdfe4caf69613..d3049efb497a0 100644 --- a/ggml-alloc.c +++ b/ggml-alloc.c @@ -137,7 +137,7 @@ void ggml_tallocr_alloc(ggml_tallocr_t alloc, struct ggml_tensor * tensor) { #ifdef GGML_ALLOCATOR_DEBUG add_allocated_tensor(alloc, tensor); - size_t cur_max = (char*)addr - (char*)alloc->data + size; + size_t cur_max = (char*)addr - (char*)alloc->base + size; if (cur_max > alloc->max_size) { printf("max_size = %.2f MB: tensors: ", cur_max / 1024.0 / 1024.0); for (int i = 0; i < 1024; i++) { @@ -168,10 +168,6 @@ static void ggml_tallocr_free_tensor(ggml_tallocr_t alloc, struct ggml_tensor * size = aligned_offset(NULL, size, alloc->alignment); AT_PRINTF("%s: freeing %s at %p (%zu bytes) - n_free_blocks = %d\n", __func__, tensor->name, ptr, size, alloc->n_free_blocks); - if (!alloc->measure) { - ggml_backend_buffer_free_tensor(alloc->buffer, tensor); - } - #ifdef GGML_ALLOCATOR_DEBUG remove_allocated_tensor(alloc, tensor); #endif @@ -237,7 +233,7 @@ void ggml_tallocr_reset(ggml_tallocr_t alloc) { } ggml_tallocr_t ggml_tallocr_new(void * data, size_t size, size_t alignment) { - struct ggml_backend_buffer * buffer = ggml_backend_cpu_buffer_from_ptr(NULL, data, size); + struct ggml_backend_buffer * buffer = ggml_backend_cpu_buffer_from_ptr(data, size); ggml_tallocr_t alloc = (ggml_tallocr_t)malloc(sizeof(struct ggml_tallocr)); @@ -449,7 +445,6 @@ static ggml_tallocr_t node_tallocr(ggml_gallocr_t galloc, struct ggml_tensor * n static void init_view(ggml_gallocr_t galloc, struct ggml_tensor * view, bool update_backend) { ggml_tallocr_t alloc = node_tallocr(galloc, view); - //printf("init_view: %s from src %s\n", view->name, view->view_src->name); GGML_ASSERT(view->view_src != NULL && view->view_src->data != NULL); if (update_backend) { view->backend = view->view_src->backend; @@ -459,7 +454,7 @@ static void init_view(ggml_gallocr_t galloc, struct ggml_tensor * view, bool upd // FIXME: the view should be initialized by the owning buffer, but currently this breaks the CUDA backend // due to the ggml_tensor_extra_gpu ring buffer overwriting the KV cache extras - assert(ggml_tallocr_is_measure(alloc) || !view->buffer || view->buffer->backend == alloc->buffer->backend); + assert(ggml_tallocr_is_measure(alloc) || !view->buffer || view->buffer->buft == alloc->buffer->buft); if (!alloc->measure) { ggml_backend_buffer_init_tensor(alloc->buffer, view); @@ -765,3 +760,43 @@ size_t
ggml_allocr_max_size(ggml_allocr_t alloc) { size_t ggml_allocr_alloc_graph(ggml_allocr_t alloc, struct ggml_cgraph * graph) { return ggml_gallocr_alloc_graph(alloc->galloc, alloc->talloc, graph); } + +// utils +ggml_backend_buffer_t ggml_backend_alloc_ctx_tensors_from_buft(struct ggml_context * ctx, ggml_backend_buffer_type_t buft) { + GGML_ASSERT(ggml_get_no_alloc(ctx) == true); + + size_t alignment = ggml_backend_buft_get_alignment(buft); + + size_t nbytes = 0; + for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) { + if (t->data == NULL && t->view_src == NULL) { + nbytes += GGML_PAD(ggml_backend_buft_get_alloc_size(buft, t), alignment); + } + } + + if (nbytes == 0) { + fprintf(stderr, "%s: no tensors to allocate\n", __func__); + return NULL; + } + + ggml_backend_buffer_t buffer = ggml_backend_buft_alloc_buffer(buft, nbytes); + ggml_tallocr_t tallocr = ggml_tallocr_new_from_buffer(buffer); + + for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) { + if (t->data == NULL) { + if (t->view_src == NULL) { + ggml_tallocr_alloc(tallocr, t); + } else { + ggml_backend_view_init(buffer, t); + } + } + } + + ggml_tallocr_free(tallocr); + + return buffer; +} + +ggml_backend_buffer_t ggml_backend_alloc_ctx_tensors(struct ggml_context * ctx, ggml_backend_t backend) { + return ggml_backend_alloc_ctx_tensors_from_buft(ctx, ggml_backend_get_default_buffer_type(backend)); +} diff --git a/ggml-alloc.h b/ggml-alloc.h index dde2a06bf8030..64a412468915b 100644 --- a/ggml-alloc.h +++ b/ggml-alloc.h @@ -8,6 +8,7 @@ extern "C" { struct ggml_backend; struct ggml_backend_buffer; +struct ggml_backend_buffer_type; // // Legacy API @@ -42,7 +43,7 @@ GGML_API size_t ggml_allocr_alloc_graph(ggml_allocr_t alloc, struct ggml_cgraph // ggml-backend v2 API // -// Seperate tensor and graph allocator objects +// Separate tensor and graph allocator objects // This is necessary for multi-backend allocation because the graph allocator needs to use multiple tensor allocators // The original API is kept as a wrapper around the new API @@ -80,6 +81,12 @@ GGML_API void ggml_gallocr_alloc_graph_n( struct ggml_hash_set hash_set, ggml_tallocr_t * hash_node_talloc); + +// Utils +// Create a buffer and allocate all the tensors in a ggml_context +GGML_API struct ggml_backend_buffer * ggml_backend_alloc_ctx_tensors_from_buft(struct ggml_context * ctx, struct ggml_backend_buffer_type * buft); +GGML_API struct ggml_backend_buffer * ggml_backend_alloc_ctx_tensors(struct ggml_context * ctx, struct ggml_backend * backend); + #ifdef __cplusplus } #endif diff --git a/ggml-backend-impl.h b/ggml-backend-impl.h index 211e3d4247387..f588af6028265 100644 --- a/ggml-backend-impl.h +++ b/ggml-backend-impl.h @@ -12,31 +12,50 @@ extern "C" { // Backend buffer // + // buffer type + typedef void * ggml_backend_buffer_type_context_t; + + struct ggml_backend_buffer_type_i { + ggml_backend_buffer_t (*alloc_buffer) (ggml_backend_buffer_type_t buft, size_t size); + size_t (*get_alignment) (ggml_backend_buffer_type_t buft); // tensor alignment + size_t (*get_alloc_size) (ggml_backend_buffer_type_t buft, struct ggml_tensor * tensor); // data size needed to allocate the tensor, including padding + bool (*supports_backend)(ggml_backend_buffer_type_t buft, ggml_backend_t backend); // check if the buffer type is usable by the backend + }; + + struct ggml_backend_buffer_type { + struct ggml_backend_buffer_type_i iface; + ggml_backend_buffer_type_context_t context; + }; 
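+ // (illustrative sketch of a concrete buffer type; the my_* callbacks are hypothetical,
+ // the real CPU instance appears later in this patch as ggml_backend_cpu_buffer_type():
+ //   static struct ggml_backend_buffer_type my_buft = {
+ //       /* .iface   = */ { my_alloc_buffer, my_get_alignment, NULL /* defaults to ggml_nbytes */, my_supports_backend },
+ //       /* .context = */ NULL,
+ //   };)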
+ + // buffer typedef void * ggml_backend_buffer_context_t; struct ggml_backend_buffer_i { - void (*free_buffer) (ggml_backend_buffer_t buffer); - void * (*get_base) (ggml_backend_buffer_t buffer); // get base pointer - size_t (*get_alloc_size)(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor); // pre-allocation callback - void (*init_tensor) (ggml_backend_buffer_t buffer, struct ggml_tensor * tensor); // post-allocation callback - void (*free_tensor) (ggml_backend_buffer_t buffer, struct ggml_tensor * tensor); // pre-free callback + void (*free_buffer)(ggml_backend_buffer_t buffer); + //void (*reset) (ggml_backend_buffer_t buffer); // reset any internal state due to tensor initialization, such as tensor extras + void * (*get_base) (ggml_backend_buffer_t buffer); + void (*init_tensor)(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor); + void (*set_tensor) (ggml_backend_buffer_t buffer, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size); + void (*get_tensor) (ggml_backend_buffer_t buffer, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size); + // (optional) copy tensor between different buffer types, allow for single-copy transfers + void (*cpy_tensor_from)(ggml_backend_buffer_t buffer, struct ggml_tensor * src, struct ggml_tensor * dst); + void (*cpy_tensor_to) (ggml_backend_buffer_t buffer, struct ggml_tensor * src, struct ggml_tensor * dst); }; struct ggml_backend_buffer { - struct ggml_backend_buffer_i iface; - - ggml_backend_t backend; + struct ggml_backend_buffer_i iface; + ggml_backend_buffer_type_t buft; ggml_backend_buffer_context_t context; - size_t size; }; - GGML_API ggml_backend_buffer_t ggml_backend_buffer_init( - struct ggml_backend * backend, + ggml_backend_buffer_t ggml_backend_buffer_init( + ggml_backend_buffer_type_t buft, struct ggml_backend_buffer_i iface, ggml_backend_buffer_context_t context, size_t size); + // // Backend // @@ -49,20 +68,17 @@ extern "C" { void (*free)(ggml_backend_t backend); // buffer allocation - ggml_backend_buffer_t (*alloc_buffer)(ggml_backend_t backend, size_t size); + ggml_backend_buffer_type_t (*get_default_buffer_type)(ggml_backend_t backend); - // get buffer alignment - size_t (*get_alignment)(ggml_backend_t backend); - - // tensor data access - // these functions can be asynchronous, helper functions are provided for synchronous access that automatically call synchronize + // (optional) asynchronous tensor data access void (*set_tensor_async)(ggml_backend_t backend, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size); void (*get_tensor_async)(ggml_backend_t backend, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size); - void (*synchronize) (ggml_backend_t backend); - // (optional) copy tensor between different backends, allow for single-copy tranfers - void (*cpy_tensor_from)(ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst); - void (*cpy_tensor_to) (ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst); + // (optional) asynchronous tensor copy + void (*cpy_tensor_from_async)(ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst); + void (*cpy_tensor_to_async) (ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst); + + void (*synchronize) (ggml_backend_t backend); // compute graph with a plan ggml_backend_graph_plan_t (*graph_plan_create) (ggml_backend_t backend, struct ggml_cgraph * cgraph); @@ -82,6 +98,15 @@ extern "C" {
ggml_backend_context_t context; }; + + // + // Backend registry + // + + typedef ggml_backend_t (*ggml_backend_init_fn)(const char * params, void * user_data); + + void ggml_backend_register(const char * name, ggml_backend_init_fn init_fn, ggml_backend_buffer_type_t default_buffer_type, void * user_data); + #ifdef __cplusplus } #endif diff --git a/ggml-backend.c b/ggml-backend.c index f6e5fceed0f4d..3a22cd085eac0 100644 --- a/ggml-backend.c +++ b/ggml-backend.c @@ -9,14 +9,36 @@ #include #include -#define UNUSED GGML_UNUSED #define MAX(a, b) ((a) > (b) ? (a) : (b)) + +// backend buffer type + +ggml_backend_buffer_t ggml_backend_buft_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) { + return buft->iface.alloc_buffer(buft, size); +} + +size_t ggml_backend_buft_get_alignment(ggml_backend_buffer_type_t buft) { + return buft->iface.get_alignment(buft); +} + +size_t ggml_backend_buft_get_alloc_size(ggml_backend_buffer_type_t buft, struct ggml_tensor * tensor) { + // get_alloc_size is optional, defaults to ggml_nbytes + if (buft->iface.get_alloc_size) { + return buft->iface.get_alloc_size(buft, tensor); + } + return ggml_nbytes(tensor); +} + +bool ggml_backend_buft_supports_backend(ggml_backend_buffer_type_t buft, ggml_backend_t backend) { + return buft->iface.supports_backend(buft, backend); +} + // backend buffer ggml_backend_buffer_t ggml_backend_buffer_init( - struct ggml_backend * backend, + ggml_backend_buffer_type_t buft, struct ggml_backend_buffer_i iface, ggml_backend_buffer_context_t context, size_t size) { @@ -26,7 +48,7 @@ ggml_backend_buffer_t ggml_backend_buffer_init( (*buffer) = (struct ggml_backend_buffer) { /* .interface = */ iface, - /* .backend = */ backend, + /* .buft = */ buft, /* .context = */ context, /* .size = */ size, }; @@ -45,10 +67,6 @@ void ggml_backend_buffer_free(ggml_backend_buffer_t buffer) { free(buffer); } -size_t ggml_backend_buffer_get_alignment(ggml_backend_buffer_t buffer) { - return ggml_backend_get_alignment(buffer->backend); -} - size_t ggml_backend_buffer_get_size(ggml_backend_buffer_t buffer) { return buffer->size; } @@ -61,14 +79,6 @@ void * ggml_backend_buffer_get_base(ggml_backend_buffer_t buffer) { return base; } -size_t ggml_backend_buffer_get_alloc_size(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor) { - // get_alloc_size is optional, defaults to ggml_nbytes - if (buffer->iface.get_alloc_size) { - return buffer->iface.get_alloc_size(buffer, tensor); - } - return ggml_nbytes(tensor); -} - void ggml_backend_buffer_init_tensor(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor) { // init_tensor is optional if (buffer->iface.init_tensor) { @@ -76,19 +86,20 @@ void ggml_backend_buffer_init_tensor(ggml_backend_buffer_t buffer, struct ggml_t } } -void ggml_backend_buffer_free_tensor(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor) { - // free_tensor is optional - if (buffer->iface.free_tensor) { - buffer->iface.free_tensor(buffer, tensor); - } +size_t ggml_backend_buffer_get_alignment (ggml_backend_buffer_t buffer) { + return ggml_backend_buft_get_alignment(ggml_backend_buffer_type(buffer)); } -// backend +size_t ggml_backend_buffer_get_alloc_size(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor) { + return ggml_backend_buft_get_alloc_size(ggml_backend_buffer_type(buffer), tensor); +} -ggml_backend_t ggml_get_backend(const struct ggml_tensor * tensor) { - return tensor->buffer ? 
tensor->buffer->backend : NULL; +ggml_backend_buffer_type_t ggml_backend_buffer_type(ggml_backend_buffer_t buffer) { + return buffer->buft; } +// backend + const char * ggml_backend_name(ggml_backend_t backend) { if (backend == NULL) { return "NULL"; @@ -104,43 +115,53 @@ void ggml_backend_free(ggml_backend_t backend) { backend->iface.free(backend); } +ggml_backend_buffer_type_t ggml_backend_get_default_buffer_type(ggml_backend_t backend) { + return backend->iface.get_default_buffer_type(backend); +} + ggml_backend_buffer_t ggml_backend_alloc_buffer(ggml_backend_t backend, size_t size) { - return backend->iface.alloc_buffer(backend, size); + return ggml_backend_buft_alloc_buffer(ggml_backend_get_default_buffer_type(backend), size); } size_t ggml_backend_get_alignment(ggml_backend_t backend) { - return backend->iface.get_alignment(backend); + return ggml_backend_buft_get_alignment(ggml_backend_get_default_buffer_type(backend)); } -void ggml_backend_tensor_set_async(struct ggml_tensor * tensor, const void * data, size_t offset, size_t size) { - ggml_get_backend(tensor)->iface.set_tensor_async(ggml_get_backend(tensor), tensor, data, offset, size); +void ggml_backend_tensor_set_async(ggml_backend_t backend, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size) { + GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); + + backend->iface.set_tensor_async(backend, tensor, data, offset, size); } -void ggml_backend_tensor_get_async(const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) { - ggml_get_backend(tensor)->iface.get_tensor_async(ggml_get_backend(tensor), tensor, data, offset, size); +void ggml_backend_tensor_get_async(ggml_backend_t backend, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) { + GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"); + + backend->iface.get_tensor_async(backend, tensor, data, offset, size); } void ggml_backend_tensor_set(struct ggml_tensor * tensor, const void * data, size_t offset, size_t size) { - ggml_backend_t backend = ggml_get_backend(tensor); - GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); - GGML_ASSERT(backend != NULL && "tensor backend not set"); + GGML_ASSERT(tensor->buffer != NULL && "tensor buffer not set"); + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); - backend->iface.set_tensor_async(backend, tensor, data, offset, size); - backend->iface.synchronize(backend); + tensor->buffer->iface.set_tensor(tensor->buffer, tensor, data, offset, size); } void ggml_backend_tensor_get(const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) { - ggml_backend_t backend = ggml_get_backend(tensor); - GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); - GGML_ASSERT(backend != NULL && "tensor backend not set"); + GGML_ASSERT(tensor->buffer != NULL && "tensor buffer not set"); + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"); - backend->iface.get_tensor_async(backend, tensor, data, offset, size); - backend->iface.synchronize(backend); + tensor->buffer->iface.get_tensor(tensor->buffer, tensor, data, offset, size); } void ggml_backend_synchronize(ggml_backend_t backend) { + if (backend->iface.synchronize == NULL) { + return; + } + backend->iface.synchronize(backend); } @@ -154,10 +175,16 @@ void 
ggml_backend_graph_plan_free(ggml_backend_t backend, ggml_backend_graph_pla void ggml_backend_graph_plan_compute(ggml_backend_t backend, ggml_backend_graph_plan_t plan) { backend->iface.graph_plan_compute(backend, plan); + + // TODO: optional sync + ggml_backend_synchronize(backend); } void ggml_backend_graph_compute(ggml_backend_t backend, struct ggml_cgraph * cgraph) { backend->iface.graph_compute(backend, cgraph); + + // TODO: optional sync + ggml_backend_synchronize(backend); } bool ggml_backend_supports_op(ggml_backend_t backend, const struct ggml_tensor * op) { @@ -194,14 +221,15 @@ void ggml_backend_tensor_copy(struct ggml_tensor * src, struct ggml_tensor * dst // TODO: allow backends to support copy to/from same backend - if (ggml_get_backend(dst)->iface.cpy_tensor_from != NULL) { - ggml_get_backend(dst)->iface.cpy_tensor_from(ggml_get_backend(dst)->context, src, dst); - } else if (ggml_get_backend(src)->iface.cpy_tensor_to != NULL) { - ggml_get_backend(src)->iface.cpy_tensor_to(ggml_get_backend(src)->context, src, dst); + if (dst->buffer->iface.cpy_tensor_from != NULL) { + dst->buffer->iface.cpy_tensor_from(dst->buffer, src, dst); + } else if (src->buffer->iface.cpy_tensor_to != NULL) { + src->buffer->iface.cpy_tensor_to(src->buffer, src, dst); } else { // shouldn't be hit when copying from/to CPU #ifndef NDEBUG - fprintf(stderr, "ggml_backend_tensor_copy: neither cpy_tensor_from nor cpy_tensor_to are implemented for backends %s and %s, falling back to get/set\n", ggml_backend_name(src->buffer->backend), ggml_backend_name(dst->buffer->backend)); + fprintf(stderr, "ggml_backend_tensor_copy: neither cpy_tensor_from nor cpy_tensor_to " + "are implemented for %s and %s, falling back to get/set\n", src->name, dst->name); #endif size_t nbytes = ggml_nbytes(src); void * data = malloc(nbytes); @@ -211,101 +239,259 @@ void ggml_backend_tensor_copy(struct ggml_tensor * src, struct ggml_tensor * dst } } -// backend CPU +// backend registry -struct ggml_backend_cpu_context { - int n_threads; - void * work_data; - size_t work_size; +#define GGML_MAX_BACKENDS_REG 16 + +struct ggml_backend_reg { + char name[128]; + ggml_backend_init_fn init_fn; + ggml_backend_buffer_type_t default_buffer_type; + void * user_data; }; -static const char * ggml_backend_cpu_name(ggml_backend_t backend) { - return "CPU"; +static struct ggml_backend_reg ggml_backend_registry[GGML_MAX_BACKENDS_REG]; +static size_t ggml_backend_registry_count = 0; + +static ggml_backend_t ggml_backend_reg_cpu_init(const char * params, void * user_data); + +static void ggml_backend_registry_init(void) { + static bool initialized = false; + + if (initialized) { + return; + } + + initialized = true; - UNUSED(backend); + ggml_backend_register("CPU", ggml_backend_reg_cpu_init, ggml_backend_cpu_buffer_type(), NULL); + + // add forward decls here to avoid including the backend headers +#ifdef GGML_USE_CUBLAS + extern void ggml_backend_cuda_reg_devices(void); + ggml_backend_cuda_reg_devices(); +#endif + +#ifdef GGML_USE_METAL + extern ggml_backend_t ggml_backend_reg_metal_init(const char * params, void * user_data); + extern ggml_backend_buffer_type_t ggml_backend_metal_buffer_type(void); + ggml_backend_register("Metal", ggml_backend_reg_metal_init, ggml_backend_metal_buffer_type(), NULL); +#endif } -static void ggml_backend_cpu_free(ggml_backend_t backend) { - struct ggml_backend_cpu_context * cpu_ctx = (struct ggml_backend_cpu_context *)backend->context; - free(cpu_ctx->work_data); - free(cpu_ctx); - free(backend); +void 
ggml_backend_register(const char * name, ggml_backend_init_fn init_fn, ggml_backend_buffer_type_t default_buffer_type, void * user_data) { + GGML_ASSERT(ggml_backend_registry_count < GGML_MAX_BACKENDS_REG); + + int id = ggml_backend_registry_count; + + ggml_backend_registry[id] = (struct ggml_backend_reg) { + /* .name = */ {0}, + /* .fn = */ init_fn, + /* .default_buffer_type = */ default_buffer_type, + /* .user_data = */ user_data, + }; + + snprintf(ggml_backend_registry[id].name, sizeof(ggml_backend_registry[id].name), "%s", name); + +#ifndef NDEBUG + fprintf(stderr, "%s: registered backend %s\n", __func__, name); +#endif + + ggml_backend_registry_count++; +} + +size_t ggml_backend_reg_get_count(void) { + ggml_backend_registry_init(); + + return ggml_backend_registry_count; +} + +size_t ggml_backend_reg_find_by_name(const char * name) { + ggml_backend_registry_init(); + + for (size_t i = 0; i < ggml_backend_registry_count; i++) { + // TODO: case insensitive in a portable way + if (strcmp(ggml_backend_registry[i].name, name) == 0) { + return i; + } + } + return SIZE_MAX; +} + +// init from backend:params string +ggml_backend_t ggml_backend_reg_init_backend_from_str(const char * backend_str) { + ggml_backend_registry_init(); + + const char * params = strchr(backend_str, ':'); + char backend_name[128]; + if (params == NULL) { + strcpy(backend_name, backend_str); + params = ""; + } else { + strncpy(backend_name, backend_str, params - backend_str); + backend_name[params - backend_str] = '\0'; + params++; + } + + size_t backend_i = ggml_backend_reg_find_by_name(backend_name); + if (backend_i == SIZE_MAX) { + fprintf(stderr, "%s: backend %s not found\n", __func__, backend_name); + return NULL; + } + + return ggml_backend_reg_init_backend(backend_i, params); +} + +const char * ggml_backend_reg_get_name(size_t i) { + ggml_backend_registry_init(); + + GGML_ASSERT(i < ggml_backend_registry_count); + return ggml_backend_registry[i].name; +} + +ggml_backend_t ggml_backend_reg_init_backend(size_t i, const char * params) { + ggml_backend_registry_init(); + + GGML_ASSERT(i < ggml_backend_registry_count); + return ggml_backend_registry[i].init_fn(params, ggml_backend_registry[i].user_data); +} + +ggml_backend_buffer_type_t ggml_backend_reg_get_default_buffer_type(size_t i) { + ggml_backend_registry_init(); + + GGML_ASSERT(i < ggml_backend_registry_count); + return ggml_backend_registry[i].default_buffer_type; +} + +ggml_backend_buffer_t ggml_backend_reg_alloc_buffer(size_t i, size_t size) { + ggml_backend_registry_init(); + + GGML_ASSERT(i < ggml_backend_registry_count); + return ggml_backend_buft_alloc_buffer(ggml_backend_registry[i].default_buffer_type, size); } +// backend CPU + static void * ggml_backend_cpu_buffer_get_base(ggml_backend_buffer_t buffer) { return (void *)buffer->context; } static void ggml_backend_cpu_buffer_free_buffer(ggml_backend_buffer_t buffer) { free(buffer->context); - UNUSED(buffer); + GGML_UNUSED(buffer); +} + +static void ggml_backend_cpu_buffer_set_tensor(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size) { + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); + GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + + memcpy((char *)tensor->data + offset, data, size); + + GGML_UNUSED(buffer); +} + +static void ggml_backend_cpu_buffer_get_tensor(ggml_backend_buffer_t buffer, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) { + GGML_ASSERT(offset + size <= 
ggml_nbytes(tensor) && "tensor read out of bounds"); + GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + + memcpy(data, (const char *)tensor->data + offset, size); + + GGML_UNUSED(buffer); +} + +static void ggml_backend_cpu_buffer_cpy_tensor_from(ggml_backend_buffer_t buffer, struct ggml_tensor * src, struct ggml_tensor * dst) { + ggml_backend_tensor_get(src, dst->data, 0, ggml_nbytes(src)); + + GGML_UNUSED(buffer); +} + +static void ggml_backend_cpu_buffer_cpy_tensor_to(ggml_backend_buffer_t buffer, struct ggml_tensor * src, struct ggml_tensor * dst) { + ggml_backend_tensor_set(dst, src->data, 0, ggml_nbytes(src)); + + GGML_UNUSED(buffer); } static struct ggml_backend_buffer_i cpu_backend_buffer_i = { - /* .free_buffer = */ ggml_backend_cpu_buffer_free_buffer, - /* .get_base = */ ggml_backend_cpu_buffer_get_base, - /* .get_alloc_size = */ NULL, // defaults to ggml_nbytes - /* .init_tensor = */ NULL, // no initialization required - /* .free_tensor = */ NULL, // no cleanup required + /* .free_buffer = */ ggml_backend_cpu_buffer_free_buffer, + /* .get_base = */ ggml_backend_cpu_buffer_get_base, + /* .init_tensor = */ NULL, // no initialization required + /* .set_tensor = */ ggml_backend_cpu_buffer_set_tensor, + /* .get_tensor = */ ggml_backend_cpu_buffer_get_tensor, + /* .cpy_tensor_from = */ ggml_backend_cpu_buffer_cpy_tensor_from, + /* .cpy_tensor_to = */ ggml_backend_cpu_buffer_cpy_tensor_to, }; // for buffers from ptr, free is not called static struct ggml_backend_buffer_i cpu_backend_buffer_i_from_ptr = { - /* .free_buffer = */ NULL, // ptr is not owned by the buffer, so it does not need to be freed - /* .get_base = */ ggml_backend_cpu_buffer_get_base, - /* .get_alloc_size = */ NULL, // defaults to ggml_nbytes - /* .init_tensor = */ NULL, - /* .free_tensor = */ NULL, + /* .free_buffer = */ NULL, // ptr is not owned by the buffer, so it does not need to be freed + /* .get_base = */ ggml_backend_cpu_buffer_get_base, + /* .init_tensor = */ NULL, // no initialization required + /* .set_tensor = */ ggml_backend_cpu_buffer_set_tensor, + /* .get_tensor = */ ggml_backend_cpu_buffer_get_tensor, + /* .cpy_tensor_from = */ ggml_backend_cpu_buffer_cpy_tensor_from, + /* .cpy_tensor_to = */ ggml_backend_cpu_buffer_cpy_tensor_to, }; static const size_t TENSOR_ALIGNMENT = 64; // should be enough for AVX 512 -static ggml_backend_buffer_t ggml_backend_cpu_alloc_buffer(ggml_backend_t backend, size_t size) { +static ggml_backend_buffer_t ggml_backend_cpu_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) { size += TENSOR_ALIGNMENT; // malloc may return an address that is not aligned void * data = malloc(size); // TODO: maybe use GGML_ALIGNED_MALLOC? 
GGML_ASSERT(data != NULL && "failed to allocate buffer"); - return ggml_backend_buffer_init(backend, cpu_backend_buffer_i, data, size); + return ggml_backend_buffer_init(buft, cpu_backend_buffer_i, data, size); } -static size_t ggml_backend_cpu_get_alignment(ggml_backend_t backend) { +static size_t ggml_backend_cpu_buffer_type_get_alignment(ggml_backend_buffer_type_t buft) { return TENSOR_ALIGNMENT; - UNUSED(backend); -} -static void ggml_backend_cpu_set_tensor_async(ggml_backend_t backend, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size) { - GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); - GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + GGML_UNUSED(buft); +} - memcpy((char *)tensor->data + offset, data, size); +static bool ggml_backend_cpu_buffer_type_supports_backend(ggml_backend_buffer_type_t buft, ggml_backend_t backend) { + return ggml_backend_is_cpu(backend); - UNUSED(backend); + GGML_UNUSED(buft); } -static void ggml_backend_cpu_get_tensor_async(ggml_backend_t backend, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) { - GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"); - GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); - - memcpy(data, (const char *)tensor->data + offset, size); +ggml_backend_buffer_type_t ggml_backend_cpu_buffer_type(void) { + static struct ggml_backend_buffer_type ggml_backend_buffer_type_cpu = { + /* .iface = */ { + /* .alloc_buffer = */ ggml_backend_cpu_buffer_type_alloc_buffer, + /* .get_alignment = */ ggml_backend_cpu_buffer_type_get_alignment, + /* .get_alloc_size = */ NULL, // defaults to ggml_nbytes + /* .supports_backend = */ ggml_backend_cpu_buffer_type_supports_backend, + }, + /* .context = */ NULL, + }; - UNUSED(backend); + return &ggml_backend_buffer_type_cpu; } -static void ggml_backend_cpu_synchronize(ggml_backend_t backend) { - UNUSED(backend); -} +struct ggml_backend_cpu_context { + int n_threads; + void * work_data; + size_t work_size; +}; -static void ggml_backend_cpu_cpy_tensor_from(ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst) { - ggml_backend_tensor_get(src, dst->data, 0, ggml_nbytes(src)); +static const char * ggml_backend_cpu_name(ggml_backend_t backend) { + return "CPU"; - UNUSED(backend); + GGML_UNUSED(backend); } -static void ggml_backend_cpu_cpy_tensor_to(ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst) { - ggml_backend_tensor_set(dst, src->data, 0, ggml_nbytes(src)); +static void ggml_backend_cpu_free(ggml_backend_t backend) { + struct ggml_backend_cpu_context * cpu_ctx = (struct ggml_backend_cpu_context *)backend->context; + free(cpu_ctx->work_data); + free(cpu_ctx); + free(backend); +} + +static ggml_backend_buffer_type_t ggml_backend_cpu_get_default_buffer_type(ggml_backend_t backend) { + return ggml_backend_cpu_buffer_type(); - UNUSED(backend); + GGML_UNUSED(backend); } struct ggml_backend_plan_cpu { @@ -334,7 +520,7 @@ static void ggml_backend_cpu_graph_plan_free(ggml_backend_t backend, ggml_backen free(cpu_plan->cplan.work_data); free(cpu_plan); - UNUSED(backend); + GGML_UNUSED(backend); } static void ggml_backend_cpu_graph_plan_compute(ggml_backend_t backend, ggml_backend_graph_plan_t plan) { @@ -342,7 +528,7 @@ static void ggml_backend_cpu_graph_plan_compute(ggml_backend_t backend, ggml_bac ggml_graph_compute(&cpu_plan->cgraph, &cpu_plan->cplan); - UNUSED(backend); + GGML_UNUSED(backend); } static void 
 static void ggml_backend_cpu_graph_compute(ggml_backend_t backend, struct ggml_cgraph * cgraph) {
@@ -363,25 +549,25 @@ static void ggml_backend_cpu_graph_compute(ggml_backend_t backend, struct ggml_c
 
 static bool ggml_backend_cpu_supports_op(ggml_backend_t backend, const struct ggml_tensor * op) {
     return true;
-    UNUSED(backend);
-    UNUSED(op);
+
+    GGML_UNUSED(backend);
+    GGML_UNUSED(op);
 }
 
 static struct ggml_backend_i cpu_backend_i = {
-    /* .get_name            = */ ggml_backend_cpu_name,
-    /* .free                = */ ggml_backend_cpu_free,
-    /* .alloc_buffer        = */ ggml_backend_cpu_alloc_buffer,
-    /* .get_alignment       = */ ggml_backend_cpu_get_alignment,
-    /* .set_tensor_async    = */ ggml_backend_cpu_set_tensor_async,
-    /* .get_tensor_async    = */ ggml_backend_cpu_get_tensor_async,
-    /* .synchronize         = */ ggml_backend_cpu_synchronize,
-    /* .cpy_tensor_from     = */ ggml_backend_cpu_cpy_tensor_from,
-    /* .cpy_tensor_to       = */ ggml_backend_cpu_cpy_tensor_to,
-    /* .graph_plan_create   = */ ggml_backend_cpu_graph_plan_create,
-    /* .graph_plan_free     = */ ggml_backend_cpu_graph_plan_free,
-    /* .graph_plan_compute  = */ ggml_backend_cpu_graph_plan_compute,
-    /* .graph_compute       = */ ggml_backend_cpu_graph_compute,
-    /* .supports_op         = */ ggml_backend_cpu_supports_op,
+    /* .get_name                = */ ggml_backend_cpu_name,
+    /* .free                    = */ ggml_backend_cpu_free,
+    /* .get_default_buffer_type = */ ggml_backend_cpu_get_default_buffer_type,
+    /* .set_tensor_async        = */ NULL,
+    /* .get_tensor_async        = */ NULL,
+    /* .cpy_tensor_from_async   = */ NULL,
+    /* .cpy_tensor_to_async     = */ NULL,
+    /* .synchronize             = */ NULL,
+    /* .graph_plan_create       = */ ggml_backend_cpu_graph_plan_create,
+    /* .graph_plan_free         = */ ggml_backend_cpu_graph_plan_free,
+    /* .graph_plan_compute      = */ ggml_backend_cpu_graph_plan_compute,
+    /* .graph_compute           = */ ggml_backend_cpu_graph_compute,
+    /* .supports_op             = */ ggml_backend_cpu_supports_op,
 };
 
 ggml_backend_t ggml_backend_cpu_init(void) {
@@ -411,10 +597,18 @@ void ggml_backend_cpu_set_n_threads(ggml_backend_t backend_cpu, int n_threads) {
     ctx->n_threads = n_threads;
 }
 
-ggml_backend_buffer_t ggml_backend_cpu_buffer_from_ptr(ggml_backend_t backend_cpu, void * ptr, size_t size) {
-    return ggml_backend_buffer_init(backend_cpu, cpu_backend_buffer_i_from_ptr, ptr, size);
+ggml_backend_buffer_t ggml_backend_cpu_buffer_from_ptr(void * ptr, size_t size) {
+    return ggml_backend_buffer_init(ggml_backend_cpu_buffer_type(), cpu_backend_buffer_i_from_ptr, ptr, size);
+}
+
+static ggml_backend_t ggml_backend_reg_cpu_init(const char * params, void * user_data) {
+    return ggml_backend_cpu_init();
+
+    GGML_UNUSED(params);
+    GGML_UNUSED(user_data);
 }
+
 
 // scheduler
 
 #define GGML_MAX_BACKENDS 4
@@ -427,7 +621,7 @@ struct ggml_backend_sched_split {
     int i_end;
     struct ggml_tensor * inputs[GGML_MAX_SPLIT_INPUTS];
     int n_inputs;
-    struct ggml_cgraph * graph;
+    struct ggml_cgraph graph;
 };
 
 struct ggml_backend_sched {
@@ -453,7 +647,7 @@ struct ggml_backend_sched {
 #else
     __attribute__((aligned(GGML_MEM_ALIGN)))
 #endif
-    char context_buffer[GGML_MAX_SPLITS*GGML_MAX_SPLIT_INPUTS*sizeof(struct ggml_tensor) + GGML_MAX_SPLITS*sizeof(struct ggml_cgraph)];
+    char context_buffer[GGML_MAX_SPLITS*GGML_MAX_SPLIT_INPUTS*sizeof(struct ggml_tensor) + sizeof(struct ggml_cgraph)];
 };
 
 #define hash_id(node) ggml_hash_find_or_insert(sched->hash_set, node)
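
// A sketch, not part of the patch: building a scheduler over the structs
// above. Backend order is assumed to double as priority (CPU last as the
// fallback); the measure/compute entry-point names are assumptions here.
static void example_sched_setup(ggml_backend_t gpu, struct ggml_cgraph * gf) {
    ggml_backend_t backends[2] = { gpu, ggml_backend_cpu_init() }; // highest prio first
    ggml_backend_sched_t sched = ggml_backend_sched_new(backends, 2);
    ggml_backend_sched_init_measure(sched, gf);  // assumed reserve step
    ggml_backend_sched_graph_compute(sched, gf); // assumed entry point: split, copy inputs, run per backend
}
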
@@ -482,23 +676,57 @@ static int sched_allocr_prio(ggml_backend_sched_t sched, ggml_tallocr_t allocr)
     return INT_MAX;
 }
 
+static ggml_backend_t get_buffer_backend(ggml_backend_sched_t sched, ggml_backend_buffer_t buffer) {
+    if (buffer == NULL) {
+        return NULL;
+    }
+    // find highest prio backend that supports the buffer type
+    for (int i = 0; i < sched->n_backends; i++) {
+        if (ggml_backend_buft_supports_backend(buffer->buft, sched->backends[i])) {
+            return sched->backends[i];
+        }
+    }
+    GGML_ASSERT(false && "tensor buffer type not supported by any backend");
+}
+
+static ggml_backend_t get_allocr_backend(ggml_backend_sched_t sched, ggml_tallocr_t allocr) {
+    if (allocr == NULL) {
+        return NULL;
+    }
+    // find the backend that owns this allocator
+    for (int i = 0; i < sched->n_backends; i++) {
+        if (sched->tallocs[i] == allocr) {
+            return sched->backends[i];
+        }
+    }
+    GGML_UNREACHABLE();
+}
+
+#if 0
+static char causes[GGML_DEFAULT_GRAPH_SIZE*8 + GGML_MAX_SPLITS*GGML_MAX_SPLIT_INPUTS][128]; // debug, remove
+#define SET_CAUSE(node, ...) sprintf(causes[hash_id(node)], __VA_ARGS__)
+#define GET_CAUSE(node) causes[hash_id(node)]
+#else
+#define SET_CAUSE(node, ...)
+#define GET_CAUSE(node) ""
+#endif
+
 // returns the backend that should be used for the node based on the current locations
-char causes[GGML_DEFAULT_GRAPH_SIZE*4 + GGML_MAX_SPLITS*GGML_MAX_SPLIT_INPUTS][128]; // debug, remove
 static ggml_backend_t sched_backend_from_cur(ggml_backend_sched_t sched, struct ggml_tensor * node) {
     // if the dst tensor is already allocated in a buffer, we must assume that it is critical to keep it there
     // ie. kv cache updates
     // note that this doesn't allow fallback to CPU. need to add output tensors to the splits to copy the data back to the original backend.
 
     // dst
-    ggml_backend_t cur_backend = ggml_get_backend(node);
+    ggml_backend_t cur_backend = get_buffer_backend(sched, node->buffer);
     if (cur_backend != NULL) {
-        sprintf(causes[hash_id(node)], "1.dst");
+        SET_CAUSE(node, "1.dst");
        return cur_backend;
     }
 
     // view_src
-    if (node->view_src != NULL && ggml_get_backend(node->view_src) != NULL) {
-        sprintf(causes[hash_id(node)], "1.vsrc");
-        return ggml_get_backend(node->view_src);
+    if (node->view_src != NULL && get_buffer_backend(sched, node->view_src->buffer) != NULL) {
+        SET_CAUSE(node, "1.vsrc");
+        return get_buffer_backend(sched, node->view_src->buffer);
     }
 
     // src
@@ -510,7 +738,7 @@ static ggml_backend_t sched_backend_from_cur(ggml_backend_sched_t sched, struct
         if (src == NULL) {
             break;
         }
-        ggml_backend_t src_backend = ggml_get_backend(src);
+        ggml_backend_t src_backend = get_buffer_backend(sched, src->buffer);
         if (src_backend != NULL) {
             int src_prio = sched_backend_prio(sched, src_backend);
             size_t src_size = ggml_nbytes(src);
@@ -518,7 +746,7 @@ static ggml_backend_t sched_backend_from_cur(ggml_backend_sched_t sched, struct
                 cur_prio = src_prio;
                 cur_size = src_size;
                 cur_backend = src_backend;
-                sprintf(causes[hash_id(node)], "1.src%d", i);
+                SET_CAUSE(node, "1.src%d", i);
             }
         }
     }
@@ -539,10 +767,12 @@ static void sched_print_assignments(ggml_backend_sched_t sched, struct ggml_cgra
     int cur_split = 0;
     for (int i = 0; i < graph->n_nodes; i++) {
         if (cur_split < sched->n_splits && i == sched->splits[cur_split].i_start) {
-            ggml_backend_t split_backend = ggml_tallocr_get_buffer(sched->splits[cur_split].tallocr)->backend;
-            fprintf(stderr, "\n## SPLIT #%d: %s # %d inputs: ", cur_split, ggml_backend_name(split_backend), sched->splits[cur_split].n_inputs);
+            ggml_backend_t split_backend = get_allocr_backend(sched, sched->splits[cur_split].tallocr);
+            fprintf(stderr, "\n## SPLIT #%d: %s # %d inputs: ", cur_split, ggml_backend_name(split_backend),
+                sched->splits[cur_split].n_inputs);
             for (int j = 0; j < sched->splits[cur_split].n_inputs; j++) {
-                fprintf(stderr, "[%s (%5.5s)] ", sched->splits[cur_split].inputs[j]->name, fmt_size(ggml_nbytes(sched->splits[cur_split].inputs[j])));
+                fprintf(stderr, "[%s (%5.5s)] ", sched->splits[cur_split].inputs[j]->name,
+                    fmt_size(ggml_nbytes(sched->splits[cur_split].inputs[j])));
             }
             fprintf(stderr, "\n");
             cur_split++;
@@ -552,16 +782,18 @@ static void sched_print_assignments(ggml_backend_sched_t sched, struct ggml_cgra
             continue;
         }
         ggml_tallocr_t node_allocr = node_allocr(node);
-        ggml_backend_t node_backend = node_allocr ? ggml_tallocr_get_buffer(node_allocr)->backend : NULL;
-        fprintf(stderr, "node #%3d (%10.10s): %20.20s (%4.4s) [%4.4s %8.8s]:", i, ggml_op_name(node->op), node->name, fmt_size(ggml_nbytes(node)), node_allocr ? ggml_backend_name(node_backend) : "NULL", causes[hash_id(node)]);
+        ggml_backend_t node_backend = node_allocr ? get_allocr_backend(sched, node_allocr) : NULL; // FIXME:
+        fprintf(stderr, "node #%3d (%10.10s): %20.20s (%4.4s) [%4.4s %8.8s]:", i, ggml_op_name(node->op), node->name,
+            fmt_size(ggml_nbytes(node)), node_allocr ? ggml_backend_name(node_backend) : "NULL", GET_CAUSE(node));
         for (int j = 0; j < GGML_MAX_SRC; j++) {
             struct ggml_tensor * src = node->src[j];
             if (src == NULL) {
                 break;
             }
             ggml_tallocr_t src_allocr = node_allocr(src);
-            ggml_backend_t src_backend = src_allocr ? ggml_tallocr_get_buffer(src_allocr)->backend : NULL;
-            fprintf(stderr, " %20.20s (%4.4s) [%4.4s %8.8s]", src->name, fmt_size(ggml_nbytes(src)), src_backend ? ggml_backend_name(src_backend) : "NULL", causes[hash_id(src)]);
+            ggml_backend_t src_backend = src_allocr ? get_allocr_backend(sched, src_allocr) : NULL;
+            fprintf(stderr, " %20.20s (%4.4s) [%4.4s %8.8s]", src->name,
+                fmt_size(ggml_nbytes(src)), src_backend ? ggml_backend_name(src_backend) : "NULL", GET_CAUSE(src));
         }
         fprintf(stderr, "\n");
     }
@@ -587,9 +819,9 @@ static void sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgraph * g
     sched->n_splits = 0;
 
     struct ggml_init_params params = {
-        /*.mem_size =   */ sizeof(sched->context_buffer),
-        /*.mem_buffer = */ sched->context_buffer,
-        /*.no_alloc =   */ true
+        /* .mem_size   = */ sizeof(sched->context_buffer),
+        /* .mem_buffer = */ sched->context_buffer,
+        /* .no_alloc   = */ true
     };
 
     if (sched->ctx != NULL) {
@@ -605,9 +837,9 @@ static void sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgraph * g
             // do not overwrite user assignments
             continue;
         }
-        ggml_backend_t leaf_backend = ggml_get_backend(leaf);
+        ggml_backend_t leaf_backend = get_buffer_backend(sched, leaf->buffer);
         if (leaf_backend == NULL && leaf->view_src != NULL) {
-            leaf_backend = ggml_get_backend(leaf->view_src);
+            leaf_backend = get_buffer_backend(sched, leaf->view_src->buffer);
         }
         if (leaf_backend != NULL) {
            node_allocr(leaf) = ggml_backend_sched_get_tallocr(sched, leaf_backend);
@@ -649,7 +881,7 @@ static void sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgraph * g
                         cur_prio = src_prio;
                         cur_size = src_size;
                         node_allocr = src_allocr;
-                        sprintf(causes[hash_id(node)], "2.src%d", j);
+                        SET_CAUSE(node, "2.src%d", j);
                     }
                 }
             }
@@ -733,7 +965,7 @@ static void sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgraph * g
                         struct ggml_tensor * tensor_copy = ggml_dup_tensor_layout(sched->ctx, src);
                         sched->node_copies[id][cur_backend_id] = tensor_copy;
                         node_allocr(tensor_copy) = cur_allocr;
-                        ggml_backend_t backend = ggml_tallocr_get_buffer(cur_allocr)->backend;
+                        ggml_backend_t backend = get_allocr_backend(sched, cur_allocr);
                         ggml_format_name(tensor_copy, "%s#%s", ggml_backend_name(backend), src->name);
                     }
                     node->src[j] = sched->node_copies[id][cur_backend_id];
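
// A sketch, not part of the patch: pinning one node before scheduling so the
// assignment passes above treat it as a user assignment; this uses the
// ggml_backend_sched_set_node_backend definition that appears further down.
static void example_pin_node(ggml_backend_sched_t sched, struct ggml_tensor * kv_update, ggml_backend_t gpu) {
    // keep the KV update on the GPU; sched_split_graph then inserts
    // "GPU#src"-named copies for any srcs that live on other backends
    ggml_backend_sched_set_node_backend(sched, kv_update, gpu);
}
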
@@ -761,8 +993,8 @@ static void sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgraph * g
             ggml_tallocr_t src_allocr = node_allocr(src);
             if (src_allocr != node_allocr /* && src_backend != NULL */) { // ignore nulls for now
                 fprintf(stderr, "!!!! %s has backend %s, src %d (%s) has backend %s\n",
-                    node->name, node_allocr ? ggml_backend_name(ggml_tallocr_get_buffer(node_allocr)->backend) : "NULL",
-                    j, src->name, src_allocr ? ggml_backend_name(ggml_tallocr_get_buffer(src_allocr)->backend) : "NULL");
+                    node->name, node_allocr ? ggml_backend_name(get_allocr_backend(sched, node_allocr)) : "NULL",
+                    j, src->name, src_allocr ? ggml_backend_name(get_allocr_backend(sched, src_allocr)) : "NULL");
             }
         }
     }
@@ -773,7 +1005,7 @@ static void sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgraph * g
     struct ggml_cgraph * graph_copy = ggml_new_graph_custom(sched->ctx, graph->n_nodes + sched->n_splits*GGML_MAX_SPLIT_INPUTS, false);
     for (int i = 0; i < sched->n_splits; i++) {
         struct ggml_backend_sched_split * split = &sched->splits[i];
-        split->graph = ggml_graph_view(sched->ctx, graph, split->i_start, split->i_end);
+        split->graph = ggml_graph_view(graph, split->i_start, split->i_end);
 
         // add inputs to the graph copy so that they are allocated by ggml-alloc at the start of the split
         for (int j = 0; j < split->n_inputs; j++) {
@@ -806,31 +1038,29 @@ static void sched_compute_splits(ggml_backend_sched_t sched) {
 
     for (int i = 0; i < sched->n_splits; i++) {
         struct ggml_backend_sched_split * split = &splits[i];
-        ggml_backend_t split_backend = ggml_tallocr_get_buffer(split->tallocr)->backend;
+        ggml_backend_t split_backend = get_allocr_backend(sched, split->tallocr);
         int split_backend_id = sched_backend_prio(sched, split_backend);
 
         // copy the input tensors to the split backend
         uint64_t copy_start_us = ggml_time_us();
         for (int j = 0; j < split->n_inputs; j++) {
-            struct ggml_tensor * input_cpy = sched->node_copies[hash_id(split->inputs[j])][sched_backend_prio(sched, split_backend)];
-            if (split->inputs[j]->buffer == NULL) {
-                if (split->inputs[j]->view_src == NULL) {
-                    fprintf(stderr, "input %s has no buffer and no view_src\n", split->inputs[j]->name);
+            struct ggml_tensor * input = split->inputs[j];
+            struct ggml_tensor * input_cpy = sched->node_copies[hash_id(input)][sched_backend_prio(sched, split_backend)];
+            if (input->buffer == NULL) {
+                if (input->view_src == NULL) {
+                    fprintf(stderr, "input %s has no buffer and no view_src\n", input->name);
                     exit(1);
                 }
-                struct ggml_tensor * view = split->inputs[j];
-                view->backend = view->view_src->backend;
-                view->buffer  = view->view_src->buffer;
-                view->data    = (char *)view->view_src->data + view->view_offs;
-                ggml_backend_buffer_init_tensor(ggml_backend_sched_get_buffer(sched, view->buffer->backend), view);
+                // FIXME: may need to use the sched buffer instead
+                ggml_backend_view_init(input->view_src->buffer, input);
             }
             if (input_cpy->buffer == NULL) {
                 fprintf(stderr, "input_cpy %s has no buffer\n", input_cpy->name);
                 exit(1);
            }
-            GGML_ASSERT(split->inputs[j]->buffer->backend != input_cpy->buffer->backend);
-            GGML_ASSERT(input_cpy->buffer->backend == split_backend);
-            ggml_backend_tensor_copy(split->inputs[j], input_cpy);
+            //GGML_ASSERT(input->buffer->backend != input_cpy->buffer->backend);
+            //GGML_ASSERT(input_cpy->buffer->backend == split_backend);
+            ggml_backend_tensor_copy(input, input_cpy);
         }
         // ggml_backend_synchronize(split_backend);
         int64_t copy_end_us = ggml_time_us();
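
// A sketch, not part of the patch, of the ggml_backend_view_init contract used
// above: the view must still be unallocated and its view_src must already have
// a buffer and data; init then derives the view's data pointer from view_src.
static void example_view_init(struct ggml_context * ctx, struct ggml_tensor * kv) {
    struct ggml_tensor * v = ggml_view_1d(ctx, kv, kv->ne[0], 0); // v->view_src == kv, v->data == NULL
    ggml_backend_view_init(kv->buffer, v); // sets v->data = kv->data + v->view_offs
}
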
@@ -843,7 +1073,7 @@ static void sched_compute_splits(ggml_backend_sched_t sched) {
 #endif
 
         uint64_t compute_start_us = ggml_time_us();
-        ggml_backend_graph_compute(split_backend, split->graph);
+        ggml_backend_graph_compute(split_backend, &split->graph);
         // ggml_backend_synchronize(split_backend);
         uint64_t compute_end_us = ggml_time_us();
         compute_us[split_backend_id] += compute_end_us - compute_start_us;
@@ -872,8 +1102,6 @@ ggml_backend_sched_t ggml_backend_sched_new(ggml_backend_t * backends, int n_bac
     struct ggml_backend_sched * sched = malloc(sizeof(struct ggml_backend_sched));
     memset(sched, 0, sizeof(struct ggml_backend_sched));
 
-    fprintf(stderr, "ggml_backend_sched size: %lu KB\n", sizeof(struct ggml_backend_sched)/1024);
-
     sched->n_backends = n_backends;
     for (int i = 0; i < n_backends; i++) {
         sched->backends[i] = backends[i];
@@ -948,3 +1176,182 @@ void ggml_backend_sched_set_node_backend(ggml_backend_sched_t sched, struct ggml
     GGML_ASSERT(backend_index >= 0 && backend_index < sched->n_backends);
     node_allocr(node) = sched->tallocs[backend_index];
 }
+
+// utils
+
+void ggml_backend_view_init(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor) {
+    GGML_ASSERT(tensor->buffer == NULL);
+    GGML_ASSERT(tensor->data == NULL);
+    GGML_ASSERT(tensor->view_src != NULL);
+    GGML_ASSERT(tensor->view_src->buffer != NULL);
+    GGML_ASSERT(tensor->view_src->data != NULL);
+
+    tensor->buffer  = buffer;
+    tensor->data    = (char *)tensor->view_src->data + tensor->view_offs;
+    tensor->backend = tensor->view_src->backend;
+    ggml_backend_buffer_init_tensor(buffer, tensor);
+}
+
+void ggml_backend_tensor_alloc(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor, void * addr) {
+    GGML_ASSERT(tensor->buffer == NULL);
+    GGML_ASSERT(tensor->data == NULL);
+    GGML_ASSERT(tensor->view_src == NULL);
+    GGML_ASSERT(addr >= ggml_backend_buffer_get_base(buffer));
+    GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <=
+                (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer));
+
+    tensor->buffer = buffer;
+    tensor->data   = addr;
+    ggml_backend_buffer_init_tensor(buffer, tensor);
+}
+static struct ggml_tensor * graph_dup_tensor(struct ggml_hash_set hash_set, struct ggml_tensor ** node_copies,
+    struct ggml_context * ctx_allocated, struct ggml_context * ctx_unallocated, struct ggml_tensor * src) {
+
+    GGML_ASSERT(src != NULL);
+    GGML_ASSERT(src->data && "graph must be allocated");
+
+    size_t id = ggml_hash_insert(hash_set, src);
+    if (id == GGML_HASHTABLE_ALREADY_EXISTS) {
+        return node_copies[ggml_hash_find(hash_set, src)];
+    }
+
+    struct ggml_tensor * dst = ggml_dup_tensor_layout(src->data && !src->view_src ? ctx_allocated : ctx_unallocated, src);
+    if (src->view_src != NULL) {
+        dst->view_src = graph_dup_tensor(hash_set, node_copies, ctx_allocated, ctx_unallocated, src->view_src);
+        dst->view_offs = src->view_offs;
+    }
+    dst->op = src->op;
+    memcpy(dst->op_params, src->op_params, sizeof(dst->op_params));
+    ggml_set_name(dst, src->name);
+
+    // copy src
+    for (int i = 0; i < GGML_MAX_SRC; i++) {
+        struct ggml_tensor * s = src->src[i];
+        if (s == NULL) {
+            break;
+        }
+        dst->src[i] = graph_dup_tensor(hash_set, node_copies, ctx_allocated, ctx_unallocated, s);
+    }
+
+    node_copies[id] = dst;
+    return dst;
+}
+
+static void graph_init_tensor(struct ggml_hash_set hash_set, struct ggml_tensor ** node_copies, bool * node_init, struct ggml_tensor * src) {
+    size_t id = ggml_hash_find(hash_set, src);
+    if (node_init[id]) {
+        return;
+    }
+    node_init[id] = true;
+
+    struct ggml_tensor * dst = node_copies[id];
+    if (dst->view_src != NULL) {
+        ggml_backend_view_init(dst->view_src->buffer, dst);
+    }
+    else {
+        ggml_backend_tensor_copy(src, dst);
+    }
+
+    // init src
+    for (int i = 0; i < GGML_MAX_SRC; i++) {
+        struct ggml_tensor * s = src->src[i];
+        if (s == NULL) {
+            break;
+        }
+        graph_init_tensor(hash_set, node_copies, node_init, s);
+    }
+}
+
+struct ggml_backend_graph_copy ggml_backend_graph_copy(ggml_backend_t backend, struct ggml_cgraph * graph) {
+    struct ggml_hash_set hash_set = {
+        /* .size = */ graph->visited_hash_table.size,
+        /* .keys = */ calloc(sizeof(hash_set.keys[0]) * graph->visited_hash_table.size, 1)
+    };
+    struct ggml_tensor ** node_copies = calloc(sizeof(node_copies[0]) * hash_set.size, 1);
+    bool * node_init = calloc(sizeof(node_init[0]) * hash_set.size, 1);
+
+    struct ggml_init_params params = {
+        /* .mem_size   = */ ggml_tensor_overhead()*hash_set.size + ggml_graph_overhead_custom(graph->size, false),
+        /* .mem_buffer = */ NULL,
+        /* .no_alloc   = */ true
+    };
+
+    struct ggml_context * ctx_allocated = ggml_init(params);
+    struct ggml_context * ctx_unallocated = ggml_init(params);
+
+    // dup nodes
+    for (int i = 0; i < graph->n_nodes; i++) {
+        struct ggml_tensor * node = graph->nodes[i];
+        graph_dup_tensor(hash_set, node_copies, ctx_allocated, ctx_unallocated, node);
+    }
+
+    // allocate nodes
+    ggml_backend_buffer_t buffer = ggml_backend_alloc_ctx_tensors(ctx_allocated, backend);
+
+    //printf("copy buffer size: %zu MB\n", ggml_backend_buffer_get_size(buffer) / 1024 / 1024);
+
+    // copy data and init views
+    for (int i = 0; i < graph->n_nodes; i++) {
+        struct ggml_tensor * node = graph->nodes[i];
+        graph_init_tensor(hash_set, node_copies, node_init, node);
+    }
+
+    // build graph copy
+    struct ggml_cgraph * graph_copy = ggml_new_graph_custom(ctx_allocated, graph->size, false);
+    for (int i = 0; i < graph->n_nodes; i++) {
+        struct ggml_tensor * node = graph->nodes[i];
+        struct ggml_tensor * node_copy = node_copies[ggml_hash_find(hash_set, node)];
+        graph_copy->nodes[i] = node_copy;
+    }
+    graph_copy->n_nodes = graph->n_nodes;
+
+    free(hash_set.keys);
+    free(node_copies);
+    free(node_init);
+
+    return (struct ggml_backend_graph_copy) {
+        /* .buffer          = */ buffer,
+        /* .ctx_allocated   = */ ctx_allocated,
+        /* .ctx_unallocated = */ ctx_unallocated,
+        /* .graph           = */ graph_copy,
+    };
+}
+
+void ggml_backend_graph_copy_free(struct ggml_backend_graph_copy copy) {
+    ggml_backend_buffer_free(copy.buffer);
+    ggml_free(copy.ctx_allocated);
+    ggml_free(copy.ctx_unallocated);
+}
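
// A sketch, not part of the patch: an eval callback for the comparison helper
// defined next. It assumes F32 tensors small enough to stage on the host and
// stops the sweep once the two backends diverge past a fixed tolerance.
static bool example_eval_cb(int node_index, struct ggml_tensor * t1, struct ggml_tensor * t2, void * user_data) {
    float a[64], b[64];
    size_t n = ggml_nelements(t1) < 64 ? (size_t)ggml_nelements(t1) : 64;
    ggml_backend_tensor_get(t1, a, 0, n*sizeof(float));
    ggml_backend_tensor_get(t2, b, 0, n*sizeof(float));
    for (size_t i = 0; i < n; i++) {
        if (fabsf(a[i] - b[i]) > 1e-3f) {
            fprintf(stderr, "node %d (%s) diverges\n", node_index, t1->name);
            return false; // stop comparing
        }
    }
    return true; // continue with the next node

    (void)user_data;
}
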
+void ggml_backend_compare_graph_backend(ggml_backend_t backend1, ggml_backend_t backend2, struct ggml_cgraph * graph, ggml_backend_eval_callback callback, void * user_data) {
+    struct ggml_backend_graph_copy copy = ggml_backend_graph_copy(backend2, graph);
+    struct ggml_cgraph * g1 = graph;
+    struct ggml_cgraph * g2 = copy.graph;
+
+    assert(g1->n_nodes == g2->n_nodes);
+
+    for (int i = 0; i < g1->n_nodes; i++) {
+        //printf("eval %d/%d\n", i, g1->n_nodes);
+        struct ggml_tensor * t1 = g1->nodes[i];
+        struct ggml_tensor * t2 = g2->nodes[i];
+
+        assert(t1->op == t2->op && ggml_are_same_layout(t1, t2));
+
+        struct ggml_cgraph g1v = ggml_graph_view(g1, i, i + 1);
+        struct ggml_cgraph g2v = ggml_graph_view(g2, i, i + 1);
+
+        ggml_backend_graph_compute(backend1, &g1v);
+        ggml_backend_graph_compute(backend2, &g2v);
+
+        if (ggml_is_view_op(t1->op)) {
+            continue;
+        }
+
+        // compare results, calculate rms etc
+        if (!callback(i, t1, t2, user_data)) {
+            break;
+        }
+    }
+
+    ggml_backend_graph_copy_free(copy);
+}
diff --git a/ggml-backend.h b/ggml-backend.h
index 966687320ac96..58d5ccae6ed10 100644
--- a/ggml-backend.h
+++ b/ggml-backend.h
@@ -7,41 +7,44 @@
 extern "C" {
 #endif
 
+    typedef struct ggml_backend_buffer_type * ggml_backend_buffer_type_t;
+    typedef struct ggml_backend_buffer * ggml_backend_buffer_t;
+    typedef struct ggml_backend * ggml_backend_t;
+    typedef void * ggml_backend_graph_plan_t;
+
     //
     // Backend buffer
     //
 
-    struct ggml_backend_buffer;
-    typedef struct ggml_backend_buffer * ggml_backend_buffer_t;
+    // buffer type
+    GGML_API ggml_backend_buffer_t ggml_backend_buft_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size);
+    GGML_API size_t ggml_backend_buft_get_alignment (ggml_backend_buffer_type_t buft);
+    GGML_API size_t ggml_backend_buft_get_alloc_size(ggml_backend_buffer_type_t buft, struct ggml_tensor * tensor);
+    GGML_API bool ggml_backend_buft_supports_backend(ggml_backend_buffer_type_t buft, ggml_backend_t backend);
 
-    // backend buffer functions
+    // buffer
     GGML_API void   ggml_backend_buffer_free          (ggml_backend_buffer_t buffer);
-    GGML_API size_t ggml_backend_buffer_get_alignment (ggml_backend_buffer_t buffer);
     GGML_API void * ggml_backend_buffer_get_base      (ggml_backend_buffer_t buffer);
     GGML_API size_t ggml_backend_buffer_get_size      (ggml_backend_buffer_t buffer);
-    GGML_API size_t ggml_backend_buffer_get_alloc_size(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
     GGML_API void   ggml_backend_buffer_init_tensor   (ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
-    GGML_API void   ggml_backend_buffer_free_tensor   (ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
+    GGML_API size_t ggml_backend_buffer_get_alignment (ggml_backend_buffer_t buffer);
+    GGML_API size_t ggml_backend_buffer_get_alloc_size(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
+    GGML_API ggml_backend_buffer_type_t ggml_backend_buffer_type(ggml_backend_buffer_t buffer);
 
     //
     // Backend
     //
 
-    struct ggml_backend;
-    typedef struct ggml_backend * ggml_backend_t;
-    typedef void * ggml_backend_graph_plan_t;
-
-    GGML_API ggml_backend_t ggml_get_backend(const struct ggml_tensor * tensor);
-
     GGML_API const char * ggml_backend_name(ggml_backend_t backend);
     GGML_API void         ggml_backend_free(ggml_backend_t backend);
 
-    GGML_API ggml_backend_buffer_t ggml_backend_alloc_buffer(ggml_backend_t backend, size_t size);
-
-    GGML_API size_t ggml_backend_get_alignment(ggml_backend_t backend);
+    GGML_API ggml_backend_buffer_type_t ggml_backend_get_default_buffer_type(ggml_backend_t backend);
+    GGML_API ggml_backend_buffer_t      ggml_backend_alloc_buffer(ggml_backend_t backend, size_t size);
+    GGML_API size_t                     ggml_backend_get_alignment(ggml_backend_t backend);
 
-    GGML_API void ggml_backend_tensor_set_async(      struct ggml_tensor * tensor, const void * data, size_t offset, size_t size);
-    GGML_API void ggml_backend_tensor_get_async(const struct ggml_tensor * tensor,       void * data, size_t offset, size_t size);
+    GGML_API void ggml_backend_tensor_set_async(ggml_backend_t backend,       struct ggml_tensor * tensor, const void * data, size_t offset, size_t size);
+    GGML_API void ggml_backend_tensor_get_async(ggml_backend_t backend, const struct ggml_tensor * tensor,       void * data, size_t offset, size_t size);
 
     GGML_API void ggml_backend_tensor_set(      struct ggml_tensor * tensor, const void * data, size_t offset, size_t size);
     GGML_API void ggml_backend_tensor_get(const struct ggml_tensor * tensor,       void * data, size_t offset, size_t size);
@@ -57,6 +60,7 @@ extern "C" {
 
     // tensor copy between different backends
     GGML_API void ggml_backend_tensor_copy(struct ggml_tensor * src, struct ggml_tensor * dst);
+    GGML_API void ggml_backend_tensor_copy_async(ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst); // automatic fallback to sync copy
 
     //
     // CPU backend
@@ -68,8 +72,23 @@ extern "C" {
     GGML_API void ggml_backend_cpu_set_n_threads(ggml_backend_t backend_cpu, int n_threads);
 
     // Create a backend buffer from an existing pointer
-    GGML_API ggml_backend_buffer_t ggml_backend_cpu_buffer_from_ptr(ggml_backend_t backend_cpu, void * ptr, size_t size);
+    GGML_API ggml_backend_buffer_t ggml_backend_cpu_buffer_from_ptr(void * ptr, size_t size);
+
+    GGML_API ggml_backend_buffer_type_t ggml_backend_cpu_buffer_type(void);
 
+    //
+    // Backend registry
+    //
+
+    // The backend registry is a registry of all the available backends, and allows initializing backends in a generic way
+
+    GGML_API size_t                     ggml_backend_reg_get_count(void);
+    GGML_API size_t                     ggml_backend_reg_find_by_name(const char * name);
+    GGML_API ggml_backend_t             ggml_backend_reg_init_backend_from_str(const char * backend_str); // str is name[:params]
+    GGML_API const char *               ggml_backend_reg_get_name(size_t i);
+    GGML_API ggml_backend_t             ggml_backend_reg_init_backend(size_t i, const char * params); // params is backend-specific
+    GGML_API ggml_backend_buffer_type_t ggml_backend_reg_get_default_buffer_type(size_t i);
+    GGML_API ggml_backend_buffer_t      ggml_backend_reg_alloc_buffer(size_t i, size_t size);
 
     //
     // Backend scheduler
@@ -131,6 +150,32 @@ extern "C" {
             ggml_backend_sched_t sched,
             struct ggml_cgraph * graph);
 
+
+    //
+    // Utils
+    //
+
+    struct ggml_backend_graph_copy {
+        ggml_backend_buffer_t buffer;
+        struct ggml_context * ctx_allocated;
+        struct ggml_context * ctx_unallocated;
+        struct ggml_cgraph * graph;
+    };
+
+    // Copy a graph to a different backend
+    GGML_API struct ggml_backend_graph_copy ggml_backend_graph_copy(ggml_backend_t backend, struct ggml_cgraph * graph);
+    GGML_API void                           ggml_backend_graph_copy_free(struct ggml_backend_graph_copy copy);
+
+    typedef bool (*ggml_backend_eval_callback)(int node_index, struct ggml_tensor * t1, struct ggml_tensor * t2, void * user_data);
+
+    // Compare the output of two backends
+    GGML_API void ggml_backend_compare_graph_backend(ggml_backend_t backend1, ggml_backend_t backend2, struct ggml_cgraph * graph, ggml_backend_eval_callback callback, void * user_data);
+
+    // Tensor initialization
+    GGML_API void ggml_backend_tensor_alloc(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor, void * addr);
+    GGML_API void ggml_backend_view_init(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor);
+
+
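
// A sketch, not part of the patch: enumerating the registry declared above and
// picking a backend by name. Only the ggml_backend_reg_* calls shown in this
// header are used; the SIZE_MAX miss value is an assumption.
static ggml_backend_t example_pick_backend(const char * name) {
    for (size_t i = 0; i < ggml_backend_reg_get_count(); i++) {
        fprintf(stderr, "backend %zu: %s\n", i, ggml_backend_reg_get_name(i));
    }
    size_t i = ggml_backend_reg_find_by_name(name); // assumed to return SIZE_MAX on a miss
    return ggml_backend_reg_init_backend(i == SIZE_MAX ? 0 : i, /*params=*/NULL);
}
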
 #ifdef __cplusplus
 }
 #endif
diff --git a/ggml-cuda.cu b/ggml-cuda.cu
index c0c9edd56dbc2..9e1acd3f19e5f 100644
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu
@@ -1,11 +1,15 @@
 #include
+#include
+#include
+#include
 #include
 #include
+#include
 #include
 #include
 #include
-#include
-#include
+#include
+
 #if defined(GGML_USE_HIPBLAS)
 #include
@@ -68,6 +72,7 @@
 #define cudaOccupancyMaxPotentialBlockSize hipOccupancyMaxPotentialBlockSize
 #define cudaSetDevice hipSetDevice
 #define cudaStreamCreateWithFlags hipStreamCreateWithFlags
+#define cudaStreamFireAndForget hipStreamFireAndForget
 #define cudaStreamNonBlocking hipStreamNonBlocking
 #define cudaStreamSynchronize hipStreamSynchronize
 #define cudaStreamWaitEvent(stream, event, flags) hipStreamWaitEvent(stream, event, flags)
@@ -189,7 +194,7 @@ static_assert(sizeof(half) == sizeof(ggml_fp16_t), "wrong fp16 size");
             fprintf(stderr, "\nCUDA error %d at %s:%d: %s\n", err_, __FILE__, __LINE__, \
                 cudaGetErrorString(err_)); \
             fprintf(stderr, "current device: %d\n", id); \
-            exit(1); \
+            GGML_ASSERT(!"CUDA error"); \
         } \
     } while (0)
 
@@ -203,7 +208,7 @@ static_assert(sizeof(half) == sizeof(ggml_fp16_t), "wrong fp16 size");
             fprintf(stderr, "\ncuBLAS error %d at %s:%d: %s\n", \
                 err_, __FILE__, __LINE__, cublasGetStatusString(err_)); \
             fprintf(stderr, "current device: %d\n", id); \
-            exit(1); \
+            GGML_ASSERT(!"cuBLAS error"); \
         } \
     } while (0)
 #else
@@ -215,7 +220,7 @@ static_assert(sizeof(half) == sizeof(ggml_fp16_t), "wrong fp16 size");
             cudaGetDevice(&id); \
             fprintf(stderr, "\ncuBLAS error %d at %s:%d\n", err_, __FILE__, __LINE__); \
             fprintf(stderr, "current device: %d\n", id); \
-            exit(1); \
+            GGML_ASSERT(!"cuBLAS error"); \
         } \
     } while (0)
 #endif // CUDART_VERSION >= 11
@@ -235,7 +240,7 @@ typedef float2 dfloat2;
 #endif //GGML_CUDA_F16
 
 static __device__ __forceinline__ int get_int_from_int8(const int8_t * x8, const int & i32) {
-    const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
+    const uint16_t * x16 = (const uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
 
     int x32 = 0;
     x32 |= x16[0] << 0;
@@ -245,7 +250,7 @@
 }
 
 static __device__ __forceinline__ int get_int_from_uint8(const uint8_t * x8, const int & i32) {
-    const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
+    const uint16_t * x16 = (const uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
 
     int x32 = 0;
     x32 |= x16[0] << 0;
@@ -255,11 +260,11 @@
 }
 
 static __device__ __forceinline__ int get_int_from_int8_aligned(const int8_t * x8, const int & i32) {
-    return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
+    return *((const int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
 }
 
 static __device__ __forceinline__ int get_int_from_uint8_aligned(const uint8_t * x8, const int & i32) {
-    return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
+    return *((const int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
 }
 
 template
@@ -432,8 +437,6 @@ static_assert(sizeof(block_q6_K) == sizeof(ggml_fp16_t) + 13*QK_K/16, "wrong q6_
 
 #define WARP_SIZE 32
 #define MATRIX_ROW_PADDING 512 // last row of quant.
matrices is a multiple of this to avoid out-of-bounds memory accesses -#define CUDA_ADD_BLOCK_SIZE 256 -#define CUDA_MUL_BLOCK_SIZE 256 #define CUDA_GELU_BLOCK_SIZE 256 #define CUDA_SILU_BLOCK_SIZE 256 #define CUDA_RELU_BLOCK_SIZE 256 @@ -442,6 +445,7 @@ static_assert(sizeof(block_q6_K) == sizeof(ggml_fp16_t) + 13*QK_K/16, "wrong q6_ #define CUDA_SCALE_BLOCK_SIZE 256 #define CUDA_CLAMP_BLOCK_SIZE 256 #define CUDA_ROPE_BLOCK_SIZE 256 +#define CUDA_SOFT_MAX_BLOCK_SIZE 1024 #define CUDA_ALIBI_BLOCK_SIZE 32 #define CUDA_DIAG_MASK_INF_BLOCK_SIZE 32 #define CUDA_QUANTIZE_BLOCK_SIZE 256 @@ -469,7 +473,7 @@ static_assert(K_QUANTS_PER_ITERATION == 1 || K_QUANTS_PER_ITERATION == 2, "K_QUA #define MUL_MAT_SRC1_COL_STRIDE 128 #define MAX_STREAMS 8 -static cudaStream_t g_cudaStreams[GGML_CUDA_MAX_DEVICES][MAX_STREAMS] = { nullptr }; +static cudaStream_t g_cudaStreams[GGML_CUDA_MAX_DEVICES][MAX_STREAMS] = { { nullptr } }; struct ggml_tensor_extra_gpu { void * data_device[GGML_CUDA_MAX_DEVICES]; // 1 pointer for each device for split tensors @@ -500,40 +504,112 @@ static size_t g_scratch_offset = 0; static cublasHandle_t g_cublas_handles[GGML_CUDA_MAX_DEVICES] = {nullptr}; -static __global__ void add_f32(const float * x, const float * y, float * dst, const int kx, const int ky) { - const int i = blockDim.x*blockIdx.x + threadIdx.x; - - if (i >= kx) { - return; +static __device__ __forceinline__ float warp_reduce_sum(float x) { +#pragma unroll + for (int mask = 16; mask > 0; mask >>= 1) { + x += __shfl_xor_sync(0xffffffff, x, mask, 32); } - dst[i] = x[i] + y[i%ky]; + return x; } -static __global__ void add_f16_f32_f16(const half * x, const float * y, half * dst, const int k) { - const int i = blockDim.x*blockIdx.x + threadIdx.x; +static __device__ __forceinline__ float2 warp_reduce_sum(float2 a) { +#pragma unroll + for (int mask = 16; mask > 0; mask >>= 1) { + a.x += __shfl_xor_sync(0xffffffff, a.x, mask, 32); + a.y += __shfl_xor_sync(0xffffffff, a.y, mask, 32); + } + return a; +} - if (i >= k) { - return; +static __device__ __forceinline__ float warp_reduce_max(float x) { +#pragma unroll + for (int mask = 16; mask > 0; mask >>= 1) { + x = fmaxf(x, __shfl_xor_sync(0xffffffff, x, mask, 32)); } - dst[i] = __hadd(x[i], __float2half(y[i])); + return x; } -static __global__ void add_f16_f32_f32(const half * x, const float * y, float * dst, const int k) { - const int i = blockDim.x*blockIdx.x + threadIdx.x; +static __device__ __forceinline__ float op_repeat(const float a, const float b) { + return b; +} - if (i >= k) { +static __device__ __forceinline__ float op_add(const float a, const float b) { + return a + b; +} + +static __device__ __forceinline__ float op_mul(const float a, const float b) { + return a * b; +} + +static __device__ __forceinline__ float op_div(const float a, const float b) { + return a / b; +} + +template +static __global__ void k_bin_bcast(const src0_t * src0, const src1_t * src1, dst_t * dst, + int ne0, int ne1, int ne2, int ne3, + int ne10, int ne11, int ne12, int ne13, + /*int s0, */ int s1, int s2, int s3, + /*int s10,*/ int s11, int s12, int s13) { + const int i0s = blockDim.x*blockIdx.x + threadIdx.x; + const int i1 = (blockDim.y*blockIdx.y + threadIdx.y); + const int i2 = (blockDim.z*blockIdx.z + threadIdx.z) / ne3; + const int i3 = (blockDim.z*blockIdx.z + threadIdx.z) % ne3; + + if (i0s >= ne0 || i1 >= ne1 || i2 >= ne2 || i3 >= ne3) { return; } - dst[i] = __half2float(x[i]) + y[i]; + + const int i11 = i1 % ne11; + const int i12 = i2 % ne12; + const int i13 = i3 % ne13; + + const 
size_t i_src0 = i3*s3 + i2*s2 + i1*s1; + const size_t i_src1 = i13*s13 + i12*s12 + i11*s11; + const size_t i_dst = i_src0; + + const src0_t * src0_row = src0 + i_src0; + const src1_t * src1_row = src1 + i_src1; + dst_t * dst_row = dst + i_dst; + + for (int i0 = i0s; i0 < ne0; i0 += blockDim.x*gridDim.x) { + const int i10 = i0 % ne10; + dst_row[i0] = (dst_t)bin_op(src0 ? (float)src0_row[i0] : 0.0f, (float)src1_row[i10]); + } } -static __global__ void mul_f32(const float * x, const float * y, float * dst, const int kx, const int ky) { +template +static __global__ void k_bin_bcast_unravel(const src0_t * src0, const src1_t * src1, dst_t * dst, + int ne0, int ne1, int ne2, int ne3, + int ne10, int ne11, int ne12, int ne13, + /*int s0, */ int s1, int s2, int s3, + /*int s10,*/ int s11, int s12, int s13) { + const int i = blockDim.x*blockIdx.x + threadIdx.x; - if (i >= kx) { + const int i3 = i/(ne2*ne1*ne0); + const int i2 = (i/(ne1*ne0)) % ne2; + const int i1 = (i/ne0) % ne1; + const int i0 = i % ne0; + + if (i0 >= ne0 || i1 >= ne1 || i2 >= ne2 || i3 >= ne3) { return; } - dst[i] = x[i] * y[i%ky]; + + const int i11 = i1 % ne11; + const int i12 = i2 % ne12; + const int i13 = i3 % ne13; + + const size_t i_src0 = i3*s3 + i2*s2 + i1*s1; + const size_t i_src1 = i13*s13 + i12*s12 + i11*s11; + const size_t i_dst = i_src0; + + const src0_t * src0_row = src0 + i_src0; + const src1_t * src1_row = src1 + i_src1; + dst_t * dst_row = dst + i_dst; + + const int i10 = i0 % ne10; + dst_row[i0] = (dst_t)bin_op(src0 ? (float)src0_row[i0] : 0.0f, (float)src1_row[i10]); } static __global__ void gelu_f32(const float * x, float * dst, const int k) { @@ -576,22 +652,11 @@ static __global__ void sqr_f32(const float * x, float * dst, const int k) { dst[i] = x[i] * x[i]; } -static __device__ __forceinline__ float2 warp_reduce_sum(float2 a) { -#pragma unroll - for (int mask = 16; mask > 0; mask >>= 1) { - a.x += __shfl_xor_sync(0xffffffff, a.x, mask, 32); - a.y += __shfl_xor_sync(0xffffffff, a.y, mask, 32); - } - return a; -} - template -static __global__ void norm_f32(const float * x, float * dst, const int ncols) { +static __global__ void norm_f32(const float * x, float * dst, const int ncols, const float eps) { const int row = blockIdx.x*blockDim.y + threadIdx.y; const int tid = threadIdx.x; - const float eps = 1e-5f; - float2 mean_var = make_float2(0.f, 0.f); for (int col = tid; col < ncols; col += block_size) { @@ -623,14 +688,6 @@ static __global__ void norm_f32(const float * x, float * dst, const int ncols) { } } -static __device__ __forceinline__ float warp_reduce_sum(float x) { -#pragma unroll - for (int mask = 16; mask > 0; mask >>= 1) { - x += __shfl_xor_sync(0xffffffff, x, mask, 32); - } - return x; -} - template static __global__ void rms_norm_f32(const float * x, float * dst, const int ncols, const float eps) { const int row = blockIdx.x*blockDim.y + threadIdx.y; @@ -1629,31 +1686,65 @@ static __global__ void quantize_q8_1(const float * __restrict__ x, void * __rest } template -static __global__ void k_get_rows(const void * x, const int32_t * y, dst_t * dst, const int ncols) { - const int col = (blockIdx.x*blockDim.x + threadIdx.x)*2; - const int row = blockDim.y*blockIdx.y + threadIdx.y; - - if (col >= ncols) { +static __global__ void k_get_rows( + const void * src0, const int32_t * src1, dst_t * dst, + int64_t ne00, /*int64_t ne01, int64_t ne02, int64_t ne03,*/ + /*int64_t ne10, int64_t ne11,*/ int64_t ne12, /*int64_t ne13,*/ + /*size_t s0,*/ size_t s1, size_t s2, size_t s3, + /*size_t nb00,*/ size_t nb01, 
size_t nb02, size_t nb03, + size_t s10, size_t s11, size_t s12/*, size_t s13*/) { + + const int i00 = (blockIdx.x*blockDim.x + threadIdx.x)*2; + const int i10 = blockDim.y*blockIdx.y + threadIdx.y; + const int i11 = (blockIdx.z*blockDim.z + threadIdx.z)/ne12; + const int i12 = (blockIdx.z*blockDim.z + threadIdx.z)%ne12; + + if (i00 >= ne00) { return; } - const int r = y[row]; + const int i01 = src1[i10*s10 + i11*s11 + i12*s12]; - // copy x[r*ncols + col] to dst[row*ncols + col] - const int xi = r*ncols + col; - const int di = row*ncols + col; + dst_t * dst_row = dst + i10*s1 + i11*s2 + i12*s3; + const void * src0_row = (const char *)src0 + i01*nb01 + i11*nb02 + i12*nb03; - const int ib = xi/qk; // block index - const int iqs = (xi%qk)/qr; // quant index - const int iybs = di - di%qk; // y block start index + const int ib = i00/qk; // block index + const int iqs = (i00%qk)/qr; // quant index + const int iybs = i00 - i00%qk; // dst block start index const int y_offset = qr == 1 ? 1 : qk/2; // dequantize dfloat2 v; - dequantize_kernel(x, ib, iqs, v); + dequantize_kernel(src0_row, ib, iqs, v); + + dst_row[iybs + iqs + 0] = v.x; + dst_row[iybs + iqs + y_offset] = v.y; +} + +template +static __global__ void k_get_rows_float( + const src0_t * src0, const int32_t * src1, dst_t * dst, + int64_t ne00, /*int64_t ne01, int64_t ne02, int64_t ne03,*/ + /*int64_t ne10, int64_t ne11,*/ int64_t ne12, /*int64_t ne13,*/ + /*size_t s0,*/ size_t s1, size_t s2, size_t s3, + /*size_t nb00,*/ size_t nb01, size_t nb02, size_t nb03, + size_t s10, size_t s11, size_t s12/*, size_t s13*/) { - dst[iybs + iqs + 0] = v.x; - dst[iybs + iqs + y_offset] = v.y; + const int i00 = blockIdx.x*blockDim.x + threadIdx.x; + const int i10 = blockDim.y*blockIdx.y + threadIdx.y; + const int i11 = (blockIdx.z*blockDim.z + threadIdx.z)/ne12; + const int i12 = (blockIdx.z*blockDim.z + threadIdx.z)%ne12; + + if (i00 >= ne00) { + return; + } + + const int i01 = src1[i10*s10 + i11*s11 + i12*s12]; + + dst_t * dst_row = dst + i10*s1 + i11*s2 + i12*s3; + const src0_t * src0_row = (const src0_t *)((const char *)src0 + i01*nb01 + i11*nb02 + i12*nb03); + + dst_row[i00] = src0_row[i00]; } template @@ -2248,6 +2339,7 @@ static __device__ __forceinline__ float vec_dot_q4_0_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; (void)x_sc; __shared__ int tile_x_qs[mmq_y * (WARP_SIZE) + mmq_y]; __shared__ float tile_x_d[mmq_y * (WARP_SIZE/QI4_0) + mmq_y/QI4_0]; @@ -2259,7 +2351,7 @@ template static __device__ __forceinline__ void allocate_tiles_q4_0( template static __device__ __forceinline__ void load_tiles_q4_0( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { - + (void)x_qh; (void)x_sc; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); GGML_CUDA_ASSUME(k >= 0); @@ -2268,7 +2360,7 @@ template static __device__ __forceinlin const int kbx = k / QI4_0; const int kqsx = k % QI4_0; - const block_q4_0 * bx0 = (block_q4_0 *) vx; + const block_q4_0 * bx0 = (const block_q4_0 *) vx; float * x_dmf = (float *) x_dm; @@ -2306,9 +2398,10 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q4_0_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * 
__restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; (void)x_sc; const int kyqs = k % (QI8_1/2) + QI8_1 * (k / (QI8_1/2)); - const float * x_dmf = (float *) x_dm; + const float * x_dmf = (const float *) x_dm; int u[2*VDR_Q4_0_Q8_1_MMQ]; @@ -2342,6 +2435,7 @@ static __device__ __forceinline__ float vec_dot_q4_1_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; (void)x_sc; __shared__ int tile_x_qs[mmq_y * (WARP_SIZE) + + mmq_y]; __shared__ half2 tile_x_dm[mmq_y * (WARP_SIZE/QI4_1) + mmq_y/QI4_1]; @@ -2353,6 +2447,7 @@ template static __device__ __forceinline__ void allocate_tiles_q4_1( template static __device__ __forceinline__ void load_tiles_q4_1( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; (void)x_sc; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -2362,7 +2457,7 @@ template static __device__ __forceinlin const int kbx = k / QI4_1; const int kqsx = k % QI4_1; - const block_q4_1 * bx0 = (block_q4_1 *) vx; + const block_q4_1 * bx0 = (const block_q4_1 *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -2397,6 +2492,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q4_1_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; (void)x_sc; const int kyqs = k % (QI8_1/2) + QI8_1 * (k / (QI8_1/2)); @@ -2434,6 +2530,7 @@ static __device__ __forceinline__ float vec_dot_q5_0_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; (void)x_sc; __shared__ int tile_x_ql[mmq_y * (2*WARP_SIZE) + mmq_y]; __shared__ float tile_x_d[mmq_y * (WARP_SIZE/QI5_0) + mmq_y/QI5_0]; @@ -2445,6 +2542,7 @@ template static __device__ __forceinline__ void allocate_tiles_q5_0( template static __device__ __forceinline__ void load_tiles_q5_0( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; (void)x_sc; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -2454,7 +2552,7 @@ template static __device__ __forceinlin const int kbx = k / QI5_0; const int kqsx = k % QI5_0; - const block_q5_0 * bx0 = (block_q5_0 *) vx; + const block_q5_0 * bx0 = (const block_q5_0 *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -2509,6 +2607,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q5_0_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; (void)x_sc; const int kyqs = k % (QI8_1/2) + QI8_1 * (k / (QI8_1/2)); const int index_bx = i * (WARP_SIZE/QI5_0) + i/QI5_0 + k/QI5_0; @@ -2548,6 +2647,7 @@ static __device__ __forceinline__ float vec_dot_q5_1_q8_1( } 
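
// The "(void)x_qh; (void)x_sc;" lines added throughout these hunks silence
// unused-parameter warnings for tile loaders that ignore some of the shared
// memory pointers. A minimal C illustration of the same idiom (hypothetical
// function, not part of the patch):
static int example_only_uses_a(int a, int * scratch) {
    (void)scratch; // deliberately unused in this variant; the cast avoids -Wunused-parameter
    return a * 2;
}
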
template static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; (void)x_sc; __shared__ int tile_x_ql[mmq_y * (2*WARP_SIZE) + mmq_y]; __shared__ half2 tile_x_dm[mmq_y * (WARP_SIZE/QI5_1) + mmq_y/QI5_1]; @@ -2559,6 +2659,7 @@ template static __device__ __forceinline__ void allocate_tiles_q5_1( template static __device__ __forceinline__ void load_tiles_q5_1( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; (void)x_sc; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -2568,7 +2669,7 @@ template static __device__ __forceinlin const int kbx = k / QI5_1; const int kqsx = k % QI5_1; - const block_q5_1 * bx0 = (block_q5_1 *) vx; + const block_q5_1 * bx0 = (const block_q5_1 *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -2620,6 +2721,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q5_1_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; (void)x_sc; const int kyqs = k % (QI8_1/2) + QI8_1 * (k / (QI8_1/2)); const int index_bx = i * (WARP_SIZE/QI5_1) + + i/QI5_1 + k/QI5_1; @@ -2654,6 +2756,7 @@ static __device__ __forceinline__ float vec_dot_q8_0_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; (void)x_sc; __shared__ int tile_x_qs[mmq_y * (WARP_SIZE) + mmq_y]; __shared__ float tile_x_d[mmq_y * (WARP_SIZE/QI8_0) + mmq_y/QI8_0]; @@ -2665,6 +2768,7 @@ template static __device__ __forceinline__ void allocate_tiles_q8_0( template static __device__ __forceinline__ void load_tiles_q8_0( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; (void)x_sc; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -2675,7 +2779,7 @@ template static __device__ __forceinlin const int kqsx = k % QI8_0; float * x_dmf = (float *) x_dm; - const block_q8_0 * bx0 = (block_q8_0 *) vx; + const block_q8_0 * bx0 = (const block_q8_0 *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -2710,6 +2814,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q8_0_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; (void)x_sc; const float * x_dmf = (const float *) x_dm; const float * y_df = (const float *) y_ds; @@ -2743,6 +2848,7 @@ static __device__ __forceinline__ float vec_dot_q2_K_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; __shared__ int tile_x_ql[mmq_y * (WARP_SIZE) + mmq_y]; __shared__ half2 tile_x_dm[mmq_y * (WARP_SIZE/QI2_K) + mmq_y/QI2_K]; @@ -2756,6 +2862,7 @@ template static __device__ __forceinline__ void 
allocate_tiles_q2_K( template static __device__ __forceinline__ void load_tiles_q2_K( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -2765,7 +2872,7 @@ template static __device__ __forceinlin const int kbx = k / QI2_K; const int kqsx = k % QI2_K; - const block_q2_K * bx0 = (block_q2_K *) vx; + const block_q2_K * bx0 = (const block_q2_K *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -2813,6 +2920,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q2_K_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; const int kbx = k / QI2_K; const int ky = (k % QI2_K) * QR2_K; @@ -2886,7 +2994,7 @@ template static __device__ __forceinlin const int kbx = k / QI3_K; const int kqsx = k % QI3_K; - const block_q3_K * bx0 = (block_q3_K *) vx; + const block_q3_K * bx0 = (const block_q3_K *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -2967,7 +3075,7 @@ static __device__ __forceinline__ float vec_dot_q3_K_q8_1_mul_mat( const float * x_dmf = (const float *) x_dm; const float * y_df = (const float *) y_ds; - const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4; + const int8_t * scales = ((const int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4; int v[QR3_K*VDR_Q3_K_Q8_1_MMQ]; @@ -3082,6 +3190,7 @@ static __device__ __forceinline__ float vec_dot_q4_K_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; __shared__ int tile_x_ql[mmq_y * (WARP_SIZE) + mmq_y]; __shared__ half2 tile_x_dm[mmq_y * (WARP_SIZE/QI4_K) + mmq_y/QI4_K]; @@ -3095,6 +3204,7 @@ template static __device__ __forceinline__ void allocate_tiles_q4_K( template static __device__ __forceinline__ void load_tiles_q4_K( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -3104,7 +3214,7 @@ template static __device__ __forceinlin const int kbx = k / QI4_K; // == 0 if QK_K == 256 const int kqsx = k % QI4_K; // == k if QK_K == 256 - const block_q4_K * bx0 = (block_q4_K *) vx; + const block_q4_K * bx0 = (const block_q4_K *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -3149,7 +3259,7 @@ template static __device__ __forceinlin const block_q4_K * bxi = bx0 + i*blocks_per_row + (k % (WARP_SIZE/8)) / (QI4_K/8); - const int * scales = (int *) bxi->scales; + const int * scales = (const int *) bxi->scales; const int ksc = k % (WARP_SIZE/8); @@ -3164,6 +3274,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q4_K_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; const 
uint8_t * sc = ((const uint8_t *) &x_sc[i * (WARP_SIZE/8) + i/8 + k/16]) + 2*((k % 16) / 8); @@ -3263,6 +3374,7 @@ static __device__ __forceinline__ float vec_dot_q5_K_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; __shared__ int tile_x_ql[mmq_y * (2*WARP_SIZE) + mmq_y]; __shared__ half2 tile_x_dm[mmq_y * (WARP_SIZE/QI5_K) + mmq_y/QI5_K]; @@ -3276,6 +3388,7 @@ template static __device__ __forceinline__ void allocate_tiles_q5_K( template static __device__ __forceinline__ void load_tiles_q5_K( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -3285,7 +3398,7 @@ template static __device__ __forceinlin const int kbx = k / QI5_K; // == 0 if QK_K == 256 const int kqsx = k % QI5_K; // == k if QK_K == 256 - const block_q5_K * bx0 = (block_q5_K *) vx; + const block_q5_K * bx0 = (const block_q5_K *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -3341,7 +3454,7 @@ template static __device__ __forceinlin const block_q5_K * bxi = bx0 + i*blocks_per_row + (k % (WARP_SIZE/8)) / (QI5_K/8); - const int * scales = (int *) bxi->scales; + const int * scales = (const int *) bxi->scales; const int ksc = k % (WARP_SIZE/8); @@ -3356,6 +3469,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q5_K_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; const uint8_t * sc = ((const uint8_t *) &x_sc[i * (WARP_SIZE/8) + i/8 + k/16]) + 2 * ((k % 16) / 8); @@ -3392,6 +3506,7 @@ static __device__ __forceinline__ float vec_dot_q6_K_q8_1( } template static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) { + (void)x_qh; __shared__ int tile_x_ql[mmq_y * (2*WARP_SIZE) + mmq_y]; __shared__ half2 tile_x_dm[mmq_y * (WARP_SIZE/QI6_K) + mmq_y/QI6_K]; @@ -3405,6 +3520,7 @@ template static __device__ __forceinline__ void allocate_tiles_q6_K( template static __device__ __forceinline__ void load_tiles_q6_K( const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh, int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) { + (void)x_qh; GGML_CUDA_ASSUME(i_offset >= 0); GGML_CUDA_ASSUME(i_offset < nwarps); @@ -3414,7 +3530,7 @@ template static __device__ __forceinlin const int kbx = k / QI6_K; // == 0 if QK_K == 256 const int kqsx = k % QI6_K; // == k if QK_K == 256 - const block_q6_K * bx0 = (block_q6_K *) vx; + const block_q6_K * bx0 = (const block_q6_K *) vx; #pragma unroll for (int i0 = 0; i0 < mmq_y; i0 += nwarps) { @@ -3476,6 +3592,7 @@ template static __device__ __forceinlin static __device__ __forceinline__ float vec_dot_q6_K_q8_1_mul_mat( const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc, const int * __restrict__ y_qs, const half2 * __restrict__ y_ds, const int & i, const int & j, const int & k) { + (void)x_qh; const float * x_dmf = (const float *) x_dm; const float * y_df = (const 
float *) y_ds;
@@ -3518,7 +3635,7 @@ static __device__ __forceinline__ void mul_mat_q(
     __shared__ int    tile_y_qs[mmq_x * WARP_SIZE];
     __shared__ half2  tile_y_ds[mmq_x * WARP_SIZE/QI8_1];
 
-    float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {0.0f};
+    float sum[mmq_y/WARP_SIZE][mmq_x/nwarps] = {{0.0f}};
 
     for (int ib0 = 0; ib0 < blocks_per_row_x; ib0 += blocks_per_warp) {
 
@@ -4523,6 +4640,116 @@ static __global__ void cpy_f32_f16(const char * cx, char * cdst, const int ne,
     cpy_1(cx + x_offset, cdst + dst_offset);
 }
 
+static __device__ void cpy_blck_f32_q8_0(const char * cxi, char * cdsti) {
+    const float * xi = (const float *) cxi;
+    block_q8_0 * dsti = (block_q8_0 *) cdsti;
+
+    float amax = 0.0f; // absolute max
+
+    for (int j = 0; j < QK8_0; j++) {
+        const float v = xi[j];
+        amax = fmaxf(amax, fabsf(v));
+    }
+
+    const float d = amax / ((1 << 7) - 1);
+    const float id = d ? 1.0f/d : 0.0f;
+
+    dsti->d = d;
+
+    for (int j = 0; j < QK8_0; ++j) {
+        const float x0 = xi[j]*id;
+
+        dsti->qs[j] = roundf(x0);
+    }
+}
+
+static __device__ void cpy_blck_f32_q4_0(const char * cxi, char * cdsti) {
+    const float * xi = (const float *) cxi;
+    block_q4_0 * dsti = (block_q4_0 *) cdsti;
+
+    float amax = 0.0f;
+    float vmax = 0.0f;
+
+    for (int j = 0; j < QK4_0; ++j) {
+        const float v = xi[j];
+        if (amax < fabsf(v)) {
+            amax = fabsf(v);
+            vmax = v;
+        }
+    }
+
+    const float d  = vmax / -8;
+    const float id = d ? 1.0f/d : 0.0f;
+
+    dsti->d = d;
+
+    for (int j = 0; j < QK4_0/2; ++j) {
+        const float x0 = xi[0       + j]*id;
+        const float x1 = xi[QK4_0/2 + j]*id;
+
+        const uint8_t xi0 = min(15, (int8_t)(x0 + 8.5f));
+        const uint8_t xi1 = min(15, (int8_t)(x1 + 8.5f));
+
+        dsti->qs[j]  = xi0;
+        dsti->qs[j] |= xi1 << 4;
+    }
+}
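
// A host-side C sketch, not part of the patch, of the Q4_0 mapping used by
// cpy_blck_f32_q4_0 above: the max-magnitude value is pinned to quant level -8,
// each x is stored as (int)(x/d + 8.5) clamped to [0,15]; QK4_0 is assumed 32.
static void example_q4_0_roundtrip(const float * x, float * out) {
    float amax = 0.0f, vmax = 0.0f;
    for (int j = 0; j < 32; ++j) {
        if (amax < fabsf(x[j])) { amax = fabsf(x[j]); vmax = x[j]; }
    }
    const float d  = vmax / -8;
    const float id = d ? 1.0f/d : 0.0f;
    for (int j = 0; j < 32; ++j) {
        int q = (int)(x[j]*id + 8.5f); // same rounding as the kernel
        if (q > 15) q = 15;
        out[j] = (q - 8) * d;          // dequantized value
    }
}
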
1.0f/d : 0.0f; + + dsti->dm.x = d; + dsti->dm.y = vmin; + + for (int j = 0; j < QK4_1/2; ++j) { + const float x0 = (xi[0 + j] - vmin)*id; + const float x1 = (xi[QK4_1/2 + j] - vmin)*id; + + const uint8_t xi0 = min(15, (int8_t)(x0 + 0.5f)); + const uint8_t xi1 = min(15, (int8_t)(x1 + 0.5f)); + + dsti->qs[j] = xi0; + dsti->qs[j] |= xi1 << 4; + } +} + +template +static __global__ void cpy_f32_q(const char * cx, char * cdst, const int ne, + const int ne00, const int ne01, const int nb00, const int nb01, const int nb02, + const int ne10, const int ne11, const int nb10, const int nb11, const int nb12) { + const int i = (blockDim.x*blockIdx.x + threadIdx.x)*qk; + + if (i >= ne) { + return; + } + + const int i02 = i / (ne00*ne01); + const int i01 = (i - i02*ne01*ne00) / ne00; + const int i00 = (i - i02*ne01*ne00 - i01*ne00); + const int x_offset = i00*nb00 + i01*nb01 + i02*nb02; + + const int i12 = i / (ne10*ne11); + const int i11 = (i - i12*ne10*ne11) / ne10; + const int i10 = (i - i12*ne10*ne11 - i11*ne10)/qk; + const int dst_offset = i10*nb10 + i11*nb11 + i12*nb12; + + cpy_blck(cx + x_offset, cdst + dst_offset); +} + static __device__ float rope_yarn_ramp(const float low, const float high, const int i0) { const float y = (i0 / 2 - low) / max(0.001f, high - low); return 1.0f - min(1.0f, max(0.0f, y)); @@ -4583,8 +4810,8 @@ static __global__ void rope( template static __global__ void rope_neox( - const T * x, T * dst, int ncols, const int32_t * pos, float freq_scale, int p_delta_rows, float freq_base, - float ext_factor, float attn_factor, rope_corr_dims corr_dims + const T * x, T * dst, int ncols, int n_dims, const int32_t * pos, float freq_scale, int p_delta_rows, + float ext_factor, float attn_factor, rope_corr_dims corr_dims, float theta_scale, float inv_ndims ) { const int col = 2*(blockDim.y*blockIdx.y + threadIdx.y); @@ -4593,23 +4820,25 @@ static __global__ void rope_neox( } const int row = blockDim.x*blockIdx.x + threadIdx.x; - const int i = row*ncols + col/2; + const int ib = col / n_dims; + const int ic = col % n_dims; + + const int i = row*ncols + ib*n_dims + ic/2; const int i2 = row/p_delta_rows; - // simplified from `(ib * ncols + col) * (-1 / ncols)`, where ib is assumed to be zero - const float cur_rot = -float(col)/ncols; + float cur_rot = inv_ndims * ic - ib; const int p = has_pos ? 
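// The reworked indexing generalizes NeoX RoPE to n_dims <= ncols ("partial"
// rotary): each row is viewed as blocks of n_dims elements, ib = col/n_dims
// selects the block and ic = col % n_dims the offset inside it, and the
// rotated pair becomes (x[i], x[i + n_dims/2]) instead of (x[i], x[i + ncols/2]):
//   x0' = x0*cos(theta) - x1*sin(theta)
//   x1' = x0*sin(theta) + x1*cos(theta)
// theta decays as theta_scale^(col/2) with theta_scale = freq_base^(-2/n_dims),
// precomputed on the host in rope_neox_cuda further down.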
pos[i2] : 0; - const float theta_base = p*powf(freq_base, cur_rot); + const float theta_base = p*freq_scale*powf(theta_scale, col/2.0f); float cos_theta, sin_theta; rope_yarn(theta_base, freq_scale, corr_dims, cur_rot, ext_factor, attn_factor, &cos_theta, &sin_theta); const float x0 = x[i + 0]; - const float x1 = x[i + ncols/2]; + const float x1 = x[i + n_dims/2]; - dst[i + 0] = x0*cos_theta - x1*sin_theta; - dst[i + ncols/2] = x0*sin_theta + x1*cos_theta; + dst[i + 0] = x0*cos_theta - x1*sin_theta; + dst[i + n_dims/2] = x0*sin_theta + x1*cos_theta; } static __global__ void rope_glm_f32( @@ -4675,6 +4904,65 @@ static __global__ void alibi_f32(const float * x, float * dst, const int ncols, dst[i] = col * m_k + x[i]; } +static __global__ void k_sum_rows_f32(const float * x, float * dst, const int ncols) { + const int row = blockIdx.y; + const int col = threadIdx.x; + + float sum = 0.0f; + for (int i = col; i < ncols; i += blockDim.x) { + sum += x[row * ncols + i]; + } + + sum = warp_reduce_sum(sum); + + if (col == 0) { + dst[row] = sum; + } +} + +template +static inline __device__ void swap(T & a, T & b) { + T tmp = a; + a = b; + b = tmp; +} + +template +static __global__ void k_argsort_f32_i32(const float * x, int * dst, const int ncols) { + // bitonic sort + int col = threadIdx.x; + int row = blockIdx.y; + + if (col >= ncols) return; + + const float * x_row = x + row * ncols; + int * dst_row = dst + row * ncols; + + // initialize indices + if (col < ncols) { + dst_row[col] = col; + } + __syncthreads(); + + for (int k = 2; k <= ncols; k *= 2) { + for (int j = k / 2; j > 0; j /= 2) { + int ixj = col ^ j; + if (ixj > col) { + if ((col & k) == 0) { + if (order == GGML_SORT_ASC ? x_row[dst_row[col]] > x_row[dst_row[ixj]] : x_row[dst_row[col]] < x_row[dst_row[ixj]]) { + swap(dst_row[col], dst_row[ixj]); + } + } else { + if (order == GGML_SORT_ASC ? x_row[dst_row[col]] < x_row[dst_row[ixj]] : x_row[dst_row[col]] > x_row[dst_row[ixj]]) { + swap(dst_row[col], dst_row[ixj]); + } + } + } + __syncthreads(); + } + } +} + static __global__ void diag_mask_inf_f32(const float * x, float * dst, const int ncols, const int rows_per_channel, const int n_past) { const int col = blockDim.y*blockIdx.y + threadIdx.y; const int row = blockDim.x*blockIdx.x + threadIdx.x; @@ -4684,49 +4972,79 @@ static __global__ void diag_mask_inf_f32(const float * x, float * dst, const int } const int i = row*ncols + col; - // dst[i] = col > n_past + row ? -INFINITY : x[i]; - dst[i] = x[i] - (col > n_past + row % rows_per_channel) * INT_MAX; // equivalent within rounding error but slightly faster on GPU + //dst[i] = col > (n_past + row % rows_per_channel) ? 
-INFINITY : x[i]; + //dst[i] = x[i] - (col > n_past + row % rows_per_channel) * INT_MAX; // equivalent within rounding error but slightly faster on GPU + dst[i] = x[i] - (col > n_past + row % rows_per_channel) * FLT_MAX; } -// the CUDA soft max implementation differs from the CPU implementation -// instead of doubles floats are used -static __global__ void soft_max_f32(const float * x, float * dst, const int ncols) { - const int row = blockDim.x*blockIdx.x + threadIdx.x; - const int block_size = blockDim.y; - const int tid = threadIdx.y; +static __global__ void soft_max_f32(const float * x, const float * y, float * dst, const int ncols, const int nrows_y, const float scale) { + const int tid = threadIdx.x; + const int rowx = blockIdx.x; + const int rowy = rowx % nrows_y; // broadcast the mask (y) in the row dimension + + const int block_size = blockDim.x; + + const int warp_id = threadIdx.x / WARP_SIZE; + const int lane_id = threadIdx.x % WARP_SIZE; + + __shared__ float buf[CUDA_SOFT_MAX_BLOCK_SIZE/WARP_SIZE]; float max_val = -INFINITY; for (int col = tid; col < ncols; col += block_size) { - const int i = row*ncols + col; - max_val = max(max_val, x[i]); + const int ix = rowx*ncols + col; + const int iy = rowy*ncols + col; + max_val = max(max_val, x[ix]*scale + (y ? y[iy] : 0.0f)); } // find the max value in the block -#pragma unroll - for (int mask = 16; mask > 0; mask >>= 1) { - max_val = max(max_val, __shfl_xor_sync(0xffffffff, max_val, mask, 32)); + max_val = warp_reduce_max(max_val); + if (block_size > WARP_SIZE) { + if (warp_id == 0) { + buf[lane_id] = -INFINITY; + } + __syncthreads(); + + if (lane_id == 0) { + buf[warp_id] = max_val; + } + __syncthreads(); + + max_val = buf[lane_id]; + max_val = warp_reduce_max(max_val); } float tmp = 0.f; for (int col = tid; col < ncols; col += block_size) { - const int i = row*ncols + col; - const float val = expf(x[i] - max_val); + const int ix = rowx*ncols + col; + const int iy = rowy*ncols + col; + const float val = expf((x[ix]*scale + (y ? 
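+// The same two-level reduction used for max_val above is applied to the sum
+// of exponentials below: each warp reduces its partial value with shuffles
+// (warp_reduce_sum), lane 0 of every warp parks the result in the shared
+// buf[], and after __syncthreads() all threads reload buf[lane_id] and
+// warp-reduce once more. This supports up to WARP_SIZE warps per block, i.e.
+// CUDA_SOFT_MAX_BLOCK_SIZE threads. Fused into the same pass: the optional
+// mask row y (broadcast via rowy = rowx % nrows_y) and the scale factor the
+// caller reads out of dst->op_params.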
y[iy] : 0.0f)) - max_val); tmp += val; - dst[i] = val; + dst[ix] = val; } - // sum up partial sums -#pragma unroll - for (int mask = 16; mask > 0; mask >>= 1) { - tmp += __shfl_xor_sync(0xffffffff, tmp, mask, 32); + // find the sum of exps in the block + tmp = warp_reduce_sum(tmp); + if (block_size > WARP_SIZE) { + if (warp_id == 0) { + buf[lane_id] = 0.f; + } + __syncthreads(); + + if (lane_id == 0) { + buf[warp_id] = tmp; + } + __syncthreads(); + + tmp = buf[lane_id]; + tmp = warp_reduce_sum(tmp); } const float inv_tmp = 1.f / tmp; for (int col = tid; col < ncols; col += block_size) { - const int i = row*ncols + col; + const int i = rowx*ncols + col; dst[i] *= inv_tmp; } } @@ -4771,33 +5089,186 @@ static __global__ void im2col_f32_f16( } template -static void get_rows_cuda(const void * x, const int32_t * y, float * dst, const int nrows, const int ncols, cudaStream_t stream) { +static void get_rows_cuda(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, + const void * src0_dd, const int32_t * src1_dd, float * dst_dd, cudaStream_t stream) { + + GGML_TENSOR_BINARY_OP_LOCALS + const dim3 block_dims(CUDA_GET_ROWS_BLOCK_SIZE, 1, 1); - const int block_num_x = (ncols + 2*CUDA_GET_ROWS_BLOCK_SIZE - 1) / (2*CUDA_GET_ROWS_BLOCK_SIZE); - const dim3 block_nums(block_num_x, nrows, 1); - k_get_rows<<>>(x, y, dst, ncols); -} + const int block_num_x = (ne00 + 2*CUDA_GET_ROWS_BLOCK_SIZE - 1) / (2*CUDA_GET_ROWS_BLOCK_SIZE); + const dim3 block_nums(block_num_x, ne10, ne11*ne12); + + // strides in elements + //const size_t s0 = nb0 / ggml_element_size(dst); + const size_t s1 = nb1 / ggml_element_size(dst); + const size_t s2 = nb2 / ggml_element_size(dst); + const size_t s3 = nb3 / ggml_element_size(dst); + + const size_t s10 = nb10 / ggml_element_size(src1); + const size_t s11 = nb11 / ggml_element_size(src1); + const size_t s12 = nb12 / ggml_element_size(src1); + //const size_t s13 = nb13 / ggml_element_size(src1); + + GGML_ASSERT(ne00 % 2 == 0); + + k_get_rows<<>>( + src0_dd, src1_dd, dst_dd, + ne00, /*ne01, ne02, ne03,*/ + /*ne10, ne11,*/ ne12, /*ne13,*/ + /* s0,*/ s1, s2, s3, + /* nb00,*/ nb01, nb02, nb03, + s10, s11, s12/*, s13*/); -static void add_f32_cuda(const float * x, const float * y, float * dst, const int kx, const int ky, cudaStream_t stream) { - const int num_blocks = (kx + CUDA_ADD_BLOCK_SIZE - 1) / CUDA_ADD_BLOCK_SIZE; - add_f32<<>>(x, y, dst, kx, ky); + (void) dst; } -static void add_f16_f32_f16_cuda(const half * x, const float * y, half * dst, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_ADD_BLOCK_SIZE - 1) / CUDA_ADD_BLOCK_SIZE; - add_f16_f32_f16<<>>(x, y, dst, k); -} +template +static void get_rows_cuda_float(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, + const src0_t * src0_dd, const int32_t * src1_dd, float * dst_dd, cudaStream_t stream) { -static void add_f16_f32_f32_cuda(const half * x, const float * y, float * dst, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_ADD_BLOCK_SIZE - 1) / CUDA_ADD_BLOCK_SIZE; - add_f16_f32_f32<<>>(x, y, dst, k); -} + GGML_TENSOR_BINARY_OP_LOCALS + + const dim3 block_dims(CUDA_GET_ROWS_BLOCK_SIZE, 1, 1); + const int block_num_x = (ne00 + CUDA_GET_ROWS_BLOCK_SIZE - 1) / CUDA_GET_ROWS_BLOCK_SIZE; + const dim3 block_nums(block_num_x, ne10, ne11*ne12); + + // strides in elements + //const size_t s0 = nb0 / ggml_element_size(dst); + const size_t s1 = nb1 / ggml_element_size(dst); + const size_t s2 = nb2 / ggml_element_size(dst); + const size_t s3 = nb3 / 
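+// ggml strides (nb*) are byte strides; the get_rows kernels index typed
+// pointers, so they are converted to element strides here. Sketch for a
+// contiguous f32 dst with ne0 columns (illustrative):
+//   nb1 = ne0*sizeof(float) bytes -> s1 = nb1/ggml_element_size(dst) = ne0
+// This assumes every nb* is a multiple of the element size, which holds for
+// the layouts accepted here (nb[0] == type size is asserted by the caller).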
ggml_element_size(dst); + + const size_t s10 = nb10 / ggml_element_size(src1); + const size_t s11 = nb11 / ggml_element_size(src1); + const size_t s12 = nb12 / ggml_element_size(src1); + //const size_t s13 = nb13 / ggml_element_size(src1); + + k_get_rows_float<<>>( + src0_dd, src1_dd, dst_dd, + ne00, /*ne01, ne02, ne03,*/ + /*ne10, ne11,*/ ne12, /*ne13,*/ + /* s0,*/ s1, s2, s3, + /* nb00,*/ nb01, nb02, nb03, + s10, s11, s12/*, s13*/); -static void mul_f32_cuda(const float * x, const float * y, float * dst, const int kx, const int ky, cudaStream_t stream) { - const int num_blocks = (kx + CUDA_MUL_BLOCK_SIZE - 1) / CUDA_MUL_BLOCK_SIZE; - mul_f32<<>>(x, y, dst, kx, ky); + (void) dst; } +template +struct bin_bcast_cuda { + template + void operator()(const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst, + const src0_t * src0_dd, const src1_t * src1_dd, dst_t * dst_dd, + cudaStream_t stream) { + + GGML_TENSOR_BINARY_OP_LOCALS + + int nr0 = ne10/ne0; + int nr1 = ne11/ne1; + int nr2 = ne12/ne2; + int nr3 = ne13/ne3; + + int nr[4] = { nr0, nr1, nr2, nr3 }; + + // collapse dimensions until first broadcast dimension + int64_t cne0[] = {ne0, ne1, ne2, ne3}; + int64_t cne1[] = {ne10, ne11, ne12, ne13}; + size_t cnb0[] = {nb0, nb1, nb2, nb3}; + size_t cnb1[] = {nb10, nb11, nb12, nb13}; + auto collapse = [](int64_t cne[]) { + cne[0] *= cne[1]; + cne[1] = cne[2]; + cne[2] = cne[3]; + cne[3] = 1; + }; + + auto collapse_nb = [](size_t cnb[], int64_t cne[]) { + cnb[1] *= cne[1]; + cnb[2] *= cne[2]; + cnb[3] *= cne[3]; + }; + + for (int i = 0; i < 4; i++) { + if (nr[i] != 1) { + break; + } + if (i > 0) { + collapse_nb(cnb0, cne0); + collapse_nb(cnb1, cne1); + collapse(cne0); + collapse(cne1); + } + } + { + int64_t ne0 = cne0[0]; + int64_t ne1 = cne0[1]; + int64_t ne2 = cne0[2]; + int64_t ne3 = cne0[3]; + + int64_t ne10 = cne1[0]; + int64_t ne11 = cne1[1]; + int64_t ne12 = cne1[2]; + int64_t ne13 = cne1[3]; + + size_t nb0 = cnb0[0]; + size_t nb1 = cnb0[1]; + size_t nb2 = cnb0[2]; + size_t nb3 = cnb0[3]; + + size_t nb10 = cnb1[0]; + size_t nb11 = cnb1[1]; + size_t nb12 = cnb1[2]; + size_t nb13 = cnb1[3]; + + size_t s0 = nb0 / sizeof(src1_t); + size_t s1 = nb1 / sizeof(src1_t); + size_t s2 = nb2 / sizeof(src1_t); + size_t s3 = nb3 / sizeof(src1_t); + + size_t s10 = nb10 / sizeof(src1_t); + size_t s11 = nb11 / sizeof(src1_t); + size_t s12 = nb12 / sizeof(src1_t); + size_t s13 = nb13 / sizeof(src1_t); + + GGML_ASSERT(s0 == 1); + GGML_ASSERT(s10 == 1); + + const int block_size = 128; + + int64_t hne0 = std::max(ne0/2LL, 1LL); + + dim3 block_dims; + block_dims.x = std::min(hne0, block_size); + block_dims.y = std::min(ne1, block_size / block_dims.x); + block_dims.z = std::min(std::min(ne2*ne3, block_size / block_dims.x / block_dims.y), 64U); + + dim3 block_nums( + (hne0 + block_dims.x - 1) / block_dims.x, + (ne1 + block_dims.y - 1) / block_dims.y, + (ne2*ne3 + block_dims.z - 1) / block_dims.z + ); + + if (block_nums.z > 65535) { + // this is the maximum number of blocks in z direction, fallback to 1D grid kernel + int block_num = (ne0*ne1*ne2*ne3 + block_size - 1) / block_size; + k_bin_bcast_unravel<<>>( + src0_dd, src1_dd, dst_dd, + ne0, ne1, ne2, ne3, + ne10, ne11, ne12, ne13, + /* s0, */ s1, s2, s3, + /* s10, */ s11, s12, s13); + } else { + k_bin_bcast<<>>( + src0_dd, src1_dd, dst_dd, + ne0, ne1, ne2, ne3, + ne10, ne11, ne12, ne13, + /* s0, */ s1, s2, s3, + /* s10, */ s11, s12, s13); + } + } + } +}; + static void gelu_f32_cuda(const float * x, float * dst, const int k, 
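// Why bin_bcast_cuda needs the unravel fallback above: CUDA grids allow at
// most 65535 blocks in the y and z dimensions (only x extends to 2^31-1).
// After collapsing contiguous dimensions, ne2*ne3 can still exceed that limit
// in z, so the op is relaunched as a flat 1D grid and k_bin_bcast_unravel
// recomputes each element's 4D coordinates from its linear index, at the cost
// of a few extra integer divisions per thread.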
cudaStream_t stream) { const int num_blocks = (k + CUDA_GELU_BLOCK_SIZE - 1) / CUDA_GELU_BLOCK_SIZE; gelu_f32<<>>(x, dst, k); @@ -4818,14 +5289,14 @@ static void sqr_f32_cuda(const float * x, float * dst, const int k, cudaStream_t sqr_f32<<>>(x, dst, k); } -static void norm_f32_cuda(const float * x, float * dst, const int ncols, const int nrows, cudaStream_t stream) { +static void norm_f32_cuda(const float * x, float * dst, const int ncols, const int nrows, const float eps, cudaStream_t stream) { GGML_ASSERT(ncols % WARP_SIZE == 0); if (ncols < 1024) { const dim3 block_dims(WARP_SIZE, 1, 1); - norm_f32<<>>(x, dst, ncols); + norm_f32<<>>(x, dst, ncols, eps); } else { const dim3 block_dims(1024, 1, 1); - norm_f32<1024><<>>(x, dst, ncols); + norm_f32<1024><<>>(x, dst, ncols, eps); } } @@ -4847,34 +5318,10 @@ static void quantize_row_q8_1_cuda(const float * x, void * vy, const int kx, con quantize_q8_1<<>>(x, vy, kx, kx_padded); } -template -static void dequantize_row_q4_0_cuda(const void * vx, dst_t * y, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / CUDA_DEQUANTIZE_BLOCK_SIZE; - dequantize_block<<>>(vx, y, k); -} - -template -static void dequantize_row_q4_1_cuda(const void * vx, dst_t * y, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / CUDA_DEQUANTIZE_BLOCK_SIZE; - dequantize_block<<>>(vx, y, k); -} - -template -static void dequantize_row_q5_0_cuda(const void * vx, dst_t * y, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / CUDA_DEQUANTIZE_BLOCK_SIZE; - dequantize_block<<>>(vx, y, k); -} - -template -static void dequantize_row_q5_1_cuda(const void * vx, dst_t * y, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / CUDA_DEQUANTIZE_BLOCK_SIZE; - dequantize_block<<>>(vx, y, k); -} - -template -static void dequantize_row_q8_0_cuda(const void * vx, dst_t * y, const int k, cudaStream_t stream) { +template +static void dequantize_block_cuda(const void * __restrict__ vx, dst_t * __restrict__ y, const int k, cudaStream_t stream) { const int num_blocks = (k + CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / CUDA_DEQUANTIZE_BLOCK_SIZE; - dequantize_block<<>>(vx, y, k); + dequantize_block<<>>(vx, y, k); } template @@ -4923,17 +5370,75 @@ static void dequantize_row_q6_K_cuda(const void * vx, dst_t * y, const int k, cu #endif } -static void dequantize_mul_mat_vec_q4_0_cuda(const void * vx, const dfloat * y, float * dst, const int ncols, const int nrows, cudaStream_t stream) { - GGML_ASSERT(ncols % GGML_CUDA_DMMV_X == 0); - const int block_num_y = (nrows + GGML_CUDA_MMV_Y - 1) / GGML_CUDA_MMV_Y; - // the number of rows may exceed maximum grid size in the y or z dimensions, use the x dimension instead - const dim3 block_nums(block_num_y, 1, 1); - const dim3 block_dims(WARP_SIZE, GGML_CUDA_MMV_Y, 1); - dequantize_mul_mat_vec - <<>>(vx, y, dst, ncols, nrows); +static to_fp16_cuda_t ggml_get_to_fp16_cuda(ggml_type type) { + switch (type) { + case GGML_TYPE_Q4_0: + return dequantize_block_cuda; + case GGML_TYPE_Q4_1: + return dequantize_block_cuda; + case GGML_TYPE_Q5_0: + return dequantize_block_cuda; + case GGML_TYPE_Q5_1: + return dequantize_block_cuda; + case GGML_TYPE_Q8_0: + return dequantize_block_cuda; + case GGML_TYPE_Q2_K: + return dequantize_row_q2_K_cuda; + case GGML_TYPE_Q3_K: + return dequantize_row_q3_K_cuda; + case GGML_TYPE_Q4_K: + return dequantize_row_q4_K_cuda; + case GGML_TYPE_Q5_K: + return 
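+// The per-type dequantize_row_*_cuda wrappers collapse into the single
+// dequantize_block_cuda template, instantiated per type (for example
+// dequantize_block_cuda<QK4_0, QR4_0, dequantize_q4_0> for Q4_0). Both
+// dispatch tables return nullptr for unsupported types, and callers are
+// expected to check, e.g. GGML_ASSERT(to_fp16_cuda != nullptr).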
dequantize_row_q5_K_cuda; + case GGML_TYPE_Q6_K: + return dequantize_row_q6_K_cuda; + case GGML_TYPE_F32: + return dequantize_block_cuda<1, 1, convert_f32>; + default: + return nullptr; + } } -static void dequantize_mul_mat_vec_q4_1_cuda(const void * vx, const dfloat * y, float * dst, const int ncols, const int nrows, cudaStream_t stream) { +static to_fp32_cuda_t ggml_get_to_fp32_cuda(ggml_type type) { + switch (type) { + case GGML_TYPE_Q4_0: + return dequantize_block_cuda; + case GGML_TYPE_Q4_1: + return dequantize_block_cuda; + case GGML_TYPE_Q5_0: + return dequantize_block_cuda; + case GGML_TYPE_Q5_1: + return dequantize_block_cuda; + case GGML_TYPE_Q8_0: + return dequantize_block_cuda; + case GGML_TYPE_Q2_K: + return dequantize_row_q2_K_cuda; + case GGML_TYPE_Q3_K: + return dequantize_row_q3_K_cuda; + case GGML_TYPE_Q4_K: + return dequantize_row_q4_K_cuda; + case GGML_TYPE_Q5_K: + return dequantize_row_q5_K_cuda; + case GGML_TYPE_Q6_K: + return dequantize_row_q6_K_cuda; + case GGML_TYPE_F16: + return dequantize_block_cuda<1, 1, convert_f16>; + default: + return nullptr; + } +} + +static void dequantize_mul_mat_vec_q4_0_cuda(const void * vx, const dfloat * y, float * dst, const int ncols, const int nrows, cudaStream_t stream) { + GGML_ASSERT(ncols % GGML_CUDA_DMMV_X == 0); + const int block_num_y = (nrows + GGML_CUDA_MMV_Y - 1) / GGML_CUDA_MMV_Y; + // the number of rows may exceed maximum grid size in the y or z dimensions, use the x dimension instead + const dim3 block_nums(block_num_y, 1, 1); + const dim3 block_dims(WARP_SIZE, GGML_CUDA_MMV_Y, 1); + dequantize_mul_mat_vec + <<>>(vx, y, dst, ncols, nrows); +} + +static void dequantize_mul_mat_vec_q4_1_cuda(const void * vx, const dfloat * y, float * dst, const int ncols, const int nrows, cudaStream_t stream) { GGML_ASSERT(ncols % GGML_CUDA_DMMV_X == 0); const int block_num_y = (nrows + GGML_CUDA_MMV_Y - 1) / GGML_CUDA_MMV_Y; const dim3 block_nums(block_num_y, 1, 1); @@ -5011,6 +5516,15 @@ static void dequantize_mul_mat_vec_q6_K_cuda(const void * vx, const float * y, f dequantize_mul_mat_vec_q6_k<<>>(vx, y, dst, ncols, nrows); } +static void convert_mul_mat_vec_f16_cuda(const void * vx, const dfloat * y, float * dst, const int ncols, const int nrows, cudaStream_t stream) { + GGML_ASSERT(ncols % GGML_CUDA_DMMV_X == 0); + const int block_num_y = (nrows + GGML_CUDA_MMV_Y - 1) / GGML_CUDA_MMV_Y; + const dim3 block_nums(block_num_y, 1, 1); + const dim3 block_dims(WARP_SIZE, GGML_CUDA_MMV_Y, 1); + dequantize_mul_mat_vec<1, 1, convert_f16> + <<>>(vx, y, dst, ncols, nrows); +} + static void mul_mat_vec_q4_0_q8_1_cuda(const void * vx, const void * vy, float * dst, const int ncols, const int nrows, cudaStream_t stream) { GGML_ASSERT(ncols % QK4_0 == 0); const int block_num_y = (nrows + GGML_CUDA_MMV_Y - 1) / GGML_CUDA_MMV_Y; @@ -5101,83 +5615,6 @@ static void mul_mat_vec_q6_K_q8_1_cuda(const void * vx, const void * vy, float * <<>>(vx, vy, dst, ncols, nrows); } -static void convert_fp16_to_fp32_cuda(const void * vx, float * y, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_DEQUANTIZE_BLOCK_SIZE - 1) / CUDA_DEQUANTIZE_BLOCK_SIZE; - dequantize_block<1, 1, convert_f16><<>>(vx, y, k); -} - -static void convert_fp32_to_fp16_cuda(const void * vx, half * y, const int k, cudaStream_t stream) { - const int num_blocks = (k + CUDA_QUANTIZE_BLOCK_SIZE - 1) / CUDA_QUANTIZE_BLOCK_SIZE; - dequantize_block<1, 1, convert_f32><<>>(vx, y, k); -} - -static void convert_mul_mat_vec_f16_cuda(const void * vx, const dfloat * y, float * dst, const 
int ncols, const int nrows, cudaStream_t stream) { - GGML_ASSERT(ncols % GGML_CUDA_DMMV_X == 0); - const int block_num_y = (nrows + GGML_CUDA_MMV_Y - 1) / GGML_CUDA_MMV_Y; - const dim3 block_nums(block_num_y, 1, 1); - const dim3 block_dims(WARP_SIZE, GGML_CUDA_MMV_Y, 1); - dequantize_mul_mat_vec<1, 1, convert_f16> - <<>>(vx, y, dst, ncols, nrows); -} - -static to_fp16_cuda_t ggml_get_to_fp16_cuda(ggml_type type) { - switch (type) { - case GGML_TYPE_Q4_0: - return dequantize_row_q4_0_cuda; - case GGML_TYPE_Q4_1: - return dequantize_row_q4_1_cuda; - case GGML_TYPE_Q5_0: - return dequantize_row_q5_0_cuda; - case GGML_TYPE_Q5_1: - return dequantize_row_q5_1_cuda; - case GGML_TYPE_Q8_0: - return dequantize_row_q8_0_cuda; - case GGML_TYPE_Q2_K: - return dequantize_row_q2_K_cuda; - case GGML_TYPE_Q3_K: - return dequantize_row_q3_K_cuda; - case GGML_TYPE_Q4_K: - return dequantize_row_q4_K_cuda; - case GGML_TYPE_Q5_K: - return dequantize_row_q5_K_cuda; - case GGML_TYPE_Q6_K: - return dequantize_row_q6_K_cuda; - case GGML_TYPE_F32: - return convert_fp32_to_fp16_cuda; - default: - return nullptr; - } -} - -static to_fp32_cuda_t ggml_get_to_fp32_cuda(ggml_type type) { - switch (type) { - case GGML_TYPE_Q4_0: - return dequantize_row_q4_0_cuda; - case GGML_TYPE_Q4_1: - return dequantize_row_q4_1_cuda; - case GGML_TYPE_Q5_0: - return dequantize_row_q5_0_cuda; - case GGML_TYPE_Q5_1: - return dequantize_row_q5_1_cuda; - case GGML_TYPE_Q8_0: - return dequantize_row_q8_0_cuda; - case GGML_TYPE_Q2_K: - return dequantize_row_q2_K_cuda; - case GGML_TYPE_Q3_K: - return dequantize_row_q3_K_cuda; - case GGML_TYPE_Q4_K: - return dequantize_row_q4_K_cuda; - case GGML_TYPE_Q5_K: - return dequantize_row_q5_K_cuda; - case GGML_TYPE_Q6_K: - return dequantize_row_q6_K_cuda; - case GGML_TYPE_F16: - return convert_fp16_to_fp32_cuda; - default: - return nullptr; - } -} - static void ggml_mul_mat_q4_0_q8_1_cuda( const void * vx, const void * vy, float * dst, const int ncols_x, const int nrows_x, const int ncols_y, const int nrows_y, const int nrows_dst, cudaStream_t stream) { @@ -5670,6 +6107,39 @@ static void ggml_cpy_f32_f16_cuda( (cx, cdst, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12); } +static void ggml_cpy_f32_q8_0_cuda( + const char * cx, char * cdst, const int ne, + const int ne00, const int ne01, const int nb00, const int nb01, const int nb02, + const int ne10, const int ne11, const int nb10, const int nb11, const int nb12, cudaStream_t stream) { + + GGML_ASSERT(ne % QK8_0 == 0); + const int num_blocks = ne / QK8_0; + cpy_f32_q<<>> + (cx, cdst, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12); +} + +static void ggml_cpy_f32_q4_0_cuda( + const char * cx, char * cdst, const int ne, + const int ne00, const int ne01, const int nb00, const int nb01, const int nb02, + const int ne10, const int ne11, const int nb10, const int nb11, const int nb12, cudaStream_t stream) { + + GGML_ASSERT(ne % QK4_0 == 0); + const int num_blocks = ne / QK4_0; + cpy_f32_q<<>> + (cx, cdst, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12); +} + +static void ggml_cpy_f32_q4_1_cuda( + const char * cx, char * cdst, const int ne, + const int ne00, const int ne01, const int nb00, const int nb01, const int nb02, + const int ne10, const int ne11, const int nb10, const int nb11, const int nb12, cudaStream_t stream) { + + GGML_ASSERT(ne % QK4_1 == 0); + const int num_blocks = ne / QK4_1; + cpy_f32_q<<>> + (cx, cdst, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12); +} + static void 
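// The new f32 -> q8_0/q4_0/q4_1 copies quantize on the fly. Launch geometry
// is one CUDA thread per quantized block (num_blocks = ne/qk, launched as
// <<<num_blocks, 1, 0, stream>>>): each thread consumes qk consecutive floats
// and emits one block, hence the GGML_ASSERT(ne % QK == 0) preconditions and
// the /qk in cpy_f32_q's destination index i10.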
ggml_cpy_f16_f16_cuda( const char * cx, char * cdst, const int ne, const int ne00, const int ne01, const int nb00, const int nb01, const int nb02, @@ -5712,20 +6182,26 @@ static void rope_cuda( template static void rope_neox_cuda( - const T * x, T * dst, int ncols, int nrows, const int32_t * pos, float freq_scale, int p_delta_rows, + const T * x, T * dst, int ncols, int n_dims, int nrows, const int32_t * pos, float freq_scale, int p_delta_rows, float freq_base, float ext_factor, float attn_factor, rope_corr_dims corr_dims, cudaStream_t stream ) { GGML_ASSERT(ncols % 2 == 0); const dim3 block_dims(1, CUDA_ROPE_BLOCK_SIZE, 1); const int num_blocks_x = (ncols + 2*CUDA_ROPE_BLOCK_SIZE - 1) / (2*CUDA_ROPE_BLOCK_SIZE); const dim3 block_nums(nrows, num_blocks_x, 1); + + const float theta_scale = powf(freq_base, -2.0f/n_dims); + const float inv_ndims = -1.0f / n_dims; + if (pos == nullptr) { rope_neox<<>>( - x, dst, ncols, pos, freq_scale, p_delta_rows, freq_base, ext_factor, attn_factor, corr_dims + x, dst, ncols, n_dims, pos, freq_scale, p_delta_rows, ext_factor, attn_factor, corr_dims, + theta_scale, inv_ndims ); } else { rope_neox<<>>( - x, dst, ncols, pos, freq_scale, p_delta_rows, freq_base, ext_factor, attn_factor, corr_dims + x, dst, ncols, n_dims, pos, freq_scale, p_delta_rows, ext_factor, attn_factor, corr_dims, + theta_scale, inv_ndims ); } } @@ -5750,6 +6226,27 @@ static void alibi_f32_cuda(const float * x, float * dst, const int ncols, const alibi_f32<<>>(x, dst, ncols, k_rows, n_heads_log2_floor, m0, m1); } +static void sum_rows_f32_cuda(const float * x, float * dst, const int ncols, const int nrows, cudaStream_t stream) { + const dim3 block_dims(WARP_SIZE, 1, 1); + const dim3 block_nums(1, nrows, 1); + k_sum_rows_f32<<>>(x, dst, ncols); +} + +static void argsort_f32_i32_cuda(const float * x, int * dst, const int ncols, const int nrows, ggml_sort_order order, cudaStream_t stream) { + // bitonic sort requires ncols to be power of 2 + GGML_ASSERT((ncols & (ncols - 1)) == 0); + + const dim3 block_dims(ncols, 1, 1); + const dim3 block_nums(1, nrows, 1); + if (order == GGML_SORT_ASC) { + k_argsort_f32_i32<<>>(x, dst, ncols); + } else if (order == GGML_SORT_DESC) { + k_argsort_f32_i32<<>>(x, dst, ncols); + } else { + GGML_ASSERT(false); + } +} + static void diag_mask_inf_f32_cuda(const float * x, float * dst, const int ncols_x, const int nrows_x, const int rows_per_channel, const int n_past, cudaStream_t stream) { const dim3 block_dims(1, CUDA_DIAG_MASK_INF_BLOCK_SIZE, 1); const int block_num_x = (ncols_x + CUDA_DIAG_MASK_INF_BLOCK_SIZE - 1) / CUDA_DIAG_MASK_INF_BLOCK_SIZE; @@ -5757,10 +6254,12 @@ static void diag_mask_inf_f32_cuda(const float * x, float * dst, const int ncols diag_mask_inf_f32<<>>(x, dst, ncols_x, rows_per_channel, n_past); } -static void soft_max_f32_cuda(const float * x, float * dst, const int ncols_x, const int nrows_x, cudaStream_t stream) { - const dim3 block_dims(1, WARP_SIZE, 1); +static void soft_max_f32_cuda(const float * x, const float * y, float * dst, const int ncols_x, const int nrows_x, const int nrows_y, const float scale, cudaStream_t stream) { + int nth = WARP_SIZE; + while (nth < ncols_x && nth < CUDA_SOFT_MAX_BLOCK_SIZE) nth *= 2; + const dim3 block_dims(nth, 1, 1); const dim3 block_nums(nrows_x, 1, 1); - soft_max_f32<<>>(x, dst, ncols_x); + soft_max_f32<<>>(x, y, dst, ncols_x, nrows_y, scale); } static void im2col_f32_f16_cuda(const float * x, half * dst, @@ -6037,99 +6536,40 @@ static cudaError_t ggml_cuda_cpy_tensor_2d( } } -static void 
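// k_argsort_f32_i32 further up is an in-block bitonic sort: ncols threads
// cooperatively sort a row's index array via XOR-partnered compare-exchange,
// log2(n)*(log2(n)+1)/2 rounds in total. That is why argsort_f32_i32_cuda
// asserts ncols is a power of two and launches one block of exactly ncols
// threads per row. E.g. ncols = 8: k in {2,4,8} with j halving from k/2 to 1
// gives 6 compare-exchange rounds.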
ggml_cuda_op_repeat( - const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, - const float * src0_d, const float * src1_d, float * dst_d, const cudaStream_t & stream) { - // guaranteed to be an integer due to the check in ggml_can_repeat - const int64_t ne0 = dst->ne[0]; - const int64_t ne1 = dst->ne[1]; - const int64_t ne2 = dst->ne[2]; - const int64_t ne3 = dst->ne[3]; - - const int64_t ne00 = src0->ne[0]; - const int64_t ne01 = src0->ne[1]; - const int64_t ne02 = src0->ne[2]; - const int64_t ne03 = src0->ne[3]; - - const size_t nb0 = dst->nb[0]; - const size_t nb1 = dst->nb[1]; - const size_t nb2 = dst->nb[2]; - const size_t nb3 = dst->nb[3]; - - const size_t nb00 = src0->nb[0]; - const size_t nb01 = src0->nb[1]; - const size_t nb02 = src0->nb[2]; - const size_t nb03 = src0->nb[3]; - - const int nr0 = (int)(ne0/ne00); - const int nr1 = (int)(ne1/ne01); - const int nr2 = (int)(ne2/ne02); - const int nr3 = (int)(ne3/ne03); - - // TODO: support for transposed / permuted tensors - GGML_ASSERT(nb0 == sizeof(float)); - GGML_ASSERT(nb00 == sizeof(float)); - - // TODO: very inefficient, implement in a kernel, or fewer cudaMemcpyAsync calls for contiguous tensors - for (int i3 = 0; i3 < nr3; i3++) { - for (int k3 = 0; k3 < ne03; k3++) { - for (int i2 = 0; i2 < nr2; i2++) { - for (int k2 = 0; k2 < ne02; k2++) { - for (int i1 = 0; i1 < nr1; i1++) { - for (int k1 = 0; k1 < ne01; k1++) { - for (int i0 = 0; i0 < nr0; i0++) { - CUDA_CHECK(cudaMemcpyAsync( - (char *) dst_d + (i3*ne03 + k3)*nb3 + (i2*ne02 + k2)*nb2 + (i1*ne01 + k1)*nb1 + (i0*ne00)*nb0, - (const char *) src0_d + ( k3)*nb03 + ( k2)*nb02 + ( k1)*nb01, - ne00*nb0, cudaMemcpyDeviceToDevice, stream)); - } - } - } - } - } - } - } - - (void) src1; - (void) src1_d; -} - static void ggml_cuda_op_get_rows( const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, const float * src0_d, const float * src1_d, float * dst_d, const cudaStream_t & stream) { GGML_ASSERT(src1->type == GGML_TYPE_I32); GGML_ASSERT(dst->type == GGML_TYPE_F32); - GGML_ASSERT(ggml_is_contiguous(src0)); - GGML_ASSERT(ggml_is_contiguous(src1)); - GGML_ASSERT(ggml_is_contiguous(dst)); - const int ncols = src0->ne[0]; - const int nrows = ggml_nelements(src1); + GGML_ASSERT(src0->nb[0] == ggml_type_size(src0->type)); + GGML_ASSERT(src1->nb[0] == ggml_type_size(src1->type)); + GGML_ASSERT(dst->nb[0] == ggml_type_size(dst->type)); const int32_t * src1_i32 = (const int32_t *) src1_d; switch (src0->type) { case GGML_TYPE_F16: - get_rows_cuda<1, 1, convert_f16>(src0_d, src1_i32, dst_d, nrows, ncols, stream); + get_rows_cuda_float(src0, src1, dst, (const half *)src0_d, src1_i32, dst_d, stream); break; case GGML_TYPE_F32: - get_rows_cuda<1, 1, convert_f32>(src0_d, src1_i32, dst_d, nrows, ncols, stream); + get_rows_cuda_float(src0, src1, dst, src0_d, src1_i32, dst_d, stream); break; case GGML_TYPE_Q4_0: - get_rows_cuda(src0_d, src1_i32, dst_d, nrows, ncols, stream); + get_rows_cuda(src0, src1, dst, src0_d, src1_i32, dst_d, stream); break; case GGML_TYPE_Q4_1: - get_rows_cuda(src0_d, src1_i32, dst_d, nrows, ncols, stream); + get_rows_cuda(src0, src1, dst, src0_d, src1_i32, dst_d, stream); break; case GGML_TYPE_Q5_0: - get_rows_cuda(src0_d, src1_i32, dst_d, nrows, ncols, stream); + get_rows_cuda(src0, src1, dst, src0_d, src1_i32, dst_d, stream); break; case GGML_TYPE_Q5_1: - get_rows_cuda(src0_d, src1_i32, dst_d, nrows, ncols, stream); + get_rows_cuda(src0, src1, dst, src0_d, src1_i32, dst_d, stream); break; case GGML_TYPE_Q8_0: - get_rows_cuda(src0_d, src1_i32, 
dst_d, nrows, ncols, stream); + get_rows_cuda(src0, src1, dst, src0_d, src1_i32, dst_d, stream); break; default: // TODO: k-quants @@ -6138,44 +6578,55 @@ static void ggml_cuda_op_get_rows( } } -inline void ggml_cuda_op_add( +template +inline void ggml_cuda_op_bin_bcast( const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) { GGML_ASSERT(src1->type == GGML_TYPE_F32); - const int64_t ne10 = src1->ne[0]; - const int64_t ne11 = src1->ne[1]; - if (src0->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32) { - add_f32_cuda(src0_dd, src1_dd, dst_dd, ggml_nelements(src0), ne10*ne11, main_stream); + op()(src0, src1, dst, src0_dd, src1_dd, dst_dd, main_stream); } else if (src0->type == GGML_TYPE_F16 && dst->type == GGML_TYPE_F16) { - add_f16_f32_f16_cuda((const half *) src0_dd, src1_dd, (half *) dst_dd, ggml_nelements(src0), main_stream); + op()(src0, src1, dst, (const half *) src0_dd, src1_dd, (half *) dst_dd, main_stream); } else if (src0->type == GGML_TYPE_F16 && dst->type == GGML_TYPE_F32) { - add_f16_f32_f32_cuda((const half *) src0_dd, src1_dd, dst_dd, ggml_nelements(src0), main_stream); + op()(src0, src1, dst, (const half *) src0_dd, src1_dd, dst_dd, main_stream); } else { - fprintf(stderr, "src0->type: %d dst->type: %d\n", src0->type, dst->type); + fprintf(stderr, "%s: unsupported types: dst: %s, src0: %s, src1: %s\n", __func__, + ggml_type_name(dst->type), ggml_type_name(src0->type), ggml_type_name(src1->type)); GGML_ASSERT(false); } +} + +static void ggml_cuda_op_repeat( + const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, + const float * src0_d, const float * src1_d, float * dst_d, const cudaStream_t & main_stream) { + + ggml_cuda_op_bin_bcast>(dst, src0, dst, nullptr, src0_d, dst_d, main_stream); (void) src1; - (void) dst; + (void) src1_d; } -inline void ggml_cuda_op_mul( +inline void ggml_cuda_op_add( const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) { - GGML_ASSERT(src0->type == GGML_TYPE_F32); - GGML_ASSERT(src1->type == GGML_TYPE_F32); - GGML_ASSERT( dst->type == GGML_TYPE_F32); + ggml_cuda_op_bin_bcast>(src0, src1, dst, src0_dd, src1_dd, dst_dd, main_stream); +} - const int64_t ne10 = src1->ne[0]; - const int64_t ne11 = src1->ne[1]; +inline void ggml_cuda_op_mul( + const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, + const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) { - mul_f32_cuda(src0_dd, src1_dd, dst_dd, ggml_nelements(src0), ne10*ne11, main_stream); + ggml_cuda_op_bin_bcast>(src0, src1, dst, src0_dd, src1_dd, dst_dd, main_stream); +} - (void) dst; +inline void ggml_cuda_op_div( + const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, + const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) { + + ggml_cuda_op_bin_bcast>(src0, src1, dst, src0_dd, src1_dd, dst_dd, main_stream); } inline void ggml_cuda_op_gelu( @@ -6244,7 +6695,10 @@ inline void ggml_cuda_op_norm( const int64_t ne00 = src0->ne[0]; const int64_t nrows = ggml_nrows(src0); - norm_f32_cuda(src0_dd, dst_dd, ne00, nrows, main_stream); + float eps; + memcpy(&eps, dst->op_params, sizeof(float)); + + norm_f32_cuda(src0_dd, dst_dd, ne00, nrows, eps, main_stream); (void) src1; (void) dst; @@ -6356,6 +6810,7 @@ static int64_t get_row_rounding(ggml_type type) { 
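// Note the argument order in ggml_cuda_op_repeat above: repeat reuses the
// broadcast machinery shared with add/mul/div by passing dst in the src0 slot
// and src0 in the src1 slot, with op_repeat simply passing its second operand
// through, so the bcast kernel replicates src0 across dst's larger shape.
// The get_row_rounding additions below make F32 behave like F16 (rounding of
// 1, i.e. none) when splitting rows across GPUs.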
case GGML_TYPE_Q8_0: return max_compute_capability >= CC_RDNA2 ? 128 : 64; case GGML_TYPE_F16: + case GGML_TYPE_F32: return 1; case GGML_TYPE_Q2_K: return max_compute_capability >= CC_RDNA2 ? 128 : 32; @@ -6378,6 +6833,7 @@ static int64_t get_row_rounding(ggml_type type) { case GGML_TYPE_Q8_0: return 64; case GGML_TYPE_F16: + case GGML_TYPE_F32: return 1; case GGML_TYPE_Q2_K: case GGML_TYPE_Q3_K: @@ -6397,6 +6853,8 @@ inline void ggml_cuda_op_mul_mat_vec_q( const char * src1_ddq_i, float * dst_dd_i, const int64_t row_low, const int64_t row_high, const int64_t src1_ncols, const int64_t src1_padded_row_size, const cudaStream_t & stream) { + GGML_ASSERT(ggml_nrows(src1) == 1); + const int64_t ne00 = src0->ne[0]; const int64_t row_diff = row_high - row_low; @@ -6456,7 +6914,8 @@ inline void ggml_cuda_op_dequantize_mul_mat_vec( size_t ash; dfloat * src1_dfloat = nullptr; // dfloat == half - bool src1_convert_f16 = src0->type == GGML_TYPE_Q4_0 || src0->type == GGML_TYPE_Q4_1 || + bool src1_convert_f16 = + src0->type == GGML_TYPE_Q4_0 || src0->type == GGML_TYPE_Q4_1 || src0->type == GGML_TYPE_Q5_0 || src0->type == GGML_TYPE_Q5_1 || src0->type == GGML_TYPE_Q8_0 || src0->type == GGML_TYPE_F16; @@ -6678,15 +7137,14 @@ inline void ggml_cuda_op_rope( GGML_ASSERT(false); rope_glm_f32_cuda(src0_dd, dst_dd, ne00, nrows, pos, freq_scale, ne01, freq_base, n_ctx, main_stream); } else if (is_neox) { - GGML_ASSERT(ne00 == n_dims && "ne00 != n_dims is not implemented for CUDA yet"); if (src0->type == GGML_TYPE_F32) { rope_neox_cuda( - (const float *)src0_dd, (float *)dst_dd, ne00, nrows, pos, freq_scale, ne01, freq_base, ext_factor, + (const float *)src0_dd, (float *)dst_dd, ne00, n_dims, nrows, pos, freq_scale, ne01, freq_base, ext_factor, attn_factor, corr_dims, main_stream ); } else if (src0->type == GGML_TYPE_F16) { rope_neox_cuda( - (const half *)src0_dd, (half *)dst_dd, ne00, nrows, pos, freq_scale, ne01, freq_base, ext_factor, + (const half *)src0_dd, (half *)dst_dd, ne00, n_dims, nrows, pos, freq_scale, ne01, freq_base, ext_factor, attn_factor, corr_dims, main_stream ); } else { @@ -6783,6 +7241,42 @@ inline void ggml_cuda_op_im2col( (void) src0_dd; } +inline void ggml_cuda_op_sum_rows( + const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, + const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) { + + GGML_ASSERT(src0->type == GGML_TYPE_F32); + GGML_ASSERT( dst->type == GGML_TYPE_F32); + + const int64_t ncols = src0->ne[0]; + const int64_t nrows = ggml_nrows(src0); + + sum_rows_f32_cuda(src0_dd, dst_dd, ncols, nrows, main_stream); + + (void) src1; + (void) dst; + (void) src1_dd; +} + +inline void ggml_cuda_op_argsort( + const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, + const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) { + + GGML_ASSERT(src0->type == GGML_TYPE_F32); + GGML_ASSERT( dst->type == GGML_TYPE_I32); + + const int64_t ncols = src0->ne[0]; + const int64_t nrows = ggml_nrows(src0); + + enum ggml_sort_order order = (enum ggml_sort_order) dst->op_params[0]; + + argsort_f32_i32_cuda(src0_dd, (int *)dst_dd, ncols, nrows, order, main_stream); + + (void) src1; + (void) dst; + (void) src1_dd; +} + inline void ggml_cuda_op_diag_mask_inf( const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst, const float * src0_dd, const float * src1_dd, float * dst_dd, const cudaStream_t & main_stream) { @@ -6810,14 +7304,18 @@ inline void ggml_cuda_op_soft_max( 
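// Recurring pattern in these ops (norm eps above; argsort order and soft_max
// scale below): small scalar parameters travel in dst->op_params, a raw
// int32 array, and are recovered with memcpy rather than a pointer cast to
// avoid strict-aliasing trouble, e.g.:
//   float scale = 1.0f;
//   memcpy(&scale, dst->op_params, sizeof(float));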
GGML_ASSERT(src0->type == GGML_TYPE_F32); GGML_ASSERT( dst->type == GGML_TYPE_F32); + GGML_ASSERT(!src1 || src1->type == GGML_TYPE_F32); // src1 contains mask and it is optional + const int64_t ne00 = src0->ne[0]; - const int64_t nrows = ggml_nrows(src0); + const int64_t nrows_x = ggml_nrows(src0); + const int64_t nrows_y = src1 ? ggml_nrows(src1) : 1; - soft_max_f32_cuda(src0_dd, dst_dd, ne00, nrows, main_stream); + float scale = 1.0f; + memcpy(&scale, dst->op_params, sizeof(float)); + + soft_max_f32_cuda(src0_dd, src1 ? src1_dd : nullptr, dst_dd, ne00, nrows_x, nrows_y, scale, main_stream); - (void) src1; (void) dst; - (void) src1_dd; } inline void ggml_cuda_op_scale( @@ -7023,10 +7521,9 @@ static void ggml_cuda_op_mul_mat( const bool src0_on_device = src0->backend == GGML_BACKEND_GPU || src0->backend == GGML_BACKEND_GPU_SPLIT; const bool src0_is_contiguous = ggml_is_contiguous(src0); - const bool src1_is_contiguous = ggml_is_contiguous(src1); - const int64_t src1_padded_col_size = ne10 % MATRIX_ROW_PADDING == 0 ? - ne10 : ne10 - ne10 % MATRIX_ROW_PADDING + MATRIX_ROW_PADDING; + + const int64_t src1_padded_col_size = GGML_PAD(ne10, MATRIX_ROW_PADDING); const bool split = src0->backend == GGML_BACKEND_GPU_SPLIT; GGML_ASSERT(!(split && ne02 > 1)); @@ -7088,7 +7585,7 @@ static void ggml_cuda_op_mul_mat( if (src0_on_device && src0_is_contiguous) { src0_dd[id] = (char *) src0_extra->data_device[id]; } else { - const size_t size_src0_ddq = split ? (row_high[id]-row_low[id])*ne00 * src0_ts/src0_bs : ggml_nbytes(src0); + // const size_t size_src0_ddq = split ? (row_high[id]-row_low[id])*ne00 * src0_ts/src0_bs : ggml_nbytes(src0); src0_dd[id] = (char *) ggml_cuda_pool_malloc(ggml_nbytes(src0), &src0_as[id]); } @@ -7151,7 +7648,7 @@ static void ggml_cuda_op_mul_mat( const size_t src1_ddq_i_offset = (i0*ne11 + src1_col_0) * src1_padded_col_size*q8_1_ts/q8_1_bs; // for split tensors the data begins at i0 == i0_offset_low - char * src0_dd_i = src0_dd[id] + (i0/i02_divisor) * ne01*ne00*src0_ts/src0_bs; + char * src0_dd_i = src0_dd[id] + (i0/i02_divisor) * (ne01*ne00*src0_ts)/src0_bs; float * src1_ddf_i = src1_ddf[id] + (i0*ne11 + src1_col_0) * ne10; char * src1_ddq_i = src1_ddq[id] + src1_ddq_i_offset; float * dst_dd_i = dst_dd[id] + (i0*ne1 + src1_col_0) * (dst_on_device ? 
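// GGML_PAD(x, n) rounds x up to the next multiple of n; in ggml.h it is
// defined as ((x) + (n) - 1) / (n) * (n), so e.g. GGML_PAD(13, 8) == 16. It
// replaces the hand-written modulo expression for src1_padded_col_size, which
// pads quantized src1 columns to MATRIX_ROW_PADDING so quantize_row_q8_1_cuda
// always works on whole blocks.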
ne0 : row_diff); @@ -7296,6 +7793,10 @@ static void ggml_cuda_mul(const ggml_tensor * src0, const ggml_tensor * src1, gg ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_mul); } +static void ggml_cuda_div(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { + ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_div); +} + static void ggml_cuda_gelu(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_gelu); } @@ -7399,7 +7900,7 @@ static void ggml_cuda_mul_mat_vec_nc(const ggml_tensor * src0, const ggml_tensor ggml_mul_mat_vec_nc_f16_f32_cuda(src0_ddq, src1_ddf, dst_ddf, ne00, ne01, row_stride_x, ne02, ne12, channel_stride_x, main_stream); } -__global__ void k_compute_batched_ptrs( +static __global__ void k_compute_batched_ptrs( const half * src0_as_f16, const half * src1_as_f16, half * dst_f16, const void ** ptrs_src, void ** ptrs_dst, int ne12, int ne13, @@ -7455,9 +7956,7 @@ static void ggml_cuda_mul_mat_mat_batched_cublas(const ggml_tensor * src0, const CUDA_CHECK(ggml_cuda_set_device(g_main_device)); cudaStream_t main_stream = g_cudaStreams[g_main_device][0]; - int id; - CUDA_CHECK(cudaGetDevice(&id)); - CUBLAS_CHECK(cublasSetStream(g_cublas_handles[id], main_stream)); + CUBLAS_CHECK(cublasSetStream(g_cublas_handles[g_main_device], main_stream)); ggml_tensor_extra_gpu * src0_extra = (ggml_tensor_extra_gpu *) src0->extra; void * src0_ddq = src0_extra->data_device[g_main_device]; @@ -7514,7 +8013,7 @@ static void ggml_cuda_mul_mat_mat_batched_cublas(const ggml_tensor * src0, const // there is no broadcast and src0, src1 are contiguous across dims 2, 3 // use cublasGemmStridedBatchedEx CUBLAS_CHECK( - cublasGemmStridedBatchedEx(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N, + cublasGemmStridedBatchedEx(g_cublas_handles[g_main_device], CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, &alpha_f16, (const char *) src0_as_f16, CUDA_R_16F, nb01/sizeof(half), src0->nb[2]/sizeof(half), // strideA (const char *) src1_as_f16, CUDA_R_16F, nb11/sizeof(float), src1->nb[2]/sizeof(float), // strideB @@ -7548,7 +8047,7 @@ static void ggml_cuda_mul_mat_mat_batched_cublas(const ggml_tensor * src0, const CUDA_CHECK(cudaGetLastError()); CUBLAS_CHECK( - cublasGemmBatchedEx(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N, + cublasGemmBatchedEx(g_cublas_handles[g_main_device], CUBLAS_OP_T, CUBLAS_OP_N, ne01, ne11, ne10, &alpha_f16, (const void **) (ptrs_src + 0*ne23), CUDA_R_16F, nb01/sizeof(half), (const void **) (ptrs_src + 1*ne23), CUDA_R_16F, nb11/sizeof(float), @@ -7618,10 +8117,11 @@ static void ggml_cuda_mul_mat(const ggml_tensor * src0, const ggml_tensor * src1 #ifdef GGML_CUDA_FORCE_DMMV const bool use_mul_mat_vec_q = false; #else - const bool use_mul_mat_vec_q = min_compute_capability >= MIN_CC_DP4A && ggml_is_quantized(src0->type); + const bool use_mul_mat_vec_q = min_compute_capability >= MIN_CC_DP4A && ggml_is_quantized(src0->type) && ggml_nrows(src1) == 1; #endif // GGML_CUDA_FORCE_DMMV if (use_mul_mat_vec_q) { + // NOTE: this kernel does not support ggml_nrows(src1) > 1 ggml_cuda_op_mul_mat(src0, src1, dst, ggml_cuda_op_mul_mat_vec_q, true); } else { ggml_cuda_op_mul_mat(src0, src1, dst, ggml_cuda_op_dequantize_mul_mat_vec, false); @@ -7646,16 +8146,262 @@ static void ggml_cuda_mul_mat(const ggml_tensor * src0, const ggml_tensor * src1 } } -static void ggml_cuda_scale(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { - ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_scale); -} +#if 0 
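+// The #if 0 region below is an unfinished cublas-based implementation of
+// GGML_OP_MUL_MAT_ID, kept for reference only; the live path further down
+// instead loops over tokens and reuses ggml_cuda_mul_mat per selected expert.
+// Related gating above: mul_mat_vec_q is a true matrix-vector kernel (one
+// column of src1), so use_mul_mat_vec_q now also requires
+// ggml_nrows(src1) == 1.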
+template +static __global__ void k_compute_batched_ptrs_id( + const void ** ptrs_src, void ** ptrs_dst, + int ne12, int ne13, + int ne23, + int nb02, int nb03, + int nb12, int nb13, + int nb2, int nb3, + int r2, int r3, + ggml_type src0_type, half * src0_as_f16, int64_t src0_ne, + const half * src1_f16, half * dst_f16, + const int32_t * ids, const int id, + Srcs... src0s) { + + int i = ids[id]; + + half * src0_f16; + const void * srcs_ar[] = { (const half *) src0s... }; + if (src0_type == GGML_TYPE_F16) { + src0_f16 = (half *) srcs_ar[i]; + } else { + src0_f16 = src0_as_f16; + if (threadIdx.x == 0 && threadIdx.y == 0) { + const to_fp16_cuda_t to_fp16 = ggml_get_to_fp16_cuda(src0_type); + to_fp16(srcs_ar[i], src0_f16, src0_ne, cudaStreamFireAndForget); + } + } -static void ggml_cuda_clamp(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { - ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_clamp); -} + int i13 = blockIdx.x * blockDim.x + threadIdx.x; + int i12 = blockIdx.y * blockDim.y + threadIdx.y; -static void ggml_cuda_cpy(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { - const int64_t ne = ggml_nelements(src0); + if (i13 >= ne13 || i12 >= ne12) { + return; + } + + int i03 = i13 / r3; + int i02 = i12 / r2; + + ptrs_src[0*ne23 + i12 + i13*ne12] = (const char *) src0_f16 + i02*nb02 + i03*nb03; + ptrs_src[1*ne23 + i12 + i13*ne12] = (const char *) src1_f16 + i12*nb12/2 + i13*nb13/2; + ptrs_dst[0*ne23 + i12 + i13*ne12] = ( char *) dst_f16 + i12* nb2/2 + i13* nb3/2; +} + +static void ggml_cuda_mul_mat_id_cublas(ggml_tensor * dst) { + const struct ggml_tensor * ids = dst->src[0]; + const struct ggml_tensor * src1 = dst->src[1]; + const struct ggml_tensor * src00 = dst->src[2]; + + const int id = dst->op_params[0]; + + GGML_ASSERT(!ggml_is_transposed(src00)); + GGML_ASSERT(!ggml_is_transposed(src1)); + + GGML_ASSERT(src00->backend != GGML_BACKEND_GPU_SPLIT); + GGML_ASSERT(src1->type == GGML_TYPE_F32); + + const int64_t ne00 = src00->ne[0]; GGML_UNUSED(ne00); + const int64_t ne01 = src00->ne[1]; + const int64_t ne02 = src00->ne[2]; + const int64_t ne03 = src00->ne[3]; + + //const int64_t nb01 = src00->nb[1]; + const int64_t nb02 = src00->nb[2]; GGML_UNUSED(nb02); + const int64_t nb03 = src00->nb[3]; GGML_UNUSED(nb03); + + const int64_t ne10 = src1->ne[0]; + const int64_t ne11 = src1->ne[1]; + const int64_t ne12 = src1->ne[2]; + const int64_t ne13 = src1->ne[3]; + + //const int64_t nb11 = src1->nb[1]; + const int64_t nb12 = src1->nb[2]; GGML_UNUSED(nb12); + const int64_t nb13 = src1->nb[3]; GGML_UNUSED(nb13); + + const int64_t ne1 = ggml_nelements(src1); + const int64_t ne = ggml_nelements(dst); + + CUDA_CHECK(ggml_cuda_set_device(g_main_device)); + cudaStream_t main_stream = g_cudaStreams[g_main_device][0]; + + CUBLAS_CHECK(cublasSetStream(g_cublas_handles[g_main_device], main_stream)); + + //ggml_tensor_extra_gpu * src0_extra = (ggml_tensor_extra_gpu *) src0->extra; + //void * src0_ddq = src0_extra->data_device[g_main_device]; + //half * src0_as_f16 = (half *) src0_ddq; + + ggml_tensor_extra_gpu * src1_extra = (ggml_tensor_extra_gpu *) src1->extra; + float * src1_ddf = (float *) src1_extra->data_device[g_main_device]; + + ggml_tensor_extra_gpu * dst_extra = (ggml_tensor_extra_gpu *) dst->extra; + float * dst_ddf = (float *) dst_extra->data_device[g_main_device]; + + // convert src1 to fp16 + const to_fp16_cuda_t to_fp16_cuda = ggml_get_to_fp16_cuda(src1->type); + GGML_ASSERT(to_fp16_cuda != nullptr); + + size_t src1_as = 0; + half * src1_as_f16 = 
(half *) ggml_cuda_pool_malloc(ne1 * sizeof(half), &src1_as); + to_fp16_cuda(src1_ddf, src1_as_f16, ne1, main_stream); + + size_t dst_as = 0; + half * dst_f16 = (half *) ggml_cuda_pool_malloc(ne * sizeof(half), &dst_as); + + GGML_ASSERT(ne12 % ne02 == 0); + GGML_ASSERT(ne13 % ne03 == 0); + + // broadcast factors + const int64_t r2 = ne12/ne02; + const int64_t r3 = ne13/ne03; + + const half alpha_f16 = 1.0f; + const half beta_f16 = 0.0f; + + // use cublasGemmBatchedEx + const int ne23 = ne12*ne13; + + const void ** ptrs_src = nullptr; + void ** ptrs_dst = nullptr; + + size_t ptrs_src_s = 0; + size_t ptrs_dst_s = 0; + + ptrs_src = (const void **) ggml_cuda_pool_malloc(2*ne23*sizeof(void *), &ptrs_src_s); + ptrs_dst = ( void **) ggml_cuda_pool_malloc(1*ne23*sizeof(void *), &ptrs_dst_s); + + int64_t src0_ne = ggml_nelements(src00); + half * src0_as_f16 = nullptr; + size_t src0_as = 0; + if (src00->type != GGML_TYPE_F16) { + src0_as_f16 = (half *) ggml_cuda_pool_malloc(src0_ne * sizeof(half), &src0_as); + } + + static_assert(GGML_MAX_SRC == 6, "GGML_MAX_SRC == 6"); + dim3 block_dims(ne13, ne12); + k_compute_batched_ptrs_id<<<1, block_dims, 0, main_stream>>>( + ptrs_src, ptrs_dst, + ne12, ne13, + ne23, + ne00*ne01*sizeof(half), ne00*ne01*ne02*sizeof(half), + nb12, nb13, + dst->nb[2], dst->nb[3], + r2, r3, + src00->type, src0_as_f16, src0_ne, + src1_as_f16, dst_f16, + (const int *)((ggml_tensor_extra_gpu *)ids->extra)->data_device[g_main_device], id, + dst->src[2] ? (const half *)((ggml_tensor_extra_gpu *)dst->src[2]->extra)->data_device[g_main_device] : nullptr, + dst->src[3] ? (const half *)((ggml_tensor_extra_gpu *)dst->src[3]->extra)->data_device[g_main_device] : nullptr, + dst->src[4] ? (const half *)((ggml_tensor_extra_gpu *)dst->src[4]->extra)->data_device[g_main_device] : nullptr, + dst->src[5] ? 
(const half *)((ggml_tensor_extra_gpu *)dst->src[5]->extra)->data_device[g_main_device] : nullptr + ); + CUDA_CHECK(cudaGetLastError()); + + CUBLAS_CHECK( + cublasGemmBatchedEx(g_cublas_handles[g_main_device], CUBLAS_OP_T, CUBLAS_OP_N, + ne01, ne11, ne10, + &alpha_f16, (const void **) (ptrs_src + 0*ne23), CUDA_R_16F, ne00, + (const void **) (ptrs_src + 1*ne23), CUDA_R_16F, ne10, + &beta_f16, ( void **) (ptrs_dst + 0*ne23), CUDA_R_16F, ne01, + ne23, + CUBLAS_COMPUTE_16F, + CUBLAS_GEMM_DEFAULT_TENSOR_OP)); + + if (src0_as != 0) { + ggml_cuda_pool_free(src0_as_f16, src0_as); + } + if (ptrs_src_s != 0) { + ggml_cuda_pool_free(ptrs_src, ptrs_src_s); + } + if (ptrs_dst_s != 0) { + ggml_cuda_pool_free(ptrs_dst, ptrs_dst_s); + } + + const to_fp32_cuda_t to_fp32_cuda = ggml_get_to_fp32_cuda(GGML_TYPE_F16); + to_fp32_cuda(dst_f16, dst_ddf, ne, main_stream); + + ggml_cuda_pool_free(src1_as_f16, src1_as); + ggml_cuda_pool_free(dst_f16, dst_as); +} +#endif + +static void ggml_cuda_mul_mat_id(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { +#if 0 + ggml_cuda_mul_mat_id_cublas(dst); + // TODO: mmq/mmv support +#endif + + GGML_ASSERT(dst->backend == GGML_BACKEND_GPU); + + const struct ggml_tensor * ids = src0; + const int32_t id = ((int32_t *) dst->op_params)[0]; + const int32_t n_as = ((int32_t *) dst->op_params)[1]; + + std::vector ids_host(ggml_nbytes(ids)); + + if (ids->backend == GGML_BACKEND_GPU) { + const char * ids_dev = (const char *)((const ggml_tensor_extra_gpu *)ids->extra)->data_device[g_main_device]; + CUDA_CHECK(cudaMemcpyAsync(ids_host.data(), ids_dev, ggml_nbytes(ids), cudaMemcpyDeviceToHost, g_cudaStreams[g_main_device][0])); + CUDA_CHECK(cudaStreamSynchronize(g_cudaStreams[g_main_device][0])); + } else { + memcpy(ids_host.data(), ids->data, ggml_nbytes(ids)); + } + + const ggml_tensor_extra_gpu * src1_extra = (const ggml_tensor_extra_gpu *) src1->extra; + const ggml_tensor_extra_gpu * dst_extra = (const ggml_tensor_extra_gpu *) dst->extra; + + ggml_tensor_extra_gpu src1_row_extra; + ggml_tensor_extra_gpu dst_row_extra; + + ggml_tensor src1_row = *src1; + ggml_tensor dst_row = *dst; + + src1_row.ne[1] = 1; + dst_row.ne[1] = 1; + + src1_row.nb[2] = src1_row.nb[1]; + dst_row.nb[2] = dst_row.nb[1]; + + src1_row.nb[3] = src1_row.nb[1]; + dst_row.nb[3] = dst_row.nb[1]; + + src1_row.extra = &src1_row_extra; + dst_row.extra = &dst_row_extra; + + + for (int64_t i01 = 0; i01 < ids->ne[1]; i01++) { + //int32_t row_id; + //CUDA_CHECK(cudaMemcpyAsync(&row_id, ids_dev + i01*ids->nb[1] + id*ids->nb[0], sizeof(int32_t), cudaMemcpyDeviceToHost, g_cudaStreams[g_main_device][0])); + //CUDA_CHECK(cudaStreamSynchronize(g_cudaStreams[g_main_device][0])); + + const int32_t row_id = *(const int32_t *) (ids_host.data() + i01*ids->nb[1] + id*ids->nb[0]); + + GGML_ASSERT(row_id >= 0 && row_id < n_as); + + const struct ggml_tensor * src0_row = dst->src[row_id + 2]; + + src1_row_extra.data_device[g_main_device] = (char *) src1_extra->data_device[g_main_device] + i01*src1->nb[1]; + src1_row.data = (char *) src1->data + i01*src1->nb[1]; + + dst_row_extra.data_device[g_main_device] = (char *) dst_extra->data_device[g_main_device] + i01*dst->nb[1]; + dst_row.data = (char *) dst->data + i01*dst->nb[1]; + + ggml_cuda_mul_mat(src0_row, &src1_row, &dst_row); + } +} + +static void ggml_cuda_scale(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { + ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_scale); +} + +static void ggml_cuda_clamp(const ggml_tensor * src0, const 
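+// The live ggml_cuda_mul_mat_id above implements mixture-of-experts dispatch
+// without a dedicated kernel: the token->expert ids are copied to the host
+// (one cudaMemcpyAsync plus cudaStreamSynchronize per node), then each token
+// row is multiplied against its selected expert tensor dst->src[row_id + 2].
+// The ne[1] = 1 and nb[2] = nb[3] = nb[1] tweaks make src1_row/dst_row look
+// like free-standing single-row tensors, so the unmodified ggml_cuda_mul_mat
+// path is reused for every row.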
ggml_tensor * src1, ggml_tensor * dst) { + ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_clamp); +} + +static void ggml_cuda_cpy(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { + const int64_t ne = ggml_nelements(src0); GGML_ASSERT(ne == ggml_nelements(src1)); GGML_ASSERT(src0->backend == GGML_BACKEND_GPU); @@ -7690,14 +8436,17 @@ static void ggml_cuda_cpy(const ggml_tensor * src0, const ggml_tensor * src1, gg char * src1_ddc = (char *) src1_extra->data_device[g_main_device]; if (src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_F32) { - ggml_cpy_f32_f32_cuda(src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, - ne10, ne11, nb10, nb11, nb12, main_stream); + ggml_cpy_f32_f32_cuda (src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12, main_stream); } else if (src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_F16) { - ggml_cpy_f32_f16_cuda(src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, - ne10, ne11, nb10, nb11, nb12, main_stream); + ggml_cpy_f32_f16_cuda (src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12, main_stream); + } else if (src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_Q8_0) { + ggml_cpy_f32_q8_0_cuda(src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12, main_stream); + } else if (src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_Q4_0) { + ggml_cpy_f32_q4_0_cuda(src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12, main_stream); + } else if (src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_Q4_1) { + ggml_cpy_f32_q4_1_cuda(src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12, main_stream); } else if (src0->type == GGML_TYPE_F16 && src1->type == GGML_TYPE_F16) { - ggml_cpy_f16_f16_cuda(src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, - ne10, ne11, nb10, nb11, nb12, main_stream); + ggml_cpy_f16_f16_cuda (src0_ddc, src1_ddc, ne, ne00, ne01, nb00, nb01, nb02, ne10, ne11, nb10, nb11, nb12, main_stream); } else { fprintf(stderr, "%s: unsupported type combination (%s to %s)\n", __func__, ggml_type_name(src0->type), ggml_type_name(src1->type)); @@ -7708,6 +8457,7 @@ static void ggml_cuda_cpy(const ggml_tensor * src0, const ggml_tensor * src1, gg } static void ggml_cuda_dup(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) { + // TODO: why do we pass dst as src1 here? 
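// Partial answer to the TODO above: ggml_cuda_cpy writes into its *second*
// argument (for GGML_OP_CPY, src1 is the destination view), while GGML_OP_DUP
// has no src1 at all -- its output is dst -- so dst is passed in the src1
// slot to reuse the same copy dispatch.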
     ggml_cuda_cpy(src0, dst, nullptr);
     (void) src1;
 }
@@ -7733,6 +8483,16 @@ static void ggml_cuda_im2col(const ggml_tensor * src0, const ggml_tensor * src1,
     ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_im2col);
 }
 
+static void ggml_cuda_sum_rows(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) {
+    GGML_ASSERT(ggml_is_contiguous(src0));
+    ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_sum_rows);
+}
+
+static void ggml_cuda_argsort(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) {
+    GGML_ASSERT(ggml_is_contiguous(src0));
+    ggml_cuda_op_flatten(src0, src1, dst, ggml_cuda_op_argsort);
+}
+
 static void ggml_cuda_nop(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) {
     (void) src0;
     (void) src1;
@@ -7988,8 +8748,9 @@ void ggml_cuda_set_main_device(const int main_device) {
                 main_device, g_device_count, g_main_device);
         return;
     }
-    g_main_device = main_device;
-    if (g_device_count > 1) {
+
+    if (g_main_device != main_device && g_device_count > 1) {
+        g_main_device = main_device;
         cudaDeviceProp prop;
         CUDA_CHECK(cudaGetDeviceProperties(&prop, g_main_device));
         fprintf(stderr, "%s: using device %d (%s) as main device\n", __func__, g_main_device, prop.name);
@@ -8029,7 +8790,7 @@ bool ggml_cuda_compute_forward(struct ggml_compute_params * params, struct ggml_
     if (tensor->op == GGML_OP_MUL_MAT) {
         if (tensor->src[0]->ne[3] != tensor->src[1]->ne[3]) {
 #ifndef NDEBUG
-            fprintf(stderr, "%s: cannot compute %s: src0->ne[3] = %d, src1->ne[3] = %d - fallback to CPU\n", __func__, tensor->name, tensor->src[0]->ne[3], tensor->src[1]->ne[3]);
+            fprintf(stderr, "%s: cannot compute %s: src0->ne[3] = %" PRId64 ", src1->ne[3] = %" PRId64 " - fallback to CPU\n", __func__, tensor->name, tensor->src[0]->ne[3], tensor->src[1]->ne[3]);
 #endif
             return false;
         }
@@ -8051,6 +8812,9 @@ bool ggml_cuda_compute_forward(struct ggml_compute_params * params, struct ggml_
         case GGML_OP_MUL:
             func = ggml_cuda_mul;
             break;
+        case GGML_OP_DIV:
+            func = ggml_cuda_div;
+            break;
         case GGML_OP_UNARY:
             switch (ggml_get_unary_op(tensor)) {
                 case GGML_UNARY_OP_GELU:
@@ -8064,7 +8828,8 @@ bool ggml_cuda_compute_forward(struct ggml_compute_params * params, struct ggml_
                     break;
                 default:
                     return false;
-            } break;
+            }
+            break;
         case GGML_OP_NORM:
             func = ggml_cuda_norm;
             break;
@@ -8077,6 +8842,12 @@ bool ggml_cuda_compute_forward(struct ggml_compute_params * params, struct ggml_
             }
             func = ggml_cuda_mul_mat;
             break;
+        case GGML_OP_MUL_MAT_ID:
+            if (!any_on_device && !ggml_cuda_can_mul_mat(tensor->src[2], tensor->src[1], tensor)) {
+                return false;
+            }
+            func = ggml_cuda_mul_mat_id;
+            break;
         case GGML_OP_SCALE:
             func = ggml_cuda_scale;
             break;
@@ -8116,6 +8887,12 @@ bool ggml_cuda_compute_forward(struct ggml_compute_params * params, struct ggml_
         case GGML_OP_IM2COL:
             func = ggml_cuda_im2col;
             break;
+        case GGML_OP_SUM_ROWS:
+            func = ggml_cuda_sum_rows;
+            break;
+        case GGML_OP_ARGSORT:
+            func = ggml_cuda_argsort;
+            break;
         default:
             return false;
     }
@@ -8132,7 +8909,9 @@ bool ggml_cuda_compute_forward(struct ggml_compute_params * params, struct ggml_
 
 int ggml_cuda_get_device_count() {
     int device_count;
-    CUDA_CHECK(cudaGetDeviceCount(&device_count));
+    if (cudaGetDeviceCount(&device_count) != cudaSuccess) {
+        return 0;
+    }
     return device_count;
 }
 
@@ -8148,27 +8927,16 @@ void ggml_cuda_get_device_description(int device, char * description, size_t des
 
 #define UNUSED GGML_UNUSED
 
-struct ggml_backend_context_cuda {
-};
-
-static const char * ggml_backend_cuda_name(ggml_backend_t backend) {
-    return GGML_CUDA_NAME;
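// ggml_cuda_get_device_count above now degrades gracefully: when
// cudaGetDeviceCount fails (no driver or no device present), it reports 0
// devices instead of aborting through CUDA_CHECK, letting callers treat the
// failure as "no CUDA available".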
- - UNUSED(backend); -} - -static void ggml_backend_cuda_free(ggml_backend_t backend) { - ggml_backend_context_cuda * cuda_ctx = (ggml_backend_context_cuda *)backend->context; - delete cuda_ctx; - delete backend; -} +// cuda buffer struct ggml_backend_buffer_context_cuda { - void * device; - + int device; + void * dev_ptr = nullptr; ggml_tensor_extra_gpu * temp_tensor_extras = nullptr; size_t temp_tensor_extra_index = 0; + ggml_backend_buffer_context_cuda(int device, void * dev_ptr) : device(device), dev_ptr(dev_ptr) {} + ~ggml_backend_buffer_context_cuda() { delete[] temp_tensor_extras; } @@ -8189,41 +8957,20 @@ struct ggml_backend_buffer_context_cuda { static void ggml_backend_cuda_buffer_free_buffer(ggml_backend_buffer_t buffer) { ggml_backend_buffer_context_cuda * ctx = (ggml_backend_buffer_context_cuda *)buffer->context; - CUDA_CHECK(cudaFree(ctx->device)); + CUDA_CHECK(cudaFree(ctx->dev_ptr)); delete ctx; } static void * ggml_backend_cuda_buffer_get_base(ggml_backend_buffer_t buffer) { ggml_backend_buffer_context_cuda * ctx = (ggml_backend_buffer_context_cuda *)buffer->context; - return ctx->device; -} - -static size_t ggml_backend_cuda_buffer_get_alloc_size(ggml_backend_buffer_t buffer, ggml_tensor * tensor) { - int64_t row_low = 0; - int64_t row_high = ggml_nrows(tensor); - int64_t nrows_split = row_high - row_low; - - size_t size = ggml_nbytes_split(tensor, nrows_split); - - int64_t ne0 = tensor->ne[0]; - - if (ggml_is_quantized(tensor->type)) { - if (ne0 % MATRIX_ROW_PADDING != 0) { - size += (MATRIX_ROW_PADDING - ne0 % MATRIX_ROW_PADDING) - * ggml_type_size(tensor->type)/ggml_blck_size(tensor->type); - } - } - - return size; - - UNUSED(buffer); + return ctx->dev_ptr; } static void ggml_backend_cuda_buffer_init_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor) { ggml_backend_buffer_context_cuda * ctx = (ggml_backend_buffer_context_cuda *)buffer->context; if (tensor->view_src != NULL && tensor->view_offs == 0) { - assert(tensor->view_src->buffer->backend == buffer->backend); + assert(tensor->view_src->buffer->buft == buffer->buft); // TODO tensor->backend = tensor->view_src->backend; tensor->extra = tensor->view_src->extra; return; @@ -8231,7 +8978,7 @@ static void ggml_backend_cuda_buffer_init_tensor(ggml_backend_buffer_t buffer, g ggml_tensor_extra_gpu * extra = ctx->ggml_cuda_alloc_temp_tensor_extra(); - extra->data_device[g_main_device] = tensor->data; + extra->data_device[ctx->device] = tensor->data; tensor->backend = GGML_BACKEND_GPU; tensor->extra = extra; @@ -8243,64 +8990,208 @@ static void ggml_backend_cuda_buffer_init_tensor(ggml_backend_buffer_t buffer, g int64_t nrows_split = row_high - row_low; size_t original_size = ggml_nbytes_split(tensor, nrows_split); - size_t padded_size = ggml_backend_cuda_buffer_get_alloc_size(tensor->buffer, tensor); + size_t padded_size = ggml_backend_buft_get_alloc_size(buffer->buft, tensor); if (padded_size > original_size && tensor->view_src == nullptr) { - CUDA_CHECK(cudaMemsetAsync((char *)tensor->data + original_size, 0, padded_size - original_size, g_cudaStreams[g_main_device][0])); + CUDA_CHECK(cudaMemsetAsync((char *)tensor->data + original_size, 0, padded_size - original_size, g_cudaStreams[ctx->device][0])); } } UNUSED(buffer); } +static void ggml_backend_cuda_buffer_set_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, const void * data, size_t offset, size_t size) { + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); + GGML_ASSERT(tensor->data != NULL && "tensor not 
allocated"); + GGML_ASSERT(tensor->backend == GGML_BACKEND_GPU); + + CUDA_CHECK(cudaMemcpy((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice)); + + UNUSED(buffer); +} + +static void ggml_backend_cuda_buffer_get_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * tensor, void * data, size_t offset, size_t size) { + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"); + GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + GGML_ASSERT(tensor->backend == GGML_BACKEND_GPU); + + CUDA_CHECK(cudaMemcpy(data, (const char *)tensor->data + offset, size, cudaMemcpyDeviceToHost)); + + UNUSED(buffer); +} + static struct ggml_backend_buffer_i cuda_backend_buffer_interface = { - /* .free_buffer = */ ggml_backend_cuda_buffer_free_buffer, - /* .get_base = */ ggml_backend_cuda_buffer_get_base, - /* .get_alloc_size = */ ggml_backend_cuda_buffer_get_alloc_size, - /* .init_tensor = */ ggml_backend_cuda_buffer_init_tensor, - /* .free_tensor = */ NULL, + /* .free_buffer = */ ggml_backend_cuda_buffer_free_buffer, + /* .get_base = */ ggml_backend_cuda_buffer_get_base, + /* .init_tensor = */ ggml_backend_cuda_buffer_init_tensor, + /* .set_tensor = */ ggml_backend_cuda_buffer_set_tensor, + /* .get_tensor = */ ggml_backend_cuda_buffer_get_tensor, + /* .cpy_tensor_from = */ NULL, + /* .cpy_tensor_to = */ NULL, }; -static ggml_backend_buffer_t ggml_backend_cuda_alloc_buffer(ggml_backend_t backend, size_t size) { - ggml_cuda_set_device(g_main_device); +// cuda buffer type - ggml_backend_buffer_context_cuda * ctx = new ggml_backend_buffer_context_cuda; +static ggml_backend_buffer_t ggml_backend_cuda_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) { + int device = (int) (intptr_t) buft->context; + + ggml_cuda_set_device(device); size = std::max(size, (size_t)1); // cudaMalloc returns null for size 0 - ggml_cuda_set_device(g_main_device); - CUDA_CHECK(cudaMalloc(&ctx->device, size)); + void * dev_ptr; + CUDA_CHECK(cudaMalloc(&dev_ptr, size)); - return ggml_backend_buffer_init(backend, cuda_backend_buffer_interface, ctx, size); + ggml_backend_buffer_context_cuda * ctx = new ggml_backend_buffer_context_cuda(device, dev_ptr); + + return ggml_backend_buffer_init(buft, cuda_backend_buffer_interface, ctx, size); } -static size_t ggml_backend_cuda_get_alignment(ggml_backend_t backend) { +static size_t ggml_backend_cuda_buffer_type_get_alignment(ggml_backend_buffer_type_t buft) { return 128; + + UNUSED(buft); +} + +static size_t ggml_backend_cuda_buffer_type_get_alloc_size(ggml_backend_buffer_type_t buft, ggml_tensor * tensor) { + int64_t row_low = 0; + int64_t row_high = ggml_nrows(tensor); + int64_t nrows_split = row_high - row_low; + + size_t size = ggml_nbytes_split(tensor, nrows_split); + + int64_t ne0 = tensor->ne[0]; + + if (ggml_is_quantized(tensor->type)) { + if (ne0 % MATRIX_ROW_PADDING != 0) { + size += (MATRIX_ROW_PADDING - ne0 % MATRIX_ROW_PADDING) + * ggml_type_size(tensor->type)/ggml_blck_size(tensor->type); + } + } + + return size; + + UNUSED(buft); +} + +static bool ggml_backend_cuda_buffer_type_supports_backend(ggml_backend_buffer_type_t buft, ggml_backend_t backend) { + return ggml_backend_is_cuda(backend); + + UNUSED(buft); +} + +static ggml_backend_buffer_type_i cuda_backend_buffer_type_interface = { + /* .alloc_buffer = */ ggml_backend_cuda_buffer_type_alloc_buffer, + /* .get_alignment = */ ggml_backend_cuda_buffer_type_get_alignment, + /* .get_alloc_size = */ ggml_backend_cuda_buffer_type_get_alloc_size, + /* .supports_backend 
= */ ggml_backend_cuda_buffer_type_supports_backend, +}; + +ggml_backend_buffer_type_t ggml_backend_cuda_buffer_type(int device) { + static struct ggml_backend_buffer_type ggml_backend_buffer_type_cuda[GGML_CUDA_MAX_DEVICES]; + static bool ggml_backend_buffer_type_cuda_initialized = false; + if (!ggml_backend_buffer_type_cuda_initialized) { + for (int i = 0; i < GGML_CUDA_MAX_DEVICES; i++) { + ggml_backend_buffer_type_cuda[i] = { + /* .iface = */ cuda_backend_buffer_type_interface, + /* .context = */ (ggml_backend_buffer_type_context_t) (intptr_t) i, + }; + } + ggml_backend_buffer_type_cuda_initialized = true; + } + + return &ggml_backend_buffer_type_cuda[device]; +} + +// host buffer type + +static void ggml_backend_cuda_host_buffer_free_buffer(ggml_backend_buffer_t buffer) { + ggml_backend_buffer_context_cuda * ctx = (ggml_backend_buffer_context_cuda *)buffer->context; + CUDA_CHECK(cudaFreeHost(ctx->dev_ptr)); + delete ctx; +} + +static ggml_backend_buffer_t ggml_backend_cuda_host_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) { + void * ptr; + CUDA_CHECK(cudaMallocHost(&ptr, size)); + + // FIXME: this is a hack to avoid having to implement a new buffer type + ggml_backend_buffer_t buffer = ggml_backend_cpu_buffer_from_ptr(ptr, size); + buffer->buft = buft; + buffer->iface.free_buffer = ggml_backend_cuda_host_buffer_free_buffer; + + return buffer; + + UNUSED(buft); +} + +struct ggml_backend_buffer_type_i cuda_backend_host_buffer_type_interface = { + /* .alloc_buffer = */ ggml_backend_cuda_host_buffer_type_alloc_buffer, + /* .get_alignment = */ ggml_backend_cpu_buffer_type()->iface.get_alignment, + /* .get_alloc_size = */ ggml_backend_cpu_buffer_type()->iface.get_alloc_size, + /* .supports_backend = */ ggml_backend_cpu_buffer_type()->iface.supports_backend, +}; + +ggml_backend_buffer_type_t ggml_backend_cuda_host_buffer_type() { + static struct ggml_backend_buffer_type ggml_backend_buffer_type_cuda_host = { + /* .iface = */ cuda_backend_host_buffer_type_interface, + /* .context = */ nullptr, + }; + + return &ggml_backend_buffer_type_cuda_host; +} + +// backend + +struct ggml_backend_context_cuda { + int device; +}; + +static const char * ggml_backend_cuda_name(ggml_backend_t backend) { + return GGML_CUDA_NAME; + UNUSED(backend); } +static void ggml_backend_cuda_free(ggml_backend_t backend) { + ggml_backend_context_cuda * cuda_ctx = (ggml_backend_context_cuda *)backend->context; + + delete cuda_ctx; + delete backend; +} + +static ggml_backend_buffer_type_t ggml_backend_cuda_get_default_buffer_type(ggml_backend_t backend) { + ggml_backend_context_cuda * cuda_ctx = (ggml_backend_context_cuda *)backend->context; + + return ggml_backend_cuda_buffer_type(cuda_ctx->device); +} + static void ggml_backend_cuda_set_tensor_async(ggml_backend_t backend, ggml_tensor * tensor, const void * data, size_t offset, size_t size) { + ggml_backend_context_cuda * cuda_ctx = (ggml_backend_context_cuda *)backend->context; + + GGML_ASSERT(tensor->buffer->buft == ggml_backend_cuda_buffer_type(cuda_ctx->device) && "unsupported buffer type"); GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); GGML_ASSERT(tensor->backend == GGML_BACKEND_GPU); - CUDA_CHECK(cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, g_cudaStreams[g_main_device][0])); - - UNUSED(backend); + CUDA_CHECK(cudaMemcpyAsync((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice, 
g_cudaStreams[cuda_ctx->device][0])); } static void ggml_backend_cuda_get_tensor_async(ggml_backend_t backend, const ggml_tensor * tensor, void * data, size_t offset, size_t size) { + ggml_backend_context_cuda * cuda_ctx = (ggml_backend_context_cuda *)backend->context; + + GGML_ASSERT(tensor->buffer->buft == ggml_backend_cuda_buffer_type(cuda_ctx->device) && "unsupported buffer type"); GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"); GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); GGML_ASSERT(tensor->backend == GGML_BACKEND_GPU); - CUDA_CHECK(cudaMemcpyAsync(data, (const char *)tensor->data + offset, size, cudaMemcpyDeviceToHost, g_cudaStreams[g_main_device][0])); - - UNUSED(backend); + CUDA_CHECK(cudaMemcpyAsync(data, (const char *)tensor->data + offset, size, cudaMemcpyDeviceToHost, g_cudaStreams[cuda_ctx->device][0])); } static void ggml_backend_cuda_synchronize(ggml_backend_t backend) { - CUDA_CHECK(cudaStreamSynchronize(g_cudaStreams[g_main_device][0])); + ggml_backend_context_cuda * cuda_ctx = (ggml_backend_context_cuda *)backend->context; + + CUDA_CHECK(cudaStreamSynchronize(g_cudaStreams[cuda_ctx->device][0])); UNUSED(backend); } @@ -8329,7 +9220,9 @@ static void ggml_backend_cuda_graph_plan_compute(ggml_backend_t backend, ggml_ba } static void ggml_backend_cuda_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) { - ggml_cuda_set_device(g_main_device); + ggml_backend_context_cuda * cuda_ctx = (ggml_backend_context_cuda *)backend->context; + + ggml_cuda_set_main_device(cuda_ctx->device); ggml_compute_params params = {}; params.type = GGML_TASK_COMPUTE; @@ -8339,10 +9232,16 @@ static void ggml_backend_cuda_graph_compute(ggml_backend_t backend, ggml_cgraph if (node->op == GGML_OP_RESHAPE || node->op == GGML_OP_TRANSPOSE || node->op == GGML_OP_VIEW || node->op == GGML_OP_PERMUTE) continue; + assert(node->backend == GGML_BACKEND_GPU); + assert(node->buffer->buft == ggml_backend_cuda_buffer_type(cuda_ctx->device)); + assert(node->extra != nullptr); + for (int j = 0; j < GGML_MAX_SRC; j++) { if (node->src[j] != nullptr) { assert(node->src[j]->backend == GGML_BACKEND_GPU); + assert(node->src[j]->buffer->buft == ggml_backend_cuda_buffer_type(cuda_ctx->device)); + assert(node->src[j]->extra != nullptr); } } @@ -8379,27 +9278,135 @@ static void ggml_backend_cuda_graph_compute(ggml_backend_t backend, ggml_cgraph UNUSED(backend); } +static bool ggml_backend_cuda_supports_op(ggml_backend_t backend, const ggml_tensor * op) { + switch (op->op) { + case GGML_OP_UNARY: + switch (ggml_get_unary_op(op)) { + case GGML_UNARY_OP_GELU: + case GGML_UNARY_OP_SILU: + case GGML_UNARY_OP_RELU: + return true; + default: + return false; + } + break; + case GGML_OP_MUL_MAT: + case GGML_OP_MUL_MAT_ID: + { + struct ggml_tensor * a; + struct ggml_tensor * b; + if (op->op == GGML_OP_MUL_MAT) { + a = op->src[0]; + b = op->src[1]; + } else { + a = op->src[2]; + b = op->src[1]; + } + if (a->ne[3] != b->ne[3]) { + return false; + } + return true; + } break; + case GGML_OP_GET_ROWS: + { + switch (op->src[0]->type) { + case GGML_TYPE_F16: + case GGML_TYPE_F32: + case GGML_TYPE_Q4_0: + case GGML_TYPE_Q4_1: + case GGML_TYPE_Q5_0: + case GGML_TYPE_Q5_1: + case GGML_TYPE_Q8_0: + return true; + default: + return false; + } + } break; + case GGML_OP_CPY: + { + ggml_type src0_type = op->src[0]->type; + ggml_type src1_type = op->src[1]->type; + if (src0_type == GGML_TYPE_F32 && src1_type == GGML_TYPE_F32) { + return true; + } + if (src0_type == GGML_TYPE_F32 && src1_type == 
GGML_TYPE_F16) { + return true; + } + if (src0_type == GGML_TYPE_F32 && src1_type == GGML_TYPE_Q8_0) { + return true; + } + if (src0_type == GGML_TYPE_F32 && src1_type == GGML_TYPE_Q4_0) { + return true; + } + if (src0_type == GGML_TYPE_F32 && src1_type == GGML_TYPE_Q4_1) { + return true; + } + if (src0_type == GGML_TYPE_F16 && src1_type == GGML_TYPE_F16) { + return true; + } + return false; + } break; + case GGML_OP_NONE: + case GGML_OP_RESHAPE: + case GGML_OP_VIEW: + case GGML_OP_PERMUTE: + case GGML_OP_TRANSPOSE: + case GGML_OP_NORM: + case GGML_OP_REPEAT: + case GGML_OP_DUP: + case GGML_OP_ADD: + case GGML_OP_MUL: + case GGML_OP_DIV: + case GGML_OP_RMS_NORM: + case GGML_OP_SCALE: + case GGML_OP_SQR: + case GGML_OP_CLAMP: + case GGML_OP_CONT: + case GGML_OP_DIAG_MASK_INF: + case GGML_OP_SOFT_MAX: + case GGML_OP_ROPE: + case GGML_OP_ALIBI: + case GGML_OP_IM2COL: + case GGML_OP_SUM_ROWS: + case GGML_OP_ARGSORT: + return true; + default: + return false; + } + + UNUSED(backend); +} + static ggml_backend_i cuda_backend_i = { - /* .get_name = */ ggml_backend_cuda_name, - /* .free = */ ggml_backend_cuda_free, - /* .alloc_buffer = */ ggml_backend_cuda_alloc_buffer, - /* .get_alignment = */ ggml_backend_cuda_get_alignment, - /* .set_tensor_async = */ ggml_backend_cuda_set_tensor_async, - /* .get_tensor_async = */ ggml_backend_cuda_get_tensor_async, - /* .synchronize = */ ggml_backend_cuda_synchronize, - /* .cpy_tensor_from = */ nullptr, - /* .cpy_tensor_to = */ nullptr, - /* .graph_plan_create = */ ggml_backend_cuda_graph_plan_create, - /* .graph_plan_free = */ ggml_backend_cuda_graph_plan_free, - /* .graph_plan_compute = */ ggml_backend_cuda_graph_plan_compute, - /* .graph_compute = */ ggml_backend_cuda_graph_compute, - /* .supports_op = */ nullptr, + /* .get_name = */ ggml_backend_cuda_name, + /* .free = */ ggml_backend_cuda_free, + /* .get_default_buffer_type = */ ggml_backend_cuda_get_default_buffer_type, + /* .set_tensor_async = */ ggml_backend_cuda_set_tensor_async, + /* .get_tensor_async = */ ggml_backend_cuda_get_tensor_async, + /* .cpy_tensor_from_async = */ NULL, + /* .cpy_tensor_to_async = */ NULL, + /* .synchronize = */ ggml_backend_cuda_synchronize, + /* .graph_plan_create = */ ggml_backend_cuda_graph_plan_create, + /* .graph_plan_free = */ ggml_backend_cuda_graph_plan_free, + /* .graph_plan_compute = */ ggml_backend_cuda_graph_plan_compute, + /* .graph_compute = */ ggml_backend_cuda_graph_compute, + /* .supports_op = */ ggml_backend_cuda_supports_op, }; -ggml_backend_t ggml_backend_cuda_init() { +ggml_backend_t ggml_backend_cuda_init(int device) { ggml_init_cublas(); // TODO: remove from ggml.c - ggml_backend_context_cuda * ctx = new ggml_backend_context_cuda; + if (device < 0 || device >= ggml_cuda_get_device_count()) { + fprintf(stderr, "%s: error: invalid device %d\n", __func__, device); + return nullptr; + } + + // not strictly necessary, but it may reduce the overhead of the first graph_compute + ggml_cuda_set_main_device(device); + + ggml_backend_context_cuda * ctx = new ggml_backend_context_cuda { + /* .device = */ device + }; ggml_backend_t cuda_backend = new ggml_backend { /* .interface = */ cuda_backend_i, @@ -8408,3 +9415,27 @@ ggml_backend_t ggml_backend_cuda_init() { return cuda_backend; } + +bool ggml_backend_is_cuda(ggml_backend_t backend) { + return backend->iface.get_name == ggml_backend_cuda_name; +} + +static ggml_backend_t ggml_backend_reg_cuda_init(const char * params, void * user_data) { + ggml_backend_t cuda_backend = ggml_backend_cuda_init((int) (intptr_t) 
user_data); + return cuda_backend; + + UNUSED(params); +} + +extern "C" int ggml_backend_cuda_reg_devices(); + +int ggml_backend_cuda_reg_devices() { + int device_count = ggml_cuda_get_device_count(); + //int device_count = 1; // DEBUG: some tools require delaying CUDA initialization + for (int i = 0; i < device_count; i++) { + char name[128]; + snprintf(name, sizeof(name), "%s%d", GGML_CUDA_NAME, i); + ggml_backend_register(name, ggml_backend_reg_cuda_init, ggml_backend_cuda_buffer_type(i), (void *) (intptr_t) i); + } + return device_count; +} diff --git a/ggml-cuda.h b/ggml-cuda.h index 528e66c33a207..cdb0c0c41618a 100644 --- a/ggml-cuda.h +++ b/ggml-cuda.h @@ -49,7 +49,15 @@ GGML_API int ggml_cuda_get_device_count(void); GGML_API void ggml_cuda_get_device_description(int device, char * description, size_t description_size); // backend API -GGML_API ggml_backend_t ggml_backend_cuda_init(void); // TODO: take a list of devices to use +GGML_API ggml_backend_t ggml_backend_cuda_init(int device); + +GGML_API bool ggml_backend_is_cuda(ggml_backend_t backend); +GGML_API int ggml_backend_cuda_get_device(ggml_backend_t backend); + +GGML_API ggml_backend_buffer_type_t ggml_backend_cuda_buffer_type(int device); + +// pinned host buffer for use with CPU backend for faster copies between CPU and GPU +GGML_API ggml_backend_buffer_type_t ggml_backend_cuda_host_buffer_type(void); #ifdef __cplusplus } diff --git a/ggml-impl.h b/ggml-impl.h index 06c07339e9269..1f5610a86cfd9 100644 --- a/ggml-impl.h +++ b/ggml-impl.h @@ -232,7 +232,7 @@ bool ggml_hash_contains (const struct ggml_hash_set hash_set, struct ggml // returns GGML_HASHTABLE_FULL if table is full, otherwise the current index of the key or where it should be inserted size_t ggml_hash_find (const struct ggml_hash_set hash_set, struct ggml_tensor * key); -// returns GGML_HAHSHTABLE_ALREADY_EXISTS if key already exists, index otherwise, asserts if table is full +// returns GGML_HASHTABLE_ALREADY_EXISTS if key already exists, index otherwise, asserts if table is full size_t ggml_hash_insert ( struct ggml_hash_set hash_set, struct ggml_tensor * key); // return index, asserts if table is full diff --git a/ggml-metal.h b/ggml-metal.h index be2731f8ba476..bf52d9cd34da4 100644 --- a/ggml-metal.h +++ b/ggml-metal.h @@ -99,6 +99,12 @@ GGML_API ggml_backend_t ggml_backend_metal_init(void); GGML_API bool ggml_backend_is_metal(ggml_backend_t backend); GGML_API void ggml_backend_metal_set_n_cb(ggml_backend_t backend, int n_cb); +GGML_API ggml_backend_buffer_type_t ggml_backend_metal_buffer_type(void); + +// helper to check if the device supports a specific family +// ideally, the user code should be doing these checks +// ref: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf +GGML_API bool ggml_backend_metal_supports_family(ggml_backend_t backend, int family); #ifdef __cplusplus } diff --git a/ggml-metal.m b/ggml-metal.m index 3d22b0b27e444..1dcfa6eddbfa5 100644 --- a/ggml-metal.m +++ b/ggml-metal.m @@ -62,6 +62,8 @@ GGML_METAL_DECL_KERNEL(add_row); // TODO: avoid this extra kernel, instead extend the "add" kernel to support broadcast GGML_METAL_DECL_KERNEL(mul); GGML_METAL_DECL_KERNEL(mul_row); // TODO: avoid this extra kernel, instead extend the "mul" kernel to support broadcast + GGML_METAL_DECL_KERNEL(div); + GGML_METAL_DECL_KERNEL(div_row); GGML_METAL_DECL_KERNEL(scale); GGML_METAL_DECL_KERNEL(scale_4); GGML_METAL_DECL_KERNEL(silu); @@ -100,6 +102,21 @@ GGML_METAL_DECL_KERNEL(mul_mv_q4_K_f32); GGML_METAL_DECL_KERNEL(mul_mv_q5_K_f32); 
GGML_METAL_DECL_KERNEL(mul_mv_q6_K_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_f32_f32); + //GGML_METAL_DECL_KERNEL(mul_mv_id_f16_f16); + GGML_METAL_DECL_KERNEL(mul_mv_id_f16_f32); + //GGML_METAL_DECL_KERNEL(mul_mv_id_f16_f32_1row); + //GGML_METAL_DECL_KERNEL(mul_mv_id_f16_f32_l4); + GGML_METAL_DECL_KERNEL(mul_mv_id_q4_0_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q4_1_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q5_0_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q5_1_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q8_0_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q2_K_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q3_K_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q4_K_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q5_K_f32); + GGML_METAL_DECL_KERNEL(mul_mv_id_q6_K_f32); GGML_METAL_DECL_KERNEL(mul_mm_f32_f32); GGML_METAL_DECL_KERNEL(mul_mm_f16_f32); GGML_METAL_DECL_KERNEL(mul_mm_q4_0_f32); @@ -112,15 +129,36 @@ GGML_METAL_DECL_KERNEL(mul_mm_q4_K_f32); GGML_METAL_DECL_KERNEL(mul_mm_q5_K_f32); GGML_METAL_DECL_KERNEL(mul_mm_q6_K_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_f32_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_f16_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q4_0_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q4_1_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q5_0_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q5_1_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q8_0_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q2_K_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q3_K_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q4_K_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q5_K_f32); + GGML_METAL_DECL_KERNEL(mul_mm_id_q6_K_f32); GGML_METAL_DECL_KERNEL(rope_f32); GGML_METAL_DECL_KERNEL(rope_f16); GGML_METAL_DECL_KERNEL(alibi_f32); GGML_METAL_DECL_KERNEL(im2col_f16); + GGML_METAL_DECL_KERNEL(argsort_f32_i32_asc); + GGML_METAL_DECL_KERNEL(argsort_f32_i32_desc); GGML_METAL_DECL_KERNEL(cpy_f32_f16); GGML_METAL_DECL_KERNEL(cpy_f32_f32); + GGML_METAL_DECL_KERNEL(cpy_f32_q8_0); + GGML_METAL_DECL_KERNEL(cpy_f32_q4_0); + GGML_METAL_DECL_KERNEL(cpy_f32_q4_1); + //GGML_METAL_DECL_KERNEL(cpy_f32_q5_0); + //GGML_METAL_DECL_KERNEL(cpy_f32_q5_1); GGML_METAL_DECL_KERNEL(cpy_f16_f16); + GGML_METAL_DECL_KERNEL(cpy_f16_f32); GGML_METAL_DECL_KERNEL(concat); GGML_METAL_DECL_KERNEL(sqr); + GGML_METAL_DECL_KERNEL(sum_rows); #undef GGML_METAL_DECL_KERNEL }; @@ -155,6 +193,8 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){ ggml_metal_log_callback(level, buffer, ggml_metal_log_user_data); } else { char* buffer2 = malloc(len+1); + va_end(args); + va_start(args, format); vsnprintf(buffer2, len+1, format, args); buffer2[len] = 0; ggml_metal_log_callback(level, buffer2, ggml_metal_log_user_data); @@ -164,12 +204,10 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){ } } - - struct ggml_metal_context * ggml_metal_init(int n_cb) { GGML_METAL_LOG_INFO("%s: allocating\n", __func__); - id device; + id device; NSString * s; #if TARGET_OS_OSX @@ -215,6 +253,9 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){ NSString * sourcePath; NSString * ggmlMetalPathResources = [[NSProcessInfo processInfo].environment objectForKey:@"GGML_METAL_PATH_RESOURCES"]; + + GGML_METAL_LOG_INFO("%s: GGML_METAL_PATH_RESOURCES = %s\n", __func__, ggmlMetalPathResources ? 
[ggmlMetalPathResources UTF8String] : "nil"); + if (ggmlMetalPathResources) { sourcePath = [ggmlMetalPathResources stringByAppendingPathComponent:@"ggml-metal.metal"]; } else { @@ -245,6 +286,29 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){ } } +#if TARGET_OS_OSX + // print MTL GPU family: + GGML_METAL_LOG_INFO("%s: GPU name: %s\n", __func__, [[ctx->device name] UTF8String]); + + // determine max supported GPU family + // https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf + // https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf + for (int i = MTLGPUFamilyApple1 + 20; i >= MTLGPUFamilyApple1; --i) { + if ([ctx->device supportsFamily:i]) { + GGML_METAL_LOG_INFO("%s: GPU family: MTLGPUFamilyApple%d (%d)\n", __func__, i - (int) MTLGPUFamilyApple1 + 1, i); + break; + } + } + + GGML_METAL_LOG_INFO("%s: hasUnifiedMemory = %s\n", __func__, ctx->device.hasUnifiedMemory ? "true" : "false"); + GGML_METAL_LOG_INFO("%s: recommendedMaxWorkingSetSize = %8.2f MB\n", __func__, ctx->device.recommendedMaxWorkingSetSize / 1e6); + if (ctx->device.maxTransferRate != 0) { + GGML_METAL_LOG_INFO("%s: maxTransferRate = %8.2f MB/s\n", __func__, ctx->device.maxTransferRate / 1e6); + } else { + GGML_METAL_LOG_INFO("%s: maxTransferRate = built-in GPU\n", __func__); + } +#endif + // load kernels { NSError * error = nil; @@ -266,6 +330,8 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){ GGML_METAL_ADD_KERNEL(add_row); GGML_METAL_ADD_KERNEL(mul); GGML_METAL_ADD_KERNEL(mul_row); + GGML_METAL_ADD_KERNEL(div); + GGML_METAL_ADD_KERNEL(div_row); GGML_METAL_ADD_KERNEL(scale); GGML_METAL_ADD_KERNEL(scale_4); GGML_METAL_ADD_KERNEL(silu); @@ -304,6 +370,21 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){ GGML_METAL_ADD_KERNEL(mul_mv_q4_K_f32); GGML_METAL_ADD_KERNEL(mul_mv_q5_K_f32); GGML_METAL_ADD_KERNEL(mul_mv_q6_K_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_f32_f32); + //GGML_METAL_ADD_KERNEL(mul_mv_id_f16_f16); + GGML_METAL_ADD_KERNEL(mul_mv_id_f16_f32); + //GGML_METAL_ADD_KERNEL(mul_mv_id_f16_f32_1row); + //GGML_METAL_ADD_KERNEL(mul_mv_id_f16_f32_l4); + GGML_METAL_ADD_KERNEL(mul_mv_id_q4_0_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q4_1_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q5_0_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q5_1_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q8_0_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q2_K_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q3_K_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q4_K_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q5_K_f32); + GGML_METAL_ADD_KERNEL(mul_mv_id_q6_K_f32); if ([ctx->device supportsFamily:MTLGPUFamilyApple7]) { GGML_METAL_ADD_KERNEL(mul_mm_f32_f32); GGML_METAL_ADD_KERNEL(mul_mm_f16_f32); @@ -317,43 +398,41 @@ static void ggml_metal_log(enum ggml_log_level level, const char * format, ...){ GGML_METAL_ADD_KERNEL(mul_mm_q4_K_f32); GGML_METAL_ADD_KERNEL(mul_mm_q5_K_f32); GGML_METAL_ADD_KERNEL(mul_mm_q6_K_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_f32_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_f16_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q4_0_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q4_1_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q5_0_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q5_1_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q8_0_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q2_K_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q3_K_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q4_K_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q5_K_f32); + GGML_METAL_ADD_KERNEL(mul_mm_id_q6_K_f32); } 
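// Note (illustrative, not part of the patch itself): the mul_mm and mul_mm_id
// kernels registered above depend on simdgroup matrix support, so they are only
// loaded when the device reports MTLGPUFamilyApple7 or newer (A14/M1 and later);
// older A-chips and AMD GPUs fall back to the mul_mv matrix-vector kernels, as
// the MUL_MAT_ID path below also notes. With the helper this patch declares in
// ggml-metal.h, user code could guard on this roughly as follows (the family
// numbering is an assumed 1-based mapping onto MTLGPUFamilyApple1..N):
//
//     if (ggml_backend_metal_supports_family(backend, 7)) {
//         // matrix-matrix kernels are available on this device
//     }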
GGML_METAL_ADD_KERNEL(rope_f32); GGML_METAL_ADD_KERNEL(rope_f16); GGML_METAL_ADD_KERNEL(alibi_f32); GGML_METAL_ADD_KERNEL(im2col_f16); + GGML_METAL_ADD_KERNEL(argsort_f32_i32_asc); + GGML_METAL_ADD_KERNEL(argsort_f32_i32_desc); GGML_METAL_ADD_KERNEL(cpy_f32_f16); GGML_METAL_ADD_KERNEL(cpy_f32_f32); + GGML_METAL_ADD_KERNEL(cpy_f32_q8_0); + GGML_METAL_ADD_KERNEL(cpy_f32_q4_0); + GGML_METAL_ADD_KERNEL(cpy_f32_q4_1); + //GGML_METAL_ADD_KERNEL(cpy_f32_q5_0); + //GGML_METAL_ADD_KERNEL(cpy_f32_q5_1); GGML_METAL_ADD_KERNEL(cpy_f16_f16); + GGML_METAL_ADD_KERNEL(cpy_f16_f32); GGML_METAL_ADD_KERNEL(concat); GGML_METAL_ADD_KERNEL(sqr); + GGML_METAL_ADD_KERNEL(sum_rows); #undef GGML_METAL_ADD_KERNEL } -#if TARGET_OS_OSX - // print MTL GPU family: - GGML_METAL_LOG_INFO("%s: GPU name: %s\n", __func__, [[ctx->device name] UTF8String]); - - // determine max supported GPU family - // https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf - // https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf - for (int i = MTLGPUFamilyApple1 + 20; i >= MTLGPUFamilyApple1; --i) { - if ([ctx->device supportsFamily:i]) { - GGML_METAL_LOG_INFO("%s: GPU family: MTLGPUFamilyApple%d (%d)\n", __func__, i - (int) MTLGPUFamilyApple1 + 1, i); - break; - } - } - - GGML_METAL_LOG_INFO("%s: hasUnifiedMemory = %s\n", __func__, ctx->device.hasUnifiedMemory ? "true" : "false"); - GGML_METAL_LOG_INFO("%s: recommendedMaxWorkingSetSize = %8.2f MB\n", __func__, ctx->device.recommendedMaxWorkingSetSize / 1024.0 / 1024.0); - if (ctx->device.maxTransferRate != 0) { - GGML_METAL_LOG_INFO("%s: maxTransferRate = %8.2f MB/s\n", __func__, ctx->device.maxTransferRate / 1024.0 / 1024.0); - } else { - GGML_METAL_LOG_INFO("%s: maxTransferRate = built-in GPU\n", __func__); - } -#endif - return ctx; } @@ -367,6 +446,8 @@ void ggml_metal_free(struct ggml_metal_context * ctx) { GGML_METAL_DEL_KERNEL(add_row); GGML_METAL_DEL_KERNEL(mul); GGML_METAL_DEL_KERNEL(mul_row); + GGML_METAL_DEL_KERNEL(div); + GGML_METAL_DEL_KERNEL(div_row); GGML_METAL_DEL_KERNEL(scale); GGML_METAL_DEL_KERNEL(scale_4); GGML_METAL_DEL_KERNEL(silu); @@ -405,6 +486,21 @@ void ggml_metal_free(struct ggml_metal_context * ctx) { GGML_METAL_DEL_KERNEL(mul_mv_q4_K_f32); GGML_METAL_DEL_KERNEL(mul_mv_q5_K_f32); GGML_METAL_DEL_KERNEL(mul_mv_q6_K_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_f32_f32); + //GGML_METAL_DEL_KERNEL(mul_mv_id_f16_f16); + GGML_METAL_DEL_KERNEL(mul_mv_id_f16_f32); + //GGML_METAL_DEL_KERNEL(mul_mv_id_f16_f32_1row); + //GGML_METAL_DEL_KERNEL(mul_mv_id_f16_f32_l4); + GGML_METAL_DEL_KERNEL(mul_mv_id_q4_0_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q4_1_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q5_0_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q5_1_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q8_0_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q2_K_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q3_K_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q4_K_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q5_K_f32); + GGML_METAL_DEL_KERNEL(mul_mv_id_q6_K_f32); if ([ctx->device supportsFamily:MTLGPUFamilyApple7]) { GGML_METAL_DEL_KERNEL(mul_mm_f32_f32); GGML_METAL_DEL_KERNEL(mul_mm_f16_f32); @@ -418,16 +514,37 @@ void ggml_metal_free(struct ggml_metal_context * ctx) { GGML_METAL_DEL_KERNEL(mul_mm_q4_K_f32); GGML_METAL_DEL_KERNEL(mul_mm_q5_K_f32); GGML_METAL_DEL_KERNEL(mul_mm_q6_K_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_f32_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_f16_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q4_0_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q4_1_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q5_0_f32); 
+ GGML_METAL_DEL_KERNEL(mul_mm_id_q5_1_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q8_0_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q2_K_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q3_K_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q4_K_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q5_K_f32); + GGML_METAL_DEL_KERNEL(mul_mm_id_q6_K_f32); } GGML_METAL_DEL_KERNEL(rope_f32); GGML_METAL_DEL_KERNEL(rope_f16); GGML_METAL_DEL_KERNEL(alibi_f32); GGML_METAL_DEL_KERNEL(im2col_f16); + GGML_METAL_DEL_KERNEL(argsort_f32_i32_asc); + GGML_METAL_DEL_KERNEL(argsort_f32_i32_desc); GGML_METAL_DEL_KERNEL(cpy_f32_f16); GGML_METAL_DEL_KERNEL(cpy_f32_f32); + GGML_METAL_DEL_KERNEL(cpy_f32_q8_0); + GGML_METAL_DEL_KERNEL(cpy_f32_q4_0); + GGML_METAL_DEL_KERNEL(cpy_f32_q4_1); + //GGML_METAL_DEL_KERNEL(cpy_f32_q5_0); + //GGML_METAL_DEL_KERNEL(cpy_f32_q5_1); GGML_METAL_DEL_KERNEL(cpy_f16_f16); + GGML_METAL_DEL_KERNEL(cpy_f16_f32); GGML_METAL_DEL_KERNEL(concat); GGML_METAL_DEL_KERNEL(sqr); + GGML_METAL_DEL_KERNEL(sum_rows); #undef GGML_METAL_DEL_KERNEL @@ -471,6 +588,13 @@ int ggml_metal_if_optimized(struct ggml_metal_context * ctx) { return ctx->concur_list; } +// temporarily defined here for compatibility between ggml-backend and the old API +struct ggml_backend_metal_buffer_context { + void * data; + + id metal; +}; + // finds the Metal buffer that contains the tensor data on the GPU device // the assumption is that there is 1-to-1 mapping between the host and device memory buffers, so we can find the // Metal buffer based on the host memory pointer @@ -480,8 +604,17 @@ int ggml_metal_if_optimized(struct ggml_metal_context * ctx) { const int64_t tsize = ggml_nbytes(t); - if (t->buffer && t->buffer->backend && t->buffer->backend->context) { - ctx = t->buffer->backend->context; + // compatibility with ggml-backend + if (t->buffer && t->buffer->buft == ggml_backend_metal_buffer_type()) { + struct ggml_backend_metal_buffer_context * buf_ctx = (struct ggml_backend_metal_buffer_context *) t->buffer->context; + + const int64_t ioffs = (int64_t) t->data - (int64_t) buf_ctx->data; + + GGML_ASSERT(ioffs >= 0 && ioffs + tsize <= (int64_t) t->buffer->size); + + *offs = (size_t) ioffs; + + return buf_ctx->metal; } // find the view that contains the tensor fully @@ -541,11 +674,11 @@ bool ggml_metal_add_buffer( ctx->buffers[ctx->n_buffers].metal = [ctx->device newBufferWithBytesNoCopy:data length:size_aligned options:MTLResourceStorageModeShared deallocator:nil]; if (ctx->buffers[ctx->n_buffers].metal == nil) { - GGML_METAL_LOG_ERROR("%s: error: failed to allocate '%-16s' buffer, size = %8.2f MB\n", __func__, name, size_aligned / 1024.0 / 1024.0); + GGML_METAL_LOG_ERROR("%s: error: failed to allocate '%-16s' buffer, size = %8.2f MiB\n", __func__, name, size_aligned / 1024.0 / 1024.0); return false; } - GGML_METAL_LOG_INFO("%s: allocated '%-16s' buffer, size = %8.2f MB", __func__, name, size_aligned / 1024.0 / 1024.0); + GGML_METAL_LOG_INFO("%s: allocated '%-16s' buffer, size = %8.2f MiB", __func__, name, size_aligned / 1024.0 / 1024.0); ++ctx->n_buffers; } else { @@ -565,11 +698,11 @@ bool ggml_metal_add_buffer( ctx->buffers[ctx->n_buffers].metal = [ctx->device newBufferWithBytesNoCopy:(void *) ((uint8_t *) data + i) length:size_step_aligned options:MTLResourceStorageModeShared deallocator:nil]; if (ctx->buffers[ctx->n_buffers].metal == nil) { - GGML_METAL_LOG_ERROR("%s: error: failed to allocate '%-16s' buffer, size = %8.2f MB\n", __func__, name, size_step_aligned / 1024.0 / 1024.0); + GGML_METAL_LOG_ERROR("%s: error: failed to allocate '%-16s' buffer, 
size = %8.2f MiB\n", __func__, name, size_step_aligned / 1024.0 / 1024.0); return false; } - GGML_METAL_LOG_INFO("%s: allocated '%-16s' buffer, size = %8.2f MB, offs = %12ld", __func__, name, size_step_aligned / 1024.0 / 1024.0, i); + GGML_METAL_LOG_INFO("%s: allocated '%-16s' buffer, size = %8.2f MiB, offs = %12ld", __func__, name, size_step_aligned / 1024.0 / 1024.0, i); if (i + size_step < size) { GGML_METAL_LOG_INFO("\n"); } @@ -706,6 +839,76 @@ void ggml_metal_graph_find_concurrency( } } +static bool ggml_metal_supports_op(const struct ggml_tensor * op) { + switch (op->op) { + case GGML_OP_UNARY: + switch (ggml_get_unary_op(op)) { + case GGML_UNARY_OP_SILU: + case GGML_UNARY_OP_RELU: + case GGML_UNARY_OP_GELU: + return true; + default: + return false; + } + case GGML_OP_NONE: + case GGML_OP_RESHAPE: + case GGML_OP_VIEW: + case GGML_OP_PERMUTE: + case GGML_OP_TRANSPOSE: + case GGML_OP_GET_ROWS: + case GGML_OP_CONCAT: + case GGML_OP_ADD: + case GGML_OP_MUL: + case GGML_OP_DIV: + case GGML_OP_SCALE: + case GGML_OP_SQR: + case GGML_OP_SUM_ROWS: + case GGML_OP_SOFT_MAX: + case GGML_OP_RMS_NORM: + case GGML_OP_NORM: + case GGML_OP_ALIBI: + case GGML_OP_ROPE: + case GGML_OP_IM2COL: + case GGML_OP_ARGSORT: + case GGML_OP_MUL_MAT: + case GGML_OP_MUL_MAT_ID: + return true; + case GGML_OP_CPY: + case GGML_OP_DUP: + case GGML_OP_CONT: + { + switch (op->src[0]->type) { + case GGML_TYPE_F32: + switch (op->type) { + case GGML_TYPE_F16: + case GGML_TYPE_F32: + case GGML_TYPE_Q8_0: + case GGML_TYPE_Q4_0: + case GGML_TYPE_Q4_1: + return true; + default: + return false; + } + case GGML_TYPE_F16: + switch (op->type) { + case GGML_TYPE_F16: + case GGML_TYPE_F32: + return true; + default: + return false; + } + default: + return false; + }; + } + case GGML_OP_DIAG_MASK_INF: + { + return op->ne[0] % 4 == 0; + } + default: + return false; + } +} void ggml_metal_graph_compute( struct ggml_metal_context * ctx, struct ggml_cgraph * gf) { @@ -776,6 +979,8 @@ void ggml_metal_graph_compute( } break; } + GGML_ASSERT(ggml_metal_supports_op(dst)); + const int64_t ne00 = src0 ? src0->ne[0] : 0; const int64_t ne01 = src0 ? src0->ne[1] : 0; const int64_t ne02 = src0 ? 
src0->ne[2] : 0; @@ -868,25 +1073,40 @@ void ggml_metal_graph_compute( [encoder dispatchThreadgroups:MTLSizeMake(ne1, ne2, ne3) threadsPerThreadgroup:MTLSizeMake(nth, 1, 1)]; } break; case GGML_OP_ADD: + case GGML_OP_MUL: + case GGML_OP_DIV: { - GGML_ASSERT(ggml_is_contiguous(src0)); - GGML_ASSERT(ggml_is_contiguous(src1)); - bool bcast_row = false; int64_t nb = ne00; - if (ggml_nelements(src1) == ne10 && ne00 % 4 == 0) { + id pipeline = nil; + + if (ggml_nelements(src1) == ne10 && ggml_is_contiguous(src1) && ne00 % 4 == 0 && ne10 % 4 == 0) { + GGML_ASSERT(ggml_is_contiguous(src0)); + // src1 is a row GGML_ASSERT(ne11 == 1); nb = ne00 / 4; - [encoder setComputePipelineState:ctx->pipeline_add_row]; + switch (dst->op) { + case GGML_OP_ADD: pipeline = ctx->pipeline_add_row; break; + case GGML_OP_MUL: pipeline = ctx->pipeline_mul_row; break; + case GGML_OP_DIV: pipeline = ctx->pipeline_div_row; break; + default: GGML_ASSERT(false); + } bcast_row = true; } else { - [encoder setComputePipelineState:ctx->pipeline_add]; + switch (dst->op) { + case GGML_OP_ADD: pipeline = ctx->pipeline_add; break; + case GGML_OP_MUL: pipeline = ctx->pipeline_mul; break; + case GGML_OP_DIV: pipeline = ctx->pipeline_div; break; + default: GGML_ASSERT(false); + } } + + [encoder setComputePipelineState:pipeline]; [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; [encoder setBuffer:id_src1 offset:offs_src1 atIndex:1]; [encoder setBuffer:id_dst offset:offs_dst atIndex:2]; @@ -921,36 +1141,11 @@ void ggml_metal_graph_compute( [encoder dispatchThreadgroups:MTLSizeMake(n, 1, 1) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)]; } else { - const int nth = MIN(1024, ne0); + const int nth = MIN((int) pipeline.maxTotalThreadsPerThreadgroup, ne0); [encoder dispatchThreadgroups:MTLSizeMake(ne01, ne02, ne03) threadsPerThreadgroup:MTLSizeMake(nth, 1, 1)]; } } break; - case GGML_OP_MUL: - { - GGML_ASSERT(ggml_is_contiguous(src0)); - GGML_ASSERT(ggml_is_contiguous(src1)); - - // utilize float4 - GGML_ASSERT(ne00 % 4 == 0); - const int64_t nb = ne00/4; - - if (ggml_nelements(src1) == ne10) { - // src1 is a row - GGML_ASSERT(ne11 == 1); - [encoder setComputePipelineState:ctx->pipeline_mul_row]; - } else { - [encoder setComputePipelineState:ctx->pipeline_mul]; - } - [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; - [encoder setBuffer:id_src1 offset:offs_src1 atIndex:1]; - [encoder setBuffer:id_dst offset:offs_dst atIndex:2]; - [encoder setBytes:&nb length:sizeof(nb) atIndex:3]; - - const int64_t n = ggml_nelements(dst)/4; - - [encoder dispatchThreadgroups:MTLSizeMake(n, 1, 1) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)]; - } break; case GGML_OP_SCALE: { GGML_ASSERT(ggml_is_contiguous(src0)); @@ -1023,25 +1218,70 @@ void ggml_metal_graph_compute( const int64_t n = ggml_nelements(dst); [encoder dispatchThreadgroups:MTLSizeMake(n, 1, 1) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)]; } break; + case GGML_OP_SUM_ROWS: + { + GGML_ASSERT(src0->nb[0] == ggml_type_size(src0->type)); + + [encoder setComputePipelineState:ctx->pipeline_sum_rows]; + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; + [encoder setBuffer:id_dst offset:offs_dst atIndex:1]; + [encoder setBytes:&ne00 length:sizeof(ne00) atIndex:2]; + [encoder setBytes:&ne01 length:sizeof(ne01) atIndex:3]; + [encoder setBytes:&ne02 length:sizeof(ne02) atIndex:4]; + [encoder setBytes:&ne03 length:sizeof(ne03) atIndex:5]; + [encoder setBytes:&nb00 length:sizeof(nb00) atIndex:6]; + [encoder setBytes:&nb01 length:sizeof(nb01) atIndex:7]; + [encoder setBytes:&nb02 
length:sizeof(nb02) atIndex:8]; + [encoder setBytes:&nb03 length:sizeof(nb03) atIndex:9]; + [encoder setBytes:&ne10 length:sizeof(ne10) atIndex:10]; + [encoder setBytes:&ne11 length:sizeof(ne11) atIndex:11]; + [encoder setBytes:&ne12 length:sizeof(ne12) atIndex:12]; + [encoder setBytes:&ne13 length:sizeof(ne13) atIndex:13]; + [encoder setBytes:&nb10 length:sizeof(nb10) atIndex:14]; + [encoder setBytes:&nb11 length:sizeof(nb11) atIndex:15]; + [encoder setBytes:&nb12 length:sizeof(nb12) atIndex:16]; + [encoder setBytes:&nb13 length:sizeof(nb13) atIndex:17]; + [encoder setBytes:&ne0 length:sizeof(ne0) atIndex:18]; + [encoder setBytes:&ne1 length:sizeof(ne1) atIndex:19]; + [encoder setBytes:&ne2 length:sizeof(ne2) atIndex:20]; + [encoder setBytes:&ne3 length:sizeof(ne3) atIndex:21]; + [encoder setBytes:&nb0 length:sizeof(nb0) atIndex:22]; + [encoder setBytes:&nb1 length:sizeof(nb1) atIndex:23]; + [encoder setBytes:&nb2 length:sizeof(nb2) atIndex:24]; + [encoder setBytes:&nb3 length:sizeof(nb3) atIndex:25]; + + [encoder dispatchThreadgroups:MTLSizeMake(ne01, ne02, ne03) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)]; + } break; case GGML_OP_SOFT_MAX: { int nth = 32; // SIMD width if (ne00%4 == 0) { + while (nth < ne00/4 && nth < 256) { + nth *= 2; + } [encoder setComputePipelineState:ctx->pipeline_soft_max_4]; } else { - do { + while (nth < ne00 && nth < 1024) { nth *= 2; - } while (nth <= ne00 && nth <= 1024); - nth /= 2; + } [encoder setComputePipelineState:ctx->pipeline_soft_max]; } - [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; - [encoder setBuffer:id_dst offset:offs_dst atIndex:1]; - [encoder setBytes:&ne00 length:sizeof(ne00) atIndex:2]; - [encoder setBytes:&ne01 length:sizeof(ne01) atIndex:3]; - [encoder setBytes:&ne02 length:sizeof(ne02) atIndex:4]; - [encoder setThreadgroupMemoryLength:GGML_PAD(nth/32*sizeof(float), 16) atIndex:0]; + + const float scale = ((float *) dst->op_params)[0]; + + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; + if (id_src1) { + [encoder setBuffer:id_src1 offset:offs_src1 atIndex:1]; + } else { + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:1]; + } + [encoder setBuffer:id_dst offset:offs_dst atIndex:2]; + [encoder setBytes:&ne00 length:sizeof(ne00) atIndex:3]; + [encoder setBytes:&ne01 length:sizeof(ne01) atIndex:4]; + [encoder setBytes:&ne02 length:sizeof(ne02) atIndex:5]; + [encoder setBytes:&scale length:sizeof(scale) atIndex:6]; + [encoder setThreadgroupMemoryLength:32*sizeof(float) atIndex:0]; [encoder dispatchThreadgroups:MTLSizeMake(ne01*ne02*ne03, 1, 1) threadsPerThreadgroup:MTLSizeMake(nth, 1, 1)]; } break; @@ -1070,9 +1310,13 @@ void ggml_metal_graph_compute( case GGML_OP_MUL_MAT: { GGML_ASSERT(ne00 == ne10); - GGML_ASSERT(ne03 == ne13); - const uint gqa = ne12/ne02; + // TODO: assert that dim2 and dim3 are contiguous + GGML_ASSERT(ne12 % ne02 == 0); + GGML_ASSERT(ne13 % ne03 == 0); + + const uint r2 = ne12/ne02; + const uint r3 = ne13/ne03; // find the break-even point where the matrix-matrix kernel becomes more efficient compared // to the matrix-vector kernel @@ -1107,7 +1351,7 @@ void ggml_metal_graph_compute( !ggml_is_transposed(src1) && src1t == GGML_TYPE_F32 && ne00 % 32 == 0 && ne00 >= 64 && - ne11 > ne11_mm_min) { + (ne11 > ne11_mm_min || (ggml_is_quantized(src0t) && ne12 > 1))) { //printf("matrix: ne00 = %6d, ne01 = %6d, ne02 = %6d, ne11 = %6d, ne12 = %6d\n", ne00, ne01, ne02, ne11, ne12); switch (src0->type) { case GGML_TYPE_F32: [encoder setComputePipelineState:ctx->pipeline_mul_mm_f32_f32]; break; @@ -1137,9 
+1381,10 @@ void ggml_metal_graph_compute( [encoder setBytes:&nb12 length:sizeof(nb12) atIndex:10]; [encoder setBytes:&ne0 length:sizeof(ne0) atIndex:11]; [encoder setBytes:&ne1 length:sizeof(ne1) atIndex:12]; - [encoder setBytes:&gqa length:sizeof(gqa) atIndex:13]; + [encoder setBytes:&r2 length:sizeof(r2) atIndex:13]; + [encoder setBytes:&r3 length:sizeof(r3) atIndex:14]; [encoder setThreadgroupMemoryLength:8192 atIndex:0]; - [encoder dispatchThreadgroups:MTLSizeMake( (ne11 + 31)/32, (ne01 + 63)/64, ne12) threadsPerThreadgroup:MTLSizeMake(128, 1, 1)]; + [encoder dispatchThreadgroups:MTLSizeMake( (ne11 + 31)/32, (ne01 + 63)/64, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(128, 1, 1)]; } else { int nth0 = 32; int nth1 = 1; @@ -1175,90 +1420,60 @@ void ggml_metal_graph_compute( } break; case GGML_TYPE_Q4_0: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 8; nth1 = 8; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q4_0_f32]; } break; case GGML_TYPE_Q4_1: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 8; nth1 = 8; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q4_1_f32]; } break; case GGML_TYPE_Q5_0: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 8; nth1 = 8; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q5_0_f32]; } break; case GGML_TYPE_Q5_1: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 8; nth1 = 8; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q5_1_f32]; } break; case GGML_TYPE_Q8_0: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 8; nth1 = 8; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q8_0_f32]; } break; case GGML_TYPE_Q2_K: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 2; nth1 = 32; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q2_K_f32]; } break; case GGML_TYPE_Q3_K: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 2; nth1 = 32; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q3_K_f32]; } break; case GGML_TYPE_Q4_K: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 4; //1; nth1 = 8; //32; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q4_K_f32]; } break; case GGML_TYPE_Q5_K: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 2; nth1 = 32; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q5_K_f32]; } break; case GGML_TYPE_Q6_K: { - GGML_ASSERT(ne02 == 1); - GGML_ASSERT(ne12 == 1); - nth0 = 2; nth1 = 32; [encoder setComputePipelineState:ctx->pipeline_mul_mv_q6_K_f32]; @@ -1287,31 +1502,281 @@ void ggml_metal_graph_compute( [encoder setBytes:&nb12 length:sizeof(nb12) atIndex:14]; [encoder setBytes:&ne0 length:sizeof(ne0) atIndex:15]; [encoder setBytes:&ne1 length:sizeof(ne1) atIndex:16]; - [encoder setBytes:&gqa length:sizeof(gqa) atIndex:17]; + [encoder setBytes:&r2 length:sizeof(r2) atIndex:17]; + [encoder setBytes:&r3 length:sizeof(r3) atIndex:18]; if (src0t == GGML_TYPE_Q4_0 || src0t == GGML_TYPE_Q4_1 || src0t == GGML_TYPE_Q5_0 || src0t == GGML_TYPE_Q5_1 || src0t == GGML_TYPE_Q8_0 || src0t == GGML_TYPE_Q2_K) { // || src0t == GGML_TYPE_Q4_K) { - [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 7)/8, ne11, ne12) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 7)/8, ne11, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; } else if (src0t == GGML_TYPE_Q4_K) { - [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 3)/4, ne11, ne12) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + [encoder 
dispatchThreadgroups:MTLSizeMake((ne01 + 3)/4, ne11, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; } else if (src0t == GGML_TYPE_Q3_K) { #ifdef GGML_QKK_64 - [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 1)/2, ne11, ne12) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 1)/2, ne11, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; #else - [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 3)/4, ne11, ne12) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 3)/4, ne11, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; #endif } else if (src0t == GGML_TYPE_Q5_K) { - [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 3)/4, ne11, ne12) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 3)/4, ne11, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; } else if (src0t == GGML_TYPE_Q6_K) { - [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 1)/2, ne11, ne12) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + [encoder dispatchThreadgroups:MTLSizeMake((ne01 + 1)/2, ne11, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; } else { - int64_t ny = (ne11 + nrows - 1)/nrows; - [encoder dispatchThreadgroups:MTLSizeMake(ne01, ny, ne12) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + const int64_t ny = (ne11 + nrows - 1)/nrows; + [encoder dispatchThreadgroups:MTLSizeMake(ne01, ny, ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + } + } + } break; + case GGML_OP_MUL_MAT_ID: + { + //GGML_ASSERT(ne00 == ne10); + //GGML_ASSERT(ne03 == ne13); + + GGML_ASSERT(src0t == GGML_TYPE_I32); + + const int n_as = ((int32_t *) dst->op_params)[1]; + + // TODO: make this more general + GGML_ASSERT(n_as <= 8); + + struct ggml_tensor * src2 = gf->nodes[i]->src[2]; + + const int64_t ne20 = src2 ? src2->ne[0] : 0; + const int64_t ne21 = src2 ? src2->ne[1] : 0; + const int64_t ne22 = src2 ? src2->ne[2] : 0; + const int64_t ne23 = src2 ? src2->ne[3] : 0; GGML_UNUSED(ne23); + + const uint64_t nb20 = src2 ? src2->nb[0] : 0; GGML_UNUSED(nb20); + const uint64_t nb21 = src2 ? src2->nb[1] : 0; + const uint64_t nb22 = src2 ? src2->nb[2] : 0; + const uint64_t nb23 = src2 ? src2->nb[3] : 0; GGML_UNUSED(nb23); + + const enum ggml_type src2t = src2 ? src2->type : GGML_TYPE_COUNT; GGML_UNUSED(src2t); + + GGML_ASSERT(!ggml_is_transposed(src2)); + GGML_ASSERT(!ggml_is_transposed(src1)); + + GGML_ASSERT(ne20 % 32 == 0); + // !!!!!!!!! TODO: this assert is probably required but not sure! + //GGML_ASSERT(ne20 >= 64); + GGML_ASSERT(src1t == GGML_TYPE_F32); + + const uint r2 = ne12/ne22; + const uint r3 = ne13/ne23; + + // find the break-even point where the matrix-matrix kernel becomes more efficient compared + // to the matrix-vector kernel + int ne11_mm_min = 1; + + const int idx = ((int32_t *) dst->op_params)[0]; + + // batch size + GGML_ASSERT(ne01 == ne11); + + const int64_t _ne1 = 1; // kernel_mul_mm_impl needs a reference in constant memory + + // for now the matrix-matrix multiplication kernel only works on A14+/M1+ SoCs + // AMD GPU and older A-chips will reuse matrix-vector multiplication kernel + // !!! + // TODO: for now, always use mat-vec kernels until we figure out how to improve the + // indirect matrix multiplication + // !!! 
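// Semantics sketch (illustrative, not part of the patch itself): MUL_MAT_ID is
// an indirect matrix multiplication used for mixture-of-experts routing. Very
// roughly, using the names above and treating src0 as the I32 tensor of expert
// ids with one row per src1 row (ne01 == ne11 is asserted above):
//
//     const int32_t * ids = (const int32_t *) src0->data;
//     const int expert  = ids_row_for_current_token[idx];  // idx = dst->op_params[0]
//     // dst row = (expert matrix dst->src[2 + expert]) * (corresponding src1 row)
//
// ids_row_for_current_token is a placeholder for however the kernel indexes the
// id tensor per row; the point is that each row selects one of the n_as expert
// matrices, which is why all of them are bound as separate buffers further down.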
+ if ([ctx->device supportsFamily:MTLGPUFamilyApple7] && _ne1 > ne11_mm_min) { + switch (src2->type) { + case GGML_TYPE_F32: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_f32_f32]; break; + case GGML_TYPE_F16: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_f16_f32]; break; + case GGML_TYPE_Q4_0: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q4_0_f32]; break; + case GGML_TYPE_Q4_1: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q4_1_f32]; break; + case GGML_TYPE_Q5_0: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q5_0_f32]; break; + case GGML_TYPE_Q5_1: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q5_1_f32]; break; + case GGML_TYPE_Q8_0: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q8_0_f32]; break; + case GGML_TYPE_Q2_K: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q2_K_f32]; break; + case GGML_TYPE_Q3_K: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q3_K_f32]; break; + case GGML_TYPE_Q4_K: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q4_K_f32]; break; + case GGML_TYPE_Q5_K: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q5_K_f32]; break; + case GGML_TYPE_Q6_K: [encoder setComputePipelineState:ctx->pipeline_mul_mm_id_q6_K_f32]; break; + default: GGML_ASSERT(false && "MUL_MAT_ID not implemented"); + } + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; + [encoder setBuffer:id_src1 offset:offs_src1 atIndex:1]; + [encoder setBuffer:id_dst offset:offs_dst atIndex:2]; + [encoder setBytes:&nb01 length:sizeof(nb01) atIndex:3]; + [encoder setBytes:&ne20 length:sizeof(ne20) atIndex:4]; + [encoder setBytes:&ne22 length:sizeof(ne22) atIndex:5]; + [encoder setBytes:&nb21 length:sizeof(nb21) atIndex:6]; + [encoder setBytes:&nb22 length:sizeof(nb22) atIndex:7]; + [encoder setBytes:&ne12 length:sizeof(ne12) atIndex:8]; + [encoder setBytes:&ne13 length:sizeof(ne13) atIndex:9]; + [encoder setBytes:&nb10 length:sizeof(nb10) atIndex:10]; + [encoder setBytes:&nb11 length:sizeof(nb11) atIndex:11]; + [encoder setBytes:&nb12 length:sizeof(nb12) atIndex:12]; + [encoder setBytes:&ne0 length:sizeof(ne0) atIndex:13]; + [encoder setBytes:&_ne1 length:sizeof(_ne1) atIndex:14]; + [encoder setBytes:&nb1 length:sizeof(nb1) atIndex:15]; + [encoder setBytes:&r2 length:sizeof(r2) atIndex:16]; + [encoder setBytes:&r3 length:sizeof(r3) atIndex:17]; + [encoder setBytes:&idx length:sizeof(idx) atIndex:18]; + // TODO: how to make this an array? 
read Metal docs + for (int j = 0; j < n_as; ++j) { + struct ggml_tensor * src_cur = dst->src[2 + j]; + + size_t offs_src_cur = 0; + id id_src_cur = ggml_metal_get_buffer(ctx, src_cur, &offs_src_cur); + + [encoder setBuffer:id_src_cur offset:offs_src_cur atIndex:19 + j]; + } + + [encoder setThreadgroupMemoryLength:8192 atIndex:0]; + + // TODO: processing one row at a time (ne11 -> 1) is not efficient + [encoder dispatchThreadgroups:MTLSizeMake( (_ne1 + 31)/32, (ne21 + 63)/64, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(128, 1, 1)]; + } else { + int nth0 = 32; + int nth1 = 1; + int nrows = 1; + //printf("vector: ne00 = %6d, ne01 = %6d, ne02 = %6d, ne11 = %6d, ne12 = %6d\n", ne00, ne01, ne02, ne11, ne12); + + // use custom matrix x vector kernel + switch (src2t) { + case GGML_TYPE_F32: + { + GGML_ASSERT(src1t == GGML_TYPE_F32); + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_f32_f32]; + } break; + case GGML_TYPE_F16: + { + GGML_ASSERT(src1t == GGML_TYPE_F32); + nth0 = 32; + nth1 = 1; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_f16_f32]; + } break; + case GGML_TYPE_Q4_0: + { + nth0 = 8; + nth1 = 8; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q4_0_f32]; + } break; + case GGML_TYPE_Q4_1: + { + nth0 = 8; + nth1 = 8; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q4_1_f32]; + } break; + case GGML_TYPE_Q5_0: + { + nth0 = 8; + nth1 = 8; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q5_0_f32]; + } break; + case GGML_TYPE_Q5_1: + { + nth0 = 8; + nth1 = 8; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q5_1_f32]; + } break; + case GGML_TYPE_Q8_0: + { + nth0 = 8; + nth1 = 8; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q8_0_f32]; + } break; + case GGML_TYPE_Q2_K: + { + nth0 = 2; + nth1 = 32; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q2_K_f32]; + } break; + case GGML_TYPE_Q3_K: + { + nth0 = 2; + nth1 = 32; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q3_K_f32]; + } break; + case GGML_TYPE_Q4_K: + { + nth0 = 4; //1; + nth1 = 8; //32; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q4_K_f32]; + } break; + case GGML_TYPE_Q5_K: + { + nth0 = 2; + nth1 = 32; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q5_K_f32]; + } break; + case GGML_TYPE_Q6_K: + { + nth0 = 2; + nth1 = 32; + [encoder setComputePipelineState:ctx->pipeline_mul_mv_id_q6_K_f32]; + } break; + default: + { + GGML_METAL_LOG_ERROR("Asserting on type %d\n", (int)src0t); + GGML_ASSERT(false && "not implemented"); + } + }; + + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; + [encoder setBuffer:id_src1 offset:offs_src1 atIndex:1]; + [encoder setBuffer:id_dst offset:offs_dst atIndex:2]; + [encoder setBytes:&nb01 length:sizeof(nb01) atIndex:3]; + [encoder setBytes:&ne20 length:sizeof(ne20) atIndex:4]; + [encoder setBytes:&ne21 length:sizeof(ne21) atIndex:5]; + [encoder setBytes:&ne22 length:sizeof(ne22) atIndex:6]; + [encoder setBytes:&nb20 length:sizeof(nb20) atIndex:7]; + [encoder setBytes:&nb21 length:sizeof(nb21) atIndex:8]; + [encoder setBytes:&nb22 length:sizeof(nb22) atIndex:9]; + [encoder setBytes:&ne10 length:sizeof(ne10) atIndex:10]; + [encoder setBytes:&_ne1 length:sizeof(_ne1) atIndex:11]; + [encoder setBytes:&ne12 length:sizeof(ne12) atIndex:12]; + [encoder setBytes:&ne13 length:sizeof(ne13) atIndex:13]; + [encoder setBytes:&nb10 length:sizeof(nb10) atIndex:14]; + [encoder setBytes:&nb11 length:sizeof(nb11) atIndex:15]; + [encoder setBytes:&nb12 length:sizeof(nb12) atIndex:16]; + 
[encoder setBytes:&ne0 length:sizeof(ne0) atIndex:17]; + [encoder setBytes:&_ne1 length:sizeof(_ne1) atIndex:18]; + [encoder setBytes:&nb1 length:sizeof(nb1) atIndex:19]; + [encoder setBytes:&r2 length:sizeof(r2) atIndex:20]; + [encoder setBytes:&r3 length:sizeof(r3) atIndex:21]; + [encoder setBytes:&idx length:sizeof(idx) atIndex:22]; + // TODO: how to make this an array? read Metal docs + for (int j = 0; j < n_as; ++j) { + struct ggml_tensor * src_cur = dst->src[2 + j]; + + size_t offs_src_cur = 0; + id id_src_cur = ggml_metal_get_buffer(ctx, src_cur, &offs_src_cur); + + [encoder setBuffer:id_src_cur offset:offs_src_cur atIndex:23 + j]; + } + + if (src2t == GGML_TYPE_Q4_0 || src2t == GGML_TYPE_Q4_1 || + src2t == GGML_TYPE_Q5_0 || src2t == GGML_TYPE_Q5_1 || src2t == GGML_TYPE_Q8_0 || + src2t == GGML_TYPE_Q2_K) { // || src2t == GGML_TYPE_Q4_K) { + [encoder dispatchThreadgroups:MTLSizeMake((ne21 + 7)/8, _ne1, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + } + else if (src2t == GGML_TYPE_Q4_K) { + [encoder dispatchThreadgroups:MTLSizeMake((ne21 + 3)/4, _ne1, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + } + else if (src2t == GGML_TYPE_Q3_K) { +#ifdef GGML_QKK_64 + [encoder dispatchThreadgroups:MTLSizeMake((ne21 + 1)/2, _ne1, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; +#else + [encoder dispatchThreadgroups:MTLSizeMake((ne21 + 3)/4, _ne1, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; +#endif + } + else if (src2t == GGML_TYPE_Q5_K) { + [encoder dispatchThreadgroups:MTLSizeMake((ne21 + 3)/4, _ne1, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + } + else if (src2t == GGML_TYPE_Q6_K) { + [encoder dispatchThreadgroups:MTLSizeMake((ne21 + 1)/2, _ne1, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; + } else { + const int64_t ny = (_ne1 + nrows - 1)/nrows; + [encoder dispatchThreadgroups:MTLSizeMake(ne21, ny, ne01*ne12*ne13) threadsPerThreadgroup:MTLSizeMake(nth0, nth1, 1)]; } } } break; @@ -1333,16 +1798,19 @@ void ggml_metal_graph_compute( default: GGML_ASSERT(false && "not implemented"); } - [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; - [encoder setBuffer:id_src1 offset:offs_src1 atIndex:1]; - [encoder setBuffer:id_dst offset:offs_dst atIndex:2]; + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; + [encoder setBuffer:id_src1 offset:offs_src1 atIndex:1]; + [encoder setBuffer:id_dst offset:offs_dst atIndex:2]; [encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:3]; [encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:4]; - [encoder setBytes:&nb1 length:sizeof(uint64_t) atIndex:5]; - - const int64_t n = ggml_nelements(src1); - - [encoder dispatchThreadgroups:MTLSizeMake(n, 1, 1) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)]; + [encoder setBytes:&nb02 length:sizeof(uint64_t) atIndex:5]; + [encoder setBytes:&ne10 length:sizeof( int64_t) atIndex:6]; + [encoder setBytes:&nb10 length:sizeof( int64_t) atIndex:7]; + [encoder setBytes:&nb11 length:sizeof( int64_t) atIndex:8]; + [encoder setBytes:&nb1 length:sizeof(uint64_t) atIndex:9]; + [encoder setBytes:&nb2 length:sizeof(uint64_t) atIndex:10]; + + [encoder dispatchThreadgroups:MTLSizeMake(ne10, ne11, 1) threadsPerThreadgroup:MTLSizeMake(32, 1, 1)]; } break; case GGML_OP_RMS_NORM: { @@ -1351,15 +1819,19 @@ void ggml_metal_graph_compute( float eps; memcpy(&eps, dst->op_params, sizeof(float)); - const int nth = MIN(512, ne00); + int nth = 32; // SIMD width + + while (nth < ne00/4 && nth < 1024) { + nth *= 2; 
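+ // worked example, assuming the usual 1024 threads-per-threadgroup cap: for ne00 = 4096 the float4 bound ne00/4 is 1024, so nth doubles 32 -> 64 -> ... -> 1024; shorter rows stop doubling earlier, so no thread is left without a float4 element to reduce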
+ } [encoder setComputePipelineState:ctx->pipeline_rms_norm]; - [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; - [encoder setBuffer:id_dst offset:offs_dst atIndex:1]; - [encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2]; - [encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:3]; - [encoder setBytes:&eps length:sizeof( float) atIndex:4]; - [encoder setThreadgroupMemoryLength:GGML_PAD(nth/32*sizeof(float), 16) atIndex:0]; + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; + [encoder setBuffer:id_dst offset:offs_dst atIndex:1]; + [encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2]; + [encoder setBytes:&nb01 length:sizeof(uint64_t) atIndex:3]; + [encoder setBytes:&eps length:sizeof( float) atIndex:4]; + [encoder setThreadgroupMemoryLength:32*sizeof(float) atIndex:0]; const int64_t nrows = ggml_nrows(src0); @@ -1433,7 +1905,8 @@ void ggml_metal_graph_compute( const int n_past = ((int32_t *) dst->op_params)[0]; const int n_dims = ((int32_t *) dst->op_params)[1]; const int mode = ((int32_t *) dst->op_params)[2]; - const int n_orig_ctx = ((int32_t *) dst->op_params)[3]; + // skip 3, n_ctx, used in GLM RoPE, unimplemented in metal + const int n_orig_ctx = ((int32_t *) dst->op_params)[4]; float freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow; memcpy(&freq_base, (int32_t *) dst->op_params + 5, sizeof(float)); @@ -1533,18 +2006,48 @@ void ggml_metal_graph_compute( [encoder dispatchThreadgroups:MTLSizeMake(IC, OH, OW) threadsPerThreadgroup:MTLSizeMake(N, KH, KW)]; } break; + case GGML_OP_ARGSORT: + { + GGML_ASSERT(src0->type == GGML_TYPE_F32); + GGML_ASSERT( dst->type == GGML_TYPE_I32); + + const int nrows = ggml_nrows(src0); + + enum ggml_sort_order order = (enum ggml_sort_order) dst->op_params[0]; + + switch (order) { + case GGML_SORT_ASC: [encoder setComputePipelineState:ctx->pipeline_argsort_f32_i32_asc]; break; + case GGML_SORT_DESC: [encoder setComputePipelineState:ctx->pipeline_argsort_f32_i32_desc]; break; + default: GGML_ASSERT(false); + }; + + [encoder setBuffer:id_src0 offset:offs_src0 atIndex:0]; + [encoder setBuffer:id_dst offset:offs_dst atIndex:1]; + [encoder setBytes:&ne00 length:sizeof( int64_t) atIndex:2]; + + [encoder dispatchThreadgroups:MTLSizeMake(1, nrows, 1) threadsPerThreadgroup:MTLSizeMake(ne00, 1, 1)]; + } break; case GGML_OP_DUP: case GGML_OP_CPY: case GGML_OP_CONT: { - const int nth = MIN(1024, ne00); + GGML_ASSERT(ne00 % ggml_blck_size(src0->type) == 0); + + int nth = MIN(1024, ne00/ggml_blck_size(src0->type)); switch (src0t) { case GGML_TYPE_F32: { + GGML_ASSERT(ne0 % ggml_blck_size(dst->type) == 0); + switch (dstt) { - case GGML_TYPE_F16: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_f16]; break; - case GGML_TYPE_F32: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_f32]; break; + case GGML_TYPE_F16: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_f16]; break; + case GGML_TYPE_F32: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_f32]; break; + case GGML_TYPE_Q8_0: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_q8_0]; break; + case GGML_TYPE_Q4_0: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_q4_0]; break; + case GGML_TYPE_Q4_1: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_q4_1]; break; + //case GGML_TYPE_Q5_0: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_q5_0]; break; + //case GGML_TYPE_Q5_1: [encoder setComputePipelineState:ctx->pipeline_cpy_f32_q5_1]; break; default: GGML_ASSERT(false && "not implemented"); }; } break; @@ -1552,7 +2055,7 @@ void 
ggml_metal_graph_compute( { switch (dstt) { case GGML_TYPE_F16: [encoder setComputePipelineState:ctx->pipeline_cpy_f16_f16]; break; - case GGML_TYPE_F32: GGML_ASSERT(false && "cpy_f16_f32 not implemented"); break; + case GGML_TYPE_F32: [encoder setComputePipelineState:ctx->pipeline_cpy_f16_f32]; break; default: GGML_ASSERT(false && "not implemented"); }; } break; @@ -1619,81 +2122,150 @@ void ggml_metal_graph_compute( // backend interface -static const char * ggml_backend_metal_name(ggml_backend_t backend) { - return "Metal"; +static id g_backend_device = nil; +static int g_backend_device_ref_count = 0; - UNUSED(backend); +static id ggml_backend_metal_get_device(void) { + if (g_backend_device == nil) { + g_backend_device = MTLCreateSystemDefaultDevice(); + } + + g_backend_device_ref_count++; + + return g_backend_device; } -static void ggml_backend_metal_free(ggml_backend_t backend) { - struct ggml_metal_context * ctx = (struct ggml_metal_context *)backend->context; - ggml_metal_free(ctx); - free(backend); +static void ggml_backend_metal_free_device(void) { + assert(g_backend_device_ref_count > 0); + + g_backend_device_ref_count--; + + if (g_backend_device_ref_count == 0) { + [g_backend_device release]; + g_backend_device = nil; + } } static void * ggml_backend_metal_buffer_get_base(ggml_backend_buffer_t buffer) { - return (void *)buffer->context; + struct ggml_backend_metal_buffer_context * ctx = (struct ggml_backend_metal_buffer_context *)buffer->context; + + return ctx->data; } static void ggml_backend_metal_buffer_free_buffer(ggml_backend_buffer_t buffer) { - free(buffer->context); + struct ggml_backend_metal_buffer_context * ctx = (struct ggml_backend_metal_buffer_context *)buffer->context; + + [ctx->metal release]; + ggml_backend_metal_free_device(); + + free(ctx->data); + free(ctx); + + UNUSED(buffer); +} + +static void ggml_backend_metal_buffer_set_tensor(ggml_backend_buffer_t buffer, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size) { + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); + GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + + memcpy((char *)tensor->data + offset, data, size); + + UNUSED(buffer); +} + +static void ggml_backend_metal_buffer_get_tensor(ggml_backend_buffer_t buffer, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) { + GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"); + GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); + + memcpy(data, (const char *)tensor->data + offset, size); + + UNUSED(buffer); +} + +static void ggml_backend_metal_buffer_cpy_tensor_from(ggml_backend_buffer_t buffer, struct ggml_tensor * src, struct ggml_tensor * dst) { + ggml_backend_tensor_get(src, dst->data, 0, ggml_nbytes(src)); + + UNUSED(buffer); +} + +static void ggml_backend_metal_buffer_cpy_tensor_to(ggml_backend_buffer_t buffer, struct ggml_tensor * src, struct ggml_tensor * dst) { + ggml_backend_tensor_set(dst, src->data, 0, ggml_nbytes(src)); + UNUSED(buffer); } static struct ggml_backend_buffer_i metal_backend_buffer_i = { - /* .free_buffer = */ ggml_backend_metal_buffer_free_buffer, - /* .get_base = */ ggml_backend_metal_buffer_get_base, - /* .get_alloc_size = */ NULL, // defaults to ggml_nbytes - /* .init_tensor = */ NULL, // no initialization required - /* .free_tensor = */ NULL, // no cleanup required + /* .free_buffer = */ ggml_backend_metal_buffer_free_buffer, + /* .get_base = */ ggml_backend_metal_buffer_get_base, + /* 
.init_tensor = */ NULL, + /* .set_tensor = */ ggml_backend_metal_buffer_set_tensor, + /* .get_tensor = */ ggml_backend_metal_buffer_get_tensor, + /* .cpy_tensor_from = */ ggml_backend_metal_buffer_cpy_tensor_from, + /* .cpy_tensor_to = */ ggml_backend_metal_buffer_cpy_tensor_to, }; -static ggml_backend_buffer_t ggml_backend_metal_alloc_buffer(ggml_backend_t backend, size_t size) { - struct ggml_metal_context * ctx = (struct ggml_metal_context *)backend->context; +static ggml_backend_buffer_t ggml_backend_metal_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) { + struct ggml_backend_metal_buffer_context * ctx = malloc(sizeof(struct ggml_backend_metal_buffer_context)); - void * data = ggml_metal_host_malloc(size); + const size_t size_page = sysconf(_SC_PAGESIZE); - // TODO: set proper name of the buffers - ggml_metal_add_buffer(ctx, "backend", data, size, 0); + size_t size_aligned = size; + if ((size_aligned % size_page) != 0) { + size_aligned += (size_page - (size_aligned % size_page)); + } + + ctx->data = ggml_metal_host_malloc(size_aligned); // allocate the full page-aligned length that is mapped below + ctx->metal = [ggml_backend_metal_get_device() newBufferWithBytesNoCopy:ctx->data + length:size_aligned + options:MTLResourceStorageModeShared + deallocator:nil]; - return ggml_backend_buffer_init(backend, metal_backend_buffer_i, data, size); + return ggml_backend_buffer_init(buft, metal_backend_buffer_i, ctx, size); } -static size_t ggml_backend_metal_get_alignment(ggml_backend_t backend) { +static size_t ggml_backend_metal_buffer_type_get_alignment(ggml_backend_buffer_type_t buft) { return 32; - UNUSED(backend); + UNUSED(buft); } -static void ggml_backend_metal_set_tensor_async(ggml_backend_t backend, struct ggml_tensor * tensor, const void * data, size_t offset, size_t size) { - GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"); - GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); +static bool ggml_backend_metal_buffer_type_supports_backend(ggml_backend_buffer_type_t buft, ggml_backend_t backend) { + return ggml_backend_is_metal(backend) || ggml_backend_is_cpu(backend); - memcpy((char *)tensor->data + offset, data, size); - - UNUSED(backend); + GGML_UNUSED(buft); } -static void ggml_backend_metal_get_tensor_async(ggml_backend_t backend, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) { - GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds"); - GGML_ASSERT(tensor->data != NULL && "tensor not allocated"); - - memcpy(data, (const char *)tensor->data + offset, size); +ggml_backend_buffer_type_t ggml_backend_metal_buffer_type(void) { + static struct ggml_backend_buffer_type ggml_backend_buffer_type_metal = { + /* .iface = */ { + /* .alloc_buffer = */ ggml_backend_metal_buffer_type_alloc_buffer, + /* .get_alignment = */ ggml_backend_metal_buffer_type_get_alignment, + /* .get_alloc_size = */ NULL, // defaults to ggml_nbytes + /* .supports_backend = */ ggml_backend_metal_buffer_type_supports_backend, + }, + /* .context = */ NULL, + }; - UNUSED(backend); + return &ggml_backend_buffer_type_metal; } -static void ggml_backend_metal_synchronize(ggml_backend_t backend) { +static const char * ggml_backend_metal_name(ggml_backend_t backend) { + return "Metal"; + UNUSED(backend); } -static void ggml_backend_metal_cpy_tensor_from(ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst) { - ggml_backend_tensor_get(src, dst->data, 0, ggml_nbytes(src)); +static void ggml_backend_metal_free(ggml_backend_t backend) { + struct 
ggml_metal_context * ctx = (struct ggml_metal_context *)backend->context; + ggml_metal_free(ctx); + free(backend); +} +static void ggml_backend_metal_synchronize(ggml_backend_t backend) { UNUSED(backend); } -static void ggml_backend_metal_cpy_tensor_to(ggml_backend_t backend, struct ggml_tensor * src, struct ggml_tensor * dst) { - ggml_backend_tensor_set_async(dst, src->data, 0, ggml_nbytes(src)); +static ggml_backend_buffer_type_t ggml_backend_metal_get_default_buffer_type(ggml_backend_t backend) { + return ggml_backend_metal_buffer_type(); UNUSED(backend); } @@ -1705,32 +2277,43 @@ static void ggml_backend_metal_graph_compute(ggml_backend_t backend, struct ggml } static bool ggml_backend_metal_supports_op(ggml_backend_t backend, const struct ggml_tensor * op) { - return true; + return ggml_metal_supports_op(op); + UNUSED(backend); - UNUSED(op); } static struct ggml_backend_i metal_backend_i = { - /* .get_name = */ ggml_backend_metal_name, - /* .free = */ ggml_backend_metal_free, - /* .alloc_buffer = */ ggml_backend_metal_alloc_buffer, - /* .get_alignment = */ ggml_backend_metal_get_alignment, - /* .set_tensor_async = */ ggml_backend_metal_set_tensor_async, - /* .get_tensor_async = */ ggml_backend_metal_get_tensor_async, - /* .synchronize = */ ggml_backend_metal_synchronize, - /* .cpy_tensor_from = */ ggml_backend_metal_cpy_tensor_from, - /* .cpy_tensor_to = */ ggml_backend_metal_cpy_tensor_to, - /* .graph_plan_create = */ NULL, // the metal implementation does not require creating graph plans atm - /* .graph_plan_free = */ NULL, - /* .graph_plan_compute = */ NULL, - /* .graph_compute = */ ggml_backend_metal_graph_compute, - /* .supports_op = */ ggml_backend_metal_supports_op, + /* .get_name = */ ggml_backend_metal_name, + /* .free = */ ggml_backend_metal_free, + /* .get_default_buffer_type = */ ggml_backend_metal_get_default_buffer_type, + /* .set_tensor_async = */ NULL, + /* .get_tensor_async = */ NULL, + /* .cpy_tensor_from_async = */ NULL, + /* .cpy_tensor_to_async = */ NULL, + /* .synchronize = */ ggml_backend_metal_synchronize, + /* .graph_plan_create = */ NULL, // the metal implementation does not require creating graph plans atm + /* .graph_plan_free = */ NULL, + /* .graph_plan_compute = */ NULL, + /* .graph_compute = */ ggml_backend_metal_graph_compute, + /* .supports_op = */ ggml_backend_metal_supports_op, }; +// TODO: make a common log callback for all backends in ggml-backend +static void ggml_backend_log_callback(enum ggml_log_level level, const char * msg, void * user_data) { + fprintf(stderr, "%s", msg); + + UNUSED(level); + UNUSED(user_data); +} + ggml_backend_t ggml_backend_metal_init(void) { - struct ggml_metal_context * ctx = malloc(sizeof(struct ggml_metal_context)); + ggml_metal_log_set_callback(ggml_backend_log_callback, NULL); - ctx = ggml_metal_init(GGML_DEFAULT_N_THREADS); + struct ggml_metal_context * ctx = ggml_metal_init(GGML_DEFAULT_N_THREADS); + + if (ctx == NULL) { + return NULL; + } ggml_backend_t metal_backend = malloc(sizeof(struct ggml_backend)); @@ -1747,7 +2330,26 @@ bool ggml_backend_is_metal(ggml_backend_t backend) { } void ggml_backend_metal_set_n_cb(ggml_backend_t backend, int n_cb) { + GGML_ASSERT(ggml_backend_is_metal(backend)); + struct ggml_metal_context * ctx = (struct ggml_metal_context *)backend->context; ggml_metal_set_n_cb(ctx, n_cb); } + +bool ggml_backend_metal_supports_family(ggml_backend_t backend, int family) { + GGML_ASSERT(ggml_backend_is_metal(backend)); + + struct ggml_metal_context * ctx = (struct ggml_metal_context 
*)backend->context; + + return [ctx->device supportsFamily:(MTLGPUFamilyApple1 + family - 1)]; +} + +ggml_backend_t ggml_backend_reg_metal_init(const char * params, void * user_data); // silence warning + +ggml_backend_t ggml_backend_reg_metal_init(const char * params, void * user_data) { + return ggml_backend_metal_init(); + + GGML_UNUSED(params); + GGML_UNUSED(user_data); +} diff --git a/ggml-metal.metal b/ggml-metal.metal index 5d1357cd72d45..773fac124b0c4 100644 --- a/ggml-metal.metal +++ b/ggml-metal.metal @@ -3,6 +3,8 @@ using namespace metal; #define MAX(x, y) ((x) > (y) ? (x) : (y)) +#define MIN(x, y) ((x) < (y) ? (x) : (y)) +#define SWAP(x, y) { auto tmp = (x); (x) = (y); (y) = tmp; } #define QK4_0 32 #define QR4_0 2 @@ -39,8 +41,15 @@ typedef struct { int8_t qs[QK8_0]; // quants } block_q8_0; -// general-purpose kernel for addition of two tensors -// pros: works for non-contiguous tensors, supports broadcast across dims 1, 2 and 3 +#define N_SIMDWIDTH 32 // assuming SIMD group size is 32 + +enum ggml_sort_order { + GGML_SORT_ASC, + GGML_SORT_DESC, +}; + +// general-purpose kernel for addition, multiplication and division of two tensors +// pros: works for non-contiguous tensors, supports broadcast across all dims // cons: not very efficient kernel void kernel_add( device const char * src0, @@ -81,16 +90,111 @@ kernel void kernel_add( const int64_t i12 = i02 % ne12; const int64_t i11 = i01 % ne11; - device const char * src0_ptr = src0 + i03*nb03 + i02*nb02 + i01*nb01 + tpitg.x*nb00; - device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11 + tpitg.x*nb10; - device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1 + tpitg.x*nb0; + device const char * src0_ptr = src0 + i03*nb03 + i02*nb02 + i01*nb01; + device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11; + device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1; + + for (int i0 = tpitg.x; i0 < ne0; i0 += ntg.x) { + const int i10 = i0 % ne10; + *((device float *)(dst_ptr + i0*nb0)) = *((device float *)(src0_ptr + i0*nb00)) + *((device float *)(src1_ptr + i10*nb10)); + } +} + +kernel void kernel_mul( + device const char * src0, + device const char * src1, + device char * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant int64_t & nb00, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & nb03, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & nb13, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant int64_t & nb0, + constant int64_t & nb1, + constant int64_t & nb2, + constant int64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig.z; + const int64_t i02 = tgpig.y; + const int64_t i01 = tgpig.x; + + const int64_t i13 = i03 % ne13; + const int64_t i12 = i02 % ne12; + const int64_t i11 = i01 % ne11; + + device const char * src0_ptr = src0 + i03*nb03 + i02*nb02 + i01*nb01; + device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11; + device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1; for (int i0 = tpitg.x; i0 < ne0; i0 += ntg.x) { - ((device float *)dst_ptr)[0] = ((device float *)src0_ptr)[0] + ((device float *)src1_ptr)[0]; + const int i10 = i0 % 
ne10; + *((device float *)(dst_ptr + i0*nb0)) = *((device float *)(src0_ptr + i0*nb00)) * *((device float *)(src1_ptr + i10*nb10)); + } +} + +kernel void kernel_div( + device const char * src0, + device const char * src1, + device char * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant int64_t & nb00, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & nb03, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & nb13, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant int64_t & nb0, + constant int64_t & nb1, + constant int64_t & nb2, + constant int64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig.z; + const int64_t i02 = tgpig.y; + const int64_t i01 = tgpig.x; + + const int64_t i13 = i03 % ne13; + const int64_t i12 = i02 % ne12; + const int64_t i11 = i01 % ne11; + + device const char * src0_ptr = src0 + i03*nb03 + i02*nb02 + i01*nb01; + device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11; + device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1; - src0_ptr += ntg.x*nb00; - src1_ptr += ntg.x*nb10; - dst_ptr += ntg.x*nb0; + for (int i0 = tpitg.x; i0 < ne0; i0 += ntg.x) { + const int i10 = i0 % ne10; + *((device float *)(dst_ptr + i0*nb0)) = *((device float *)(src0_ptr + i0*nb00)) / *((device float *)(src1_ptr + i10*nb10)); } } @@ -105,23 +209,22 @@ kernel void kernel_add_row( dst[tpig] = src0[tpig] + src1[tpig % nb]; } -kernel void kernel_mul( +kernel void kernel_mul_row( device const float4 * src0, device const float4 * src1, device float4 * dst, + constant int64_t & nb [[buffer(27)]], uint tpig[[thread_position_in_grid]]) { - dst[tpig] = src0[tpig] * src1[tpig]; + dst[tpig] = src0[tpig] * src1[tpig % nb]; } -// assumption: src1 is a row -// broadcast src1 into src0 -kernel void kernel_mul_row( +kernel void kernel_div_row( device const float4 * src0, device const float4 * src1, device float4 * dst, - constant int64_t & nb, + constant int64_t & nb [[buffer(27)]], uint tpig[[thread_position_in_grid]]) { - dst[tpig] = src0[tpig] * src1[tpig % nb]; + dst[tpig] = src0[tpig] / src1[tpig % nb]; } kernel void kernel_scale( @@ -162,6 +265,54 @@ kernel void kernel_sqr( dst[tpig] = src0[tpig] * src0[tpig]; } +kernel void kernel_sum_rows( + device const float * src0, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant int64_t & nb00, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & nb03, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & nb13, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant int64_t & nb0, + constant int64_t & nb1, + constant int64_t & nb2, + constant int64_t & nb3, + uint3 tpig[[thread_position_in_grid]]) { + int64_t i3 = tpig.z; + int64_t i2 = tpig.y; + int64_t i1 = tpig.x; + + if (i3 >= ne03 || i2 >= ne02 || i1 >= ne01) { + return; + } + + device const float * src_row = (device const float *) 
((device const char *) src0 + i1*nb01 + i2*nb02 + i3*nb03); + device float * dst_row = (device float *) ((device char *) dst + i1*nb1 + i2*nb2 + i3*nb3); + + float row_sum = 0; + + for (int64_t i0 = 0; i0 < ne00; i0++) { + row_sum += src_row[i0]; + } + + dst_row[0] = row_sum; +} + constant float GELU_COEF_A = 0.044715f; constant float SQRT_2_OVER_PI = 0.79788456080286535587989211986876f; @@ -180,10 +331,12 @@ kernel void kernel_gelu( kernel void kernel_soft_max( device const float * src0, + device const float * src1, device float * dst, constant int64_t & ne00, constant int64_t & ne01, constant int64_t & ne02, + constant float & scale, threadgroup float * buf [[threadgroup(0)]], uint tgpig[[threadgroup_position_in_grid]], uint tpitg[[thread_position_in_threadgroup]], @@ -194,73 +347,82 @@ kernel void kernel_soft_max( const int64_t i02 = (tgpig - i03*ne02*ne01) / ne01; const int64_t i01 = (tgpig - i03*ne02*ne01 - i02*ne01); - device const float * psrc0 = src0 + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; - device float * pdst = dst + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + device const float * psrc0 = src0 + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + device const float * pmask = src1 != src0 ? src1 + i01*ne00 : nullptr; + device float * pdst = dst + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; // parallel max - float lmax = tpitg < ne00 ? psrc0[tpitg] : -INFINITY; + float lmax = -INFINITY; - for (int i00 = tpitg + ntg; i00 < ne00; i00 += ntg) { - lmax = MAX(lmax, psrc0[i00]); + for (int i00 = tpitg; i00 < ne00; i00 += ntg) { + lmax = MAX(lmax, psrc0[i00]*scale + (pmask ? pmask[i00] : 0.0f)); } - float max = simd_max(lmax); - if (tiisg == 0) { - buf[sgitg] = max; - } + // find the max value in the block + float max_val = simd_max(lmax); + if (ntg > N_SIMDWIDTH) { + if (sgitg == 0) { + buf[tiisg] = -INFINITY; + } - threadgroup_barrier(mem_flags::mem_threadgroup); + threadgroup_barrier(mem_flags::mem_threadgroup); - // broadcast, simd group number is ntg / 32 - for (uint i = ntg / 32 / 2; i > 0; i /= 2) { - if (tpitg < i) { - buf[tpitg] = MAX(buf[tpitg], buf[tpitg + i]); - } - } + if (tiisg == 0) { + buf[sgitg] = max_val; + } - threadgroup_barrier(mem_flags::mem_threadgroup); + threadgroup_barrier(mem_flags::mem_threadgroup); - max = buf[0]; + max_val = buf[tiisg]; + max_val = simd_max(max_val); + } // parallel sum float lsum = 0.0f; for (int i00 = tpitg; i00 < ne00; i00 += ntg) { - const float exp_psrc0 = exp(psrc0[i00] - max); + const float exp_psrc0 = exp((psrc0[i00]*scale + (pmask ? pmask[i00] : 0.0f)) - max_val); lsum += exp_psrc0; - // Remember the result of exp here. exp is expensive, so we really do not - // wish to compute it twice. 
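+ // cache exp((x*scale + mask) - max) in dst here: the normalization pass below then only multiplies by inv_sum instead of recomputing the expensive exp()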
pdst[i00] = exp_psrc0; } + // This barrier fixes a failing test + // ref: https://github.com/ggerganov/ggml/pull/621#discussion_r1425156335 + threadgroup_barrier(mem_flags::mem_none); + float sum = simd_sum(lsum); - if (tiisg == 0) { - buf[sgitg] = sum; - } - threadgroup_barrier(mem_flags::mem_threadgroup); + if (ntg > N_SIMDWIDTH) { + if (sgitg == 0) { + buf[tiisg] = 0.0f; + } - // broadcast, simd group number is ntg / 32 - for (uint i = ntg / 32 / 2; i > 0; i /= 2) { - if (tpitg < i) { - buf[tpitg] += buf[tpitg + i]; - } - } + threadgroup_barrier(mem_flags::mem_threadgroup); - threadgroup_barrier(mem_flags::mem_threadgroup); + if (tiisg == 0) { + buf[sgitg] = sum; + } - sum = buf[0]; + threadgroup_barrier(mem_flags::mem_threadgroup); + + sum = buf[tiisg]; + sum = simd_sum(sum); + } + + const float inv_sum = 1.0f/sum; for (int i00 = tpitg; i00 < ne00; i00 += ntg) { - pdst[i00] /= sum; + pdst[i00] *= inv_sum; } } kernel void kernel_soft_max_4( device const float * src0, + device const float * src1, device float * dst, constant int64_t & ne00, constant int64_t & ne01, constant int64_t & ne02, + constant float & scale, threadgroup float * buf [[threadgroup(0)]], uint tgpig[[threadgroup_position_in_grid]], uint tpitg[[thread_position_in_threadgroup]], @@ -271,64 +433,74 @@ kernel void kernel_soft_max_4( const int64_t i02 = (tgpig - i03*ne02*ne01) / ne01; const int64_t i01 = (tgpig - i03*ne02*ne01 - i02*ne01); - device const float4 * psrc4 = (device const float4 *)(src0 + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00); - device float4 * pdst4 = (device float4 *)(dst + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00); + device const float4 * psrc4 = (device const float4 *)(src0 + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00); + device const float4 * pmask = src1 != src0 ? (device const float4 *)(src1 + i01*ne00) : nullptr; + device float4 * pdst4 = (device float4 *)(dst + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00); // parallel max - float4 lmax4 = tpitg < ne00/4 ? psrc4[tpitg] : -INFINITY; + float4 lmax4 = -INFINITY; - for (int i00 = tpitg + ntg; i00 < ne00/4; i00 += ntg) { - lmax4 = fmax(lmax4, psrc4[i00]); + for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { + lmax4 = fmax(lmax4, psrc4[i00]*scale + (pmask ? pmask[i00] : 0.0f)); } const float lmax = MAX(MAX(lmax4[0], lmax4[1]), MAX(lmax4[2], lmax4[3])); - float max = simd_max(lmax); - if (tiisg == 0) { - buf[sgitg] = max; - } - threadgroup_barrier(mem_flags::mem_threadgroup); + float max_val = simd_max(lmax); + if (ntg > N_SIMDWIDTH) { + if (sgitg == 0) { + buf[tiisg] = -INFINITY; + } - // broadcast, simd group number is ntg / 32 - for (uint i = ntg / 32 / 2; i > 0; i /= 2) { - if (tpitg < i) { - buf[tpitg] = MAX(buf[tpitg], buf[tpitg + i]); - } - } + threadgroup_barrier(mem_flags::mem_threadgroup); - threadgroup_barrier(mem_flags::mem_threadgroup); + if (tiisg == 0) { + buf[sgitg] = max_val; + } + + threadgroup_barrier(mem_flags::mem_threadgroup); - max = buf[0]; + max_val = buf[tiisg]; + max_val = simd_max(max_val); + } // parallel sum float4 lsum4 = 0.0f; for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { - const float4 exp_psrc4 = exp(psrc4[i00] - max); + const float4 exp_psrc4 = exp((psrc4[i00]*scale + (pmask ? 
pmask[i00] : 0.0f)) - max_val); lsum4 += exp_psrc4; pdst4[i00] = exp_psrc4; } const float lsum = lsum4[0] + lsum4[1] + lsum4[2] + lsum4[3]; + + // This barrier fixes a failing test + // ref: https://github.com/ggerganov/ggml/pull/621#discussion_r1425156335 + threadgroup_barrier(mem_flags::mem_none); + float sum = simd_sum(lsum); - if (tiisg == 0) { - buf[sgitg] = sum; - } - threadgroup_barrier(mem_flags::mem_threadgroup); + if (ntg > N_SIMDWIDTH) { + if (sgitg == 0) { + buf[tiisg] = 0.0f; + } - // broadcast, simd group number is ntg / 32 - for (uint i = ntg / 32 / 2; i > 0; i /= 2) { - if (tpitg < i) { - buf[tpitg] += buf[tpitg + i]; - } - } + threadgroup_barrier(mem_flags::mem_threadgroup); - threadgroup_barrier(mem_flags::mem_threadgroup); + if (tiisg == 0) { + buf[sgitg] = sum; + } - sum = buf[0]; + threadgroup_barrier(mem_flags::mem_threadgroup); + + sum = buf[tiisg]; + sum = simd_sum(sum); + } + + const float inv_sum = 1.0f/sum; for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { - pdst4[i00] /= sum; + pdst4[i00] *= inv_sum; } } @@ -435,14 +607,13 @@ kernel void kernel_rms_norm( constant int64_t & ne00, constant uint64_t & nb01, constant float & eps, - threadgroup float * sum [[threadgroup(0)]], + threadgroup float * buf [[threadgroup(0)]], uint tgpig[[threadgroup_position_in_grid]], uint tpitg[[thread_position_in_threadgroup]], uint sgitg[[simdgroup_index_in_threadgroup]], uint tiisg[[thread_index_in_simdgroup]], uint ntg[[threads_per_threadgroup]]) { - device const float4 * x = (device const float4 *) ((device const char *) src0 + tgpig*nb01); - device const float * x_scalar = (device const float *) x; + device const float4 * x = (device const float4 *) ((device const char *) src0 + tgpig*nb01); float4 sumf = 0; float all_sum = 0; @@ -453,40 +624,30 @@ kernel void kernel_rms_norm( } all_sum = sumf[0] + sumf[1] + sumf[2] + sumf[3]; all_sum = simd_sum(all_sum); - if (tiisg == 0) { - sum[sgitg] = all_sum; - } + if (ntg > N_SIMDWIDTH) { + if (sgitg == 0) { + buf[tiisg] = 0.0f; + } - threadgroup_barrier(mem_flags::mem_threadgroup); + threadgroup_barrier(mem_flags::mem_threadgroup); - // broadcast, simd group number is ntg / 32 - for (uint i = ntg / 32 / 2; i > 0; i /= 2) { - if (tpitg < i) { - sum[tpitg] += sum[tpitg + i]; - } - } - if (tpitg == 0) { - for (int i = 4 * (ne00 / 4); i < ne00; i++) { - sum[0] += x_scalar[i]; + if (tiisg == 0) { + buf[sgitg] = all_sum; } - sum[0] /= ne00; - } - threadgroup_barrier(mem_flags::mem_threadgroup); + threadgroup_barrier(mem_flags::mem_threadgroup); + + all_sum = buf[tiisg]; + all_sum = simd_sum(all_sum); + } - const float mean = sum[0]; + const float mean = all_sum/ne00; const float scale = 1.0f/sqrt(mean + eps); device float4 * y = (device float4 *) (dst + tgpig*ne00); - device float * y_scalar = (device float *) y; for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { y[i00] = x[i00] * scale; } - if (tpitg == 0) { - for (int i00 = 4 * (ne00 / 4); i00 < ne00; i00++) { - y_scalar[i00] = x_scalar[i00] * scale; - } - } } // function for calculate inner product between half a q4_0 block and 16 floats (yl), sumy is SUM(yl[i]) @@ -576,15 +737,25 @@ inline float block_q_n_dot_y(device const block_q5_1 * qb_curr, float sumy, thre // putting them in the kernel cause a significant performance penalty #define N_DST 4 // each SIMD group works on 4 rows #define N_SIMDGROUP 2 // number of SIMD groups in a thread group -#define N_SIMDWIDTH 32 // assuming SIMD group size is 32 //Note: This is a template, but strictly speaking it only applies to // quantizations where 
the block size is 32. It also does not + guard against the number of rows not being divisible by + N_DST, so this is another explicit assumption of the implementation. template -void mul_vec_q_n_f32(device const void * src0, device const float * src1, device float * dst, - int64_t ne00, int64_t ne01, int64_t ne02, int64_t ne10, int64_t ne12, int64_t ne0, int64_t ne1, uint gqa, - uint3 tgpig, uint tiisg, uint sgitg) { +void mul_vec_q_n_f32_impl( + device const void * src0, + device const float * src1, + device float * dst, + int64_t ne00, + int64_t ne01, + int64_t ne02, + int64_t ne10, + int64_t ne12, + int64_t ne0, + int64_t ne1, + uint r2, + uint r3, + uint3 tgpig, uint tiisg, uint sgitg) { const int nb = ne00/QK4_0; const int r0 = tgpig.x; @@ -593,7 +764,10 @@ void mul_vec_q_n_f32(device const void * src0, device const float * src1, device const int first_row = (r0 * nsg + sgitg) * nr; - const uint offset0 = first_row * nb + im/gqa*(nb*ne0); + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = first_row * nb + (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); device const block_q_type * x = (device const block_q_type *) src0 + offset0; device const float * y = (device const float *) src1 + r1*ne10 + im*ne00*ne1; @@ -643,13 +817,14 @@ kernel void kernel_mul_mv_q4_0_f32( constant int64_t & ne02[[buffer(5)]], constant int64_t & ne10[[buffer(9)]], constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]], uint sgitg[[simdgroup_index_in_threadgroup]]) { - mul_vec_q_n_f32(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg); + mul_vec_q_n_f32_impl(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,r2,r3,tgpig,tiisg,sgitg); } kernel void kernel_mul_mv_q4_1_f32( @@ -661,13 +836,14 @@ kernel void kernel_mul_mv_q4_1_f32( constant int64_t & ne02[[buffer(5)]], constant int64_t & ne10[[buffer(9)]], constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]], uint sgitg[[simdgroup_index_in_threadgroup]]) { - mul_vec_q_n_f32(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg); + mul_vec_q_n_f32_impl(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,r2,r3,tgpig,tiisg,sgitg); } kernel void kernel_mul_mv_q5_0_f32( @@ -679,13 +855,14 @@ kernel void kernel_mul_mv_q5_0_f32( constant int64_t & ne02[[buffer(5)]], constant int64_t & ne10[[buffer(9)]], constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]], uint sgitg[[simdgroup_index_in_threadgroup]]) { - mul_vec_q_n_f32(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg); + 
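// note: r2 and r3 are the dim-2/dim-3 broadcast ratios (presumably ne12/ne02 and ne13/ne03 on the host side) replacing the old single gqa factor; e.g. 32 query heads sharing 8 KV heads give r2 = 4, which is why offset0 above maps batch index i12 back to the shared src0 slice i12/r2 + 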
mul_vec_q_n_f32_impl(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,r2,r3,tgpig,tiisg,sgitg); } kernel void kernel_mul_mv_q5_1_f32( @@ -697,33 +874,35 @@ kernel void kernel_mul_mv_q5_1_f32( constant int64_t & ne02[[buffer(5)]], constant int64_t & ne10[[buffer(9)]], constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]], uint sgitg[[simdgroup_index_in_threadgroup]]) { - mul_vec_q_n_f32(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg); + mul_vec_q_n_f32_impl(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,r2,r3,tgpig,tiisg,sgitg); } #define NB_Q8_0 8 -kernel void kernel_mul_mv_q8_0_f32( +void kernel_mul_mv_q8_0_f32_impl( device const void * src0, device const float * src1, device float * dst, constant int64_t & ne00, - constant int64_t & ne01[[buffer(4)]], - constant int64_t & ne02[[buffer(5)]], - constant int64_t & ne10[[buffer(9)]], - constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], - uint tiisg[[thread_index_in_simdgroup]], - uint sgitg[[simdgroup_index_in_threadgroup]]) { + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { const int nr = N_DST; const int nsg = N_SIMDGROUP; const int nw = N_SIMDWIDTH; @@ -732,8 +911,14 @@ kernel void kernel_mul_mv_q8_0_f32( const int r0 = tgpig.x; const int r1 = tgpig.y; const int im = tgpig.z; + const int first_row = (r0 * nsg + sgitg) * nr; - const uint offset0 = first_row * nb + im/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = first_row * nb + (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q8_0 * x = (device const block_q8_0 *) src0 + offset0; device const float * y = (device const float *) src1 + r1*ne10 + im*ne00*ne1; @@ -771,9 +956,29 @@ kernel void kernel_mul_mv_q8_0_f32( } } +[[host_name("kernel_mul_mv_q8_0_f32")]] +kernel void kernel_mul_mv_q8_0_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + kernel_mul_mv_q8_0_f32_impl(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,r2,r3,tgpig,tiisg,sgitg); +} + #define N_F32_F32 4 -kernel void kernel_mul_mv_f32_f32( +void kernel_mul_mv_f32_f32_impl( device const char * src0, device const char * src1, device float * dst, @@ -791,6 +996,8 @@ kernel void kernel_mul_mv_f32_f32( constant uint64_t & nb12, constant int64_t & ne0, constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], uint 
tiisg[[thread_index_in_simdgroup]]) { @@ -798,7 +1005,12 @@ kernel void kernel_mul_mv_f32_f32( const int64_t rb = tgpig.y*N_F32_F32; const int64_t im = tgpig.z; - device const float * x = (device const float *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = r0*nb01 + (i12/r2)*nb02 + (i13/r3)*nb02*ne02; + + device const float * x = (device const float *) (src0 + offset0); if (ne00 < 128) { for (int row = 0; row < N_F32_F32; ++row) { @@ -844,7 +1056,33 @@ kernel void kernel_mul_mv_f32_f32( } } -#define N_F16_F16 4 +[[host_name("kernel_mul_mv_f32_f32")]] +kernel void kernel_mul_mv_f32_f32( + device const char * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]]) { + kernel_mul_mv_f32_f32_impl(src0, src1, dst, ne00, ne01, ne02, nb00, nb01, nb02, ne10, ne11, ne12, nb10, nb11, nb12, ne0, ne1, r2, r3, tgpig, tiisg); +} + +#define N_F16_F16 4 kernel void kernel_mul_mv_f16_f16( device const char * src0, @@ -864,6 +1102,8 @@ kernel void kernel_mul_mv_f16_f16( constant uint64_t & nb12, constant int64_t & ne0, constant int64_t & ne1, + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]]) { @@ -871,7 +1111,12 @@ kernel void kernel_mul_mv_f16_f16( const int64_t rb = tgpig.y*N_F16_F16; const int64_t im = tgpig.z; - device const half * x = (device const half *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = r0*nb01 + (i12/r2)*nb02 + (i13/r3)*nb02*ne02; + + device const half * x = (device const half *) (src0 + offset0); if (ne00 < 128) { for (int row = 0; row < N_F16_F16; ++row) { @@ -917,7 +1162,7 @@ kernel void kernel_mul_mv_f16_f16( } } -kernel void kernel_mul_mv_f16_f32_1row( +void kernel_mul_mv_f16_f32_1row_impl( device const char * src0, device const char * src1, device float * dst, @@ -935,6 +1180,8 @@ kernel void kernel_mul_mv_f16_f32_1row( constant uint64_t & nb12, constant int64_t & ne0, constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]]) { @@ -942,7 +1189,12 @@ kernel void kernel_mul_mv_f16_f32_1row( const int64_t r1 = tgpig.y; const int64_t im = tgpig.z; - device const half * x = (device const half *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = r0*nb01 + (i12/r2)*nb02 + (i13/r3)*nb02*ne02; + + device const half * x = (device const half *) (src0 + offset0); device const float * y = (device const float *) (src1 + r1*nb11 + im*nb12); float sumf = 0; @@ -966,12 +1218,37 @@ kernel void kernel_mul_mv_f16_f32_1row( dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; } } +} +[[host_name("kernel_mul_mv_f16_f32_1row")]] +kernel void kernel_mul_mv_f16_f32_1row( + device const char * src0, + device const char * src1, + device float * dst, + constant 
int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]]) { + kernel_mul_mv_f16_f32_1row_impl(src0, src1, dst, ne00, ne01, ne02, nb00, nb01, nb02, ne10, ne11, ne12, nb10, nb11, nb12, ne0, ne1, r2, r3, tgpig, tiisg); } #define N_F16_F32 4 -kernel void kernel_mul_mv_f16_f32( +void kernel_mul_mv_f16_f32_impl( device const char * src0, device const char * src1, device float * dst, @@ -989,6 +1266,8 @@ kernel void kernel_mul_mv_f16_f32( constant uint64_t & nb12, constant int64_t & ne0, constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]]) { @@ -996,7 +1275,12 @@ kernel void kernel_mul_mv_f16_f32( const int64_t rb = tgpig.y*N_F16_F32; const int64_t im = tgpig.z; - device const half * x = (device const half *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = r0*nb01 + (i12/r2)*nb02 + (i13/r3)*nb02*ne02; + + device const half * x = (device const half *) (src0 + offset0); if (ne00 < 128) { for (int row = 0; row < N_F16_F32; ++row) { @@ -1042,6 +1326,32 @@ kernel void kernel_mul_mv_f16_f32( } } +[[host_name("kernel_mul_mv_f16_f32")]] +kernel void kernel_mul_mv_f16_f32( + device const char * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]]) { + kernel_mul_mv_f16_f32_impl(src0, src1, dst, ne00, ne01, ne02, nb00, nb01, nb02, ne10, ne11, ne12, nb10, nb11, nb12, ne0, ne1, r2, r3, tgpig, tiisg); +} + // Assumes row size (ne00) is a multiple of 4 kernel void kernel_mul_mv_f16_f32_l4( device const char * src0, @@ -1061,6 +1371,8 @@ kernel void kernel_mul_mv_f16_f32_l4( constant uint64_t & nb12, constant int64_t & ne0, constant int64_t & ne1, + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]]) { @@ -1068,7 +1380,12 @@ kernel void kernel_mul_mv_f16_f32_l4( const int64_t r0 = tgpig.x; const int64_t im = tgpig.z; - device const half4 * x4 = (device const half4 *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = r0*nb01 + (i12/r2)*nb02 + (i13/r3)*nb02*ne02; + + device const half4 * x4 = (device const half4 *) (src0 + offset0); for (int r1 = 0; r1 < nrows; ++r1) { device const float4 * y4 = (device const float4 *) (src1 + r1*nb11 + im*nb12); @@ -1120,17 +1437,21 @@ kernel void kernel_alibi_f32( const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); const 
int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0); + const int64_t k = i3*ne3 + i2; - device float * dst_data = (device float *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); float m_k; - if (i2 < n_heads_log2_floor) { - m_k = pow(m0, i2 + 1); + if (k < n_heads_log2_floor) { + m_k = pow(m0, k + 1); } else { - m_k = pow(m1, 2 * (i2 - n_heads_log2_floor) + 1); + m_k = pow(m1, 2 * (k - n_heads_log2_floor) + 1); } + + device char * dst_row = (device char *) dst + i3*nb3 + i2*nb2 + i1*nb1; + device const char * src_row = (device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01; for (int64_t i00 = tpitg.x; i00 < ne00; i00 += ntg.x) { - device const float * src = (device float *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); - dst_data[i00] = src[0] + m_k * (i00 - ne00 + 1); + const float src_v = *(device float *)(src_row + i00*nb00); + device float * dst_v = (device float *)(dst_row + i00*nb0); + *dst_v = i00 * m_k + src_v; } } @@ -1335,9 +1656,61 @@ kernel void kernel_im2col_f16( } } +// bitonic sort implementation following the CUDA kernels as reference +typedef void (argsort_t)( + device const float * x, + device int32_t * dst, + constant int64_t & ncols, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]]); + +template +kernel void kernel_argsort_f32_i32( + device const float * x, + device int32_t * dst, + constant int64_t & ncols, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]]) { + // bitonic sort + int col = tpitg[0]; + int row = tgpig[1]; + + if (col >= ncols) return; + + device const float * x_row = x + row * ncols; + device int32_t * dst_row = dst + row * ncols; + + // initialize indices + if (col < ncols) { + dst_row[col] = col; + } + threadgroup_barrier(mem_flags::mem_threadgroup); + + for (int k = 2; k <= ncols; k *= 2) { + for (int j = k / 2; j > 0; j /= 2) { + int ixj = col ^ j; + if (ixj > col) { + if ((col & k) == 0) { + if (order == GGML_SORT_ASC ? x_row[dst_row[col]] > x_row[dst_row[ixj]] : x_row[dst_row[col]] < x_row[dst_row[ixj]]) { + SWAP(dst_row[col], dst_row[ixj]); + } + } else { + if (order == GGML_SORT_ASC ? 
x_row[dst_row[col]] < x_row[dst_row[ixj]] : x_row[dst_row[col]] > x_row[dst_row[ixj]]) { + SWAP(dst_row[col], dst_row[ixj]); + } + } + } + threadgroup_barrier(mem_flags::mem_threadgroup); + } + } +} + +template [[host_name("kernel_argsort_f32_i32_asc")]] kernel argsort_t kernel_argsort_f32_i32; +template [[host_name("kernel_argsort_f32_i32_desc")]] kernel argsort_t kernel_argsort_f32_i32; + kernel void kernel_cpy_f16_f16( - device const half * src0, - device half * dst, + device const half * src0, + device half * dst, constant int64_t & ne00, constant int64_t & ne01, constant int64_t & ne02, @@ -1376,6 +1749,47 @@ kernel void kernel_cpy_f16_f16( } } +kernel void kernel_cpy_f16_f32( + device const half * src0, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig[2]; + const int64_t i02 = tgpig[1]; + const int64_t i01 = tgpig[0]; + + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0); + + device float * dst_data = (device float *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); + + for (int64_t i00 = tpitg.x; i00 < ne00; i00 += ntg.x) { + device const half * src = (device half *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); + dst_data[i00] = src[0]; + } +} + kernel void kernel_cpy_f32_f16( device const float * src0, device half * dst, @@ -1460,106 +1874,297 @@ kernel void kernel_cpy_f32_f32( } } -kernel void kernel_concat( - device const char * src0, - device const char * src1, - device char * dst, - constant int64_t & ne00, - constant int64_t & ne01, - constant int64_t & ne02, - constant int64_t & ne03, - constant uint64_t & nb00, - constant uint64_t & nb01, - constant uint64_t & nb02, - constant uint64_t & nb03, - constant int64_t & ne10, - constant int64_t & ne11, - constant int64_t & ne12, - constant int64_t & ne13, - constant uint64_t & nb10, - constant uint64_t & nb11, - constant uint64_t & nb12, - constant uint64_t & nb13, - constant int64_t & ne0, - constant int64_t & ne1, - constant int64_t & ne2, - constant int64_t & ne3, - constant uint64_t & nb0, - constant uint64_t & nb1, - constant uint64_t & nb2, - constant uint64_t & nb3, - uint3 tgpig[[threadgroup_position_in_grid]], - uint3 tpitg[[thread_position_in_threadgroup]], - uint3 ntg[[threads_per_threadgroup]]) { +kernel void kernel_cpy_f32_q8_0( + device const float * src0, + device void * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant 
uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig[2]; + const int64_t i02 = tgpig[1]; + const int64_t i01 = tgpig[0]; - const int64_t i03 = tgpig.z; - const int64_t i02 = tgpig.y; - const int64_t i01 = tgpig.x; + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; - const int64_t i13 = i03 % ne13; - const int64_t i12 = i02 % ne12; - const int64_t i11 = i01 % ne11; + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0)/QK8_0; - device const char * src0_ptr = src0 + i03 * nb03 + i02 * nb02 + i01 * nb01 + tpitg.x*nb00; - device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11 + tpitg.x*nb10; - device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1 + tpitg.x*nb0; + device block_q8_0 * dst_data = (device block_q8_0 *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); - for (int i0 = tpitg.x; i0 < ne0; i0 += ntg.x) { - if (i02 < ne02) { - ((device float *)dst_ptr)[0] = ((device float *)src0_ptr)[0]; - src0_ptr += ntg.x*nb00; - } else { - ((device float *)dst_ptr)[0] = ((device float *)src1_ptr)[0]; - src1_ptr += ntg.x*nb10; - } - dst_ptr += ntg.x*nb0; - } -} + for (int64_t i00 = tpitg.x*QK8_0; i00 < ne00; i00 += ntg.x*QK8_0) { + device const float * src = (device float *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); -//============================================ k-quants ====================================================== + float amax = 0.0f; // absolute max -#ifndef QK_K -#define QK_K 256 -#else -static_assert(QK_K == 256 || QK_K == 64, "QK_K must be 256 or 64"); -#endif + for (int j = 0; j < QK8_0; j++) { + const float v = src[j]; + amax = MAX(amax, fabs(v)); + } -#if QK_K == 256 -#define K_SCALE_SIZE 12 -#else -#define K_SCALE_SIZE 4 -#endif + const float d = amax / ((1 << 7) - 1); + const float id = d ? 
1.0f/d : 0.0f; -typedef struct { - uint8_t scales[QK_K/16]; // scales and mins, quantized with 4 bits - uint8_t qs[QK_K/4]; // quants - half d; // super-block scale for quantized scales - half dmin; // super-block scale for quantized mins -} block_q2_K; -// 84 bytes / block + dst_data[i00/QK8_0].d = d; -typedef struct { - uint8_t hmask[QK_K/8]; // quants - high bit - uint8_t qs[QK_K/4]; // quants - low 2 bits -#if QK_K == 64 - uint8_t scales[2]; -#else - uint8_t scales[K_SCALE_SIZE]; // scales, quantized with 6 bits -#endif - half d; // super-block scale -} block_q3_K; + for (int j = 0; j < QK8_0; ++j) { + const float x0 = src[j]*id; -#if QK_K == 64 -typedef struct { - half d[2]; // super-block scales/mins - uint8_t scales[2]; - uint8_t qs[QK_K/2]; // 4-bit quants -} block_q4_K; -#else -typedef struct { - half d; // super-block scale for quantized scales - half dmin; // super-block scale for quantized mins - uint8_t scales[K_SCALE_SIZE]; // scales and mins, quantized with 6 bits + dst_data[i00/QK8_0].qs[j] = round(x0); + } + } +} + +kernel void kernel_cpy_f32_q4_0( + device const float * src0, + device void * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig[2]; + const int64_t i02 = tgpig[1]; + const int64_t i01 = tgpig[0]; + + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0)/QK4_0; + + device block_q4_0 * dst_data = (device block_q4_0 *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); + + for (int64_t i00 = tpitg.x*QK4_0; i00 < ne00; i00 += ntg.x*QK4_0) { + device const float * src = (device float *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); + + float amax = 0.0f; // absolute max + float max = 0.0f; + + for (int j = 0; j < QK4_0; j++) { + const float v = src[j]; + if (amax < fabs(v)) { + amax = fabs(v); + max = v; + } + } + + const float d = max / -8; + const float id = d ? 
1.0f/d : 0.0f; + + dst_data[i00/QK4_0].d = d; + + for (int j = 0; j < QK4_0/2; ++j) { + const float x0 = src[0 + j]*id; + const float x1 = src[QK4_0/2 + j]*id; + + const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f)); + const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f)); + + dst_data[i00/QK4_0].qs[j] = xi0; + dst_data[i00/QK4_0].qs[j] |= xi1 << 4; + } + } +} + +kernel void kernel_cpy_f32_q4_1( + device const float * src0, + device void * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig[2]; + const int64_t i02 = tgpig[1]; + const int64_t i01 = tgpig[0]; + + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0)/QK4_1; + + device block_q4_1 * dst_data = (device block_q4_1 *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); + + for (int64_t i00 = tpitg.x*QK4_1; i00 < ne00; i00 += ntg.x*QK4_1) { + device const float * src = (device float *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); + + float min = FLT_MAX; + float max = -FLT_MAX; + + for (int j = 0; j < QK4_1; j++) { + const float v = src[j]; + if (min > v) min = v; + if (max < v) max = v; + } + + const float d = (max - min) / ((1 << 4) - 1); + const float id = d ? 
1.0f/d : 0.0f; + + dst_data[i00/QK4_1].d = d; + dst_data[i00/QK4_1].m = min; + + for (int j = 0; j < QK4_1/2; ++j) { + const float x0 = (src[0 + j] - min)*id; + const float x1 = (src[QK4_1/2 + j] - min)*id; + + const uint8_t xi0 = MIN(15, (int8_t)(x0 + 0.5f)); + const uint8_t xi1 = MIN(15, (int8_t)(x1 + 0.5f)); + + dst_data[i00/QK4_1].qs[j] = xi0; + dst_data[i00/QK4_1].qs[j] |= xi1 << 4; + } + } +} + +kernel void kernel_concat( + device const char * src0, + device const char * src1, + device char * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant uint64_t & nb13, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + + const int64_t i03 = tgpig.z; + const int64_t i02 = tgpig.y; + const int64_t i01 = tgpig.x; + + const int64_t i13 = i03 % ne13; + const int64_t i12 = i02 % ne12; + const int64_t i11 = i01 % ne11; + + device const char * src0_ptr = src0 + i03 * nb03 + i02 * nb02 + i01 * nb01 + tpitg.x*nb00; + device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11 + tpitg.x*nb10; + device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1 + tpitg.x*nb0; + + for (int i0 = tpitg.x; i0 < ne0; i0 += ntg.x) { + if (i02 < ne02) { + ((device float *)dst_ptr)[0] = ((device float *)src0_ptr)[0]; + src0_ptr += ntg.x*nb00; + } else { + ((device float *)dst_ptr)[0] = ((device float *)src1_ptr)[0]; + src1_ptr += ntg.x*nb10; + } + dst_ptr += ntg.x*nb0; + } +} + +//============================================ k-quants ====================================================== + +#ifndef QK_K +#define QK_K 256 +#else +static_assert(QK_K == 256 || QK_K == 64, "QK_K must be 256 or 64"); +#endif + +#if QK_K == 256 +#define K_SCALE_SIZE 12 +#else +#define K_SCALE_SIZE 4 +#endif + +typedef struct { + uint8_t scales[QK_K/16]; // scales and mins, quantized with 4 bits + uint8_t qs[QK_K/4]; // quants + half d; // super-block scale for quantized scales + half dmin; // super-block scale for quantized mins +} block_q2_K; +// 84 bytes / block + +typedef struct { + uint8_t hmask[QK_K/8]; // quants - high bit + uint8_t qs[QK_K/4]; // quants - low 2 bits +#if QK_K == 64 + uint8_t scales[2]; +#else + uint8_t scales[K_SCALE_SIZE]; // scales, quantized with 6 bits +#endif + half d; // super-block scale +} block_q3_K; + +#if QK_K == 64 +typedef struct { + half d[2]; // super-block scales/mins + uint8_t scales[2]; + uint8_t qs[QK_K/2]; // 4-bit quants +} block_q4_K; +#else +typedef struct { + half d; // super-block scale for quantized scales + half dmin; // super-block scale for quantized mins + uint8_t scales[K_SCALE_SIZE]; // scales and mins, quantized with 6 bits uint8_t qs[QK_K/2]; // 4--bit quants } block_q4_K; #endif @@ -1608,32 +2213,39 @@ static inline uchar4 get_scale_min_k4(int j, device const uint8_t * q) { //====================================== dot products ========================= -kernel void kernel_mul_mv_q2_K_f32( +void 
kernel_mul_mv_q2_K_f32_impl( device const void * src0, device const float * src1, device float * dst, constant int64_t & ne00, - constant int64_t & ne01[[buffer(4)]], - constant int64_t & ne02[[buffer(5)]], - constant int64_t & ne10[[buffer(9)]], - constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], - uint tiisg[[thread_index_in_simdgroup]], - uint sgitg[[simdgroup_index_in_threadgroup]]) { + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { const int nb = ne00/QK_K; const int r0 = tgpig.x; const int r1 = tgpig.y; - const int r2 = tgpig.z; + const int im = tgpig.z; const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST; const int ib_row = first_row * nb; - const uint offset0 = r2/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q2_K * x = (device const block_q2_K *) src0 + ib_row + offset0; - device const float * y = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + device const float * y = (device const float *) src1 + r1*ne10 + im*ne00*ne1; + float yl[32]; float sumf[N_DST]={0.f}, all_sum; @@ -1642,11 +2254,11 @@ kernel void kernel_mul_mv_q2_K_f32( #if QK_K == 256 const int ix = tiisg/8; // 0...3 const int it = tiisg%8; // 0...7 - const int im = it/4; // 0 or 1 + const int iq = it/4; // 0 or 1 const int ir = it%4; // 0...3 const int is = (8*ir)/16;// 0 or 1 - device const float * y4 = y + ix * QK_K + 128 * im + 8 * ir; + device const float * y4 = y + ix * QK_K + 128 * iq + 8 * ir; for (int ib = ix; ib < nb; ib += 4) { @@ -1658,8 +2270,8 @@ kernel void kernel_mul_mv_q2_K_f32( yl[i+24] = y4[i+96]; sumy[3] += yl[i+24]; } - device const uint8_t * sc = (device const uint8_t *)x[ib].scales + 8*im + is; - device const uint16_t * qs = (device const uint16_t *)x[ib].qs + 16 * im + 4 * ir; + device const uint8_t * sc = (device const uint8_t *)x[ib].scales + 8*iq + is; + device const uint16_t * qs = (device const uint16_t *)x[ib].qs + 16 * iq + 4 * ir; device const half * dh = &x[ib].d; for (int row = 0; row < N_DST; row++) { @@ -1746,13 +2358,13 @@ kernel void kernel_mul_mv_q2_K_f32( for (int row = 0; row < N_DST; ++row) { all_sum = simd_sum(sumf[row]); if (tiisg == 0) { - dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = all_sum; + dst[r1*ne0 + im*ne0*ne1 + first_row + row] = all_sum; } } } -#if QK_K == 256 -kernel void kernel_mul_mv_q3_K_f32( +[[host_name("kernel_mul_mv_q2_K_f32")]] +kernel void kernel_mul_mv_q2_K_f32( device const void * src0, device const float * src1, device float * dst, @@ -1761,23 +2373,50 @@ kernel void kernel_mul_mv_q3_K_f32( constant int64_t & ne02[[buffer(5)]], constant int64_t & ne10[[buffer(9)]], constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], - uint tiisg[[thread_index_in_simdgroup]], - uint sgitg[[simdgroup_index_in_threadgroup]]) { + uint 
tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + kernel_mul_mv_q2_K_f32_impl(src0, src1, dst, ne00, ne01, ne02, ne10, ne12, ne0, ne1, r2, r3, tgpig, tiisg, sgitg); +} + +#if QK_K == 256 +void kernel_mul_mv_q3_K_f32_impl( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { const int nb = ne00/QK_K; const int64_t r0 = tgpig.x; const int64_t r1 = tgpig.y; - const int64_t r2 = tgpig.z; + const int64_t im = tgpig.z; const int first_row = (r0 * N_SIMDGROUP + sgitg) * 2; - const uint offset0 = r2/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q3_K * x = (device const block_q3_K *) src0 + first_row*nb + offset0; - device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + device const float * yy = (device const float *) src1 + r1*ne10 + im*ne00*ne1; float yl[32]; @@ -1899,40 +2538,47 @@ kernel void kernel_mul_mv_q3_K_f32( } if (tiisg == 0) { for (int row = 0; row < 2; ++row) { - dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = sumf1[row]; + dst[r1*ne0 + im*ne0*ne1 + first_row + row] = sumf1[row]; } } } #else -kernel void kernel_mul_mv_q3_K_f32( +void kernel_mul_mv_q3_K_f32_impl( device const void * src0, device const float * src1, device float * dst, constant int64_t & ne00, - constant int64_t & ne01[[buffer(4)]], - constant int64_t & ne02[[buffer(5)]], - constant int64_t & ne10[[buffer(9)]], - constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], - uint tiisg[[thread_index_in_simdgroup]], - uint sgitg[[simdgroup_index_in_threadgroup]]) { + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { const int nb = ne00/QK_K; const int64_t r0 = tgpig.x; const int64_t r1 = tgpig.y; - const int64_t r2 = tgpig.z; + const int64_t im = tgpig.z; const int row = 2 * r0 + sgitg; - const uint offset0 = r2/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q3_K * x = (device const block_q3_K *) src0 + row*nb + offset0; - device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + device const float * yy = (device const float *) src1 + r1*ne10 + im*ne00*ne1; + const int ix = tiisg/4; const int il = 4 * (tiisg%4);// 0, 4, 8, 12 - const int im = il/8; // 0, 0, 1, 1 + const int iq = il/8; // 0, 0, 1, 1 const int in = il%8; // 0, 4, 0, 4 float2 sum = {0.f, 0.f}; @@ -1952,7 +2598,7 @@ kernel void kernel_mul_mv_q3_K_f32( const float d4 = d_all * ((int32_t)(s[0] & 0xF000) - 32768) * 1.f/262144.f; for (int l = 0; l < 4; l += 2) { - const uint16_t hm = h[l/2] >> im; + const uint16_t hm = h[l/2] >> iq; sum[0] += y[l+ 0] * d1 * ((int32_t)(q[l/2] & 
0x0003) - ((hm & 0x0001) ? 0 : 4)) + y[l+16] * d2 * ((int32_t)(q[l/2] & 0x000c) - ((hm & 0x0004) ? 0 : 16)) + y[l+32] * d3 * ((int32_t)(q[l/2] & 0x0030) - ((hm & 0x0010) ? 0 : 64)) @@ -1968,28 +2614,50 @@ kernel void kernel_mul_mv_q3_K_f32( const float tot = simd_sum(sumf); if (tiisg == 0) { - dst[r1*ne0 + r2*ne0*ne1 + row] = tot; + dst[r1*ne0 + im*ne0*ne1 + row] = tot; } } #endif +[[host_name("kernel_mul_mv_q3_K_f32")]] +kernel void kernel_mul_mv_q3_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + kernel_mul_mv_q3_K_f32_impl(src0, src1, dst, ne00, ne01, ne02, ne10, ne12, ne0, ne1, r2, r3, tgpig, tiisg, sgitg); +} + #if QK_K == 256 -kernel void kernel_mul_mv_q4_K_f32( +void kernel_mul_mv_q4_K_f32_impl( device const void * src0, device const float * src1, device float * dst, constant int64_t & ne00, - constant int64_t & ne01 [[buffer(4)]], - constant int64_t & ne02 [[buffer(5)]], - constant int64_t & ne10 [[buffer(9)]], - constant int64_t & ne12 [[buffer(11)]], - constant int64_t & ne0 [[buffer(15)]], - constant int64_t & ne1 [[buffer(16)]], - constant uint & gqa [[buffer(17)]], + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], - uint tiisg[[thread_index_in_simdgroup]], - uint sgitg[[simdgroup_index_in_threadgroup]]) { + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { const uint16_t kmask1 = 0x3f3f; const uint16_t kmask2 = 0x0f0f; @@ -1997,26 +2665,32 @@ kernel void kernel_mul_mv_q4_K_f32( const int ix = tiisg/8; // 0...3 const int it = tiisg%8; // 0...7 - const int im = it/4; // 0 or 1 + const int iq = it/4; // 0 or 1 const int ir = it%4; // 0...3 const int nb = ne00/QK_K; const int r0 = tgpig.x; const int r1 = tgpig.y; - const int r2 = tgpig.z; + const int im = tgpig.z; //const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST; const int first_row = r0 * N_DST; const int ib_row = first_row * nb; - const uint offset0 = r2/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row + offset0; - device const float * y = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + device const float * y = (device const float *) src1 + r1*ne10 + im*ne00*ne1; + float yl[16]; float yh[16]; float sumf[N_DST]={0.f}, all_sum; const int step = sizeof(block_q4_K) * nb / 2; - device const float * y4 = y + ix * QK_K + 64 * im + 8 * ir; + device const float * y4 = y + ix * QK_K + 64 * iq + 8 * ir; uint16_t sc16[4]; thread const uint8_t * sc8 = (thread const uint8_t *)sc16; @@ -2031,8 +2705,8 @@ kernel void kernel_mul_mv_q4_K_f32( yh[i+8] = y4[i+160]; sumy[3] += yh[i+8]; } - device const uint16_t * sc = (device const uint16_t *)x[ib].scales + im; - device const uint16_t * q1 = (device const uint16_t 
*)x[ib].qs + 16 * im + 4 * ir; + device const uint16_t * sc = (device const uint16_t *)x[ib].scales + iq; + device const uint16_t * q1 = (device const uint16_t *)x[ib].qs + 16 * iq + 4 * ir; device const half * dh = &x[ib].d; for (int row = 0; row < N_DST; row++) { @@ -2076,23 +2750,24 @@ kernel void kernel_mul_mv_q4_K_f32( for (int row = 0; row < N_DST; ++row) { all_sum = simd_sum(sumf[row]); if (tiisg == 0) { - dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = all_sum; + dst[r1*ne0 + im*ne0*ne1 + first_row + row] = all_sum; } } } #else -kernel void kernel_mul_mv_q4_K_f32( +void kernel_mul_mv_q4_K_f32_impl( device const void * src0, device const float * src1, device float * dst, constant int64_t & ne00, - constant int64_t & ne01[[buffer(4)]], - constant int64_t & ne02[[buffer(5)]], - constant int64_t & ne10[[buffer(9)]], - constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]], uint sgitg[[simdgroup_index_in_threadgroup]]) { @@ -2103,12 +2778,18 @@ kernel void kernel_mul_mv_q4_K_f32( const int nb = ne00/QK_K; const int r0 = tgpig.x; const int r1 = tgpig.y; - const int r2 = tgpig.z; + const int im = tgpig.z; const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST; const int ib_row = first_row * nb; - const uint offset0 = r2/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row + offset0; - device const float * y = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + device const float * y = (device const float *) src1 + r1*ne10 + im*ne00*ne1; + float yl[8]; float yh[8]; float sumf[N_DST]={0.f}, all_sum; @@ -2164,13 +2845,14 @@ kernel void kernel_mul_mv_q4_K_f32( for (int row = 0; row < N_DST; ++row) { all_sum = simd_sum(sumf[row]); if (tiisg == 0) { - dst[r1*ne0+ r2*ne0*ne1 + first_row + row] = all_sum; + dst[r1*ne0+ im*ne0*ne1 + first_row + row] = all_sum; } } } #endif -kernel void kernel_mul_mv_q5_K_f32( +[[host_name("kernel_mul_mv_q4_K_f32")]] +kernel void kernel_mul_mv_q4_K_f32( device const void * src0, device const float * src1, device float * dst, @@ -2179,23 +2861,49 @@ kernel void kernel_mul_mv_q5_K_f32( constant int64_t & ne02[[buffer(5)]], constant int64_t & ne10[[buffer(9)]], constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], uint3 tgpig[[threadgroup_position_in_grid]], uint tiisg[[thread_index_in_simdgroup]], uint sgitg[[simdgroup_index_in_threadgroup]]) { + kernel_mul_mv_q4_K_f32_impl(src0, src1, dst, ne00, ne01, ne02, ne10, ne12, ne0, ne1, r2, r3, tgpig, tiisg, sgitg); +} + +void kernel_mul_mv_q5_K_f32_impl( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + 
constant uint & r2, + constant uint & r3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + const int nb = ne00/QK_K; const int64_t r0 = tgpig.x; const int64_t r1 = tgpig.y; - const int r2 = tgpig.z; + const int im = tgpig.z; const int first_row = (r0 * N_SIMDGROUP + sgitg) * 2; - const uint offset0 = r2/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q5_K * x = (device const block_q5_K *) src0 + first_row*nb + offset0; - device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + device const float * yy = (device const float *) src1 + r1*ne10 + im*ne00*ne1; float sumf[2]={0.f}; @@ -2211,15 +2919,15 @@ kernel void kernel_mul_mv_q5_K_f32( const int tid = tiisg/4; const int ix = tiisg%4; - const int im = tid/4; + const int iq = tid/4; const int ir = tid%4; const int n = 8; const int l0 = n*ir; - const int q_offset = 32*im + l0; - const int y_offset = 64*im + l0; + const int q_offset = 32*iq + l0; + const int y_offset = 64*iq + l0; - const uint8_t hm1 = 1u << (2*im); + const uint8_t hm1 = 1u << (2*iq); const uint8_t hm2 = hm1 << 1; const uint8_t hm3 = hm1 << 4; const uint8_t hm4 = hm2 << 4; @@ -2234,7 +2942,7 @@ kernel void kernel_mul_mv_q5_K_f32( device const uint8_t * q1 = x[i].qs + q_offset; device const uint8_t * qh = x[i].qh + l0; device const half * dh = &x[i].d; - device const uint16_t * a = (device const uint16_t *)x[i].scales + im; + device const uint16_t * a = (device const uint16_t *)x[i].scales + iq; device const float * y2 = y1 + 128; float4 sumy = {0.f, 0.f, 0.f, 0.f}; @@ -2290,7 +2998,7 @@ kernel void kernel_mul_mv_q5_K_f32( const int il = 4 * (tiisg/8); // 0, 4, 8, 12 const int ix = tiisg%8; - const int im = il/8; // 0, 0, 1, 1 + const int iq = il/8; // 0, 0, 1, 1 const int in = il%8; // 0, 4, 0, 4 device const float * y = yy + ix*QK_K + il; @@ -2315,7 +3023,7 @@ kernel void kernel_mul_mv_q5_K_f32( float2 acc = {0.f, 0.f}; for (int l = 0; l < 4; ++l) { - const uint8_t hl = h[l] >> im; + const uint8_t hl = h[l] >> iq; acc[0] += yl[l+0] * s[0] * ((int16_t)(q[l+ 0] & 0x0F) - (hl & 0x01 ? 0 : 16)) + yl[l+4] * s[1] * ((int16_t)(q[l+16] & 0x0F) - (hl & 0x04 ? 0 : 16)); acc[1] += yh[l+0] * s[2] * ((int16_t)(q[l+ 0] & 0xF0) - (hl & 0x10 ? 
0 : 256)) @@ -2337,27 +3045,48 @@ kernel void kernel_mul_mv_q5_K_f32( for (int row = 0; row < 2; ++row) { const float tot = simd_sum(sumf[row]); if (tiisg == 0) { - dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = tot; + dst[r1*ne0 + im*ne0*ne1 + first_row + row] = tot; } } +} + +[[host_name("kernel_mul_mv_q5_K_f32")]] +kernel void kernel_mul_mv_q5_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + kernel_mul_mv_q5_K_f32_impl(src0, src1, dst, ne00, ne01, ne02, ne10, ne12, ne0, ne1, r2, r3, tgpig, tiisg, sgitg); } -kernel void kernel_mul_mv_q6_K_f32( +void kernel_mul_mv_q6_K_f32_impl( device const void * src0, device const float * src1, device float * dst, constant int64_t & ne00, - constant int64_t & ne01[[buffer(4)]], - constant int64_t & ne02[[buffer(5)]], - constant int64_t & ne10[[buffer(9)]], - constant int64_t & ne12[[buffer(11)]], - constant int64_t & ne0[[buffer(15)]], - constant int64_t & ne1[[buffer(16)]], - constant uint & gqa[[buffer(17)]], + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne10, + constant int64_t & ne12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, uint3 tgpig[[threadgroup_position_in_grid]], - uint tiisg[[thread_index_in_simdgroup]], - uint sgitg[[simdgroup_index_in_threadgroup]]) { + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { const uint8_t kmask1 = 0x03; const uint8_t kmask2 = 0x0C; @@ -2368,12 +3097,17 @@ kernel void kernel_mul_mv_q6_K_f32( const int64_t r0 = tgpig.x; const int64_t r1 = tgpig.y; - const int r2 = tgpig.z; + const int im = tgpig.z; const int row = 2 * r0 + sgitg; - const uint offset0 = r2/gqa*(nb*ne0); + + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + const uint offset0 = (i12/r2)*(nb*ne01) + (i13/r3)*(nb*ne01*ne02); + device const block_q6_K * x = (device const block_q6_K *) src0 + row * nb + offset0; - device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + device const float * yy = (device const float *) src1 + r1*ne10 + im*ne00*ne1; float sumf = 0; @@ -2439,10 +3173,31 @@ kernel void kernel_mul_mv_q6_K_f32( const float tot = simd_sum(sumf); if (tiisg == 0) { - dst[r1*ne0 + r2*ne0*ne1 + row] = tot; + dst[r1*ne0 + im*ne0*ne1 + row] = tot; } } +[[host_name("kernel_mul_mv_q6_K_f32")]] +kernel void kernel_mul_mv_q6_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & r2 [[buffer(17)]], + constant uint & r3 [[buffer(18)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + kernel_mul_mv_q6_K_f32_impl(src0, src1, dst, ne00, ne01, ne02, ne10, ne12, ne0, ne1, r2, r3, tgpig, 
tiisg, sgitg); +} + //============================= templates and their specializations ============================= // NOTE: this is not dequantizing - we are simply fitting the template @@ -2717,22 +3472,90 @@ void dequantize_q6_K(device const block_q6_K *xb, short il, thread type4x4 & reg template kernel void kernel_get_rows( device const void * src0, - device const int * src1, + device const char * src1, device float * dst, constant int64_t & ne00, constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant uint64_t & nb10, + constant uint64_t & nb11, constant uint64_t & nb1, - uint tgpig[[threadgroup_position_in_grid]], + constant uint64_t & nb2, + uint3 tgpig[[threadgroup_position_in_grid]], uint tiitg[[thread_index_in_threadgroup]], - uint tptg[[threads_per_threadgroup]]) { - const int i = tgpig; - const int r = ((device int32_t *) src1)[i]; + uint3 tptg [[threads_per_threadgroup]]) { + //const int64_t i = tgpig; + //const int64_t r = ((device int32_t *) src1)[i]; + + const int64_t i10 = tgpig.x; + const int64_t i11 = tgpig.y; - for (int ind = tiitg; ind < ne00/16; ind += tptg) { + const int64_t r = ((device int32_t *) ((device char *) src1 + i11*nb11 + i10*nb10))[0]; + + const int64_t i02 = i11; + + for (int64_t ind = tiitg; ind < ne00/16; ind += tptg.x) { float4x4 temp; dequantize_func( - ((device const block_q *) ((device char *) src0 + r*nb01)) + ind/nl, ind%nl, temp); - *(((device float4x4 *) ((device char *) dst + i*nb1)) + ind) = temp; + ((device const block_q *) ((device char *) src0 + r*nb01 + i02*nb02)) + ind/nl, ind%nl, temp); + *(((device float4x4 *) ((device char *) dst + i11*nb2 + i10*nb1)) + ind) = temp; + } +} + +kernel void kernel_get_rows_f32( + device const void * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb1, + constant uint64_t & nb2, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint3 tptg [[threads_per_threadgroup]]) { + const int64_t i10 = tgpig.x; + const int64_t i11 = tgpig.y; + + const int64_t r = ((device int32_t *) ((device char *) src1 + i11*nb11 + i10*nb10))[0]; + + const int64_t i02 = i11; + + for (int ind = tiitg; ind < ne00; ind += tptg.x) { + ((device float *) ((device char *) dst + i11*nb2 + i10*nb1))[ind] = + ((device float *) ((device char *) src0 + r*nb01 + i02*nb02))[ind]; + } +} + +kernel void kernel_get_rows_f16( + device const void * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb1, + constant uint64_t & nb2, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint3 tptg [[threads_per_threadgroup]]) { + const int64_t i10 = tgpig.x; + const int64_t i11 = tgpig.y; + + const int64_t r = ((device int32_t *) ((device char *) src1 + i11*nb11 + i10*nb10))[0]; + + const int64_t i02 = i11; + + for (int ind = tiitg; ind < ne00; ind += tptg.x) { + ((device float *) ((device char *) dst + i11*nb2 + i10*nb1))[ind] = + ((device half *) ((device char *) src0 + r*nb01 + i02*nb02))[ind]; } } @@ -2749,24 +3572,25 @@ kernel void kernel_get_rows( // each block_q contains 16*nl weights template -kernel void kernel_mul_mm(device const uchar * src0, 
- device const uchar * src1, - device float * dst, - constant int64_t & ne00, - constant int64_t & ne02, - constant int64_t & nb01, - constant int64_t & nb02, - constant int64_t & ne12, - constant int64_t & nb10, - constant int64_t & nb11, - constant int64_t & nb12, - constant int64_t & ne0, - constant int64_t & ne1, - constant uint & gqa, - threadgroup uchar * shared_memory [[threadgroup(0)]], - uint3 tgpig[[threadgroup_position_in_grid]], - uint tiitg[[thread_index_in_threadgroup]], - uint sgitg[[simdgroup_index_in_threadgroup]]) { +void kernel_mul_mm_impl(device const uchar * src0, + device const uchar * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne02, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & ne12, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, + threadgroup uchar * shared_memory [[threadgroup(0)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { threadgroup half * sa = (threadgroup half *)(shared_memory); threadgroup float * sb = (threadgroup float *)(shared_memory + 4096); @@ -2792,7 +3616,10 @@ kernel void kernel_mul_mm(device const uchar * src0, short il = (tiitg % THREAD_PER_ROW); - uint offset0 = im/gqa*nb02; + const uint i12 = im%ne12; + const uint i13 = im/ne12; + + uint offset0 = (i12/r2)*nb02 + (i13/r3)*(nb02*ne02); ushort offset1 = il/nl; device const block_q * x = (device const block_q *)(src0 + (r0 * BLOCK_SIZE_M + thread_row) * nb01 + offset0) + offset1; @@ -2876,17 +3703,137 @@ kernel void kernel_mul_mm(device const uchar * src0, } } +template +kernel void kernel_mul_mm(device const uchar * src0, + device const uchar * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne02, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & ne12, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & r2, + constant uint & r3, + threadgroup uchar * shared_memory [[threadgroup(0)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + kernel_mul_mm_impl( + src0, + src1, + dst, + ne00, + ne02, + nb01, + nb02, + ne12, + nb10, + nb11, + nb12, + ne0, + ne1, + r2, + r3, + shared_memory, + tgpig, + tiitg, + sgitg); +} + +template +kernel void kernel_mul_mm_id( + device const uchar * ids, + device const uchar * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne02, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & ne12, + constant int64_t & ne13, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const uchar * src00, + device const uchar * src01, + device const uchar * src02, + device const uchar * src03, + device const uchar * src04, + device const uchar * src05, + device const uchar * src06, + device const uchar * src07, + threadgroup uchar * shared_memory [[threadgroup(0)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + 
device const uchar * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mm_impl( + src0[id], + src1 + bid*nb11, + (device float *) (dst + bid*nb1), + ne00, + ne02, + nb01, + nb02, + ne12, + nb10, + nb11, + nb12, + ne0, + ne1, + r2, + r3, + shared_memory, + tgpig, + tiitg, + sgitg); +} + #if QK_K == 256 #define QK_NL 16 #else #define QK_NL 4 #endif -typedef void (get_rows_t)(device const void *, device const int *, device float *, constant int64_t &, \ - constant uint64_t &, constant uint64_t &, uint, uint, uint); +// +// get rows +// -template [[host_name("kernel_get_rows_f32")]] kernel get_rows_t kernel_get_rows; -template [[host_name("kernel_get_rows_f16")]] kernel get_rows_t kernel_get_rows; +typedef void (get_rows_t)( + device const void * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb1, + constant uint64_t & nb2, + uint3, uint, uint3); + +//template [[host_name("kernel_get_rows_f32")]] kernel get_rows_t kernel_get_rows; +//template [[host_name("kernel_get_rows_f16")]] kernel get_rows_t kernel_get_rows; template [[host_name("kernel_get_rows_q4_0")]] kernel get_rows_t kernel_get_rows; template [[host_name("kernel_get_rows_q4_1")]] kernel get_rows_t kernel_get_rows; template [[host_name("kernel_get_rows_q5_0")]] kernel get_rows_t kernel_get_rows; @@ -2898,6 +3845,10 @@ template [[host_name("kernel_get_rows_q4_K")]] kernel get_rows_t kernel_get_rows template [[host_name("kernel_get_rows_q5_K")]] kernel get_rows_t kernel_get_rows; template [[host_name("kernel_get_rows_q6_K")]] kernel get_rows_t kernel_get_rows; +// +// matrix-matrix multiplication +// + typedef void (mat_mm_t)( device const uchar * src0, device const uchar * src1, @@ -2912,8 +3863,10 @@ typedef void (mat_mm_t)( constant int64_t & nb12, constant int64_t & ne0, constant int64_t & ne1, - constant uint & gqa, - threadgroup uchar *, uint3, uint, uint); + constant uint & r2, + constant uint & r3, + threadgroup uchar *, + uint3, uint, uint); template [[host_name("kernel_mul_mm_f32_f32")]] kernel mat_mm_t kernel_mul_mm; template [[host_name("kernel_mul_mm_f16_f32")]] kernel mat_mm_t kernel_mul_mm; @@ -2927,3 +3880,823 @@ template [[host_name("kernel_mul_mm_q3_K_f32")]] kernel mat_mm_t kernel_mul_mm; template [[host_name("kernel_mul_mm_q5_K_f32")]] kernel mat_mm_t kernel_mul_mm; template [[host_name("kernel_mul_mm_q6_K_f32")]] kernel mat_mm_t kernel_mul_mm; + +// +// indirect matrix-matrix multiplication +// + +typedef void (mat_mm_id_t)( + device const uchar * ids, + device const uchar * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne02, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & ne12, + constant int64_t & ne13, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const uchar * src00, + device const uchar * src01, + device const uchar * src02, + device const uchar * src03, + device const uchar * src04, + device const uchar * src05, + device const uchar * src06, + device const uchar * src07, + 
threadgroup uchar *, + uint3, uint, uint); + +template [[host_name("kernel_mul_mm_id_f32_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_f16_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q4_0_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q4_1_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q5_0_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q5_1_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q8_0_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q2_K_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q3_K_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q4_K_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q5_K_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; +template [[host_name("kernel_mul_mm_id_q6_K_f32")]] kernel mat_mm_id_t kernel_mul_mm_id; + +// +// matrix-vector multiplication +// + +[[host_name("kernel_mul_mv_id_f32_f32")]] +kernel void kernel_mul_mv_id_f32_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_f32_f32_impl( + src0[id], + src1 + bid*nb11, + (device float *) (dst + bid*nb1), + ne00, + ne01, + ne02, + nb00, + nb01, + nb02, + ne10, + ne11, + ne12, + nb10, + nb11, + nb12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg); +} + +[[host_name("kernel_mul_mv_id_f16_f32")]] +kernel void kernel_mul_mv_id_f16_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * 
src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_f16_f32_impl( + src0[id], + src1 + bid*nb11, + (device float *) (dst + bid*nb1), + ne00, + ne01, + ne02, + nb00, + nb01, + nb02, + ne10, + ne11, + ne12, + nb10, + nb11, + nb12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg); +} + +[[host_name("kernel_mul_mv_id_q8_0_f32")]] +kernel void kernel_mul_mv_id_q8_0_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_q8_0_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q4_0_f32")]] +kernel void kernel_mul_mv_id_q4_0_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; 
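+
+    // expert selection for the indirect (Mixtral-style mixture-of-experts)
+    // path: tgpig.z packs the ids row together with the ne12*ne13 broadcast
+    // grid, so bid below recovers the ids row handled by this threadgroup,
+    // the remainder written back to tgpig.z is the usual broadcast coordinate,
+    // and the expert index id read from the ids tensor picks one of the eight
+    // candidate src0 matrices before falling through to the shared
+    // mul_vec_q_n_f32_impl path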
+ + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + mul_vec_q_n_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q4_1_f32")]] +kernel void kernel_mul_mv_id_q4_1_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + mul_vec_q_n_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q5_0_f32")]] +kernel void kernel_mul_mv_id_q5_0_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + mul_vec_q_n_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q5_1_f32")]] +kernel void 
kernel_mul_mv_id_q5_1_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + mul_vec_q_n_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q2_K_f32")]] +kernel void kernel_mul_mv_id_q2_K_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_q2_K_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q3_K_f32")]] +kernel void kernel_mul_mv_id_q3_K_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & 
nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_q3_K_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q4_K_f32")]] +kernel void kernel_mul_mv_id_q4_K_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_q4_K_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q5_K_f32")]] +kernel void kernel_mul_mv_id_q5_K_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device 
const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_q5_K_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} + +[[host_name("kernel_mul_mv_id_q6_K_f32")]] +kernel void kernel_mul_mv_id_q6_K_f32( + device const char * ids, + device const char * src1, + device uchar * dst, + constant int64_t & nbi1, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & nb1, + constant uint & r2, + constant uint & r3, + constant int & idx, + device const char * src00, + device const char * src01, + device const char * src02, + device const char * src03, + device const char * src04, + device const char * src05, + device const char * src06, + device const char * src07, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + device const char * src0[8] = {src00, src01, src02, src03, src04, src05, src06, src07}; + + const int64_t bid = tgpig.z/(ne12*ne13); + + tgpig.z = tgpig.z%(ne12*ne13); + + const int32_t id = ((device int32_t *) (ids + bid*nbi1))[idx]; + + kernel_mul_mv_q6_K_f32_impl( + src0[id], + (device const float *) (src1 + bid*nb11), + (device float *) ( dst + bid*nb1), + ne00, + ne01, + ne02, + ne10, + ne12, + ne0, + ne1, + r2, + r3, + tgpig, + tiisg, + sgitg); +} diff --git a/ggml-opencl.cpp b/ggml-opencl.cpp index 202bcb4853893..496f9cdca542d 100644 --- a/ggml-opencl.cpp +++ b/ggml-opencl.cpp @@ -1,20 +1,18 @@ +#include "ggml.h" #include "ggml-opencl.h" #include #include +#include +#include +#include +#include #include #include -#include #define CL_TARGET_OPENCL_VERSION 110 #include -#include -#include -#include - -#include "ggml.h" - #if defined(_MSC_VER) #pragma warning(disable: 4244 4267) // possible loss of data #endif diff --git a/ggml-quants.c b/ggml-quants.c index cf2860b8cbd59..0e8163a16b395 100644 --- a/ggml-quants.c +++ b/ggml-quants.c @@ -19,7 +19,7 @@ #ifdef __wasm_simd128__ #include #else -#ifdef __POWER9_VECTOR__ +#if defined(__POWER9_VECTOR__) || defined(__powerpc64__) #include #undef bool #define bool _Bool @@ -3114,7 +3114,7 @@ void ggml_vec_dot_q5_0_q8_0(const int n, float * restrict s, const void * restri size_t vl = __riscv_vsetvl_e8m1(qk/2); - // These tempory registers are for masking and shift operations + // These temporary registers are for masking and shift operations vuint32m2_t vt_1 = __riscv_vid_v_u32m2(vl); vuint32m2_t vt_2 = __riscv_vsll_vv_u32m2(__riscv_vmv_v_x_u32m2(1, vl), vt_1, vl); @@ -4757,7 +4757,7 @@ void ggml_vec_dot_q3_K_q8_K(const int n, float * restrict s, const void * restri vl = 16; - // retreive lane to multiply 
with scale + // retrieve lane to multiply with scale vint32m2_t aux0_0 = __riscv_vwmul_vx_i32m2(__riscv_vget_v_i16m2_i16m1(a0, 0), (scale[0]), vl); vint32m2_t aux0_1 = __riscv_vwmul_vx_i32m2(__riscv_vget_v_i16m2_i16m1(a0, 1), (scale[1]), vl); vint32m2_t aux1_0 = __riscv_vwmul_vx_i32m2(__riscv_vget_v_i16m2_i16m1(a1, 0), (scale[2]), vl); diff --git a/ggml.c b/ggml.c index 3202a517b7868..66658ff4b24ce 100644 --- a/ggml.c +++ b/ggml.c @@ -1,4 +1,4 @@ -#define _CRT_SECURE_NO_DEPRECATE // Disables ridiculous "unsafe" warnigns on Windows +#define _CRT_SECURE_NO_DEPRECATE // Disables ridiculous "unsafe" warnings on Windows #define _USE_MATH_DEFINES // For M_PI on MSVC #include "ggml-impl.h" @@ -33,7 +33,7 @@ // we should just be careful :) #pragma warning(disable: 4244 4267) -// disable POSIX deprecation warnigns +// disable POSIX deprecation warnings // these functions are never going away, anyway #pragma warning(disable: 4996) #endif @@ -233,24 +233,6 @@ inline static void * ggml_aligned_malloc(size_t size) { #define UNUSED GGML_UNUSED #define SWAP(x, y, T) do { T SWAP = x; x = y; y = SWAP; } while (0) -// -// tensor access macros -// - -#define GGML_TENSOR_UNARY_OP_LOCALS \ - GGML_TENSOR_LOCALS(int64_t, ne0, src0, ne) \ - GGML_TENSOR_LOCALS(size_t, nb0, src0, nb) \ - GGML_TENSOR_LOCALS(int64_t, ne, dst, ne) \ - GGML_TENSOR_LOCALS(size_t, nb, dst, nb) - -#define GGML_TENSOR_BINARY_OP_LOCALS \ - GGML_TENSOR_LOCALS(int64_t, ne0, src0, ne) \ - GGML_TENSOR_LOCALS(size_t, nb0, src0, nb) \ - GGML_TENSOR_LOCALS(int64_t, ne1, src1, ne) \ - GGML_TENSOR_LOCALS(size_t, nb1, src1, nb) \ - GGML_TENSOR_LOCALS(int64_t, ne, dst, ne) \ - GGML_TENSOR_LOCALS(size_t, nb, dst, nb) - #if defined(GGML_USE_ACCELERATE) #include #if defined(GGML_USE_CLBLAST) // allow usage of CLBlast alongside Accelerate functions @@ -1613,6 +1595,7 @@ static const char * GGML_OP_NAME[GGML_OP_COUNT] = { "GROUP_NORM", "MUL_MAT", + "MUL_MAT_ID", "OUT_PROD", "SCALE", @@ -1640,6 +1623,7 @@ static const char * GGML_OP_NAME[GGML_OP_COUNT] = { "POOL_1D", "POOL_2D", "UPSCALE", + "ARGSORT", "FLASH_ATTN", "FLASH_FF", @@ -1666,7 +1650,7 @@ static const char * GGML_OP_NAME[GGML_OP_COUNT] = { "CROSS_ENTROPY_LOSS_BACK", }; -static_assert(GGML_OP_COUNT == 68, "GGML_OP_COUNT != 68"); +static_assert(GGML_OP_COUNT == 70, "GGML_OP_COUNT != 70"); static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = { "none", @@ -1695,6 +1679,7 @@ static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = { "group_norm(x)", "X*Y", + "X[i]*Y", "X*Y", "x*v", @@ -1722,6 +1707,7 @@ static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = { "pool_1d(x)", "pool_2d(x)", "upscale(x)", + "argsort(x)", "flash_attn(x)", "flash_ff(x)", @@ -1748,15 +1734,33 @@ static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = { "cross_entropy_loss_back(x,y)", }; -static_assert(GGML_OP_COUNT == 68, "GGML_OP_COUNT != 68"); +static_assert(GGML_OP_COUNT == 70, "GGML_OP_COUNT != 70"); static_assert(GGML_OP_POOL_COUNT == 2, "GGML_OP_POOL_COUNT != 2"); + +static const char * GGML_UNARY_OP_NAME[GGML_UNARY_OP_COUNT] = { + "ABS", + "SGN", + "NEG", + "STEP", + "TANH", + "ELU", + "RELU", + "GELU", + "GELU_QUICK", + "SILU", + "LEAKY", +}; + +static_assert(GGML_UNARY_OP_COUNT == 11, "GGML_UNARY_OP_COUNT != 11"); + + static_assert(sizeof(struct ggml_object)%GGML_MEM_ALIGN == 0, "ggml_object size must be a multiple of GGML_MEM_ALIGN"); static_assert(sizeof(struct ggml_tensor)%GGML_MEM_ALIGN == 0, "ggml_tensor size must be a multiple of GGML_MEM_ALIGN"); // WARN: -// Mis-confguration can lead to problem that's hard to reason about: 
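A note on the bookkeeping in the hunks above: adding MUL_MAT_ID and ARGSORT touches the op enum, the GGML_OP_NAME and GGML_OP_SYMBOL tables, and the two static_asserts that jump from 68 to 70; the new GGML_UNARY_OP_NAME table gets the same count guard. The discipline, reduced to a sketch (names here are illustrative, not from the patch):

    #include <assert.h>

    enum op { OP_A, OP_B, OP_COUNT };

    static const char * OP_NAME[OP_COUNT] = { "A", "B" };

    // deliberately asserts against a literal: growing the enum forces a
    // revisit of every table guarded this way, which is the point
    static_assert(OP_COUNT == 2, "update OP_NAME when adding ops");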
+// Mis-configuration can lead to problem that's hard to reason about: // * At best it crash or talks nosense. // * At worst it talks slightly difference but hard to perceive. // @@ -1771,6 +1775,7 @@ static void ggml_setup_op_has_task_pass(void) { p[GGML_OP_ACC ] = true; p[GGML_OP_MUL_MAT ] = true; + p[GGML_OP_MUL_MAT_ID ] = true; p[GGML_OP_OUT_PROD ] = true; p[GGML_OP_SET ] = true; p[GGML_OP_GET_ROWS_BACK ] = true; @@ -2023,6 +2028,20 @@ const char * ggml_op_symbol(enum ggml_op op) { return GGML_OP_SYMBOL[op]; } +const char * ggml_unary_op_name(enum ggml_unary_op op) { + return GGML_UNARY_OP_NAME[op]; +} + +const char * ggml_op_desc(const struct ggml_tensor * t) { + if (t->op == GGML_OP_UNARY) { + enum ggml_unary_op uop = ggml_get_unary_op(t); + return ggml_unary_op_name(uop); + } + else { + return ggml_op_name(t->op); + } +} + size_t ggml_element_size(const struct ggml_tensor * tensor) { return ggml_type_size(tensor->type); } @@ -3154,9 +3173,7 @@ static struct ggml_tensor * ggml_add_impl( struct ggml_tensor * a, struct ggml_tensor * b, bool inplace) { - // TODO: support less-strict constraint - // GGML_ASSERT(ggml_can_repeat(b, a)); - GGML_ASSERT(ggml_can_repeat_rows(b, a)); + GGML_ASSERT(ggml_can_repeat(b, a)); bool is_node = false; @@ -3371,9 +3388,7 @@ static struct ggml_tensor * ggml_mul_impl( struct ggml_tensor * a, struct ggml_tensor * b, bool inplace) { - // TODO: support less-strict constraint - // GGML_ASSERT(ggml_can_repeat(b, a)); - GGML_ASSERT(ggml_can_repeat_rows(b, a)); + GGML_ASSERT(ggml_can_repeat(b, a)); bool is_node = false; @@ -3418,7 +3433,7 @@ static struct ggml_tensor * ggml_div_impl( struct ggml_tensor * a, struct ggml_tensor * b, bool inplace) { - GGML_ASSERT(ggml_are_same_shape(a, b)); + GGML_ASSERT(ggml_can_repeat(b, a)); bool is_node = false; @@ -4056,6 +4071,51 @@ struct ggml_tensor * ggml_mul_mat( return result; } +// ggml_mul_mat_id + +struct ggml_tensor * ggml_mul_mat_id( + struct ggml_context * ctx, + struct ggml_tensor * const as[], + int n_as, + struct ggml_tensor * ids, + int id, + struct ggml_tensor * b) { + + GGML_ASSERT(ids->type == GGML_TYPE_I32); + GGML_ASSERT(ids->ne[2] == 1 && ids->ne[3] == 1); + GGML_ASSERT(ids->ne[1] == b->ne[1]); + GGML_ASSERT(ids->ne[2] == b->ne[2] && ids->ne[3] == b->ne[3]); + GGML_ASSERT(n_as > 0 && n_as <= GGML_MAX_SRC - 2); + GGML_ASSERT(id >= 0 && id < ids->ne[0]); + + bool is_node = false; + + if (as[0]->grad || b->grad) { + is_node = true; + } + + const int64_t ne[4] = { as[0]->ne[1], b->ne[1], b->ne[2], b->ne[3] }; + struct ggml_tensor * result = ggml_new_tensor(ctx, GGML_TYPE_F32, MAX(as[0]->n_dims, b->n_dims), ne); + + ggml_set_op_params_i32(result, 0, id); + ggml_set_op_params_i32(result, 1, n_as); + + result->op = GGML_OP_MUL_MAT_ID; + result->grad = is_node ? 
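The ggml_mul_mat_id construction above stores `ids` in src[0], the activations in src[1], and the experts from src[2] onward, which is why GGML_MAX_SRC grows later in this patch. A usage sketch under Mixtral-like assumptions (eight experts, per-token ids produced by a router; not code from the patch itself):

    #include "ggml.h"

    struct ggml_tensor * moe_mul_mat(struct ggml_context * ctx,
                                     struct ggml_tensor * experts[8], // same shape each
                                     struct ggml_tensor * ids,        // I32, ne[1] == n_tokens
                                     struct ggml_tensor * cur) {
        // ~= ggml_mul_mat(experts[ids[0, token]], cur), resolved per token
        return ggml_mul_mat_id(ctx, experts, 8, ids, 0, cur);
    }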
ggml_dup_tensor(ctx, result) : NULL; + result->src[0] = ids; + result->src[1] = b; + + for (int i = 0; i < n_as; i++) { + struct ggml_tensor * a = as[i]; + GGML_ASSERT(ggml_are_same_shape(as[0], a)); + GGML_ASSERT(ggml_can_mul_mat(a, b)); + GGML_ASSERT(!ggml_is_transposed(a)); + result->src[i + 2] = a; + } + + return result; +} + // ggml_out_prod struct ggml_tensor * ggml_out_prod( @@ -4209,7 +4269,7 @@ struct ggml_tensor * ggml_set_2d_inplace( struct ggml_tensor * b, size_t nb1, size_t offset) { - return ggml_set_impl(ctx, a, b, nb1, a->nb[2], a->nb[3], offset, false); + return ggml_set_impl(ctx, a, b, nb1, a->nb[2], a->nb[3], offset, true); } // ggml_cpy @@ -4673,7 +4733,9 @@ struct ggml_tensor * ggml_get_rows( struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b) { - GGML_ASSERT(ggml_is_matrix(a) && ggml_is_vector(b) && b->type == GGML_TYPE_I32); + GGML_ASSERT(a->ne[2] == b->ne[1]); + GGML_ASSERT(b->ne[3] == 1); + GGML_ASSERT(b->type == GGML_TYPE_I32); bool is_node = false; @@ -4683,7 +4745,7 @@ struct ggml_tensor * ggml_get_rows( // TODO: implement non F32 return //struct ggml_tensor * result = ggml_new_tensor_2d(ctx, a->type, a->ne[0], b->ne[0]); - struct ggml_tensor * result = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, a->ne[0], b->ne[0]); + struct ggml_tensor * result = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, a->ne[0], b->ne[0], b->ne[1], b->ne[2]); result->op = GGML_OP_GET_ROWS; result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL; @@ -4826,7 +4888,17 @@ struct ggml_tensor * ggml_diag_mask_zero_inplace( static struct ggml_tensor * ggml_soft_max_impl( struct ggml_context * ctx, struct ggml_tensor * a, + struct ggml_tensor * mask, + float scale, bool inplace) { + GGML_ASSERT(ggml_is_contiguous(a)); + if (mask) { + GGML_ASSERT(ggml_is_contiguous(mask)); + GGML_ASSERT(mask->ne[2] == 1); + GGML_ASSERT(mask->ne[3] == 1); + GGML_ASSERT(ggml_can_repeat_rows(mask, a)); + } + bool is_node = false; if (a->grad) { @@ -4835,9 +4907,13 @@ static struct ggml_tensor * ggml_soft_max_impl( struct ggml_tensor * result = inplace ? ggml_view_tensor(ctx, a) : ggml_dup_tensor(ctx, a); + float params[] = { scale }; + ggml_set_op_params(result, params, sizeof(params)); + result->op = GGML_OP_SOFT_MAX; result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL; result->src[0] = a; + result->src[1] = mask; return result; } @@ -4845,13 +4921,21 @@ static struct ggml_tensor * ggml_soft_max_impl( struct ggml_tensor * ggml_soft_max( struct ggml_context * ctx, struct ggml_tensor * a) { - return ggml_soft_max_impl(ctx, a, false); + return ggml_soft_max_impl(ctx, a, NULL, 1.0f, false); } struct ggml_tensor * ggml_soft_max_inplace( struct ggml_context * ctx, struct ggml_tensor * a) { - return ggml_soft_max_impl(ctx, a, true); + return ggml_soft_max_impl(ctx, a, NULL, 1.0f, true); +} + +struct ggml_tensor * ggml_soft_max_ext( + struct ggml_context * ctx, + struct ggml_tensor * a, + struct ggml_tensor * mask, + float scale) { + return ggml_soft_max_impl(ctx, a, mask, scale, false); } // ggml_soft_max_back @@ -5446,6 +5530,43 @@ struct ggml_tensor * ggml_upscale( return ggml_upscale_impl(ctx, a, scale_factor); } +// ggml_argsort + +struct ggml_tensor * ggml_argsort( + struct ggml_context * ctx, + struct ggml_tensor * a, + enum ggml_sort_order order) { + bool is_node = false; + + struct ggml_tensor * result = ggml_new_tensor(ctx, GGML_TYPE_I32, a->n_dims, a->ne); + + ggml_set_op_params_i32(result, 0, (int32_t) order); + + result->op = GGML_OP_ARGSORT; + result->grad = is_node ? 
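ggml_soft_max_ext() above fuses what used to be three nodes — scale, mask add, soft_max — into one op, with the scale carried in op_params and the optional mask in src[1]. A typical attention call site might look like this (a sketch, not taken from this patch):

    #include "ggml.h"
    #include <math.h>

    // scores = soft_max(kq*scale + mask); pass mask = NULL when unmasked
    struct ggml_tensor * attn_probs(struct ggml_context * ctx,
                                    struct ggml_tensor * kq,
                                    struct ggml_tensor * mask,
                                    int n_embd_head) {
        return ggml_soft_max_ext(ctx, kq, mask, 1.0f/sqrtf((float) n_embd_head));
    }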
ggml_dup_tensor(ctx, result) : NULL; + result->src[0] = a; + + return result; +} + +// ggml_top_k + +struct ggml_tensor * ggml_top_k( + struct ggml_context * ctx, + struct ggml_tensor * a, + int k) { + GGML_ASSERT(a->ne[0] >= k); + + struct ggml_tensor * result = ggml_argsort(ctx, a, GGML_SORT_DESC); + + result = ggml_view_4d(ctx, result, + k, result->ne[1], result->ne[2], result->ne[3], + result->nb[1], result->nb[2], result->nb[3], + 0); + + return result; +} + // ggml_flash_attn struct ggml_tensor * ggml_flash_attn( @@ -6805,7 +6926,7 @@ static void ggml_compute_forward_add_f32( const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst) { - GGML_ASSERT(ggml_can_repeat_rows(src1, src0) && ggml_are_same_shape(src0, dst)); + GGML_ASSERT(ggml_can_repeat(src1, src0) && ggml_are_same_shape(src0, dst)); if (params->type == GGML_TASK_INIT || params->type == GGML_TASK_FINALIZE) { return; @@ -6838,16 +6959,19 @@ static void ggml_compute_forward_add_f32( const int64_t i13 = i03 % ne13; const int64_t i12 = i02 % ne12; const int64_t i11 = i01 % ne11; + const int64_t nr0 = ne00 / ne10; float * dst_ptr = (float *) ((char *) dst->data + i03*nb3 + i02*nb2 + i01*nb1 ); float * src0_ptr = (float *) ((char *) src0->data + i03*nb03 + i02*nb02 + i01*nb01); float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11); + for (int64_t r = 0; r < nr0; ++r) { #ifdef GGML_USE_ACCELERATE - vDSP_vadd(src0_ptr, 1, src1_ptr, 1, dst_ptr, 1, ne00); + vDSP_vadd(src0_ptr + r*ne10, 1, src1_ptr, 1, dst_ptr + r*ne10, 1, ne10); #else - ggml_vec_add_f32(ne00, dst_ptr, src0_ptr, src1_ptr); + ggml_vec_add_f32(ne10, dst_ptr + r*ne10, src0_ptr + r*ne10, src1_ptr); #endif + } } } else { // src1 is not contiguous @@ -6864,8 +6988,9 @@ static void ggml_compute_forward_add_f32( float * dst_ptr = (float *) ((char *) dst->data + i03*nb3 + i02*nb2 + i01*nb1 ); float * src0_ptr = (float *) ((char *) src0->data + i03*nb03 + i02*nb02 + i01*nb01); - for (int i0 = 0; i0 < ne0; i0++) { - float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11 + i0*nb10); + for (int64_t i0 = 0; i0 < ne0; ++i0) { + const int64_t i10 = i0 % ne10; + float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11 + i10*nb10); dst_ptr[i0] = src0_ptr[i0] + *src1_ptr; } @@ -7399,7 +7524,7 @@ static void ggml_compute_forward_acc_f32( GGML_ASSERT(ggml_is_contiguous(dst) && ggml_is_contiguous(src0)); // view src0 and dst with these strides and data offset inbytes during acc - // nb0 is implicitely element_size because src0 and dst are contiguous + // nb0 is implicitly element_size because src0 and dst are contiguous size_t nb1 = ((int32_t *) dst->op_params)[0]; size_t nb2 = ((int32_t *) dst->op_params)[1]; size_t nb3 = ((int32_t *) dst->op_params)[2]; @@ -7585,7 +7710,7 @@ static void ggml_compute_forward_mul_f32( const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst) { - GGML_ASSERT(ggml_can_repeat_rows(src1, src0) && ggml_are_same_shape(src0, dst)); + GGML_ASSERT(ggml_can_repeat(src1, src0) && ggml_are_same_shape(src0, dst)); if (params->type == GGML_TASK_INIT || params->type == GGML_TASK_FINALIZE) { return; @@ -7608,7 +7733,6 @@ static void ggml_compute_forward_mul_f32( GGML_ASSERT( nb0 == sizeof(float)); GGML_ASSERT(nb00 == sizeof(float)); - GGML_ASSERT(ne00 == ne10); if (nb10 == sizeof(float)) { for (int64_t ir = ith; ir < nr; ir += nth) { @@ -7620,20 +7744,21 @@ static void ggml_compute_forward_mul_f32( const int64_t i13 = i03 % 
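ggml_top_k() above is deliberately thin: a full GGML_SORT_DESC argsort followed by a k-column view, so the result is I32 indices rather than values. That is exactly the shape a Mixtral-style router needs (sketch; the tensor shapes are assumptions):

    #include "ggml.h"

    // logits: [n_expert, n_tokens] -> I32 ids of the 2 best experts per token
    struct ggml_tensor * route_top2(struct ggml_context * ctx,
                                    struct ggml_tensor * logits) {
        return ggml_top_k(ctx, logits, 2);   // asserts logits->ne[0] >= 2
    }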
ne13; const int64_t i12 = i02 % ne12; const int64_t i11 = i01 % ne11; + const int64_t nr0 = ne00 / ne10; float * dst_ptr = (float *) ((char *) dst->data + i03*nb3 + i02*nb2 + i01*nb1 ); float * src0_ptr = (float *) ((char *) src0->data + i03*nb03 + i02*nb02 + i01*nb01); float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11); + for (int64_t r = 0 ; r < nr0; ++r) { #ifdef GGML_USE_ACCELERATE - UNUSED(ggml_vec_mul_f32); + UNUSED(ggml_vec_mul_f32); - vDSP_vmul( src0_ptr, 1, src1_ptr, 1, dst_ptr, 1, ne00); + vDSP_vmul(src0_ptr + r*ne10, 1, src1_ptr, 1, dst_ptr + r*ne10, 1, ne10); #else - ggml_vec_mul_f32(ne00, dst_ptr, src0_ptr, src1_ptr); + ggml_vec_mul_f32(ne10, dst_ptr + r*ne10, src0_ptr + r*ne10, src1_ptr); #endif - // } - // } + } } } else { // src1 is not contiguous @@ -7651,8 +7776,9 @@ static void ggml_compute_forward_mul_f32( float * dst_ptr = (float *) ((char *) dst->data + i03*nb3 + i02*nb2 + i01*nb1 ); float * src0_ptr = (float *) ((char *) src0->data + i03*nb03 + i02*nb02 + i01*nb01); - for (int64_t i0 = 0; i0 < ne00; i0++) { - float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11 + i0*nb10); + for (int64_t i0 = 0; i0 < ne00; ++i0) { + const int64_t i10 = i0 % ne10; + float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11 + i10*nb10); dst_ptr[i0] = src0_ptr[i0] * (*src1_ptr); } @@ -7686,14 +7812,16 @@ static void ggml_compute_forward_div_f32( const struct ggml_tensor * src0, const struct ggml_tensor * src1, struct ggml_tensor * dst) { - assert(params->ith == 0); - assert(ggml_are_same_shape(src0, src1) && ggml_are_same_shape(src0, dst)); + GGML_ASSERT(ggml_can_repeat(src1, src0) && ggml_are_same_shape(src0, dst)); if (params->type == GGML_TASK_INIT || params->type == GGML_TASK_FINALIZE) { return; } - const int nr = ggml_nrows(src0); + const int ith = params->ith; + const int nth = params->nth; + + const int64_t nr = ggml_nrows(src0); GGML_TENSOR_BINARY_OP_LOCALS @@ -7701,41 +7829,50 @@ static void ggml_compute_forward_div_f32( GGML_ASSERT(nb00 == sizeof(float)); if (nb10 == sizeof(float)) { - for (int ir = 0; ir < nr; ++ir) { - // src0, src1 and dst are same shape => same indices - const int i3 = ir/(ne2*ne1); - const int i2 = (ir - i3*ne2*ne1)/ne1; - const int i1 = (ir - i3*ne2*ne1 - i2*ne1); + for (int64_t ir = ith; ir < nr; ir += nth) { + // src0 and dst are same shape => same indices + const int64_t i03 = ir/(ne02*ne01); + const int64_t i02 = (ir - i03*ne02*ne01)/ne01; + const int64_t i01 = (ir - i03*ne02*ne01 - i02*ne01); + + const int64_t i13 = i03 % ne13; + const int64_t i12 = i02 % ne12; + const int64_t i11 = i01 % ne11; + const int64_t nr0 = ne00 / ne10; + + float * dst_ptr = (float *) ((char *) dst->data + i03*nb3 + i02*nb2 + i01*nb1 ); + float * src0_ptr = (float *) ((char *) src0->data + i03*nb03 + i02*nb02 + i01*nb01); + float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11); + for (int64_t r = 0; r < nr0; ++r) { #ifdef GGML_USE_ACCELERATE - UNUSED(ggml_vec_div_f32); + UNUSED(ggml_vec_div_f32); - vDSP_vdiv( - (float *) ((char *) src1->data + i3*nb13 + i2*nb12 + i1*nb11), 1, - (float *) ((char *) src0->data + i3*nb03 + i2*nb02 + i1*nb01), 1, - (float *) ((char *) dst->data + i3*nb3 + i2*nb2 + i1*nb1 ), 1, - ne0); + vDSP_vdiv(src1_ptr, 1, src0_ptr + r*ne10, 1, dst_ptr + r*ne10, 1, ne10); #else - ggml_vec_div_f32(ne0, - (float *) ((char *) dst->data + i3*nb3 + i2*nb2 + i1*nb1 ), - (float *) ((char *) src0->data + i3*nb03 + i2*nb02 + i1*nb01), - (float *) 
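The nr0 = ne00/ne10 loop above implements row-level broadcasting: ggml_can_repeat() guarantees ne00 is a multiple of ne10, so src1 tiles across the src0 row; the non-contiguous path expresses the same thing as i0 % ne10. A standalone scalar model:

    #include <stdio.h>

    int main(void) {
        const int ne00 = 8, ne10 = 4;            // src0 row is 2x the src1 row
        const float src0[8] = {1,2,3,4,5,6,7,8};
        const float src1[4] = {10,20,30,40};
        float dst[8];
        for (int i0 = 0; i0 < ne00; ++i0) {
            dst[i0] = src0[i0] * src1[i0 % ne10];   // src1 repeats along the row
        }
        for (int i0 = 0; i0 < ne00; ++i0) printf("%g ", dst[i0]);
        printf("\n");                               // 10 40 90 160 50 120 210 320
        return 0;
    }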
((char *) src1->data + i3*nb13 + i2*nb12 + i1*nb11)); + ggml_vec_div_f32(ne10, dst_ptr + r*ne10, src0_ptr + r*ne10, src1_ptr); #endif - // } - // } + } } } else { // src1 is not contiguous - for (int ir = 0; ir < nr; ++ir) { - // src0, src1 and dst are same shape => same indices - const int i3 = ir/(ne2*ne1); - const int i2 = (ir - i3*ne2*ne1)/ne1; - const int i1 = (ir - i3*ne2*ne1 - i2*ne1); + for (int64_t ir = ith; ir < nr; ir += nth) { + // src0 and dst are same shape => same indices + // src1 is broadcastable across src0 and dst in i1, i2, i3 + const int64_t i03 = ir/(ne02*ne01); + const int64_t i02 = (ir - i03*ne02*ne01)/ne01; + const int64_t i01 = (ir - i03*ne02*ne01 - i02*ne01); - float * dst_ptr = (float *) ((char *) dst->data + i3*nb3 + i2*nb2 + i1*nb1 ); - float * src0_ptr = (float *) ((char *) src0->data + i3*nb03 + i2*nb02 + i1*nb01); - for (int i0 = 0; i0 < ne0; i0++) { - float * src1_ptr = (float *) ((char *) src1->data + i3*nb13 + i2*nb12 + i1*nb11 + i0*nb10); + const int64_t i13 = i03 % ne13; + const int64_t i12 = i02 % ne12; + const int64_t i11 = i01 % ne11; + + float * dst_ptr = (float *) ((char *) dst->data + i03*nb3 + i02*nb2 + i01*nb1 ); + float * src0_ptr = (float *) ((char *) src0->data + i03*nb03 + i02*nb02 + i01*nb01); + + for (int64_t i0 = 0; i0 < ne00; ++i0) { + const int64_t i10 = i0 % ne10; + float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11 + i10*nb10); dst_ptr[i0] = src0_ptr[i0] / (*src1_ptr); } @@ -8181,7 +8318,7 @@ static void ggml_compute_forward_repeat_f16( return; } - GGML_TENSOR_UNARY_OP_LOCALS; + GGML_TENSOR_UNARY_OP_LOCALS // guaranteed to be an integer due to the check in ggml_can_repeat const int nr0 = (int)(ne0/ne00); @@ -8326,6 +8463,7 @@ static void ggml_compute_forward_concat_f32( GGML_ASSERT(src0->nb[0] == sizeof(float)); const int ith = params->ith; + const int nth = params->nth; GGML_TENSOR_BINARY_OP_LOCALS @@ -8335,7 +8473,7 @@ static void ggml_compute_forward_concat_f32( GGML_ASSERT(nb10 == sizeof(float)); for (int i3 = 0; i3 < ne3; i3++) { - for (int i2 = ith; i2 < ne2; i2++) { + for (int i2 = ith; i2 < ne2; i2 += nth) { if (i2 < ne02) { // src0 for (int i1 = 0; i1 < ne1; i1++) { for (int i0 = 0; i0 < ne0; i0++) { @@ -9370,10 +9508,13 @@ static bool ggml_compute_forward_mul_mat_use_blas( const int64_t ne0 = dst->ne[0]; const int64_t ne1 = dst->ne[1]; + // NOTE: with GGML_OP_MUL_MAT_ID we don't want to go through the BLAS branch because it will dequantize (to_float) + // all the experts for each batch element and the processing would become incredibly slow // TODO: find the optimal values for these - if (ggml_is_contiguous(src0) && + if (dst->op != GGML_OP_MUL_MAT_ID && + ggml_is_contiguous(src0) && ggml_is_contiguous(src1) && - src0->type == GGML_TYPE_F32 && + //src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_F32 && (ne0 >= 32 && ne1 >= 32 && ne10 >= 32)) { @@ -9385,11 +9526,16 @@ static bool ggml_compute_forward_mul_mat_use_blas( } #endif +// off1 = offset in i11 and i1 +// cne1 = ne11 and ne1 +// in a normal matrix multiplication, off1 = 0 and cne1 = ne1 +// during GGML_TASK_INIT, the full src1 is converted regardless of off1 and cne1 static void ggml_compute_forward_mul_mat( const struct ggml_compute_params * params, const struct ggml_tensor * src0, const struct ggml_tensor * src1, - struct ggml_tensor * dst) { + struct ggml_tensor * dst, + int64_t off1, int64_t cne1) { int64_t t0 = ggml_perf_time_us(); UNUSED(t0); @@ -9457,10 +9603,9 @@ static void ggml_compute_forward_mul_mat( const int64_t i03 
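The off1/cne1 pair described in the comment above turns mul_mat into a row-slice primitive: off1 picks the first src1/dst row and cne1 the row count. mul_mat_id (further down) leans on this to issue one single-row multiplication per token against that token's expert. A sketch of the decomposition, with stand-in types and an assumed slice function rather than the real static API:

    #include <stdint.h>

    struct mat;                                    // stand-in for ggml tensors
    void mul_mat_slice(const struct mat * src0, const struct mat * src1,
                       struct mat * dst, int64_t off1, int64_t cne1);  // assumed

    // e.g. ids = [0,1,0,2] over 4 token rows decomposes into 4 one-row products
    void mul_mat_id_sketch(const int32_t * row_ids, int64_t n_rows,
                           const struct mat * const * experts,
                           const struct mat * src1, struct mat * dst) {
        for (int64_t i01 = 0; i01 < n_rows; ++i01) {
            mul_mat_slice(experts[row_ids[i01]], src1, dst, /*off1=*/i01, /*cne1=*/1);
        }
    }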
= i13/r3; const int64_t i02 = i12/r2; - const void * x = (char *) src0->data + i02*nb02 + i03*nb03; - const float * y = (float *) ((char *) src1->data + i12*nb12 + i13*nb13); - - float * d = (float *) ((char *) dst->data + i12*nb2 + i13*nb3); + const void * x = (char *) src0->data + i02*nb02 + i03*nb03; + const float * y = (float *) ((char *) src1->data + off1*nb11 + i12*nb12 + i13*nb13); + float * d = (float *) ((char *) dst->data + off1*nb1 + i12*nb2 + i13*nb3); if (type != GGML_TYPE_F32) { float * const wdata = params->wdata; @@ -9477,10 +9622,10 @@ static void ggml_compute_forward_mul_mat( } cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans, - ne11, ne01, ne10, - 1.0f, y, ne10, - x, ne00, - 0.0f, d, ne01); + cne1, ne01, ne10, + 1.0f, y, ne10, + x, ne00, + 0.0f, d, ne01); } } @@ -9495,6 +9640,9 @@ static void ggml_compute_forward_mul_mat( char * wdata = params->wdata; const size_t row_size = ne10*ggml_type_size(vec_dot_type)/ggml_blck_size(vec_dot_type); + assert(params->wsize >= ne11*ne12*ne13*row_size); + assert(src1->type == GGML_TYPE_F32); + for (int64_t i13 = 0; i13 < ne13; ++i13) { for (int64_t i12 = 0; i12 < ne12; ++i12) { for (int64_t i11 = 0; i11 < ne11; ++i11) { @@ -9516,7 +9664,7 @@ static void ggml_compute_forward_mul_mat( const size_t row_size = ne10*ggml_type_size(vec_dot_type)/ggml_blck_size(vec_dot_type); const int64_t nr0 = ne01; // src0 rows - const int64_t nr1 = ne11*ne12*ne13; // src1 rows + const int64_t nr1 = cne1*ne12*ne13; // src1 rows //printf("nr0 = %lld, nr1 = %lld\n", nr0, nr1); @@ -9558,9 +9706,9 @@ static void ggml_compute_forward_mul_mat( for (int64_t iir1 = ir110; iir1 < ir111; iir1 += blck_1) { for (int64_t iir0 = ir010; iir0 < ir011; iir0 += blck_0) { for (int64_t ir1 = iir1; ir1 < iir1 + blck_1 && ir1 < ir111; ++ir1) { - const int64_t i13 = (ir1/(ne12*ne11)); - const int64_t i12 = (ir1 - i13*ne12*ne11)/ne11; - const int64_t i11 = (ir1 - i13*ne12*ne11 - i12*ne11); + const int64_t i13 = (ir1/(ne12*cne1)); + const int64_t i12 = (ir1 - i13*ne12*cne1)/cne1; + const int64_t i11 = (ir1 - i13*ne12*cne1 - i12*cne1) + off1; // broadcast src0 into src1 const int64_t i03 = i13/r3; @@ -9596,6 +9744,34 @@ static void ggml_compute_forward_mul_mat( } } +// ggml_compute_forward_mul_mat_id + +static void ggml_compute_forward_mul_mat_id( + const struct ggml_compute_params * params, + const struct ggml_tensor * src0, + const struct ggml_tensor * src1, + struct ggml_tensor * dst) { + + if (params->type == GGML_TASK_INIT || params->type == GGML_TASK_FINALIZE) { + // during GGML_TASK_INIT the entire src1 is converted to vec_dot_type + ggml_compute_forward_mul_mat(params, dst->src[2], src1, dst, 0, dst->ne[1]); + return; + } + + const struct ggml_tensor * ids = src0; + const int id = ggml_get_op_params_i32(dst, 0); + const int n_as = ggml_get_op_params_i32(dst, 1); + + for (int64_t i01 = 0; i01 < ids->ne[1]; i01++) { + const int32_t row_id = *(const int32_t *) ((const char *) ids->data + i01*ids->nb[1] + id*ids->nb[0]); + + GGML_ASSERT(row_id >= 0 && row_id < n_as); + + const struct ggml_tensor * src0_row = dst->src[row_id + 2]; + ggml_compute_forward_mul_mat(params, src0_row, src1, dst, i01, 1); + } +} + // ggml_compute_forward_out_prod static void ggml_compute_forward_out_prod_f32( @@ -9611,10 +9787,12 @@ static void ggml_compute_forward_out_prod_f32( const int ith = params->ith; const int nth = params->nth; + GGML_ASSERT(ne0 == ne00); + GGML_ASSERT(ne1 == ne10); + GGML_ASSERT(ne2 == ne02); GGML_ASSERT(ne02 == ne12); - GGML_ASSERT(ne03 == ne13); - GGML_ASSERT(ne2 == 
ne12); GGML_ASSERT(ne3 == ne13); + GGML_ASSERT(ne03 == ne13); // we don't support permuted src0 or src1 GGML_ASSERT(nb00 == sizeof(float)); @@ -9625,18 +9803,25 @@ static void ggml_compute_forward_out_prod_f32( // GGML_ASSERT(nb1 <= nb2); // GGML_ASSERT(nb2 <= nb3); - GGML_ASSERT(ne0 == ne00); - GGML_ASSERT(ne1 == ne10); - GGML_ASSERT(ne2 == ne02); - GGML_ASSERT(ne3 == ne03); - // nb01 >= nb00 - src0 is not transposed // compute by src0 rows // TODO: #if defined(GGML_USE_CUBLAS) ggml_cuda_out_prod - // TODO: #if defined(GGML_USE_ACCELERATE) || defined(GGML_USE_OPENBLAS) || defined(GGML_USE_CLBLAST) + // TODO: #if defined(GGML_USE_CLBLAST) + +#if defined(GGML_USE_ACCELERATE) || defined(GGML_USE_OPENBLAS) + bool use_blas = ggml_is_matrix(src0) && + ggml_is_matrix(src1) && + ggml_is_contiguous(src0) && + (ggml_is_contiguous(src1) || ggml_is_transposed(src1)); +#endif if (params->type == GGML_TASK_INIT) { +#if defined(GGML_USE_ACCELERATE) || defined(GGML_USE_OPENBLAS) // gemm beta will zero dst + if (use_blas) { + return; + } +#endif ggml_vec_set_f32(ne0*ne1*ne2*ne3, dst->data, 0); return; } @@ -9645,6 +9830,50 @@ static void ggml_compute_forward_out_prod_f32( return; } +#if defined(GGML_USE_ACCELERATE) || defined(GGML_USE_OPENBLAS) + if (use_blas) { + if (params->ith != 0) { // All threads other than the first do no work. + return; + } + // Arguments to ggml_compute_forward_out_prod (expressed as major,minor) + // src0: (k,n) + // src1: (k,m) + // dst: (m,n) + // + // Arguments to sgemm (see https://github.com/Reference-LAPACK/lapack/blob/master/BLAS/SRC/sgemm.f) + // Also expressed as (major,minor) + // a: (m,k): so src1 transposed + // b: (k,n): so src0 + // c: (m,n) + // + // However, if ggml_is_transposed(src1) is true, then + // src1->data already contains a transposed version, so sgemm mustn't + // transpose it further. 
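Restating the comment's (major,minor) mapping as a formula, with $m$, $n$, $k$ as defined there:

    $$\mathrm{dst}_{ij} \;=\; \sum_{l=0}^{k-1} \mathrm{src1}_{li}\,\mathrm{src0}_{lj}, \qquad 0 \le i < m,\; 0 \le j < n$$

so sgemm sees src1 as $A = \mathrm{src1}^{\mathsf T}$ (hence CblasTrans in the default case) and src0 as $B$ untransposed; when src1 is already stored transposed, the factor is passed through with CblasNoTrans and the leading dimension switches from $m$ to $k$, which is exactly the branch that follows.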
+ + int n = src0->ne[0]; + int k = src0->ne[1]; + int m = src1->ne[0]; + + int transposeA, lda; + + if (!ggml_is_transposed(src1)) { + transposeA = CblasTrans; + lda = m; + } else { + transposeA = CblasNoTrans; + lda = k; + } + + float * a = (float *) ((char *) src1->data); + float * b = (float *) ((char *) src0->data); + float * c = (float *) ((char *) dst->data); + + cblas_sgemm(CblasRowMajor, transposeA, CblasNoTrans, m, n, k, 1.0, a, lda, b, n, 0.0, c, n); + + return; + } +#endif + // dst[:,:,:,:] = 0 // for i2,i3: // for i1: @@ -9952,7 +10181,7 @@ static void ggml_compute_forward_set_f32( GGML_ASSERT(ggml_is_contiguous(dst) && ggml_is_contiguous(src0)); // view src0 and dst with these strides and data offset inbytes during set - // nb0 is implicitely element_size because src0 and dst are contiguous + // nb0 is implicitly element_size because src0 and dst are contiguous size_t nb1 = ((int32_t *) dst->op_params)[0]; size_t nb2 = ((int32_t *) dst->op_params)[1]; size_t nb3 = ((int32_t *) dst->op_params)[2]; @@ -10116,21 +10345,30 @@ static void ggml_compute_forward_get_rows_q( return; } - const int nc = src0->ne[0]; - const int nr = ggml_nelements(src1); + GGML_TENSOR_BINARY_OP_LOCALS + + const int64_t nc = ne00; + const int64_t nr = ggml_nelements(src1); GGML_UNUSED(nr); + const enum ggml_type type = src0->type; ggml_to_float_t const dequantize_row_q = type_traits[type].to_float; - assert( dst->ne[0] == nc); - assert( dst->ne[1] == nr); - assert(src0->nb[0] == ggml_type_size(type)); + assert(ne0 == nc); + assert(ne02 == ne11); + assert(nb00 == ggml_type_size(type)); + assert(ggml_nrows(dst) == nr); - for (int i = 0; i < nr; ++i) { - const int r = ((int32_t *) src1->data)[i]; + // TODO: multi-thread + for (int64_t i12 = 0; i12 < ne12; ++i12) { + for (int64_t i11 = 0; i11 < ne11; ++i11) { + for (int64_t i10 = 0; i10 < ne10; ++i10) { + const int64_t i01 = *(int32_t *) ((char *) src1->data + i10*nb10 + i11*nb11 + i12*nb12); - dequantize_row_q( - (const void *) ((char *) src0->data + r*src0->nb[1]), - (float *) ((char *) dst->data + i*dst->nb[1]), nc); + dequantize_row_q( + (const void *) ((char *) src0->data + i01*nb01 + i11*nb02 + i12*nb03), + (float *) ((char *) dst->data + i10*nb1 + i11*nb2 + i12*nb3), nc); + } + } } } @@ -10145,19 +10383,26 @@ static void ggml_compute_forward_get_rows_f16( return; } - const int nc = src0->ne[0]; - const int nr = ggml_nelements(src1); + GGML_TENSOR_BINARY_OP_LOCALS - assert( dst->ne[0] == nc); - assert( dst->ne[1] == nr); - assert(src0->nb[0] == sizeof(ggml_fp16_t)); + const int64_t nc = ne00; + const int64_t nr = ggml_nelements(src1); GGML_UNUSED(nr); - for (int i = 0; i < nr; ++i) { - const int r = ((int32_t *) src1->data)[i]; + assert(ne0 == nc); + assert(ne02 == ne11); + assert(nb00 == sizeof(ggml_fp16_t)); + assert(ggml_nrows(dst) == nr); - for (int j = 0; j < nc; ++j) { - ggml_fp16_t v = ((ggml_fp16_t *) ((char *) src0->data + r*src0->nb[1]))[j]; - ((float *) ((char *) dst->data + i*dst->nb[1]))[j] = GGML_FP16_TO_FP32(v); + // TODO: multi-thread + for (int64_t i12 = 0; i12 < ne12; ++i12) { + for (int64_t i11 = 0; i11 < ne11; ++i11) { + for (int64_t i10 = 0; i10 < ne10; ++i10) { + const int64_t i01 = *(int32_t *) ((char *) src1->data + i10*nb10 + i11*nb11 + i12*nb12); + + ggml_fp16_to_fp32_row( + (const void *) ((char *) src0->data + i01*nb01 + i11*nb02 + i12*nb03), + (float *) ((char *) dst->data + i10*nb1 + i11*nb2 + i12*nb3), nc); + } } } } @@ -10173,19 +10418,27 @@ static void ggml_compute_forward_get_rows_f32( return; } - const int nc = 
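The rewritten get_rows_* paths above generalize row gathering to batches: ids is now [ne10, ne11, ne12], and matrix i11 of src0 serves batch i11 (the new ne02 == ne11 assert). A scalar model of the f32 path under a fully contiguous layout (the layout assumption is mine):

    #include <stdint.h>
    #include <string.h>

    void get_rows_f32_sketch(const float * src0, const int32_t * ids, float * dst,
                             int64_t nc,   // row length (ne00)
                             int64_t ne01, int64_t ne10, int64_t ne11, int64_t ne12) {
        for (int64_t i12 = 0; i12 < ne12; ++i12)
        for (int64_t i11 = 0; i11 < ne11; ++i11)
        for (int64_t i10 = 0; i10 < ne10; ++i10) {
            const int64_t i01 = ids[i10 + i11*ne10 + i12*ne10*ne11];  // row to gather
            memcpy(dst  + (i10 + i11*ne10 + i12*ne10*ne11)*nc,        // dst row
                   src0 + (i01 + i11*ne01 + i12*ne01*ne11)*nc,        // gathered row
                   nc*sizeof(float));
        }
    }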
src0->ne[0]; - const int nr = ggml_nelements(src1); + GGML_TENSOR_BINARY_OP_LOCALS - assert( dst->ne[0] == nc); - assert( dst->ne[1] == nr); - assert(src0->nb[0] == sizeof(float)); + const int64_t nc = ne00; + const int64_t nr = ggml_nelements(src1); GGML_UNUSED(nr); - for (int i = 0; i < nr; ++i) { - const int r = ((int32_t *) src1->data)[i]; + assert(ne0 == nc); + assert(ne02 == ne11); + assert(nb00 == sizeof(float)); + assert(ggml_nrows(dst) == nr); - ggml_vec_cpy_f32(nc, - (float *) ((char *) dst->data + i*dst->nb[1]), - (float *) ((char *) src0->data + r*src0->nb[1])); + // TODO: multi-thread + for (int64_t i12 = 0; i12 < ne12; ++i12) { + for (int64_t i11 = 0; i11 < ne11; ++i11) { + for (int64_t i10 = 0; i10 < ne10; ++i10) { + const int64_t i01 = *(int32_t *) ((char *) src1->data + i10*nb10 + i11*nb11 + i12*nb12); + + ggml_vec_cpy_f32(nc, + (float *) ((char *) dst->data + i10*nb1 + i11*nb2 + i12*nb3), + (float *) ((char *) src0->data + i01*nb01 + i11*nb02 + i12*nb03)); + } + } } } @@ -10498,20 +10751,25 @@ static void ggml_compute_forward_diag_mask_zero( static void ggml_compute_forward_soft_max_f32( const struct ggml_compute_params * params, const struct ggml_tensor * src0, - struct ggml_tensor * dst) { - GGML_ASSERT(ggml_is_contiguous(src0)); - GGML_ASSERT(ggml_is_contiguous(dst)); - GGML_ASSERT(ggml_are_same_shape(src0, dst)); + const struct ggml_tensor * src1, + struct ggml_tensor * dst) { + assert(ggml_is_contiguous(dst)); + assert(ggml_are_same_shape(src0, dst)); if (params->type == GGML_TASK_INIT || params->type == GGML_TASK_FINALIZE) { return; } + float scale = 1.0f; + memcpy(&scale, (float *) dst->op_params + 0, sizeof(float)); + // TODO: handle transposed/permuted matrices const int ith = params->ith; const int nth = params->nth; + const int64_t ne11 = src1 ? src1->ne[1] : 1; + const int nc = src0->ne[0]; const int nr = ggml_nrows(src0); @@ -10522,29 +10780,40 @@ static void ggml_compute_forward_soft_max_f32( const int ir0 = dr*ith; const int ir1 = MIN(ir0 + dr, nr); + float * wp = (float *) params->wdata + (nc + CACHE_LINE_SIZE_F32) * ith; + for (int i1 = ir0; i1 < ir1; i1++) { - float *sp = (float *)((char *) src0->data + i1*src0->nb[1]); - float *dp = (float *)((char *) dst->data + i1*dst->nb[1]); + float * sp = (float *)((char *) src0->data + i1*src0->nb[1]); + float * dp = (float *)((char *) dst->data + i1*dst->nb[1]); + + // broadcast the mask across rows + float * mp = src1 ? (float *)((char *) src1->data + (i1%ne11)*src1->nb[1]) : NULL; + + ggml_vec_cpy_f32 (nc, wp, sp); + ggml_vec_scale_f32(nc, wp, scale); + if (mp) { + ggml_vec_acc_f32(nc, wp, mp); + } #ifndef NDEBUG for (int i = 0; i < nc; ++i) { //printf("p[%d] = %f\n", i, p[i]); - assert(!isnan(sp[i])); + assert(!isnan(wp[i])); } #endif float max = -INFINITY; - ggml_vec_max_f32(nc, &max, sp); + ggml_vec_max_f32(nc, &max, wp); ggml_float sum = 0.0; uint16_t scvt; for (int i = 0; i < nc; i++) { - if (sp[i] == -INFINITY) { + if (wp[i] == -INFINITY) { dp[i] = 0.0f; } else { - // const float val = (sp[i] == -INFINITY) ? 0.0 : exp(sp[i] - max); - ggml_fp16_t s = GGML_FP32_TO_FP16(sp[i] - max); + // const float val = (wp[i] == -INFINITY) ? 
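The soft_max kernel above now stages each row into a per-thread scratch (wp), applies the fused scale and mask there, and only then runs the usual max/exp/normalize pass. A standalone scalar reference of the same computation, minus the fp16 exp table:

    #include <math.h>
    #include <stddef.h>

    // dst = soft_max(x*scale + mask) over one row; mask may be NULL
    void soft_max_row(float * dst, const float * x, const float * mask,
                      float scale, size_t nc) {
        float max = -INFINITY;
        for (size_t i = 0; i < nc; ++i) {
            dst[i] = x[i]*scale + (mask ? mask[i] : 0.0f);   // dst doubles as scratch
            if (dst[i] > max) max = dst[i];
        }
        double sum = 0.0;
        for (size_t i = 0; i < nc; ++i) {
            const float v = (dst[i] == -INFINITY) ? 0.0f : expf(dst[i] - max);
            sum += v;
            dst[i] = v;
        }
        const float inv = 1.0f/(float) sum;
        for (size_t i = 0; i < nc; ++i) dst[i] *= inv;
    }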
0.0 : exp(wp[i] - max); + ggml_fp16_t s = GGML_FP32_TO_FP16(wp[i] - max); memcpy(&scvt, &s, sizeof(scvt)); const float val = GGML_FP16_TO_FP32(ggml_table_exp_f16[scvt]); sum += (ggml_float)val; @@ -10569,11 +10838,12 @@ static void ggml_compute_forward_soft_max_f32( static void ggml_compute_forward_soft_max( const struct ggml_compute_params * params, const struct ggml_tensor * src0, - struct ggml_tensor * dst) { + const struct ggml_tensor * src1, + struct ggml_tensor * dst) { switch (src0->type) { case GGML_TYPE_F32: { - ggml_compute_forward_soft_max_f32(params, src0, dst); + ggml_compute_forward_soft_max_f32(params, src0, src1, dst); } break; default: { @@ -11929,6 +12199,67 @@ static void ggml_compute_forward_upscale( } } +// ggml_compute_forward_argsort + +static void ggml_compute_forward_argsort_f32( + const struct ggml_compute_params * params, + const struct ggml_tensor * src0, + struct ggml_tensor * dst) { + + if (params->type == GGML_TASK_INIT || params->type == GGML_TASK_FINALIZE) { + return; + } + + GGML_TENSOR_UNARY_OP_LOCALS + + GGML_ASSERT(nb0 == sizeof(float)); + + const int ith = params->ith; + const int nth = params->nth; + + const int64_t nr = ggml_nrows(src0); + + enum ggml_sort_order order = (enum ggml_sort_order) ggml_get_op_params_i32(dst, 0); + + for (int64_t i = ith; i < nr; i += nth) { + int32_t * dst_data = (int32_t *)((char *) dst->data + i*nb1); + const float * src_data = (float *)((char *) src0->data + i*nb01); + + for (int64_t j = 0; j < ne0; j++) { + dst_data[j] = j; + } + + // C doesn't have a functional sort, so we do a bubble sort instead + for (int64_t j = 0; j < ne0; j++) { + for (int64_t k = j + 1; k < ne0; k++) { + if ((order == GGML_SORT_ASC && src_data[dst_data[j]] > src_data[dst_data[k]]) || + (order == GGML_SORT_DESC && src_data[dst_data[j]] < src_data[dst_data[k]])) { + int32_t tmp = dst_data[j]; + dst_data[j] = dst_data[k]; + dst_data[k] = tmp; + } + } + } + } +} + +static void ggml_compute_forward_argsort( + const struct ggml_compute_params * params, + const struct ggml_tensor * src0, + struct ggml_tensor * dst) { + + switch (src0->type) { + case GGML_TYPE_F32: + { + ggml_compute_forward_argsort_f32(params, src0, dst); + } break; + default: + { + GGML_ASSERT(false); + } break; + } +} + // ggml_compute_forward_flash_attn static void ggml_compute_forward_flash_attn_f32( @@ -13750,7 +14081,11 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm } break; case GGML_OP_MUL_MAT: { - ggml_compute_forward_mul_mat(params, tensor->src[0], tensor->src[1], tensor); + ggml_compute_forward_mul_mat(params, tensor->src[0], tensor->src[1], tensor, 0, tensor->ne[1]); + } break; + case GGML_OP_MUL_MAT_ID: + { + ggml_compute_forward_mul_mat_id(params, tensor->src[0], tensor->src[1], tensor); } break; case GGML_OP_OUT_PROD: { @@ -13810,7 +14145,7 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm } break; case GGML_OP_SOFT_MAX: { - ggml_compute_forward_soft_max(params, tensor->src[0], tensor); + ggml_compute_forward_soft_max(params, tensor->src[0], tensor->src[1], tensor); } break; case GGML_OP_SOFT_MAX_BACK: { @@ -13856,6 +14191,10 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm { ggml_compute_forward_upscale(params, tensor->src[0], tensor); } break; + case GGML_OP_ARGSORT: + { + ggml_compute_forward_argsort(params, tensor->src[0], tensor); + } break; case GGML_OP_FLASH_ATTN: { const int32_t t = ggml_get_op_params_i32(tensor, 0); @@ -14180,7 +14519,7 @@ void 
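On the argsort kernel above: it fills dst with the identity permutation and bubble-sorts the indices by the source values, which is O(ne0^2) but adequate for the short expert-score rows it exists for. A standalone row-level check of the GGML_SORT_DESC ordering:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        const float row[4] = {0.1f, 0.7f, 0.2f, 0.0f};
        int32_t idx[4] = {0, 1, 2, 3};
        for (int j = 0; j < 4; ++j)
        for (int k = j + 1; k < 4; ++k)
            if (row[idx[j]] < row[idx[k]]) {              // descending by value
                const int32_t t = idx[j]; idx[j] = idx[k]; idx[k] = t;
            }
        printf("%d %d %d %d\n", idx[0], idx[1], idx[2], idx[3]);  // 1 2 0 3
        return 0;
    }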
ggml_build_backward_gradient_checkpointing( // insert new tensors recomputing src, reusing already made replacements, // remember replacements: remember new tensors with mapping from corresponding gf nodes // recurse for input tensors, - // unless (i.e. terminating when) input tensors are replacments (like checkpoints) + // unless (i.e. terminating when) input tensors are replacements (like checkpoints) node->src[k] = ggml_recompute_graph_node(ctx, gf, replacements, node->src[k]); } // insert rewritten backward node with replacements made into resulting backward graph gb @@ -14506,6 +14845,10 @@ static void ggml_compute_backward(struct ggml_context * ctx, struct ggml_tensor zero_table); } } break; + case GGML_OP_MUL_MAT_ID: + { + GGML_ASSERT(false); // TODO: not implemented + } break; case GGML_OP_OUT_PROD: { GGML_ASSERT(false); // TODO: not implemented @@ -14844,6 +15187,10 @@ static void ggml_compute_backward(struct ggml_context * ctx, struct ggml_tensor { GGML_ASSERT(false); // TODO: not implemented } break; + case GGML_OP_ARGSORT: + { + GGML_ASSERT(false); // TODO: not implemented + } break; case GGML_OP_FLASH_ATTN: { struct ggml_tensor * flash_grad = NULL; @@ -15204,12 +15551,8 @@ struct ggml_cgraph * ggml_new_graph(struct ggml_context * ctx) { return ggml_new_graph_custom(ctx, GGML_DEFAULT_GRAPH_SIZE, false); } -struct ggml_cgraph * ggml_graph_view(struct ggml_context * ctx, struct ggml_cgraph * cgraph0, int i0, int i1) { - const size_t obj_size = sizeof(struct ggml_cgraph); - struct ggml_object * obj = ggml_new_object(ctx, GGML_OBJECT_GRAPH, obj_size); - struct ggml_cgraph * cgraph = (struct ggml_cgraph *) ((char *) ctx->mem_buffer + obj->offs); - - *cgraph = (struct ggml_cgraph) { +struct ggml_cgraph ggml_graph_view(struct ggml_cgraph * cgraph0, int i0, int i1) { + struct ggml_cgraph cgraph = { /*.size =*/ 0, /*.n_nodes =*/ i1 - i0, /*.n_leafs =*/ 0, @@ -15444,7 +15787,6 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) { n_tasks = n_threads; } break; case GGML_OP_SUB: - case GGML_OP_DIV: case GGML_OP_SQR: case GGML_OP_SQRT: case GGML_OP_LOG: @@ -15477,10 +15819,13 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) { { n_tasks = n_threads; } break; + default: + GGML_ASSERT(false); } break; case GGML_OP_SILU_BACK: case GGML_OP_MUL: + case GGML_OP_DIV: case GGML_OP_NORM: case GGML_OP_RMS_NORM: case GGML_OP_RMS_NORM_BACK: @@ -15518,6 +15863,11 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) { } #endif } break; + case GGML_OP_MUL_MAT_ID: + { + // FIXME: blas + n_tasks = n_threads; + } break; case GGML_OP_OUT_PROD: { n_tasks = n_threads; @@ -15537,7 +15887,6 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) { } break; case GGML_OP_DIAG_MASK_ZERO: case GGML_OP_DIAG_MASK_INF: - case GGML_OP_SOFT_MAX: case GGML_OP_SOFT_MAX_BACK: case GGML_OP_ROPE: case GGML_OP_ROPE_BACK: @@ -15553,6 +15902,10 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) { { n_tasks = 1; //TODO } break; + case GGML_OP_SOFT_MAX: + { + n_tasks = MIN(MIN(4, n_threads), ggml_nrows(node->src[0])); + } break; case GGML_OP_CONV_TRANSPOSE_1D: { n_tasks = n_threads; @@ -15574,6 +15927,10 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) { { n_tasks = n_threads; } break; + case GGML_OP_ARGSORT: + { + n_tasks = n_threads; + } break; case GGML_OP_FLASH_ATTN: { n_tasks = n_threads; @@ -15642,7 +15999,12 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) { } break; default: 
{ - printf("%s: op %s not implemented\n", __func__, ggml_op_name(node->op)); + fprintf(stderr, "%s: op not implemented: ", __func__); + if (node->op < GGML_OP_COUNT) { + fprintf(stderr, "%s\n", ggml_op_name(node->op)); + } else { + fprintf(stderr, "%d\n", node->op); + } GGML_ASSERT(false); } break; } @@ -15783,18 +16145,16 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { // thread scheduling for the different operations + work buffer size estimation for (int i = 0; i < cgraph->n_nodes; i++) { - int n_tasks = 1; - struct ggml_tensor * node = cgraph->nodes[i]; + const int n_tasks = ggml_get_n_tasks(node, n_threads); + size_t cur = 0; switch (node->op) { case GGML_OP_CPY: case GGML_OP_DUP: { - n_tasks = n_threads; - if (ggml_is_quantized(node->type)) { cur = ggml_type_size(GGML_TYPE_F32) * node->ne[0] * n_tasks; } @@ -15802,16 +16162,12 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { case GGML_OP_ADD: case GGML_OP_ADD1: { - n_tasks = n_threads; - if (ggml_is_quantized(node->src[0]->type)) { cur = ggml_type_size(GGML_TYPE_F32) * node->src[0]->ne[0] * n_tasks; } } break; case GGML_OP_ACC: { - n_tasks = n_threads; - if (ggml_is_quantized(node->src[0]->type)) { cur = ggml_type_size(GGML_TYPE_F32) * node->src[1]->ne[0] * n_tasks; } @@ -15837,14 +16193,33 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { cur = ggml_type_size(vec_dot_type)*ggml_nelements(node->src[1])/ggml_blck_size(vec_dot_type); } } break; + case GGML_OP_MUL_MAT_ID: + { + const struct ggml_tensor * a = node->src[2]; + const struct ggml_tensor * b = node->src[1]; + const enum ggml_type vec_dot_type = type_traits[a->type].vec_dot_type; +#if defined(GGML_USE_ACCELERATE) || defined(GGML_USE_OPENBLAS) + if (ggml_compute_forward_mul_mat_use_blas(a, b, node)) { + if (a->type != GGML_TYPE_F32) { + // here we need memory just for single 2D matrix from src0 + cur = ggml_type_size(GGML_TYPE_F32)*(a->ne[0]*a->ne[1]); + } + } else +#endif + if (b->type != vec_dot_type) { + cur = ggml_type_size(vec_dot_type)*ggml_nelements(b)/ggml_blck_size(vec_dot_type); + } + } break; case GGML_OP_OUT_PROD: { - n_tasks = n_threads; - if (ggml_is_quantized(node->src[0]->type)) { cur = ggml_type_size(GGML_TYPE_F32) * node->src[0]->ne[0] * n_tasks; } } break; + case GGML_OP_SOFT_MAX: + { + cur = ggml_type_size(GGML_TYPE_F32) * node->ne[0] * n_tasks; + } break; case GGML_OP_CONV_TRANSPOSE_1D: { GGML_ASSERT(node->src[0]->ne[3] == 1); @@ -15870,10 +16245,6 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { GGML_ASSERT(false); } } break; - case GGML_OP_IM2COL: - { - n_tasks = n_threads; - } break; case GGML_OP_CONV_TRANSPOSE_2D: { const int64_t ne00 = node->src[0]->ne[0]; // W @@ -15890,8 +16261,6 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { } break; case GGML_OP_FLASH_ATTN: { - n_tasks = n_threads; - const int64_t ne11 = ggml_up(node->src[1]->ne[1], GGML_SOFT_MAX_UNROLL); if (node->src[1]->type == GGML_TYPE_F32) { @@ -15904,8 +16273,6 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { } break; case GGML_OP_FLASH_FF: { - n_tasks = n_threads; - if (node->src[1]->type == GGML_TYPE_F32) { cur = sizeof(float)*node->src[1]->ne[1]*n_tasks; // TODO: this can become (n_tasks-1) cur += sizeof(float)*node->src[1]->ne[1]*n_tasks; // this is overestimated by x2 @@ -15916,8 +16283,6 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { } break; case 
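Two of the work-size entries above are worth a number: for MUL_MAT_ID the whole of b is converted to the vec-dot type once, and for SOFT_MAX each thread gets one f32 scratch row. A back-of-envelope check of the first, assuming a q8_0 vec-dot type (34-byte blocks of 32 values) and an illustrative batch shape:

    #include <stdio.h>

    int main(void) {
        const long ne   = 4096L*4;   // elements in b: n_embd x n_tokens (assumed)
        const long blck = 32;        // q8_0 block size
        const long ts   = 34;        // bytes per q8_0 block (2-byte d + 32 quants)
        printf("wsize = %ld bytes\n", ne/blck*ts);   // 17408
        return 0;
    }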
GGML_OP_FLASH_ATTN_BACK: { - n_tasks = n_threads; - const int64_t D = node->src[0]->ne[0]; const int64_t ne11 = ggml_up(node->src[1]->ne[1], GGML_SOFT_MAX_UNROLL); const int64_t mxDn = MAX(D, ne11) * 2; // *2 because of S and SM in ggml_compute_forward_flash_attn_back @@ -15932,8 +16297,6 @@ struct ggml_cplan ggml_graph_plan(struct ggml_cgraph * cgraph, int n_threads) { case GGML_OP_CROSS_ENTROPY_LOSS: { - n_tasks = n_threads; - cur = ggml_type_size(node->type)*(n_tasks + node->src[0]->ne[0]*n_tasks); } break; case GGML_OP_COUNT: @@ -17720,8 +18083,8 @@ size_t ggml_quantize_q5_0(const float * src, void * dst, int n, int k, int64_t * memcpy(&qh, &y[i].qh, sizeof(qh)); for (int j = 0; j < QK5_0; j += 2) { - const uint8_t vh0 = ((qh & (1u << (j + 0 ))) >> (j + 0 )) << 4; - const uint8_t vh1 = ((qh & (1u << (j + 16))) >> (j + 12)); + const uint8_t vh0 = ((qh & (1u << (j/2 + 0 ))) >> (j/2 + 0 )) << 4; + const uint8_t vh1 = ((qh & (1u << (j/2 + 16))) >> (j/2 + 12)); // cast to 16 bins const uint8_t vi0 = ((y[i].qs[j/2] & 0x0F) | vh0) / 2; @@ -17750,8 +18113,8 @@ size_t ggml_quantize_q5_1(const float * src, void * dst, int n, int k, int64_t * memcpy(&qh, &y[i].qh, sizeof(qh)); for (int j = 0; j < QK5_1; j += 2) { - const uint8_t vh0 = ((qh & (1u << (j + 0 ))) >> (j + 0 )) << 4; - const uint8_t vh1 = ((qh & (1u << (j + 16))) >> (j + 12)); + const uint8_t vh0 = ((qh & (1u << (j/2 + 0 ))) >> (j/2 + 0 )) << 4; + const uint8_t vh1 = ((qh & (1u << (j/2 + 16))) >> (j/2 + 12)); // cast to 16 bins const uint8_t vi0 = ((y[i].qs[j/2] & 0x0F) | vh0) / 2; @@ -17941,6 +18304,7 @@ struct gguf_kv { struct gguf_header { char magic[4]; + uint32_t version; uint64_t n_tensors; // GGUFv2 uint64_t n_kv; // GGUFv2 @@ -18030,7 +18394,7 @@ struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_p for (uint32_t i = 0; i < sizeof(magic); i++) { if (magic[i] != GGUF_MAGIC[i]) { - fprintf(stderr, "%s: invalid magic characters %s.\n", __func__, magic); + fprintf(stderr, "%s: invalid magic characters '%c%c%c%c'\n", __func__, magic[0], magic[1], magic[2], magic[3]); fclose(file); return NULL; } @@ -18045,7 +18409,6 @@ struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_p { strncpy(ctx->header.magic, magic, 4); - ctx->kv = NULL; ctx->infos = NULL; ctx->data = NULL; @@ -18073,7 +18436,7 @@ struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_p { ctx->kv = malloc(ctx->header.n_kv * sizeof(struct gguf_kv)); - for (uint32_t i = 0; i < ctx->header.n_kv; ++i) { + for (uint64_t i = 0; i < ctx->header.n_kv; ++i) { struct gguf_kv * kv = &ctx->kv[i]; //fprintf(stderr, "%s: reading kv %d\n", __func__, i); @@ -18120,7 +18483,7 @@ struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_p case GGUF_TYPE_STRING: { kv->value.arr.data = malloc(kv->value.arr.n * sizeof(struct gguf_str)); - for (uint32_t j = 0; j < kv->value.arr.n; ++j) { + for (uint64_t j = 0; j < kv->value.arr.n; ++j) { ok = ok && gguf_fread_str(file, &((struct gguf_str *) kv->value.arr.data)[j], &offset); } } break; @@ -18148,7 +18511,7 @@ struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_p { ctx->infos = malloc(ctx->header.n_tensors * sizeof(struct gguf_tensor_info)); - for (uint32_t i = 0; i < ctx->header.n_tensors; ++i) { + for (uint64_t i = 0; i < ctx->header.n_tensors; ++i) { struct gguf_tensor_info * info = &ctx->infos[i]; for (int j = 0; j < GGML_MAX_DIMS; ++j) { @@ -18195,7 +18558,7 @@ struct gguf_context * gguf_init_from_file(const char * 
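The q5_0/q5_1 hunks above fix the fifth-bit index in the requantization loop: j steps by 2 over the 32 block values (one byte, two nibbles, per step), so the bit position in qh must be j/2 — bits 0..15 for the low nibbles, 16..31 for the high ones — matching how dequantization unpacks qh. A standalone extraction check:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        const uint32_t qh = 0x00030001u;   // bits 0, 16, 17 set
        for (int j = 0; j < 8; j += 2) {
            const uint8_t vh0 = ((qh & (1u << (j/2 + 0 ))) >> (j/2 + 0 )) << 4;
            const uint8_t vh1 = ((qh & (1u << (j/2 + 16))) >> (j/2 + 12));
            printf("j=%d vh0=%u vh1=%u\n", j, vh0, vh1);  // 16/16, 0/16, 0/0, 0/0
        }
        return 0;
    }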
fname, struct gguf_init_p // compute the total size of the data section, taking into account the alignment { ctx->size = 0; - for (uint32_t i = 0; i < ctx->header.n_tensors; ++i) { + for (uint64_t i = 0; i < ctx->header.n_tensors; ++i) { struct gguf_tensor_info * info = &ctx->infos[i]; const int64_t ne = @@ -18264,7 +18627,7 @@ struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_p ggml_set_no_alloc(ctx_data, true); // create the tensors - for (uint32_t i = 0; i < ctx->header.n_tensors; ++i) { + for (uint64_t i = 0; i < ctx->header.n_tensors; ++i) { const int64_t ne[GGML_MAX_DIMS] = { ctx->infos[i].ne[0], ctx->infos[i].ne[1], @@ -18399,24 +18762,29 @@ int gguf_find_key(const struct gguf_context * ctx, const char * key) { } const char * gguf_get_key(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); return ctx->kv[key_id].key.data; } enum gguf_type gguf_get_kv_type(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); return ctx->kv[key_id].type; } enum gguf_type gguf_get_arr_type(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_ARRAY); return ctx->kv[key_id].value.arr.type; } const void * gguf_get_arr_data(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_ARRAY); return ctx->kv[key_id].value.arr.data; } const char * gguf_get_arr_str(const struct gguf_context * ctx, int key_id, int i) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_ARRAY); struct gguf_kv * kv = &ctx->kv[key_id]; struct gguf_str * str = &((struct gguf_str *) kv->value.arr.data)[i]; @@ -18424,70 +18792,90 @@ const char * gguf_get_arr_str(const struct gguf_context * ctx, int key_id, int i } int gguf_get_arr_n(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_ARRAY); return ctx->kv[key_id].value.arr.n; } uint8_t gguf_get_val_u8(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_UINT8); return ctx->kv[key_id].value.uint8; } int8_t gguf_get_val_i8(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_INT8); return ctx->kv[key_id].value.int8; } uint16_t gguf_get_val_u16(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_UINT16); return ctx->kv[key_id].value.uint16; } int16_t gguf_get_val_i16(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_INT16); return ctx->kv[key_id].value.int16; } uint32_t gguf_get_val_u32(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_UINT32); return ctx->kv[key_id].value.uint32; } int32_t gguf_get_val_i32(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_INT32); return ctx->kv[key_id].value.int32; } float 
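Every typed getter now asserts that key_id is in range, so the safe pattern is to validate the result of gguf_find_key() before touching a typed accessor. For example, reading the optional MoE metadata whose key names the gguf-py changes below introduce (arch assumed to be "llama", so the key expands to "llama.expert_count"):

    #include "ggml.h"   // the gguf API is declared here in this tree
    #include <stdint.h>

    uint32_t expert_count(const struct gguf_context * ctx) {
        const int key_id = gguf_find_key(ctx, "llama.expert_count");
        if (key_id < 0) {
            return 1;    // key absent: a dense, single-expert model
        }
        return gguf_get_val_u32(ctx, key_id);   // would now assert on a bad id
    }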
gguf_get_val_f32(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_FLOAT32); return ctx->kv[key_id].value.float32; } uint64_t gguf_get_val_u64(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_UINT64); return ctx->kv[key_id].value.uint64; } int64_t gguf_get_val_i64(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_INT64); return ctx->kv[key_id].value.int64; } double gguf_get_val_f64(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_FLOAT64); return ctx->kv[key_id].value.float64; } bool gguf_get_val_bool(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_BOOL); return ctx->kv[key_id].value.bool_; } const char * gguf_get_val_str(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); GGML_ASSERT(ctx->kv[key_id].type == GGUF_TYPE_STRING); return ctx->kv[key_id].value.str.data; } +const void * gguf_get_val_data(const struct gguf_context * ctx, int key_id) { + GGML_ASSERT(key_id >= 0 && key_id < gguf_get_n_kv(ctx)); + GGML_ASSERT(ctx->kv[key_id].type != GGUF_TYPE_ARRAY); + GGML_ASSERT(ctx->kv[key_id].type != GGUF_TYPE_STRING); + return &ctx->kv[key_id].value; +} + int gguf_get_n_tensors(const struct gguf_context * ctx) { return ctx->header.n_tensors; } diff --git a/ggml.h b/ggml.h index 8e6b646066b7a..32f256481615c 100644 --- a/ggml.h +++ b/ggml.h @@ -215,9 +215,9 @@ #define GGML_QNT_VERSION_FACTOR 1000 // do not change this #define GGML_MAX_DIMS 4 -#define GGML_MAX_PARAMS 1024 +#define GGML_MAX_PARAMS 2048 #define GGML_MAX_CONTEXTS 64 -#define GGML_MAX_SRC 6 +#define GGML_MAX_SRC 10 #define GGML_MAX_NAME 64 #define GGML_MAX_OP_PARAMS 64 #define GGML_DEFAULT_N_THREADS 4 @@ -244,11 +244,10 @@ #define GGML_ASSERT(x) \ do { \ if (!(x)) { \ - fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \ - fflush(stderr); \ fflush(stdout); \ + fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \ ggml_print_backtrace(); \ - exit(1); \ + abort(); \ } \ } while (0) @@ -284,6 +283,20 @@ const type prefix##3 = (pointer)->array[3]; \ GGML_UNUSED(prefix##3); +#define GGML_TENSOR_UNARY_OP_LOCALS \ + GGML_TENSOR_LOCALS(int64_t, ne0, src0, ne) \ + GGML_TENSOR_LOCALS(size_t, nb0, src0, nb) \ + GGML_TENSOR_LOCALS(int64_t, ne, dst, ne) \ + GGML_TENSOR_LOCALS(size_t, nb, dst, nb) + +#define GGML_TENSOR_BINARY_OP_LOCALS \ + GGML_TENSOR_LOCALS(int64_t, ne0, src0, ne) \ + GGML_TENSOR_LOCALS(size_t, nb0, src0, nb) \ + GGML_TENSOR_LOCALS(int64_t, ne1, src1, ne) \ + GGML_TENSOR_LOCALS(size_t, nb1, src1, nb) \ + GGML_TENSOR_LOCALS(int64_t, ne, dst, ne) \ + GGML_TENSOR_LOCALS(size_t, nb, dst, nb) + #ifdef __cplusplus extern "C" { #endif @@ -382,6 +395,7 @@ extern "C" { GGML_OP_GROUP_NORM, GGML_OP_MUL_MAT, + GGML_OP_MUL_MAT_ID, GGML_OP_OUT_PROD, GGML_OP_SCALE, @@ -408,8 +422,8 @@ extern "C" { GGML_OP_CONV_TRANSPOSE_2D, GGML_OP_POOL_1D, GGML_OP_POOL_2D, - GGML_OP_UPSCALE, // nearest interpolate + GGML_OP_ARGSORT, GGML_OP_FLASH_ATTN, GGML_OP_FLASH_FF, @@ -449,7 +463,9 @@ extern "C" { GGML_UNARY_OP_GELU, GGML_UNARY_OP_GELU_QUICK, GGML_UNARY_OP_SILU, 
- GGML_UNARY_OP_LEAKY + GGML_UNARY_OP_LEAKY, + + GGML_UNARY_OP_COUNT, }; enum ggml_object_type { @@ -632,6 +648,9 @@ extern "C" { GGML_API const char * ggml_op_name (enum ggml_op op); GGML_API const char * ggml_op_symbol(enum ggml_op op); + GGML_API const char * ggml_unary_op_name(enum ggml_unary_op op); + GGML_API const char * ggml_op_desc(const struct ggml_tensor * t); // unary or op name + GGML_API size_t ggml_element_size(const struct ggml_tensor * tensor); GGML_API bool ggml_is_quantized(enum ggml_type type); @@ -1028,6 +1047,16 @@ extern "C" { struct ggml_tensor * a, struct ggml_tensor * b); + // indirect matrix multiplication + // ggml_mul_mat_id(ctx, as, ids, id, b) ~= ggml_mul_mat(as[ids[id]], b) + GGML_API struct ggml_tensor * ggml_mul_mat_id( + struct ggml_context * ctx, + struct ggml_tensor * const as[], + int n_as, + struct ggml_tensor * ids, + int id, + struct ggml_tensor * b); + // A: m columns, n rows, // B: p columns, n rows, // result is m columns, p rows @@ -1235,6 +1264,7 @@ extern "C" { struct ggml_context * ctx, struct ggml_tensor * a); + // supports 3D: a->ne[2] == b->ne[1] GGML_API struct ggml_tensor * ggml_get_rows( struct ggml_context * ctx, struct ggml_tensor * a, @@ -1283,6 +1313,14 @@ extern "C" { struct ggml_context * ctx, struct ggml_tensor * a); + // fused soft_max(a*scale + mask) + // mask is optional + GGML_API struct ggml_tensor * ggml_soft_max_ext( + struct ggml_context * ctx, + struct ggml_tensor * a, + struct ggml_tensor * mask, + float scale); + GGML_API struct ggml_tensor * ggml_soft_max_back( struct ggml_context * ctx, struct ggml_tensor * a, @@ -1513,6 +1551,23 @@ extern "C" { struct ggml_tensor * a, int scale_factor); + // sort rows + enum ggml_sort_order { + GGML_SORT_ASC, + GGML_SORT_DESC, + }; + + GGML_API struct ggml_tensor * ggml_argsort( + struct ggml_context * ctx, + struct ggml_tensor * a, + enum ggml_sort_order order); + + // top k elements per row + GGML_API struct ggml_tensor * ggml_top_k( + struct ggml_context * ctx, + struct ggml_tensor * a, + int k); + GGML_API struct ggml_tensor * ggml_flash_attn( struct ggml_context * ctx, struct ggml_tensor * q, @@ -1574,7 +1629,6 @@ extern "C" { int kh); // used in sam - GGML_API struct ggml_tensor * ggml_add_rel_pos( struct ggml_context * ctx, struct ggml_tensor * a, @@ -1749,7 +1803,7 @@ extern "C" { GGML_API struct ggml_cgraph * ggml_new_graph (struct ggml_context * ctx); // size = GGML_DEFAULT_GRAPH_SIZE, grads = false GGML_API struct ggml_cgraph * ggml_new_graph_custom (struct ggml_context * ctx, size_t size, bool grads); GGML_API struct ggml_cgraph * ggml_graph_dup (struct ggml_context * ctx, struct ggml_cgraph * cgraph); - GGML_API struct ggml_cgraph * ggml_graph_view (struct ggml_context * ctx, struct ggml_cgraph * cgraph, int i0, int i1); + GGML_API struct ggml_cgraph ggml_graph_view (struct ggml_cgraph * cgraph, int i0, int i1); GGML_API void ggml_graph_cpy (struct ggml_cgraph * src, struct ggml_cgraph * dst); GGML_API void ggml_graph_reset (struct ggml_cgraph * cgraph); // zero grads GGML_API void ggml_graph_clear (struct ggml_cgraph * cgraph); @@ -2045,6 +2099,7 @@ extern "C" { GGML_API double gguf_get_val_f64 (const struct gguf_context * ctx, int key_id); GGML_API bool gguf_get_val_bool(const struct gguf_context * ctx, int key_id); GGML_API const char * gguf_get_val_str (const struct gguf_context * ctx, int key_id); + GGML_API const void * gguf_get_val_data(const struct gguf_context * ctx, int key_id); GGML_API int gguf_get_arr_n (const struct gguf_context * ctx, int key_id); 
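Note the ggml_graph_view() signature change above: the view is now returned by value instead of being allocated out of a ggml_context, so slicing a graph costs no allocation. Callers hold the struct directly (a sketch of the new calling convention):

    #include "ggml.h"

    void split_graph(struct ggml_cgraph * gf) {
        const int mid = gf->n_nodes/2;
        struct ggml_cgraph g0 = ggml_graph_view(gf, 0, mid);
        struct ggml_cgraph g1 = ggml_graph_view(gf, mid, gf->n_nodes);
        (void) g0; (void) g1;   // each half can be planned and computed on its own
    }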
GGML_API const void * gguf_get_arr_data(const struct gguf_context * ctx, int key_id); GGML_API const char * gguf_get_arr_str (const struct gguf_context * ctx, int key_id, int i); diff --git a/gguf-py/README.md b/gguf-py/README.md index 502b6a510cc70..a27d2fc0e1021 100644 --- a/gguf-py/README.md +++ b/gguf-py/README.md @@ -61,7 +61,7 @@ If you want to publish the package manually for any reason, you need to have `tw pip install build twine ``` -Then, folow these steps to release a new version: +Then, follow these steps to release a new version: 1. Bump the version in `pyproject.toml`. 2. Build the package: diff --git a/gguf-py/gguf/constants.py b/gguf-py/gguf/constants.py index 7f63361bd32bc..12133882be2c4 100644 --- a/gguf-py/gguf/constants.py +++ b/gguf-py/gguf/constants.py @@ -38,6 +38,8 @@ class LLM: FEED_FORWARD_LENGTH = "{arch}.feed_forward_length" USE_PARALLEL_RESIDUAL = "{arch}.use_parallel_residual" TENSOR_DATA_LAYOUT = "{arch}.tensor_data_layout" + EXPERT_COUNT = "{arch}.expert_count" + EXPERT_USED_COUNT = "{arch}.expert_used_count" class Attention: HEAD_COUNT = "{arch}.attention.head_count" @@ -56,20 +58,21 @@ class Rope: SCALING_FINETUNED = "{arch}.rope.scaling.finetuned" class Tokenizer: - MODEL = "tokenizer.ggml.model" - LIST = "tokenizer.ggml.tokens" - TOKEN_TYPE = "tokenizer.ggml.token_type" - SCORES = "tokenizer.ggml.scores" - MERGES = "tokenizer.ggml.merges" - BOS_ID = "tokenizer.ggml.bos_token_id" - EOS_ID = "tokenizer.ggml.eos_token_id" - UNK_ID = "tokenizer.ggml.unknown_token_id" - SEP_ID = "tokenizer.ggml.seperator_token_id" - PAD_ID = "tokenizer.ggml.padding_token_id" - ADD_BOS = "tokenizer.ggml.add_bos_token" - ADD_EOS = "tokenizer.ggml.add_eos_token" - HF_JSON = "tokenizer.huggingface.json" - RWKV = "tokenizer.rwkv.world" + MODEL = "tokenizer.ggml.model" + LIST = "tokenizer.ggml.tokens" + TOKEN_TYPE = "tokenizer.ggml.token_type" + SCORES = "tokenizer.ggml.scores" + MERGES = "tokenizer.ggml.merges" + BOS_ID = "tokenizer.ggml.bos_token_id" + EOS_ID = "tokenizer.ggml.eos_token_id" + UNK_ID = "tokenizer.ggml.unknown_token_id" + SEP_ID = "tokenizer.ggml.seperator_token_id" + PAD_ID = "tokenizer.ggml.padding_token_id" + ADD_BOS = "tokenizer.ggml.add_bos_token" + ADD_EOS = "tokenizer.ggml.add_eos_token" + HF_JSON = "tokenizer.huggingface.json" + RWKV = "tokenizer.rwkv.world" + CHAT_TEMPLATE = "tokenizer.chat_template" # @@ -91,6 +94,7 @@ class MODEL_ARCH(IntEnum): BERT = auto() BLOOM = auto() STABLELM = auto() + QWEN = auto() class MODEL_TENSOR(IntEnum): @@ -109,10 +113,14 @@ class MODEL_TENSOR(IntEnum): ATTN_NORM = auto() ATTN_NORM_2 = auto() ATTN_ROT_EMBD = auto() + FFN_GATE_INP = auto() + FFN_NORM = auto() FFN_GATE = auto() FFN_DOWN = auto() FFN_UP = auto() - FFN_NORM = auto() + FFN_GATE_EXP = auto() + FFN_DOWN_EXP = auto() + FFN_UP_EXP = auto() ATTN_Q_NORM = auto() ATTN_K_NORM = auto() @@ -131,6 +139,7 @@ class MODEL_TENSOR(IntEnum): MODEL_ARCH.BERT: "bert", MODEL_ARCH.BLOOM: "bloom", MODEL_ARCH.STABLELM: "stablelm", + MODEL_ARCH.QWEN: "qwen", } TENSOR_NAMES: dict[MODEL_TENSOR, str] = { @@ -151,10 +160,14 @@ class MODEL_TENSOR(IntEnum): MODEL_TENSOR.ATTN_ROT_EMBD: "blk.{bid}.attn_rot_embd", MODEL_TENSOR.ATTN_Q_NORM: "blk.{bid}.attn_q_norm", MODEL_TENSOR.ATTN_K_NORM: "blk.{bid}.attn_k_norm", + MODEL_TENSOR.FFN_GATE_INP: "blk.{bid}.ffn_gate_inp", MODEL_TENSOR.FFN_NORM: "blk.{bid}.ffn_norm", MODEL_TENSOR.FFN_GATE: "blk.{bid}.ffn_gate", MODEL_TENSOR.FFN_DOWN: "blk.{bid}.ffn_down", MODEL_TENSOR.FFN_UP: "blk.{bid}.ffn_up", + MODEL_TENSOR.FFN_GATE_EXP: 
"blk.{bid}.ffn_gate.{xid}", + MODEL_TENSOR.FFN_DOWN_EXP: "blk.{bid}.ffn_down.{xid}", + MODEL_TENSOR.FFN_UP_EXP: "blk.{bid}.ffn_up.{xid}", } MODEL_TENSORS: dict[MODEL_ARCH, list[MODEL_TENSOR]] = { @@ -169,10 +182,14 @@ class MODEL_TENSOR(IntEnum): MODEL_TENSOR.ATTN_V, MODEL_TENSOR.ATTN_OUT, MODEL_TENSOR.ATTN_ROT_EMBD, + MODEL_TENSOR.FFN_GATE_INP, MODEL_TENSOR.FFN_NORM, MODEL_TENSOR.FFN_GATE, MODEL_TENSOR.FFN_DOWN, MODEL_TENSOR.FFN_UP, + MODEL_TENSOR.FFN_GATE_EXP, + MODEL_TENSOR.FFN_DOWN_EXP, + MODEL_TENSOR.FFN_UP_EXP, ], MODEL_ARCH.GPTNEOX: [ MODEL_TENSOR.TOKEN_EMBD, @@ -316,6 +333,20 @@ class MODEL_TENSOR(IntEnum): MODEL_TENSOR.FFN_DOWN, MODEL_TENSOR.FFN_UP, ], + MODEL_ARCH.QWEN: [ + MODEL_TENSOR.TOKEN_EMBD, + MODEL_TENSOR.OUTPUT_NORM, + MODEL_TENSOR.OUTPUT, + MODEL_TENSOR.ROPE_FREQS, + MODEL_TENSOR.ATTN_NORM, + MODEL_TENSOR.ATTN_QKV, + MODEL_TENSOR.ATTN_OUT, + MODEL_TENSOR.ATTN_ROT_EMBD, + MODEL_TENSOR.FFN_NORM, + MODEL_TENSOR.FFN_GATE, + MODEL_TENSOR.FFN_DOWN, + MODEL_TENSOR.FFN_UP, + ], MODEL_ARCH.GPT2: [ # TODO ], @@ -335,6 +366,10 @@ class MODEL_TENSOR(IntEnum): MODEL_ARCH.PERSIMMON: [ MODEL_TENSOR.ROPE_FREQS, ], + MODEL_ARCH.QWEN: [ + MODEL_TENSOR.ROPE_FREQS, + MODEL_TENSOR.ATTN_ROT_EMBD, + ], } # diff --git a/gguf-py/gguf/gguf_writer.py b/gguf-py/gguf/gguf_writer.py index c3b8c588f17cd..73e02160750b2 100644 --- a/gguf-py/gguf/gguf_writer.py +++ b/gguf-py/gguf/gguf_writer.py @@ -221,7 +221,7 @@ def add_tensor( if self.endianess == GGUFEndian.BIG: tensor.byteswap(inplace=True) if self.use_temp_file and self.temp_file is None: - fp = tempfile.SpooledTemporaryFile(mode="w+b", max_size=256*1024*1024) + fp = tempfile.SpooledTemporaryFile(mode="w+b", max_size=256 * 1024 * 1024) fp.seek(0) self.temp_file = fp @@ -339,6 +339,12 @@ def add_max_alibi_bias(self, bias: float) -> None: def add_clamp_kqv(self, value: float) -> None: self.add_float32(Keys.Attention.CLAMP_KQV.format(arch=self.arch), value) + def add_expert_count(self, count: int) -> None: + self.add_uint32(Keys.LLM.EXPERT_COUNT.format(arch=self.arch), count) + + def add_expert_used_count(self, count: int) -> None: + self.add_uint32(Keys.LLM.EXPERT_USED_COUNT.format(arch=self.arch), count) + def add_layer_norm_eps(self, value: float) -> None: self.add_float32(Keys.Attention.LAYERNORM_EPS.format(arch=self.arch), value) @@ -399,6 +405,9 @@ def add_add_bos_token(self, value: bool) -> None: def add_add_eos_token(self, value: bool) -> None: self.add_bool(Keys.Tokenizer.ADD_EOS, value) + def add_chat_template(self, value: str) -> None: + self.add_string(Keys.Tokenizer.CHAT_TEMPLATE, value) + def _pack(self, fmt: str, value: Any, skip_pack_prefix: bool = False) -> bytes: pack_prefix = '' if not skip_pack_prefix: diff --git a/gguf-py/gguf/tensor_mapping.py b/gguf-py/gguf/tensor_mapping.py index 22ad8b8fc558d..0115ea1c605b1 100644 --- a/gguf-py/gguf/tensor_mapping.py +++ b/gguf-py/gguf/tensor_mapping.py @@ -10,7 +10,7 @@ class TensorNameMap: # Token embeddings MODEL_TENSOR.TOKEN_EMBD: ( "gpt_neox.embed_in", # gptneox - "transformer.wte", # gpt2 gpt-j mpt refact + "transformer.wte", # gpt2 gpt-j mpt refact qwen "transformer.word_embeddings", # falcon "word_embeddings", # bloom "model.embed_tokens", # llama-hf @@ -38,7 +38,7 @@ class TensorNameMap: # Output MODEL_TENSOR.OUTPUT: ( "embed_out", # gptneox - "lm_head", # gpt2 mpt falcon llama-hf baichuan + "lm_head", # gpt2 mpt falcon llama-hf baichuan qwen "output", # llama-pth bloom "word_embeddings_for_head", # persimmon ), @@ -51,7 +51,7 @@ class TensorNameMap: "norm", # llama-pth 
"embeddings.LayerNorm", # bert "transformer.norm_f", # mpt - "ln_f", # refact bloom + "ln_f", # refact bloom qwen "language_model.encoder.final_layernorm", # persimmon ), @@ -65,7 +65,7 @@ class TensorNameMap: # Attention norm MODEL_TENSOR.ATTN_NORM: ( "gpt_neox.layers.{bid}.input_layernorm", # gptneox - "transformer.h.{bid}.ln_1", # gpt2 gpt-j refact + "transformer.h.{bid}.ln_1", # gpt2 gpt-j refact qwen "transformer.blocks.{bid}.norm_1", # mpt "transformer.h.{bid}.input_layernorm", # falcon7b "h.{bid}.input_layernorm", # bloom @@ -85,7 +85,7 @@ class TensorNameMap: # Attention query-key-value MODEL_TENSOR.ATTN_QKV: ( "gpt_neox.layers.{bid}.attention.query_key_value", # gptneox - "transformer.h.{bid}.attn.c_attn", # gpt2 + "transformer.h.{bid}.attn.c_attn", # gpt2 qwen "transformer.blocks.{bid}.attn.Wqkv", # mpt "transformer.h.{bid}.self_attention.query_key_value", # falcon "h.{bid}.self_attention.query_key_value", # bloom @@ -119,7 +119,7 @@ class TensorNameMap: # Attention output MODEL_TENSOR.ATTN_OUT: ( "gpt_neox.layers.{bid}.attention.dense", # gptneox - "transformer.h.{bid}.attn.c_proj", # gpt2 refact + "transformer.h.{bid}.attn.c_proj", # gpt2 refact qwen "transformer.blocks.{bid}.attn.out_proj", # mpt "transformer.h.{bid}.self_attention.dense", # falcon "h.{bid}.self_attention.dense", # bloom @@ -139,7 +139,7 @@ class TensorNameMap: # Feed-forward norm MODEL_TENSOR.FFN_NORM: ( "gpt_neox.layers.{bid}.post_attention_layernorm", # gptneox - "transformer.h.{bid}.ln_2", # gpt2 refact + "transformer.h.{bid}.ln_2", # gpt2 refact qwen "h.{bid}.post_attention_layernorm", # bloom "transformer.blocks.{bid}.norm_2", # mpt "model.layers.{bid}.post_attention_layernorm", # llama-hf @@ -149,6 +149,11 @@ class TensorNameMap: "model.layers.{bid}.ln2", # yi ), + MODEL_TENSOR.FFN_GATE_INP: ( + "layers.{bid}.feed_forward.gate", # mixtral + "model.layers.{bid}.block_sparse_moe.gate", # mixtral + ), + # Feed-forward up MODEL_TENSOR.FFN_UP: ( "gpt_neox.layers.{bid}.mlp.dense_h_to_4h", # gptneox @@ -161,18 +166,30 @@ class TensorNameMap: "encoder.layer.{bid}.intermediate.dense", # bert "transformer.h.{bid}.mlp.fc_in", # gpt-j "language_model.encoder.layers.{bid}.mlp.dense_h_to_4h", # persimmon + "transformer.h.{bid}.mlp.w1", # qwen + ), + + MODEL_TENSOR.FFN_UP_EXP: ( + "layers.{bid}.feed_forward.experts.{xid}.w3", # mixtral + "model.layers.{bid}.block_sparse_moe.experts.{xid}.w3", # mixtral ), # Feed-forward gate MODEL_TENSOR.FFN_GATE: ( - "model.layers.{bid}.mlp.gate_proj", # llama-hf refact - "layers.{bid}.feed_forward.w1", # llama-pth + "model.layers.{bid}.mlp.gate_proj", # llama-hf refact + "layers.{bid}.feed_forward.w1", # llama-pth + "transformer.h.{bid}.mlp.w2", # qwen + ), + + MODEL_TENSOR.FFN_GATE_EXP: ( + "layers.{bid}.feed_forward.experts.{xid}.w1", # mixtral + "model.layers.{bid}.block_sparse_moe.experts.{xid}.w1", # mixtral ), # Feed-forward down MODEL_TENSOR.FFN_DOWN: ( "gpt_neox.layers.{bid}.mlp.dense_4h_to_h", # gptneox - "transformer.h.{bid}.mlp.c_proj", # gpt2 refact + "transformer.h.{bid}.mlp.c_proj", # gpt2 refact qwen "transformer.blocks.{bid}.ffn.down_proj", # mpt "transformer.h.{bid}.mlp.dense_4h_to_h", # falcon "h.{bid}.mlp.dense_4h_to_h", # bloom @@ -183,6 +200,11 @@ class TensorNameMap: "language_model.encoder.layers.{bid}.mlp.dense_4h_to_h", # persimmon ), + MODEL_TENSOR.FFN_DOWN_EXP: ( + "layers.{bid}.feed_forward.experts.{xid}.w2", # mixtral + "model.layers.{bid}.block_sparse_moe.experts.{xid}.w2", # mixtral + ), + MODEL_TENSOR.ATTN_Q_NORM: ( 
"language_model.encoder.layers.{bid}.self_attention.q_layernorm", ), @@ -211,11 +233,14 @@ def __init__(self, arch: MODEL_ARCH, n_blocks: int): for tensor, keys in self.block_mappings_cfg.items(): if tensor not in MODEL_TENSORS[arch]: continue - tensor_name = TENSOR_NAMES[tensor].format(bid = bid) - self.mapping[tensor_name] = (tensor, tensor_name) - for key in keys: - key = key.format(bid = bid) - self.mapping[key] = (tensor, tensor_name) + # TODO: make this configurable + n_experts = 8 + for xid in range(n_experts): + tensor_name = TENSOR_NAMES[tensor].format(bid = bid, xid = xid) + self.mapping[tensor_name] = (tensor, tensor_name) + for key in keys: + key = key.format(bid = bid, xid = xid) + self.mapping[key] = (tensor, tensor_name) def get_type_and_name(self, key: str, try_suffixes: Sequence[str] = ()) -> tuple[MODEL_TENSOR, str] | None: result = self.mapping.get(key) diff --git a/gguf-py/gguf/vocab.py b/gguf-py/gguf/vocab.py index 71192a928d664..de3e5edb557d7 100644 --- a/gguf-py/gguf/vocab.py +++ b/gguf-py/gguf/vocab.py @@ -13,6 +13,7 @@ class SpecialVocab: merges: list[str] add_special_token: dict[str, bool] special_token_ids: dict[str, int] + chat_template: str | None def __init__( self, path: str | os.PathLike[str], load_merges: bool = False, @@ -24,6 +25,7 @@ def __init__( self.n_vocab = n_vocab self.load_merges = load_merges self.merges = [] + self.chat_template = None if special_token_types is not None: self.special_token_types = special_token_types else: @@ -67,6 +69,10 @@ def add_to_gguf(self, gw: GGUFWriter, quiet: bool = False) -> None: if not quiet: print(f'gguf: Setting add_{typ}_token to {value}') add_handler(value) + if self.chat_template is not None: + if not quiet: + print(f'gguf: Setting chat_template to {self.chat_template}') + gw.add_chat_template(self.chat_template) def _load(self, path: Path) -> None: self._try_load_from_tokenizer_json(path) @@ -117,24 +123,37 @@ def _set_special_token(self, typ: str, tid: Any) -> None: def _try_load_from_tokenizer_json(self, path: Path) -> bool: tokenizer_file = path / 'tokenizer.json' - if not tokenizer_file.is_file(): - return False - with open(tokenizer_file, encoding = 'utf-8') as f: - tokenizer = json.load(f) - if self.load_merges: - merges = tokenizer.get('model', {}).get('merges') - if isinstance(merges, list) and merges and isinstance(merges[0], str): - self.merges = merges + if tokenizer_file.is_file(): + with open(tokenizer_file, encoding = 'utf-8') as f: + tokenizer = json.load(f) + if self.load_merges: + merges = tokenizer.get('model', {}).get('merges') + if isinstance(merges, list) and merges and isinstance(merges[0], str): + self.merges = merges + added_tokens = tokenizer.get('added_tokens', {}) + else: + added_tokens = {} tokenizer_config_file = path / 'tokenizer_config.json' - added_tokens = tokenizer.get('added_tokens') - if added_tokens is None or not tokenizer_config_file.is_file(): + if not tokenizer_config_file.is_file(): return True with open(tokenizer_config_file, encoding = 'utf-8') as f: tokenizer_config = json.load(f) + chat_template = tokenizer_config.get('chat_template') + if chat_template is None or isinstance(chat_template, str): + self.chat_template = chat_template + else: + print( + f'gguf: WARNING: Bad type for chat_template field in {tokenizer_config_file!r} - ignoring', + file = sys.stderr + ) for typ in self.special_token_types: add_entry = tokenizer_config.get(f'add_{typ}_token') if isinstance(add_entry, bool): self.add_special_token[typ] = add_entry + if not added_tokens: + # We will need 
this to get the content for the token, so if it's empty + # may as well just give up. + continue entry = tokenizer_config.get(f'{typ}_token') if isinstance(entry, str): tc_content = entry diff --git a/gguf-py/pyproject.toml b/gguf-py/pyproject.toml index af777c3e0f2b6..9789c2c877165 100644 --- a/gguf-py/pyproject.toml +++ b/gguf-py/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "gguf" -version = "0.5.2" +version = "0.7.0" description = "Read and write ML models in GGUF for GGML" authors = ["GGML "] packages = [ diff --git a/gguf-py/scripts/gguf-dump.py b/gguf-py/scripts/gguf-dump.py index 5141873de7321..dbf8915089275 100755 --- a/gguf-py/scripts/gguf-dump.py +++ b/gguf-py/scripts/gguf-dump.py @@ -86,13 +86,14 @@ def dump_metadata_json(reader: GGUFReader, args: argparse.Namespace) -> None: curr["value"] = str(bytes(field.parts[-1]), encoding="utf-8") else: curr["value"] = field.parts[-1].tolist()[0] - for idx, tensor in enumerate(reader.tensors): - tensors[tensor.name] = { - "index": idx, - "shape": tensor.shape.tolist(), - "type": tensor.tensor_type.name, - "offset": tensor.field.offset, - } + if not args.no_tensors: + for idx, tensor in enumerate(reader.tensors): + tensors[tensor.name] = { + "index": idx, + "shape": tensor.shape.tolist(), + "type": tensor.tensor_type.name, + "offset": tensor.field.offset, + } json.dump(result, sys.stdout) diff --git a/llama.cpp b/llama.cpp index 01522fdb4e74f..0e5ab044cdfa4 100644 --- a/llama.cpp +++ b/llama.cpp @@ -46,7 +46,6 @@ #endif #include #include - #include // for _fseeki64 #endif #include @@ -75,6 +74,7 @@ #include #include #include +#include #include #if defined(_MSC_VER) @@ -91,7 +91,8 @@ #define LLAMA_ATTRIBUTE_FORMAT(...) #endif -#define LLAMA_MAX_NODES 4096 +#define LLAMA_MAX_NODES 8192 +#define LLAMA_MAX_EXPERTS 8 // // logging @@ -193,6 +194,7 @@ enum llm_arch { LLM_ARCH_REFACT, LLM_ARCH_BLOOM, LLM_ARCH_STABLELM, + LLM_ARCH_QWEN, LLM_ARCH_UNKNOWN, }; @@ -209,6 +211,7 @@ static std::map LLM_ARCH_NAMES = { { LLM_ARCH_REFACT, "refact" }, { LLM_ARCH_BLOOM, "bloom" }, { LLM_ARCH_STABLELM, "stablelm" }, + { LLM_ARCH_QWEN, "qwen" }, }; enum llm_kv { @@ -229,6 +232,8 @@ enum llm_kv { LLM_KV_FEED_FORWARD_LENGTH, LLM_KV_USE_PARALLEL_RESIDUAL, LLM_KV_TENSOR_DATA_LAYOUT, + LLM_KV_EXPERT_COUNT, + LLM_KV_EXPERT_USED_COUNT, LLM_KV_ATTENTION_HEAD_COUNT, LLM_KV_ATTENTION_HEAD_COUNT_KV, @@ -255,6 +260,8 @@ enum llm_kv { LLM_KV_TOKENIZER_UNK_ID, LLM_KV_TOKENIZER_SEP_ID, LLM_KV_TOKENIZER_PAD_ID, + LLM_KV_TOKENIZER_ADD_BOS, + LLM_KV_TOKENIZER_ADD_EOS, LLM_KV_TOKENIZER_HF_JSON, LLM_KV_TOKENIZER_RWKV, }; @@ -277,6 +284,8 @@ static std::map LLM_KV_NAMES = { { LLM_KV_FEED_FORWARD_LENGTH, "%s.feed_forward_length" }, { LLM_KV_USE_PARALLEL_RESIDUAL, "%s.use_parallel_residual" }, { LLM_KV_TENSOR_DATA_LAYOUT, "%s.tensor_data_layout" }, + { LLM_KV_EXPERT_COUNT, "%s.expert_count" }, + { LLM_KV_EXPERT_USED_COUNT, "%s.expert_used_count" }, { LLM_KV_ATTENTION_HEAD_COUNT, "%s.attention.head_count" }, { LLM_KV_ATTENTION_HEAD_COUNT_KV, "%s.attention.head_count_kv" }, @@ -303,6 +312,8 @@ static std::map LLM_KV_NAMES = { { LLM_KV_TOKENIZER_UNK_ID, "tokenizer.ggml.unknown_token_id" }, { LLM_KV_TOKENIZER_SEP_ID, "tokenizer.ggml.seperator_token_id" }, { LLM_KV_TOKENIZER_PAD_ID, "tokenizer.ggml.padding_token_id" }, + { LLM_KV_TOKENIZER_ADD_BOS, "tokenizer.ggml.add_bos_token" }, + { LLM_KV_TOKENIZER_ADD_EOS, "tokenizer.ggml.add_eos_token" }, { LLM_KV_TOKENIZER_HF_JSON, "tokenizer.huggingface.json" }, { LLM_KV_TOKENIZER_RWKV, "tokenizer.rwkv.world" }, }; @@ -332,10 +343,14 @@ 
enum llm_tensor { LLM_TENSOR_ATTN_NORM, LLM_TENSOR_ATTN_NORM_2, LLM_TENSOR_ATTN_ROT_EMBD, + LLM_TENSOR_FFN_GATE_INP, + LLM_TENSOR_FFN_NORM, LLM_TENSOR_FFN_GATE, LLM_TENSOR_FFN_DOWN, LLM_TENSOR_FFN_UP, - LLM_TENSOR_FFN_NORM, + LLM_TENSOR_FFN_DOWN_EXP, + LLM_TENSOR_FFN_GATE_EXP, + LLM_TENSOR_FFN_UP_EXP, LLM_TENSOR_ATTN_Q_NORM, LLM_TENSOR_ATTN_K_NORM, }; @@ -354,10 +369,14 @@ static std::map> LLM_TENSOR_NAMES = { LLM_TENSOR_ATTN_V, "blk.%d.attn_v" }, { LLM_TENSOR_ATTN_OUT, "blk.%d.attn_output" }, { LLM_TENSOR_ATTN_ROT_EMBD, "blk.%d.attn_rot_embd" }, + { LLM_TENSOR_FFN_GATE_INP, "blk.%d.ffn_gate_inp" }, { LLM_TENSOR_FFN_NORM, "blk.%d.ffn_norm" }, { LLM_TENSOR_FFN_GATE, "blk.%d.ffn_gate" }, { LLM_TENSOR_FFN_DOWN, "blk.%d.ffn_down" }, { LLM_TENSOR_FFN_UP, "blk.%d.ffn_up" }, + { LLM_TENSOR_FFN_GATE_EXP, "blk.%d.ffn_gate.%d" }, + { LLM_TENSOR_FFN_DOWN_EXP, "blk.%d.ffn_down.%d" }, + { LLM_TENSOR_FFN_UP_EXP, "blk.%d.ffn_up.%d" }, }, }, { @@ -515,6 +534,22 @@ static std::map> LLM_TENSOR_NAMES = { LLM_TENSOR_FFN_UP, "blk.%d.ffn_up" }, }, }, + { + LLM_ARCH_QWEN, + { + { LLM_TENSOR_TOKEN_EMBD, "token_embd" }, + { LLM_TENSOR_OUTPUT_NORM, "output_norm" }, + { LLM_TENSOR_OUTPUT, "output" }, + { LLM_TENSOR_ROPE_FREQS, "rope_freqs" }, + { LLM_TENSOR_ATTN_NORM, "blk.%d.attn_norm" }, + { LLM_TENSOR_ATTN_QKV, "blk.%d.attn_qkv" }, + { LLM_TENSOR_ATTN_OUT, "blk.%d.attn_output" }, + { LLM_TENSOR_FFN_NORM, "blk.%d.ffn_norm" }, + { LLM_TENSOR_FFN_GATE, "blk.%d.ffn_gate" }, + { LLM_TENSOR_FFN_DOWN, "blk.%d.ffn_down" }, + { LLM_TENSOR_FFN_UP, "blk.%d.ffn_up" }, + }, + }, { LLM_ARCH_UNKNOWN, @@ -563,27 +598,16 @@ struct LLM_TN { std::string operator()(llm_tensor tensor, const std::string & suffix, int bid) const { return ::format(LLM_TENSOR_NAMES[arch].at(tensor).c_str(), bid) + "." + suffix; } + + std::string operator()(llm_tensor tensor, const std::string & suffix, int bid, int xid) const { + return ::format(LLM_TENSOR_NAMES[arch].at(tensor).c_str(), bid, xid) + "." 
+ suffix; + } }; // // gguf helpers // -#define GGUF_GET_KEY(ctx, dst, func, type, req, key) \ -do { \ - const std::string skey(key); \ - const int kid = gguf_find_key(ctx, skey.c_str()); \ - if (kid >= 0) { \ - enum gguf_type ktype = gguf_get_kv_type(ctx, kid); \ - if (ktype != (type)) { \ - throw std::runtime_error(format("key %s has wrong type: %s", skey.c_str(), gguf_type_name(ktype))); \ - } \ - (dst) = func(ctx, kid); \ - } else if (req) { \ - throw std::runtime_error(format("key not found in model: %s", skey.c_str())); \ - } \ -} while (0) - static std::map LLAMA_ROPE_SCALING_TYPES = { { LLAMA_ROPE_SCALING_NONE, "none" }, { LLAMA_ROPE_SCALING_LINEAR, "linear" }, @@ -600,6 +624,60 @@ static int8_t llama_rope_scaling_type_from_string(const std::string & name) { return LLAMA_ROPE_SCALING_UNSPECIFIED; } +static std::string gguf_data_to_str(enum gguf_type type, const void * data, int i) { + switch (type) { + case GGUF_TYPE_UINT8: return std::to_string(((const uint8_t *)data)[i]); + case GGUF_TYPE_INT8: return std::to_string(((const int8_t *)data)[i]); + case GGUF_TYPE_UINT16: return std::to_string(((const uint16_t *)data)[i]); + case GGUF_TYPE_INT16: return std::to_string(((const int16_t *)data)[i]); + case GGUF_TYPE_UINT32: return std::to_string(((const uint32_t *)data)[i]); + case GGUF_TYPE_INT32: return std::to_string(((const int32_t *)data)[i]); + case GGUF_TYPE_UINT64: return std::to_string(((const uint64_t *)data)[i]); + case GGUF_TYPE_INT64: return std::to_string(((const int64_t *)data)[i]); + case GGUF_TYPE_FLOAT32: return std::to_string(((const float *)data)[i]); + case GGUF_TYPE_FLOAT64: return std::to_string(((const double *)data)[i]); + case GGUF_TYPE_BOOL: return ((const bool *)data)[i] ? "true" : "false"; + default: return format("unknown type %d", type); + } +} + +static std::string gguf_kv_to_str(const struct gguf_context * ctx_gguf, int i) { + const enum gguf_type type = gguf_get_kv_type(ctx_gguf, i); + + switch (type) { + case GGUF_TYPE_STRING: + return gguf_get_val_str(ctx_gguf, i); + case GGUF_TYPE_ARRAY: + { + const enum gguf_type arr_type = gguf_get_arr_type(ctx_gguf, i); + int arr_n = gguf_get_arr_n(ctx_gguf, i); + const void * data = gguf_get_arr_data(ctx_gguf, i); + std::stringstream ss; + ss << "["; + for (int j = 0; j < arr_n; j++) { + if (arr_type == GGUF_TYPE_STRING) { + std::string val = gguf_get_arr_str(ctx_gguf, i, j); + // escape quotes + replace_all(val, "\\", "\\\\"); + replace_all(val, "\"", "\\\""); + ss << '"' << val << '"'; + } else if (arr_type == GGUF_TYPE_ARRAY) { + ss << "???"; + } else { + ss << gguf_data_to_str(arr_type, data, j); + } + if (j < arr_n - 1) { + ss << ", "; + } + } + ss << "]"; + return ss.str(); + } + default: + return gguf_data_to_str(type, gguf_get_val_data(ctx_gguf, i), 0); + } +} + // // ggml helpers // @@ -1060,6 +1138,12 @@ static std::string llama_token_to_piece(const struct llama_context * ctx, llama_ // struct llama_state { + llama_state() { +#ifdef GGML_USE_METAL + ggml_metal_log_set_callback(log_callback, log_callback_user_data); +#endif + } + // We save the log callback globally ggml_log_callback log_callback = llama_log_callback_default; void * log_callback_user_data = nullptr; @@ -1083,9 +1167,9 @@ enum e_model { MODEL_70B, }; -static const size_t kB = 1024; -static const size_t MB = 1024*kB; -static const size_t GB = 1024*MB; +static const size_t kiB = 1024; +static const size_t MiB = 1024*kiB; +static const size_t GiB = 1024*MiB; struct llama_hparams { bool vocab_only; @@ -1097,6 +1181,8 @@ struct llama_hparams { 
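The gguf_data_to_str/gguf_kv_to_str helpers added above turn any GGUF key/value, arrays included, into a printable string; scalar types now go through the generic gguf_get_val_data accessor instead of one typed getter per case. A hedged sketch of how code inside llama.cpp can use them to dump a file's metadata (both helpers are file-static, so this only works from within llama.cpp; error handling omitted):

```c
struct ggml_context * ctx_meta = NULL;
struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ &ctx_meta };

struct gguf_context * ctx = gguf_init_from_file("model.gguf", params);
for (int i = 0; i < gguf_get_n_kv(ctx); i++) {
    // key, followed by the type-aware stringified value
    printf("%s = %s\n", gguf_get_key(ctx, i), gguf_kv_to_str(ctx, i).c_str());
}
gguf_free(ctx);
```

Since llm_load_hparams later copies these same strings into model.gguf_kv, external callers can reach them through the public metadata API from #4013, e.g. llama_model_meta_val_str(model, "tokenizer.chat_template", buf, sizeof(buf)) to fetch the chat template that gguf-py now exports.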
uint32_t n_layer; uint32_t n_rot; uint32_t n_ff; + uint32_t n_expert = 0; + uint32_t n_expert_used = 0; float f_norm_eps; float f_norm_rms_eps; @@ -1111,15 +1197,18 @@ struct llama_hparams { float f_max_alibi_bias; bool operator!=(const llama_hparams & other) const { - if (this->vocab_only != other.vocab_only) return true; - if (this->n_vocab != other.n_vocab) return true; - if (this->n_ctx_train != other.n_ctx_train) return true; - if (this->n_embd != other.n_embd) return true; - if (this->n_head != other.n_head) return true; - if (this->n_head_kv != other.n_head_kv) return true; - if (this->n_layer != other.n_layer) return true; - if (this->n_rot != other.n_rot) return true; - if (this->n_ff != other.n_ff) return true; + if (this->vocab_only != other.vocab_only) return true; + if (this->n_vocab != other.n_vocab) return true; + if (this->n_ctx_train != other.n_ctx_train) return true; + if (this->n_embd != other.n_embd) return true; + if (this->n_head != other.n_head) return true; + if (this->n_head_kv != other.n_head_kv) return true; + if (this->n_layer != other.n_layer) return true; + if (this->n_rot != other.n_rot) return true; + if (this->n_ff != other.n_ff) return true; + if (this->n_expert != other.n_expert) return true; + if (this->n_expert_used != other.n_expert_used) return true; + if (this->rope_finetuned != other.rope_finetuned) return true; if (this->n_yarn_orig_ctx != other.n_yarn_orig_ctx) return true; @@ -1164,6 +1253,7 @@ struct llama_cparams { float yarn_beta_slow; bool mul_mat_q; + bool offload_kqv; }; struct llama_layer { @@ -1185,6 +1275,9 @@ struct llama_layer { struct ggml_tensor * wqkv; // attention bias + struct ggml_tensor * bq; + struct ggml_tensor * bk; + struct ggml_tensor * bv; struct ggml_tensor * bo; struct ggml_tensor * bqkv; @@ -1197,6 +1290,12 @@ struct llama_layer { struct ggml_tensor * ffn_down; // w2 struct ggml_tensor * ffn_up; // w3 + // ff MoE + struct ggml_tensor * ffn_gate_inp; + struct ggml_tensor * ffn_gate_exp[LLAMA_MAX_EXPERTS]; + struct ggml_tensor * ffn_down_exp[LLAMA_MAX_EXPERTS]; + struct ggml_tensor * ffn_up_exp [LLAMA_MAX_EXPERTS]; + // ff bias struct ggml_tensor * ffn_down_b; // b2 struct ggml_tensor * ffn_up_b; // b3 @@ -1222,14 +1321,15 @@ struct llama_kv_cache { // cannot be freely changed after a slot has been allocated. uint32_t head = 0; uint32_t size = 0; + uint32_t used = 0; // used cells (i.e. at least one seq_id) // computed before each graph build uint32_t n = 0; std::vector cells; - struct ggml_tensor * k = NULL; - struct ggml_tensor * v = NULL; + std::vector k_l; // per layer + std::vector v_l; struct ggml_context * ctx = NULL; @@ -1242,8 +1342,10 @@ struct llama_kv_cache { #ifdef GGML_USE_CUBLAS if (ggml_cublas_loaded()) { - ggml_cuda_free_data(k); - ggml_cuda_free_data(v); + for (size_t i = 0; i < k_l.size(); ++i) { + ggml_cuda_free_data(k_l[i]); + ggml_cuda_free_data(v_l[i]); + } } #endif } @@ -1276,6 +1378,9 @@ struct llama_vocab { id special_sep_id = -1; id special_pad_id = -1; + int special_add_bos = -1; // -1 unknown, 1 add, 0 don't add. + int special_add_eos = -1; // -1 unknown, 1 add, 0 don't add. 
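The tri-state encoding above lets consumers distinguish "metadata says add BOS", "metadata says don't", and "no metadata present". A hedged sketch of the fallback a tokenizing caller can apply (the helper name is hypothetical; the real plumbing goes through the llama API and common/):

```c
// Resolve the tri-state into a concrete decision for tokenization.
static bool should_add_bos(const llama_vocab & vocab) {
    if (vocab.special_add_bos != -1) {
        // explicit tokenizer.ggml.add_bos_token metadata wins
        return vocab.special_add_bos == 1;
    }
    // no metadata: keep the historical default of prepending BOS for SPM vocabs
    return vocab.type == LLAMA_VOCAB_TYPE_SPM;
}
```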
+ id linefeed_id = 13; id special_prefix_id = 32007; id special_middle_id = 32009; @@ -1320,6 +1425,9 @@ struct llama_model { int n_gpu_layers; + // gguf metadata + std::unordered_map gguf_kv; + // context struct ggml_context * ctx = NULL; @@ -1427,9 +1535,11 @@ struct llama_context { static bool llama_kv_cache_init( const struct llama_hparams & hparams, struct llama_kv_cache & cache, - ggml_type wtype, + ggml_type ktype, + ggml_type vtype, uint32_t n_ctx, - int n_gpu_layers) { + int n_gpu_layers, + bool offload) { const uint32_t n_embd = hparams.n_embd_gqa(); const uint32_t n_layer = hparams.n_layer; @@ -1440,11 +1550,12 @@ static bool llama_kv_cache_init( cache.head = 0; cache.size = n_ctx; + cache.used = 0; cache.cells.clear(); cache.cells.resize(n_ctx); - cache.buf.resize(2u*n_elements*ggml_type_size(wtype) + 2u*ggml_tensor_overhead()); + cache.buf.resize(n_elements*(ggml_type_sizef(ktype) + ggml_type_sizef(vtype)) + 2u*n_layer*ggml_tensor_overhead()); memset(cache.buf.data, 0, cache.buf.size); struct ggml_init_params params; @@ -1454,37 +1565,44 @@ static bool llama_kv_cache_init( cache.ctx = ggml_init(params); + size_t vram_kv_cache = 0; + if (!cache.ctx) { LLAMA_LOG_ERROR("%s: failed to allocate memory for kv cache\n", __func__); return false; } - cache.k = ggml_new_tensor_1d(cache.ctx, wtype, n_elements); - cache.v = ggml_new_tensor_1d(cache.ctx, wtype, n_elements); - ggml_set_name(cache.k, "cache_k"); - ggml_set_name(cache.v, "cache_v"); + cache.k_l.reserve(n_layer); + cache.v_l.reserve(n_layer); - (void) n_gpu_layers; + const int i_gpu_start = (int) n_layer - n_gpu_layers; GGML_UNUSED(i_gpu_start); -#ifdef GGML_USE_CUBLAS - if (ggml_cublas_loaded()) { - size_t vram_kv_cache = 0; + GGML_UNUSED(offload); - if (n_gpu_layers > (int)n_layer + 1) { - ggml_cuda_assign_buffers_no_scratch(cache.v); - LLAMA_LOG_INFO("%s: offloading v cache to GPU\n", __func__); - vram_kv_cache += ggml_nbytes(cache.v); - } - if (n_gpu_layers > (int)n_layer + 2) { - ggml_cuda_assign_buffers_no_scratch(cache.k); - LLAMA_LOG_INFO("%s: offloading k cache to GPU\n", __func__); - vram_kv_cache += ggml_nbytes(cache.k); - } - if (vram_kv_cache > 0) { - LLAMA_LOG_INFO("%s: VRAM kv self = %.2f MB\n", __func__, vram_kv_cache / 1024.0 / 1024.0); + for (int i = 0; i < (int) n_layer; i++) { + ggml_tensor * k = ggml_new_tensor_1d(cache.ctx, ktype, n_embd*n_ctx); + ggml_tensor * v = ggml_new_tensor_1d(cache.ctx, vtype, n_embd*n_ctx); + ggml_format_name(k, "cache_k_l%d", i); + ggml_format_name(v, "cache_v_l%d", i); + cache.k_l.push_back(k); + cache.v_l.push_back(v); +#ifdef GGML_USE_CUBLAS + if (i >= i_gpu_start) { + if (offload) { + ggml_cuda_assign_buffers_no_scratch(k); + vram_kv_cache += ggml_nbytes(k); + ggml_cuda_assign_buffers_no_scratch(v); + vram_kv_cache += ggml_nbytes(v); + } } +#endif // GGML_USE_CUBLAS + } + + if (vram_kv_cache > 0) { + LLAMA_LOG_INFO("%s: VRAM kv self = %.2f MB\n", __func__, vram_kv_cache / 1024.0 / 1024.0); } -#endif + + GGML_UNUSED(n_gpu_layers); return true; } @@ -1541,6 +1659,8 @@ static bool llama_kv_cache_find_slot( } } + cache.used += n_tokens; + return true; } @@ -1561,6 +1681,7 @@ static void llama_kv_cache_clear(struct llama_kv_cache & cache) { cache.cells[i].seq_id.clear(); } cache.head = 0; + cache.used = 0; } static void llama_kv_cache_seq_rm( @@ -1583,6 +1704,9 @@ static void llama_kv_cache_seq_rm( continue; } if (cache.cells[i].seq_id.empty()) { + // keep count of the number of used cells + if (cache.cells[i].pos >= 0) cache.used--; + cache.cells[i].pos = -1; if (new_head == 
cache.size) new_head = i; } } @@ -1590,7 +1714,7 @@ } // If we freed up a slot, set head to it so searching can start there. - if (new_head != cache.size) cache.head = new_head; + if (new_head != cache.size && new_head < cache.head) cache.head = new_head; } static void llama_kv_cache_seq_cp( @@ -1616,6 +1740,7 @@ static void llama_kv_cache_seq_keep(struct llama_kv_cache & cache, llama_seq_id for (uint32_t i = 0; i < cache.size; ++i) { if (!cache.cells[i].has_seq_id(seq_id)) { + if (cache.cells[i].pos >= 0) cache.used--; cache.cells[i].pos = -1; cache.cells[i].seq_id.clear(); if (new_head == cache.size) new_head = i; @@ -1626,7 +1751,7 @@ static void llama_kv_cache_seq_keep(struct llama_kv_cache & cache, llama_seq_id } // If we freed up a slot, set head to it so searching can start there. - if (new_head != cache.size) cache.head = new_head; + if (new_head != cache.size && new_head < cache.head) cache.head = new_head; } static void llama_kv_cache_seq_shift( @@ -1647,6 +1772,7 @@ cache.cells[i].delta += delta; if (cache.cells[i].pos < 0) { + if (!cache.cells[i].seq_id.empty()) cache.used--; cache.cells[i].pos = -1; cache.cells[i].seq_id.clear(); if (new_head == cache.size) new_head = i; @@ -1697,6 +1823,169 @@ static std::string llama_format_tensor_shape(const struct ggml_tensor * t) { return buf; }
+namespace GGUFMeta { + template<typename T, gguf_type gt_, T (*gfun)(const gguf_context *, const int)> + struct GKV_Base_Type { + static constexpr gguf_type gt = gt_; + + static T getter(const gguf_context * ctx, const int kid) { + return gfun(ctx, kid); + } + }; + + template<typename T> struct GKV_Base; + + template<> struct GKV_Base<bool>: GKV_Base_Type<bool, GGUF_TYPE_BOOL, gguf_get_val_bool> {}; + template<> struct GKV_Base<uint8_t>: GKV_Base_Type<uint8_t, GGUF_TYPE_UINT8, gguf_get_val_u8> {}; + template<> struct GKV_Base<uint16_t>: GKV_Base_Type<uint16_t, GGUF_TYPE_UINT16, gguf_get_val_u16> {}; + template<> struct GKV_Base<uint32_t>: GKV_Base_Type<uint32_t, GGUF_TYPE_UINT32, gguf_get_val_u32> {}; + template<> struct GKV_Base<uint64_t>: GKV_Base_Type<uint64_t, GGUF_TYPE_UINT64, gguf_get_val_u64> {}; + template<> struct GKV_Base<int8_t>: GKV_Base_Type<int8_t, GGUF_TYPE_INT8, gguf_get_val_i8> {}; + template<> struct GKV_Base<int16_t>: GKV_Base_Type<int16_t, GGUF_TYPE_INT16, gguf_get_val_i16> {}; + template<> struct GKV_Base<int32_t>: GKV_Base_Type<int32_t, GGUF_TYPE_INT32, gguf_get_val_i32> {}; + template<> struct GKV_Base<int64_t>: GKV_Base_Type<int64_t, GGUF_TYPE_INT64, gguf_get_val_i64> {}; + template<> struct GKV_Base<float>: GKV_Base_Type<float, GGUF_TYPE_FLOAT32, gguf_get_val_f32> {}; + template<> struct GKV_Base<double>: GKV_Base_Type<double, GGUF_TYPE_FLOAT64, gguf_get_val_f64> {}; + template<> struct GKV_Base<const char *>: GKV_Base_Type<const char *, GGUF_TYPE_STRING, gguf_get_val_str> {}; + + template<> struct GKV_Base<std::string> { + static constexpr gguf_type gt = GGUF_TYPE_STRING; + + static std::string getter(const gguf_context * ctx, const int kid) { + return gguf_get_val_str(ctx, kid); + } + }; + + struct ArrayInfo{ + const gguf_type gt; + const size_t length; + const void * data; + }; + + template<> struct GKV_Base<ArrayInfo> { + public: + static constexpr gguf_type gt = GGUF_TYPE_ARRAY; + static ArrayInfo getter(const gguf_context *ctx, const int k) { + return ArrayInfo { + gguf_get_arr_type(ctx, k), + size_t(gguf_get_arr_n(ctx, k)), + gguf_get_arr_data(ctx, k), + }; + } + }; + + template<typename T> + class GKV: public GKV_Base<T> { + GKV() = delete; + + public: + static T get_kv(const gguf_context * ctx, const int k) { + const enum gguf_type kt = gguf_get_kv_type(ctx, k); + + if (kt != GKV::gt) { + throw std::runtime_error(format("key %s has wrong type %s but expected type %s", + gguf_get_key(ctx, k), gguf_type_name(kt), gguf_type_name(GKV::gt))); + } + return GKV::getter(ctx, k); + } + + static const char * override_type_to_str(const llama_model_kv_override_type ty) { + switch (ty) { + case LLAMA_KV_OVERRIDE_BOOL: return "bool"; + case LLAMA_KV_OVERRIDE_INT: return "int"; + case LLAMA_KV_OVERRIDE_FLOAT: return "float"; + } + return "unknown"; + } + + static bool validate_override(const llama_model_kv_override_type expected_type, const struct llama_model_kv_override *override) { + if (!override) { return false; } + if (override->tag == expected_type) { + LLAMA_LOG_INFO("%s: Using metadata override (%5s) '%s' = ", + __func__, override_type_to_str(override->tag), override->key); + switch (override->tag) { + case LLAMA_KV_OVERRIDE_BOOL: { + printf("%s\n", override->bool_value ? "true" : "false"); + } break; + case LLAMA_KV_OVERRIDE_INT: { + printf("%" PRId64 "\n", override->int_value); + } break; + case LLAMA_KV_OVERRIDE_FLOAT: { + printf("%.6f\n", override->float_value); + } break; + default: + // Shouldn't be possible to end up here, but just in case... + throw std::runtime_error( + format("Unsupported attempt to override %s type for metadata key %s\n", + override_type_to_str(override->tag), override->key)); + } + return true; + } + LLAMA_LOG_WARN("%s: Warning: Bad metadata override type for key '%s', expected %s but got %s\n", + __func__, override->key, override_type_to_str(expected_type), override_type_to_str(override->tag)); + return false; + } + + template<typename OT> + static typename std::enable_if<std::is_same<OT, bool>::value, bool>::type + try_override(OT & target, const struct llama_model_kv_override *override) { + if (validate_override(LLAMA_KV_OVERRIDE_BOOL, override)) { + target = override->bool_value; + return true; + } + return false; + } + + template<typename OT> + static typename std::enable_if<!std::is_same<OT, bool>::value && std::is_integral<OT>::value, bool>::type + try_override(OT & target, const struct llama_model_kv_override *override) { + if (validate_override(LLAMA_KV_OVERRIDE_INT, override)) { + target = override->int_value; + return true; + } + return false; + } + + template<typename T> + static typename std::enable_if<std::is_floating_point<T>::value, bool>::type + try_override(T & target, const struct llama_model_kv_override *override) { + if (validate_override(LLAMA_KV_OVERRIDE_FLOAT, override)) { + target = override->float_value; + return true; + } + return false; + } + + template<typename T> + static typename std::enable_if<std::is_same<std::string, T>::value, bool>::type + try_override(T & target, const struct llama_model_kv_override *override) { + (void)target; + (void)override; + if (!override) { return false; } + // Currently, we should never end up here so it would be a bug if we do. + throw std::runtime_error(format("Unsupported attempt to override string type for metadata key %s\n", + override ? override->key : "NULL")); + } + + static bool set(const gguf_context * ctx, const int k, T & target, const struct llama_model_kv_override *override = nullptr) { + if (try_override(target, override)) { + return true; + } + if (k < 0) { return false; } + target = get_kv(ctx, k); + return true; + } + + static bool set(const gguf_context * ctx, const char * key, T & target, const struct llama_model_kv_override *override = nullptr) { + return set(ctx, gguf_find_key(ctx, key), target, override); + } + + static bool set(const gguf_context * ctx, const std::string & key, T & target, const struct llama_model_kv_override *override = nullptr) { + return set(ctx, key.c_str(), target, override); + } + }; +}
 struct llama_model_loader { int n_kv = 0; int n_tensors = 0; @@ -1712,21 +2001,34 @@ llama_fver fver; std::unique_ptr<llama_mmap> mapping; + std::unordered_map<std::string, struct llama_model_kv_override> kv_overrides; struct gguf_context * ctx_gguf = NULL; struct ggml_context * ctx_meta = NULL; - llama_model_loader(const std::string & fname, bool use_mmap) : file(fname.c_str(), "rb") { + std::string arch_name; + LLM_KV llm_kv = LLM_KV(LLM_ARCH_UNKNOWN); + + llama_model_loader(const std::string & fname, bool use_mmap, const struct llama_model_kv_override * param_overrides_p) : file(fname.c_str(), "rb") { struct gguf_init_params params = { /*.no_alloc = */ true, /*.ctx = */ &ctx_meta, }; + if (param_overrides_p != nullptr) { + for (const struct llama_model_kv_override *p = param_overrides_p; p->key[0] != 0; p++) { + kv_overrides.insert({std::string(p->key), *p}); + } + } + ctx_gguf = gguf_init_from_file(fname.c_str(), params); if (!ctx_gguf) { throw std::runtime_error(format("%s: failed to load model from %s\n", __func__, fname.c_str())); } + get_key(llm_kv(LLM_KV_GENERAL_ARCHITECTURE), arch_name, false); + llm_kv = LLM_KV(llm_arch_from_string(arch_name)); + n_kv = gguf_get_n_kv(ctx_gguf); n_tensors = gguf_get_n_tensors(ctx_gguf); @@ -1778,10 +2080,10 @@ case GGML_TYPE_Q5_K: ftype = LLAMA_FTYPE_MOSTLY_Q5_K_M; break; case GGML_TYPE_Q6_K: ftype = LLAMA_FTYPE_MOSTLY_Q6_K; break; default: - { - LLAMA_LOG_WARN("%s: unknown type %s\n", __func__, ggml_type_name(type_max)); - ftype = LLAMA_FTYPE_ALL_F32; - } break; + { + LLAMA_LOG_WARN("%s: unknown type %s\n", __func__, ggml_type_name(type_max)); + ftype = LLAMA_FTYPE_ALL_F32; + } break; } // this is a way to mark that we have "guessed" the file type @@ -1794,11 +2096,23 @@ } } + LLAMA_LOG_INFO("%s: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n", __func__); for (int i = 0; i < n_kv; i++) { - const char * name = gguf_get_key(ctx_gguf, i); - const enum gguf_type type = gguf_get_kv_type(ctx_gguf, i); + const char * name = gguf_get_key(ctx_gguf, i); + const enum gguf_type type = gguf_get_kv_type(ctx_gguf, i); + const std::string type_name = + type == GGUF_TYPE_ARRAY + ?
format("%s[%s,%d]", gguf_type_name(type), gguf_type_name(gguf_get_arr_type(ctx_gguf, i)), gguf_get_arr_n(ctx_gguf, i)) + : gguf_type_name(type); + + std::string value = gguf_kv_to_str(ctx_gguf, i); + const size_t MAX_VALUE_LEN = 40; + if (value.size() > MAX_VALUE_LEN) { + value = format("%s...", value.substr(0, MAX_VALUE_LEN - 3).c_str()); + } + replace_all(value, "\n", "\\n"); - LLAMA_LOG_INFO("%s: - kv %3d: %42s %-8s\n", __func__, i, name, gguf_type_name(type)); + LLAMA_LOG_INFO("%s: - kv %3d: %42s %-16s = %s\n", __func__, i, name, type_name.c_str(), value.c_str()); } // print type counts @@ -1828,19 +2142,59 @@ struct llama_model_loader { } } - std::string get_arch_name() const { - const auto kv = LLM_KV(LLM_ARCH_UNKNOWN); + template + typename std::enable_if::value, bool>::type + get_arr_n(const std::string & key, T & result, const bool required = true) { + const int kid = gguf_find_key(ctx_gguf, key.c_str()); + + if (kid < 0) { + if (required) { + throw std::runtime_error(format("key not found in model: %s", key.c_str())); + } + return false; + } + + struct GGUFMeta::ArrayInfo arr_info = + GGUFMeta::GKV::get_kv(ctx_gguf, kid); - std::string arch_name; - GGUF_GET_KEY(ctx_gguf, arch_name, gguf_get_val_str, GGUF_TYPE_STRING, false, kv(LLM_KV_GENERAL_ARCHITECTURE)); + result = arr_info.length; + return true; + } + + template + typename std::enable_if::value, bool>::type + get_arr_n(const enum llm_kv kid, T & result, const bool required = true) { + return get_arr_n(llm_kv(kid), result, required); + } + + template + bool get_key(const std::string & key, T & result, const bool required = true) { + auto it = kv_overrides.find(key); + + const struct llama_model_kv_override * override = + it != kv_overrides.end() ? &it->second : nullptr; + + const bool found = GGUFMeta::GKV::set(ctx_gguf, key, result, override); + + if (required && !found) { + throw std::runtime_error(format("key not found in model: %s", key.c_str())); + } + + return found; + } + + template + bool get_key(const enum llm_kv kid, T & result, const bool required = true) { + return get_key(llm_kv(kid), result, required); + } + + std::string get_arch_name() const { return arch_name; } enum llm_arch get_arch() const { - const std::string arch_name = get_arch_name(); - - return llm_arch_from_string(arch_name); + return llm_kv.arch; } const char * get_tensor_name(int i) const { @@ -1880,10 +2234,13 @@ struct llama_model_loader { return tensor; } - struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name, const std::vector & ne, ggml_backend_type backend) { + struct ggml_tensor * create_tensor(struct ggml_context * ctx, const std::string & name, const std::vector & ne, ggml_backend_type backend, bool required = true) { struct ggml_tensor * cur = ggml_get_tensor(ctx_meta, name.c_str()); if (cur == NULL) { + if (!required) { + return NULL; + } throw std::runtime_error(format("%s: tensor '%s' not found", __func__, name.c_str())); } @@ -2087,49 +2444,66 @@ static void llm_load_arch(llama_model_loader & ml, llama_model & model) { static void llm_load_hparams( llama_model_loader & ml, llama_model & model) { - struct gguf_context * ctx = ml.ctx_gguf; - - const auto kv = LLM_KV(model.arch); - auto & hparams = model.hparams; + const gguf_context * ctx = ml.ctx_gguf; + + // get metadata as string + for (int i = 0; i < gguf_get_n_kv(ctx); i++) { + enum gguf_type type = gguf_get_kv_type(ctx, i); + if (type == GGUF_TYPE_ARRAY) { + continue; + } + const char * name = gguf_get_key(ctx, i); + const std::string value = 
gguf_kv_to_str(ctx, i); + model.gguf_kv.emplace(name, value); + } // get general kv - GGUF_GET_KEY(ctx, model.name, gguf_get_val_str, GGUF_TYPE_STRING, false, kv(LLM_KV_GENERAL_NAME)); + ml.get_key(LLM_KV_GENERAL_NAME, model.name, false); // get hparams kv - GGUF_GET_KEY(ctx, hparams.n_vocab, gguf_get_arr_n, GGUF_TYPE_ARRAY, true, kv(LLM_KV_TOKENIZER_LIST)); - GGUF_GET_KEY(ctx, hparams.n_ctx_train, gguf_get_val_u32, GGUF_TYPE_UINT32, true, kv(LLM_KV_CONTEXT_LENGTH)); - GGUF_GET_KEY(ctx, hparams.n_embd, gguf_get_val_u32, GGUF_TYPE_UINT32, true, kv(LLM_KV_EMBEDDING_LENGTH)); - GGUF_GET_KEY(ctx, hparams.n_ff, gguf_get_val_u32, GGUF_TYPE_UINT32, true, kv(LLM_KV_FEED_FORWARD_LENGTH)); - GGUF_GET_KEY(ctx, hparams.n_head, gguf_get_val_u32, GGUF_TYPE_UINT32, true, kv(LLM_KV_ATTENTION_HEAD_COUNT)); - GGUF_GET_KEY(ctx, hparams.n_layer, gguf_get_val_u32, GGUF_TYPE_UINT32, true, kv(LLM_KV_BLOCK_COUNT)); + ml.get_arr_n(LLM_KV_TOKENIZER_LIST, hparams.n_vocab); + ml.get_key (LLM_KV_CONTEXT_LENGTH, hparams.n_ctx_train); + ml.get_key (LLM_KV_EMBEDDING_LENGTH, hparams.n_embd); + ml.get_key (LLM_KV_FEED_FORWARD_LENGTH, hparams.n_ff); + ml.get_key (LLM_KV_ATTENTION_HEAD_COUNT, hparams.n_head); + ml.get_key (LLM_KV_BLOCK_COUNT, hparams.n_layer); + ml.get_key (LLM_KV_EXPERT_COUNT, hparams.n_expert, false); + ml.get_key (LLM_KV_EXPERT_USED_COUNT, hparams.n_expert_used, false); + + GGML_ASSERT(hparams.n_expert <= LLAMA_MAX_EXPERTS); + GGML_ASSERT(hparams.n_expert_used <= hparams.n_expert); + if (hparams.n_expert > 0) { + GGML_ASSERT(hparams.n_expert_used > 0); + } else { + GGML_ASSERT(hparams.n_expert_used == 0); + } // n_head_kv is optional, default to n_head hparams.n_head_kv = hparams.n_head; - GGUF_GET_KEY(ctx, hparams.n_head_kv, gguf_get_val_u32, GGUF_TYPE_UINT32, false, kv(LLM_KV_ATTENTION_HEAD_COUNT_KV)); + ml.get_key(LLM_KV_ATTENTION_HEAD_COUNT_KV, hparams.n_head_kv, false); - hparams.rope_finetuned = false; - GGUF_GET_KEY(ctx, hparams.rope_finetuned, gguf_get_val_bool, GGUF_TYPE_BOOL, false, - kv(LLM_KV_ROPE_SCALING_FINETUNED)); + bool rope_finetuned = false; + ml.get_key(LLM_KV_ROPE_SCALING_FINETUNED, rope_finetuned, false); + hparams.rope_finetuned = rope_finetuned; hparams.n_yarn_orig_ctx = hparams.n_ctx_train; - GGUF_GET_KEY(ctx, hparams.n_yarn_orig_ctx, gguf_get_val_u32, GGUF_TYPE_UINT32, false, - kv(LLM_KV_ROPE_SCALING_ORIG_CTX_LEN)); + ml.get_key(LLM_KV_ROPE_SCALING_ORIG_CTX_LEN, hparams.n_yarn_orig_ctx, false); // rope_freq_base (optional) hparams.rope_freq_base_train = 10000.0f; - GGUF_GET_KEY(ctx, hparams.rope_freq_base_train, gguf_get_val_f32, GGUF_TYPE_FLOAT32, false, kv(LLM_KV_ROPE_FREQ_BASE)); + ml.get_key(LLM_KV_ROPE_FREQ_BASE, hparams.rope_freq_base_train, false); std::string rope_scaling("linear"); - GGUF_GET_KEY(ctx, rope_scaling, gguf_get_val_str, GGUF_TYPE_STRING, false, kv(LLM_KV_ROPE_SCALING_TYPE)); + ml.get_key(LLM_KV_ROPE_SCALING_TYPE, rope_scaling, false); hparams.rope_scaling_type_train = llama_rope_scaling_type_from_string(rope_scaling); GGML_ASSERT(hparams.rope_scaling_type_train != LLAMA_ROPE_SCALING_UNSPECIFIED); // rope_freq_scale (inverse of the kv) is optional float ropescale = 0.0f; - GGUF_GET_KEY(ctx, ropescale, gguf_get_val_f32, GGUF_TYPE_FLOAT32, false, kv(LLM_KV_ROPE_SCALING_FACTOR)); - if (ropescale == 0.0f) { // try the old key name - GGUF_GET_KEY(ctx, ropescale, gguf_get_val_f32, GGUF_TYPE_FLOAT32, false, kv(LLM_KV_ROPE_SCALE_LINEAR)); + if (!ml.get_key(LLM_KV_ROPE_SCALING_FACTOR, ropescale, false)) { + // try the old key name + 
ml.get_key(LLM_KV_ROPE_SCALE_LINEAR, ropescale, false); } hparams.rope_freq_scale_train = ropescale == 0.0f ? 1.0f : 1.0f/ropescale; @@ -2137,7 +2511,7 @@ static void llm_load_hparams( { hparams.n_rot = hparams.n_embd / hparams.n_head; - GGUF_GET_KEY(ctx, hparams.n_rot, gguf_get_val_u32, GGUF_TYPE_UINT32, false, kv(LLM_KV_ROPE_DIMENSION_COUNT)); + ml.get_key(LLM_KV_ROPE_DIMENSION_COUNT, hparams.n_rot, false); if (model.arch == LLM_ARCH_LLAMA || model.arch == LLM_ARCH_FALCON) { if (hparams.n_rot != hparams.n_embd / hparams.n_head) { @@ -2152,7 +2526,7 @@ static void llm_load_hparams( switch (model.arch) { case LLM_ARCH_LLAMA: { - GGUF_GET_KEY(ctx, hparams.f_norm_rms_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps); switch (hparams.n_layer) { case 26: model.type = e_model::MODEL_3B; break; @@ -2166,7 +2540,7 @@ static void llm_load_hparams( } break; case LLM_ARCH_FALCON: { - GGUF_GET_KEY(ctx, hparams.f_norm_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_EPS, hparams.f_norm_eps); switch (hparams.n_layer) { case 32: model.type = e_model::MODEL_7B; break; @@ -2176,7 +2550,7 @@ static void llm_load_hparams( } break; case LLM_ARCH_BAICHUAN: { - GGUF_GET_KEY(ctx, hparams.f_norm_rms_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps); switch (hparams.n_layer) { case 32: model.type = e_model::MODEL_7B; break; case 40: model.type = e_model::MODEL_13B; break; @@ -2185,7 +2559,7 @@ static void llm_load_hparams( } break; case LLM_ARCH_STARCODER: { - GGUF_GET_KEY(ctx, hparams.f_norm_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_EPS, hparams.f_norm_eps); switch (hparams.n_layer) { case 24: model.type = e_model::MODEL_1B; break; case 36: model.type = e_model::MODEL_3B; break; @@ -2196,7 +2570,7 @@ static void llm_load_hparams( } break; case LLM_ARCH_PERSIMMON: { - GGUF_GET_KEY(ctx, hparams.f_norm_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_EPS, hparams.f_norm_eps); switch (hparams.n_layer) { case 36: model.type = e_model::MODEL_8B; break; default: model.type = e_model::MODEL_UNKNOWN; @@ -2204,7 +2578,7 @@ static void llm_load_hparams( } break; case LLM_ARCH_REFACT: { - GGUF_GET_KEY(ctx, hparams.f_norm_rms_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps); switch (hparams.n_layer) { case 32: model.type = e_model::MODEL_1B; break; default: model.type = e_model::MODEL_UNKNOWN; @@ -2212,7 +2586,7 @@ static void llm_load_hparams( } break; case LLM_ARCH_BLOOM: { - GGUF_GET_KEY(ctx, hparams.f_norm_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_EPS, hparams.f_norm_eps); switch (hparams.n_layer) { case 24: model.type = e_model::MODEL_1B; break; @@ -2227,9 +2601,9 @@ static void llm_load_hparams( { hparams.f_clamp_kqv = 0.0f; - GGUF_GET_KEY(ctx, hparams.f_norm_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_EPS)); - GGUF_GET_KEY(ctx, hparams.f_clamp_kqv, gguf_get_val_f32, GGUF_TYPE_FLOAT32, false, kv(LLM_KV_ATTENTION_CLAMP_KQV)); - 
GGUF_GET_KEY(ctx, hparams.f_max_alibi_bias, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_MAX_ALIBI_BIAS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_EPS, hparams.f_norm_eps); + ml.get_key(LLM_KV_ATTENTION_CLAMP_KQV, hparams.f_clamp_kqv, false); + ml.get_key(LLM_KV_ATTENTION_MAX_ALIBI_BIAS, hparams.f_max_alibi_bias); switch (hparams.n_layer) { case 32: model.type = e_model::MODEL_7B; break; @@ -2239,13 +2613,23 @@ static void llm_load_hparams( } break; case LLM_ARCH_STABLELM: { - GGUF_GET_KEY(ctx, hparams.f_norm_eps, gguf_get_val_f32, GGUF_TYPE_FLOAT32, true, kv(LLM_KV_ATTENTION_LAYERNORM_EPS)); + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_EPS, hparams.f_norm_eps); switch (hparams.n_layer) { case 32: model.type = e_model::MODEL_3B; break; default: model.type = e_model::MODEL_UNKNOWN; } } break; + case LLM_ARCH_QWEN: + { + ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps); + + switch (hparams.n_layer) { + case 32: model.type = e_model::MODEL_7B; break; + case 40: model.type = e_model::MODEL_13B; break; + default: model.type = e_model::MODEL_UNKNOWN; + } + } break; default: (void)0; } @@ -2287,7 +2671,7 @@ static void llm_load_vocab( { std::string tokenizer_name; - GGUF_GET_KEY(ctx, tokenizer_name, gguf_get_val_str, GGUF_TYPE_STRING, true, kv(LLM_KV_TOKENIZER_MODEL)); + ml.get_key(LLM_KV_TOKENIZER_MODEL, tokenizer_name); if (tokenizer_name == "llama") { vocab.type = LLAMA_VOCAB_TYPE_SPM; @@ -2377,16 +2761,30 @@ static void llm_load_vocab( }; for (const auto & it : special_token_types) { const std::string & key = kv(std::get<0>(it)); - int32_t & id = std::get<1>(it), old_id = id; + int32_t & id = std::get<1>(it); - GGUF_GET_KEY(ctx, id, gguf_get_val_u32, GGUF_TYPE_UINT32, false, key); - // Must be >= -1 and < vocab size. Since the key is unsigned, -1 - // can only come from the default value, so there's no point in - // validating that. - if (size_t(id + 1) > vocab.id_to_token.size()) { - LLAMA_LOG_WARN("%s: bad special token: '%s' = %d, using default id %d\n", - __func__, key.c_str(), id, old_id); - id = old_id; + uint32_t new_id; + if (!ml.get_key(std::get<0>(it), new_id, false)) { + continue; + } + if (new_id >= vocab.id_to_token.size()) { + LLAMA_LOG_WARN("%s: bad special token: '%s' = %ud, using default id %d\n", + __func__, key.c_str(), new_id, id); + } else { + id = new_id; + } + + } + + // Handle add_bos_token and add_eos_token + { + bool temp = true; + + if (ml.get_key(LLM_KV_TOKENIZER_ADD_BOS, temp, false)) { + vocab.special_add_bos = int(temp); + } + if (ml.get_key(LLM_KV_TOKENIZER_ADD_EOS, temp, false)) { + vocab.special_add_eos = int(temp); } } } @@ -2398,7 +2796,7 @@ static void llm_load_vocab( // The assumption is, since special tokens aren't meant to be exposed to end user, they are designed // to be unmatchable by the tokenizer, therefore tokens from the vocab, which are unmatchable by the tokenizer // are special tokens. - // From testing, this appears to corelate 1:1 with special tokens. + // From testing, this appears to correlate 1:1 with special tokens. 
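Every GGUF_GET_KEY call site above now funnels through llama_model_loader::get_key, so any scalar hyperparameter can be corrected from the kv_overrides table before the loader falls back to the file. A hedged sketch of supplying an override from application code (it assumes the array is threaded through llama_model_params.kv_overrides; the loader itself only requires an array terminated by an empty key, per its constructor):

```c
struct llama_model_kv_override overrides[2];
memset(overrides, 0, sizeof(overrides));

// force the number of experts used per token, e.g. for experiments
snprintf(overrides[0].key, sizeof(overrides[0].key), "llama.expert_used_count");
overrides[0].tag       = LLAMA_KV_OVERRIDE_INT;
overrides[0].int_value = 2;

overrides[1].key[0] = '\0'; // empty key terminates the list

struct llama_model_params mparams = llama_model_default_params();
mparams.kv_overrides = overrides; // consulted by get_key() before reading the file
```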
// // Counting special tokens and verifying in only one direction @@ -2511,6 +2909,8 @@ static void llm_load_print_meta(llama_model_loader & ml, llama_model & model) { LLAMA_LOG_INFO("%s: f_clamp_kqv = %.1e\n", __func__, hparams.f_clamp_kqv); LLAMA_LOG_INFO("%s: f_max_alibi_bias = %.1e\n", __func__, hparams.f_max_alibi_bias); LLAMA_LOG_INFO("%s: n_ff = %u\n", __func__, hparams.n_ff); + LLAMA_LOG_INFO("%s: n_expert = %u\n", __func__, hparams.n_expert); + LLAMA_LOG_INFO("%s: n_expert_used = %u\n", __func__, hparams.n_expert_used); LLAMA_LOG_INFO("%s: rope scaling = %s\n", __func__, rope_scaling_type.c_str()); LLAMA_LOG_INFO("%s: freq_base_train = %.1f\n", __func__, hparams.rope_freq_base_train); LLAMA_LOG_INFO("%s: freq_scale_train = %g\n", __func__, hparams.rope_freq_scale_train); @@ -2519,22 +2919,22 @@ static void llm_load_print_meta(llama_model_loader & ml, llama_model & model) { LLAMA_LOG_INFO("%s: model type = %s\n", __func__, llama_model_type_name(model.type)); LLAMA_LOG_INFO("%s: model ftype = %s\n", __func__, llama_model_ftype_name(model.ftype).c_str()); LLAMA_LOG_INFO("%s: model params = %.2f B\n", __func__, ml.n_elements*1e-9); - if (ml.n_bytes < GB) { - LLAMA_LOG_INFO("%s: model size = %.2f MiB (%.2f BPW) \n", __func__, ml.n_bytes/1024.0/1024.0, ml.n_bytes*8.0/ml.n_elements); + if (ml.n_bytes < GiB) { + LLAMA_LOG_INFO("%s: model size = %.2f MiB (%.2f BPW) \n", __func__, ml.n_bytes/1024.0/1024.0, ml.n_bytes*8.0/ml.n_elements); } else { LLAMA_LOG_INFO("%s: model size = %.2f GiB (%.2f BPW) \n", __func__, ml.n_bytes/1024.0/1024.0/1024.0, ml.n_bytes*8.0/ml.n_elements); } // general kv - LLAMA_LOG_INFO("%s: general.name = %s\n", __func__, model.name.c_str()); + LLAMA_LOG_INFO("%s: general.name = %s\n", __func__, model.name.c_str()); // special tokens - if (vocab.special_bos_id != -1) { LLAMA_LOG_INFO( "%s: BOS token = %d '%s'\n", __func__, vocab.special_bos_id, vocab.id_to_token[vocab.special_bos_id].text.c_str() ); } - if (vocab.special_eos_id != -1) { LLAMA_LOG_INFO( "%s: EOS token = %d '%s'\n", __func__, vocab.special_eos_id, vocab.id_to_token[vocab.special_eos_id].text.c_str() ); } - if (vocab.special_unk_id != -1) { LLAMA_LOG_INFO( "%s: UNK token = %d '%s'\n", __func__, vocab.special_unk_id, vocab.id_to_token[vocab.special_unk_id].text.c_str() ); } - if (vocab.special_sep_id != -1) { LLAMA_LOG_INFO( "%s: SEP token = %d '%s'\n", __func__, vocab.special_sep_id, vocab.id_to_token[vocab.special_sep_id].text.c_str() ); } - if (vocab.special_pad_id != -1) { LLAMA_LOG_INFO( "%s: PAD token = %d '%s'\n", __func__, vocab.special_pad_id, vocab.id_to_token[vocab.special_pad_id].text.c_str() ); } - if (vocab.linefeed_id != -1) { LLAMA_LOG_INFO( "%s: LF token = %d '%s'\n", __func__, vocab.linefeed_id, vocab.id_to_token[vocab.linefeed_id].text.c_str() ); } + if (vocab.special_bos_id != -1) { LLAMA_LOG_INFO( "%s: BOS token = %d '%s'\n", __func__, vocab.special_bos_id, vocab.id_to_token[vocab.special_bos_id].text.c_str() ); } + if (vocab.special_eos_id != -1) { LLAMA_LOG_INFO( "%s: EOS token = %d '%s'\n", __func__, vocab.special_eos_id, vocab.id_to_token[vocab.special_eos_id].text.c_str() ); } + if (vocab.special_unk_id != -1) { LLAMA_LOG_INFO( "%s: UNK token = %d '%s'\n", __func__, vocab.special_unk_id, vocab.id_to_token[vocab.special_unk_id].text.c_str() ); } + if (vocab.special_sep_id != -1) { LLAMA_LOG_INFO( "%s: SEP token = %d '%s'\n", __func__, vocab.special_sep_id, vocab.id_to_token[vocab.special_sep_id].text.c_str() ); } + if (vocab.special_pad_id != -1) { LLAMA_LOG_INFO( "%s: PAD token = 
%d '%s'\n", __func__, vocab.special_pad_id, vocab.id_to_token[vocab.special_pad_id].text.c_str() ); } + if (vocab.linefeed_id != -1) { LLAMA_LOG_INFO( "%s: LF token = %d '%s'\n", __func__, vocab.linefeed_id, vocab.id_to_token[vocab.linefeed_id].text.c_str() ); } } static void llm_load_tensors( @@ -2558,7 +2958,7 @@ static void llm_load_tensors( ml.calc_sizes(ctx_size, mmapped_size); - LLAMA_LOG_INFO("%s: ggml ctx size = %7.2f MB\n", __func__, ctx_size/1024.0/1024.0); + LLAMA_LOG_INFO("%s: ggml ctx size = %7.2f MiB\n", __func__, ctx_size/1024.0/1024.0); // create the ggml context { @@ -2620,14 +3020,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -2664,17 +3057,55 @@ static void llm_load_tensors( layer.wv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_V, "weight", i), {n_embd, n_embd_gqa}, backend_split); layer.wo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); + // optional bias tensors + layer.bq = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_Q, "bias", i), {n_embd}, backend, false); + layer.bk = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_K, "bias", i), {n_embd_gqa}, backend, false); + layer.bv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_V, "bias", i), {n_embd_gqa}, backend, false); + layer.bo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "bias", i), {n_embd}, backend, false); + layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); - layer.ffn_gate = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE, "weight", i), {n_embd, n_ff}, backend_split); - layer.ffn_down = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); - layer.ffn_up = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); + layer.ffn_gate_inp = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE_INP, "weight", i), {n_embd}, backend, false); + + if (layer.ffn_gate_inp == nullptr) { + GGML_ASSERT(hparams.n_expert == 0); + GGML_ASSERT(hparams.n_expert_used == 0); + + layer.ffn_gate = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE, "weight", i), {n_embd, n_ff}, backend_split); + layer.ffn_down = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); + layer.ffn_up = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); + } else { + GGML_ASSERT(hparams.n_expert > 0); + GGML_ASSERT(hparams.n_expert_used > 0); + + // MoE branch + for (uint32_t x = 0; x < hparams.n_expert; ++x) { + layer.ffn_gate_exp[x] = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE_EXP, "weight", i, x), {n_embd, n_ff}, backend_split); + layer.ffn_down_exp[x] = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN_EXP, "weight", i, x), { n_ff, n_embd}, backend_split); + layer.ffn_up_exp[x] = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP_EXP, "weight", i, x), {n_embd, n_ff}, backend_split); + } + } if (backend == GGML_BACKEND_GPU) { vram_weights += - ggml_nbytes(layer.attn_norm) + ggml_nbytes(layer.wq) + ggml_nbytes(layer.wk) + - 
ggml_nbytes(layer.wv) + ggml_nbytes(layer.wo) + ggml_nbytes(layer.ffn_norm) + - ggml_nbytes(layer.ffn_gate) + ggml_nbytes(layer.ffn_down) + ggml_nbytes(layer.ffn_up); + ggml_nbytes(layer.attn_norm) + ggml_nbytes(layer.wq) + ggml_nbytes(layer.wk) + + ggml_nbytes(layer.wv) + ggml_nbytes(layer.wo) + + (layer.bq ? ggml_nbytes(layer.bq) : 0) + + (layer.bk ? ggml_nbytes(layer.bk) : 0) + + (layer.bv ? ggml_nbytes(layer.bv) : 0) + + (layer.bo ? ggml_nbytes(layer.bo) : 0) + + ggml_nbytes(layer.ffn_norm); + + if (layer.ffn_gate_inp == nullptr) { + vram_weights += + ggml_nbytes(layer.ffn_gate) + ggml_nbytes(layer.ffn_down) + ggml_nbytes(layer.ffn_up); + } else { + vram_weights += ggml_nbytes(layer.ffn_gate_inp); + for (uint32_t x = 0; x < hparams.n_expert; ++x) { + vram_weights += + ggml_nbytes(layer.ffn_gate_exp[x]) + ggml_nbytes(layer.ffn_down_exp[x]) + ggml_nbytes(layer.ffn_up_exp[x]); + } + } } } } break; @@ -2686,14 +3117,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -2756,14 +3180,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -2833,14 +3250,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -2910,21 +3320,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { -#ifdef GGML_USE_CUBLAS - if (n_gpu_layers > int(n_layer + 1)) { - LLAMA_LOG_ERROR("%s: CUDA backend missing Persimmon CUDA ops, can offload at most %ld layers. See: https://github.com/ggerganov/llama.cpp/issues/4038\n", - __func__, n_layer + 1); - throw std::runtime_error("Persimmon CUDA offload failed"); - } -#endif - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? 
GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -2983,14 +3379,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -3061,14 +3450,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -3128,14 +3510,7 @@ static void llm_load_tensors( ggml_backend_type backend_output; if (n_gpu_layers > int(n_layer)) { - // norm is not performance relevant on its own but keeping it in VRAM reduces data copying - // on Windows however this is detrimental unless everything is on the GPU -#ifndef _WIN32 - backend_norm = llama_backend_offload; -#else - backend_norm = n_gpu_layers <= (int) n_layer + 2 ? GGML_BACKEND_CPU : llama_backend_offload; -#endif // _WIN32 - + backend_norm = llama_backend_offload; backend_output = llama_backend_offload_split; } else { backend_norm = GGML_BACKEND_CPU; @@ -3192,6 +3567,64 @@ static void llm_load_tensors( } } } break; + case LLM_ARCH_QWEN: + { + model.tok_embd = ml.create_tensor(ctx, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}, GGML_BACKEND_CPU); + { + ggml_backend_type backend_norm; + ggml_backend_type backend_output; + + if (n_gpu_layers > int(n_layer)) { + backend_norm = llama_backend_offload; + backend_output = llama_backend_offload_split; + } else { + backend_norm = GGML_BACKEND_CPU; + backend_output = GGML_BACKEND_CPU; + } + + model.output_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT_NORM, "weight"), {n_embd}, backend_norm); + model.output = ml.create_tensor(ctx, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, backend_output); + + if (backend_norm == GGML_BACKEND_GPU) { + vram_weights += ggml_nbytes(model.output_norm); + } + if (backend_output == GGML_BACKEND_GPU_SPLIT) { + vram_weights += ggml_nbytes(model.output); + } + } + + const uint32_t n_ff = hparams.n_ff / 2; + + const int i_gpu_start = n_layer - n_gpu_layers; + + model.layers.resize(n_layer); + + for (uint32_t i = 0; i < n_layer; ++i) { + const ggml_backend_type backend = int(i) < i_gpu_start ? GGML_BACKEND_CPU : llama_backend_offload; // NOLINT + const ggml_backend_type backend_split = int(i) < i_gpu_start ? 
GGML_BACKEND_CPU : llama_backend_offload_split; // NOLINT + + auto & layer = model.layers[i]; + + layer.attn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_NORM, "weight", i), {n_embd}, backend); + + layer.wqkv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_QKV, "weight", i), {n_embd, n_embd * 3}, backend_split); + layer.bqkv = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_QKV, "bias", i), {n_embd * 3}, backend); + layer.wo = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_OUT, "weight", i), {n_embd, n_embd}, backend_split); + + layer.ffn_norm = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_NORM, "weight", i), {n_embd}, backend); + + layer.ffn_gate = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_GATE, "weight", i), {n_embd, n_ff}, backend_split); + layer.ffn_down = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_DOWN, "weight", i), { n_ff, n_embd}, backend_split); + layer.ffn_up = ml.create_tensor(ctx, tn(LLM_TENSOR_FFN_UP, "weight", i), {n_embd, n_ff}, backend_split); + + if (backend == GGML_BACKEND_GPU) { + vram_weights += + ggml_nbytes(layer.attn_norm) + ggml_nbytes(layer.wqkv) + ggml_nbytes(layer.bqkv) + + ggml_nbytes(layer.wo) + ggml_nbytes(layer.ffn_norm) + ggml_nbytes(layer.ffn_gate) + + ggml_nbytes(layer.ffn_down) + ggml_nbytes(layer.ffn_up); + } + } + } break; default: throw std::runtime_error("unknown architecture"); @@ -3207,7 +3640,7 @@ static void llm_load_tensors( ctx_size + mmapped_size - vram_weights; // weights in VRAM not in memory - LLAMA_LOG_INFO("%s: mem required = %7.2f MB\n", __func__, mem_required / 1024.0 / 1024.0); + LLAMA_LOG_INFO("%s: mem required = %7.2f MiB\n", __func__, mem_required / 1024.0 / 1024.0); #if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST) const int n_gpu = std::min(n_gpu_layers, int(hparams.n_layer)); @@ -3218,15 +3651,15 @@ static void llm_load_tensors( } #ifdef GGML_USE_CUBLAS - const int max_backend_supported_layers = hparams.n_layer + 3; - const int max_offloadable_layers = hparams.n_layer + 3; + const int max_backend_supported_layers = hparams.n_layer + 1; + const int max_offloadable_layers = hparams.n_layer + 1; #elif GGML_USE_CLBLAST const int max_backend_supported_layers = hparams.n_layer + 1; const int max_offloadable_layers = hparams.n_layer + 1; #endif // GGML_USE_CUBLAS LLAMA_LOG_INFO("%s: offloaded %d/%d layers to GPU\n", __func__, std::min(n_gpu_layers, max_offloadable_layers), max_backend_supported_layers); - LLAMA_LOG_INFO("%s: VRAM used: %.2f MB\n", __func__, vram_weights / 1024.0 / 1024.0); + LLAMA_LOG_INFO("%s: VRAM used: %.2f MiB\n", __func__, vram_weights / 1024.0 / 1024.0); #else (void) n_gpu_layers; #endif // defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST) @@ -3260,7 +3693,7 @@ static void llm_load_tensors( static bool llama_model_load(const std::string & fname, llama_model & model, const llama_model_params & params) { try { - llama_model_loader ml(fname, params.use_mmap); + llama_model_loader ml(fname, params.use_mmap, params.kv_overrides); model.hparams.vocab_only = params.vocab_only; @@ -3356,7 +3789,7 @@ static void llm_build_k_shift( struct ggml_cgraph * graph, llm_rope_type type, int64_t n_ctx, - int64_t n_rot, + int n_rot, float freq_base, float freq_scale, const llm_build_cb & cb) { @@ -3387,11 +3820,11 @@ static void llm_build_k_shift( struct ggml_tensor * tmp = // we rotate only the first n_rot dimensions ggml_rope_custom_inplace(ctx, - ggml_view_3d(ctx, kv.k, - n_rot, n_head_kv, n_ctx, - ggml_element_size(kv.k)*n_embd_head, - ggml_element_size(kv.k)*n_embd_gqa, - ggml_element_size(kv.k)*n_embd_gqa*n_ctx*il), + ggml_view_3d(ctx, 
kv.k_l[il], + n_embd_head, n_head_kv, n_ctx, + ggml_type_sizef(kv.k_l[il]->type)*n_embd_head, + ggml_type_sizef(kv.k_l[il]->type)*n_embd_gqa, + 0), K_shift, n_rot, rope_type, 0, n_orig_ctx, freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow); cb(tmp, "K_shifted", il); @@ -3418,13 +3851,13 @@ static void llm_build_kv_store( //struct ggml_tensor * v_cur_t = ggml_transpose(ctx, v_cur); // TODO: reshape above is likely not needed cb(v_cur_t, "v_cur_t", il); - struct ggml_tensor * k_cache_view = ggml_view_1d(ctx, kv.k, n_tokens*n_embd_gqa, - (ggml_element_size(kv.k)*n_embd_gqa)*(il*n_ctx + kv_head)); + struct ggml_tensor * k_cache_view = ggml_view_1d(ctx, kv.k_l[il], n_tokens*n_embd_gqa, + (ggml_type_sizef(kv.k_l[il]->type)*n_embd_gqa)*kv_head); cb(k_cache_view, "k_cache_view", il); - struct ggml_tensor * v_cache_view = ggml_view_2d(ctx, kv.v, n_tokens, n_embd_gqa, - ( n_ctx)*ggml_element_size(kv.v), - (il*n_ctx)*ggml_element_size(kv.v)*n_embd_gqa + kv_head*ggml_element_size(kv.v)); + struct ggml_tensor * v_cache_view = ggml_view_2d(ctx, kv.v_l[il], n_tokens, n_embd_gqa, + ( n_ctx)*ggml_element_size(kv.v_l[il]), + (kv_head)*ggml_element_size(kv.v_l[il])); cb(v_cache_view, "v_cache_view", il); // important: storing RoPE-ed version of K in the KV cache! @@ -3576,40 +4009,46 @@ static struct ggml_tensor * llm_build_kqv( cb(q, "q", il); struct ggml_tensor * k = - ggml_view_3d(ctx, kv.k, + ggml_view_3d(ctx, kv.k_l[il], n_embd_head, n_kv, n_head_kv, - ggml_element_size(kv.k)*n_embd_gqa, - ggml_element_size(kv.k)*n_embd_head, - ggml_element_size(kv.k)*n_embd_gqa*n_ctx*il); + ggml_type_sizef(kv.k_l[il]->type)*n_embd_gqa, + ggml_type_sizef(kv.k_l[il]->type)*n_embd_head, + 0); cb(k, "k", il); struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q); cb(kq, "kq", il); - kq = ggml_scale(ctx, kq, kq_scale); - cb(kq, "kq_scaled", il); - if (max_alibi_bias > 0.0f) { - // TODO: n_head or n_head_kv - // TODO: K-shift is likely not working - // TODO: change to ggml_add - kq = ggml_alibi(ctx, kq, /*n_past*/ 0, n_head, max_alibi_bias); - cb(kq, "kq_scaled_alibi", il); - } + // temporary branch until we figure out how to handle ggml_alibi through ggml_add + kq = ggml_scale(ctx, kq, kq_scale); + cb(kq, "kq_scaled", il); - kq = ggml_add(ctx, kq, kq_mask); - cb(kq, "kq_masked", il); + if (max_alibi_bias > 0.0f) { + // TODO: n_head or n_head_kv + // TODO: K-shift is likely not working + // TODO: change to ggml_add + kq = ggml_alibi(ctx, kq, /*n_past*/ 0, n_head, max_alibi_bias); + cb(kq, "kq_scaled_alibi", il); + } - kq = ggml_soft_max(ctx, kq); - cb(kq, "kq_soft_max", il); + kq = ggml_add(ctx, kq, kq_mask); + cb(kq, "kq_masked", il); + + kq = ggml_soft_max(ctx, kq); + cb(kq, "kq_soft_max", il); + } else { + kq = ggml_soft_max_ext(ctx, kq, kq_mask, 1.0f/sqrtf(float(n_embd_head))); + cb(kq, "kq_soft_max_ext", il); + } // split cached v into n_head heads struct ggml_tensor * v = - ggml_view_3d(ctx, kv.v, + ggml_view_3d(ctx, kv.v_l[il], n_kv, n_embd_head, n_head_kv, - ggml_element_size(kv.v)*n_ctx, - ggml_element_size(kv.v)*n_ctx*n_embd_head, - ggml_element_size(kv.v)*n_ctx*n_embd_gqa*il); + ggml_element_size(kv.v_l[il])*n_ctx, + ggml_element_size(kv.v_l[il])*n_ctx*n_embd_head, + 0); cb(v, "v", il); struct ggml_tensor * kqv = ggml_mul_mat(ctx, v, kq); @@ -3647,6 +4086,8 @@ struct llm_build_context { const int64_t n_head_kv; const int64_t n_embd_head; const int64_t n_embd_gqa; + const int64_t n_expert; + const int64_t n_expert_used; const float freq_base; const float freq_scale; @@ -3688,6 +4129,8 @@ struct 
llm_build_context { n_head_kv (hparams.n_head_kv), n_embd_head (hparams.n_embd_head()), n_embd_gqa (hparams.n_embd_gqa()), + n_expert (hparams.n_expert), + n_expert_used (hparams.n_expert_used), freq_base (cparams.rope_freq_base), freq_scale (cparams.rope_freq_scale), ext_factor (cparams.yarn_ext_factor), @@ -3767,12 +4210,24 @@ struct llm_build_context { // compute Q and K and RoPE them struct ggml_tensor * Qcur = ggml_mul_mat(ctx0, model.layers[il].wq, cur); cb(Qcur, "Qcur", il); + if (model.layers[il].bq) { + Qcur = ggml_add(ctx0, Qcur, model.layers[il].bq); + cb(Qcur, "Qcur", il); + } struct ggml_tensor * Kcur = ggml_mul_mat(ctx0, model.layers[il].wk, cur); cb(Kcur, "Kcur", il); + if (model.layers[il].bk) { + Kcur = ggml_add(ctx0, Kcur, model.layers[il].bk); + cb(Kcur, "Kcur", il); + } struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv, cur); cb(Vcur, "Vcur", il); + if (model.layers[il].bv) { + Vcur = ggml_add(ctx0, Vcur, model.layers[il].bv); + cb(Vcur, "Vcur", il); + } Qcur = ggml_rope_custom( ctx0, ggml_reshape_3d(ctx0, Qcur, n_embd_head, n_head, n_tokens), inp_pos, @@ -3791,7 +4246,7 @@ struct llm_build_context { llm_build_kv_store(ctx0, hparams, kv_self, gf, Kcur, Vcur, n_ctx, n_tokens, kv_head, cb, il); cur = llm_build_kqv(ctx0, hparams, kv_self, - model.layers[il].wo, NULL, + model.layers[il].wo, model.layers[il].bo, Qcur, KQ_scale, KQ_mask, n_ctx, n_tokens, n_kv, -1.0f, cb, il); cb(cur, "kqv_out", il); } @@ -3800,7 +4255,7 @@ struct llm_build_context { cb(ffn_inp, "ffn_inp", il); // feed-forward network - { + if (model.layers[il].ffn_gate_inp == nullptr) { cur = llm_build_norm(ctx0, ffn_inp, hparams, model.layers[il].ffn_norm, NULL, LLM_NORM_RMS, cb, il); @@ -3812,6 +4267,69 @@ struct llm_build_context { model.layers[il].ffn_down, NULL, LLM_FFN_SILU, LLM_FFN_PAR, cb, il); cb(cur, "ffn_out", il); + } else { + // MoE branch + cur = llm_build_norm(ctx0, ffn_inp, hparams, + model.layers[il].ffn_norm, NULL, + LLM_NORM_RMS, cb, il); + cb(cur, "ffn_norm", il); + + ggml_tensor * logits = ggml_mul_mat(ctx0, model.layers[il].ffn_gate_inp, cur); // [n_tokens, num_experts] + cb(logits, "ffn_moe_logits", il); + + ggml_tensor * probs = ggml_soft_max(ctx0, logits); // [n_tokens, num_experts] + cb(probs, "ffn_moe_probs", il); + + // select experts + ggml_tensor * selected_experts = ggml_top_k(ctx0, probs, n_expert_used); // [n_tokens, num_experts_per_tok] + cb(selected_experts->src[0], "ffn_moe_argsort", il); + + ggml_tensor * weights = ggml_get_rows(ctx0, + ggml_reshape_3d(ctx0, probs, 1, n_expert, n_tokens), selected_experts); + cb(weights, "ffn_moe_weights", il); + + weights = ggml_reshape_2d(ctx0, weights, n_expert_used, n_tokens); // [n_tokens, num_experts_per_tok] + + ggml_tensor * weights_sum = ggml_sum_rows(ctx0, weights); + cb(weights_sum, "ffn_moe_weights_sum", il); + + weights = ggml_div(ctx0, weights, weights_sum); // [n_tokens, num_experts_per_tok] + cb(weights, "ffn_moe_weights_norm", il); + + // compute expert outputs + ggml_tensor * moe_out = nullptr; + + for (int i = 0; i < n_expert_used; ++i) { + ggml_tensor * cur_expert; + + ggml_tensor * cur_up = ggml_mul_mat_id(ctx0, model.layers[il].ffn_up_exp, n_expert, selected_experts, i, cur); + cb(cur_up, "ffn_moe_up", il); + + ggml_tensor * cur_gate = ggml_mul_mat_id(ctx0, model.layers[il].ffn_gate_exp, n_expert, selected_experts, i, cur); + cb(cur_gate, "ffn_moe_gate", il); + + cur_gate = ggml_silu(ctx0, cur_gate); + cb(cur_gate, "ffn_moe_silu", il); + + cur_expert = ggml_mul(ctx0, cur_up, cur_gate); // [n_tokens, 
n_embd] + cb(cur_expert, "ffn_moe_gate_par", il); + + cur_expert = ggml_mul_mat_id(ctx0, model.layers[il].ffn_down_exp, n_expert, selected_experts, i, cur_expert); // [n_tokens, n_embd] + cb(cur_expert, "ffn_moe_down", il); + + cur_expert = ggml_mul(ctx0, cur_expert, + ggml_view_2d(ctx0, weights, 1, n_tokens, weights->nb[1], i*weights->nb[0])); + cb(cur_expert, "ffn_moe_weighted", il); + + if (i == 0) { + moe_out = cur_expert; + } else { + moe_out = ggml_add(ctx0, moe_out, cur_expert); + cb(moe_out, "ffn_moe_out", il); + } + } + + cur = moe_out; } cur = ggml_add(ctx0, cur, ffn_inp); @@ -4189,6 +4707,7 @@ struct llm_build_context { inpL = llm_build_inp_embd(ctx0, hparams, batch, model.tok_embd, cb); cb(inpL, "imp_embd", -1); + // inp_pos - contains the positions struct ggml_tensor * inp_pos = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens); cb(inp_pos, "inp_pos", -1); @@ -4196,6 +4715,7 @@ struct llm_build_context { struct ggml_tensor * KQ_scale = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); cb(KQ_scale, "KQ_scale", -1); + // KQ_mask (mask for 1 head, it will be broadcasted to all heads) struct ggml_tensor * KQ_mask = ggml_new_tensor_3d(ctx0, GGML_TYPE_F32, n_kv, n_tokens, 1); cb(KQ_mask, "KQ_mask", -1); @@ -4591,61 +5111,173 @@ struct llm_build_context { cb(KQ_mask, "KQ_mask", -1); for (int il = 0; il < n_layer; ++il) { - struct ggml_tensor * attn_norm; + struct ggml_tensor * attn_norm; + + attn_norm = llm_build_norm(ctx0, inpL, hparams, + model.layers[il].attn_norm, + NULL, + LLM_NORM, cb, il); + cb(attn_norm, "attn_norm", il); + + // self-attention + { + cur = attn_norm; + + cur = ggml_mul_mat(ctx0, model.layers[il].wqkv, cur); + cb(cur, "wqkv", il); + + if (hparams.f_clamp_kqv > 0.0f) { + cur = ggml_clamp(ctx0, cur, -hparams.f_clamp_kqv, hparams.f_clamp_kqv); + cb(cur, "wqkv_clamped", il); + } + + struct ggml_tensor * Qcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd, n_tokens, cur->nb[1], 0*sizeof(float)*(n_embd))); + struct ggml_tensor * Kcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd_gqa, n_tokens, cur->nb[1], 1*sizeof(float)*(n_embd))); + struct ggml_tensor * Vcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd_gqa, n_tokens, cur->nb[1], 1*sizeof(float)*(n_embd + n_embd_gqa))); + + cb(Qcur, "Qcur", il); + cb(Kcur, "Kcur", il); + cb(Vcur, "Vcur", il); + + Qcur = ggml_reshape_3d(ctx0, Qcur, n_embd_head, n_head, n_tokens); + + llm_build_kv_store(ctx0, hparams, kv_self, gf, Kcur, Vcur, n_ctx, n_tokens, kv_head, cb, il); + + cur = llm_build_kqv(ctx0, hparams, kv_self, + model.layers[il].wo, NULL, + Qcur, KQ_scale, KQ_mask, n_ctx, n_tokens, n_kv, hparams.f_max_alibi_bias, cb, il); + cb(cur, "kqv_out", il); + } + + // Add the input + struct ggml_tensor * ffn_inp = ggml_add(ctx0, cur, inpL); + cb(ffn_inp, "ffn_inp", il); + + // feed forward + { + cur = llm_build_norm(ctx0, ffn_inp, hparams, + model.layers[il].ffn_norm, + NULL, + LLM_NORM, cb, il); + cb(cur, "ffn_norm", il); + + cur = llm_build_ffn(ctx0, cur, + model.layers[il].ffn_up, NULL, + NULL, NULL, + model.layers[il].ffn_down, NULL, + LLM_FFN_GELU, LLM_FFN_SEQ, cb, il); + cb(cur, "ffn_out", il); + } + + cur = ggml_add(ctx0, cur, ffn_inp); + cb(cur, "l_out", il); + + // input for next layer + inpL = cur; + } + + cur = inpL; + + cur = llm_build_norm(ctx0, cur, hparams, + model.output_norm, + NULL, + LLM_NORM, cb, -1); + cb(cur, "result_norm", -1); + + cur = ggml_mul_mat(ctx0, model.output, cur); + cb(cur, "result_output", -1); + + ggml_build_forward_expand(gf, cur); + + return gf; + } + + struct ggml_cgraph * 
build_stablelm() { + struct ggml_cgraph * gf = ggml_new_graph(ctx0); + + struct ggml_tensor * cur; + struct ggml_tensor * inpL; + + inpL = llm_build_inp_embd(ctx0, hparams, batch, model.tok_embd, cb); + cb(inpL, "inp_embd", -1); + + // inp_pos - contains the positions + struct ggml_tensor * inp_pos = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens); + cb(inp_pos, "inp_pos", -1); + + // KQ_scale + struct ggml_tensor * KQ_scale = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); + cb(KQ_scale, "KQ_scale", -1); + + // KQ_mask (mask for 1 head, it will be broadcasted to all heads) + struct ggml_tensor * KQ_mask = ggml_new_tensor_3d(ctx0, GGML_TYPE_F32, n_kv, n_tokens, 1); + cb(KQ_mask, "KQ_mask", -1); + + // shift the entire K-cache if needed + if (do_rope_shift) { + llm_build_k_shift(ctx0, hparams, cparams, kv_self, gf, LLM_ROPE_NEOX, n_ctx, hparams.n_rot, freq_base, freq_scale, cb); + } + + for (int il = 0; il < n_layer; ++il) { + struct ggml_tensor * inpSA = inpL; - attn_norm = llm_build_norm(ctx0, inpL, hparams, + // norm + cur = llm_build_norm(ctx0, inpL, hparams, model.layers[il].attn_norm, - NULL, + model.layers[il].attn_norm_b, LLM_NORM, cb, il); - cb(attn_norm, "attn_norm", il); + cb(cur, "attn_norm", il); // self-attention { - cur = attn_norm; - - cur = ggml_mul_mat(ctx0, model.layers[il].wqkv, cur); - cb(cur, "wqkv", il); + // compute Q and K and RoPE them + struct ggml_tensor * Qcur = ggml_mul_mat(ctx0, model.layers[il].wq, cur); + cb(Qcur, "Qcur", il); - if (hparams.f_clamp_kqv > 0.0f) { - cur = ggml_clamp(ctx0, cur, -hparams.f_clamp_kqv, hparams.f_clamp_kqv); - cb(cur, "wqkv_clamped", il); - } + struct ggml_tensor * Kcur = ggml_mul_mat(ctx0, model.layers[il].wk, cur); + cb(Kcur, "Kcur", il); - struct ggml_tensor * Qcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd, n_tokens, cur->nb[1], 0*sizeof(float)*(n_embd))); - struct ggml_tensor * Kcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd_gqa, n_tokens, cur->nb[1], 1*sizeof(float)*(n_embd))); - struct ggml_tensor * Vcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd_gqa, n_tokens, cur->nb[1], 1*sizeof(float)*(n_embd + n_embd_gqa))); + struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv, cur); + cb(Vcur, "Vcur", il); + Qcur = ggml_rope_custom( + ctx0, ggml_reshape_3d(ctx0, Qcur, n_embd_head, n_head, n_tokens), inp_pos, + hparams.n_rot, 2, 0, n_orig_ctx, freq_base, freq_scale, + ext_factor, attn_factor, beta_fast, beta_slow + ); cb(Qcur, "Qcur", il); - cb(Kcur, "Kcur", il); - cb(Vcur, "Vcur", il); - Qcur = ggml_reshape_3d(ctx0, Qcur, n_embd_head, n_head, n_tokens); + Kcur = ggml_rope_custom( + ctx0, ggml_reshape_3d(ctx0, Kcur, n_embd_head, n_head_kv, n_tokens), inp_pos, + hparams.n_rot, 2, 0, n_orig_ctx, freq_base, freq_scale, + ext_factor, attn_factor, beta_fast, beta_slow + ); + cb(Kcur, "Kcur", il); llm_build_kv_store(ctx0, hparams, kv_self, gf, Kcur, Vcur, n_ctx, n_tokens, kv_head, cb, il); cur = llm_build_kqv(ctx0, hparams, kv_self, model.layers[il].wo, NULL, - Qcur, KQ_scale, KQ_mask, n_ctx, n_tokens, n_kv, hparams.f_max_alibi_bias, cb, il); + Qcur, KQ_scale, KQ_mask, n_ctx, n_tokens, n_kv, -1.0f, cb, il); cb(cur, "kqv_out", il); } - // Add the input - struct ggml_tensor * ffn_inp = ggml_add(ctx0, cur, inpL); + struct ggml_tensor * ffn_inp = ggml_add(ctx0, cur, inpSA); cb(ffn_inp, "ffn_inp", il); - // feed forward + // feed-forward network { cur = llm_build_norm(ctx0, ffn_inp, hparams, model.layers[il].ffn_norm, - NULL, + model.layers[il].ffn_norm_b, LLM_NORM, cb, il); cb(cur, "ffn_norm", il); cur = 
llm_build_ffn(ctx0, cur, model.layers[il].ffn_up, NULL, - NULL, NULL, + model.layers[il].ffn_gate, NULL, model.layers[il].ffn_down, NULL, - LLM_FFN_GELU, LLM_FFN_SEQ, cb, il); + LLM_FFN_SILU, LLM_FFN_PAR, cb, il); cb(cur, "ffn_out", il); } @@ -4660,10 +5292,11 @@ struct llm_build_context { cur = llm_build_norm(ctx0, cur, hparams, model.output_norm, - NULL, + model.output_norm_b, LLM_NORM, cb, -1); cb(cur, "result_norm", -1); + // lm_head cur = ggml_mul_mat(ctx0, model.output, cur); cb(cur, "result_output", -1); @@ -4672,8 +5305,8 @@ struct llm_build_context { return gf; } - struct ggml_cgraph * build_stablelm() { - struct ggml_cgraph * gf = ggml_new_graph(ctx0); + struct ggml_cgraph * build_qwen() { + struct ggml_cgraph * gf = ggml_new_graph_custom(ctx0, LLAMA_MAX_NODES, false); struct ggml_tensor * cur; struct ggml_tensor * inpL; @@ -4682,133 +5315,78 @@ struct llm_build_context { cb(inpL, "inp_embd", -1); // inp_pos - contains the positions - struct ggml_tensor * inp_pos = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens); + struct ggml_tensor * inp_pos= ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, n_tokens); cb(inp_pos, "inp_pos", -1); // KQ_scale - struct ggml_tensor * KQ_scale = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); + struct ggml_tensor * KQ_scale= ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1); cb(KQ_scale, "KQ_scale", -1); // KQ_mask (mask for 1 head, it will be broadcasted to all heads) - struct ggml_tensor * KQ_mask = ggml_new_tensor_3d(ctx0, GGML_TYPE_F32, n_kv, n_tokens, 1); + struct ggml_tensor * KQ_mask= ggml_new_tensor_3d(ctx0, GGML_TYPE_F32, n_kv, n_tokens, 1); cb(KQ_mask, "KQ_mask", -1); // shift the entire K-cache if needed if (do_rope_shift) { - llm_build_k_shift(ctx0, hparams, cparams, kv_self, gf, LLM_ROPE_NEOX, n_ctx, hparams.n_rot, freq_base, freq_scale, cb); + llm_build_k_shift(ctx0, hparams, cparams, kv_self, gf, LLM_ROPE_NEOX, n_ctx, n_embd_head, freq_base, freq_scale, cb); } for (int il = 0; il < n_layer; ++il) { struct ggml_tensor * inpSA = inpL; - // norm cur = llm_build_norm(ctx0, inpL, hparams, - model.layers[il].attn_norm, - model.layers[il].attn_norm_b, - LLM_NORM, cb, il); + model.layers[il].attn_norm, NULL, + LLM_NORM_RMS, cb, il); cb(cur, "attn_norm", il); // self-attention { - // compute Q and K and RoPE them - struct ggml_tensor * tmpq = ggml_mul_mat(ctx0, model.layers[il].wq, cur); - cb(tmpq, "tmpq", il); - - struct ggml_tensor * tmpk = ggml_mul_mat(ctx0, model.layers[il].wk, cur); - cb(tmpk, "tmpk", il); - - struct ggml_tensor * Vcur = ggml_mul_mat(ctx0, model.layers[il].wv, cur); - cb(Vcur, "Vcur", il); + cur = ggml_mul_mat(ctx0, model.layers[il].wqkv, cur); + cb(cur, "wqkv", il); - // RoPE the first n_rot of q/k, pass the other half, and concat. 
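Aside, not part of the patch: "RoPE the first n_rot of q/k, pass the other half" in the comment above means only the first n_rot dimensions of each head receive rotary position embedding; dims [n_rot, n_embd_head) pass through untouched (the "qpass"/"kpass" halves in the removed code that follows) and are concatenated back afterwards. A minimal plain-C++ sketch of that per-head split, assuming the adjacent-pair rotation convention (the removed code actually goes through ggml's NeoX rope mode, which pairs dim i with i + n_rot/2; the helper name and layout here are hypothetical):

    #include <cmath>

    // Rotate only the first n_rot dims of one head; leave the remaining
    // dims in [n_rot, n_embd_head) exactly as-is (the pass-through half).
    static void partial_rope_head(float * head, int n_embd_head, int n_rot,
                                  int pos, float freq_base) {
        for (int i = 0; i < n_rot; i += 2) {
            const float theta = pos * std::pow(freq_base, -(float) i / n_rot);
            const float c = std::cos(theta);
            const float s = std::sin(theta);
            const float x0 = head[i + 0];
            const float x1 = head[i + 1];
            head[i + 0] = x0*c - x1*s;
            head[i + 1] = x0*s + x1*c;
        }
        (void) n_embd_head; // dims >= n_rot are intentionally untouched
    }
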
- struct ggml_tensor * qrot = ggml_cont(ctx0, ggml_view_3d( - ctx0, tmpq, hparams.n_rot, n_head, n_tokens, - ggml_element_size(tmpq) * n_embd_head, - ggml_element_size(tmpq) * n_embd_head * n_head, - 0 - )); - cb(qrot, "qrot", il); + cur = ggml_add(ctx0, cur, model.layers[il].bqkv); + cb(cur, "bqkv", il); - struct ggml_tensor * krot = ggml_cont(ctx0, ggml_view_3d( - ctx0, tmpk, hparams.n_rot, n_head, n_tokens, - ggml_element_size(tmpk) * n_embd_head, - ggml_element_size(tmpk) * n_embd_head * n_head_kv, - 0 - )); - cb(krot, "krot", il); + struct ggml_tensor * Qcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd, n_tokens, cur->nb[1], 0*sizeof(float)*(n_embd))); + struct ggml_tensor * Kcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd, n_tokens, cur->nb[1], 1*sizeof(float)*(n_embd))); + struct ggml_tensor * Vcur = ggml_cont(ctx0, ggml_view_2d(ctx0, cur, n_embd, n_tokens, cur->nb[1], 2*sizeof(float)*(n_embd))); - // get the second half of tmpq, e.g tmpq[n_rot:, :, :] - struct ggml_tensor * qpass = ggml_view_3d( - ctx0, tmpq, (n_embd_head - hparams.n_rot), n_head, n_tokens, - ggml_element_size(tmpq) * n_embd_head, - ggml_element_size(tmpq) * n_embd_head * n_head, - ggml_element_size(tmpq) * hparams.n_rot - ); - cb(qpass, "qpass", il); + cb(Qcur, "Qcur", il); + cb(Kcur, "Kcur", il); + cb(Vcur, "Vcur", il); - struct ggml_tensor * kpass = ggml_view_3d( - ctx0, tmpk, (n_embd_head - hparams.n_rot), n_head_kv, n_tokens, - ggml_element_size(tmpk) * (n_embd_head), - ggml_element_size(tmpk) * (n_embd_head) * n_head_kv, - ggml_element_size(tmpk) * hparams.n_rot - ); - cb(kpass, "kpass", il); + Qcur = ggml_reshape_3d(ctx0, Qcur, n_embd_head, n_head, n_tokens); + Kcur = ggml_reshape_3d(ctx0, Kcur, n_embd_head, n_head_kv, n_tokens); - struct ggml_tensor * qrotated = ggml_rope_custom( - ctx0, qrot, inp_pos, hparams.n_rot, 2, 0, n_orig_ctx, + // using mode = 2 for neox mode + Qcur = ggml_rope_custom( + ctx0, Qcur, inp_pos, n_embd_head, 2, 0, n_orig_ctx, freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow ); - cb(qrotated, "qrotated", il); + cb(Qcur, "Qcur", il); - struct ggml_tensor * krotated = ggml_rope_custom( - ctx0, krot, inp_pos, hparams.n_rot, 2, 0, n_orig_ctx, + Kcur = ggml_rope_custom( + ctx0, Kcur, inp_pos, n_embd_head, 2, 0, n_orig_ctx, freq_base, freq_scale, ext_factor, attn_factor, beta_fast, beta_slow ); - cb(krotated, "krotated", il); - - // ggml currently only supports concatenation on dim=2 - // so we need to permute qrot, qpass, concat, then permute back. 
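Aside, not part of the patch: the dim-2 restriction noted in the comment above is why the removed code permutes before and after concatenating. In isolation the workaround is: move dim 0 into position 2 on both operands, call ggml_concat (which, at the time of this patch, only joins tensors along dimension 2), then permute the result back. A sketch under that assumption (the helper name is hypothetical):

    #include "ggml.h"

    // Concatenate a and b along dim 0 using a dim-2-only ggml_concat.
    static struct ggml_tensor * concat_dim0(struct ggml_context * ctx,
                                            struct ggml_tensor * a,
                                            struct ggml_tensor * b) {
        a = ggml_cont(ctx, ggml_permute(ctx, a, 2, 1, 0, 3)); // dim0 <-> dim2
        b = ggml_cont(ctx, ggml_permute(ctx, b, 2, 1, 0, 3));
        struct ggml_tensor * c = ggml_concat(ctx, a, b);      // joins dim 2
        return ggml_cont(ctx, ggml_permute(ctx, c, 2, 1, 0, 3)); // undo
    }

As in the removed code, each permuted view is made contiguous with ggml_cont, since ggml_permute only rewrites strides.
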
- qrotated = ggml_cont(ctx0, ggml_permute(ctx0, qrotated, 2, 1, 0, 3)); - cb(qrotated, "qrotated", il); - - krotated = ggml_cont(ctx0, ggml_permute(ctx0, krotated, 2, 1, 0, 3)); - cb(krotated, "krotated", il); - - qpass = ggml_cont(ctx0, ggml_permute(ctx0, qpass, 2, 1, 0, 3)); - cb(qpass, "qpass", il); - - kpass = ggml_cont(ctx0, ggml_permute(ctx0, kpass, 2, 1, 0, 3)); - cb(kpass, "kpass", il); - - struct ggml_tensor * Qcur = ggml_concat(ctx0, qrotated, qpass); - cb(Qcur, "Qcur", il); - - struct ggml_tensor * Kcur = ggml_concat(ctx0, krotated, kpass); - cb(Kcur, "Kcur", il); - - struct ggml_tensor * Q = ggml_cont(ctx0, ggml_permute(ctx0, Qcur, 2, 1, 0, 3)); - cb(Q, "Q", il); - - Kcur = ggml_cont(ctx0, ggml_permute(ctx0, Kcur, 2, 1, 0, 3)); cb(Kcur, "Kcur", il); llm_build_kv_store(ctx0, hparams, kv_self, gf, Kcur, Vcur, n_ctx, n_tokens, kv_head, cb, il); cur = llm_build_kqv(ctx0, hparams, kv_self, model.layers[il].wo, NULL, - Q, KQ_scale, KQ_mask, n_ctx, n_tokens, n_kv, -1.0f, cb, il); + Qcur, KQ_scale, KQ_mask, n_ctx, n_tokens, n_kv, -1.0f, cb, il); cb(cur, "kqv_out", il); } struct ggml_tensor * ffn_inp = ggml_add(ctx0, cur, inpSA); cb(ffn_inp, "ffn_inp", il); - // feed-forward network + // feed-forward forward { cur = llm_build_norm(ctx0, ffn_inp, hparams, - model.layers[il].ffn_norm, - model.layers[il].ffn_norm_b, - LLM_NORM, cb, il); + model.layers[il].ffn_norm, NULL, + LLM_NORM_RMS, cb, il); cb(cur, "ffn_norm", il); cur = llm_build_ffn(ctx0, cur, @@ -4829,9 +5407,8 @@ struct llm_build_context { cur = inpL; cur = llm_build_norm(ctx0, cur, hparams, - model.output_norm, - model.output_norm_b, - LLM_NORM, cb, -1); + model.output_norm, NULL, + LLM_NORM_RMS, cb, -1); cb(cur, "result_norm", -1); // lm_head @@ -4852,8 +5429,8 @@ struct llm_build_context { enum llm_offload_func_e { OFFLOAD_FUNC_NOP, OFFLOAD_FUNC, - OFFLOAD_FUNC_KQ, - OFFLOAD_FUNC_V, + OFFLOAD_FUNC_FRC, // force offload + OFFLOAD_FUNC_KQV, OFFLOAD_FUNC_NR, OFFLOAD_FUNC_EMB, OFFLOAD_FUNC_OUT, @@ -4939,11 +5516,12 @@ static const std::unordered_map k_offload_map //{ "inp_embd", OFFLOAD_FUNC_NR }, // TODO: missing K-quants get_rows kernel { "pos_embd", OFFLOAD_FUNC_NR }, - { "inp_pos", OFFLOAD_FUNC_KQ }, // this is often used for KQ ops (e.g. rope) - { "KQ_scale", OFFLOAD_FUNC_KQ }, - { "KQ_mask", OFFLOAD_FUNC_KQ }, - { "K_shift", OFFLOAD_FUNC_KQ }, - { "K_shifted", OFFLOAD_FUNC_KQ }, + { "inp_pos", OFFLOAD_FUNC_FRC }, // this is often used for KQ ops (e.g. 
rope) + { "KQ_scale", OFFLOAD_FUNC_FRC }, + { "KQ_mask", OFFLOAD_FUNC_FRC }, + { "K_shift", OFFLOAD_FUNC_FRC }, + + { "K_shifted", OFFLOAD_FUNC }, { "inp_norm", OFFLOAD_FUNC_NR }, { "inp_norm_w", OFFLOAD_FUNC_NR }, @@ -4956,37 +5534,38 @@ static const std::unordered_map k_offload_map { "attn_norm", OFFLOAD_FUNC }, { "attn_norm_2", OFFLOAD_FUNC }, - { "wqkv", OFFLOAD_FUNC_KQ }, - { "bqkv", OFFLOAD_FUNC_KQ }, - { "wqkv_clamped", OFFLOAD_FUNC_KQ }, - - { "tmpk", OFFLOAD_FUNC_KQ }, - { "tmpq", OFFLOAD_FUNC_KQ }, - { "tmpv", OFFLOAD_FUNC_V }, - { "Kcur", OFFLOAD_FUNC_KQ }, - { "Qcur", OFFLOAD_FUNC_KQ }, - { "Vcur", OFFLOAD_FUNC_V }, - - { "krot", OFFLOAD_FUNC_KQ }, - { "qrot", OFFLOAD_FUNC_KQ }, - { "kpass", OFFLOAD_FUNC_KQ }, - { "qpass", OFFLOAD_FUNC_KQ }, - { "krotated", OFFLOAD_FUNC_KQ }, - { "qrotated", OFFLOAD_FUNC_KQ }, - - { "q", OFFLOAD_FUNC_KQ }, - { "k", OFFLOAD_FUNC_KQ }, - { "kq", OFFLOAD_FUNC_KQ }, - { "kq_scaled", OFFLOAD_FUNC_KQ }, - { "kq_scaled_alibi", OFFLOAD_FUNC_KQ }, - { "kq_masked", OFFLOAD_FUNC_KQ }, - { "kq_soft_max", OFFLOAD_FUNC_V }, - { "v", OFFLOAD_FUNC_V }, - { "kqv", OFFLOAD_FUNC_V }, - { "kqv_merged", OFFLOAD_FUNC_V }, - { "kqv_merged_cont", OFFLOAD_FUNC_V }, - { "kqv_wo", OFFLOAD_FUNC_V }, - { "kqv_out", OFFLOAD_FUNC_V }, + { "wqkv", OFFLOAD_FUNC_KQV }, + { "bqkv", OFFLOAD_FUNC_KQV }, + { "wqkv_clamped", OFFLOAD_FUNC_KQV }, + + { "tmpk", OFFLOAD_FUNC_KQV }, + { "tmpq", OFFLOAD_FUNC_KQV }, + { "tmpv", OFFLOAD_FUNC_KQV }, + { "Kcur", OFFLOAD_FUNC_KQV }, + { "Qcur", OFFLOAD_FUNC_KQV }, + { "Vcur", OFFLOAD_FUNC_KQV }, + + { "krot", OFFLOAD_FUNC_KQV }, + { "qrot", OFFLOAD_FUNC_KQV }, + { "kpass", OFFLOAD_FUNC_KQV }, + { "qpass", OFFLOAD_FUNC_KQV }, + { "krotated", OFFLOAD_FUNC_KQV }, + { "qrotated", OFFLOAD_FUNC_KQV }, + + { "q", OFFLOAD_FUNC_KQV }, + { "k", OFFLOAD_FUNC_KQV }, + { "kq", OFFLOAD_FUNC_KQV }, + { "kq_scaled", OFFLOAD_FUNC_KQV }, + { "kq_scaled_alibi", OFFLOAD_FUNC_KQV }, + { "kq_masked", OFFLOAD_FUNC_KQV }, + { "kq_soft_max", OFFLOAD_FUNC_KQV }, + { "kq_soft_max_ext", OFFLOAD_FUNC_KQV }, + { "v", OFFLOAD_FUNC_KQV }, + { "kqv", OFFLOAD_FUNC_KQV }, + { "kqv_merged", OFFLOAD_FUNC_KQV }, + { "kqv_merged_cont", OFFLOAD_FUNC_KQV }, + { "kqv_wo", OFFLOAD_FUNC_KQV }, + { "kqv_out", OFFLOAD_FUNC_KQV }, { "ffn_inp", OFFLOAD_FUNC }, { "ffn_norm", OFFLOAD_FUNC }, @@ -5005,6 +5584,20 @@ static const std::unordered_map k_offload_map { "ffn_relu", OFFLOAD_FUNC }, { "ffn_sqr(relu)", OFFLOAD_FUNC }, + { "ffn_moe_logits", OFFLOAD_FUNC }, + { "ffn_moe_probs", OFFLOAD_FUNC }, + { "ffn_moe_argsort", OFFLOAD_FUNC }, + { "ffn_moe_weights", OFFLOAD_FUNC }, + { "ffn_moe_weights_sum", OFFLOAD_FUNC }, + { "ffn_moe_weights_norm", OFFLOAD_FUNC }, + { "ffn_moe_weighted", OFFLOAD_FUNC }, + { "ffn_moe_up", OFFLOAD_FUNC }, + { "ffn_moe_gate", OFFLOAD_FUNC }, + { "ffn_moe_silu", OFFLOAD_FUNC }, + { "ffn_moe_gate_par", OFFLOAD_FUNC }, + { "ffn_moe_down", OFFLOAD_FUNC }, + { "ffn_moe_out", OFFLOAD_FUNC }, + { "l_out", OFFLOAD_FUNC }, { "result_norm", OFFLOAD_FUNC_EMB }, @@ -5178,15 +5771,15 @@ static struct ggml_cgraph * llama_build_graph( { OFFLOAD_FUNC_NOP, "CPU" }, { OFFLOAD_FUNC_OUT, "CPU" }, #ifdef GGML_USE_CUBLAS - { OFFLOAD_FUNC, "GPU (CUDA)" }, - { OFFLOAD_FUNC_KQ, "GPU (CUDA) KQ" }, - { OFFLOAD_FUNC_V, "GPU (CUDA) V" }, - { OFFLOAD_FUNC_NR, "GPU (CUDA) NR" }, + { OFFLOAD_FUNC, "GPU (CUDA)" }, + { OFFLOAD_FUNC_FRC, "GPU (CUDA) FRC" }, + { OFFLOAD_FUNC_KQV, "GPU (CUDA) KQV" }, + { OFFLOAD_FUNC_NR, "GPU (CUDA) NR" }, { OFFLOAD_FUNC_EMB, "GPU (CUDA) EMB" }, #else { OFFLOAD_FUNC, 
"CPU" }, - { OFFLOAD_FUNC_KQ, "CPU" }, - { OFFLOAD_FUNC_V, "CPU" }, + { OFFLOAD_FUNC_FRC, "CPU" }, + { OFFLOAD_FUNC_KQV, "CPU" }, { OFFLOAD_FUNC_NR, "CPU" }, { OFFLOAD_FUNC_EMB, "CPU" }, #endif // GGML_USE_CUBLAS @@ -5219,18 +5812,23 @@ static struct ggml_cgraph * llama_build_graph( } } break; - case OFFLOAD_FUNC_NR: - if (n_gpu_layers <= n_layer + 0) { + case OFFLOAD_FUNC_FRC: + if (!lctx.cparams.offload_kqv) { func_e = OFFLOAD_FUNC_NOP; - } - break; - case OFFLOAD_FUNC_V: - if (n_gpu_layers <= n_layer + 1) { + } break; + case OFFLOAD_FUNC_KQV: + if (!lctx.cparams.offload_kqv) { func_e = OFFLOAD_FUNC_NOP; + } else { + if (n_gpu_layers < n_layer) { + if (il < i_gpu_start) { + func_e = OFFLOAD_FUNC_NOP; + } + } } break; - case OFFLOAD_FUNC_KQ: - if (n_gpu_layers <= n_layer + 2) { + case OFFLOAD_FUNC_NR: + if (n_gpu_layers <= n_layer + 0) { func_e = OFFLOAD_FUNC_NOP; } break; @@ -5255,8 +5853,8 @@ static struct ggml_cgraph * llama_build_graph( case OFFLOAD_FUNC_NOP: case OFFLOAD_FUNC_OUT: func = ggml_offload_nop; break; case OFFLOAD_FUNC: - case OFFLOAD_FUNC_KQ: - case OFFLOAD_FUNC_V: + case OFFLOAD_FUNC_KQV: + case OFFLOAD_FUNC_FRC: case OFFLOAD_FUNC_NR: case OFFLOAD_FUNC_EMB: func = ggml_offload_gpu; break; default: GGML_ASSERT(false); @@ -5315,6 +5913,10 @@ static struct ggml_cgraph * llama_build_graph( { result = llm.build_stablelm(); } break; + case LLM_ARCH_QWEN: + { + result = llm.build_qwen(); + } break; default: GGML_ASSERT(false); } @@ -5392,7 +5994,7 @@ static int llama_decode_internal( const int64_t n_embd = hparams.n_embd; const int64_t n_vocab = hparams.n_vocab; - // helpers for smoother batch API transistion + // helpers for smoother batch API transition // after deprecating the llama_eval calls, these will be removed std::vector pos; @@ -5424,6 +6026,12 @@ static int llama_decode_internal( batch.seq_id = seq_id_arr.data(); } + // if we have enough unused cells before the current head -> + // better to start searching from the beginning of the cache, hoping to fill it + if (kv_self.head > kv_self.used + 2*n_tokens) { + kv_self.head = 0; + } + if (!llama_kv_cache_find_slot(kv_self, batch)) { return 1; } @@ -5431,10 +6039,10 @@ static int llama_decode_internal( // a heuristic, to avoid attending the full cache if it is not yet utilized // after enough generations, the benefit from this heuristic disappears // if we start defragmenting the cache, the benefit from this will be more important - //kv_self.n = std::max(32, GGML_PAD(llama_kv_cache_cell_max(kv_self), 32)); // TODO: this might be better for CUDA? - kv_self.n = std::min((int32_t) cparams.n_ctx, std::max(32, llama_kv_cache_cell_max(kv_self))); + kv_self.n = std::min((int32_t) cparams.n_ctx, std::max(32, GGML_PAD(llama_kv_cache_cell_max(kv_self), 32))); + //kv_self.n = llama_kv_cache_cell_max(kv_self); - //printf("kv_self.n = %d\n", kv_self.n); + //printf("kv_self.n = %5d, kv_self.used = %5d, kv_self.head = %5d\n", kv_self.n, kv_self.used, kv_self.head); ggml_allocr_reset(lctx.alloc); @@ -5483,18 +6091,8 @@ static int llama_decode_internal( n_threads = std::min(4, n_threads); } - // If all tensors can be run on the GPU then using more than 1 thread is detrimental. 
- const bool full_offload_supported = - model.arch == LLM_ARCH_LLAMA || - model.arch == LLM_ARCH_BAICHUAN || - model.arch == LLM_ARCH_FALCON || - model.arch == LLM_ARCH_REFACT || - model.arch == LLM_ARCH_MPT || - model.arch == LLM_ARCH_STARCODER || - model.arch == LLM_ARCH_STABLELM; - - const bool fully_offloaded = model.n_gpu_layers >= (int) hparams.n_layer + 3; - if (ggml_cpu_has_cublas() && full_offload_supported && fully_offloaded) { + const bool fully_offloaded = model.n_gpu_layers >= (int) hparams.n_layer + 1; + if (ggml_cpu_has_cublas() && fully_offloaded) { n_threads = 1; } @@ -6175,12 +6773,12 @@ static void tokenizer_st_partition(const llama_vocab & vocab, std::forward_list< // loop over the text while (true) { - // find the first occurence of a given special token in this fragment + // find the first occurrence of a given special token in this fragment // passing offset argument only limit the "search area" but match coordinates // are still relative to the source full raw_text auto match = raw_text->find(special_token, raw_text_base_offset); - // no occurences found, stop processing this fragment for a given special token + // no occurrences found, stop processing this fragment for a given special token if (match == std::string::npos) break; // check if match is within bounds of offset <-> length @@ -6283,7 +6881,10 @@ static std::vector llama_tokenize_internal(const llama_vocab & // by modifying llm_tokenizer_x to operate with string offsets like pre-tokenizer // and passing 'add space prefix' as bool argument // - auto raw_text = (special ? "" : " ") + fragment.raw_text.substr(fragment.offset, fragment.length); + auto raw_text = fragment.raw_text.substr(fragment.offset, fragment.length); + if (&fragment == &fragment_buffer.front()) { + raw_text = " " + raw_text; // prefix with space if the first token is not special + } #ifdef PRETOKENIZERDEBUG fprintf(stderr,"TT: (%ld %ld %ld) '%s'\n", raw_text.length(), fragment.offset, fragment.length, raw_text.c_str()); @@ -6349,11 +6950,13 @@ struct llama_grammar_candidate { // Decodes a UTF-8 string which may end in an incomplete sequence. Adds a terminating 0 for use as // pointer. If an invalid sequence is encountered, returns `llama_partial_utf8.n_remain == -1`. static std::pair, llama_partial_utf8> decode_utf8( - const char * src, + const std::string & src, llama_partial_utf8 partial_start) { static const int lookup[] = { 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 3, 4 }; - const char * pos = src; + const char * pos = src.c_str(); std::vector code_points; + // common english strings have the same number of codepoints and bytes. `+ 1` for the terminating 0. 
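Aside, not part of the patch: two small wins in decode_utf8 here. Taking std::string gives the length for free, and the reserve is a cheap upper bound: every code point consumes at least one byte, so src.size() + 1 (the + 1 being the terminating 0 the function appends) can never be exceeded, and for mostly-ASCII grammars it is exact. The lookup table the function keeps using keys the sequence length on the high nibble of the lead byte; decoded in isolation (standalone sketch):

    #include <cstdint>

    // Length of a UTF-8 sequence from its lead byte; 0 flags a continuation
    // byte (10xxxxxx) appearing in lead position, i.e. malformed input.
    static int utf8_seq_len(uint8_t first_byte) {
        static const int lookup[] = { 1, 1, 1, 1, 1, 1, 1, 1,
                                      0, 0, 0, 0, 2, 2, 3, 4 };
        return lookup[first_byte >> 4]; // 0x0-0x7 ASCII; 0xC/0xD 2-byte; 0xE 3; 0xF 4
    }
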
+ code_points.reserve(src.size() + 1); uint32_t value = partial_start.value; int n_remain = partial_start.n_remain; @@ -6957,6 +7560,7 @@ void llama_sample_typical(struct llama_context * ctx, llama_token_data_array * c // Replace the data in candidates with the new_candidates data std::copy(new_candidates.begin(), new_candidates.end(), candidates->data); candidates->size = new_candidates.size(); + candidates->sorted = false; if (ctx) { ctx->t_sample_us += ggml_time_us() - t_start_sample_us; @@ -7041,7 +7645,9 @@ void llama_sample_grammar(struct llama_context * ctx, llama_token_data_array * c const llama_token eos = llama_token_eos(&ctx->model); std::vector, llama_partial_utf8>> candidates_decoded; + candidates_decoded.reserve(candidates->size); std::vector candidates_grammar; + candidates_grammar.reserve(candidates->size); for (size_t i = 0; i < candidates->size; ++i) { const llama_token id = candidates->data[i].id; @@ -7053,7 +7659,7 @@ void llama_sample_grammar(struct llama_context * ctx, llama_token_data_array * c } else if (piece.empty() || piece[0] == 0) { candidates->data[i].logit = -INFINITY; } else { - candidates_decoded.push_back(decode_utf8(piece.c_str(), grammar->partial_utf8)); + candidates_decoded.push_back(decode_utf8(piece, grammar->partial_utf8)); candidates_grammar.push_back({ i, candidates_decoded.back().first.data(), candidates_decoded.back().second }); } } @@ -7260,7 +7866,7 @@ void llama_grammar_accept_token(struct llama_context * ctx, struct llama_grammar const std::string piece = llama_token_to_piece(ctx, token); // Note terminating 0 in decoded string - const auto decoded = decode_utf8(piece.c_str(), grammar->partial_utf8); + const auto decoded = decode_utf8(piece, grammar->partial_utf8); const auto & code_points = decoded.first; for (auto it = code_points.begin(), end = code_points.end() - 1; it != end; ++it) { grammar->stacks = llama_grammar_accept(grammar->rules, grammar->stacks, *it); @@ -7371,7 +7977,7 @@ struct llama_beam_search_data { } // Min-heaps are used to efficiently collect the top-k elements (k=n_beams). - // The repetative patterns below reflect the 2 stages of heaps: + // The repetitive patterns below reflect the 2 stages of heaps: // * Gather elements until the vector is full, then call std::make_heap() on it. // * If the heap is full and a new element is found that should be included, pop the // least element to the back(), replace it with the new, then push it into the heap. @@ -7578,18 +8184,21 @@ static void llama_convert_tensor_internal( return; } - auto block_size = tensor->type == GGML_TYPE_F16 ? 1 : (size_t)ggml_blck_size(tensor->type); - auto block_size_bytes = ggml_type_size(tensor->type); + size_t block_size = tensor->type == GGML_TYPE_F16 ? 1 : (size_t)ggml_blck_size(tensor->type); + size_t block_size_bytes = ggml_type_size(tensor->type); GGML_ASSERT(nelements % block_size == 0); - auto nblocks = nelements / block_size; - auto blocks_per_thread = nblocks / nthread; - auto spare_blocks = nblocks - (blocks_per_thread * nthread); // if blocks aren't divisible by thread count + size_t nblocks = nelements / block_size; + size_t blocks_per_thread = nblocks / nthread; + size_t spare_blocks = nblocks - (blocks_per_thread * nthread); // if blocks aren't divisible by thread count + + size_t in_buff_offs = 0; + size_t out_buff_offs = 0; - for (auto tnum = 0, in_buff_offs = 0, out_buff_offs = 0; tnum < nthread; tnum++) { - auto thr_blocks = blocks_per_thread + (tnum == nthread - 1 ? 
spare_blocks : 0); // num blocks for this thread - auto thr_elems = thr_blocks * block_size; // number of elements for this thread - auto thr_block_bytes = thr_blocks * block_size_bytes; // number of input bytes for this thread + for (int tnum = 0; tnum < nthread; tnum++) { + size_t thr_blocks = blocks_per_thread + (tnum == nthread - 1 ? spare_blocks : 0); // num blocks for this thread + size_t thr_elems = thr_blocks * block_size; // number of elements for this thread + size_t thr_block_bytes = thr_blocks * block_size_bytes; // number of input bytes for this thread auto compute = [qtype] (ggml_type typ, uint8_t * inbuf, float * outbuf, int nels) { if (typ == GGML_TYPE_F16) { @@ -7606,11 +8215,9 @@ static void llama_convert_tensor_internal( workers.clear(); } -static ggml_type get_k_quant_type( - quantize_state_internal & qs, - ggml_type new_type, const ggml_tensor * tensor, llama_ftype ftype -) { +static ggml_type get_k_quant_type(quantize_state_internal & qs, ggml_type new_type, const ggml_tensor * tensor, llama_ftype ftype) { const std::string name = ggml_get_name(tensor); + // TODO: avoid hardcoded tensor names - use the TN_* constants const llm_arch arch = qs.model.arch; const auto tn = LLM_TN(arch); @@ -7644,7 +8251,18 @@ static ggml_type get_k_quant_type( // nearly negligible increase in model size by quantizing this tensor with more bits: if (new_type == GGML_TYPE_Q3_K || new_type == GGML_TYPE_Q4_K) new_type = GGML_TYPE_Q5_K; } + if (qs.model.hparams.n_expert == 8) { + // for the 8-expert model, bumping this to Q8_0 trades just ~128MB + // TODO: explore better strategies + new_type = GGML_TYPE_Q8_0; + } ++qs.i_attention_wv; + } else if (name.find("attn_k.weight") != std::string::npos) { + if (qs.model.hparams.n_expert == 8) { + // for the 8-expert model, bumping this to Q8_0 trades just ~128MB + // TODO: explore better strategies + new_type = GGML_TYPE_Q8_0; + } } else if (name.find("ffn_down.weight") != std::string::npos) { if (ftype == LLAMA_FTYPE_MOSTLY_Q2_K) new_type = GGML_TYPE_Q3_K; else if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M) { @@ -7759,7 +8377,7 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s constexpr bool use_mmap = false; #endif - llama_model_loader ml(fname_inp, use_mmap); + llama_model_loader ml(fname_inp, use_mmap, NULL); if (ml.use_mmap) { ml.mapping.reset(new llama_mmap(&ml.file, /* prefetch */ 0, ggml_is_numa())); } @@ -7857,6 +8475,9 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s quantize &= params->quantize_output_tensor || name != "output.weight"; quantize &= !params->only_copy; + // do not quantize expert gating tensors + quantize &= name.find("ffn_gate_inp.weight") == std::string::npos; + enum ggml_type new_type; void * new_data; size_t new_size; @@ -7935,7 +8556,7 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s workers.clear(); } - LLAMA_LOG_INFO("size = %8.2f MB -> %8.2f MB | hist: ", ggml_nbytes(tensor)/1024.0/1024.0, new_size/1024.0/1024.0); + LLAMA_LOG_INFO("size = %8.2f MiB -> %8.2f MiB | hist: ", ggml_nbytes(tensor)/1024.0/1024.0, new_size/1024.0/1024.0); int64_t tot_count = 0; for (size_t i = 0; i < hist_cur.size(); i++) { hist_all[i] += hist_cur[i]; @@ -8055,7 +8676,7 @@ static int llama_apply_lora_from_file_internal( std::vector base_buf; if (path_base_model) { LLAMA_LOG_INFO("%s: loading base model from '%s'\n", __func__, path_base_model); - ml.reset(new llama_model_loader(path_base_model, /*use_mmap*/ true)); + ml.reset(new 
llama_model_loader(path_base_model, /*use_mmap*/ true, /*kv_overrides*/ NULL)); size_t ctx_size; size_t mmapped_size; @@ -8283,6 +8904,7 @@ struct llama_model_params llama_model_default_params() { /*.tensor_split =*/ nullptr, /*.progress_callback =*/ nullptr, /*.progress_callback_user_data =*/ nullptr, + /*.kv_overrides =*/ nullptr, /*.vocab_only =*/ false, /*.use_mmap =*/ true, /*.use_mlock =*/ false, @@ -8310,10 +8932,12 @@ struct llama_context_params llama_context_default_params() { /*.yarn_beta_fast =*/ 32.0f, /*.yarn_beta_slow =*/ 1.0f, /*.yarn_orig_ctx =*/ 0, + /*.type_k =*/ GGML_TYPE_F16, + /*.type_v =*/ GGML_TYPE_F16, /*.mul_mat_q =*/ true, - /*.f16_kv =*/ true, /*.logits_all =*/ false, /*.embedding =*/ false, + /*.offload_kqv =*/ true, }; return result; @@ -8430,6 +9054,7 @@ struct llama_context * llama_new_context_with_model( cparams.yarn_beta_fast = params.yarn_beta_fast; cparams.yarn_beta_slow = params.yarn_beta_slow; cparams.mul_mat_q = params.mul_mat_q; + cparams.offload_kqv = params.offload_kqv; cparams.n_ctx = params.n_ctx == 0 ? hparams.n_ctx_train : params.n_ctx; cparams.rope_freq_base = params.rope_freq_base == 0.0f ? hparams.rope_freq_base_train : params.rope_freq_base; @@ -8463,19 +9088,36 @@ struct llama_context * llama_new_context_with_model( ctx->rng = std::mt19937(params.seed); ctx->logits_all = params.logits_all; - ggml_type memory_type = params.f16_kv ? GGML_TYPE_F16 : GGML_TYPE_F32; + const ggml_type type_k = params.type_k; + const ggml_type type_v = params.type_v; + + GGML_ASSERT(hparams.n_embd_head() % ggml_blck_size(type_k) == 0); + GGML_ASSERT(hparams.n_embd_head() % ggml_blck_size(type_v) == 0); // reserve memory for context buffers if (!hparams.vocab_only) { - if (!llama_kv_cache_init(ctx->model.hparams, ctx->kv_self, memory_type, cparams.n_ctx, model->n_gpu_layers)) { + if (!llama_kv_cache_init(ctx->model.hparams, ctx->kv_self, type_k, type_v, cparams.n_ctx, model->n_gpu_layers, cparams.offload_kqv)) { LLAMA_LOG_ERROR("%s: llama_kv_cache_init() failed for self-attention cache\n", __func__); llama_free(ctx); return nullptr; } { - const size_t memory_size = ggml_nbytes(ctx->kv_self.k) + ggml_nbytes(ctx->kv_self.v); - LLAMA_LOG_INFO("%s: kv self size = %7.2f MB\n", __func__, memory_size / 1024.0 / 1024.0); + size_t memory_size_k = 0; + size_t memory_size_v = 0; + + for (auto & k : ctx->kv_self.k_l) { + memory_size_k += ggml_nbytes(k); + } + + for (auto & v : ctx->kv_self.v_l) { + memory_size_v += ggml_nbytes(v); + } + + LLAMA_LOG_INFO("%s: KV self size = %7.2f MiB, K (%s): %7.2f MiB, V (%s): %7.2f MiB\n", __func__, + (float)(memory_size_k + memory_size_v) / (1024.0f * 1024.0f), + ggml_type_name(type_k), (float)memory_size_k / (1024.0f * 1024.0f), + ggml_type_name(type_v), (float)memory_size_v / (1024.0f * 1024.0f)); } // resized during inference @@ -8505,8 +9147,6 @@ struct llama_context * llama_new_context_with_model( #ifdef GGML_USE_METAL if (model->n_gpu_layers > 0) { - ggml_metal_log_set_callback(llama_log_callback_default, NULL); - ctx->ctx_metal = ggml_metal_init(1); if (!ctx->ctx_metal) { LLAMA_LOG_ERROR("%s: ggml_metal_init() failed\n", __func__); @@ -8520,7 +9160,7 @@ struct llama_context * llama_new_context_with_model( // measure memory requirements for the graph size_t alloc_size = ggml_allocr_alloc_graph(ctx->alloc, gf) + tensor_alignment; - LLAMA_LOG_INFO("%s: compute buffer total size = %.2f MB\n", __func__, (ctx->buf_compute.size + alloc_size) / 1024.0 / 1024.0); + LLAMA_LOG_INFO("%s: compute buffer total size = %.2f MiB\n", __func__, 
(ctx->buf_compute.size + alloc_size) / 1024.0 / 1024.0); // recreate allocator with exact memory requirements ggml_allocr_free(ctx->alloc); @@ -8534,7 +9174,7 @@ struct llama_context * llama_new_context_with_model( #endif #ifdef GGML_USE_CUBLAS ggml_cuda_set_scratch_size(alloc_size); - LLAMA_LOG_INFO("%s: VRAM scratch buffer: %.2f MB\n", __func__, alloc_size / 1024.0 / 1024.0); + LLAMA_LOG_INFO("%s: VRAM scratch buffer: %.2f MiB\n", __func__, alloc_size / 1024.0 / 1024.0); // calculate total VRAM usage auto add_tensor = [](const ggml_tensor * t, size_t & size) { @@ -8548,16 +9188,20 @@ struct llama_context * llama_new_context_with_model( } size_t kv_vram_size = 0; - add_tensor(ctx->kv_self.k, kv_vram_size); - add_tensor(ctx->kv_self.v, kv_vram_size); + for (auto & k : ctx->kv_self.k_l) { + add_tensor(k, kv_vram_size); + } + for (auto & v : ctx->kv_self.v_l) { + add_tensor(v, kv_vram_size); + } size_t ctx_vram_size = alloc_size + kv_vram_size; size_t total_vram_size = model_vram_size + ctx_vram_size; - LLAMA_LOG_INFO("%s: total VRAM used: %.2f MB (model: %.2f MB, context: %.2f MB)\n", __func__, + LLAMA_LOG_INFO("%s: total VRAM used: %.2f MiB (model: %.2f MiB, context: %.2f MiB)\n", __func__, total_vram_size / 1024.0 / 1024.0, model_vram_size / 1024.0 / 1024.0, - ctx_vram_size / 1024.0 / 1024.0); + ctx_vram_size / 1024.0 / 1024.0); #endif } @@ -8578,7 +9222,7 @@ struct llama_context * llama_new_context_with_model( const size_t max_size = ggml_get_max_tensor_size(ctx->model.ctx); - LLAMA_LOG_INFO("%s: max tensor size = %8.2f MB\n", __func__, max_size/1024.0/1024.0); + LLAMA_LOG_INFO("%s: max tensor size = %8.2f MiB\n", __func__, max_size/1024.0/1024.0); #define LLAMA_METAL_CHECK_BUF(result) \ if (!(result)) { \ @@ -8644,6 +9288,45 @@ float llama_rope_freq_scale_train(const struct llama_model * model) { return model->hparams.rope_freq_scale_train; } +int llama_model_meta_val_str(const struct llama_model * model, const char * key, char * buf, size_t buf_size) { + const auto & it = model->gguf_kv.find(key); + if (it == model->gguf_kv.end()) { + if (buf_size > 0) { + buf[0] = '\0'; + } + return -1; + } + return snprintf(buf, buf_size, "%s", it->second.c_str()); +} + +int llama_model_meta_count(const struct llama_model * model) { + return (int)model->gguf_kv.size(); +} + +int llama_model_meta_key_by_index(const struct llama_model * model, int i, char * buf, size_t buf_size) { + if (i < 0 || i >= (int)model->gguf_kv.size()) { + if (buf_size > 0) { + buf[0] = '\0'; + } + return -1; + } + auto it = model->gguf_kv.begin(); + std::advance(it, i); + return snprintf(buf, buf_size, "%s", it->first.c_str()); +} + +int llama_model_meta_val_str_by_index(const struct llama_model * model, int i, char * buf, size_t buf_size) { + if (i < 0 || i >= (int)model->gguf_kv.size()) { + if (buf_size > 0) { + buf[0] = '\0'; + } + return -1; + } + auto it = model->gguf_kv.begin(); + std::advance(it, i); + return snprintf(buf, buf_size, "%s", it->second.c_str()); +} + int llama_model_desc(const struct llama_model * model, char * buf, size_t buf_size) { return snprintf(buf, buf_size, "%s %s %s", llama_model_arch_name(model->arch).c_str(), @@ -8702,8 +9385,107 @@ int llama_model_apply_lora_from_file(const struct llama_model * model, const cha } } +struct llama_kv_cache_view llama_kv_cache_view_init(const struct llama_context * ctx, int32_t n_max_seq) { + struct llama_kv_cache_view result = { + /*.n_cells = */ 0, + /*.n_max_seq = */ n_max_seq, + /*.token_count = */ 0, + /*.used_cells = */ llama_get_kv_cache_used_cells(ctx), + 
/*.max_contiguous = */ 0, + /*.max_contiguous_idx = */ -1, + /*.cells = */ nullptr, + /*.cells_sequences = */ nullptr, + }; + return result; +} + +void llama_kv_cache_view_free(struct llama_kv_cache_view * view) { + if (view->cells != nullptr) { + free(view->cells); + view->cells = nullptr; + } + if (view->cells_sequences != nullptr) { + free(view->cells_sequences); + view->cells_sequences = nullptr; + } +} + +void llama_kv_cache_view_update(const struct llama_context * ctx, struct llama_kv_cache_view * view) { + if (uint32_t(view->n_cells) < ctx->kv_self.size || view->cells == nullptr) { + view->n_cells = int32_t(ctx->kv_self.size); + void * p = realloc(view->cells, sizeof(struct llama_kv_cache_view_cell) * view->n_cells); + GGML_ASSERT(p != nullptr && "Failed to alloc kv_cache_view cells"); + view->cells = (struct llama_kv_cache_view_cell *)p; + p = realloc(view->cells_sequences, sizeof(llama_seq_id) * view->n_max_seq * view->n_cells); + GGML_ASSERT(p != nullptr && "Failed to alloc kv_cache_view cells sequences"); + view->cells_sequences = (llama_seq_id *)p; + } + + const std::vector & kv_cells = ctx->kv_self.cells; + llama_kv_cache_view_cell * c_curr = view->cells; + llama_seq_id * cs_curr = view->cells_sequences; + int32_t used_cells = 0; + int32_t token_count = 0; + int32_t curr_contig_idx = -1; + uint32_t max_contig = 0; + int32_t max_contig_idx = -1; + + for (int32_t i = 0; i < int32_t(ctx->kv_self.size); i++, c_curr++, cs_curr += view->n_max_seq) { + const size_t curr_size = kv_cells[i].seq_id.size(); + token_count += curr_size; + c_curr->pos = kv_cells[i].pos + kv_cells[i].delta; + + if (curr_size > 0) { + if (curr_contig_idx >= 0 && uint32_t(i - curr_contig_idx) > max_contig) { + max_contig = i - curr_contig_idx; + max_contig_idx = curr_contig_idx; + } + curr_contig_idx = -1; + } else if (curr_contig_idx < 0) { + curr_contig_idx = i; + } + + int seq_idx = 0; + for (const llama_seq_id it : kv_cells[i].seq_id) { + if (seq_idx >= view->n_max_seq) { + break; + } + cs_curr[seq_idx] = it; + seq_idx++; + } + if (seq_idx != 0) { + used_cells++; + } + for (; seq_idx < view->n_max_seq; seq_idx++) { + cs_curr[seq_idx] = -1; + } + } + if (curr_contig_idx >= 0 && kv_cells.size() - curr_contig_idx > max_contig) { + max_contig_idx = curr_contig_idx; + max_contig = kv_cells.size() - curr_contig_idx; + } + view->max_contiguous = max_contig; + view->max_contiguous_idx = max_contig_idx; + view->token_count = token_count; + view->used_cells = used_cells; + if (uint32_t(used_cells) != ctx->kv_self.used) { + LLAMA_LOG_ERROR("%s: used cells mismatch. 
 int llama_get_kv_cache_token_count(const struct llama_context * ctx) {
-    return ctx->kv_self.head;
+    int result = 0;
+
+    for (uint32_t i = 0; i < ctx->kv_self.size; i++) {
+        result += ctx->kv_self.cells[i].seq_id.size();
+    }
+
+    return result;
+}
+
+int llama_get_kv_cache_used_cells(const struct llama_context * ctx) {
+    return ctx->kv_self.used;
 }

 void llama_kv_cache_clear(struct llama_context * ctx) {
@@ -8873,43 +9655,53 @@ static void llama_copy_state_data_internal(struct llama_context * ctx, llama_dat
         const size_t kv_buf_size = kv_self.buf.size;
         const uint32_t kv_head = kv_self.head;
         const uint32_t kv_size = kv_self.size;
+        const uint32_t kv_used = kv_self.used;

         data_ctx->write(&kv_buf_size, sizeof(kv_buf_size));
         data_ctx->write(&kv_head, sizeof(kv_head));
         data_ctx->write(&kv_size, sizeof(kv_size));
+        data_ctx->write(&kv_used, sizeof(kv_used));

         if (kv_buf_size) {
-            const size_t elt_size = ggml_element_size(kv_self.k);
+            const size_t elt_size = ggml_element_size(kv_self.k_l[0]);

-            ggml_context * cpy_ctx = ggml_init({ 6*ggml_tensor_overhead() + ggml_graph_overhead(), NULL, /* no_alloc */ true });
+            ggml_context * cpy_ctx = ggml_init({ 6*n_layer*ggml_tensor_overhead() + ggml_graph_overhead(), NULL, /* no_alloc */ true });
             ggml_cgraph * gf = ggml_new_graph(cpy_ctx);

-            ggml_tensor * kout3d = ggml_new_tensor_3d(cpy_ctx, kv_self.k->type, n_embd, kv_head, n_layer);
-            std::vector<uint8_t> kout3d_data(ggml_nbytes(kout3d), 0);
-            kout3d->data = kout3d_data.data();
+            std::vector<std::vector<uint8_t>> kout2d_data(n_layer);
+            std::vector<std::vector<uint8_t>> vout2d_data(n_layer);

-            ggml_tensor * vout3d = ggml_new_tensor_3d(cpy_ctx, kv_self.v->type, kv_head, n_embd, n_layer);
-            std::vector<uint8_t> vout3d_data(ggml_nbytes(vout3d), 0);
-            vout3d->data = vout3d_data.data();
+            for (int il = 0; il < (int) n_layer; ++il) {
+                ggml_tensor * kout2d = ggml_new_tensor_2d(cpy_ctx, kv_self.k_l[il]->type, n_embd, kv_head);
+                kout2d_data[il].resize(ggml_nbytes(kout2d));
+                kout2d->data = kout2d_data[il].data();

-            ggml_tensor * k3d = ggml_view_3d(cpy_ctx, kv_self.k,
-                n_embd, kv_head, n_layer,
-                elt_size*n_embd, elt_size*n_embd*n_ctx, 0);
+                ggml_tensor * vout2d = ggml_new_tensor_2d(cpy_ctx, kv_self.v_l[il]->type, kv_head, n_embd);
+                vout2d_data[il].resize(ggml_nbytes(vout2d));
+                vout2d->data = vout2d_data[il].data();

-            ggml_tensor * v3d = ggml_view_3d(cpy_ctx, kv_self.v,
-                kv_head, n_embd, n_layer,
-                elt_size*n_ctx, elt_size*n_ctx*n_embd, 0);
+                ggml_tensor * k2d = ggml_view_2d(cpy_ctx, kv_self.k_l[il],
+                        n_embd, kv_head,
+                        elt_size*n_embd, 0);
+
+                ggml_tensor * v2d = ggml_view_2d(cpy_ctx, kv_self.v_l[il],
+                        kv_head, n_embd,
+                        elt_size*n_ctx, 0);
+
+                ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, k2d, kout2d));
+                ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, v2d, vout2d));
+            }

-            ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, k3d, kout3d));
-            ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, v3d, vout3d));
             ggml_graph_compute_helper(ctx->work_buffer, gf, /*n_threads*/ 1);

             ggml_free(cpy_ctx);

-            // our data is now in the kout3d_data and vout3d_data buffers
+            // our data is now in the kout2d_data and vout2d_data buffers
             // write them to file
-            data_ctx->write(kout3d_data.data(), kout3d_data.size());
-            data_ctx->write(vout3d_data.data(), vout3d_data.size());
+            for (uint32_t il = 0; il < n_layer; ++il) {
+                data_ctx->write(kout2d_data[il].data(), kout2d_data[il].size());
+                data_ctx->write(vout2d_data[il].data(), vout2d_data[il].size());
+            }
         }

         for (uint32_t i = 0; i < kv_size; ++i) {
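For reference, the serialized KV payload keeps the same total byte count as the old 3D layout; only the ordering changes to per-layer (K then V for layer 0, then layer 1, and so on), which is why the session version is bumped below. A rough size sketch, assuming a uniform element size across layers (true here, since all layers share one cache type), not a library API:

    static size_t kv_state_bytes(size_t elt_size, size_t n_embd, size_t kv_head, size_t n_layer) {
        return 2*n_layer*elt_size*n_embd*kv_head; // one K block and one V block per layer
    }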
@@ -8999,37 +9791,42 @@ size_t llama_set_state_data(struct llama_context * ctx, uint8_t * src) {
         size_t kv_buf_size;
         uint32_t kv_head;
         uint32_t kv_size;
+        uint32_t kv_used;

         memcpy(&kv_buf_size, inp, sizeof(kv_buf_size)); inp += sizeof(kv_buf_size);
         memcpy(&kv_head, inp, sizeof(kv_head)); inp += sizeof(kv_head);
         memcpy(&kv_size, inp, sizeof(kv_size)); inp += sizeof(kv_size);
+        memcpy(&kv_used, inp, sizeof(kv_used)); inp += sizeof(kv_used);

         if (kv_buf_size) {
             GGML_ASSERT(kv_self.buf.size == kv_buf_size);

-            const size_t elt_size = ggml_element_size(kv_self.k);
+            const size_t elt_size = ggml_element_size(kv_self.k_l[0]);

-            ggml_context * cpy_ctx = ggml_init({ 6*ggml_tensor_overhead() + ggml_graph_overhead(), NULL, /* no_alloc */ true });
+            ggml_context * cpy_ctx = ggml_init({ 6*n_layer*ggml_tensor_overhead() + ggml_graph_overhead(), NULL, /* no_alloc */ true });
             ggml_cgraph * gf = ggml_new_graph(cpy_ctx);

-            ggml_tensor * kin3d = ggml_new_tensor_3d(cpy_ctx, kv_self.k->type, n_embd, kv_head, n_layer);
-            kin3d->data = (void *) inp;
-            inp += ggml_nbytes(kin3d);
+            for (int il = 0; il < n_layer; ++il) {
+                ggml_tensor * kin2d = ggml_new_tensor_2d(cpy_ctx, kv_self.k_l[il]->type, n_embd, kv_head);
+                kin2d->data = (void *) inp;
+                inp += ggml_nbytes(kin2d);

-            ggml_tensor * vin3d = ggml_new_tensor_3d(cpy_ctx, kv_self.v->type, kv_head, n_embd, n_layer);
-            vin3d->data = (void *) inp;
-            inp += ggml_nbytes(vin3d);
+                ggml_tensor * vin2d = ggml_new_tensor_2d(cpy_ctx, kv_self.v_l[il]->type, kv_head, n_embd);
+                vin2d->data = (void *) inp;
+                inp += ggml_nbytes(vin2d);

-            ggml_tensor * k3d = ggml_view_3d(cpy_ctx, kv_self.k,
-                n_embd, kv_head, n_layer,
-                elt_size*n_embd, elt_size*n_embd*n_ctx, 0);
+                ggml_tensor * k2d = ggml_view_2d(cpy_ctx, kv_self.k_l[il],
+                        n_embd, kv_head,
+                        elt_size*n_embd, 0);

-            ggml_tensor * v3d = ggml_view_3d(cpy_ctx, kv_self.v,
-                kv_head, n_embd, n_layer,
-                elt_size*n_ctx, elt_size*n_ctx*n_embd, 0);
+                ggml_tensor * v2d = ggml_view_2d(cpy_ctx, kv_self.v_l[il],
+                        kv_head, n_embd,
+                        elt_size*n_ctx, 0);
+
+                ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, kin2d, k2d));
+                ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, vin2d, v2d));
+            }

-            ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, kin3d, k3d));
-            ggml_build_forward_expand(gf, ggml_cpy(cpy_ctx, vin3d, v3d));
             ggml_graph_compute_helper(ctx->work_buffer, gf, /*n_threads*/ 1);

             ggml_free(cpy_ctx);
@@ -9037,6 +9834,7 @@ size_t llama_set_state_data(struct llama_context * ctx, uint8_t * src) {

         ctx->kv_self.head = kv_head;
         ctx->kv_self.size = kv_size;
+        ctx->kv_self.used = kv_used;

         ctx->kv_self.cells.resize(kv_size);
@@ -9285,6 +10083,14 @@ llama_token llama_token_nl(const struct llama_model * model) {
     return model->vocab.linefeed_id;
 }

+int llama_add_bos_token(const struct llama_model * model) {
+    return model->vocab.special_add_bos;
+}
+
+int llama_add_eos_token(const struct llama_model * model) {
+    return model->vocab.special_add_eos;
+}
+
 llama_token llama_token_prefix(const struct llama_model * model) {
     return model->vocab.special_prefix_id;
 }
@@ -9491,6 +10297,9 @@ const std::vector<std::pair<std::string, struct ggml_tensor *>> & llama_internal
 void llama_log_set(ggml_log_callback log_callback, void * user_data) {
     g_state.log_callback = log_callback ? log_callback : llama_log_callback_default;
     g_state.log_callback_user_data = user_data;
+#ifdef GGML_USE_METAL
+    ggml_metal_log_set_callback(g_state.log_callback, g_state.log_callback_user_data);
+#endif
 }

 static void llama_log_internal_v(ggml_log_level level, const char * format, va_list args) {
diff --git a/llama.h b/llama.h
index e8dc04bb54b81..45a65cacb7bb8 100644
--- a/llama.h
+++ b/llama.h
@@ -42,7 +42,7 @@
 #define LLAMA_FILE_MAGIC_GGSN 0x6767736eu // 'ggsn'

 #define LLAMA_SESSION_MAGIC   LLAMA_FILE_MAGIC_GGSN
-#define LLAMA_SESSION_VERSION 2
+#define LLAMA_SESSION_VERSION 3

 #if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST) || defined(GGML_USE_METAL)
 // Defined when llama.cpp is compiled with support for offloading model layers to GPU.
@@ -158,6 +158,22 @@ extern "C" {
         llama_seq_id all_seq_id; // used if seq_id == NULL
     } llama_batch;

+    enum llama_model_kv_override_type {
+        LLAMA_KV_OVERRIDE_INT,
+        LLAMA_KV_OVERRIDE_FLOAT,
+        LLAMA_KV_OVERRIDE_BOOL,
+    };
+
+    struct llama_model_kv_override {
+        char key[128];
+        enum llama_model_kv_override_type tag;
+        union {
+            int64_t int_value;
+            double float_value;
+            bool bool_value;
+        };
+    };
+
     struct llama_model_params {
         int32_t n_gpu_layers; // number of layers to store in VRAM
         int32_t main_gpu;     // the GPU that is used for scratch and small tensors
@@ -165,9 +181,13 @@ extern "C" {
         // called with a progress value between 0 and 1, pass NULL to disable
         llama_progress_callback progress_callback;
+        // context pointer passed to the progress callback
         void * progress_callback_user_data;

+        // override key-value pairs of the model meta data
+        const struct llama_model_kv_override * kv_overrides;
+
         // Keep the booleans together to avoid misalignment during copy-by-value.
         bool vocab_only; // only load the vocabulary, no weights
         bool use_mmap;   // use mmap if possible
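How a caller is expected to fill in an override (a sketch; the key name is purely illustrative, and since the loader side is not shown in this hunk, the zeroed sentinel entry marking the end of the array is an assumption):

    struct llama_model_kv_override overrides[2];
    memset(overrides, 0, sizeof(overrides)); // second entry stays zeroed as a terminator (assumption)
    snprintf(overrides[0].key, sizeof(overrides[0].key), "%s", "some.model.key"); // hypothetical key
    overrides[0].tag       = LLAMA_KV_OVERRIDE_INT;
    overrides[0].int_value = 42;

    struct llama_model_params mparams = llama_model_default_params();
    mparams.kv_overrides = overrides;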
@@ -185,17 +205,20 @@ extern "C" {
         // ref: https://github.com/ggerganov/llama.cpp/pull/2054
         float rope_freq_base;  // RoPE base frequency, 0 = from model
         float rope_freq_scale; // RoPE frequency scaling factor, 0 = from model
-        float yarn_ext_factor; // YaRN extrapolation mix factor, NaN = from model
+        float yarn_ext_factor; // YaRN extrapolation mix factor, negative = from model
         float yarn_attn_factor; // YaRN magnitude scaling factor
         float yarn_beta_fast;   // YaRN low correction dim
         float yarn_beta_slow;   // YaRN high correction dim
         uint32_t yarn_orig_ctx; // YaRN original context size

+        enum ggml_type type_k; // data type for K cache
+        enum ggml_type type_v; // data type for V cache
+
         // Keep the booleans together to avoid misalignment during copy-by-value.
-        bool mul_mat_q;  // if true, use experimental mul_mat_q kernels (DEPRECATED - always true)
-        bool f16_kv;     // use fp16 for KV cache, fp32 otherwise
-        bool logits_all; // the llama_eval() call computes all logits, not just the last one
-        bool embedding;  // embedding mode only
+        bool mul_mat_q;   // if true, use experimental mul_mat_q kernels (DEPRECATED - always true)
+        bool logits_all;  // the llama_eval() call computes all logits, not just the last one (DEPRECATED - set llama_batch.logits instead)
+        bool embedding;   // embedding mode only
+        bool offload_kqv; // whether to offload the KQV ops (including the KV cache) to GPU
     };

     // model quantization parameters
@@ -301,6 +324,23 @@ extern "C" {
     // Get the model's RoPE frequency scaling factor
     LLAMA_API float llama_rope_freq_scale_train(const struct llama_model * model);

+    // Functions to access the model's GGUF metadata scalar values
+    //  - The functions return the length of the string on success, or -1 on failure
+    //  - The output string is always null-terminated and cleared on failure
+    //  - GGUF array values are not supported by these functions
+
+    // Get metadata value as a string by key name
+    LLAMA_API int llama_model_meta_val_str(const struct llama_model * model, const char * key, char * buf, size_t buf_size);
+
+    // Get the number of metadata key/value pairs
+    LLAMA_API int llama_model_meta_count(const struct llama_model * model);
+
+    // Get metadata key name by index
+    LLAMA_API int llama_model_meta_key_by_index(const struct llama_model * model, int i, char * buf, size_t buf_size);
+
+    // Get metadata value as a string by index
+    LLAMA_API int llama_model_meta_val_str_by_index(const struct llama_model * model, int i, char * buf, size_t buf_size);
+
     // Get a string describing the model type
     LLAMA_API int llama_model_desc(const struct llama_model * model, char * buf, size_t buf_size);
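Example of driving the new metadata accessors (a minimal sketch; `model` is assumed to be a loaded llama_model, and the fixed-size buffers simply truncate longer values, per the snprintf-style contract documented above):

    char key[128];
    char val[128];
    const int n_kv = llama_model_meta_count(model);
    for (int i = 0; i < n_kv; i++) {
        if (llama_model_meta_key_by_index(model, i, key, sizeof(key)) >= 0 &&
            llama_model_meta_val_str_by_index(model, i, val, sizeof(val)) >= 0) {
            printf("%s = %s\n", key, val);
        }
    }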
@@ -344,9 +384,60 @@ extern "C" {
     // KV cache
     //

-    // Returns the number of tokens in the KV cache
-    LLAMA_API DEPRECATED(int llama_get_kv_cache_token_count(const struct llama_context * ctx),
-            "avoid using this, it will be removed in the future, instead - count the tokens in user code");
+    // Information associated with an individual cell in the KV cache view.
+    struct llama_kv_cache_view_cell {
+        // The position for this cell. Takes KV cache shifts into account.
+        // May be negative if the cell is not populated.
+        llama_pos pos;
+    };
+
+    // An updateable view of the KV cache.
+    struct llama_kv_cache_view {
+        // Number of KV cache cells. This will be the same as the context size.
+        int32_t n_cells;
+
+        // Maximum number of sequences that can exist in a cell. It's not an error
+        // if there are more sequences in a cell than this value, however they will
+        // not be visible in the view cells_sequences.
+        int32_t n_max_seq;
+
+        // Number of tokens in the cache. For example, if there are two populated
+        // cells, the first with 1 sequence id in it and the second with 2 sequence
+        // ids then you'll have 3 tokens.
+        int32_t token_count;
+
+        // Number of populated cache cells.
+        int32_t used_cells;
+
+        // Maximum contiguous empty slots in the cache.
+        int32_t max_contiguous;
+
+        // Index to the start of the max_contiguous slot range. Can be negative
+        // when cache is full.
+        int32_t max_contiguous_idx;
+
+        // Information for an individual cell.
+        struct llama_kv_cache_view_cell * cells;
+
+        // The sequences for each cell. There will be n_max_seq items per cell.
+        llama_seq_id * cells_sequences;
+    };
+
+    // Create an empty KV cache view. (use only for debugging purposes)
+    LLAMA_API struct llama_kv_cache_view llama_kv_cache_view_init(const struct llama_context * ctx, int32_t n_max_seq);
+
+    // Free a KV cache view. (use only for debugging purposes)
+    LLAMA_API void llama_kv_cache_view_free(struct llama_kv_cache_view * view);
+
+    // Update the KV cache view structure with the current state of the KV cache. (use only for debugging purposes)
+    LLAMA_API void llama_kv_cache_view_update(const struct llama_context * ctx, struct llama_kv_cache_view * view);
+
+    // Returns the number of tokens in the KV cache (slow, use only for debug)
+    // If a KV cell has multiple sequences assigned to it, it will be counted multiple times
+    LLAMA_API int llama_get_kv_cache_token_count(const struct llama_context * ctx);
+
+    // Returns the number of used KV cells (i.e. have at least one sequence assigned to them)
+    LLAMA_API int llama_get_kv_cache_used_cells(const struct llama_context * ctx);

     // Clear the KV cache
     LLAMA_API void llama_kv_cache_clear(
@@ -517,6 +608,12 @@ extern "C" {
     LLAMA_API llama_token llama_token_eos(const struct llama_model * model); // end-of-sentence
     LLAMA_API llama_token llama_token_nl (const struct llama_model * model); // next-line

+    // Returns -1 if unknown, 1 for true or 0 for false.
+    LLAMA_API int llama_add_bos_token(const struct llama_model * model);
+
+    // Returns -1 if unknown, 1 for true or 0 for false.
+    LLAMA_API int llama_add_eos_token(const struct llama_model * model);
+
     // codellama infill tokens
     LLAMA_API llama_token llama_token_prefix(const struct llama_model * model); // Beginning of infill prefix
     LLAMA_API llama_token llama_token_middle(const struct llama_model * model); // Beginning of infill middle
diff --git a/prompts/chat-with-qwen.txt b/prompts/chat-with-qwen.txt
new file mode 100644
index 0000000000000..ac39ad9257b26
--- /dev/null
+++ b/prompts/chat-with-qwen.txt
@@ -0,0 +1 @@
+You are a helpful assistant.
\ No newline at end of file diff --git a/requirements-hf-to-gguf.txt b/requirements-hf-to-gguf.txt new file mode 100644 index 0000000000000..f4600539e27ac --- /dev/null +++ b/requirements-hf-to-gguf.txt @@ -0,0 +1,3 @@ +-r requirements.txt +torch==2.1.1 +transformers==4.35.2 diff --git a/scripts/build-info.cmake b/scripts/build-info.cmake index 73853dfa47f41..ea3dc55c83439 100644 --- a/scripts/build-info.cmake +++ b/scripts/build-info.cmake @@ -1,5 +1,3 @@ -set(TEMPLATE_FILE "${CMAKE_CURRENT_SOURCE_DIR}/common/build-info.cpp.in") -set(OUTPUT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/common/build-info.cpp") set(BUILD_NUMBER 0) set(BUILD_COMMIT "unknown") set(BUILD_COMPILER "unknown") @@ -58,23 +56,3 @@ else() ) set(BUILD_TARGET ${OUT}) endif() - -# Only write the build info if it changed -if(EXISTS ${OUTPUT_FILE}) - file(READ ${OUTPUT_FILE} CONTENTS) - string(REGEX MATCH "LLAMA_COMMIT = \"([^\"]*)\";" _ ${CONTENTS}) - set(OLD_COMMIT ${CMAKE_MATCH_1}) - string(REGEX MATCH "LLAMA_COMPILER = \"([^\"]*)\";" _ ${CONTENTS}) - set(OLD_COMPILER ${CMAKE_MATCH_1}) - string(REGEX MATCH "LLAMA_BUILD_TARGET = \"([^\"]*)\";" _ ${CONTENTS}) - set(OLD_TARGET ${CMAKE_MATCH_1}) - if ( - NOT OLD_COMMIT STREQUAL BUILD_COMMIT OR - NOT OLD_COMPILER STREQUAL BUILD_COMPILER OR - NOT OLD_TARGET STREQUAL BUILD_TARGET - ) - configure_file(${TEMPLATE_FILE} ${OUTPUT_FILE}) - endif() -else() - configure_file(${TEMPLATE_FILE} ${OUTPUT_FILE}) -endif() diff --git a/scripts/gen-build-info-cpp.cmake b/scripts/gen-build-info-cpp.cmake new file mode 100644 index 0000000000000..d8933892011b3 --- /dev/null +++ b/scripts/gen-build-info-cpp.cmake @@ -0,0 +1,24 @@ +include(${CMAKE_CURRENT_SOURCE_DIR}/scripts/build-info.cmake) + +set(TEMPLATE_FILE "${CMAKE_CURRENT_SOURCE_DIR}/common/build-info.cpp.in") +set(OUTPUT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/common/build-info.cpp") + +# Only write the build info if it changed +if(EXISTS ${OUTPUT_FILE}) + file(READ ${OUTPUT_FILE} CONTENTS) + string(REGEX MATCH "LLAMA_COMMIT = \"([^\"]*)\";" _ ${CONTENTS}) + set(OLD_COMMIT ${CMAKE_MATCH_1}) + string(REGEX MATCH "LLAMA_COMPILER = \"([^\"]*)\";" _ ${CONTENTS}) + set(OLD_COMPILER ${CMAKE_MATCH_1}) + string(REGEX MATCH "LLAMA_BUILD_TARGET = \"([^\"]*)\";" _ ${CONTENTS}) + set(OLD_TARGET ${CMAKE_MATCH_1}) + if ( + NOT OLD_COMMIT STREQUAL BUILD_COMMIT OR + NOT OLD_COMPILER STREQUAL BUILD_COMPILER OR + NOT OLD_TARGET STREQUAL BUILD_TARGET + ) + configure_file(${TEMPLATE_FILE} ${OUTPUT_FILE}) + endif() +else() + configure_file(${TEMPLATE_FILE} ${OUTPUT_FILE}) +endif() diff --git a/scripts/sync-ggml.sh b/scripts/sync-ggml.sh index 4024531b10f70..0097db435a466 100755 --- a/scripts/sync-ggml.sh +++ b/scripts/sync-ggml.sh @@ -20,5 +20,6 @@ cp -rpv ../ggml/include/ggml/ggml.h ./ggml.h cp -rpv ../ggml/include/ggml/ggml-alloc.h ./ggml-alloc.h cp -rpv ../ggml/include/ggml/ggml-backend.h ./ggml-backend.h -cp -rpv ../ggml/tests/test-opt.cpp ./tests/test-opt.cpp -cp -rpv ../ggml/tests/test-grad0.cpp ./tests/test-grad0.cpp +cp -rpv ../ggml/tests/test-opt.cpp ./tests/test-opt.cpp +cp -rpv ../ggml/tests/test-grad0.cpp ./tests/test-grad0.cpp +cp -rpv ../ggml/tests/test-backend-ops.cpp ./tests/test-backend-ops.cpp diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index c8b4bc254f4c6..e42237c7a2e38 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -22,26 +22,32 @@ endfunction() llama_build_and_test_executable(test-quantize-fns.cpp) llama_build_and_test_executable(test-quantize-perf.cpp) llama_build_and_test_executable(test-sampling.cpp) + 
llama_build_executable(test-tokenizer-0-llama.cpp) llama_test_executable (test-tokenizer-0-llama test-tokenizer-0-llama.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-llama.gguf) + llama_build_executable(test-tokenizer-0-falcon.cpp) llama_test_executable (test-tokenizer-0-falcon test-tokenizer-0-falcon.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-falcon.gguf) + llama_build_executable(test-tokenizer-1-llama.cpp) -llama_test_executable (test-tokenizer-1-llama test-tokenizer-1-llama.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-llama.gguf) -llama_test_executable(test-tokenizer-1-baichuan test-tokenizer-1-llama.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-baichuan.gguf) +llama_test_executable (test-tokenizer-1-llama test-tokenizer-1-llama.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-llama.gguf) +llama_test_executable (test-tokenizer-1-baichuan test-tokenizer-1-llama.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-baichuan.gguf) + llama_build_executable(test-tokenizer-1-bpe.cpp) -llama_test_executable (test-tokenizer-1-falcon test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-falcon.gguf) -llama_test_executable(test-tokenizer-1-aquila test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-aquila.gguf) -llama_test_executable(test-tokenizer-1-mpt test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-mpt.gguf) -llama_test_executable(test-tokenizer-1-stablelm-3b-4e1t test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-stablelm-3b-4e1t.gguf) -llama_test_executable(test-tokenizer-1-gpt-neox test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-gpt-neox.gguf) -llama_test_executable(test-tokenizer-1-refact test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-refact.gguf) -llama_test_executable(test-tokenizer-1-starcoder test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-starcoder.gguf) -# llama_test_executable(test-tokenizer-1-bloom test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-bloom.gguf) # BIG +llama_test_executable (test-tokenizer-1-falcon test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-falcon.gguf) +llama_test_executable (test-tokenizer-1-aquila test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-aquila.gguf) +llama_test_executable (test-tokenizer-1-mpt test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-mpt.gguf) +llama_test_executable (test-tokenizer-1-stablelm-3b-4e1t test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-stablelm-3b-4e1t.gguf) +llama_test_executable (test-tokenizer-1-gpt-neox test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-gpt-neox.gguf) +llama_test_executable (test-tokenizer-1-refact test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-refact.gguf) +llama_test_executable (test-tokenizer-1-starcoder test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-starcoder.gguf) +# llama_test_executable (test-tokenizer-1-bloom test-tokenizer-1-bpe.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab-bloom.gguf) # BIG + llama_build_and_test_executable(test-grammar-parser.cpp) llama_build_and_test_executable(test-llama-grammar.cpp) -llama_build_and_test_executable(test-grad0.cpp) # SLOW +llama_build_and_test_executable(test-grad0.cpp) # llama_build_and_test_executable(test-opt.cpp) # SLOW 
+llama_build_and_test_executable(test-backend-ops.cpp)

 llama_build_and_test_executable(test-rope.cpp)
diff --git a/tests/test-backend-ops.cpp b/tests/test-backend-ops.cpp
new file mode 100644
index 0000000000000..44830b4d4da30
--- /dev/null
+++ b/tests/test-backend-ops.cpp
@@ -0,0 +1,1490 @@
+#include <ggml.h>
+#include <ggml-alloc.h>
+#include <ggml-backend.h>
+#include <algorithm>
+#include <array>
+#include <cfloat>
+#include <cmath>
+#include <cstring>
+#include <functional>
+#include <memory>
+#include <random>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string>
+#include <thread>
+#include <vector>
+
+
+static void init_tensor_uniform(ggml_tensor * tensor, float min = -1.0f, float max = 1.0f) {
+    size_t size = ggml_nelements(tensor);
+    std::vector<float> data(size);
+
+#if 0
+    std::default_random_engine generator(rd());
+    std::uniform_real_distribution<float> distribution(min, max);
+
+    for (size_t i = 0; i < size; i++) {
+        data[i] = distribution(generator);
+    }
+#endif
+    auto init_thread = [&](size_t start, size_t end) {
+        std::random_device rd;
+        std::default_random_engine generator(rd());
+        std::uniform_real_distribution<float> distribution(min, max);
+
+        for (size_t i = start; i < end; i++) {
+            data[i] = distribution(generator);
+        }
+    };
+
+    size_t n_threads = std::thread::hardware_concurrency();
+    std::vector<std::thread> threads;
+    threads.reserve(n_threads);
+    for (size_t i = 0; i < n_threads; i++) {
+        size_t start = i*size/n_threads;
+        size_t end = (i+1)*size/n_threads;
+        threads.emplace_back(init_thread, start, end);
+    }
+    for (auto & t : threads) {
+        t.join();
+    }
+
+    if (tensor->type == GGML_TYPE_F32 || tensor->type == GGML_TYPE_I32) {
+        ggml_backend_tensor_set(tensor, data.data(), 0, size * sizeof(float));
+    } else if (ggml_is_quantized(tensor->type) || tensor->type == GGML_TYPE_F16) {
+        GGML_ASSERT(size % ggml_blck_size(tensor->type) == 0);
+        std::vector<uint8_t> dataq(ggml_type_size(tensor->type)*size/ggml_blck_size(tensor->type));
+        int64_t hist[16];
+        ggml_quantize_chunk(tensor->type, data.data(), dataq.data(), 0, size, hist);
+        ggml_backend_tensor_set(tensor, dataq.data(), 0, dataq.size());
+    } else {
+        GGML_ASSERT(false);
+    }
+}
+
+static std::vector<float> tensor_to_float(const ggml_tensor * t) {
+    std::vector<float> tv;
+    tv.reserve(ggml_nelements(t));
+
+    std::vector<uint8_t> buf(ggml_nbytes(t));
+    ggml_backend_tensor_get(t, buf.data(), 0, ggml_nbytes(t));
+
+    ggml_type_traits_t tt = ggml_internal_get_type_traits(t->type);
+    size_t bs = ggml_blck_size(t->type);
+
+    // access elements by index to avoid gaps in views
+    for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
+        for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
+            for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
+                for (int64_t i0 = 0; i0 < t->ne[0]; i0 += bs) {
+                    size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0/bs*t->nb[0];
+                    if (t->type == GGML_TYPE_F16) {
+                        tv.push_back(ggml_fp16_to_fp32(*(ggml_fp16_t*)&buf[i]));
+                    } else if (t->type == GGML_TYPE_F32) {
+                        tv.push_back(*(float *) &buf[i]);
+                    } else if (t->type == GGML_TYPE_I32) {
+                        tv.push_back((float)*(int32_t *) &buf[i]);
+                    } else if (ggml_is_quantized(t->type)) {
+                        std::vector<float> vq(ggml_blck_size(t->type));
+                        tt.to_float(&buf[i], vq.data(), ggml_blck_size(t->type));
+                        tv.insert(tv.end(), vq.begin(), vq.end());
+                    } else {
+                        GGML_ASSERT(false);
+                    }
+                }
+            }
+        }
+    }
+
+    return tv;
+}
+
+/*
+static double cosine_similarity(const float * v1, const float * v2, size_t n) {
+    double dot = 0.0;
+    double mag1 = 0.0;
+    double mag2 = 0.0;
+
+    for (size_t i = 0; i < n; i++) {
+        if (std::isnan(v1[i]) || std::isnan(v2[i])) {
+            return -1.0f;
+        }
+        if (std::isinf(v1[i]) && std::isinf(v2[i])) {
+            continue;
+        }
+        dot += v1[i]*v2[i];
+        mag1 += v1[i]*v1[i];
+        mag2 += v2[i]*v2[i];
+    }
+
+    return dot/sqrt(mag1*mag2);
+}
+
+static float distance(const float * v1, const float * v2, size_t n) {
+    double d = 0.0;
+
+    for (size_t i = 0; i < n; i++) {
+        if (std::isnan(v1[i]) || std::isnan(v2[i])) {
+            return INFINITY;
+        }
+        if (std::isinf(v1[i]) && std::isinf(v2[i])) {
+            continue;
+        }
+        d += (v1[i] - v2[i])*(v1[i] - v2[i]);
+    }
+
+    return sqrt(d);
+}
+
+static float vec_len(const float * v, size_t n) {
+    double d = 0.0;
+
+    for (size_t i = 0; i < n; i++) {
+        if (std::isnan(v[i])) {
+            return INFINITY;
+        }
+        if (std::isinf(v[i])) {
+            continue;
+        }
+        d += v[i]*v[i];
+    }
+
+    return sqrt(d);
+}
+*/
+
+// normalized mean squared error = mse(a, b) / mse(a, 0)
+static double nmse(const float * a, const float * b, size_t n) {
+    double mse_a_b = 0.0;
+    double mse_a_0 = 0.0;
+
+    for (size_t i = 0; i < n; i++) {
+        float a_i = a[i];
+        float b_i = b[i];
+
+        mse_a_b += (a_i - b_i) * (a_i - b_i);
+        mse_a_0 += a_i * a_i;
+    }
+
+    return mse_a_b / mse_a_0;
+}
+
+// utils for printing the variables of the test cases
+#define VAR_TO_STR(x) (#x "=" + var_to_str(x))
+
+template<typename T>
+static std::string var_to_str(const T & x) {
+    return std::to_string(x);
+}
+
+template<typename T, size_t N>
+static std::string var_to_str(const T (&x)[N]) {
+    std::string s = "[";
+    for (size_t i = 0; i < N; i++) {
+        if (i > 0) {
+            s += ",";
+        }
+        s += var_to_str(x[i]);
+    }
+    s += "]";
+    return s;
+}
+
+template<typename T, size_t N>
+static std::string var_to_str(const std::array<T, N> & x) {
+    std::string s = "[";
+    for (size_t i = 0; i < N; i++) {
+        if (i > 0) {
+            s += ",";
+        }
+        s += var_to_str(x[i]);
+    }
+    s += "]";
+    return s;
+}
+
+//static std::string var_to_str(ggml_unary_op unary_op) {
+//    return ggml_unary_op_name(unary_op);
+//}
+
+static std::string var_to_str(ggml_type type) {
+    return ggml_type_name(type);
+}
+
+#define VARS_TO_STR1(a) VAR_TO_STR(a)
+#define VARS_TO_STR2(a, b) VAR_TO_STR(a) + "," + VAR_TO_STR(b)
+#define VARS_TO_STR3(a, b, c) VAR_TO_STR(a) + "," + VARS_TO_STR2(b, c)
+#define VARS_TO_STR4(a, b, c, d) VAR_TO_STR(a) + "," + VARS_TO_STR3(b, c, d)
+#define VARS_TO_STR5(a, b, c, d, e) VAR_TO_STR(a) + "," + VARS_TO_STR4(b, c, d, e)
+#define VARS_TO_STR6(a, b, c, d, e, f) VAR_TO_STR(a) + "," + VARS_TO_STR5(b, c, d, e, f)
+#define VARS_TO_STR7(a, b, c, d, e, f, g) VAR_TO_STR(a) + "," + VARS_TO_STR6(b, c, d, e, f, g)
+#define VARS_TO_STR8(a, b, c, d, e, f, g, h) VAR_TO_STR(a) + "," + VARS_TO_STR7(b, c, d, e, f, g, h)
+#define VARS_TO_STR9(a, b, c, d, e, f, g, h, i) VAR_TO_STR(a) + "," + VARS_TO_STR8(b, c, d, e, f, g, h, i)
+#define VARS_TO_STR10(a, b, c, d, e, f, g, h, i, j) VAR_TO_STR(a) + "," + VARS_TO_STR9(b, c, d, e, f, g, h, i, j)
+#define VARS_TO_STR11(a, b, c, d, e, f, g, h, i, j, k) VAR_TO_STR(a) + "," + VARS_TO_STR10(b, c, d, e, f, g, h, i, j, k)
+
+
+// accept FLT_MAX as infinity
+static bool isinf_or_max(float f) {
+    return std::isinf(f) || f == FLT_MAX || f == -FLT_MAX;
+}
+
+static bool ggml_is_view_op(enum ggml_op op) {
+    return op == GGML_OP_VIEW || op == GGML_OP_RESHAPE || op == GGML_OP_PERMUTE || op == GGML_OP_TRANSPOSE;
+}
+
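A note on the error measure: nmse(a, b) = sum_i (a_i - b_i)^2 / sum_i a_i^2, i.e. the squared error normalized by the magnitude of the reference output, which makes the default 1e-7 threshold scale-invariant. For example, a = {1, 2} and b = {1, 2.001} gives 1e-6 / 5 = 2e-7, a borderline failure at the default threshold.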
+struct test_case {
+    virtual ~test_case() {}
+
+    virtual std::string op_desc(ggml_tensor * t) {
+        return ggml_op_desc(t);
+    }
+
+    virtual std::string vars() {
+        return "";
+    }
+
+    virtual ggml_tensor * build_graph(ggml_context * ctx) = 0;
+
+    virtual double max_nmse_err() {
+        return 1e-7;
+    }
+
+    virtual void initialize_tensors(ggml_context * ctx) {
+        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != nullptr; t = ggml_get_next_tensor(ctx, t)) {
+            init_tensor_uniform(t);
+        }
+    }
+
+    virtual size_t op_size(ggml_tensor * t) {
+        size_t size = ggml_nbytes(t);
+        // add source tensors
+        for (int i = 0; i < GGML_MAX_SRC; i++) {
+            if (t->src[i] != NULL) {
+                size += ggml_nbytes(t->src[i]);
+            }
+        }
+        return size;
+    }
+
+    bool eval(ggml_backend_t backend1, ggml_backend_t backend2, const char * op_name) {
+        ggml_init_params params = {
+            /* .mem_size = */ ggml_tensor_overhead()*128 + ggml_graph_overhead(),
+            /* .mem_base = */ NULL,
+            /* .no_alloc = */ true,
+        };
+        ggml_context * ctx = ggml_init(params);
+
+        ggml_tensor * out = build_graph(ctx);
+
+        if (op_name != nullptr && op_desc(out) != op_name) {
+            //printf("  %s: skipping\n", op_desc(out).c_str());
+            ggml_free(ctx);
+            return true;
+        }
+
+        printf("  %s(%s): ", op_desc(out).c_str(), vars().c_str());
+        fflush(stdout);
+
+        // check if backends support op
+        for (ggml_backend_t backend : {backend1, backend2}) {
+            if (!ggml_backend_supports_op(backend, out)) {
+                printf("not supported\n");
+                ggml_free(ctx);
+                return true;
+            }
+        }
+
+        // allocate
+        ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend1);
+
+        // build graph
+        ggml_cgraph * gf = ggml_new_graph(ctx);
+        ggml_build_forward_expand(gf, out);
+
+        // randomize tensors
+        initialize_tensors(ctx);
+
+        // compare
+        struct callback_userdata {
+            bool ok;
+            double max_err;
+        };
+
+        callback_userdata ud {
+            true,
+            max_nmse_err(),
+        };
+
+        auto callback = [](int index, ggml_tensor * t1, ggml_tensor * t2, void * user_data) -> bool {
+            std::vector<float> f1 = tensor_to_float(t1);
+            std::vector<float> f2 = tensor_to_float(t2);
+            callback_userdata * ud = (callback_userdata *) user_data;
+
+            for (size_t i = 0; i < f1.size(); i++) {
+                // check for nans
+                if (std::isnan(f1[i]) || std::isnan(f2[i])) {
+                    printf("[%s] NaN at index %zu (%f %f) ", ggml_op_desc(t1), i, f1[i], f2[i]);
+                    ud->ok = false;
+                    return true;
+                }
+                // check for infs: both must be inf of the same sign, or both must be finite
+                if (isinf_or_max(f1[i]) || isinf_or_max(f2[i])) {
+                    if (isinf_or_max(f1[i]) && isinf_or_max(f2[i])) {
+                        if (std::signbit(f1[i]) != std::signbit(f2[i])) {
+                            printf("[%s] inf sign mismatch: %f %f ", ggml_op_desc(t1), f1[i], f2[i]);
+                            ud->ok = false;
+                            return true;
+                        }
+                    } else {
+                        printf("[%s] inf mismatch: %f %f ", ggml_op_desc(t1), f1[i], f2[i]);
+                        ud->ok = false;
+                        return true;
+                    }
+                }
+            }
+
+            double err = nmse(f1.data(), f2.data(), f1.size());
+            if (err > ud->max_err) {
+                printf("[%s] NMSE = %f ", ggml_op_desc(t1), err);
+                //for (int i = 0; i < f1.size(); i++) {
+                //    printf("(%f, %f) ", f1[i], f2[i]);
+                //}
+                //printf("\n");
+                ud->ok = false;
+            }
+            return true;
+
+            GGML_UNUSED(index);
+        };
+
+        ggml_backend_compare_graph_backend(backend1, backend2, gf, callback, &ud);
+
+        if (ud.ok) {
+            printf("\033[1;32mOK\033[0m\n");
+        } else {
+            printf("\033[1;31mFAIL\033[0m\n");
+        }
+
+        ggml_backend_buffer_free(buf);
+
+        ggml_free(ctx);
+
+        return ud.ok;
+    }
+
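For contributors adding coverage for a new op, the pattern is minimal: subclass test_case, report the parameters in vars(), and build a one-op graph; initialization, comparison and perf measurement all come from the base struct. A hypothetical example, not part of this patch (ggml_sqrt is used only as an illustration):

    struct test_sqrt : public test_case {
        const ggml_type type;
        const std::array<int64_t, 4> ne;

        std::string vars() override {
            return VARS_TO_STR2(type, ne);
        }

        test_sqrt(ggml_type type = GGML_TYPE_F32,
                std::array<int64_t, 4> ne = {10, 10, 10, 10})
            : type(type), ne(ne) {}

        ggml_tensor * build_graph(ggml_context * ctx) override {
            ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
            return ggml_sqrt(ctx, a);
        }

        // the default init draws from [-1, 1]; sqrt of negative inputs would
        // trip the NaN check, so keep the inputs positive
        void initialize_tensors(ggml_context * ctx) override {
            for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
                init_tensor_uniform(t, 0.1f, 1.0f);
            }
        }
    };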
+ printf("not supported\n"); + ggml_free(ctx); + return true; + } + + // align while also leaving some margin for variations in parameters + int align = 20; + int last = (len + align - 1) / align * align; + if (last - len < 5) { + last += align; + } + last = std::max(last, 60); + printf("%*s", last - len, ""); + + // allocate + ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend); + + // randomize tensors + initialize_tensors(ctx); + + // build graph + ggml_cgraph * gf = ggml_new_graph_custom(ctx, graph_nodes, false); + ggml_build_forward_expand(gf, out); + + // warmup run + ggml_backend_graph_compute(backend, gf); + + // duplicate the op + size_t target_size = ggml_backend_is_cpu(backend) ? 1ULL << 33 : 1ULL << 35; // 8 GB CPU, 32 GB GPU + int n_runs = std::min((size_t)gf->size - gf->n_nodes, target_size / op_size(out)) + 1; + for (int i = 1; i < n_runs; i++) { + gf->nodes[gf->n_nodes++] = out; + } + + // calculate memory + size_t mem = n_runs * op_size(out); + auto tensor_op_size = [](ggml_tensor * t) { + size_t size = ggml_nbytes(t); + // add source tensors + for (int i = 0; i < GGML_MAX_SRC; i++) { + if (t->src[i] != NULL) { + size += ggml_nbytes(t->src[i]); + } + } + return size; + }; + for (int i = 0; i < gf->n_nodes; i++) { + if (ggml_is_view_op(gf->nodes[i]->op) || gf->nodes[i] == out) { + continue; + } + mem += tensor_op_size(gf->nodes[i]); + } + + // run + ggml_backend_synchronize(backend); + + int64_t start_time = ggml_time_us(); + ggml_backend_graph_compute(backend, gf); + ggml_backend_synchronize(backend); + int64_t end_time = ggml_time_us(); + double time_us = end_time - start_time; + + printf(" %5d runs - %8.2f us/run - %8zu kB/run - \033[1;34m%7.2f GB/s\033[0m\n", + n_runs, + time_us / n_runs, + op_size(out) / 1024, + mem / (time_us/1e6) / 1024.0 / 1024.0 / 1024.0); + + ggml_backend_buffer_free(buf); + + ggml_free(ctx); + + return true; + } +}; + +// GGML_OP_UNARY +struct test_unary : public test_case { + const ggml_unary_op op; + const ggml_type type; + const std::array ne; + + std::string vars() override { + return VARS_TO_STR2(type, ne); + } + + test_unary(ggml_unary_op op, + ggml_type type = GGML_TYPE_F32, + std::array ne = {128, 10, 10, 10}) + : op(op), type(type), ne(ne) {} + + ggml_tensor * build_graph(ggml_context * ctx) override { + ggml_tensor * in = ggml_new_tensor(ctx, type, 4, ne.data()); + ggml_tensor * out = ggml_unary(ctx, in, op); + return out; + } +}; + +// GGML_OP_GET_ROWS +struct test_get_rows : public test_case { + const ggml_type type; + const int n; // cols + const int m; // rows + const int r; // rows to get + const int b; // batch size + const bool v; // view (non-contiguous src1) + + std::string vars() override { + return VARS_TO_STR6(type, n, m, r, b, v); + } + + test_get_rows(ggml_type type = GGML_TYPE_F32, int n = 10, int m = 5, int r = 3, int b = 1, bool v = false) + : type(type), n(n), m(m), r(r), b(b), v(v) {} + + ggml_tensor * build_graph(ggml_context * ctx) override { + ggml_tensor * in = ggml_new_tensor_3d(ctx, type, n, m, b); + ggml_tensor * rows = ggml_new_tensor_2d(ctx, GGML_TYPE_I32, r, b); + if (v) { + rows = ggml_view_2d(ctx, rows, r/2, b, rows->nb[1], 0); + } + ggml_tensor * out = ggml_get_rows(ctx, in, rows); + return out; + } + + void initialize_tensors(ggml_context * ctx) override { + for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) { + if (t->type == GGML_TYPE_I32) { + if (ggml_is_view_op(t->op)) { continue; } + // rows + std::vector data(r*b); + for (int i = 0; 
+// GGML_OP_UNARY
+struct test_unary : public test_case {
+    const ggml_unary_op op;
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR2(type, ne);
+    }
+
+    test_unary(ggml_unary_op op,
+            ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {128, 10, 10, 10})
+        : op(op), type(type), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * in = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_unary(ctx, in, op);
+        return out;
+    }
+};
+
+// GGML_OP_GET_ROWS
+struct test_get_rows : public test_case {
+    const ggml_type type;
+    const int n; // cols
+    const int m; // rows
+    const int r; // rows to get
+    const int b; // batch size
+    const bool v; // view (non-contiguous src1)
+
+    std::string vars() override {
+        return VARS_TO_STR6(type, n, m, r, b, v);
+    }
+
+    test_get_rows(ggml_type type = GGML_TYPE_F32, int n = 10, int m = 5, int r = 3, int b = 1, bool v = false)
+        : type(type), n(n), m(m), r(r), b(b), v(v) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * in = ggml_new_tensor_3d(ctx, type, n, m, b);
+        ggml_tensor * rows = ggml_new_tensor_2d(ctx, GGML_TYPE_I32, r, b);
+        if (v) {
+            rows = ggml_view_2d(ctx, rows, r/2, b, rows->nb[1], 0);
+        }
+        ggml_tensor * out = ggml_get_rows(ctx, in, rows);
+        return out;
+    }
+
+    void initialize_tensors(ggml_context * ctx) override {
+        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
+            if (t->type == GGML_TYPE_I32) {
+                if (ggml_is_view_op(t->op)) { continue; }
+                // rows
+                std::vector<int> data(r*b);
+                for (int i = 0; i < r*b; i++) {
+                    data[i] = rand() % m;
+                }
+                ggml_backend_tensor_set(t, data.data(), 0, r * b * sizeof(int));
+            } else {
+                init_tensor_uniform(t);
+            }
+        }
+    }
+};
+
+// GGML_OP_REPEAT
+struct test_repeat : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    const std::array<int, 4> nr;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type, ne, nr);
+    }
+
+    size_t op_size(ggml_tensor * t) override {
+        return ggml_nbytes(t) * 2;
+    }
+
+    test_repeat(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10},
+            std::array<int, 4> nr = {2, 2, 2, 2})
+        : type(type), ne(ne), nr(nr) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * target = ggml_new_tensor_4d(ctx, type, ne[0]*nr[0], ne[1]*nr[1], ne[2]*nr[2], ne[3]*nr[3]);
+        ggml_tensor * src = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_repeat(ctx, src, target);
+        return out;
+    }
+};
+
+// GGML_OP_DUP
+struct test_dup : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR2(type, ne);
+    }
+
+    test_dup(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 1})
+        : type(type), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * src = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_dup(ctx, src);
+        return out;
+    }
+};
+
+// GGML_OP_CPY
+struct test_cpy : public test_case {
+    const ggml_type type_src;
+    const ggml_type type_dst;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type_src, type_dst, ne);
+    }
+
+    size_t op_size(ggml_tensor * t) override {
+        return ggml_nbytes(t) + ggml_nbytes(t->src[0]);
+    }
+
+    test_cpy(ggml_type type_src = GGML_TYPE_F32, ggml_type type_dst = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 1})
+        : type_src(type_src), type_dst(type_dst), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * src = ggml_new_tensor(ctx, type_src, 4, ne.data());
+        ggml_tensor * dst = ggml_new_tensor(ctx, type_dst, 4, ne.data());
+        ggml_tensor * out = ggml_cpy(ctx, src, dst);
+        return out;
+    }
+};
+
+// GGML_OP_CONT
+struct test_cont : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR2(type, ne);
+    }
+
+    test_cont(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 1})
+        : type(type), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * src = ggml_new_tensor(ctx, type, 4, ne.data());
+        src = ggml_transpose(ctx, src);
+        ggml_tensor * out = ggml_cont(ctx, src);
+        return out;
+    }
+};
+
+// GGML_OP_ADD
+// GGML_OP_MUL
+// GGML_OP_DIV
+struct test_bin_bcast : public test_case {
+    using op_t = ggml_tensor * (*) (ggml_context *, ggml_tensor *, ggml_tensor *);
+    op_t op;
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    const std::array<int, 4> nr;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type, ne, nr);
+    }
+
+    size_t op_size(ggml_tensor * t) override {
+        return ggml_nbytes(t) * 3;
+    }
+
+    test_bin_bcast(op_t op, ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 1, 1},
+            std::array<int, 4> nr = {1, 2, 1, 1})
+        : op(op), type(type), ne(ne), nr(nr) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor_4d(ctx, type, ne[0]*nr[0], ne[1]*nr[1], ne[2]*nr[2], ne[3]*nr[3]);
+        ggml_tensor * b = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = op(ctx, a, b);
+        return out;
+    }
+
+    void initialize_tensors(ggml_context * ctx) override {
+        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
+            if (op == ggml_div) {
+                // avoid division by zero
+                init_tensor_uniform(t, 1.0f, 2.0f);
+            } else {
+                init_tensor_uniform(t);
+            }
+        }
+    }
+};
+
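In the broadcast tests above, b always has shape ne and a has shape ne*nr per dimension, so nr expresses how many times b is logically repeated into a; for example, ne = {10, 10, 1, 1} with nr = {1, 2, 1, 1} adds a 10x10 b into a 10x20 a, reusing b twice along dimension 1.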
+// GGML_OP_SCALE
+struct test_scale : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR2(type, ne);
+    }
+
+    test_scale(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10})
+        : type(type), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * scale = ggml_new_tensor_1d(ctx, type, 1);
+        ggml_tensor * out = ggml_scale(ctx, a, scale);
+        return out;
+    }
+};
+
+// GGML_OP_NORM
+struct test_norm : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    float eps;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type, ne, eps);
+    }
+
+    test_norm(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {64, 10, 10, 10},
+            float eps = 1e-6f)
+        : type(type), ne(ne), eps(eps) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_norm(ctx, a, eps);
+        return out;
+    }
+};
+
+// GGML_OP_RMS_NORM
+struct test_rms_norm : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    float eps;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type, ne, eps);
+    }
+
+    test_rms_norm(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {64, 10, 10, 10},
+            float eps = 1e-6f)
+        : type(type), ne(ne), eps(eps) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_rms_norm(ctx, a, eps);
+        return out;
+    }
+};
+
+// GGML_OP_MUL_MAT
+struct test_mul_mat : public test_case {
+    const ggml_type type_a;
+    const ggml_type type_b;
+    const int64_t m;
+    const int64_t n;
+    const int64_t k;
+    const std::array<int64_t, 2> bs; // dims 3 and 4
+    const std::array<int64_t, 2> nr; // repeat in dims 3 and 4
+
+    std::string vars() override {
+        return VARS_TO_STR7(type_a, type_b, m, n, k, bs, nr);
+    }
+
+    double max_nmse_err() override {
+        return 5e-4;
+    }
+
+    size_t op_size(ggml_tensor * t) override {
+        size_t a = ggml_nbytes(t->src[0]) * n * nr[0] * nr[1];
+        size_t b = ggml_nbytes(t->src[1]) * m;
+        size_t c = ggml_nbytes(t);
+        return a + b + c;
+
+        GGML_UNUSED(t);
+    }
+
+    test_mul_mat(ggml_type type_a = GGML_TYPE_F32, ggml_type type_b = GGML_TYPE_F32,
+            int64_t m = 32, int64_t n = 32, int64_t k = 32,
+            std::array<int64_t, 2> bs = {10, 10},
+            std::array<int64_t, 2> nr = {2, 2})
+        : type_a(type_a), type_b(type_b), m(m), n(n), k(k), bs(bs), nr(nr) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        // C^T = A * B^T: (k, m) * (k, n) => (m, n)
+        ggml_tensor * a = ggml_new_tensor_4d(ctx, type_a, k, m, bs[0]      , bs[1]);
+        ggml_tensor * b = ggml_new_tensor_4d(ctx, type_b, k, n, bs[0]*nr[0], bs[1]*nr[1]);
+        ggml_tensor * out = ggml_mul_mat(ctx, a, b);
+        return out;
+    }
+};
+
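A note on the shape convention used in test_mul_mat (and in the MoE test below): ggml stores ne[0] as the innermost, contiguous dimension, so a "k x m" tensor here is m rows of length k, and ggml_mul_mat contracts over ne[0] of both operands. A minimal sketch, assuming a valid ggml_context ctx and dimensions k, m, n:

    ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, k, m); // m rows of length k
    ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, k, n); // n rows of length k
    ggml_tensor * c = ggml_mul_mat(ctx, a, b); // c->ne = {m, n, 1, 1}: c row j = a * b_row_j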
+// GGML_OP_MUL_MAT_ID
+struct test_mul_mat_id : public test_case {
+    const ggml_type type_a;
+    const ggml_type type_b;
+    const int n_mats;
+    const int id;
+    const int64_t m;
+    const int64_t n;
+    const int64_t k;
+    const bool v; // view (non-contiguous ids)
+
+    std::string vars() override {
+        return VARS_TO_STR8(type_a, type_b, n_mats, id, m, n, k, v);
+    }
+
+    double max_nmse_err() override {
+        return 5e-4;
+    }
+
+    size_t op_size(ggml_tensor * t) override {
+        size_t a = ggml_nbytes(t->src[2]) * n;
+        size_t b = ggml_nbytes(t->src[1]) * m;
+        size_t c = ggml_nbytes(t);
+        return a + b + c;
+
+        GGML_UNUSED(t);
+    }
+
+    test_mul_mat_id(ggml_type type_a = GGML_TYPE_F32, ggml_type type_b = GGML_TYPE_F32,
+            int n_mats = 2, int id = 0,
+            int64_t m = 32, int64_t n = 32, int64_t k = 32, bool v = false)
+        : type_a(type_a), type_b(type_b), n_mats(n_mats), id(id),
+            m(m), n(n), k(k), v(v) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        // C^T = A * B^T: (k, m) * (k, n) => (m, n)
+        std::vector<ggml_tensor *> mats;
+        for (int i = 0; i < n_mats; i++) {
+            ggml_tensor * a = ggml_new_tensor_2d(ctx, type_a, k, m);
+            mats.push_back(a);
+        }
+        ggml_tensor * ids = ggml_new_tensor_2d(ctx, GGML_TYPE_I32, n_mats, n);
+        if (v) {
+            ids = ggml_view_2d(ctx, ids, n_mats/2, ids->ne[1], ids->nb[1], 0);
+        }
+        ggml_tensor * b = ggml_new_tensor_2d(ctx, type_b, k, n);
+        ggml_tensor * out = ggml_mul_mat_id(ctx, mats.data(), n_mats, ids, v ? id/2 : id, b);
+        return out;
+    }
+
+    void initialize_tensors(ggml_context * ctx) override {
+        std::random_device rd;
+        std::default_random_engine rng(rd());
+        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
+            if (t->type == GGML_TYPE_I32) {
+                if (ggml_is_view_op(t->op)) { continue; }
+                // ids
+                for (int64_t r = 0; r < ggml_nrows(t); r++) {
+                    std::vector<int32_t> data(t->ne[0]);
+                    for (int i = 0; i < t->ne[0]; i++) {
+                        data[i] = i % n_mats;
+                    }
+                    std::shuffle(data.begin(), data.end(), rng);
+                    ggml_backend_tensor_set(t, data.data(), r * t->nb[1], t->ne[0] * sizeof(int32_t));
+                }
+            } else {
+                init_tensor_uniform(t);
+            }
+        }
+    }
+};
+
+// GGML_OP_SQR
+struct test_sqr : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR2(type, ne);
+    }
+
+    test_sqr(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10})
+        : type(type), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_sqr(ctx, a);
+        return out;
+    }
+};
+
+// GGML_OP_CLAMP
+struct test_clamp : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    float min;
+    float max;
+
+    std::string vars() override {
+        return VARS_TO_STR4(type, ne, min, max);
+    }
+
+    test_clamp(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10},
+            float min = -0.5f, float max = 0.5f)
+        : type(type), ne(ne), min(min), max(max) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_clamp(ctx, a, min, max);
+        return out;
+    }
+};
+
+// GGML_OP_DIAG_MASK_INF
+struct test_diag_mask_inf : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    const int n_past;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type, ne, n_past);
+    }
+
+    test_diag_mask_inf(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10},
+            int n_past = 5)
+        : type(type), ne(ne), n_past(n_past) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_diag_mask_inf(ctx, a, n_past);
+        return out;
+    }
+};
+
+// GGML_OP_SOFT_MAX
+struct test_soft_max : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR2(type, ne);
+    }
+
+    test_soft_max(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10})
+        : type(type), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_soft_max(ctx, a);
+        return out;
+    }
+};
+
+// GGML_OP_ROPE
+struct test_rope : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    int n_dims;
+    int mode;
+    int n_ctx;
+
+    std::string vars() override {
+        return VARS_TO_STR5(type, ne, n_dims, mode, n_ctx);
+    }
+
+    test_rope(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 1},
+            int n_dims = 10, int mode = 0, int n_ctx = 512)
+        : type(type), ne(ne), n_dims(n_dims), mode(mode), n_ctx(n_ctx) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * pos = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, ne[2]);
+        ggml_tensor * out = ggml_rope(ctx, a, pos, n_dims, mode, n_ctx);
+        return out;
+    }
+
+    void initialize_tensors(ggml_context * ctx) override {
+        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
+            if (t->type == GGML_TYPE_I32) {
+                // pos
+                std::vector<int> data(ne[2]);
+                for (int i = 0; i < ne[2]; i++) {
+                    data[i] = rand() % n_ctx;
+                }
+                ggml_backend_tensor_set(t, data.data(), 0, ne[2] * sizeof(int));
+            } else {
+                init_tensor_uniform(t);
+            }
+        }
+    }
+};
+
+// GGML_OP_ALIBI
+struct test_alibi : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    int n_past;
+    int n_head;
+    float bias_max;
+
+    std::string vars() override {
+        return VARS_TO_STR5(type, ne, n_past, n_head, bias_max);
+    }
+
+    test_alibi(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10},
+            int n_past = 512, int n_head = 10, float bias_max = 0.5f)
+        : type(type), ne(ne), n_past(n_past), n_head(n_head), bias_max(bias_max) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_alibi(ctx, a, n_past, n_head, bias_max);
+        return out;
+    }
+};
+
+// GGML_OP_IM2COL
+struct test_im2col : public test_case {
+    const ggml_type type_input;
+    const ggml_type type_kernel;
+    const std::array<int64_t, 4> ne_input;
+    const std::array<int64_t, 4> ne_kernel;
+    // stride
+    const int s0;
+    const int s1;
+    // padding
+    const int p0;
+    const int p1;
+    // dilation
+    const int d0;
+    const int d1;
+    // mode
+    const bool is_2D;
+
+    std::string vars() override {
+        return VARS_TO_STR11(type_input, type_kernel, ne_input, ne_kernel, s0, s1, p0, p1, d0, d1, is_2D);
+    }
+
+    test_im2col(ggml_type type_input = GGML_TYPE_F32, ggml_type type_kernel = GGML_TYPE_F16,
+            std::array<int64_t, 4> ne_input = {10, 10, 3, 1},  // [input_width, input_height, input_channels, 1]
+            std::array<int64_t, 4> ne_kernel = {3, 3, 3, 1},   // [kernel_width, kernel_height, input_channels, 1]
+            int s0 = 1, int s1 = 1,
+            int p0 = 1, int p1 = 1,
+            int d0 = 1, int d1 = 1,
+            bool is_2D = true)
+        : type_input(type_input), type_kernel(type_kernel), ne_input(ne_input), ne_kernel(ne_kernel), s0(s0), s1(s1), p0(p0), p1(p1), d0(d0), d1(d1), is_2D(is_2D) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * input = ggml_new_tensor(ctx, type_input, 4, ne_input.data());
+        ggml_tensor * kernel = ggml_new_tensor(ctx, type_kernel, 4, ne_kernel.data());
+        ggml_tensor * out = ggml_im2col(ctx, kernel, input, s0, s1, p0, p1, d0, d1, is_2D);
+        return out;
+    }
+};
+
+// GGML_OP_CONCAT
+struct test_concat : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    const int64_t b_ne2;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type, ne, b_ne2);
+    }
+
+    test_concat(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10},
+            int64_t b_ne2 = 10)
+        : type(type), ne(ne), b_ne2(b_ne2) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * b = ggml_new_tensor_4d(ctx, type, ne[0], ne[1], b_ne2, ne[3]);
+        ggml_tensor * out = ggml_concat(ctx, a, b);
+        return out;
+    }
+};
+
+// GGML_OP_ARGSORT
+struct test_argsort : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+    ggml_sort_order order;
+
+    std::string vars() override {
+        return VARS_TO_STR3(type, ne, order);
+    }
+
+    test_argsort(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {16, 10, 10, 10},
+            ggml_sort_order order = GGML_SORT_ASC)
+        : type(type), ne(ne), order(order) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_argsort(ctx, a, order);
+        return out;
+    }
+
+    void initialize_tensors(ggml_context * ctx) override {
+        std::random_device rd;
+        std::default_random_engine rng(rd());
+        for (ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL; t = ggml_get_next_tensor(ctx, t)) {
+            if (t->type == GGML_TYPE_I32) {
+                // indices
+                std::vector<int> data(ggml_nelements(t));
+                for (int i = 0; i < ggml_nelements(t); i++) {
+                    data[i] = rand();
+                }
+                std::shuffle(data.begin(), data.end(), rng);
+                ggml_backend_tensor_set(t, data.data(), 0, ne[0]*ne[1]*ne[2]*ne[3] * sizeof(int));
+            } else if (t->type == GGML_TYPE_F32) {
+                // initialize with unique values to avoid ties
+                for (int64_t r = 0; r < ggml_nrows(t); r++) {
+                    std::vector<float> data(t->ne[0]);
+                    for (int i = 0; i < t->ne[0]; i++) {
+                        data[i] = i;
+                    }
+                    std::shuffle(data.begin(), data.end(), rng);
+                    ggml_backend_tensor_set(t, data.data(), r * t->nb[1], t->ne[0] * sizeof(float));
+                }
+            } else {
+                GGML_ASSERT(false);
+            }
+        }
+    }
+};
+
+// GGML_OP_SUM_ROWS
+struct test_sum_rows : public test_case {
+    const ggml_type type;
+    const std::array<int64_t, 4> ne;
+
+    std::string vars() override {
+        return VARS_TO_STR2(type, ne);
+    }
+
+    test_sum_rows(ggml_type type = GGML_TYPE_F32,
+            std::array<int64_t, 4> ne = {10, 10, 10, 10})
+        : type(type), ne(ne) {}
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * a = ggml_new_tensor(ctx, type, 4, ne.data());
+        ggml_tensor * out = ggml_sum_rows(ctx, a);
+        return out;
+    }
+};
+
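The routing in the MoE test below mirrors Mixtral: per token, take a softmax over the n_experts gate logits, keep the top n_experts_per_tok probabilities, renormalize them to sum to 1, and use them to weight the selected experts' FFN outputs. For a small worked example, gate probabilities {0.4, 0.3, 0.2, 0.1} over 4 experts with top-2 selection keep experts 0 and 1 with weights 0.4/0.7 and 0.3/0.7, so only 2 of the 4 expert FFNs run for that token.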
+// Mixtral MOE
+struct test_moe : public test_case {
+    const int n_experts;
+    const int n_experts_per_tok;
+    const int n_tokens;
+    const int n_embd;
+    const int n_ff;
+
+    std::string op_desc(ggml_tensor * t) override {
+        return "MOE";
+
+        GGML_UNUSED(t);
+    }
+
+    std::string vars() override {
+        return VARS_TO_STR5(n_experts, n_experts_per_tok, n_tokens, n_embd, n_ff);
+    }
+
+    test_moe(int n_experts = 8, int n_experts_per_tok = 2, int n_tokens = 1, int n_embd = 4096, int n_ff = 14336)
+        : n_experts(n_experts), n_experts_per_tok(n_experts_per_tok), n_tokens(n_tokens), n_embd(n_embd), n_ff(n_ff) {
+    }
+
+    ggml_tensor * build_graph(ggml_context * ctx) override {
+        ggml_tensor * ffn_gate_inp = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_experts);
+
+        std::vector<ggml_tensor *> ffn_up_exp(n_experts);
+        std::vector<ggml_tensor *> ffn_gate_exp(n_experts);
+        std::vector<ggml_tensor *> ffn_down_exp(n_experts);
+
+        for (int i = 0; i < n_experts; ++i) {
+            ffn_up_exp[i] = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_ff);
+            ffn_gate_exp[i] = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_ff);
+            ffn_down_exp[i] = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_ff, n_embd);
+        }
+
+        ggml_tensor * cur = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_tokens);
+
+        ggml_tensor * logits = ggml_mul_mat(ctx, ffn_gate_inp, cur);
+        ggml_tensor * probs = ggml_soft_max_ext(ctx, logits, nullptr, 1.0f/sqrtf(n_embd));
+
+        // select experts
+        ggml_tensor * selected_experts = ggml_top_k(ctx, probs, n_experts_per_tok);
+
+        ggml_tensor * weights = ggml_get_rows(ctx,
+                ggml_reshape_3d(ctx, probs, 1, n_experts, n_tokens), selected_experts);
+
+        weights = ggml_reshape_2d(ctx, weights, n_experts_per_tok, n_tokens);
+
+        ggml_tensor * weights_sum = ggml_sum_rows(ctx, weights);
+
+        weights = ggml_div(ctx, weights, weights_sum);
+
+        // compute expert outputs
+        ggml_tensor * moe_out = nullptr;
+
+        for (int i = 0; i < n_experts_per_tok; ++i) {
+            ggml_tensor * cur_expert;
+
+            ggml_tensor * cur_up = ggml_mul_mat_id(ctx, ffn_up_exp.data(), n_experts, selected_experts, i, cur);
+
+            ggml_tensor * cur_gate = ggml_mul_mat_id(ctx, ffn_gate_exp.data(), n_experts, selected_experts, i, cur);
+
+            cur_gate = ggml_silu(ctx, cur_gate);
+
+            cur_expert = ggml_mul(ctx, cur_up, cur_gate);
+
+            cur_expert = ggml_mul_mat_id(ctx, ffn_down_exp.data(), n_experts, selected_experts, i, cur_expert);
+
+            cur_expert = ggml_mul(ctx, cur_expert,
+                    ggml_view_2d(ctx, weights, 1, n_tokens, weights->nb[1], i*weights->nb[0]));
+
+            if (i == 0) {
+                moe_out = cur_expert;
+            } else {
+                moe_out = ggml_add(ctx, moe_out, cur_expert);
+            }
+        }
+
+        cur = moe_out;
+
+        return cur;
+    }
+};
+
+enum test_mode {
+    MODE_TEST,
+    MODE_PERF,
+};
+
+static bool test_backend(ggml_backend_t backend, test_mode mode, const char * op_name) {
+    std::vector<std::unique_ptr<test_case>> test_cases;
+
+    const ggml_type all_types[] = {
+        GGML_TYPE_F32, GGML_TYPE_F16,
+        GGML_TYPE_Q4_0, GGML_TYPE_Q4_1,
+        GGML_TYPE_Q5_0, GGML_TYPE_Q5_1,
+        GGML_TYPE_Q8_0,
+        GGML_TYPE_Q2_K, GGML_TYPE_Q3_K,
+        GGML_TYPE_Q4_K, GGML_TYPE_Q5_K,
+        GGML_TYPE_Q6_K
+    };
+
+    // unary ops
+    for (int op = 0; op < GGML_UNARY_OP_COUNT; op++) {
+        test_cases.emplace_back(new test_unary((ggml_unary_op) op));
+    }
+
+    test_cases.emplace_back(new test_get_rows(GGML_TYPE_F32, 1, 8, 2, 1, false));
+    for (ggml_type type : all_types) {
+        for (int b : {1, 7}) {
+            for (bool v : {false, true}) {
+                test_cases.emplace_back(new test_get_rows(type, 256, 5, 4, b, v));
+            }
+        }
+    }
+
+    test_cases.emplace_back(new test_repeat(GGML_TYPE_F32, {10, 10, 10, 10}, {1, 1, 1, 1}));
+    test_cases.emplace_back(new test_repeat(GGML_TYPE_F32, {10, 10, 10, 10}, {2, 1, 1, 1}));
+    test_cases.emplace_back(new test_repeat(GGML_TYPE_F32, {10, 10, 10, 10}, {1, 2, 1, 1}));
+    test_cases.emplace_back(new test_repeat(GGML_TYPE_F32, {10, 10, 10, 10}, {1, 1, 2, 1}));
+    test_cases.emplace_back(new test_repeat(GGML_TYPE_F32, {10, 10, 10, 10}, {1, 1, 1, 2}));
+
+    test_cases.emplace_back(new test_dup());
+
+    for (ggml_type type : all_types) {
+        test_cases.emplace_back(new test_cpy(GGML_TYPE_F32, type, {256, 10, 10, 1}));
+    }
+
+    test_cases.emplace_back(new test_cont());
+
+    auto add_test_bin_bcast = [&](ggml_type type, std::array<int64_t, 4> ne, std::array<int, 4> nr) {
+        for (auto op : {ggml_add, ggml_mul, ggml_div}) {
+            test_cases.emplace_back(new test_bin_bcast(op, type, ne, nr));
+        }
+    };
+
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 8, 1}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 1, 1}, {32, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 320, 320}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 1, 1}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 1}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {2, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {1, 2, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {1, 1, 2, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {1, 1, 1, 2});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {1, 1, 2, 2});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {1, 2, 2, 2});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 10, 10, 10}, {2, 2, 2, 2});
+
+    // stable diffusion
+    add_test_bin_bcast(GGML_TYPE_F32, {1280, 1, 1, 1}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1280, 1, 1, 1}, {1, 16, 16, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1280, 16, 16, 1}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1280, 1, 1, 1}, {1, 256, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 1280, 1}, {16, 16, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {16, 16, 1280, 1}, {1, 1, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 1920, 1}, {16, 16, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 2560, 1}, {16, 16, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 1280, 1}, {32, 32, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 1920, 1}, {32, 32, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {1, 1, 640, 1}, {32, 32, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {5120, 1, 1, 1}, {1, 256, 1, 1});
+    add_test_bin_bcast(GGML_TYPE_F32, {640, 1, 1, 1}, {1, 1, 1, 1});
+    //add_test_bin_bcast(GGML_TYPE_F32, {3, 3, 2560, 1280}, {1, 1, 1, 1});
+    //add_test_bin_bcast(GGML_TYPE_F32, {3, 3, 2560, 1280}, {2, 1, 1, 1});
+
+    test_cases.emplace_back(new test_scale());
+
+    for (float eps : {1e-6f, 1e-5f, 1e-3f, 1e-1f}) {
+        test_cases.emplace_back(new test_norm(GGML_TYPE_F32, {64, 10, 10, 10}, eps));
+        test_cases.emplace_back(new test_rms_norm(GGML_TYPE_F32, {64, 10, 10, 10}, eps));
+    }
+
+    for (ggml_type type_a : all_types) {
+        for (ggml_type type_b : {GGML_TYPE_F32 /*, GGML_TYPE_F16 */}) {
+            // FIXME: CPU crashes on f16xf16
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16,  1, 256, { 1,  1}, {1, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16,  1, 256, {10,  1}, {1, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16,  1, 256, {10,  1}, {2, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16,  1, 256, {10, 10}, {1, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16,  1, 256, {10, 10}, {2, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16,  1, 256, {10, 10}, {1, 2}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16,  1, 256, {10, 10}, {2, 2}));
+
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 16, 256, { 1,  1}, {1, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 16, 256, {10,  1}, {1, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 16, 256, {10,  1}, {2, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 16, 256, {10, 10}, {1, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 16, 256, {10, 10}, {2, 1}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 16, 256, {10, 10}, {1, 2}));
+            test_cases.emplace_back(new test_mul_mat(type_a, type_b, 16, 16, 256, {10, 10}, {2, 2}));
+        }
+    }
+
+    for (ggml_type type_a : all_types) {
+        for (ggml_type type_b : {GGML_TYPE_F32 /*, GGML_TYPE_F16 */}) {
+            for (int n_mats : {2, 4, 8}) {
+                for (int id = 0; id < n_mats; id++) {
+                    for (bool v : {false, true}) {
+                        test_cases.emplace_back(new test_mul_mat_id(type_a, type_b, n_mats, id, 16, 16, 256, v));
+                    }
+                }
+            }
+        }
+    }
+
+    test_cases.emplace_back(new test_sqr());
+    test_cases.emplace_back(new test_clamp());
+
+    test_cases.emplace_back(new test_diag_mask_inf(GGML_TYPE_F32, {10, 10, 1, 1}, 5));
+    test_cases.emplace_back(new test_diag_mask_inf(GGML_TYPE_F32, {10, 10, 10, 1}, 5));
+    test_cases.emplace_back(new test_diag_mask_inf(GGML_TYPE_F32, {10, 10, 10, 10}, 5));
+
+    test_cases.emplace_back(new test_soft_max());
+
+    for (ggml_type type : {GGML_TYPE_F32, GGML_TYPE_F16}) {
+        test_cases.emplace_back(new test_rope(type, {128, 32, 10, 1}, 128, 0, 512)); // llama 7B
+        test_cases.emplace_back(new test_rope(type, {128, 40, 10, 1}, 128, 0, 512)); // llama 13B
+        test_cases.emplace_back(new test_rope(type, {128, 52, 10, 1}, 128, 0, 512)); // llama 30B
+        test_cases.emplace_back(new test_rope(type, {128, 64, 10, 1}, 128, 0, 512)); // llama 65B
+        test_cases.emplace_back(new test_rope(type, { 64,  1, 10, 1},  64, 2, 512)); // neox (falcon 7B)
+        test_cases.emplace_back(new test_rope(type, { 64, 71, 10, 1},  64, 2, 512)); // neox (falcon 7B)
+        test_cases.emplace_back(new test_rope(type, { 64,  8, 10, 1},  64, 2, 512)); // neox (falcon 40B)
+        test_cases.emplace_back(new test_rope(type, { 64, 128, 10, 1}, 64, 2, 512)); // neox (falcon 40B)
+        test_cases.emplace_back(new test_rope(type, { 80, 32, 10, 1},  20, 2, 512)); // neox (stablelm)
+    }
+
+    test_cases.emplace_back(new test_alibi());
+    test_cases.emplace_back(new test_im2col());
+    test_cases.emplace_back(new test_concat());
+
+    for (ggml_sort_order order : {GGML_SORT_ASC, GGML_SORT_DESC}) {
+        test_cases.emplace_back(new test_argsort(GGML_TYPE_F32, {8, 1, 1, 1}, order));
+        test_cases.emplace_back(new test_argsort(GGML_TYPE_F32, {16, 10, 10, 10}, order));
+    }
+
+    test_cases.emplace_back(new test_sum_rows(GGML_TYPE_F32, {10, 10, 10, 10}));
+    test_cases.emplace_back(new test_sum_rows(GGML_TYPE_F32, {2, 1, 1, 1}));
+
+#if !defined(__SANITIZE_THREAD__)
+    // FIXME: these tests use too much memory with thread sanitizer
+    test_cases.emplace_back(new test_moe(8, 2, 1, 4096, 14336));
+    //test_cases.emplace_back(new test_moe(8, 2, 8, 4096, 14336));
+#endif
+
+    // run tests
+    if (mode == MODE_TEST) {
+        ggml_backend_t backend_cpu = ggml_backend_cpu_init();
+
+        size_t n_ok = 0;
+        for (auto & test : test_cases) {
+            if (test->eval(backend, backend_cpu, op_name)) {
+                n_ok++;
+            }
+        }
+        printf("  %zu/%zu tests passed\n", n_ok, test_cases.size());
+
+        ggml_backend_free(backend_cpu);
+
+        return n_ok == test_cases.size();
+    }
+
+    if (mode == MODE_PERF) {
+        for (auto & test : test_cases) {
+            test->eval_perf(backend, op_name);
+        }
+        return true;
+    }
+
+    GGML_ASSERT(false);
+    return false;
+}
+
+static void usage(char ** argv) {
+    printf("Usage: %s [mode] [-o op] [-b backend]\n", argv[0]);
+    printf("  valid modes are: test (compare with CPU backend for correctness) or perf (performance evaluation)\n");
+    printf("  op names are as given by ggml_op_desc()\n");
+}
+
+int main(int argc, char ** argv) {
+    test_mode mode = MODE_TEST;
+    const char * op_name = NULL;
+    const char * backend = NULL;
+
+    for (int i = 1; i < argc; i++) {
+        if (strcmp(argv[i], "test") == 0) {
+            mode = MODE_TEST;
+        } else if (strcmp(argv[i], "perf") == 0) {
+            mode = MODE_PERF;
+        } else if (strcmp(argv[i], "-o") == 0) {
+            if (i + 1 < argc) {
+                op_name = argv[++i];
+            } else {
+                usage(argv);
+                return 1;
+            }
+        } else if (strcmp(argv[i], "-b") == 0) {
+            if (i + 1 < argc) {
+                backend = argv[++i];
+            } else {
+                usage(argv);
+                return 1;
+            }
+        } else {
+            usage(argv);
+            return 1;
+        }
+    }
+
+    // enumerate backends
printf("Testing %zu backends\n\n", ggml_backend_reg_get_count()); + + size_t n_ok = 0; + + for (size_t i = 0; i < ggml_backend_reg_get_count(); i++) { + printf("Backend %zu/%zu (%s)\n", i + 1, ggml_backend_reg_get_count(), ggml_backend_reg_get_name(i)); + + if (backend != NULL && strcmp(backend, ggml_backend_reg_get_name(i)) != 0) { + printf(" Skipping\n"); + n_ok++; + continue; + } + + ggml_backend_t backend = ggml_backend_reg_init_backend(i, NULL); + GGML_ASSERT(backend != NULL); + printf(" Backend name: %s\n", ggml_backend_name(backend)); + + bool ok = test_backend(backend, mode, op_name); + + printf(" Backend %s: ", ggml_backend_name(backend)); + if (ok) { + printf("\033[1;32mOK\033[0m\n"); + n_ok++; + } else { + printf("\033[1;31mFAIL\033[0m\n"); + } + + printf("\n"); + + ggml_backend_free(backend); + } + + printf("%zu/%zu backends passed\n", n_ok, ggml_backend_reg_get_count()); + + if (n_ok != ggml_backend_reg_get_count()) { + printf("\033[1;31mFAIL\033[0m\n"); + return 1; + } + + printf("\033[1;32mOK\033[0m\n"); + return 0; +} diff --git a/tests/test-grad0.cpp b/tests/test-grad0.cpp index 7fe9154ddbb16..81c20a89cb586 100644 --- a/tests/test-grad0.cpp +++ b/tests/test-grad0.cpp @@ -1,4 +1,4 @@ -#define _CRT_SECURE_NO_DEPRECATE // Disables ridiculous "unsafe" warnigns on Windows +#define _CRT_SECURE_NO_DEPRECATE // Disables ridiculous "unsafe" warnings on Windows #include "ggml.h" #include diff --git a/tests/test-quantize-perf.cpp b/tests/test-quantize-perf.cpp index 88fac0e23106b..62d0190f9066c 100644 --- a/tests/test-quantize-perf.cpp +++ b/tests/test-quantize-perf.cpp @@ -117,7 +117,7 @@ static void usage(char * argv[]) { printf(" --size SIZE set test size, divisible by 32 (L1_SIZE:%d)\n", L1_SIZE); printf(" -3 use size as L1, L2, L3 sizes (L1:%d L2:%d L3:%d)\n", L1_SIZE, L2_SIZE, L3_SIZE); printf(" -4 use size as L1, L2, L3, MEM sizes (L1:%d L2:%d L3:%d MEM:%d)\n", L1_SIZE, L2_SIZE, L3_SIZE, MEM_SIZE); - printf(" --op OP set test opration as quantize_row_q_reference, quantize_row_q, dequantize_row_q,\n"); + printf(" --op OP set test operation as quantize_row_q_reference, quantize_row_q, dequantize_row_q,\n"); printf(" quantize_row_q_dot, vec_dot_q (all)\n"); printf(" --type TYPE set test type as"); for (int i = 0; i < GGML_TYPE_COUNT; i++) { @@ -202,7 +202,7 @@ int main(int argc, char * argv[]) { } int alignment = std::stoi(argv[i]); if (alignment < 0 || alignment > MAX_ALIGNMENT) { - fprintf(stderr, "error: aligment-offset must be less than %d\n", MAX_ALIGNMENT); + fprintf(stderr, "error: alignment-offset must be less than %d\n", MAX_ALIGNMENT); invalid_param = true; break; } diff --git a/tests/test-tokenizer-0-falcon.py b/tests/test-tokenizer-0-falcon.py index cf65a3f65d72c..4f06ec9bbba5b 100644 --- a/tests/test-tokenizer-0-falcon.py +++ b/tests/test-tokenizer-0-falcon.py @@ -1,7 +1,5 @@ # tests with BPE tokenizer -import os -import sys import argparse from transformers import AutoTokenizer @@ -16,34 +14,34 @@ tokenizer = AutoTokenizer.from_pretrained(dir_tokenizer) tests = [ - "", - " ", - " ", - " ", - "\t", - "\n", - "\t\n", - "Hello world", - " Hello world", - "Hello World", - " Hello World", - " Hello World!", - "Hello, world!", - " Hello, world!", - " this is πŸ¦™.cpp", - "w048 7tuijk dsdfhu", - "Π½Π΅Ρ‰ΠΎ Π½Π° Π‘ΡŠΠ»Π³Π°Ρ€ΡΠΊΠΈ", - "αž€αžΆαž“αŸ‹αžαŸ‚αž–αž·αžŸαŸαžŸαž’αžΆαž…αžαž›αž…αŸαž‰", - "πŸš€ (normal) πŸ˜Άβ€πŸŒ«οΈ (multiple emojis concatenated) βœ… (only emoji that has its own token)", - "Hello", - " Hello", - " Hello", - " Hello", - " Hello", - " Hello\n Hello", 
- "\n =", - "' era", - ] + "", + " ", + " ", + " ", + "\t", + "\n", + "\t\n", + "Hello world", + " Hello world", + "Hello World", + " Hello World", + " Hello World!", + "Hello, world!", + " Hello, world!", + " this is πŸ¦™.cpp", + "w048 7tuijk dsdfhu", + "Π½Π΅Ρ‰ΠΎ Π½Π° Π‘ΡŠΠ»Π³Π°Ρ€ΡΠΊΠΈ", + "αž€αžΆαž“αŸ‹αžαŸ‚αž–αž·αžŸαŸαžŸαž’αžΆαž…αžαž›αž…αŸαž‰", + "πŸš€ (normal) πŸ˜Άβ€πŸŒ«οΈ (multiple emojis concatenated) βœ… (only emoji that has its own token)", + "Hello", + " Hello", + " Hello", + " Hello", + " Hello", + " Hello\n Hello", + "\n =", + "' era", +] for text in tests: print('text: ', text) diff --git a/tests/test-tokenizer-0-llama.py b/tests/test-tokenizer-0-llama.py index 078f680b165ca..f3d4d7e3da76e 100644 --- a/tests/test-tokenizer-0-llama.py +++ b/tests/test-tokenizer-0-llama.py @@ -1,7 +1,5 @@ # tests with SPM tokenizer -import os -import sys import argparse from sentencepiece import SentencePieceProcessor @@ -16,32 +14,32 @@ tokenizer = SentencePieceProcessor(dir_tokenizer + '/tokenizer.model') tests = [ - "", - " ", - " ", - " ", - "\t", - "\n", - "\t\n", - "Hello world", - " Hello world", - "Hello World", - " Hello World", - " Hello World!", - "Hello, world!", - " Hello, world!", - " this is πŸ¦™.cpp", - "w048 7tuijk dsdfhu", - "Π½Π΅Ρ‰ΠΎ Π½Π° Π‘ΡŠΠ»Π³Π°Ρ€ΡΠΊΠΈ", - "αž€αžΆαž“αŸ‹αžαŸ‚αž–αž·αžŸαŸαžŸαž’αžΆαž…αžαž›αž…αŸαž‰", - "πŸš€ (normal) πŸ˜Άβ€πŸŒ«οΈ (multiple emojis concatenated) βœ… (only emoji that has its own token)", - "Hello", - " Hello", - " Hello", - " Hello", - " Hello", - " Hello\n Hello", - ] + "", + " ", + " ", + " ", + "\t", + "\n", + "\t\n", + "Hello world", + " Hello world", + "Hello World", + " Hello World", + " Hello World!", + "Hello, world!", + " Hello, world!", + " this is πŸ¦™.cpp", + "w048 7tuijk dsdfhu", + "Π½Π΅Ρ‰ΠΎ Π½Π° Π‘ΡŠΠ»Π³Π°Ρ€ΡΠΊΠΈ", + "αž€αžΆαž“αŸ‹αžαŸ‚αž–αž·αžŸαŸαžŸαž’αžΆαž…αžαž›αž…αŸαž‰", + "πŸš€ (normal) πŸ˜Άβ€πŸŒ«οΈ (multiple emojis concatenated) βœ… (only emoji that has its own token)", + "Hello", + " Hello", + " Hello", + " Hello", + " Hello", + " Hello\n Hello", +] for text in tests:

z7i{k51Kq$a3OGaDE?y1-p9)ThWJu7(?dyBsdiP~eC~#e}eeL`)_ z#uz;F1zhiP^Kx|u*CcR#;M^b5J^yh1>wCc?5YdyC)~8IwZLG!B)l?Kz6;zd#wZsn_ zn~Td@tI2IY@Si+_&Lg(J4`Li}G4!&xr&o};A*i+U;^zKtiU-71l~n*Fz1;@Cyz}FZ zU+Q-K^k34D!4QZJ%huLb&tKAPz;{=VLLltie@T-G1yup^d1Zak^|I?P`GJ4w?`H&h zsHy(=n~vub1j4$qwM9&XKsa?E5aO$?Ey}5_EgHy=&JhBc#BMc1xLF|g=^zYrdqL28 zItFgKt#$|$+-9Wvar~As9X-PiMkZz!R^Xzd9a6muLQluQK)-{5k#So%>3+h6|F>`< z(~pCViHVJqjg5_yhZ7t;0zBJ?fQZ14L*&vIv`98+z=RKQ{`TA1Wd>2L7@2qHg9H`EO}Q>nG9D!T)NmD zYx*Py;pAB66Gxk4ft#NBBjH|Rk#}MtwodyWe8|e{zkT8z7+m#4ggpDxGa%2dWCivb zdFA_Gx1mcUbVl2=oIa7}aS*~;G-{WQ1uN=Z{%I_gVln>YxGavI@?+>{x>vJ&-vx*7 z@3F7p56OI^8@1bZKNQ+Jy3q8f)NH4<>9f{rCrs%RA@r3GwAaIgI*;!Eihh;rU0{Ev zvtq*ZDKql2PhXlqwY{VWn?KVTj+H>x_$^4--5{U7zx*~x1^x3L0{aC-`;kzi)JLhk@je5o>IoVaeQof$p zrm-4?nb*mD6{;k!<~gEW&Lw=kAjaGH1UnijBh=M zwCznzIpvez%^IkhDF%`lIzn(u=g!yGB$UaxuC_l!|@hPYW@K2f5`E`b+sc~MusJD zbLO;exM@(GDDE*(BRiPBKY^Wt>`FJap#Cz*W8__bT^H7FeAjzbWh6 zGf5xRmdM4$9dILi@8Rb7Y}P|z4(xZ z=ic|-N}*eyYU%vqq2JOLB;?I=WJMAhp7Ygu>e%gxxSAPB`nG`zT5xn~=J(#Yn%(Z= z%#XGpx5|Yc1g!LIy2T%OfnD$0dbcSf1J&4PJ`)=%6YKBm%4Cr9z}hQ^{sENaSiO&N zpSH|!gh)@~T`7b=EB*Z}W$)`}`{_-aFL0Vj?|F2OQRjx>g{%F~?|wb9qhPL7v+y)G z<&E~@Az6Cfk=^~QxO|V#A*EWu5aj_bnJ;qfYe#g}zZ$TFhkL_rp1fJ;5h4Ft_4?I# zRPbtJTA)3=Z)bCH=i(md+H}+7+cO?@XddsYhi{udneHIJ8hQFaVEyFR-ivPd>EZq6 zg~g{dCtlFp>a1N>0u%vecTj@r(-0gdwD zCsRCb&kb+Pz45pZ`%IIC!n;T?%m2`XrX>wWNKyH}D zg>&Wwdk!ADc*yGLH-g+-&Iec1+4RP1-2=5YjEI}o`I#oJub}eEdtf&l6BO=o+z3Z> zT&1```fa8ZhR!hVnac8Yv&4%l5%?nFcxurD1i{1pF=VUn0o3dA?F~G5S?ziU7;$-mzDO3num9JMh%zt1*I)!d9t;~lZTzZQRH|< z;z!~P6^8+fdo%Ok* z793EgAj)5mdvb#u1#85IBE(}QhtE}X&8Fg{61r|D?qNwjE$8&z8_!FQ>^%{kce%i1 zwWhjvV5o4gT6AU#(_iSj*zDQu-|ByU;7)txZ2yrnl}PS3p|tw1vE6*hM#zjNrLe~cYMRkzFytUrkJtXoKCSe*>qcA|H@LOrZ_bZe~CnyUh$|M-X* zPd_#(V%P(>-`k!Q zYw|u|T7BX=46(jHEgnQ$iD4X%p26G_9bqT9)9O`ljMSk|xb=lSh4HBJcS88F_1!2lj2!4Z+P?)A8lbUC0(VNZ~PJ4!}VF{ zEZJOc+u8i~vP4LK_P5|S&TlW=*}10`4@?&;NJ~muPD+IUDYdfPUo`TXCLXk&<~UpK zz`xoe`>H-89sVXsy;d-`iKu}R;jo3O*!Es5G?PeMvu*V=Uv^+iKdS(m5wVcUt^BS` zk6yPox*%P=kILv4e|EppEWoi+?$Xv)*!PNrYi)g1&aBSym5ff-b9X|x*S9ZcCB2SO zHc>65`!Rl~y0`x<^2VcLK|16E>Wn*mYf7);p6_8tCy19S8SO?inM;g=bOf}%K7K4(ndRLs+@aU{QQ0np3#DHSjlY~>NqxSq?3lOJlr9$H+9~m!j+C@I ztv~(pgVe(T6R6(-4Kn^&+kIpGu%iE|eWcKBcKFh-&bnBAU;1}zt_h}f!{7eerS^VvbJ6Nl% zZihmb*F0`*(5m%Ieyg^8bc<0L!kXlx>dYnQT$B}~Kh9VsXUIkc-5(VCeo2)ac=^WU z)@C1-z#6{eBV_f7>B4o zU=z<}i-%i!yFHiqt>>tTWhc6$dE22j`iSc@ug#t@!*&~zIH={xpkIl(m1)JxqWQHlkJbBqM`<=6Zf8Ky}`H7sc;0fF7J z)vi|lgffpK4`XyS^32PQ3_boG!Ew++v4SQQ4RU*H)^9SPV?x(FI#s_&X9a&`uf-A7-5oQG@- zD*-&)kB}iPUd21iB@jtye)=spR@(;;hxNz&Y)5?2-ZYB6mYBh|jl9)RD34bd{Oz6S z`2smkx{H#jZy9J=$%&MSS=bCTf>SX6Z1zyK{a7up7R?>iu?A-K#2Gug;kC zX})d@eg=dO^Ke=H=84F^JfX9Ao{_stzgfD41r?4P>&7~cdLHlz5MBClOne>~=c{{Kpvz%-e zbt^_aMvp)JfT-`uzsC3kpUp@Am^@UhG%gOitP443bY6W4MC()qWAknQ*U{ zK8koeN^Mg5Xl#rZa*?U-+uj|`<9taFn_)$3fA*N&^YKI*i^s(uhlRl$(I34QeA1{y zo{Cc|&fnwT-T404ZKUy5fM|u#TGj>Q$9l$M)M*OC!s7M9noA*yW9H*&t#y#p^^5&F zBZ{1K8J2IuPRDS3I)3w3-Y%5Yd;d+#hNrrXU!rQE4#|kv940`v;P|kivgrD%a0|H? 
zSb(uRq0B_-Ve`P{ z3I4-r>X)AKGx2EiQ+h#t>d=n28(tonvg6WP-RJO$?9VGC>;j`k!)}Cn&RvRH2z{HX zO&0&8>GudaffQ-6vuLhwxgE-2Xq|?@KJ;HA!w$dq0*gBKm@(Ij!To{{yFF*)otHNmk?D4(_?BK)PH-Q-TRakHZQgzGYw~9gd!^ zali4^Y6m6)s|e9^@G?3)amwVA;%{x*Z;jwx)9g3xlb^lLUSo;6EZTc*_KJRV={5xW zqW4iP^z-GtDp9jqC#RbT?h#Cdy2j}U!Ic~Grow}7gG@k&+!rL<@|2Yp5+|c&dcORF zO4N_3L-Dc;&L0o0@800)R^X|)E1}h`zy0cnj+#*5Hd&%=^+CpY`dg#7tg+&SXVoFM zzL#Pgy%d`s(~Hr>47MLd7w%PDdI z3rY2ABA*Va-Zv&`YM+uLw;t(Cp%d^9*m5$7NV|MB@=o4?_pts+zu>HiJL|UjHj&}o zZ#mjV1n@47LnV}R2UKFQBLkfzEos?p0tK3fNNY>c;r zo2R%t(kk&f%;jlA-GOM((TFM$Rtv*hNYAf+LWb!ywk}VYGfuxB-F0es8Dnes^6S_c z!H4e?EUj$k2kn9r@Emy`*LRL6bR$05CJivDRYf#YRXhT5`!AwYpWr5Z2h$~41LF@g ziqdY$opZG>?&m+gT~XFp+ekE+qtTiF0l9z`Z%XQzsVx2^uJbkW`0}=rM*e|*4wJ>z zqNhYEm+mN|8e6B_+Ue8y((Iy>MSTh>kIs-IDi~qioOX9NY316FHj9`{2|ei-e`W=S z@b7tg_UuIKo~d}}Vg#WYFI>E^5-0s4Gw6d$*^lW%=B3n&F2$^lSYqF_5OAslrine z-=Gyl8@%s$Z_up1B;!F$5WUXGD*^4|>D;6tS-d&x^k~V^kz=h%s{^*LCPZgM!dwsU z%-fpQ;a~40MxB5}^nKGRUkEkR9>0{nxr6yF5WVexo>e6`SeX8?Xco|E2DaQ7oBnku`(cDC`?IAxsh}ChUmj8?S`7k zW^K-|?^C<u1*sPgl33oLqK#tr_gXBuRTVsBTYv{aQqbM}VIK9r;rVOgb-2Be_m+yODM;hgNZXnLXPZ@T^a7h>A)_)uspgMN)}o&#^YZ^K&)a z`Zpv0wNc)kImKO;Vj(1cro3n>bWo#DpEs&p;exhvnWo@s(_r$Pz}Zu#iRXo`GQC8` z1o_MlNN9UqLdJ__1xfg1Tb#~SsOx1(UApR)-qHO2#{JhxA1|gUpmo0aG`v;sO`x0c zec~xKm+hr(y%_K6s4U3PoFwznms)Fg_?BwI1&chn*Lha+pSS!VJVM(6&BGHEXGsiB z8J&9d$^~uv2OuB-hN5E1!;nmv`{~;HsTy1T0aGmu7ss~tK3%%&emJG-D`lj8E2*C+ zpd1FG09=ovJ)t2<5@jvELLbu@^}GCsVfE*4jZ+05TspxwWapx@K~db@zo(?V1yTF$ zMlSk-dx6n}P45MWs}`N+D3e?seEC^n4Bs2WtoQ6lF*L61^@-zIDP!pyX5Sy{>G6Ns zd*1wE!e7@!a8e)jlJwPf{X9)(tSOZ}Z#O>fL)lq(8n*c}#1H#gbW`7p4}On768<>CHv$r9%=5>DCV0WmV$>kT#@lyP6*mhaZqA za6VGA_!Uig!@LF;@M)MC>zi8A+_Wb(^=Jj!hWZ#QZ3ugvIr14c(LaxR6`j6r{~lEV z#i_ps55I!3GB0qjiFz+k`7$CZAkarHwT8!pMzG5+vUp%;OP;Ywz${38gmmxQ56DI> zICA(fhTNH+d7qFuMZ&QK$m=LbDanF2VQI#Rmw3VKpXDhB$pvT`a0s`7fE zbWB{l%4-x~}1kHFm#DllYsFUH%da0#5=SOauNMJ4$=>ub=(t7ymvhQlJPlS|Ns!ghAShKV`s0gj}Jm ztQQ(l^hL_=xc!HS8{c+TzPZPp5n^rjypnw`T$YL>-Fu9D2jhpii^5q3%)mMp>$lq{ z3&+pn6%}AJRVmk2-}!hI9$R|7^Kn!&fY0NE4)@iYJ_lgW8G8uqV*kU)iw~dz89L+S zY)z;aUYgiuUDX}Hf$%>(P$V*R)jTk|?h_e&+E#TS;_XDN{S`wea+^M&)FgIe0FOdhe^f)lTG&=UUCRZO5+ zzmy`H2Y|tMP_OEJG#2Xu1t>B*b`(mU;>a0)%T?WgJ|Dg;riMnwxR zQ?KU5kDU5yR4tat-!n}WmbsMGj%rub)Oh+a3IQ;zFoV*EUgXK~RuO|@A1E%*3&}Gc zWhaxM#GqUoT&}lru$^b7ttKjBmU+gQG{$q31_261%m7WlrNMHd^Vw;;n#`{Uyr!PE zD+*$LxhLK~^L>4V{~^r@yCF=&;#LU`L*OMp()1x3A`aD*5~7$(Q>{0Q$4l-Ib?r~Q z@B9s$t{aDOw_}u)kj2{MJL0)J^vWDqFdM<%3jj{kk5U01;=|Ft#Zrn1#kL?q@0NNFcG{=CuN+VFSLoK3Jvc#5%rmCqMrl zEx`qoJ#_sm`-MaH*IrT+DGknsmslnD{=yR*h@u+}NskXkjv`s`7M%j{mGo3`ocX;H zpX^uBC6!M--%GFUl!Yd}vk{t;5HE5+VYgaEVzmdg0(3_}DyU8b$RVTffH@Ob?REl} zE$)Jxv#rKWJE9_T3{}^2J0N)ZH<+hiur5BXDww0|eLB;z$4!}2*uGQEnT`>%cOniN z)=D6#BG@KKm!stXz4hEd^MyrRZF`>WlcHgqOxMVAf#a^FCvD4o|LW?>Oi{*2R)WdU zZ6pwE{Lwor(Q2V6M!3^g&NO3(trCYKG5(VR$87FQ=Yf83ps(O@Z1)%-N_YZJ zhZ!})Fh~7iUQ*nM7uQV*<7^$nwCgqMnf4ql>UM{<>RtNr`&V5F(|gPyIe7vInl>aq zr)Vui(cv58>PPOWDCfDE6aj4J(_18Q*NWxy(aP8>(k>3dtb7lW3=fh#KH1}_$O->| zDfwrdI;eiPC^;y?u5GChWheuA=zHHS(kY_jQ4$f@mTQZhj_0P07p@bY1XD_9ToiBc z%C>~y!;Y>G-z>nD5xM}WDiRBTCTuX6A?cU>ObwfS7_DcXWZXzH^?6MRanGP? 
zACMh*Qwkh-9U%l9}v&nACP5%sSwyP1YO$#-J4om*I`8?;n8%f=wvDD z{CqpVo$Yqnuy=NLb!XU$Zb`8u-w_{t6(I=-AbJA5gfF=5d)l&qCxE^fVVkn5gdOQ#Psw5a>91Ga77=qX9djG;-8- zvED9STNJT~g?2HD>Vl7noBm{Tz%#wDS@Q?xjts_IOO5Bp+_xWgzV-JvTpd{_f6D<~ zjoPwjG3nCheD{`wGS{q|(`Xz7&pdcr_cFPqqB@`nF30dM z_fsnnaZu(DU-u z7rBZgxzZBQ>B5F%#H1sc9T1(pHUPXW6R2@k7SLus>2(<7c{p(4+XqfN*#+j>S{~Qk zlxhjL^Ux`k4htRnT%?hz5umeUR_gc3l^NejUB)|o3p1DP&E#@ zNDJl9MR#9`hSNcZen2`G0Wl8}2VDoKmjsF1WdGfcThTeBv-NAG&a7l5%D*bUm|kcRymcW*PtpTWl4r1kBiV zEuM()7#8gqTc^D*ng5KNL<-yVJg*TYD!`uchU!@|4 z&rzmOH4o#{WugQZ#9ob)3cRQMfj|q9pF~-)piPfve^k0P-<3r{>E4cV{oTqbXw-= z0!hmbFUq<=VMkd~^T`{)`11B~&s*Ca26*lI<>ZKpdIyUHf3)7HOz4b^Yc1z0bwE%< zzOaolo;_n;WnkH8i(J*YNHya#JDudG=yT>fjHQ=Gy<)7LJCC5@?zvcaGAcSAT(!WEW|kEqsYS8P^wbhvn8kB&bhTl zs`@!k@tH_Fvfm(<*eac8PNwjq+U+sE7D*J%_u8~Tz_NLADFV|)zdtpdH#zmZY~AN- zNK?I-hWIPfnNzkg9S*K12Lw)~;sm$~@aZHD4MHmL8CP`?NXNkq-)r(Lmz$xM(+aAt zz`b9FmgICdsw;gT3R7PF&@T0?9z&E~7ZCd`Vuah`j6eA~;6W5Y4nk=^r(u_&omET- zZ)1H-37}Sv)=|^~jv@S~d)4bsdfF0h0_vSzcENByMwKlUE<=YBC-?LK#%v7{zr_YP z1I!&ZWezLv<1af{;V5SHup}tYHP{V)4yaPJ_O&+t6PJ9WcY+?t{Z|> z1brMBPT})=XPN5p!G<`x_zDu&k`MgE$cp{-sgh_H+v%#(@iN|vUOu0$7}1O>%$bPx z$XLF3Y@`{2K7v{*feL6kJ-wi*R)j!e4RA0*Bv4B1rihv;Q?SHfdrVD(nYN9Wz>^)f z2Itb{r)+&43kof!jbB{i!4B{%`Tkx&S~#kXu{Sm(nH)UuerYUw&7u|&V!x;A$Y(Il zuFL!J(#f`?!p@hoyg1f#(AAjDoM(vq^Y{X=N3u9!R*tX>Cuq&Bhn4z(hNo+N!Gdn& zy_yTMh>V#YoC*P*G*OE(0-J^7o<^}fMZk!z@3&Fan^$Xh zha#I_Q7%YdR%z%amNzxr&T^UiQ<{Lk*9iRm|Ns3ow|~Hw|06N)f45=$-B|O#d!GO5 z_&*~Q`mc`tGw=V>Y$W|d2b|wKyFXKw|I9vr5775Qsza^VA&{Tz-vab%N{Vu7^12FA zaw@v2Qi@6-&tFYVK~2g)K~Gj*Pu@Tm6m|Rr=zrx3>-+`K{{rZL0rbBB`d{8||FCEK z&w8u>IR#{EH2E3KtLh%AhO5262wT>pwia(VzE)Rv%mLw`*AB=EewoU;+q;iu5}Vwr5Q&uZb22y1J9Nk` zGtut%9XklRn9x<{ZzTW;BL??k8n)X<;xYAAozM@{Hc6C?F3_7lWZ22}z4Vr7l9{H{ zm(iEIxvRT+&qG{4`=)k=pH+Lz{OU{TF;Qu^6AUt99`_leJ`**OQ!pBsw@*61BZ&)i z-i9wvCvGcM_&}~}Wkt4QYPYtx zB=AX$y2MU`$|15q#EZo!FzCiaCARkN3zKukmrTy)XzXx_9YJVsHZ|2E<3|nw^&VC#_!6MRREIq2}BQM&Q1Zdt1WQUr7+IUsg5|9LmvX+I+ssVYlWK0J)+&N1>}@}U z<^)LfqRYL7hwG$1OGbEiC=aFI1?fbd)a%D3;*2LB^2FTuqR`)4f;5S`3PKb^$YCfA z2-C{2BHsy0;iglrPc_ScJ`f1Qoz=OEiJPg@sn%iR(r&qxC-V(_!jmGBQfnUS+ks{z zSN^kS)@MZI5u6}<$|1ht+G+qrs|=b4w3^K z&lbMEvC9e#>ah#TRDZE0Q73ukELGKzMf{tQtumq}s@=qDMi69cfaCBX7 zVezfP`K-{BU#1WF#{sW_N8j>}HY(SoUE=O=uc?MD`Zq2u(j)RM$Lj-Fy=-sYTS%9C z{VH#j{&VKkHRW8PaBH`yz$SWCtPJ#4RNGYAJ z?T03w3*Ay%4d%FOm1DW1d)rb{GHPhD-)m~|6$(u9%SUJjoH^>J&?~8WgqZxc#FF(w ziTv}g8*qF%I73s0*X{EJah6cZq^ya;tB*@A%gojp@# zcKA0g$L?0N9|ws@+0PBhNJNK64+SrjZ=fKpG(h&Y1*B>FmHh zUf9W-eo11G^21u%Q>ARv*REp5x#<^^>R6pZvsTVnDs<4 zl5|iNU##FxKsEtQ?F?z2g(!*C!Zu`925FI5w&bT)rp7%X(SwDU4=l}sAJAh17!dhG^7RH@nXY&<{8mvrlt&`*ta}T+ztryeX#Lv zX5p7DEi3(s^UAS&cbIywE5lr>%HQ&iVqo8svcaV|j!$Tuu}lgQgv?_Rl(7Zcag2Yu zzz2k+1xRGOPG;RO&9ZMbKDTybs8bf9j#qHf3f=z>m~e0nrWhl3Jn5a~;3u#<>itlsa~ z1|+7F+{Zz{bJ(pzTXW)NT5b6iuQ#WkW9geOM3ewtRzU-rQfp%yx{oQ|R%AnCC9Zh-Ru#_AHR$7b*74JYS#--t_jy z!VCU$Nh-&GK=hs6mblugP6Dj3dvCyks^i-}p(l~2C4x1p&u$mH;UV2H!jmB9apN49 z+O=C*jps$s9){x2f;h!cQonSP{Klj$xQ52=Cb`PTc3kyCE&o3oBB(63a7k zh?LAL8Y;g*u5OWgDI?ID_>Od5k{Af+Z>I}#k1H)E9KICf!@FS0`~4KnB385%Oig_v z*W#AlG8kV+FgDBgJf*v~qL_2KwQL#*0fU-`m|{d#*TS(;B>|hxb^cq4Gsh{2BMBoX>ZB{CA@#egBNyToAz9@3;Ae+IS3f6lL98#>@q!MK?AeVq)GBgXKXM6eEJ!~}7 zw(~2M*wqe6fOmzhY>=PpnsMhhxQK(BiTu&2o6M(Uv|5w!usC#Gc-u&d%B==8Zmx@< zUh?jqx)Z!D&DtsAy74Ci#3%aBTV{L`J@ldNpDrh0JmI(Z1vOs-u3jqz>e0x)jL_>R z2F0>JG!kw7mEF&updPZ3upCewhD4s?Nc32NGkL27e(*ZHSGPNOvOqNLl80vkjWIjx zg_8r}OlthQP9c93@%gi#|20nj^9BJJ{t_XLAjyNzY68MZ2Xr|yLtPo$V?1t3toC?1 z&!DHZ_3=@*<-Exm+o;)I%^pP>2`~Ajts||HSrmFbB5VW7Dy1z7V7tP3No9`eQ_pEv zGm0Nf3YwB-E*0{XI$ho3pU!g)1E~&nf#={MDabFNH_^x>n#8w_b2G+E^(HlI;hC>g 
zH#-5L$;XObjgg4XhEt{{Qn_VBm3*yYGY`J#Btuy)@+vjNFb)lZ=u`k#S@GMb@CwaF zPOEOs(!q<2!4dp^&{^p~UmCp0gIz!OqsQ3gIp#!_8EqfH(z5 z6cMSP zpDG8z&{be2@-+}V8ze{psY@2T03je1eY_b8rp=zBumJgy#SOU;6@4Iw5$YLdltI(z za^c>oQjZMowFn0=xBcD~fR=cvv8N(@#XNTfPU};6qz(wJ8hILx>5`yrX(;EXNpJk{ z=E6aJ)OAO*cpCTPJB#t4U3oa!0&^ZFto7l?bNEwASTtwsa#V26PM?w0f!_$O(J(XfcNg~{ocX^z#_t0X- zXqzUV>Bs?5`vT+##4?(pTbl>3YbG97&*yC5_o!Q#r5B3Mw`us@gHYnMM-V zT07{%o1S!QEmm75iWbN%M}!;Zz55siFY_W`b>8F=XmUjwn+p+>k2vKXE^Qk^0^#?d zxbrARgz^aGc=8h;bd5?{GgsiU2aYBK=+ps%OA#fCx%VZA0`JO$D-~b$zF@c`sQc>L zlQv6--NWY~S{U{S)ed_1enX7H9_>*PdN=35Ie79kh=;+s5H_8+(3{hEyS8lP=}CXZ z86_B-*M0k0FT9R#4$Qa71AVhGn#eJ#GE~fOr&ft-Z-ZCg1gMr}W;K=x=qf~(9zLKe zL7T_mgr!bEXu$K~w`39f(nqMLsmA$0x z5ov4PyvPz)xq!2)vskk_lWX@SXv!somiDVYLznzBh?0N?K_6$=IdZ~_@CM%F$3-X^ z-D)=V#?2c4@#t=B}F-#&s_g`EZs5m8{f7 zWO!s4SyyMZ#PnK9-ogw7uT0&+!r4g9KyM(JTpHVnWF(ik0?#GA!d2d&l4X#>J@}7C z*?w7r~^rg<~8wFg)&N4iHea@{!br55hm2P`4rrldo2LT?M*JCI|)fHf@0}5tCp(j zzUCm02QaNKoW*S?Gsk_`FvZlF%FIORT)$Pje%KRtF&~xoe|OI`H3Cgi)in1K`#RMQkiLPJ zkqRUyGT4Z^%iME}@2N_;6_>xX$l#e_K(jx>G-;4+ki;#T#_lENbt=?&IgzKJ?|5+@ zMXOm4oXA`yx7O#w`F&8wC|olnSzOxN3Iy-BjpgrEiCdxKQ@)B?Dpqw9!an)2ghwt{ z48rbTQOS^2kz+kVw@H`q6QcP6srxuhVzKDJRAZ79`Cy6K%I^g`RyU0B&;)7GCZbYh zae!z;EuJ=MPLcQC13^cJkLoo z(7^k+wl>S+b>qaf%U;4Qb%os0!jB}&Ir5aQz2tU3X;B2AR=`rWNCFACBsZLj!GvFE zI!hER%R*3S+=bKa+F=1^cC#_}q+yS}cIb!U+tYMOluJMPb1>M_D>P%2$8uF#@?+DL z!Dz`V_~@4SI8Bq44IYx7Pbnh3lW^$H<=Bd$eWUkWsb&7usB9kMNk@4#u~h(vrqF{7 zT3QBNZxWc^5M=&|CzZx8Ylb0VaP}8w%SFo?3{I$%S=ZS(n~9D_QZ?A{yfVfFoG0~R z*A&Vx;KSI|>~uG8SxU%r->t>2<8LyHhP^x~+}5+Bd0V2Mer9>kSr4iNFFbVaI;oBZ z?^1Bmdc7(I!&1LZ9i=d#SP;8Q^+Mg1uLSGYL3h#nyub>*B_3(z|U%DJ&Aj4c20M9_rnkGwi=L zSBnenz5~W7^n>8Vu}zWA)dG_3Q%IiCqXoE?aeLh5!bUgYDG}U4{)?Hf$`s$G+CYJi z6Do-U?}T##`xm(%GjEx%b+~XePrKp`R&%|i%X~|O-y+<+ur?E7CE?GsW2SZBC%_t0 zLei2P$$1#pk$4m}MfEF}V|rpUtW1&1EE4(V5w6@!?EakA*Y4vZ1WyNHCQ)j7$T^7H zz7mAp%HY4fqW-7mzXw%ag)-!4QZR26BOuc$$UL^K&HC|6V>s`0!siLi7Q^w&M&0#w z${MIh_QXMxLir#&Pv(P3u;{(<9*HEi3WUriQoaS(ldU)$WLL*qShYN8O6+zDsO7F7 z77+Igl?+@}Dkzz3p4u1wbS-7{HoRWZl5Ur6NoEdCsKqs=)xRaXnE7-b8kR>EcZXIW z2Hp`4q7>XBz1+u!LtXt&wLA%Z6mi1MK2zeG!+gU1UPovx^rzcC)`Ep}8o2(ZtiTi&52j9+=w}Nl`-1 z!0PSV(Q^zd+p$P4??6P}^=ZXtiWW2ri@C42Y=GoW&ISfl>B?N^ z8v{amu;z4+T~n+C_BE!#$!tW-GfS<1-impTiXzx@S254(_$`Si zx8L_^n5CefdUF|sPSn17J5fhZv&Fr8PB`1A_U6FFYGWb$uBk>a7--rm(9D}2zZpsu zlG721C?;Mstv)>;bC8tl_u^pws5w=SneV>_0s6&o{DYy%FNWhkM(zHmTF>8yf&c35 z-=l|%-CBhetPsf0_5Xt&D$409DXS_=DX7Y+NhvDIt4it08t6*ts_H81>*?sr>+9+N zL=P2zp@+ZF!(Zs(FZA#idiZZd5C7G@#P8?(9fbJ(|9|!Ne|>K{L81JmOsa-t00}!1 zM-IUa90PPKlhDeKtBRMpw_5SM@80DUD^3IZ$pZFTWmDNpC+6+? 
z;5A$%eqo{p@;$8XT@FzoDi{dHrpGL@MCntF?rjUKFS?qU{3d6-e7|8?bW23`OSx!N z!Vq9hMf`5SZb@apaWJ?P0Gzehti|7167SxgWmwXmZA=Z>j%`#gL|1)SpAA`$+CRLc z9YSchKYz|8D&;lHLk;`;DwcF#COLLTHlGmnzhLUiHn*4IvtYJS!HVT_8ID=L*CE4U zR6ft<&SYR`0*}|;jVis#jnf%X+WWr~QM;WSJQsy&nw)`TtILei1EBaw!+O)wTGRZG ztYfo0r)gh3+l33AYd2w-h(WBhVA?RapsF^E0KnFR)T#xJ;p4`jwCjY~l&O6|=sl0O zrsW%v9Hy7K^fijE9OGw&q&nC3oS{B6c-qQ2idlNPSSwgXG)8a{bXq24LKYL@Cz~aq zDIvqGUwf9FL#ESAn9=G_VkIo~MCxvswdZiF@wAb%FiS;?bTD7$Svzd_W(#iDcydz- zbDOuKHKStwD@$8;qr#Q?j&g@O8{^HGIqi#yJf9JtmZRsuW-LjdCv98h&5n7$?pn^9 zrxXV1h$ZaCr7*6rQqd8ND91tLESexZYU??^w++?++Ek?DMJ(>Z1d+V@ej&plbL1Ce z--c>mHNHt!qtaZD@vD(6uHw&llhOFeOo2iVa!hIY$u%g?k$5PGzQXYurggT zvyy~3pcfZljEhg$j)H;Azs?^J2E|nC)s9)o(`R$u6gRMNyw20B>u0Dj9dsR&Ptp?a zz~Jne0g3%BC!!Q(1kCL^Q%|;8#&n~TP$vVT*vd)gUyqlJe`iRgvq&2--mZB1P{h3{ zS0=1qERw&D=BVw%B+9rC3}*zQ?%>sB6}g+6H?aKxNNxKWO=fW;_Aene4q0A!ZnTh3S+ z_9$+AYhq+E(2VGH%*1xRxjgY+zWX^MY;AK#ESzP))_|sm{iv ztO^jD-WtZD=BnBk8yiU+dRXZUJhLO<=BiY7w~m>?YrAYcY@h3F|EbUKJ|_=lF*$jr z=g^1WOSJ?7I#7VH%vLf^88#cf6d$84|HUSXpZr7^uQltXa!Wtns%EnOgCL(*@oU@u zw7HuOM^tw&Q_U9VK}fZL8u1Ec5p=dGm-#RPrmtGsPH7dw3FE;kAH?tj#9q!Ed00X{ z{k7%^(X?5<8Roo~tRw!PkryAgeol6auE!C(q4l6#T!C`Da~TGRqY@|zKt5KyBOp!t z|DIP^CWhHL7Gm<+uI@YHD|P(}LS8=O~jEsP>jcz9=@F9^ob)4H8^$ zO}E49j5gPchM&v;Ihz*B8RhmC!GRi%ADy)K?c17ib!L~d9XRfPJ|0=NSp_~cq2w2# zGIMNJ;4KUmd;7)KbL)$&NQoIl*mkE2d(G%e$3cV9t;*$Wo>QqjZsGS|j-~j*zgG^u z>%IU%X~U|a9iW02wKW5t3XWeRu`8_DZ3W^LOr~48=YZ_}j@huB7yTz#3s^Z@a=$yY zb8-1;9E}rlKSDnl_Kl_j<$}T)$jsWTd-!eAuqkig zYa~o-5@NaYQh3o5(j*2^t;+oYxeC^}N*C(XmdH9*QWqMIqbK%rJLm&W*4Ik%=fSpU?p1 zA}_Yi6RRtP$j8=d46?Jev}&67%^*lNuxzjwJ1NJ;aUXS5VeBDy(o5os;9zFu9uz-P z9ytQ?XJPzN4gkwaflWI_y!!Ph>*D1xY4^J3M2`u-q6cE{IYOCI20m!U8r*6maqd%& z-NvGUvum4!%K3h)OjDyttzTDgPNhyNIXh!LF zq+TFFZYC>I8oUIgHZ|aO8TgLp9B}}WaoJEFOoHEcCEn89HJu2>Jm-|n`0IDLKX89g zU=jWFX|>6M2e}9b1CWoP?4S+D4+uZ zE|!a8Ee1DB`!LKE$6OXjr#n7>Rav|1&BB7l4kMjB*#(Ph5bLpnEV4Q?w=WL{~EFX0e&}QX-Z84%6Y-% zrv)d2Cs%$23y6P#>)+yEYI<_I$_BDJQc5~%22zS@ApWJJsH`KUs;8@lgm@3;+6sfBhTruV3uBU+lSG?73g;xxY&6e;)?_V$c0z&;4T0{bJAk zuVc^sGjsnxbIk8v2!CeZe`cRw4sU;eKaBq0h~fVa*w4@U`n}`(XX@_H9P__EV*5Av zkFafs9b!9X+6%;fey)Fu|H$gdDJ%SBJt^qPtAO~Af})g;lA?l?tdg#xp1z)}nvVWY z{72;%|LK>*o6ax%=NJC-3;+2y;y-_;=lW;%`PG~L!D!}p{?k7i&4B!uxe*FO6;YD{ z23_mKVAcD#qu4;@QYwry^#$A+;nqfg2E49r%%Dk_HhX(^=pOu<=qp3p=qEYhKO^Aa zp0Sh2fgcc@)#6E@Y}w=9-f-md4@eh^rej)t@eaws^IW~L)MTt?U(!1<{Tr9>N7V?2 z&d11SQW0qlq8<{tkf7SJHiyK{u!JSuB4>Zk4P88Ox+t)OFSo*@HlW^dQ@-k9P)6&~ z@11__BOKo=AnYJ3iwWhAo~c@C-15R!eTZ-8ooIx!wDLPRdGitqr*c?&188S;aW(Ur z*LWDt;WJr2r0kAjH)j8eK%9Bn>OrnT`HaN;fV^sKbA}VL$t6k-ig0?*ASZ+HaE`iZ z|MYNmCtVulFFVTX?p7W6M}=po*OMgkaRJGSFepOr0)9H~l*0O($wyp-O!)^gv_n4U zd@sjbKgG>|l78Fy)E{brP=97LPSL`6K|fRx7Jz7wtKtTLiAHEa!cVL?=<6Sit>WHL z4m-3p&eTJCbl{fCpJb)U0+<@GO@gL!*3}6~P z+RfkuYw|0S%`zvExr>?ZYfyCMzFD>XYPO`^jZAjdIY-TFmsO7A#WD^OzEY7{$Yq$N zBy|1SZIu5aI~bMMM=fQdcs7w7h{hFmrznTV8-$}o;|ULLDOF+dS@%$;fve|gxI!}Y zYV5*^45)TqlqcpmaxMphvj#<`>D_~7}X^I2FipibsQA?hCsI-lvby^p&BEE+A_~7G(Tf4iyF5 zjD{zZho3vlv*Ou6x>0&%Boujo6PABCgm=)P;Xf)95lxu#VBtZ(~JX5h?rJ0@A3wlE7XYCgKRn6LoF zjPV9X2b6VsG49#2X1ktj;(S}=R{vC5Xa2F>Fa=Bp5_=Xr)*9tz+#HOfrW{Kp8hN%J z15RO64R@{Xm`&G}y`3bf&&JHDujkOeJa!pk_!%qe>3d8;6oNF`fH#hi*!KxrKq`@* zqy%GWw!&$EEfm{Rym)-Gt5k}$eRAn|C5^>%M(y(%iaQRUpPb8gdwr^CmfdZJA_)2z zHhcpGr?i>)5fV$h0lA`80@d-n8CTz!8W0*;VCElnfyG*wVCsCq-OXp!;^v8}<3gEA z$NS7p$oH$kpUw+lJE3(|n;Cet8BiX+5(%H_n7izJZM&;yb>DvaTfJ?>so$<>R*G2e zu<-j1iKd)A5}5T}9MnTDQsUMJ;G76`8HJ;A}aKN(};_PmdELLpQ*w}U2lPew!3ZlrJCT`m9`FGbYC6Ccehxd;s9ohl*9B%5 z*yx-c_fk(oE1P}n$&-FO=ZVj#Idx*^$Ur-xya-7~U?`05$0$l0&;~|KFEd};tjCt{ 
zik3{*0yO_(eMW$$R&(LDO8f+8tE;g%~p?H-zqtK6^KrLe%&$4naf1T(WnPk(9J1PcztX`H~?@Be|AOv+uaA4$BcULdBL zV!icI>zbVC`ILj*TcbN>;9XK|A6%vV%&r;n?^(L8m2q|}CfCH?cb$A+|9)_X==${m zj!BlM_dZB^eE#zJJqLCzf6^b6)wX*&sXBGS<5LU`%74zj0H!z1KD4LMM6HllUc-%>J$h72{Ay)3h0ddJ)~xTTYR_L+&PYyJo>2RNNHUWK zX=Qg(_vOg_Ku&+$GB(z}x>3QaNSSeKys)_I)TYF=E9S?dI%@hREaJja3sYd7IAvE! zj>-eUqVIHcC)H0h?sLBRc4lb}H0l0u(XD%;L+tjv8h%sNQxdr6`-^}1g(k`0b9eql zZ>r`0cxVX{*RCUr?DH>+?{P4apkK>2*TpeV>cQG#>#3cuwJgFrUtybrlxK} zhbcQMbbVz*Lx%b)hK7Nf2J79mmA~ii{GPk>d+g@-*v;>;o8MzMf4Xu1=T`dHIMV;C zCn3MTqw)JY8o$4z@vGJSZ~4FPMgF;#`#tjgFXZ!g_afQzOTFg_2*Tno_afdtbTw6D zeFI|?Wx9reiV1v2L!Uv{TxD!bS2ogAHeRV|sA*`d2KORrn!oQwe&37yz8Cp@FY^0d zV?cx3hWti*rX=Kf!`SQG^JoxlGp?S%Z(V zz_WuZ$zOMx74WuRYk=H&CiV^jn$BBI_nMm{^019Z&00_dC#`c#OG!66D zZPw1tvC2B|xJ-@oZSdl|9UL5(JP3+K5ee^0daMMd#ui3*3>?D#`nK&xe;?cNEfJnr z1q5O7H@#%Ws+EQ-VTB49tB1F38EPu-{BdI2>-Trh$kZOdAPRJ4fs0x{*?AaW7i|$Iz~@*)~*&7MZR=a z4W;Lvu8>-ImRfW;cBt@0L26`jg-}7%#`~3*UhS}y%|AW5Un(VoPzPOBj471PlQDTc(~tQk!=Y&Jh^wiPK6m>os$yeUDIx%*g2r+eriK z*sB8$dRN~>xgF8?@OZ*-P_kOVp7PrNeSNXSil{9wC8nx&mglQo!#WsO*n0UtJX$S# zX#)Mi4=XmYR&B!ZPutdiijWiyT0bOePyVJdDSqeT-qVhw+jm8U<;>VN`g8@I=W`mQ zRV(VRKGb%|Uh4V0@It57!aeV7n6%N=Nx0ke=2&MZXAHgjRcHOa8$J4b zp2MNTPMR$??s9RfmMzWSQI|Z}x#Q%Eg%udSV-8zP`6}^#Sxt?XnVcQ+wS|6?ovHcn zdv|zO{Fh?zd5h|18MfGOJy=9 zm;FE_(nyV)Xq}0!#mSWvdU>OvpRDWQhdNe`ye>T6i{iH zWK3?~cgoTusX)kBY2-W}G103jS|v-eeICrcAgzAZ~CqtN`=0zYMx|fm_nuE z8nda@Qex`k3?aMM^f%i?^m;eNoheb+wJSAdsq4O7W;On2oz7G>GqdX%pUxPY(%OUc4=zW*|CC{`v z3T&U)-`kQtwBqwsvF++9&I9kRm0lbdD7!s!)vLo@YGpvvQIfYn?^a4(LTm`H{g!6b zeue+2<;AVc>x0`o96Qu(S__T}j~9$DvnjhPcgMpsdEe&9k=~_8Z(NVMbXE_2t1#C2 z_39v3N0qy6gWy5`T;m5X)v|>g!$#cb3LEdg7UrUD)NYS2qhO;4)>H0Jhn9tCKN;)X zdgb+jZ&p1{0aVAf`;lczZ~HUOTUVr;nRGwGJ2okJlhwgg16UU8fa(x z+##`~hQXtH7hUqJnpEA7<%Wj%E#F7|{7AE9Uho4{C}YszzU3(TkLT|L|SZcm4ATC=Oho8^sQD{EK*ZKH*(E74_g$wP4MGq#v zpPYWRzj)U6xu*a^=x11tv2xvaT~t8*0il})3oYkwtCtHY?j__SN&cVp(-`Q*AEpF%IpNJpKc54UUvhZ7%qMkhl9^tx-4J5~E0W_mDmF z%<=MT_vcICYgMXBPd4lAp7q{7wa{!&anDNZwHIYBdm3@bXL>2^K7r?~AU5V2x%b>i z$ZB2gZ$=%TvfOu=P3E5V<>U`@6EF>u3_KKoIpv(W!De4;D-QBV|O1X+^f+70uhr}qiD`q#-g^wl-pQc`zTRIIN7^{1rbI9%JkSES? 
z1<^n|i9>2;TPq~RWxIou-Fk`t)+%-FQbW01%+I9)1@Z~u7NS~Fm zFkKe?#{wRrEgQXqcJy2%@=oDRCBH`0XHR}+YfXbi$N0liwdk3^%owhM zVXXS3Ovo~?m1XK%xFma4amez1HcyQy*SN90;mE$#kNK`HuUQcw8QJmJFSVd2E2fYu z*nXliM%9wX-)-GXTZ71>`_`A)CV!QoYk6K7dwwztWMok?Db)Y&d?e#K)4Ja!i#e00t7O*tA-?mqB2EeYeJ85iB&ckK;p zuJ%gdQ9doUZj`q)()hJ#cun-BslXkTu`OF)zZ%mB7StAs-*0uh{#t~^(R&6TCv#2( z7arR~7Y(V6&i1ZucqSnoH}L%Q0@ZCnjTtkkR9jY`St^jUx9CE6lJC-sq|QU7E>~U# z7wE+v7Hnvenrgh;&%KVLWNgKx?#2?+7q-k$kJFU3!?ljrPkP_gXN`ur&CfSRJ-rqiA-CC$1L~Y5*3HolPnQ`Rw)nq@v&9Q4gR2AzdS2YF17EbzE zV$3~Z`69k@$;?MJTl>Sz6>VgMjz$0y#mv1VvlHEC#!@z&0PVTaA z&<=Js5zXYVoO>_sWj-5&1l`YEcwYALta#2wduHJH2b+3#-y;K`bjF>gMU1wjj=wcE z=q3wt#Iw6u1+h=A7RzRC@@C%|$LV;=ue(;aQ~2w}_t$wH z;Oz2Al<$yuQ=8S}!U*|#@es#H&rI;SVw1KZzD*kST(13bY;Whn&(oy0l$TW|60|3- z1-J`5=47uyzV`2lZ-_+R9OHdwC(SG#z-#YL+kb_(^tgGO?=HpH<3XCcs0&SPiU%&s zc#cc%bL!n5HKS>1+=g{_;%(pP{p-b7Z+Wo(eN-HutF3h-X=x|@#s2o^ zh0+W&SVNlhsBTE7_ERe62lA+p1J(-pEMqScv>j|;@CJCzVkY7TlC^sOyi(N%Fv#oR4j&5D>)Xmno2i0#gYAbs z%WEy~%1*{wp1pGR_CRig?J>TJHf_TjV^d?B%$<@GZiaLE4@L?k>K95Gy(x&0og?kO zG)Wj(ZO`p*U%Q}?vbMFrJ$T-rewn+ty1|x_0jWggd6~l|v4-*QXYU@0S9l96?C4Mb zKswYHOhPn0!F)}b2Em$a4pEZd$!Fgdj@onlK;9%WcPvy3=*~$gw7C3d#`{l`N}(a{ zsLUUF-Qn{72eQk$RAByLSzm1M6frKQZ~4$dXT#3S)+u+pYyDOniQnAjU6@BzFLx#lilT@W^&xO$TCG{^<@2VB>uBkZAYkPS@k#~HU z$aVK??2Kr2m8~{U+L9s5&@bQDG!>=X9^0ZJI@Bi-al;(@gkaKH(|t9a8XcX@eV+d@ zeYQGQaDm+9z~jlYKe+M3XEgV^|LowGh_Lajk_qnIBRBN6^sdV)%QGGBT?J2elzmMRxaj@v{_fo!t_@~i*Y}U44_xgp<*9?E#OI!+ z-U{A7R`0a%xXn|yMbxUqO?`UHYZH-Ibq2dtqNRq-k7Py+J&&jmHkFz96t32&qQY*u ztadIjMJ98QK6U8I$`{M{&==qSS$pZIv13e{oE_r9xt2YQzjtkSU7(R-it>7QhN(cd z$DLfhC$ZOfdra2KM9vrgZ7UOFdORJ~z5kxsKK$ff*Q2Pmo0zE_T>{D9$$9CY`=Q`X(0E;eV>l?2aSyuMW9RK*ewZA5{}+%fp+v{RdVVI$WL^KBnC zO7r?(VZ)xz3rmm0DM-mBT&ITd`5owJmE_en9LOd6U)a&KX(0LfAx*1X*|KAoW$yVt z-did=8Ts{o?c9dsN)hc%gIu-)w9^CcdsePHZjrp2zUGeI;OkG@-tM@WYW!L;E8Gv{|E$sd>AgN^bWj6?mq{)k zWwo#ZXX}A&W$cpAijQfGZTIfFlPX?ken@Cni|4NQ=gHA`ycny*Y}JJ&YTQ|dm-xPLNb<%XwbY!cGy(L;fEpS(0x<&k`p z_|Vt>RGpbAE@>^7gjs6*)l7U>Xo+&Q2g6I}&ZNh^!)b40j$}Pd?0!rWXr`SwG3_C@ z&Z=TnXHI|7Dk@na@#T_+1&J4)+WNPd7cC>7J<}+Gu5ErQZ#~7s>yZEY-PnnhvC4-M z(f$hImuj~yT(#E@ybcOh%l(v7e!jyZSuMG>zg_{Q%{Kuz>pu{@wr%J2=sE8`|C*Y2)Y7SMFJpH( zb*CQ^eUz5)3?rd6ZJ&Mx6mP^RN?3;y}N{9<=}@HwZy zTkVXMf0T9M-6L_UG6X5tjO*M}6iT8`Vg+O}OVIO54U|c_ZK|vqbmlV z$9?EmRHl=ymU5~oCw@$1&G_|#iLYB)>gKlxT-|f<(K;!!_`omxA9Rq(wc%ebi*(kS zwcS=yr0^#U6`z-Nh-tZ-WAP&4_=^kOlYVJso)I2*PB%QV39c$h<9X_CTP>md@=Ne| zRHMDv>IlOEx6Z(;*tkYC!NcHkIZTnU!47g>129aklH?1T25zA zM9RSwX&0w}Gd=J&6=sxpRnv(x=;ddQaEHl;$j{=s_1^wr>*de``tC}%8R z`ctw-Lgb3-)gex%RsB5D6PXv&8Yi=MzASLN?9os+S6{wGqwLJwhK<#}?1?V|i1$>N z(oFWnr`M!#ebXZ$_N~Vx{flPnH|}5A#r!BEf3E!g(O}D1#og=LJ_u^c2JC+=JKU)9 zY>=nRPNaBpZ`RNn+xqPvqYho|;*#inBIT=Ao^Lk!SUbA!?bH(l?MS93`t$c|r06J3 zZ2ovLyL7J^-ro$o~Of~nHU$`8D|e;^r@2Lk`$2X|lo_0c1igk*5)~~)$N#7OP7gmu~OCD=Ne>&Ri&5m&C5bcJ+;qhk|`&(|ju2EbX2Guk{ zS}~8qx%|XQhQ^bSD;g3Uo9YAXpl`*DGEa~_Ub`jERfgtvNc};U!JQH1omAR-?Ac_7 z|1znt{*Ys*CfA-AnDjfYbNq&E6yN+-ts8RNgtU}NIa#E%fl!WgtjHAa{F+ziTKhf( zipuYiE(#@JXgLS9KaldX#nbkL93diug?*ip_N|$vJGdlGOX{wENH1*MdS&qSMNS_E zo;TsXSaSEZ-C`YgyWB+V9t?e#+|vBfsr+7P%cnPy{^EOLI>rV3<)5sR=@yy%bYelY zfO)n5b-8)Q^MUy&&%*uNWklC3UE!$w?!t+g3RCA28DjHCmpO$u4o(mHYASi|Xs`Aw z?qHnm6%EecVT{Y~?CPp{Cf=?YU#$4#`sv-XdWd&C$ui1*O6gtdj0C&~_<7Xb-sWuu zc_J&aTw`zV(ie2!CCKI7^BYzuzC3d#o~yrK>s!w1SRVeVhWfTW%|0&=>M!&*Gj(T{ z6y@dzfl`oeleC3#z!>BZi~%eN1g)HqIPH44KaTrdnQA7_@&}i5tV-i#-q0cm_GkTN zV>!~Y?fE{bzuWj~#qeyY#1m!R%YG8PES>Xwq-Q?wdG6Bk;6aH8nLNA?uJ=4uIk(aF zKEd@$OcAr8we%7n$FM=844sNO#(rh`uO zUJ>Xso+NJXst@He7r3kvv$kiW>|NCp%LfjtP?44O0hQrj-qS6I1Wzvr?C`%~(#d}D 
zdT(Oc(Gw1r&2xDhF1Z_72NgXbrqfEL=Ax0n4GYSx+QAdE29qWU-kyiWI*XF?A7rPT z*lB!}Vkxo`Co6mFaUwc&{b=RD#pi`g&EvBfrZT!An&0uT?A|A?x9!u=(b~_98ZT;R zoE|8wKU*%n`LV5nC!cdsQVH9k`y9e-6=#@7Zf~>OwttI2QiHk-BPdo>jO-nD^advD z|E@u||FuDbZ8y}~xf||mC1+kLq?`!8mZsY)`K&BB)3l}N z!h49h1SXq3$J_0fuHahAO+4K_vML`W54NvP_gR|c#IRT?6+iK8r~5}P^Y6z8rp2O< zak;24#$R!#XH|SF3)9&wy3YK0QUmIplrV$s<6IEh`E*u8N4Qls(M0j%jKI^UPrbG# zo|nC1s-3h`=) zj?$^jI7KUXpS>aJdzMGja4o0rSi}CgB>RlsKbiLs8gq;J6`{gpI~57bqLuc^@J6<; zeMIi0=%8GC&H}}L$$G^rGnjfVj~iC6>u1ZK_XTY0MlCS=a-}Ox+rJ!G5WjXa-cCH~ zN~KD@|E2aVhq)>S`4Y;U&n%g1=BV!Im7CbT-%Ry^?f~Pp$0?OmtHSA`bCrSB)@+x| z>vkV*lB8{}*iP3r?mw<`!DEQIp`~D5=ZW>+F8M}p98Z{s+hW7?9ZJL;Pd%Rdf$W^L zt<@#%pV`iFT8&E``bQh3+}-cUMy}dNtL8|KHdNnvvfr2LtM>JHe;;$X#)n$Z`R(6~ zTFPw)cj;Qba8GMdJvDgVy}a?r4ZF`@6Y^`uYQZT#2D0T1G!+{>7;_{m&Q~Tu;vv$5^W z74zAsC%HFB@&}T*e!z1c3##hviCp7cb>>Q`lzvz(T$UQsx#n~=mUkic68WJ1T*tL$ zbaA~N)|x&4b2$FIBIDoR052VmYro$YRkY$}JuWIwUs~fS`$0nmZk>$s%e2Djr50(I zejqPOejtNyQOi^1+Cu4V^yA!DpKX|In9Y8wR3RROZ5w>q^7*?6LEDh8Xm(a$!Dz{| zdyhvdzFzH;T58|B&+l+dkBM7Cgf;W#zJPra5!uWjSCt0xWhMF@osJP&%qULfkqp`v za-PHSu~cZizWUY4p3yU~?1X&6+4yU2&VJ&iyN8`Sw^*|c1#~j1`Vkk*qhKdk&K%JZ z3Q{X8pYS{X{ru|pT5Y&OpLHMGm#eLRG_?E<-!+C*Z*&@g|M;Fe-AA*(XZ1hRZ9Ne1 zZn}O!ZD3A4d(vslU4zvR%&kqC3BA64lK)JtBed${v21$g25DlcVcqSPGQxrm1I zcP^M~?3yT0e{}QAk=4aX2fIds796ZZx$X_i9VUJ3dc@J9?s{#fbgye8HY%xTU06c= z!Gn!&{d2Es|H&{0)eodhA8z6!&Uvo#S7d$Ps*5K);j|;nv_b35<1XQ@XS8 zLcCyoO*OG0&VXS#<~FQEhGk(U_CO)UC!lN!HFR5}P+RT6~8#L!TTb~6Q*s*!-d8znhW3TG1$HOrE4m=x<+cWm3@>eLD9}#suwsgIl!zo#w0lJ^5aGT?6 zQ@vYv0*HqE?@BsuF*)ltG4D6La3d8beor;XD=H}|$-AEy^YO%Vd6?8;KI4ehcS}3* z&z568R$}cgKIf?u8qQd{k|`F~+_A&Xe4=xD$<9|Q^Hu9~xp}7f->V`Ujrsmv&_6%X zeg2@;r)OQZeNEz-XBwm-wY3Yp!92mwui4Kwd4ked;lE=SGir#3YrB6%qv^!(<;*nu z5AVEv56X55hlNWcobEg0*G$H*+~Eh`0?GSCesDeaq?}cTf%Vit_=}d3i1g~}J5JLx zfs;j9{p&r$w=bj*H0?IdJM-9Gd?ChLHqvm_TH1v~9u>oc2W+!c(qR6FVCVZyE6tBe zS{6*~RQFo@rhqUoAIcl|blQ(^;YC;R)31}!9@S$wBtYpz<=$Z7wHaDL=1H}7Qf&@x zYNC~eUG`(!514X|qB*izv-gJ z%08~@=N>8LiRcw-*H*|dcN^q<Q^jqIs z?doye)cI(J}-e-cMltM|eaE&i87V;VXV6WuA4WTn-vV|(}d^*(j(DZi(_HE$LDnRk=_ z<(ZRNZza?_bjbckYC6@6l_EZmt7N?>xVTPYP{DoEcv`^3`a>c)^cjAeWpd{}SIHta z6P2~kQRdFvNS{8zg2xQSWu-6g+>zSDDa}^lKD7K1h3m;-%I# ztixT|RPEhVCWa$t*mvH192EGe`qPQIl-GOO3o5w`2bPz=OUP-;f$rk)Zi#@c&S(AN zuaO2~8Hc(t9p9t^W|YV~N;SEIV2L!gp|67#$8&k(s_bMUjn8W5+{6ZbZy2z9bup4{ zh%5G30W$UfZTi8XWF$1{lVnkV3+rEYB66PHdnaUms?k~G<*0i!)Ox`rl@9xNuP>_~Rr3p^W&|l|edu6(6P`GF^GdtZ ztkdi|#VI|Z&Z>E*-p#MidB~nBUUEE%^-YtS8_aEBYTD2KKtc`%Px1x-eFQogGn^9g z!amh9k7+!xf+ll@cGxrc`prZ}fTyg{Qz6jRj~C=;bHh6YYb|+#_O#O$gtosc^2>K= zx6}@Fs!KSp)8d8Estvs_#uiW?nDtFQ?qup`x~n~t13I|8yVxTJEIIvmH-VXVGcSBm z{vjZB?@6h?V$GgBlS4_-pPLZRmVaMSvlKh}|LWDv(TCMaNw~-tv z1YrT|I3uPWdz}51j68Syx;uOLD7qeYL~ckS&4(v8sNqqH3V=ZooX`VKX-*=Jh{2$| zWQ-6ZFVr)cj`jk5z~JX}v>*HdfP%xbJdt~x#5Y7~oU@0$!CEaG9#ulj2;8r%PzS^x z4MEXJw1!B;GgOq-6_u6cDTIAq&ig#QTpc|`um>D{Jfu0y96fv;y#l21CeBV?IO0y6 zxue%^7ipaSK`++>(in5c0BNk5uZJ_r?s6bNnqA*_pRc!%G+~qTK_6##Cuc8d4$Iv> zo+yEYxA62MEFs~IoOk1hh&-)L!c5_c0p3Gt6>+B7(CfGXf@gQ)L@is2LUO0XdyNQ=8DlUuP z-b@bWJO0pf5 zLH+?7oY?fu_p@1-vOEm}22KXa(Ln6wl;_X8{1Q#FX ztwV_6u}7R6yV6UM8^NpkxbTW07Z{qzqJzDFDrzD>U2+D>wowJeAq-$8NLlq3%Jg`j8a_W0{cv6i#-CaFgYfz3oist5u0Rc~7 zi)wCeP7P>9@q19BGHi9C+{W-nqweA2l^sN}>$Ke6$50NGgHOxbdkQ6R5y{lRIh4dt zns5&c3=CsK6(iYDS|a!tz|UYq6>hPi^n5ndxQq>zt6{?ohTMI^hE*6mb)}UJW8rAg z$wrW&$;c=x%E-tJvSDP5mB-lF7h~k6*zn5GUTV^6uFbJw4AxqNVF*Nw{$?Wzo4zcL zbjrYb1BK06HVi`~VqoEQ3OEqpgn-313f9n~=6V<^DrUUdnt~Y&%fYD%tu?Tsu+?0J zt##`m`MUK4eZ1+0P3ThO7ET0fYJppcl;SLe%&g$&W>#Dl`1RJd=yK#HZn=>8RtpNo zd@I2~$YMME)M7h<3vac~3SEZe;g(@-Y*7)!h9Dxec@ym1ya{%0u`q(2**I6~R(%r+ 
z*2K_6AE#!s)e!a@8$vd@iI}aHCKL{n!H`^>1PyY%)e6!Xnpm0O+9x(|-MSUBdQAjW z3Vp4CG!DyV8I4#o57ZNuS(5Jb-(2}N-vA!@WKmj7$Pbvgb)bYBA|A$GjL)C+n@>S1e{bVqKs%FdWba|G@+Ak2q#)+cn}FgG5A3f zI8GXh+(6N0^auR$yh7RfQDh9|$CngTzNu`C%Dj(~qe{Cjq|`*cT}8vAb%DnaZ22^L zjGSt=c=){#PBjD9rsHe*s|WW}ukI!1R!5X{zACs`hqhtlHjyb$F?iyNAks-8X#)CV zqMSfRg;4n%%$c|8=sd{g8}J=i0Db@=5FZMJ0mp$8z)2t+hyYFjr-4Wy3Wx?`fLI_7 zhzAmYL?8)hhhugCuYlJ;C-4QB0cHUv@D-Q?x`1w=2j~TcfKgx=7yefhwRHr~w%4Jgh%5v6)|l>t~yBzlBJ<3T?0!!lUA$bHTcD2oj=8 z_hkm_s-Z;0fgmC)&b~!gP7EA6ic=Ipl<$R6bmj8Fq34BB7$S)5hO`Gda%SMrQKF(M z;^>CRfjWlNULvgbgSf(IBd>CF(YPxP``wT9SvKiCD*)C+5`?alvu1P=hyH- zhhBI?i*X52rm3INlEbgL0Ea^mIKHx(>xhn=T-;Gz`r*_2SrNuj1#9G9UfZWj4}NV4 z5$Hpz+;E$MF5Nzz8xARERQ*ba-cZoUz@p0Ud+@q)=f4E%(%GFZvwqxITdgCvY1uYi z`nqM?AVM`bW^|2Q|9KT%dNTijmZB!A{AKp~YB}BfK^^*<^dU$hRZo@sy6%uJeZc-U z!vevHhzZiS>(Z^l*TGKeyP{*da)-4Qbq&%`>Se6AuG}}{H#!C)2+l(6Y_WhYJu#XK zaw2t?D%Z*W2r_o)62k>iZAkUdm8&BYb?9Mtb=D|0!z@kUL>yl>-5H-$4 zY^5W=E}d(_41NKpjnF!&L*G^@#VW^c8yj6Y&)m8-^t#96j4(twE?!VqE{uEU8hXj$ zOkNS?4E8}1(T=hck5FWZjzZQ!YzV*9jMO}$%eB0|4n1E9>W2# z-p*#=fY7Kh;i__Dle0SXOBwtR6CvK36>n|C2HVQ|2ao8=J$UD&OFySzs$v2=q=+QB zm^VupTUh(84Hna?E~agH168V_1q!ij)B25Ctb@-QLq0Q7?AFk2d^XeU91tOjnaDaa zn~1)S7*h=@%H&sllnGkf4v*+M2wcP?$=s57YO*1N4bP*8nxQo1!h!-FxrBJKF8yw< z5=tY5=sKX{8w}*x_HRIx-@SFwk-M594cSW(G~0j}2(W>Rkh#v0M@OJ0p-CkAgYt5f zLGg12-6-theS@y0%1uvA=|H^p;xDlK;XqlLOLgT!Lqc@uFWYP&S~*pGgRa~YwWT`r zLB&~=%1V=bmS0!y^r=%i^sdfMlqMB~6;V3L=J>%NpPdX!y1)Px#p_XV>1495GOCFL zsvw9eszOTBqhhb9AOt78K9vX?I45W*I2A;Xkc6lot3!l#9hBFl^3bIoE{C77S>w0@ zguguq*F;bioF>i{)g&=rF4qZ!-Vt(Los3Y$=m7IoY9f3q?+NHrC7{m) zmM269Y$*3BBA(?%{q$mgdI>+hGe5l)A{mTV=v&HT5)ww>K#l_^fRh9cZ1`KsFiDK1 zxuK7Lion60Pb9%sB!L@=g3x{9atN^$xB(6y3lVih5^Tl8?gThOB7sAQrNzx42dg<{ zr`TIjA`Z{e=^RK0?0*HkhO=~z;!Y!9ATR@mm?n}KI6M;$@fCK>jpA~UE{JC1c;&$UCfuXI1LqHB;9cB*f`Ei^Kau))%z#v3~e#D6( zsc?eRi)kAmZ4d1F1iMat!pU==Ta120;J(YU3L>)V>gomzWri``WR-ydUDZ%um2RME zz@Wn_6b8y_`bI{Y#)k6HcQ}PG2plv`=*;6;ro-~*Nb6qIw~sQiH}u-7+%TR=8&1DF5-Oah;P`w-3nZUc9K zT;MK{2jm0yfC8WpC<2Os5}*_)1ImF1Km|Zi{#iZ*5wet*&3YFeh_I>xdU~;Ycm@#x zSxFN(@T}``8_xi@id9F^TE}MO-N{JwH6DbD*2!+i;aS;oK=u=~5HI*qAYA=0lz``! 
zpYz3c{`A+az z1sDXMRoF4`Sp}W~pH=8N@L2^9V}nWoekAy;!X|<*#UNc^dFl&TR11q@1@WQr>arMQ z*j4tAi@hL=`O7~8_EbrY(BQ|lKpjx6C5y3YgN=INA#6!(T?E_@d2CM^`A&Wt$9A(iNU=CPt zl$Bxw|BpD_93D#lD^^Mo_A&r<5=$sjS|pZCq;yCukw__$eoBP3|ED+*!QtC1X^i}L z(olX4S?&V5fu7KX(6OF%dgHL&3-kf~z(Bgl0q}-^H^4A30*nH0fp_T)ZSdX$AApa* zCr5weoz9{pwo%bDSP>i!>f7U8|5_q=2e$nL{a;nBdL3#9;7P2C_DdxHNKAi@`>T*Z z5L$o^@E5`TrJis`rnC0Zwd8aifJ7h(I0Kvo zl7ST99B>}E09*tv0hfU*KpKz^WB?RZ{HXL960-ctF=X3v^wMI@S$d9y7A`8OM~h0T zaj_XaTWm(V188U?-U0jPN@E3;#tOv)GmbQn3k{kN4H^|qirtZhy)Y(lj)c9)a`BK2 zXBzhMV$2nm3yq5=r4DF0(vWkZc6`vTAm7e3Bu#2Rp4G@;CpKe@a+@rbfKUi`K4MWh zEXb0?t4_lp?_6oWw2l?~!modZ6}!69xkEt)mOqiCy=1t%hkPYMjw}CrA zE^rsf1M-1;Kmkw)6amFRDNqL7AJeg72fc$31?&lSp6!DWC<|AE>;?zzh% z1wbKC1QfgfsqHw*B6$#(50pYYC?f3rF-2KSXq7<~xdOEBEtC?<$4ALP@t2kf3{q^) zv&o-9A3**pfJ&eWs7AYXf>#UF0rkK`paFP{emVeNBk&Yx0-gcSfo7mZii5TL1<(e( z1loZP;1%#%N|Lp^3+M)VfL@>v=m!R*Gzjx(%Ay6lfM@jvE-$)itZpcJb+Hjw|&ulz36r?x_yi8 z;G#Rc=)PTa-*bzSu#jsGbaFBLdC{F+bmzEN%MxUy5C<)?-?9a!AJgzqs10qd`^YJd zut_Wj7qcOF zjFK#bd|V-p{)214y`4yqLH|bl+0u0zsHAPcww+yrg` z*+33(8@NN@kV1^K$g483rmMsZ{X-+P2h!hk@X_n>cj5f?-1mhSWt$J>xd#*gg+LKd z43q$+KpAi!Cpbn@99s&)(Bj7Rc1ZV`F0!_d(;5pC?v;eKZ3!n{n z3A6(pE9(D!(UPEI-(OLWl&=s+DxmO{z#q3VNDXYWZpZ3?df?$NvBF&!@|Y#o^h>by z6m~TM&w%GZGtdIG0xy6z;N>4O^~-P&Ig!2IJ$u#ci}%F&^YnTIvdtYJx+u}tP>N2V z3+M)VfL@>v=m!RXL0||N21bBU;4SbD7z5q|AAoV-Bk&2B049McU>f)gd;w;FS%3+A z1?GTx;2ZEASolvQ%AF+-@*Dzr{<;fi-I2cq-T`AEw#9n-0VFs6OQ4@1F!2|GPD8rS zz#ljM$SiC#0l4i)=74$N+aCfg$$MmBuzmwIG~BjnWGXZV1PZz4)(}T1I3$*E7Wb{aX58tspm@8~Jd37y93tKvs)SBFLl7tc|lE-(-;Q zYgXVK1VDA5gRH!nxfU5;z<AyE0vl+ECsl>J4y*3Y>I$oUf0!nax|sS0q#pj2dYF~E zFmHCUuj*;q4sz7(;r6EC!gJY>7eOi<8+&1#+lG#e{7OFp=`A+Xu{1m-jrTBx9jTQkI zY}iMsZ^-1YpG<;0Z20h{4@#?EH4R*sCYOBq{$t{P#%Dd6fQK~yG3!|jl7>5myhV?J z?2aMGTX44W<6n)}MwH)gOLrq*Yq@diYI*9l5je|;G^+&pkf(RQPVkPEr{=e>nkV9J zr-!5D?5>k-siifIJ81%TGOW~(AobX<)MJp^sJ8p~P)pd8jE&^8l^=&6J<6Swny0ic zChmpAQ@;{Vu?}6-lX&hd^KM2TNzX3@i7gw+%0$Vz_dlO~n9scM3M5n{ z#~)&S6h$86LXb}|J%Mv?2!eb92QI?<#?qnOck`Jm$su2>;+kU;KAF%c^rgbbtQVlp z;*L=aRexTCLC6-I;@tDFFIDfS(sEJqg@h+JpT@swd5j8D4lb32v9MB@PQl5B<^CXt zGz&1)V4RZ|$4-LBjwM`@Rw6`hBj>%2z6X4$hK}yeUXCJcWQ5YHibcs3K{XPEfae#m zSwiE4&jz#pCUnWjUq6{92Mg4e@n81YheHq+l91x(+zuOO_k%E}!g@H zsHiVa^w^CdOV;29vl{2?f8Q~;&8 zbU^S0P!GI^fj!n!HP*;2^m|c-J<=;FDh5hKqK%iMhqo*1VJrpf;pyQljKH(c3EdSq zydLCF8QD79SNV!91ciqUFdDPit8-esy$nXQ2>dKmt#LN4J}{paT^!g#L*b!27ZGdY z?C(P%D3e&jRA>T;HA;nEUmTA?FCW7ri{nvfHESyaZWz%o%k$azt-PGQVUCL<+DE2t z@^$h-FOnazA+2nLC=`#j0IhsuY_>#EBB2>ep)hn1Tsxj9L?o=*K;hB%^>Oj^qB?r) zp~46DoQV>|6)=Bhi!Xv0$|;a3+W0P>6;C(xpg$kjvV-6Vtm!f+N)qOL#Rbnq$vq&b zUZ4}`0*FsgqBT*H2+xR>9~khj@pCnz5V=sjS4R9tQ`Z>?VdSK8u@s!Kx{C+}k0rAS zq3{q+PoEBQ(^W>6!*gG2D3}c%Mh8K14u}h?f;v$}T<{1Pk$0kqZ=#CW5V>Lg4jgnF zJl4UHAjHNitwTmgtP^liQBeXJ&xXh87{M3zP>IERvbMz$NoAB57q1M}^1KiQY63hi z;Kkikgy}+}xRoRdAHIcdOfr?n_nCLlC^$Up0k}LTk$^{cp+Y1Ep2&{AglF4Yyn_1r zcq`P4Ul?11@^gy8YzZP89x*nikSIEMjukf{MjB6qNV!{lL?Sy8i>}D#CCTL|KLPnP z0u8_;sT^z%x+>SE{|AU^bG4Zvl!%QbCIl%q6uOcNb4M11;6%?wo!8{@Z*s~-xug+9 zrIL$`EjCny1ZO?K>A*_E2pCxzCA0D35*1V7`9CiEXQ33=jYnrI3lpB9;b{5sMn)7KD^#7HFuvsfT@u3)e!zk-s! 
z2HJpj9R}NS1gS89;@E=d>3#SRX`_r;%rB{%Gf-y(1{(=tMYJ-ZVXq+G%tR1#20Vdc zCFP~Ul1m6bj%<*L!|JWr$oNyiv8;kq+5U7Pj?S_w)=MdA>}dK%6+}KS0TpT;!XRrX zyYN_wdfpABE%pY8um!T#Ds@xnIZ7-?iAKYgV#oVEjrjXKPdInU@$tbKh2VKJN1!DK z{UU0+<|4J8`c|L6dG@`&&PB8^3?EX(tuMy9Fl2^42YkwsY;X}pkT3e^V=REKR zkf#lTJN;8nOHe;T-~%v#;#L1nO)Kq@W>Az36#to;*8ZucC&NFI|IpG07qztGqL%hy zY3RkRK-dzqvF2LT)zqa_D;G~6&tEz_Zuua}O~jcy?{Rgc5cK}6!4067u|fzkwy44D zeroXYzi9COUm9GB?{75t!XFyE;13Oc5w!Ot&;!gZYH z?9I~ZZi@=dPYAR+!WttBz)BN^?H9ZMr%q{N%-}Q!GmQC>cmw0@82tPyCk+1EDwyzd z0sX1+(KI|XIsF}rDj&UuQ-WBQC{VVoi3cpxYvM zRm>T-pty}xbop31;Hd$6QS8^Grd7K5=@034y3ll_#KkCo*d9e4_%5u|JVI~~o=3?f zulsREit`ALY&MU8^1rgE{7vcDLDZbSoMkD0s(utV$V)O_y>l3(I|8tD{}4dz`DrS- zz6r6EkRHhht-$KNLcf7-7yhg;|FnDFFx!A-_oTl4w0oh6|G@4Up4>rVbriS)g#PF!$kmeEA0n zG0N{C4`Tq!=#5I{7CN9TdU&l zduNgm5K#L(@B7dD>!)9DG8yj7+?hGwbIv{I++)&qe(6_CM7LN9qI(z(qKmc`i0+PN zg^Ung!zk8*<`l)Ondc})%1DWQ6ibQ4mB`M2F8%qEK<~ja@LgCNLU;WS+WIo`u;972 z_~T9ASI2kVyNgY1Hx1zhU^}OU5+mDD?4>Y?1R^IwB56A|DMHTPh48MC?e6})p!g=N z5rSIT?h}8W?XJQivER~xeT{pdXl-J<%@VPRxMpEIW&?z{W}wwF4)Tz4ogrr1utgfFv)r;m@>U!~ZQe0AcLQ*=#7TE!07Tse6Mtmed$ZQze3NoV#=_&BCymZcN8KJs9;|Aff=vug}w;fHpXde2} zGz7ezJX8igwWG~;+q< z7(^y9*}RZvhQg}Mz|$0Ooe9j~Ur3l?m%V1R-}d)qhIXRjq16z>7mAvRp+GAten?EA zNx06*03n45QCbm|oORJq7Il{yR949X^3jUyzzExD2T#@dO|O?rSHCpr8$BXaY6kfx z@@`rf8gD0sA!Tx60^PNL$Xe3EZ^ZuTe}y?lkIOL{K;w+OP&z}Pg>i#m{p2w4h497? zyay>O=&JUQ&sr0LYDvHEyDl2`iEinJ6R@7Pl7(!A91y4dtob%s7Q z%@C6wpPr!B(+*|-R`)XWNi?LU8WPm8DY{g3dP<_s3sO2wuZ~Sh%+STh8&cKCC`v}t zD;@vD>toZ4Nqo@A=Us^P-mpaC#R>ubEd{h6p1=@njtaP&?_AoXVM6AJP=44 z*)L5WACKe4C&l0@$$5xRGP(r%kZ($Cx{tbbqB_Hnrqk0yC@*pUWG}U$mpaA}qfhLG z3ysHzlq5AAsIj>8RPRsR^?h)XI`rBaHyY<$mkW4pxLZTSP(nlKF%Yf^S zH6->?XC%d^rzLCjvh*~y&e1{DS<|JL+JIB*qfhZtC#R?2+|Yj-T!Arq9lcsy>=lo% z>8U=vC$V)F&4vyGW;@&#eGbkRijMEs9MmYTo z!s*VDzr#j)%+rctWG9n@jJB|oXjuc9(JEbl=XfDNB=JvZP4r?Uciw4}7e=g7;xD_N ze~GvijaI4dX|0Y$qdjGGEb5(2jzyo2R421zQE{(@hw7IVl!w{12;)PISWB2!CKJ;? 
zOZBFGRmJEuy5Y=*Rx3hPMu)PQuD+*@)|QE`=&J%<(Sg;_Re0+h1CKQaVt-P_dwX(6bq>{1qGcmUQ2>97#Y*CT=Ukl@`5_VRc z6+1nAng3du!$^49EDbFbM4~$euXV^=t{Sys`U=0vnzhC=Afru*(&NNR8UM@M&8#>X z$dtX*{Zas>NH92?EUn^9S#7>Wv!O-DRYRx!Yq7J2oTFD}oU@`L75>E3^3B5)uLKCI zE(fd1lOt4H+O^X%T_fCn%7t-deENuQQG3bC7R;KT?fmb+OyX3{>_+wrlT(o{D+-7q zPl-AcV=LmFj4f@pxWd$e%NeHD=PekbEd`c}kJlwYz^l_FL7(c7 zL#!7#4ZM|_M*F*ZsT1^s2MxXGZtGI@$VyF9C#A$1)XDlp*Ti zvRh5uak{bvR7Au9wb7 z+A*mYZ7YYlnQEYS$wAcyU4u(3=suu@NY)D@PS5o7RT@e7J`83p7}dajs<06ZYhD`z z^GaL@^V(A}dV2?+cq3VzuRylNZzJ1Z@XY@TYyq|cZ;~w}H<_H`$(G3P3bM`e`=?~9 z6v&oVcu_{lPq-FFAa6yDv{P5++HRG(mU!8!<637ixK_nGHHlp{vPV|nrzWB`v4L(v zt!$t+&NN}#Z(Y{Twr|cg!5tdt>_7?gO_*r1m~L_)(@j*6jqPXjN*QJdS-a+yrwErA z2u*CpDE|x)^=4x<=9nldFU)Tnn#*ApEJ5{b4W2Oiw2&@-S_nJ17^k)aAS|{$s*n~~ zETN3*T2Nsedcz3T7Sgsbg3SaruuLE@{Y8)SyeFC%!4ZCzMle4!R$&BxAod9^6wE?& z=ARG$9HuIR6`T1EaY78$1I9Jj(v{tb zk*}B#n=h_p{p}@B4R8YR^xDmnm+UOQ&<*2`peX@%0cBM<3{zt78o5xxVJ!U>Phx#} z0}zo5DvlluLTfxA+MX9YUTPz+Rh<}{<7F{1CKmt8(m3oUa9@GIeTM|@+im8)+a~VQ zG%Sl=FPbH4Cs0o^c)d%nm;44lu3K48(?kASZ^OAt70x(#rrsFxpTE zlMn}V@{xW4#sXzxGZFG?P+NMcT}R(ux@1CqUc8lOr&7WSBqpFWEuLaFI|-)>wp^mX zdYIzkIVVn+Dsc$xt&26_&<2oPBJXx0k%}1h2|tsxyJ@dTX$hz)TZtwl%&nJ_2z1fc`s;Uh)`=pghYoHF{DTFIc6 zkPZtnz{`vmdWAp+EDiyMm^WRjFv9!Bb#x^*N@dfl`q&QoqiU?K$ViT7)0$&DXv`S%HF^zI9WkP}Hue?nvw4NVlGUU@R^I z#5sQft(JN0{lr_kmQFYx0h6dGk|I;ep4YoXsGK7v&uk)!pu@i69q&oK)ycZ#WL-+H z99dase1@Y#@Wx|XYP^1+Iwn001ysD$DM|PZZk#yXAbosddKz9s1Tm$?=`yepJ7daO z2j_j99zLXK)X~6cAf$+#S?F*+HRCtRrNMIESAT@X+;dh7>O@w9mUW4z{;@Ya^-sKc zPkl%`hM}ZW+lhT)mNIoC?bKQAi1pKLzO546W5;T`_@t9r)ygx>In=uT$Fw=TPks#2 zl2yqwtit`qu$jRs1EYLLOtXDcz)$I6993+gv`~O}()w)?L+|owwZY zZdRU!U%H!&_@$@jS@@-SWx;Y@*yoS0WJFU4_^J-&&+s&bFF{lAiK}DRvGZZ)CQpavsj`Oj zMwevP8YkPMG|sFlGxC$l*P1U@NpI8%yw=H-pn`Yc_))4n23SKm5o9wQ3W_3C9E^ zFgq1>Wv}3FRcI4yDr*x@@#78P4&do^4gb;#8k3H`f8*mr-~n)zmXQD(OPBIL)Fy;D zNI{!O=d}svq98F#4i>F>L!oGkfG>3ygmxyWJHld$H+h;D%P-UQq*6z?<0KK5iiJT? 
zEPhAGF6)3kNfIl4!mL>QZB{IN{=Z^Tp;)lE`AF${PRlhUz=$|^C9aiyyLHLbZ0j)M z#&wKz<8_U2ldjQB$(x(uk+FhiVPI?3ES}-M{0R`v;tAS!&@65c%_8Ohux8^w0u05RXoqDnBeTkK2ijV$3A zn(DDsWDfb!*zsN!y^lytCV5Nq&HTJcWaj655#i@<8Nwh2InwBVO z70=cR7Ay@{^%pAw=nufzU};+6K&zg1-wE5lzN}`pe$wmt!U!-mL2Yc^9AzR&Wy+Sw za3pL!3>C3zWa}UC<2Yax;OR92|Khd}n1gl-FbNnAJo@i&b${OS|DLz}ja0`=iAgA;et`?jC=LILg|L5wrH~~WIzzOgisD-I93+v6#!csAj4b79Vt-lc_ zVME~qcsHh=ftxoIAWS_S?f+-$xBCEm$77%-0EDT>{$FBhnES1@)X4uus_t4X zRgbNns+F>Aoa0Yi4(ni~>P-u&Ko8{>`JZQ)sX3(vw^YG>gsMi!n+Sa_uf?sPp*!gE2wIb0-L zIyrPTNO-Y8!V@h>IIJ2H{&Y~7k%3*!xh75_&7gJ}!*3&(Aq1b2Pwhw#W>oyNu&uIY zulmQ#3>Th-<#sIiN^_sOG0~-3V;P}VAY4@5R1u?@5HN2vlh7-WwOsM6hFRm>zT3OR z(I52D<0J@bCE%t40qYoQrX6OD7ve@$%oRo1yTjEKmk}GH+vAHDugp)^Gchtwyp$;a)_uVIrje)kKVbA#ydoH_`WvmX# zI+WT=C2owu)^-)6?nFC4d87{u4$f|Gn=V{}i(Lwk=Xnv@O4EgWE*q7(p7@6=a@owW zIhuX7+HtkS=2Rs8X%?B2{v44FhkyhmOd#p+BW#`_!*pGxyuV8Eqt(qAen4gn@g|To zh^~koX_saWBn=p9ip)u*KNRx+t_zVlp%M!pvD%dX$A;c6ETVl^U{LY^LITS|ifA>T zW!Bl(F+%f|M-gKmPEs9AzOf=C)*;mayD>1`!>S1B8!)uPcB$vR=JHK zq(x+^Q@;@*q{Z(aLYn1VP0UM8&R>#SfDqEG$;3EYi)&n$@Q7NC8A|G+{t-zlNFTyy z(N!2|?X=O4tb`o8M^b^X%T}#BM)KzAz0%_hPnWpkge5Jp2#tAz3w z^5wo1Q$LcN=Jgo$ir~<_m2;X4@Z(NkGvMjIk(W}mQ(O_g76Kc9Cspi&_sAizoEa3t zTVg80Lt8~lc^4kCIO7MJMAi?M{<#kY|J-Cj=U*h~{BzCzx$P$Z9R1iRzjBZ$kjwj| z5a94@Cxx95^MLSAM4s|)CnDSyRZMZA8V?CLY|VSB#J8wOH&}TjBloA8g<1li2KO^Y zW(z>x*-dCG2{mJus71ixbIX83az|=NCmjtTi6 zi8>q^PofTg!5RFDsKc%A5_PB)Vo$VjR=m&1e=Xz?CSVYBNJ83KgdL{GRh7SQBzFl> zhdD!Yx|68GcNW3sl;%~{PQblW+~KZnki@>lxa1vx5U}x2!w$dJl822l1|9alWO3bZ z<}(W!UX_9lJxR<)rJzHLn2l@iA9P4bZ~6b@utN`1I&+4rf_LyxV{F4E5*a_yvbvT$ z=9qqN-&!Hokf2t%{E47Bz7|t=?Tb3GdolhMa(O0GDQx2aemnz|0G{r@<6l?^aVO9o z1NH$=A(pQir>FmesQp;pZNux2yt`%;1hgtJx=ttw$J3d_@-R!W>?w%lX@Xc@Dv0Iz zX0g1(ES8^@tQQs2QTp^g21GI-7SR|tC@{t%qEIi^Qdg-O%=_$Gx1kzzYbEJI+W`h)l1iKzh0NnSwVJ0a zBvO0yV-$65>+^vUSRle)>VZ&pTU#fGPD#mz*wje3(>#&1?pHIB2>H|paW4-6gh=+I zt({2Vy0(a14?X)&wd0D&^@(lgvxw|VOa~_nHEYIN6oC<jkiq#eM~qW7d}vwb`5?yhVsv-(nH7{!5|3 zN?wXHIg((Y4ud@mHOH(Q-V(EJc=wof1D^oa2C2b)%>nB#jP zE!6n7#(4Fk6>9voh>FSkD)H(Owa2Ufyy<%~f3JOyaP_Aq-uTWGu6~aMsuNd)MABj9 zenkDTL{`_B8GQ8HXwMjU9SCaWkLH2A>;D^unoB$n=W#bGJP*?=dmb+1$LoNn>qWuy za2H>00~dg2I32RIhp9@@yLcXIjaQF|mUJX3O(fFXL>b=+Eb*Pd65|Dy7;a_>KDmpp zNOYE2FLM0BsKL*Gy!9k1`PwQG%L+s+QwqmGDpG9pJM?^;WW9k)y$KK{=Nj5KCF_k} zip^oleq&~bSA;2RO8%*2wTMiEs@QX;)EJpoeVDT5ugaqJ9mAB}o@W25Hh=D^6Mf_nIv3UOu~|E7We>2)H5u6U@ z*ks~$Fnl9~CsR&a$5;3r{__CkYcWQ*!vW%UI99XUVNm}a*IUBnKv0W#()t=pseLM@ zCaIv_mSM~*)Y~1E-41`@$0xurEY;xY{s^ChIHI_6e0&K!1pb7WA}bG>#MD1@JQzcg z5k6#1r?re`!wh}YQMy>Opc;Z=1*SS`VzHYbwk8TUKqn{&2H3plxUF( z&HevOq7C}*O0@9E!Ga2U-K2s>Y6TUvwV$m`r0=K0@$8NOh!Z0VZIHJ}=DXM>!f%V9 zgnEliGl0n3$I<7k_&86$wIsr&N^Kfb^srgo_nWLcXDQwI z!ZiKQS=jKrtgY}}kj#sGA?$?|hx17lBL~Fo401p^lN^xOIE_pU17X`&2=jE{GZ~kK zyA`^kw1zPWDD*0&7?y1gC$UHgf#4N#8hODh%wZY5S??uy?IYNbBrD7p@mV2F-taKI zsF1wZ!p${gme^-u@i}DTOnS3S0Ay`6NocGH9wFhK=E^Zo{j2rx`dd|mc#_H_e5Ev# zYawm%7Lv=j^1w%!m0D4YZ(+lJ7w;qpUV>VMZ&N`x>?fuc6X81wH=;uLo~kT-XXD3l zz$C!aeGL9J3f~{`aT+ih_zS}K+J8&<_U2s+q;w32DwbcQC#9JpKIlP~o&^s<_>L2V z?|eb{&NK_(0<-YFAGlsr?{en6_DN~Fn0USV3woyW<;K(zDp<{sTGa1r9w7q zMauEG_6ZLVi)sCkG89%TWFen*Wem}a z#lj@2>6G42Ja=ak9+(wlMGGRI`zKkJa+zu&pj%bg@m13H2DPfykk_3tCZ1Oo)j>6g zYGc|S3G)jv$!bTV8@O}#u88xK&pBTQvU*1?vikbR%YUv5<|U|ARtNa&>bPB9>(2nAn|iyo6?h{qRw zpFTpmQEXgNGExI$`|;(#rt{Ig^UdYJa*$y(iq9}wHzklJ)h%1z#EBJcjX;jki4f!% zoeZ(428M#hYl*p7r5YG^7d)Hd4(9L%)VFK~C8VN^F-uoYKl*`G1p9$xA9?idfb?Qj z`RfIt7>-O0hJ&JxhdcQ94!z#JcQsYPzIj2hIs2E>Jylq`jv5_cstOhoIp^yx)P(L` zMDkxTVcWM}HT z(6+u+imEZXUXiWbjV0C8Bt&{RKZuhw9^Il;-|>UDuw4|jfu2p3ghn|SYp4y164AcR 
zKgTMPa%N#kz~`tQ*5-5K;GOBy%rX~iew4_WrnKRz?ic^D#HJ8_-^?QO= zUa3Jj)2)X=M|Ntn9aEaed)8q2Tn-avzlDr7lrxR2>|5jQ8OyXaa~w~#bKa}iDphuh zHwZHp)mGVwR6OOA-In0WmUJbzP%ITZ!5{b*!m#80@>C#Qh z6-a40O2%R89LnumW&%w0;Iw5{=6D`Oo^4g<+A>p-(2h10`UBTX>PHE|JQ-VM_O_C$ z1wv!y(!7P|qnD8>fc){sFrL11>1wqG%Q|qwkn2l|gRYjjl0tB_%H57n4u~AOot8AC zRO*O%;ryI(gLG>o?{&D}61s$@Zp}`&(G%A{JPMnSpf+8?&sV5nb68B3t57X;AMQg% zwNUV&Wwp>!{CE)Xbl;DE&tZ)jR0}-EfxoNh5dYBOW?3y1=3b=&*r}B&gEsae zs}_*M@vX&C-6^n z;H&GS9%zC|A-IeSz5oz~;5=F_Gt4i`&#VwoM|P;F%+tSt0V%Z@6*;Nn#FwUN4~4+3 zv^IqxhLi(^Z2xa71Y-n+VBKdpnW9D{={U;|3c)Zx3xxm+juHRy#;OWIBP1bLsdoKN z+5jt!kb3u~!Zf|9aC)wTvD$TT@6fr`Dgr2V%@jco;LC%Kjxd!6MX;z2t9Br+s&!|u zLzViVI@AY^?QsaV;4nbg;Q(3_I~a>yquhy5?0O!H3GhiVsDNU@4pnPio5`V~*0qTo zDr#NR8A=?-1sqd-v1=bwlxMZp z1vw9^YLG(ri`=n{FE`p?Q$k=4&DH+C~&4k33r%nbk)TZ%826>izf^$t6+*Z zOT-*Amkq5Tikvyu_3U#oNbD|d8c}jk&zkvoGHfSK4jG$$SN7B=jGSB>a3|r zMkRSXS*H5!m;^d3%0V`@oK<+|qQ<-yt6QlX{l6=vvPKap^%&Rs1RzrCAz#~AcNOKW zL$#(1*vg8mPni|Te(hi8%2%i89l3giG@;C#|M~?V7Fms_?#-ac>NMJ2pH)?8A5^%+ zEN6?|RuHn!y2ER=*0mL~nrkX@Y!p=H=?jRH8cI}HsJGX;SWXr1tzK&-Qfm1CZWe}& zBB&2_URZU@yxNx5&tLF-`7qQLd6^-EeTFed&35B*TdT=ZQAm~Mm)r7G@5ap-UIw=@ z@FS25{E0sxYjow|GlCw;E;W~_!WodY!LU3PZS(2H7RZ~qHf-KDYh$_MFL_s2cizcm zAVH4=vT@Ky>3-%6nxV2YWJ9G%isr&XTGTLIXnszvoPwRRW=F`5jar{{TA&Z|v@Oyn- zPZ?IeiU@t-kxvqIu=kfrxL&ZMUp_K)^0D+)zIo}Zsivi`^7*B&^0D+4l6U&|w3uw= zmA6Wht>iTR;7@~}%UsfqxXNP?@fKqxkp&&sK-+F z>?LbfQJ1xCCbRZrR%+%lME%BMYIVMj^o{UjkKsWo1z09zg_w3~vxlO+yBAn29MyS1 z+LsNxHe5iv;*^29mBkG2Sbx=+n(>N$(n_>f#Qypi5~c_#UJ5Ik51j= z?5A9M|NLA!YXPKJ_@mYH*heQ&N-AzA3cpe zPUX_Sqs<470ULpxz!_jEFayX0#sDXQhuFUu?N;CgK3_mP0Vo6}0uQh)2kky!EN~Uy ze?mJP*bLmcl1rCA&ZVb5&7}_kBcJ8ck4G<}7mQg%7ht%K1E+y&04Xeg9W7)n8LD@Y zwM6Lh!gkrloj=wQge~b-i?UE?P`-x@I2tnMXvnamAtR553_cn%{%FW8Ktt{V8ge7h zkb8lK+zvG4j*yMGDQL)jK|^j08gh5gkQ;=C+#@vPHlZPR3JtkgXvqCSLv9%wa@Ww1 z8;6G6yC|k~o!FDvR4BIA?2?l`4~jV^=aAT%IeHrZJUb<}rhdoLvH4gcmaK2Q5g^MX zp8?4F(lfAZ>|GiDa13@ji46}Wov31b-ipsJB&txF{5Sz0tjULo5-!x1e87U$q4mgz zvH0LfKKz6~8je5OEU{+pToLbN${*tdrar|9V8LZ(I+>8Sq|^W3yGSU^Aa02*s?NmCulWuW3+^2c8-y- zx=9E|*)$ejyi-%~EgRo9PeUHe1zT3Ply=bN(X4I_upZb96acV!bvuCFI8xDCNmuHQ zEgM>ob_i9XN}&pi3>}oXht{X%p;(eH)Ddt3J^-A720%lg5ugS%fD6zVa0T3eCcuY) zJKzC$0v`chfH&X+d<^&ket;He3N!=!fdJqWAP{H{1Ofk&6K2Qx2-Ln}sq7$@w#-@T zMe5K?s;q1|0>d~G*aVCMqy?sxEw!&+hO3se`-aSQtnE2JT&4WknfQ~kkc=kYW%qv6 z-j!e3G8!veDxD%w_mCDV+#80gavIs>kt&E{xox_wT-GZ^l_wr4OaL{*x|X)H=$*#( zuOP8As~8e|G^R|IM_p2fxu|J9phWJzVAGCtI3r4%qj}rm=wMaTylwlsa}~ zoH}{;*7V@5?Qw#C96>?+nqGN)$bOzNfJ&1`Pw|;;8L(fTM-R)RbJ}grqXS<;#=HW~ z;VCYEniHJYDL_9WT-DaTkWBdLHu;9DzRGe5bC@4tWs2GI&cl*od6;U*lRa#^ak*P> z?{Q)mO@6$cDh33oHZX0n(ZIZJisAUmmW~jIr00%I2TD6dL*5t&|xZKI(cM^B|Sd zF^p5QC2d(B;cltBw-w9I_d@`DdjCXvA^*5(exiH$j3;N4ZhXdqetyu_Q(gS`>d;LNbT_en;$Lvc?B$pE+MDkHOJ{G`HV3lA7Z!G_6l_=NX$j7g_3>mU z204h`?fU4`zD-H`F4aBJ5JGuus!M+~ZW}$Qu9&lx1@EOMj|#DhD8_@d11JN21EfQo z8cUs@{xLONRXAm~F4#Hs;=M4{(q*fca$>q8KPUH%_a0iuIDMcCW?fXQW}PfK)Jf{t zol(>MUFlK$Nasl{H5!qMfrM7D$WlofwVO%D2jc6aZnkMFHUNRvL1 zn^o6cDvi_YdZh-bn`*_|K{Q0p<&JZ1Mo*o`&=vTf^G1b zaSmdK+aTk6qjQ%yuy;r2hB$yI>P*#~-R&SYKNM&0JBU{}nC|`)-2KazcYjD?xr4Zi zgP3bqcXI}<+zUD{9^;Y0>>d-AL9<)i?LSd*i_0@;{|r-48R)5(Wl#8ZSO(4gBFD2c zsgA=#^z>v*u41VTRfb1rlH#~T)^^sB@FPoY=uSjA#}0lOA*R87D zXd0YHB}FZ@VI5O-BOm)Lu_-?$i9RObhATFg(8_gCIyR2=D3Prt+D8d3V;lDiKT=|T zBom^_OXw5$E<9}Q`)JVQI;xIKvBmg`Ex^!Bv>ck? 
z1?7|LbeUX-`xWack8>0|JcEMtVrs5pnGN&zRG-#X_`u#x>aNLfDVOG2be$2j%!az8 z*;l^IW)wCH*TLaa74eE;>K3)kMrIY8Jg|I@P0$=0F0p=a!MHg#%6w>F>!#Vyv0>Iu z3z=iXK6HPk*}8j zw1e_bD3X8uWZ#H&evVJl7uHhy(+(}Yp5gv{=;J}c{VoqW7K%U&#bY~)8Z<;M)NV7~ zt_-Xo%FMRre&$%u(bhhlDc0$)It`!MG3!3Fzi#I$B~hwU+djM3+h@u#En|E9X6G%c z;b+%_oL2W{w|o|E>%Js|ZA8xvb#WfOe?2U>jeBUBqyw{YkDa@`cX@WW3JaOy59!8? zs)3{#U5`@vbfDxL_k^oPO5)WGUSW?_yOw=I4^S;^!rD7it-yFx%keHsJI3K#Te)N> z)4R}C?v5^)Rz1U2ejO*!kfM0Jsf80&fru0+^R`=syz<^Nedtxrv-e>iTKzFCb44*% z^rZecTxw4_u~e(Y=q#COI@eZ?@F`o-#l_*OIV&WAq^AzPsz59g?@oQIO2+B--u*2d z=3Nw#WBaelCJ}-N<8%xMy$?R>YD^mTMlnT_l~qs*khK&yQpn%7HFn{_<8W z7t2FilA}{iTlb*mXk6LXEylNw)3gj-*k+^1wGA8EtJk+kEeLGbOpzt#sUhpEuGnWE zOl|!cBZ&r_WjK*D_uvCcQSRj2#L1a!*XXCS+0b7a0xzC`gm$ByG1 zvd->a;9wFUK4%&HfvTfRSCxyGQ#&uQOV%c6Hk_-Yt)SX$e5-2HKG})s-P`_CwtPP~ zY(K}@Yl2J1xvby$^jgCQgMloJf$A_AbQ8)x6wRGv&80O$o zpSvRGlHQxbRfApNlYqdRUY*U&MSAc|J%`?v-SkLX(Y<0 z==u2?Lu=}Z=4gOM!cLprRC9E zoauJp6tHcr&vwT7Kv3fys>kXV#s2IMbDtOZSLmuZ+4 z#S5LwZ=utBxRGV#My|fo8L{1e{44Y>yQNWtMZZ<)D{KAKv@L@QCP5^g+ zuKPM2Ai>sIygFrDu9k8%H2S}QlhyYzFR{a`0% zOQFw7Yv)hfAdZbHqC@X;2#(aqa|b(_2RkEZuoF92nO5%7WRgo06VGFZH+lZssKHJ$ zNmH#_v8Mr*TAt+;l;y;I=NDWwZnm{@FT7bhci)=L?BDGZV5JDJ%YM5gLR0iKQqfN` z=}Q^iiTmA!+U?RwbAPt=t=T5lU4=m!R7qc%P(Itb3pnzc7S|GBrTh&ZvKZgY5hUCS@z2nsbkh3b1Nsn}|d!^+#$b5LkMzJjg02 z$coEpws|M5{OJ&!c8BbDs+fg`e0I``ohtTc`&OEryO-e9joQ{u#R8}DovIx>Ri+~) z;z;W&sdYQsPjYpRPwHbpRr#-+Jbk&QzU7l#qbIp?h5({C&V=JF`?N<1Gyb$s3GMvN z_woo;r>8%28yt35XphhxADMMWO$iQCNp~DnLQ`ojjF$4_4mZ-2P#NV`+V)mj&My#0 zkng||E_s(QS5A=wI8r!3>rO7*OWAuJpox{%zodMjRrEqD?s@azl5u&g@;)4hlTO>` zvD8}KLYFYPbM$1F2-TAB>S;=L=drlM?LVPTmFKaZ__C0TY|Na_b(@D9JkH_C|Xku1F_vGS> zV0mQ~Rkgf4@z?vhok`F8j0B-hfda9-rcCeB)WWmpRN5qTa#dWfzi2TF*;sD(#so(yEGDWi7g=| zs3|!gK<@m>>Dd67M1yzjJMo?-(Ya6z@{Bt2X*jRRPG=?5Z5Tu|6lWzYSGN>z(b)>A zn^8X4_QYV@TI4vD7}eaDa&W<=w{4hbQ+`GgeMZ793g3K~R&K`u?wzt9WJ|q@I!v<@ z6mcgGo2{Yp!}KG3JI3{IOr5glq_}b0EZ&-?!H22-_$%rIMW|BoFKEu2=_ zU~nf`3~tll`Q;PqbeUL(I~#>jsV|nFMZX7%r^?`CxS?%jrC4RdCV%@Sh6R6Xdg%%6 zPg9h4QD);%Mk@$Ezxn7{YiC+(xrwb{`^>kIUq;v0oaWn5=XAc>`8Mo^g->An%(t|C z@I`p&iicM*e!M*KhZDNvpT+^!+FcVp>uAODL1HbX+&s;uy;#e6v5hb&j<->)bw0fl!_D?}=X%%YDD7sk7Vr5Nfz)Z2XLgD^l5(;39seIC*4C!N zqL((fu)ZCjx~`Q_yDvR-`cp<7%sLyen`8U6sSmYvH)pyj2frA3mWoqc2;dsES9Qdn zX5h4raZ@~db1Fyq^YIyVt(6`m~D->G~L7rlq$w9$}g8diA?67BLS`y2;qm8K*> z!?GWqhcJ73DEDcF)GxWOVeK_06;&5iMWA^>plR4H8kSqwk-W02x#<$DaRJxWp(wnw zrwd%)_nqZBqaWo++=mgT?IGn)5w+v%EB}g6{T%Z>w_fHl3CiiDgbz)l;hk}x(PBrF zP^qdN7!?m@!s)_6P34QxZ)F&M`;@mwZ2HcB|b?y>{VM(&4hV z^vd1-lIXg3Fpw)J#!!dJxqkG@Jwo;Pmyfbygidj4=C>(pr)M`$qc$UW1#386u4^|u zkPRiu6FA}X{6PGQvur4FKHZv&Q1u@98B}RqpOAv-ohy0|VYGG~v5$+ko|DzT#4fao zSTkd_sKxhzMBb$Fzlg86VdH<*CgjqiRQ6OOACqbZeL`F6Z*Gw?z*WMg9@A=0>~^gXqbGve66Limlvgez%IW9&cwP{gX?A)x4bClh zm2`2Ha6k4S_xUtB%sp*yt)>1VWB1p=e&MRe_r;nFS(&14MC788J0=A}cjbyAnnc;FA{gHObmbz^8!*iJ}t+*Zct+lu?48*LjCdO1bRiniQ1rDW9__n%hPbxr59A_-AmNi7?i#B=vvV)QCSDx>Q(+>lSv=e zs0Q)~Tv)%KqWHvOK_Q5){^B+>HQEq-1D2ab^C1 zl)|YN;vv`^@MGO^J8Ah7oa}6;94G2$Ews@IuI%h1T{SyhQdRn(Ny!zq%+B)NsuR0w z)kk-$(Di(b!k9jrf3#BGxB{ZSVn5N9xqZdw6VN&Li7(S{+?nrNn#R|peR4gTFjsQj zs;Zw&p?}cw>$LH%Tn7V0X$q&ze3DDQer!LpKJ(Ao|}gr`1HR=dtHpO#s6m=PqG zvcrq^l%J_H=}aBtD_wl%O=_lkM_meRDNj#Mjm7$z(BZs8)OfYauLcHP4djNzZ{9=8 z593%R-g}^8?4h0a2El*N?(fbiKiQo_6x&WMsHZMNH9>o5_U!XVnmv?V?~(P&sa^`I z7jEjy51FUAZhLUxO_j8oJ<4x9+1pQZeWE+KIe_{>KMC{SPP^RS;A+3Ym1~x` zc{|OFdd3~@YZ%3hc;++8N*340DdPHenk|2ttJzL}+hvp$``gok^6m7b?PhQCC>(0F z#i6`{w^Kc$Fws-TZ9C;zzKssrMsv54HgBUH9%2Ug-snY&<6kj%NBe{%vg(DiTn4w< z{XNVl?tV{Z^rG(DXjy99z0mUGUyVEd6@5&>^`aKJZNni}(j57$RZx5jovr;L4_4Q! 