
Loading GGUF support #2 (Draft)

LysandreJik wants to merge 253 commits into main
Conversation

LysandreJik (Owner):

WIP

@LysandreJik (Owner, Author) left a comment:

Ok, good first PR! Let's clean it up a bit. I want to take a look at the changes in the from_pretrained method to tidy things up, as it's currently making changes in several places.

Also, I wonder if we can't change the loading methods to return only the metadata, and not the tensors, in some situations. Since that read is done sequentially, and the config and tokenizer only need metadata, we could save a lot of time by not requiring the tensors to load.
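A minimal sketch of what that could look like, assuming the GGUF v2+ header layout (magic, version, tensor count, metadata key/value count) and reusing the _gguf_read_value helper quoted further down; the function name load_gguf_metadata is hypothetical, not the PR's actual API:

import struct

def load_gguf_metadata(path):
    # Read only the GGUF header and metadata key/values; stop before tensor data.
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError(f"{path} is not a valid GGUF file")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(4 + 8 + 8))
        metadata = {}
        for _ in range(n_kv):
            key = _gguf_read_value(f, DATA_TYPES["string"])
            value_type = struct.unpack("<I", f.read(4))[0]
            metadata[key] = _gguf_read_value(f, value_type)
        # Tensor infos and tensor data follow; a config/tokenizer-only code path
        # can return here without ever touching them.
        return metadata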

Comment on lines 1402 to 1405
class GGUFTokenizer:
    def __init__(self, dict_):
        # Expose every parsed GGUF tokenizer field as an attribute.
        for k, v in dict_.items():
            setattr(self, k, v)
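For context, this is just a thin namespace wrapper over the parsed tokenizer metadata: each dict key becomes an attribute. A minimal usage sketch (the field names here are illustrative, not necessarily the PR's):

tok = GGUFTokenizer({"tokens": ["<unk>", "<s>", "</s>"], "scores": [0.0, -1.0, -1.0]})
print(tok.tokens[1])   # "<s>"
print(tok.scores[2])   # -1.0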
LysandreJik (Owner, Author):

Need to rename that/make it clearer

LysandreJik (Owner, Author):

These modifications should live under integrations as well

Comment on lines 1410 to 1411
# requires_backends(self, "gguf")
# super().__init__()
LysandreJik (Owner, Author):

Need to make that better as well
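A hedged guess at what the cleaned-up version might look like, assuming the standard transformers requires_backends utility (which raises an informative ImportError when the backend is missing); the exact super().__init__ arguments depend on the final class hierarchy:

def __init__(self, *args, **kwargs):
    # Fail early with a clear error message if `gguf` is not installed.
    requires_backends(self, "gguf")
    super().__init__(*args, **kwargs)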

Comment on lines 176 to 201
def _gguf_read_value(f, data_type):
    if data_type == DATA_TYPES["string"]:
        length = struct.unpack("<Q", f.read(8))[0]
        return f.read(length).decode("utf-8")

    elif data_type == DATA_TYPES["uint32"]:
        return struct.unpack("<I", f.read(4))[0]

    elif data_type == DATA_TYPES["uint64"]:
        return struct.unpack("<Q", f.read(8))[0]

    elif data_type == DATA_TYPES["int32"]:
        return struct.unpack("<i", f.read(4))[0]

    elif data_type == DATA_TYPES["float32"]:
        return struct.unpack("<f", f.read(4))[0]

    elif data_type == DATA_TYPES["array"]:
        data_type, count = struct.unpack("<IQ", f.read(4 + 8))
        return [_gguf_read_value(f, data_type) for _ in range(count)]

    elif data_type == DATA_TYPES["bool"]:
        # This should correspond to `GGUF_METADATA_VALUE_TYPE_BOOL`:
        # 1-byte value where 0 is false and 1 is true.
        return struct.unpack("<b", f.read(1))[0]

    else:
        raise NotImplementedError(f"Data type {data_type} not implemented")
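For reference, the DATA_TYPES mapping this function indexes into is not shown in the diff; a reconstruction following the gguf_metadata_value_type enum from the GGUF specification would look like this (the int64/float64 entries are listed for completeness even though the reader above does not handle them yet):

DATA_TYPES = {
    "uint8": 0,
    "int8": 1,
    "uint16": 2,
    "int16": 3,
    "uint32": 4,
    "int32": 5,
    "float32": 6,
    "bool": 7,
    "string": 8,
    "array": 9,
    "uint64": 10,
    "int64": 11,
    "float64": 12,
}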
LysandreJik (Owner, Author):

(younes) IMO this and the method below are doing basically the same thing as _gguf_parse_value and load_gguf_checkpoint_in_pytorch_model.

We should clean that up

Comment on lines 3141 to 3172
from .modeling_gguf_pytorch_utils import load_and_convert_gguf_config

if not is_gguf_available():
    raise ValueError(
        "You need to have `gguf` installed in order to convert GGUF weights. `pip install gguf`"
    )

# Case 1: the GGUF file is present locally
if os.path.isfile(from_gguf):
    gguf_path = from_gguf
# Case 2: the GGUF path is a location on the Hub
# Load from URL or cache if already cached
else:
    cached_file_kwargs = {
        "cache_dir": cache_dir,
        "force_download": force_download,
        "proxies": proxies,
        "resume_download": resume_download,
        "local_files_only": local_files_only,
        "token": token,
        "user_agent": user_agent,
        "revision": revision,
        "subfolder": subfolder,
        "_raise_exceptions_for_gated_repo": False,
        "_raise_exceptions_for_missing_entries": False,
        "_commit_hash": commit_hash,
    }

    gguf_path = cached_file(pretrained_model_name_or_path, from_gguf, **cached_file_kwargs)

config = load_and_convert_gguf_config(gguf_path)
model_kwargs = kwargs
LysandreJik (Owner, Author):

I would likely move all of that into a separate method, so as not to add too much code to the already bloated from_pretrained.
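A minimal sketch of that extraction (the helper name _resolve_gguf_file is hypothetical, not the PR's final code):

def _resolve_gguf_file(pretrained_model_name_or_path, from_gguf, **cached_file_kwargs):
    # Return a local path to the GGUF file, downloading from the Hub if needed.
    if not is_gguf_available():
        raise ValueError(
            "You need to have `gguf` installed in order to convert GGUF weights. `pip install gguf`"
        )
    if os.path.isfile(from_gguf):
        return from_gguf
    return cached_file(pretrained_model_name_or_path, from_gguf, **cached_file_kwargs)

from_pretrained would then shrink to two lines: gguf_path = _resolve_gguf_file(...) followed by config = load_and_convert_gguf_config(gguf_path).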

Collaborator:

Makes sense! Done!

Comment on lines 3196 to 3224
if from_gguf is not None and hf_quantizer is not None:
    raise ValueError(
        "You cannot combine quantization and loading a model from a GGUF file. Make sure you did not pass a "
        "`quantization_config` and that you are not loading a quantized model from the Hub."
    )
LysandreJik (Owner, Author):

Once the file is loaded into a PyTorch state dict, quantization cannot be applied?

LysandreJik (Owner, Author):

(happy to not support that for now, indeed)

Collaborator:

Might be complicated, as we would only support the quantization schemes that do not require data calibration, and it would require many patches and if/else checks everywhere :/

LysandreJik (Owner, Author):

yuck

src/transformers/models/auto/auto_factory.py (outdated; resolved)
src/transformers/models/cohere/modeling_cohere.py (outdated; resolved)
@LysandreJik LysandreJik changed the base branch from main to rename_ex_file April 19, 2024 15:41
@LysandreJik LysandreJik changed the base branch from rename_ex_file to main April 19, 2024 15:41
hiyouga and others added 3 commits April 19, 2024 17:45

…ned (huggingface#30299): updates to modeling_utils.py and test_modeling_utils.py
TF port of SwiftFormer (squashed commit messages omitted: layer-by-layer conversion to TFSwiftFormer, tests, and doc updates)
Add resources (squashed commit messages omitted)
@99991 commented Apr 22, 2024:

I'm happy to see my code live in 🤗 transformers!

I added support for Q2_K, Q3_K and Q5_K yesterday. Feel free to copy as well.

99991/pygguf@a417edb
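For readers unfamiliar with the GGML quantization formats: the K-quants above are fairly involved, but the core idea is easiest to see on Q8_0, the simplest scheme, where each block of 32 weights stores one fp16 scale followed by 32 int8 values (34 bytes per block). A rough numpy sketch of the dequantization, not the pygguf code itself:

import numpy as np

def dequantize_q8_0(data: bytes, n_elements: int) -> np.ndarray:
    # GGML Q8_0 blocks: 2 bytes (fp16 scale) + 32 bytes (int8 quants) = 34 bytes.
    n_blocks = n_elements // 32
    raw = np.frombuffer(data, dtype=np.uint8).reshape(n_blocks, 34)
    scales = raw[:, :2].copy().view(np.float16).astype(np.float32)  # (n_blocks, 1)
    quants = raw[:, 2:].copy().view(np.int8).astype(np.float32)     # (n_blocks, 32)
    return (scales * quants).reshape(n_elements)                    # w = d * q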

younesbelkada and others added 22 commits April 22, 2024 11:32

Squashed commit messages omitted; among other things, the commits cover: doc updates (llava_next.md, seggpt.md), interpolate_pos_encoding support for vitmatte/vivit/beit/blip/data2vec, a warning for negative pad tokens, an FSDP config for CPU-RAM-efficient loading (huggingface#30002), cache-position fixes, text generation pipeline docstring updates, stop-string support in generate(), DETA tied-weight fixes, SDPA/FlashAttention-2 support for the wav2vec2 model family, and EETQ quantizer support.
younesbelkada and others added 30 commits May 13, 2024 12:08

Squashed commit messages omitted; among other things, the commits cover: interpolated position encodings for the BLIP family, Falcon 11B support, ms_deform_attn kernels for GroundingDino, custom 4D attention-mask fixes, multi-device generation fixes, pipeline device handling, a TF port of IDEFICS, greedy assistant decoding (huggingface#30778), CI notification-service updates (huggingface#30699), a model-deprecation utility, a ROCm 6.0.2 / MI300 CI update, cache standardization in idefics2, and a watermarking logits processor with detector.