We currently use the code in predict_hf.py, which loads the full model and quantizes it at load time. This takes about 6 minutes.
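For reference, load-time 4-bit quantization with Hugging Face transformers + bitsandbytes typically looks like the sketch below. The contents of predict_hf.py are not shown in this issue, so the model path and quantization settings here are assumptions, not the actual code:

```python
# Hedged sketch of load-time 4-bit quantization; predict_hf.py is not shown
# in this issue, so the model path and settings below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig


def load_quantized(model_path: str = "./base_models/llama-2-7b-chat-hf"):
    """Load the full-precision checkpoint and quantize it to 4-bit on the fly.

    Reading the fp16 weights from disk and quantizing them at load time is
    the slow step (about 6 minutes in our workspace).
    """
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # quantize weights to 4-bit
        bnb_4bit_quant_type="nf4",              # assumed quantization type
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=bnb_config,
        device_map="auto",
    )
    return model, tokenizer
```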
If we save a copy of the already-quantized model under a new name, e.g. llama-2-7b-chat-hf-4bit, the loading time should decrease significantly.
The easiest way to do this is probably a notebook that runs in the workspace, along these lines:
```python
from azureml.core import Workspace, Datastore

new_name = "llama-2-7b-chat-hf-4bit"
model = ...  # load and quantize the model as in the predict_hf.py code above

# save it locally
model.save_pretrained(f"./models/{new_name}")

# double-check that the quantized model can be loaded too ...

# If all goes well, upload to blob storage:
workspace = Workspace.from_config()
ds = workspace.get_default_datastore()
ds.upload(f"./models/{new_name}", f"./base_models/{new_name}", show_progress=True, overwrite=True)

# verify the model can be loaded from blob storage by submitting a new
# prediction job with the new model. See README.md
```
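The "double check" step could look like the reload sketch below. This assumes a transformers/bitsandbytes version recent enough to serialize 4-bit weights with save_pretrained; the path mirrors the save step above:

```python
# Hedged sketch: reload the locally saved 4-bit checkpoint before uploading.
from transformers import AutoModelForCausalLM


def load_saved_4bit(new_name: str = "llama-2-7b-chat-hf-4bit"):
    # The quantization settings are stored in the checkpoint's config.json,
    # so no BitsAndBytesConfig should be needed on reload. This relies on a
    # transformers/bitsandbytes version that supports 4-bit serialization.
    return AutoModelForCausalLM.from_pretrained(
        f"./models/{new_name}",
        device_map="auto",
    )
```

Reloading pre-quantized weights skips the read-fp16-then-quantize step, which is where the expected speedup comes from.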
For context, we currently load the model from blob storage in the ML workspace: https://autoraml3241530052.blob.core.windows.net/azureml-blobstore-b7ef477b-ca4a-44e3-a029-0e0542bdcd47/base_models/llama-2-7b-chat-hf/