About

This repository is boilerplate for pushing a mask-filling model to the Hugging Face Model Hub.

Checklist

git-lfs is installed
The tokenizer directory contains all the required files: added_tokens.json, special_tokens_map.json, tokenizer_config.json, vocab.txt and tokenizer.json
tokenizer_config.json has no tokenizer_file field (when present, it sometimes points to a local path under ~/.cache)
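
The tokenizer items in the checklist above can be verified before uploading. The following is a minimal sketch, not part of this repository: the function names `git_lfs_installed` and `check_tokenizer_dir` are hypothetical, and the required file names are copied from the checklist.

```python
import json
import shutil
from pathlib import Path

# File names taken from the checklist above.
REQUIRED_FILES = [
    "added_tokens.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
    "tokenizer.json",
]


def git_lfs_installed():
    """Return True if a git-lfs executable is found on PATH."""
    return shutil.which("git-lfs") is not None


def check_tokenizer_dir(tokenizer_dir):
    """Return a list of problems found; an empty list means the checks passed."""
    root = Path(tokenizer_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    config_path = root / "tokenizer_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # The checklist asks for this field to be absent.
        if "tokenizer_file" in config:
            problems.append(
                f"tokenizer_config.json has a tokenizer_file field: {config['tokenizer_file']}"
            )
    return problems
```

For example, `check_tokenizer_dir("./ckpt/math-tokenizer")` returns an empty list when that directory passes both tokenizer checks.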

Upload

Put the model checkpoints (*.bin) and, optionally, the log files (events.out.*) in the ./ckpt directory.
Add a Git remote named hgf that points to your Hugging Face repo, for example: git remote add hgf [email protected]:approach0/mathy-vicuna-13B-FFT
Run the upload2hgf.sh script.

Test the MLM task (an example)

pip install pya0 # for math token preprocessing
# testing local checkpoints:
python test.py ./ckpt/math-tokenizer ./ckpt/2-2-0/encoder.ckpt
# testing Model Hub checkpoints:
python test.py approach0/coco-mae-220 approach0/coco-mae-220

Note
Modify the examples in test.txt to experiment with the model. The test file is tab-separated: the first column lists additional positions to mask in the sentence to its right (useful for masking tokens inside math markup), and the sentence itself follows the tab. A zero in the first column means no additional mask positions.
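
To illustrate the test.txt layout described above, here is a rough sketch of how one line could be split into its two columns. It is not taken from test.py: the function name `parse_test_line` is hypothetical, and the sketch assumes that multiple positions are written as a comma-separated list, which the note above does not specify.

```python
def parse_test_line(line):
    """Split one tab-separated line into (extra_mask_positions, sentence)."""
    first, _, sentence = line.rstrip("\n").partition("\t")
    # Assumption: several positions are given as comma-separated integers.
    positions = [int(p) for p in first.split(",") if p.strip()]
    # A lone zero means no additional mask positions.
    if positions == [0]:
        positions = []
    return positions, sentence
```

Under these assumptions, a line whose first column is `0` yields an empty position list, and the sentence is everything after the first tab.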

approach0/azbert