gguf : add special tokens metadata for FIM/Infill #6689
Conversation
This commit adds special token metadata for Fill-In-the-Middle (FIM)/Infill to the GGUF model. The motivation for this is that there is currently support for CodeLlama, but other models such as CodeGemma now exist, and the different models use different token ids for the special tokens; this commit makes it possible to support multiple models. Signed-off-by: Daniel Bevenius <[email protected]>
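Roughly, the idea is that each model's FIM token ids are read from GGUF key/value metadata instead of being hard-coded to CodeLlama's values. Below is a minimal, self-contained C++ sketch of that lookup pattern; the key names and token ids are illustrative assumptions, and a `std::map` stands in for the real GGUF KV store rather than llama.cpp's actual loader API.

```cpp
// Sketch: per-model FIM special-token ids from metadata, with hard-coded
// CodeLlama-style ids only as a fallback. Key names/ids are assumptions.
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

using llama_token = int32_t;

static llama_token get_token_id(const std::map<std::string, llama_token> & kv,
                                const std::string & key, llama_token fallback) {
    auto it = kv.find(key);
    return it != kv.end() ? it->second : fallback;
}

int main() {
    // pretend this was loaded from a CodeGemma GGUF file (ids are made up)
    const std::map<std::string, llama_token> kv = {
        { "tokenizer.ggml.prefix_token_id", 67 },
        { "tokenizer.ggml.suffix_token_id", 69 },
        { "tokenizer.ggml.middle_token_id", 68 },
    };
    // CodeLlama's ids act as defaults when a model omits the keys
    const llama_token fim_pre = get_token_id(kv, "tokenizer.ggml.prefix_token_id", 32007);
    const llama_token fim_suf = get_token_id(kv, "tokenizer.ggml.suffix_token_id", 32008);
    const llama_token fim_mid = get_token_id(kv, "tokenizer.ggml.middle_token_id", 32009);
    std::printf("FIM ids: pre=%d suf=%d mid=%d\n", fim_pre, fim_suf, fim_mid);
    return 0;
}
```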
This commit breaks model compatibility. I've been experimenting with a custom model and narrowed it down with `git log --pretty --oneline 132f5579..HEAD`:
dbceec87 (HEAD -> master, origin/master, origin/HEAD) llama : add StableLM2 12B (#6635)
f4dea7da llama : add qwen2moe (#6074)
8a56075b gritlm : add --outdir option to hf.sh script (#6699)
58227ffd perplexity : require positive --ctx-size arg (#6695)
4fbd8098 (infill-metadata) gguf : add special tokens metadata for FIM/Infill (#6689)
7593639c (stable) `main`: add --json-schema / -j flag (#6659)
./main -m models/shakespeare/ggml-shakespeare-256x16-f32-LATEST.gguf --color -e -s 1337 -c 4096 -n 256 --n-gpu-layers 16 -p "When forty winters shall besiege thy brow,"
Log start
main: build = 2680 (4fbd8098)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed = 1337
llama_model_loader: loaded meta data with 20 key-value pairs and 147 tensors from models/shakespeare/ggml-shakespeare-256x16-f32-LATEST.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.file_type u32 = 0
llama_model_loader: - kv 2: llama.context_length u32 = 64
llama_model_loader: - kv 3: llama.embedding_length u32 = 256
llama_model_loader: - kv 4: llama.feed_forward_length u32 = 768
llama_model_loader: - kv 5: llama.attention.head_count u32 = 8
llama_model_loader: - kv 6: llama.block_count u32 = 16
llama_model_loader: - kv 7: llama.rope.dimension_count u32 = 32
llama_model_loader: - kv 8: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 9: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 10: llama.rope.scale_linear f32 = 1.000000
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 13: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 18: tokenizer.ggml.seperator_token_id u32 = 4294967295
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 4294967295
llama_model_loader: - type f32: 147 tensors
llama_model_load: error loading model: error loading model vocabulary: key not found in model: general.name
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/shakespeare/ggml-shakespeare-256x16-f32-LATEST.gguf'
main: error: unable to load model

I think this is due to the way the vocabulary loading was modified, which has always supported the llama architecture. I'm inspecting the vocab with the Python dump script:

python gguf-py/scripts/gguf-dump.py models/ggml-vocab-mistral.gguf
* Loading: models/ggml-vocab-mistral.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 25 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 0
3: UINT64 | 1 | GGUF.kv_count = 22
4: STRING | 1 | general.architecture = 'llama'
5: STRING | 1 | general.name = 'mistralai'
6: UINT32 | 1 | llama.vocab_size = 32000
7: UINT32 | 1 | llama.context_length = 32768
8: UINT32 | 1 | llama.embedding_length = 4096
9: UINT32 | 1 | llama.block_count = 32
10: UINT32 | 1 | llama.feed_forward_length = 14336
11: UINT32 | 1 | llama.rope.dimension_count = 128
12: UINT32 | 1 | llama.attention.head_count = 32
13: UINT32 | 1 | llama.attention.head_count_kv = 8
14: FLOAT32 | 1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
15: FLOAT32 | 1 | llama.rope.freq_base = 1000000.0
16: STRING | 1 | tokenizer.ggml.model = 'llama'
17: [STRING] | 32000 | tokenizer.ggml.tokens
18: [FLOAT32] | 32000 | tokenizer.ggml.scores
19: [INT32] | 32000 | tokenizer.ggml.token_type
20: UINT32 | 1 | tokenizer.ggml.bos_token_id = 1
21: UINT32 | 1 | tokenizer.ggml.eos_token_id = 2
22: UINT32 | 1 | tokenizer.ggml.unknown_token_id = 0
23: BOOL | 1 | tokenizer.ggml.add_bos_token = True
24: BOOL | 1 | tokenizer.ggml.add_eos_token = False
25: STRING | 1 | tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['"
* Dumping 0 tensor(s)

This commit changed the special vocabulary ids. I haven't dug in too deep yet; still looking into it.
// CodeGemma (LLM_ARCH_GEMMA). This can potentially be removed once
// new versions of these models have been published.
std::string gen_name;
ml.get_key(LLM_KV_GENERAL_NAME, gen_name);
Yeah, it's lines 4083 - 4106 that are causing the issue.
examples/train-text-from-scratch/train-text-from-scratch.cpp doesn't rely on or use LLM_KV_GENERAL_NAME, which is why I'm able to train but not run inference. This most likely has other unintended side effects due to the implementation.
Does #6709 fix the issue?
Yeah, I think so.
ml.get_key(LLM_KV_GENERAL_NAME, gen_name, false);
It seems like setting the required parameter to false did the trick.
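For context, the pattern here is a metadata lookup whose last parameter controls whether a missing key is fatal. A minimal sketch of that behavior, with illustrative names and a plain map standing in for llama.cpp's actual model loader:

```cpp
// Sketch: an optional-vs-required KV lookup. When required == true a missing
// key throws (matching the "key not found in model" failure above); when
// false it simply returns and the caller keeps its default value.
#include <map>
#include <stdexcept>
#include <string>

static bool get_key(const std::map<std::string, std::string> & kv,
                    const std::string & key, std::string & out,
                    bool required = true) {
    auto it = kv.find(key);
    if (it == kv.end()) {
        if (required) {
            throw std::runtime_error("key not found in model: " + key);
        }
        return false; // optional key is absent: not an error
    }
    out = it->second;
    return true;
}

int main() {
    std::map<std::string, std::string> kv; // no general.name, like the Shakespeare model
    std::string gen_name = "unknown";
    get_key(kv, "general.name", gen_name, /*required=*/false); // does not throw
    return 0;
}
```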
@teleprint-me Sorry about causing this and wasting your time. And thanks @ggerganov for fixing my mistake!
PR #6709 fixed it. I'm able to run the latest code with this change. I tested another custom model I've been tinkering with and it's working again. Might be a good idea to add "general.name" to …
> This commit adds special token metadata for Fill-In-the-Middle (FIM)/Infill to the GGUF model. The motivation for this is that there is currently support for CodeLlama, but other models such as CodeGemma now exist, and the different models use different token ids for the special tokens; this commit makes it possible to support multiple models.
How does llama.cpp know the FIM prompt template for each model? Does it just assume the template …
It does seem like a hack not to define a prompt template for FIM so that it can instead be defined in a modelfile. There is a PR that does this: #5207.
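For reference, CodeLlama-style infill arranges the prompt as `<PRE> {prefix} <SUF>{suffix} <MID>`, with the model generating the missing middle after the `<MID>` token; this layout is what the metadata's token ids feed into. Below is a minimal sketch of assembling such a prompt from special-token ids; the ids and the one-token-per-byte "tokenizer" are stand-ins, not llama.cpp's API.

```cpp
// Sketch: building a prefix-suffix-middle (PSM) infill prompt from FIM
// special-token ids. A real tokenizer is replaced by one token per byte.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

using llama_token = int32_t;

int main() {
    const llama_token fim_pre = 32007, fim_suf = 32008, fim_mid = 32009; // illustrative ids

    const std::string prefix = "int add(int a, int b) {\n    return ";
    const std::string suffix = ";\n}";

    std::vector<llama_token> tokens;
    tokens.push_back(fim_pre);
    for (unsigned char c : prefix) tokens.push_back((llama_token) c); // stand-in tokenizer
    tokens.push_back(fim_suf);
    for (unsigned char c : suffix) tokens.push_back((llama_token) c);
    tokens.push_back(fim_mid); // generation of the missing middle starts here

    std::cout << "infill prompt has " << tokens.size() << " tokens\n";
    return 0;
}
```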