Can't create a session (local model) #979
Comments
I've created a small snippet to check that my ONNX model is fine:
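(The snippet itself was not captured here; a minimal check of that kind, assuming onnxruntime-node and a placeholder model path, might look like the following.)

// Hypothetical sketch, not the original snippet: verify the ONNX file
// loads into an inference session at all. Package and path are assumptions.
import * as ort from 'onnxruntime-node';

async function checkModel() {
  const session = await ort.InferenceSession.create('./models/u2netp/onnx/model.onnx');
  // If create() resolves, the file is a valid model the runtime can load.
  console.log('inputs:', session.inputNames);
  console.log('outputs:', session.outputNames);
}

checkModel();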
So it works with that snippet.
Hey, I've done some work with custom models. The most important thing to note when using pretrained models is to ensure the folder structure is correct.
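(As a sketch, not from the original comment: with @xenova/transformers v2, model names are resolved relative to env.localModelPath, which defaults to /models/, so a local model folder generally looks like this.)

models/
  u2netp/
    config.json
    preprocessor_config.json
    onnx/
      model.onnx            // loaded when quantized: false
      model_quantized.onnx  // loaded by default (quantized: true)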
It's also possible to have a quantized version (model_quantized.onnx) as well. Then, to reference that model, I have the following:

import { env, AutoModel, AutoProcessor, RawImage } from '@xenova/transformers';
// Force transformers to only look locally and not make any fetch requests.
env.allowLocalModels = true;
env.allowRemoteModels = false;
async function main() {
// Create the processor.
// The name here should match the name of the folder in `models`.
const processor = await AutoProcessor.from_pretrained('u2netp')
.catch(error => new Error(error));
if (processor instanceof Error) {
console.log(processor.message);
return;
}
// U2Net is an image-based model; if yours is not, you might skip this step.
const url = 'https://example.com/test.png';
const image = await RawImage.fromURL(url)
.catch(error => new Error(error));
if (image instanceof Error) {
console.error(image.message);
return;
}
// Preprocess the image.
const processed = await processor(image);
// Create the model; again, the name should match the name of the folder.
// I am passing quantized: false because I do not have a
// `model_quantized.onnx` within the folder.
const model = await AutoModel.from_pretrained('u2netp', {
quantized: false,
});
// Get the outputs of the model.
// The processor returns an object (typically { pixel_values: Tensor }),
// so pass the tensor itself under the model's input name.
const outputs = await model({ input: processed.pixel_values });
}
main();
Thanks @BritishWerewolf, I have the right structure now, but I still get this error. I think it's caused by the way the model is converted. I also get this warning before the error:
I've tried changing the opset value as well when exporting my model to ONNX (from GGUF) with the optimum-cli, but couldn't find a way to make it work. So maybe the problem comes from the way I generate my original GGUF model. I'm using unsloth to train the model and export it to GGUF.
It looks like you have everything nearly there, but your ONNX model is using an under-development opset. To be clear, are you saying you did something like this?

optimum-cli export onnx --model path_to_gguf_model --output path_to_output_directory --opset 3

I think setting it to opset 3 is important. Other than that, I have not worked with GGUF, so I'm not entirely sure how to help.
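(An aside, not from the original exchange: to confirm which opset an exported file actually declares, the Python onnx package, which optimum already depends on, can print it; model.onnx below is a placeholder path.)

python -c "import onnx; print(onnx.load('model.onnx').opset_import)"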
I used a higher opset value; I came to the conclusion I should use it after reading https://onnxruntime.ai/docs/reference/compatibility.html#onnx-opset-support. I've also tried with 3 because of the warning, but in that case I got:
I switched to …. I'm also getting the same error if I try to use the onnxruntime-web package directly.
What are the specifics of the model? Looking at the onnxruntime repository, the following issue was created:

Does any of that help?
Thanks @BritishWerewolf, I'm going to try to disable quantization.
Same without quantization in the original GGUF model.
Can you help me to understand GGUF? |
I'm using unsloth to train my model and export it to GGUF. Here is the code I use:
System Info
transformers.js 2.17.2
Environment/Platform
Description
I have my model available under the /models/tokenizer and /models/onnx/onnx paths. I'm loading the tokenizer with one from_pretrained call and the model with another (a sketch of both calls follows below). I have my model under the two paths because it always adds /onnx a second time. I also had to set quantized to false, because otherwise it was adding a suffix to the file name. I found it very difficult to work out how to get it to load the model from the right path.
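(The actual calls were not captured in the issue; given the paths above, they were presumably along these lines. The specific Auto classes are assumptions.)

import { env, AutoTokenizer, AutoModel } from '@xenova/transformers';

env.allowLocalModels = true;
env.allowRemoteModels = false;

// Resolves to /models/tokenizer/ under the default env.localModelPath.
const tokenizer = await AutoTokenizer.from_pretrained('tokenizer');

// Resolves to /models/onnx/, and the loader then appends onnx/model.onnx,
// which is why the file ends up being requested from /models/onnx/onnx/.
const model = await AutoModel.from_pretrained('onnx', { quantized: false });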
Anyway, it's now getting the tokenizer and model files correctly, but then I get this error in the browser console (the same in Chrome and Firefox).
Reproduction
I can't provide full reproduction steps because I'm using a local model, but the code is: