Could it be possible to run Pixtral-12b in the Comfy UI ? #4899
-
https://huggingface.co/mistral-community/pixtral-12b-240910 More details here: https://www.youtube.com/watch?v=PfzPfB3esG4 Is it technically possible for ComfyUI to support it (natively or via a custom node)? Could it give better images, or does it have other capabilities beyond Flux?
Replies: 3 comments 4 replies
-
Yes, similar models like Qwen2-VL *) and MiniCPM already have custom nodes. I don't think Pixtral is revolutionary in any way; it's just another late-fusion multimodal model that can interpret images but only output text. Early-fusion models that can understand image and text and also output both, such as Meta's Chameleon and Transfusion, are what's really exciting imo. *) https://github.com/IuvenisSapiens/ComfyUI_Qwen2-VL-Instruct
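To make the late-fusion point concrete, here's a minimal, purely illustrative sketch (not Pixtral's actual code; all class and method names are made up): a vision encoder produces embeddings, an adapter projects them into the text decoder's space, and the decoder consumes image tokens alongside text tokens but can only ever emit text. An early-fusion model like Chameleon would instead share one token space and decode image tokens too.

```python
# Toy illustration of a late-fusion vision-language model.
# Hypothetical names; real models use a ViT encoder and a transformer decoder.
from dataclasses import dataclass
from typing import List


@dataclass
class LateFusionVLM:
    hidden_size: int = 8

    def encode_image(self, pixels: List[float]) -> List[float]:
        # Stand-in for a vision encoder: mean-pool pixels into one embedding.
        mean = sum(pixels) / len(pixels)
        return [mean] * self.hidden_size

    def project(self, image_emb: List[float]) -> List[float]:
        # Adapter/projector mapping image embeddings into the decoder's space.
        return [2.0 * x for x in image_emb]

    def generate(self, image_emb: List[float], prompt: str) -> str:
        # The decoder sees [projected image tokens] + [text tokens],
        # but its output vocabulary is text-only -- the late-fusion limit.
        fused = self.project(image_emb)
        return f"caption({prompt}, signal={fused[0]:.1f})"


model = LateFusionVLM()
emb = model.encode_image([0.2, 0.4, 0.6])
print(model.generate(emb, "describe"))  # text out, never pixels
```

The takeaway for ComfyUI workflows: a node wrapping such a model can feed image descriptions into prompt conditioning, but it cannot replace an image-generation model like Flux.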
-
Yes: https://github.com/ShmuelRonen/ComfyUI_pixtral_vision
-
Is there a node that runs it locally instead of via an API?
Yes:
https://github.com/ShmuelRonen/ComfyUI_pixtral_vision