Question about input of Mesh Transformer #74
-
In the paper, Figure 5 explains that "During training, a graph encoder extracts features from mesh faces, which are quantized into a set of face embeddings. These embeddings are flattened, bookended with start and end tokens, and fed into a GPT-style transformer". My understanding is to directly use the face embeddings output by the graph encoder as the transformer's input. Maybe I was wrong?
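Concretely, my (possibly wrong) reading is something like the sketch below; all the sizes and module choices are my own guesses, not from the paper:

```python
import torch
import torch.nn as nn

# My guess: take the continuous face embeddings from the graph encoder
# and feed them straight into the transformer as input vectors.
# num_faces=800 and embed_dim=192 are made-up numbers.
face_embeddings = torch.randn(1, 800, 192)  # (batch, num_faces, embed_dim)

layer = nn.TransformerEncoderLayer(d_model=192, nhead=8, batch_first=True)
transformer = nn.TransformerEncoder(layer, num_layers=6)
out = transformer(face_embeddings)  # (1, 800, 192) -- no discrete tokens anywhere
```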
Replies: 1 comment 2 replies
-
Hey,
So the embedding they are talking about is not a vector embedding but tokens/indices into a codebook.
The output of the encoder is a vector embedding, but that is then quantized (as per the paper); the output of the quantization is the codes/tokens.
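As a minimal sketch of that quantization step (the codebook size and the 192-dim embedding below are illustrative; a nearest-neighbour lookup is the simplest form of vector quantization, and the paper's actual scheme may be more elaborate):

```python
import torch

# Toy codebook: 8192 entries, each a 192-dim vector (sizes are assumptions).
codebook = torch.randn(8192, 192)

def quantize(face_embedding: torch.Tensor) -> int:
    """Nearest-neighbour lookup: the continuous face embedding is replaced
    by the index of its closest codebook entry. That index is the token."""
    dists = torch.cdist(face_embedding.unsqueeze(0), codebook)  # (1, 8192)
    return int(dists.argmin())

token = quantize(torch.randn(192))  # e.g. 5121 -- one integer, not 192 floats
```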
I think the paper authors just use different words for the same concept: as far as I know, an "embedding" can be a single number value (a token); it is not always a vector.
You can think of it this way: the face embedding from the encoder is a vector, which is then compressed to a slot in a codebook. If you were to use the 192-dimensional vector embedding directly as "tokens", it would require too many resources, and the transformer's output would need a very small error margin: if even one of the 192 float values is off by a decimal, the effect on the decoder might be so large that it cannot create a smooth mesh.
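And to tie it back to the sentence you quoted: once every face is reduced to integer codes, "flattened, bookended with start and end tokens" is just something like the following (the special token ids and codes-per-face are my assumptions):

```python
import torch

CODEBOOK_SIZE = 8192
BOS_ID, EOS_ID = CODEBOOK_SIZE, CODEBOOK_SIZE + 1  # hypothetical special tokens

def build_transformer_input(face_codes: torch.Tensor) -> torch.Tensor:
    """face_codes: (num_faces, codes_per_face) integer codebook indices."""
    flat = face_codes.reshape(-1)  # flatten face by face
    return torch.cat([
        torch.tensor([BOS_ID]), flat, torch.tensor([EOS_ID]),  # bookend
    ])

codes = torch.randint(0, CODEBOOK_SIZE, (3, 2))  # 3 faces, 2 codes each
print(build_transformer_input(codes))            # length 3*2 + 2 = 8
```

So the transformer only ever sees and predicts these discrete indices; the codebook vectors themselves stay inside the encoder/decoder.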