Question about input of Mesh Transformer #74
-
In the paper, Figure 5 explains that "During training, a graph encoder extracts features from mesh faces, which are quantized into a set of face embeddings. These embeddings are flattened, bookended with start and end tokens, and fed into a GPT-style transformer". My understanding is to directly use the face embeddings output by the graph encoder as the transformer's input. Maybe I was wrong?
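Concretely, my (possibly wrong) reading is something like the sketch below; all the sizes and module choices are my own guesses, not from the paper:

```python
import torch
import torch.nn as nn

# My guess: take the continuous face embeddings from the graph encoder
# and feed them straight into the transformer as input vectors.
# num_faces=800 and embed_dim=192 are made-up numbers.
face_embeddings = torch.randn(1, 800, 192)  # (batch, num_faces, embed_dim)

layer = nn.TransformerEncoderLayer(d_model=192, nhead=8, batch_first=True)
transformer = nn.TransformerEncoder(layer, num_layers=6)
out = transformer(face_embeddings)  # (1, 800, 192) -- no discrete tokens anywhere
```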
Replies: 1 comment 2 replies
-
Hey,
So the embedding they are talking about is not a vector embedding but tokens/indices into a codebook.
The output of the encoder is a vector embedding, but that is then quantized (as per the paper); the output of the quantization is the codes/tokens.
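As a minimal sketch of that quantization step (the codebook size and the 192-dim embedding below are illustrative; a nearest-neighbour lookup is the simplest form of vector quantization, and the paper's actual scheme may be more elaborate):

```python
import torch

# Toy codebook: 8192 entries, each a 192-dim vector (sizes are assumptions).
codebook = torch.randn(8192, 192)

def quantize(face_embedding: torch.Tensor) -> int:
    """Nearest-neighbour lookup: the continuous face embedding is replaced
    by the index of its closest codebook entry. That index is the token."""
    dists = torch.cdist(face_embedding.unsqueeze(0), codebook)  # (1, 8192)
    return int(dists.argmin())

token = quantize(torch.randn(192))  # e.g. 5121 -- one integer, not 192 floats
```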
I think the paper authors just use different words for the same concept: as far as I know, an "embedding" can be a single number value (a token); it is not always a vector.
You can think of it this way: the face embedding from the encoder is a vector, which is then compressed to a slot in a codebook. If you were to use the 192-dimensional vector embedding directly as "tokens", it would require too many resources, and the transformer's output would need a very small error margin: if even one of the 192 float values is off by a decimal, the effect on the decoder might be so large that it cannot create a smooth mesh.
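And to tie it back to the sentence you quoted: once every face is reduced to integer codes, "flattened, bookended with start and end tokens" is just something like the following (the special token ids and codes-per-face are my assumptions):

```python
import torch

CODEBOOK_SIZE = 8192
BOS_ID, EOS_ID = CODEBOOK_SIZE, CODEBOOK_SIZE + 1  # hypothetical special tokens

def build_transformer_input(face_codes: torch.Tensor) -> torch.Tensor:
    """face_codes: (num_faces, codes_per_face) integer codebook indices."""
    flat = face_codes.reshape(-1)  # flatten face by face
    return torch.cat([
        torch.tensor([BOS_ID]), flat, torch.tensor([EOS_ID]),  # bookend
    ])

codes = torch.randint(0, CODEBOOK_SIZE, (3, 2))  # 3 faces, 2 codes each
print(build_transformer_input(codes))            # length 3*2 + 2 = 8
```

So the transformer only ever sees and predicts these discrete indices; the codebook vectors themselves stay inside the encoder/decoder.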