@ziniuwan
1: You mentioned "Use the last checkpoint of stage 1 to initialize the model and start training stage 2."
However, the code contains the comment "# We empirically choose not to load the pretrained decoder weights from stage1 as it yields better performance.", so the pretrained decoder weights from stage 1 are not used. Moreover, MODEL.ENCODER.BACKBONE is "cnn" in stage 1 but "ste" in stage 2, so the pretrained encoder weights from stage 1 are not used either. If neither the encoder nor the decoder weights are reused, why is stage 1 pre-training still necessary?
2: What does the following operation in vision_transformer.py mean? I think it is similar to `x = x + self.pos_embed`:
`x = x.reshape(-1, seqlen, N, C) + self.temp_embed[:,:seqlen,:,:]`
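For context, here is a minimal NumPy sketch of what that line appears to do (all shapes and the `max_len` name are assumptions, not taken from the repo): the flattened batch of frames is reshaped to expose a time axis, and a learned temporal embedding, sliced to the current sequence length, is broadcast-added across the batch and spatial-token dimensions — analogous to how `self.pos_embed` is added over the spatial token dimension.

```python
import numpy as np

# Assumed shapes (hypothetical): B=2 clips, seqlen=4 frames,
# N=49 spatial tokens per frame, C=8 channels.
B, seqlen, N, C = 2, 4, 49, 8
max_len = 16  # assumed max sequence length the embedding was allocated for

# Frames arrive flattened as (B*seqlen, N, C), i.e. a batch of images.
x = np.random.randn(B * seqlen, N, C)

# Learned temporal embedding, one vector per frame index: (1, max_len, 1, C).
temp_embed = np.random.randn(1, max_len, 1, C)

# Reshape to (B, seqlen, N, C), then broadcast-add (1, seqlen, 1, C):
# every frame at time index t, in every clip and at every spatial token,
# receives the same temporal offset temp_embed[0, t, 0, :].
out = x.reshape(-1, seqlen, N, C) + temp_embed[:, :seqlen, :, :]

print(out.shape)  # (2, 4, 49, 8)
```

So where `pos_embed` tells the transformer *where* a token is within a frame, this adds a per-frame embedding telling it *when* the frame occurs in the sequence.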