I trained a new general modelscope model at 320x320 and it turned out well overall, but it's clear there is some stretching happening in faces (and also TVs). In the comparison video, I ran my model and vanilla modelscope with the exact same settings and seed values using an automated system. The face stretching is clear at 0:00 in the video, and the TVs appear stretched at 0:48.
I've gone through my dataset and it seems completely clean of stretched images, so I'm trying to find out why this might have happened. Maybe there is something wrong in my config?
The dataset consists of 450 videos from the Vimeo90K dataset and 450 music videos from YouTube. They were processed for black bar removal, split into clips with scene detection, and 15 clips between 1s and 3s were selected per video. Those clips were then cropped to 320x320 and re-encoded with a proper 1:1 aspect ratio (a rough sketch of the crop step is below). I've included a zip with my yaml config and a few samples of the dataset with faces present.
dataset320Selection.zip
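For clarity, the crop step works roughly like the sketch below (assuming ffmpeg on PATH; the helper and filenames are illustrative, not my actual script). The point is that each clip is center-cropped to a square first and only then scaled to 320x320, rather than scaled directly, which would squeeze widescreen footage:

```python
# Minimal sketch of the crop step, assuming ffmpeg is installed.
# The helper name and filenames are illustrative only.
import subprocess

def center_crop_320(src: str, dst: str) -> None:
    # Take the largest centered square first, then downscale to
    # 320x320; since the crop is already 1:1, the scale step cannot
    # distort. setsar=1 forces square pixels in the output.
    vf = "crop='min(iw,ih)':'min(iw,ih)',scale=320:320,setsar=1"
    # -an drops the audio track, which isn't needed for training clips.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-vf", vf, "-an", dst],
                   check=True)
```

By contrast, a direct `scale=320:320` with no crop would squeeze 16:9 footage horizontally, which is exactly the kind of face stretching in question, so as far as I can tell the preprocessing itself shouldn't be introducing it.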