What type of data would create the most accurate model? #7
-
Upon reviewing qiuqiao's documentation, I see there are three options for training data: full label, weak label, and audio only.
The model I have trained uses only full-label data. Would having examples of all three types make a more accurate model for inference?
-
Not necessarily. Having examples of all three types of data does not by itself make a more accurate model for inference. If you keep the full-label data constant and add weak-label or audio-only data on top of it, performance generally improves. If you keep the total duration constant and move some of the full-label data into the weak-label and audio-only sets, performance generally decreases. This is because of how the losses are computed during training: full-label data contributes not only to the full-label loss but also to the weak-label and audio-only losses, whereas weak-label data does not contribute to the full-label loss, and audio-only data contributes to neither the weak-label nor the full-label loss.
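To make the last point concrete, here is a minimal Python sketch of per-sample loss masking. Everything in it is hypothetical (the `LabelType` enum, the `full`/`weak`/`audio` term names, `masked_term_loss`, `total_loss`, `supervision_hours`) and it is not SOFA's actual training code. It assumes weak-label samples still contribute to the audio-only loss; the answer above only says they skip the full-label loss. `supervision_hours` is a crude illustrative proxy (hours of data times the number of loss terms they reach), not a predictor of accuracy.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class LabelType(Enum):
    # Hypothetical names for the three kinds of training data.
    FULL = "full_label"
    WEAK = "weak_label"
    AUDIO_ONLY = "audio_only"


# Loss terms each sample type contributes to (hypothetical term names).
# FULL reaches every term and AUDIO_ONLY only its own term, as described in the answer.
# WEAK -> {"weak", "audio"} is an ASSUMPTION: the answer only says weak data skips "full".
LOSS_TERMS: dict[LabelType, frozenset[str]] = {
    LabelType.FULL: frozenset({"full", "weak", "audio"}),
    LabelType.WEAK: frozenset({"weak", "audio"}),
    LabelType.AUDIO_ONLY: frozenset({"audio"}),
}
TERM_ORDER = ("full", "weak", "audio")


@dataclass
class Sample:
    label_type: LabelType
    # Per-term loss values the model produced for this sample; only the
    # terms listed in LOSS_TERMS for its label type are needed.
    losses: dict[str, float]


def masked_term_loss(batch: list[Sample], term: str) -> float:
    """Mean of one loss term over the samples allowed to contribute to it."""
    values = [s.losses[term] for s in batch if term in LOSS_TERMS[s.label_type]]
    return sum(values) / len(values) if values else 0.0


def total_loss(batch: list[Sample]) -> float:
    """Sum of the masked per-term means; in a real trainer this would be back-propagated."""
    return sum(masked_term_loss(batch, term) for term in TERM_ORDER)


def supervision_hours(hours: dict[LabelType, float]) -> float:
    """Crude proxy: hours of data times the number of loss terms they reach."""
    return sum(h * len(LOSS_TERMS[t]) for t, h in hours.items())
```

With 10 h of full-label data the proxy gives 30. Adding 5 h each of weak-label and audio-only data raises it to 45, while converting 4 of the 10 h into 2 h weak-label plus 2 h audio-only lowers it to 24, which mirrors the two scenarios described in the answer.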
-
Thanks for the tips! So you're saying that, for the best results, I should train with full labels plus a very small amount of audio-only data?