Evaluation Approach for the baseline model #32

mohammedshady · 2024-02-11T07:48:40Z

I tried the baseline LSTM model on my own version of the dataset. I downloaded the videos and loaded the labels with the make_dataset.py script, but the labels in my dataset don't match the original ones. I still tested the model on this modified dataset, using the average of the user_summary annotations as the evaluation labels, and got an F-score of about 0.30. Switching to the maximum instead gave a better F-score of 0.52.
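For reference, here is a minimal sketch of the two aggregation modes. It assumes the common convention of scoring the predicted summary against each annotator's binary summary in user_summary separately and then taking the mean or the maximum of those F-scores; the repository's own evaluation code may differ. The names `f_score` and `evaluate` are hypothetical.

```python
import numpy as np

def f_score(pred, ref):
    """F-score between two binary frame-level summaries of equal length."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    overlap = np.logical_and(pred, ref).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / ref.sum()
    return float(2 * precision * recall / (precision + recall))

def evaluate(pred, user_summary, mode="avg"):
    """Aggregate per-annotator F-scores (assumed convention, not the repo's API).

    user_summary: array of shape (n_annotators, n_frames) with 0/1 entries.
    mode: "avg" takes the mean over annotators, "max" takes the best match.
    """
    scores = [f_score(pred, u) for u in user_summary]
    return max(scores) if mode == "max" else float(np.mean(scores))
```

With "max", a prediction only has to agree with one annotator, so it is never lower than "avg" on the same video. That alone can account for a gap like 0.30 versus 0.52.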

Later, I evaluated the model against gt_score instead, converting it to shot summaries in a way similar to the training approach. This gave an average F-score of 0.70, but the F-score varied a lot.
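The conversion from frame-level scores to a shot summary can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: shot boundaries are given as inclusive (start, end) frame pairs, each shot is valued by its mean frame score, and shots are selected with a 0/1 knapsack under a length budget of 15% of the video. The names `knapsack`, `scores_to_summary`, `change_points`, and `budget` are hypothetical.

```python
import numpy as np

def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming; returns indices of chosen items."""
    n = len(values)
    table = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            table[i][c] = table[i - 1][c]
            if weights[i - 1] <= c:
                cand = table[i - 1][c - weights[i - 1]] + values[i - 1]
                if cand > table[i][c]:
                    table[i][c] = cand
    # Walk back through the table to recover which items were taken.
    selected, c = [], capacity
    for i in range(n, 0, -1):
        if table[i][c] != table[i - 1][c]:
            selected.append(i - 1)
            c -= weights[i - 1]
    return sorted(selected)

def scores_to_summary(frame_scores, change_points, budget=0.15):
    """Turn frame-level scores into a binary frame-level shot summary."""
    frame_scores = np.asarray(frame_scores, dtype=float)
    n_frames = len(frame_scores)
    # Assumed shot value: mean score of the frames inside the shot.
    shot_values = [frame_scores[s:e + 1].mean() for s, e in change_points]
    shot_lengths = [e - s + 1 for s, e in change_points]
    capacity = int(n_frames * budget)
    summary = np.zeros(n_frames, dtype=int)
    for i in knapsack(shot_values, shot_lengths, capacity):
        s, e = change_points[i]
        summary[s:e + 1] = 1
    return summary
```

Note that if the evaluation reference is produced from gt_score by the same procedure used to build the training targets, the model is scored against the signal it was trained to reproduce, so the result is not directly comparable to scores computed against user_summary.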

As you can see in the image, the F-score keeps changing. My question is whether this way of evaluating is flawed, and whether the unstable F-score indicates a problem.
