Based on my reading of the introduction, the main research question this paper addresses is:
How can we develop an anomaly detection model that can quickly adapt to a new scene using only a few example frames from that scene?
In particular, the paper introduces a new problem called "few-shot scene-adaptive anomaly detection". The goal is to take an anomaly detection model trained on videos from multiple different scenes, and adapt it to work well on a new target scene using only a small number of sample frames (e.g. 1-10 frames) from that target scene.
The key motivation is that in real-world applications, we may not have access to large training datasets for each new deployment scene. So the model needs to be able to adapt quickly using just a few examples. The paper proposes a meta-learning based approach to address this problem.
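To make the "train on many scenes, adapt with a few frames" idea concrete, here is a minimal sketch under strong assumptions. The summary above only says the approach is meta-learning based, so the sketch swaps in a Reptile-style first-order update as a stand-in, and it replaces the video model with a toy one: each frame is a short feature vector, the "model" is a single vector `mu` describing normal appearance, and the anomaly score is the squared distance to `mu`. All names (`adapt`, `meta_train`, `make_scene`, `anomaly_score`) are hypothetical and chosen for illustration only.

```python
import random

def anomaly_score(mu, frame):
    """Squared distance between a frame's features and the model of normality."""
    return sum((m - x) ** 2 for m, x in zip(mu, frame))

def loss(mu, frames):
    """Mean anomaly score over a set of (assumed normal) frames."""
    return sum(anomaly_score(mu, f) for f in frames) / len(frames)

def loss_grad(mu, frames):
    """Gradient of loss(mu, frames) with respect to mu."""
    n = len(frames)
    return [2.0 * sum(mu[j] - f[j] for f in frames) / n for j in range(len(mu))]

def adapt(mu, frames, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on the K frames of one scene."""
    mu = list(mu)
    for _ in range(steps):
        grad = loss_grad(mu, frames)
        mu = [m - lr * g for m, g in zip(mu, grad)]
    return mu

def meta_train(scenes, dim, k=5, meta_lr=0.5, iters=200, seed=0):
    """Outer loop (Reptile-style, an assumption): move the initialization
    toward the parameters obtained after adapting to a sampled scene."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    for _ in range(iters):
        frames = rng.sample(rng.choice(scenes), k)  # K frames from one scene
        adapted = adapt(mu, frames)
        mu = [m + meta_lr * (a - m) for m, a in zip(mu, adapted)]
    return mu

def make_scene(rng, base, n=50, shift=0.5, noise=0.1):
    """Toy data: each scene is a shifted copy of a shared base appearance."""
    center = [b + rng.gauss(0, shift) for b in base]
    return [[c + rng.gauss(0, noise) for c in center] for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    base = [1.0, -2.0, 0.5]
    train_scenes = [make_scene(rng, base) for _ in range(8)]
    mu_meta = meta_train(train_scenes, dim=3)

    target = make_scene(rng, base)   # unseen target scene
    support = target[:5]             # the few frames available for adaptation
    mu_target = adapt(mu_meta, support)
    print("loss before adaptation:", loss(mu_meta, support))
    print("loss after adaptation: ", loss(mu_target, support))
```

The design point the sketch is meant to show is the split between the two loops: `meta_train` only ever sees the training scenes and learns a starting point, while `adapt` is the only step run at deployment and needs nothing beyond the few target-scene frames.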
In summary, the main research question is how to adapt an anomaly detection model to a new target scene from only a few frames, using meta-learning. The paper positions this as a novel problem formulation that is closer to real-world anomaly detection deployments.