
[Enhance] add inferencer #2164

Merged
merged 12 commits into from
Feb 7, 2023
2 changes: 2 additions & 0 deletions configs/recognition/i3d/metafile.yml
@@ -7,6 +7,8 @@ Collections:

Models:
- Name: i3d_imagenet-pretrained-r50-nl-dot-product_8xb8-32x2x1-100e_kinetics400-rgb
Alias:
- i3d
Config: configs/recognition/i3d/i3d_imagenet-pretrained-r50-nl-dot-product_8xb8-32x2x1-100e_kinetics400-rgb.py
In Collection: I3D
Metadata:
2 changes: 2 additions & 0 deletions configs/recognition/slowfast/metafile.yml
@@ -30,6 +30,8 @@ Models:
Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/slowfast/slowfast_r50_8xb8-4x16x1-256e_kinetics400-rgb/slowfast_r50_8xb8-4x16x1-256e_kinetics400-rgb_20220901-701b0f6f.pth

- Name: slowfast_r50_8xb8-8x8x1-256e_kinetics400-rgb
Alias:
- slowfast
Config: configs/recognition/slowfast/slowfast_r50_8xb8-8x8x1-256e_kinetics400-rgb.py
In Collection: SlowFast
Metadata:
2 changes: 2 additions & 0 deletions configs/recognition/tsn/metafile.yml
@@ -53,6 +53,8 @@ Models:
Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x5-100e_kinetics400-rgb/tsn_imagenet-pretrained-r50_8xb32-1x1x5-100e_kinetics400-rgb_20220906-65d68713.pth

- Name: tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb
Alias:
- TSN
Config: configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb.py
In Collection: TSN
Metadata:
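The three metafile hunks above each add an `Alias` entry so a short name like `tsn` can stand in for the full model name. As an illustrative sketch only (not MMAction2's actual lookup code), alias resolution against metafile entries can be modeled like this:

```python
# Illustrative sketch: resolve a model name or alias from metafile-style
# entries to its config path. The real resolution lives in MMAction2/MMEngine.
models = [
    {
        "Name": "tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb",
        "Alias": ["TSN"],
        "Config": "configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb.py",
    },
    {
        "Name": "slowfast_r50_8xb8-8x8x1-256e_kinetics400-rgb",
        "Alias": ["slowfast"],
        "Config": "configs/recognition/slowfast/slowfast_r50_8xb8-8x8x1-256e_kinetics400-rgb.py",
    },
]


def resolve_config(name: str) -> str:
    """Return the config path for a full model name or one of its aliases."""
    for entry in models:
        if name == entry["Name"] or name in entry.get("Alias", []):
            return entry["Config"]
    raise KeyError(f"unknown model name or alias: {name}")
```

This is why `--rec tsn` in the demo examples below is enough to select the full `tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb` model.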
65 changes: 63 additions & 2 deletions demo/README.md
@@ -7,6 +7,7 @@
- [Video GradCAM Demo](#video-gradcam-demo): A demo script to visualize GradCAM results using a single video.
- [Webcam demo](#webcam-demo): A demo script to implement real-time action recognition from a web camera.
- [Skeleton-based Action Recognition Demo](#skeleton-based-action-recognition-demo): A demo script to predict the skeleton-based action recognition result using a single video.
- [Inferencer Demo](#inferencer): A demo script to implement fast prediction for video analysis tasks based on a unified inferencer interface.

## Modify configs through script arguments

@@ -52,7 +53,7 @@ Optional arguments:
Examples:

Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`,
or use checkpoint url from to directly load corresponding checkpoint, which will be automatically saved in `$HOME/.cache/torch/checkpoints`.
or use checkpoint url from `configs/` to directly load corresponding checkpoint, which will be automatically saved in `$HOME/.cache/torch/checkpoints`.

1. Recognize a video file as input by using a TSN model on cuda by default.

@@ -183,7 +184,7 @@ Users can change:

## Skeleton-based Action Recognition Demo

MMAction2 provides an demo script to predict the skeleton-based action recognition result using a single video.
MMAction2 provides a demo script to predict the skeleton-based action recognition result using a single video.

```shell
python demo/demo_skeleton.py ${VIDEO_FILE} ${OUT_FILENAME} \
@@ -247,3 +248,63 @@ python demo/demo_skeleton.py demo/demo_skeleton.mp4 demo/demo_skeleton_out.mp4 \
--pose-checkpoint https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth \
--label-map tools/data/skeleton/label_map_ntu60.txt
```

## Inferencer

MMAction2 provides a demo script to perform fast prediction for video analysis tasks based on a unified inferencer interface. Currently, it only supports the action recognition task.

```shell
python demo/demo.py ${INPUTS} \
[--vid-out-dir ${VID_OUT_DIR}] \
[--rec ${RECOG_TASK}] \
[--rec-weights ${RECOG_WEIGHTS}] \
[--label-file ${LABEL_FILE}] \
[--device ${DEVICE_TYPE}] \
[--batch-size ${BATCH_SIZE}] \
[--print-result ${PRINT_RESULT}] \
    [--pred-out-file ${PRED_OUT_FILE}]
```

Optional arguments:

- `--show`: If specified, the demo will display the video in a popup window.
- `--print-result`: If specified, the demo will print the inference results.
- `VID_OUT_DIR`: Output directory of saved videos. Defaults to None, which means videos are not saved.
- `RECOG_TASK`: Type of action recognition algorithm. It can be the path to a config file, or a model name or alias defined in the metafile.
- `RECOG_WEIGHTS`: Path to the custom checkpoint file of the selected recognition model. If it is not specified and `rec` is a model name defined in the metafile, the weights will be loaded from the metafile.
- `LABEL_FILE`: Label file for the dataset the algorithm was pretrained on. Defaults to None, which means labels are not shown in the result.
- `DEVICE_TYPE`: Type of device to run the demo on. Allowed values are a CUDA device like `cuda:0` or `cpu`. Defaults to `cuda:0`.
- `BATCH_SIZE`: The batch size used in inference. Defaults to 1.
- `PRED_OUT_FILE`: File path to save the inference results. Defaults to None, which means prediction results are not saved.

Examples:

Assume that you are located at `$MMACTION2`.

1. Recognize a video file as input by using a TSN model, loading checkpoint from metafile.

```shell
# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
   python demo/demo_inferencer.py demo/demo.mp4 \
--rec configs/recognition/tsn/tsn_r50_8xb32-1x1x8-100e_kinetics400-rgb.py \
--label-file tools/data/kinetics/label_map_k400.txt
```

2. Recognize a video file as input by using a TSN model, using model alias in metafile.

```shell
# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
   python demo/demo_inferencer.py demo/demo.mp4 \
--rec tsn \
--label-file tools/data/kinetics/label_map_k400.txt
```

3. Recognize a video file as input by using a TSN model, and then save the visualization video.

```shell
# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
   python demo/demo_inferencer.py demo/demo.mp4 \
--vid-out-dir demo_out \
--rec tsn \
--label-file tools/data/kinetics/label_map_k400.txt
```
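All three invocations above follow the same two-phase pattern: model-selection options (`--rec`, `--rec-weights`, `--device`, `--label-file`) configure the inferencer once, and per-run options (inputs, `--vid-out-dir`, `--batch-size`, ...) go into the call itself. A hypothetical stub (the real `MMAction2Inferencer` loads a model and runs inference; this one only records its arguments) illustrates the interface shape:

```python
# Hypothetical stub mirroring the inferencer's two-phase interface:
# the constructor takes model-selection arguments, __call__ takes per-run ones.
class InferencerStub:

    def __init__(self, rec=None, rec_weights=None, device=None,
                 label_file=None):
        # The real class would resolve `rec` via the metafile and load weights.
        self.rec = rec
        self.label_file = label_file

    def __call__(self, inputs, vid_out_dir='', batch_size=1, show=False,
                 print_result=False, pred_out_file=''):
        # The real class runs recognition; the stub reports what it would do.
        return {
            'model': self.rec,
            'inputs': inputs,
            'saved_video': bool(vid_out_dir),
        }


runner = InferencerStub(
    rec='tsn', label_file='tools/data/kinetics/label_map_k400.txt')
result = runner('demo/demo.mp4', vid_out_dir='demo_out')
```

Splitting construction from invocation lets one configured inferencer be reused across many videos without re-selecting the model each time.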
36 changes: 7 additions & 29 deletions demo/demo.py
@@ -4,7 +4,6 @@
from operator import itemgetter
from typing import Optional, Tuple

import cv2
from mmengine import Config, DictAction

from mmaction.apis import inference_recognizer, init_recognizer
@@ -88,34 +87,9 @@ def get_output(
if video_path.startswith(('http://', 'https://')):
raise NotImplementedError

try:
import decord
except ImportError:
raise ImportError('Please install decord to enable output file.')

# Channel Order is `BGR`
video = decord.VideoReader(video_path)
frames = [x.asnumpy()[..., ::-1] for x in video]
if target_resolution:
w, h = target_resolution
frame_h, frame_w, _ = frames[0].shape
if w == -1:
w = int(h / frame_h * frame_w)
if h == -1:
h = int(w / frame_w * frame_h)
frames = [cv2.resize(f, (w, h)) for f in frames]

# init visualizer
out_type = 'gif' if osp.splitext(out_filename)[1] == '.gif' else 'video'
vis_backends_cfg = [
dict(
type='LocalVisBackend',
out_type=out_type,
save_dir='demo',
fps=fps)
]
visualizer = ActionVisualizer(
vis_backends=vis_backends_cfg, save_dir='place_holder')
visualizer = ActionVisualizer()
visualizer.dataset_meta = dict(classes=labels)

text_cfg = {'colors': font_color}
@@ -124,11 +98,15 @@

visualizer.add_datasample(
out_filename,
frames,
video_path,
data_sample,
draw_pred=True,
draw_gt=False,
text_cfg=text_cfg)
text_cfg=text_cfg,
fps=fps,
out_type=out_type,
out_path=osp.join('demo', out_filename),
target_resolution=target_resolution)


def main():
70 changes: 70 additions & 0 deletions demo/demo_inferencer.py
@@ -0,0 +1,70 @@
# Copyright (c) OpenMMLab. All rights reserved.
from argparse import ArgumentParser

from mmaction.apis.inferencers import MMAction2Inferencer


def parse_args():
parser = ArgumentParser()
parser.add_argument(
'inputs', type=str, help='Input video file or rawframes folder path.')
parser.add_argument(
'--vid-out-dir',
type=str,
default='',
help='Output directory of videos.')
parser.add_argument(
'--rec',
type=str,
default=None,
help='Pretrained action recognition algorithm. It\'s the path to the '
'config file or the model name defined in metafile.')
parser.add_argument(
'--rec-weights',
type=str,
default=None,
help='Path to the custom checkpoint file of the selected recog model. '
'If it is not specified and "rec" is a model name of metafile, the '
'weights will be loaded from metafile.')
parser.add_argument(
'--label-file', type=str, default=None, help='label file for dataset.')
parser.add_argument(
'--device',
type=str,
default=None,
help='Device used for inference. '
'If not specified, the available device will be automatically used.')
parser.add_argument(
'--batch-size', type=int, default=1, help='Inference batch size.')
parser.add_argument(
'--show',
action='store_true',
help='Display the video in a popup window.')
parser.add_argument(
'--print-result',
action='store_true',
help='Whether to print the results.')
parser.add_argument(
'--pred-out-file',
type=str,
default='',
help='File to save the inference results.')

call_args = vars(parser.parse_args())

init_kws = ['rec', 'rec_weights', 'device', 'label_file']
init_args = {}
for init_kw in init_kws:
init_args[init_kw] = call_args.pop(init_kw)

return init_args, call_args


def main():
init_args, call_args = parse_args()
mmaction2 = MMAction2Inferencer(**init_args)
mmaction2(**call_args)


if __name__ == '__main__':
main()
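The `parse_args()` above parses everything into one namespace and then pops the four initializer keywords out, leaving only the per-call options. This pop-based split can be exercised standalone (a sketch of the same pattern with a reduced argument set):

```python
from argparse import ArgumentParser


def split_args(argv):
    """Parse argv, then split options into constructor kwargs vs. call
    kwargs, mirroring the pop-based pattern in demo_inferencer.py."""
    parser = ArgumentParser()
    parser.add_argument('inputs', type=str)
    parser.add_argument('--rec', type=str, default=None)
    parser.add_argument('--rec-weights', type=str, default=None)
    parser.add_argument('--device', type=str, default=None)
    parser.add_argument('--label-file', type=str, default=None)
    parser.add_argument('--batch-size', type=int, default=1)

    call_args = vars(parser.parse_args(argv))
    init_kws = ['rec', 'rec_weights', 'device', 'label_file']
    # pop() moves each init keyword out, leaving only per-call options.
    init_args = {kw: call_args.pop(kw) for kw in init_kws}
    return init_args, call_args


init_args, call_args = split_args(
    ['demo.mp4', '--rec', 'tsn', '--batch-size', '2'])
```

Keeping one parser and splitting afterwards means a new option only needs to be added in one place, plus (if it belongs to construction) one entry in `init_kws`.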
1 change: 1 addition & 0 deletions mmaction/apis/__init__.py
@@ -1,6 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .inference import (detection_inference, inference_recognizer,
init_recognizer, pose_inference)
from .inferencers import * # NOQA

__all__ = [
'init_recognizer', 'inference_recognizer', 'detection_inference',
5 changes: 5 additions & 0 deletions mmaction/apis/inferencers/__init__.py
@@ -0,0 +1,5 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .actionrecog_inferencer import ActionRecogInferencer
from .mmaction2_inferencer import MMAction2Inferencer

__all__ = ['ActionRecogInferencer', 'MMAction2Inferencer']