Hotfix some mis-implements making datum stream importer slow #1098

vinnamkim commented Jul 14, 2023

Summary

  • Make _get_dm_format_version() faster when the file has no dm_format_version field.
  • Make _load_media_type() faster when the file has no media_type field.
  • Fix TQDMProgressReporter for the case where total is not given (total = None).
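
The third item concerns a progress bar whose length is unknown up front, as happens when items are streamed. The following is a minimal sketch of that idea only, not the actual TQDMProgressReporter code: the `CountingReporter` class and its `start`, `status`, and `track` methods are hypothetical names chosen for illustration. It shows a reporter falling back to a plain item counter when `total` is `None` instead of trying to compute a percentage.

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class CountingReporter:
    """Hypothetical stand-in for a progress reporter; not Datumaro's API."""

    def __init__(self) -> None:
        self.total: Optional[int] = None
        self.done = 0

    def start(self, total: Optional[int] = None) -> None:
        # total=None means the length is unknown (e.g. a streamed source).
        self.total = total
        self.done = 0

    def status(self) -> str:
        if self.total is None:
            # No percentage can be computed, so report only the count.
            return f"{self.done}it"
        percent = 100 * self.done // self.total if self.total else 100
        return f"{percent}% {self.done}/{self.total}"

    def track(self, items: Iterable[T], total: Optional[int] = None) -> Iterator[T]:
        # Wrap an iterable and count each item after it has been consumed.
        self.start(total)
        for item in items:
            yield item
            self.done += 1


reporter = CountingReporter()

# A bare iterator has no len(), so no total is passed.
for _ in reporter.track(iter(range(3))):
    pass
unknown_total_status = reporter.status()

# A sized source can pass its length explicitly.
for _ in reporter.track(range(4), total=4):
    pass
known_total_status = reporter.status()

print(unknown_total_status)
print(known_total_status)
```

The design choice in this sketch is to treat `None` as a distinct "unknown" state rather than coercing it to zero, so the percentage branch is never reached without a real total.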

How to test

I manually ran the following script on the real-world COCO2017 object detection dataset, converted to the Datumaro (JSON) data format.

from datumaro.components.dataset_base import DatasetItem
from datumaro.components.dataset import StreamDataset, Dataset
from time import time
from datumaro.components.progress_reporting import TQDMProgressReporter
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-f", "--format", choices=["coco", "datumaro", "yolo", "voc"], help="Choose format")
parser.add_argument("--stream", action="store_true", help="Use stream importer")


def upload_to_geti_db(item: DatasetItem) -> None:
    # Hi, I'm mock!
    pass


if __name__ == "__main__":
    args = parser.parse_args()
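    # The dataset directory is assumed to be named after the format (e.g. ./datumaro),
    # so the same argument serves as both the path and the format name.
    path, format = args.format, args.format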
    if format == "coco":
        format = "coco_instances"  # Set specific format
    start = time()

    dataset = (
        StreamDataset.import_from(path, format=format, progress_reporter=TQDMProgressReporter())
        if args.stream
        else Dataset.import_from(path, format=format, progress_reporter=TQDMProgressReporter())
    )

    for item in dataset:
        upload_to_geti_db(item)

    print(f"Done. Elapsed time: {time() - start:.2f}s")

Results:

  • No stream

(image: no_stream)

(datumaro-basic) vinnamki@vinnamki:~/datumaro/ws_datum/coco$ mprof run --python python test_perf.py -f datumaro
mprof: Sampling memory every 0.1s
running new process
running as a Python program...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 4728.92it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 118287/118287 [00:30<00:00, 3919.85it/s]
Done. Elapsed time: 42.14s

  • Stream

(image: stream)

(datumaro-basic) vinnamki@vinnamki:~/datumaro/ws_datum/coco$ mprof run --python python test_perf.py -f datumaro --stream
mprof: Sampling memory every 0.1s
running new process
running as a Python program...
4999it [00:06, 732.11it/s]
118286it [02:41, 731.49it/s]
Done. Elapsed time: 168.55s
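
For a quick read of the two runs, the elapsed times can be compared directly. This is a small sketch that uses only the figures printed above; the variable names are illustrative and nothing here is measured anew.

```python
# Elapsed times copied from the "Done. Elapsed time" lines of the two runs.
no_stream_seconds = 42.14
stream_seconds = 168.55

# Ratio of wall-clock time: stream import relative to the regular import.
slowdown = stream_seconds / no_stream_seconds

print(f"Stream import took {slowdown:.2f}x the time of the regular import")
```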

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

vinnamkim added the BUG (Something isn't working) and FEATURE (New feature & functionality) labels Jul 14, 2023
vinnamkim marked this pull request as ready for review July 14, 2023 09:08
vinnamkim requested review from a team as code owners July 14, 2023 09:08
vinnamkim requested review from bonhunko and removed request for a team July 14, 2023 09:08
vinnamkim changed the base branch from develop to releases/1.4.0 July 14, 2023 09:08
Signed-off-by: Kim, Vinnam <[email protected]>
vinnamkim merged commit 1826183 into openvinotoolkit:releases/1.4.0 Jul 17, 2023
vinnamkim added this to the 1.4.0 milestone Jul 17, 2023