Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per segment chunks #8272

Merged
merged 139 commits into from
Sep 24, 2024
Merged

Per segment chunks #8272

merged 139 commits into from
Sep 24, 2024

Conversation

zhiltsov-max
Copy link
Contributor

@zhiltsov-max zhiltsov-max commented Aug 7, 2024

Motivation and context

  • Changed chunk generation from per-task chunks to per-segment chunks
  • Fixed a memory leak in video reading on the server side (only in media_extractors, so there are several more left)
  • Fixed a potential hang in import worker or the server process on process shutdown
  • Disabled multithreading in video reading in endpoints (not in static chunk generation)
  • Refactored static chunk generation code (moved after job creation)
  • Refactored various server internal APIs for frame retrieval
  • Updated UI logic to access chunks, added support for non-sequential frames in chunks
  • Added a new server configuration option CVAT_ALLOW_STATIC_CACHE (boolean) to enable and disable static cache support. The option is disabled by default (it's changed from the previous behavior)
  • Added tests for the changes made
  • Added missing original chunk type field in job responses
  • Fixed invalid kvrocks cleanup in tests for Helm deployment
  • Added a new 0-based index parameter in GET /api/jobs/{id}/data/?type=chunk to simplify indexing
    • GT job chunks with non-sequential frames have no placeholders inside

When this update is applied to the server, there will be a data storage setting migration for the tasks. Existing tasks using static chunks (task.data.storage_method == FILE_SYSTEM) will be switched to the dynamic cache (i.e. to == CACHE)). The remaining files should be removed manually, there will be a list of such tasks in the migration log file.

After this update, you'll have an option to enable or disable static cache use during task creation. This allows, in particular, prohibit new tasks using the static cache. With this option, any tasks using static cache will use the dynamic cache instead on data access.

User-observable changes:

  • Job chunk ids now start from 0 for each job instead of using parent task ids
  • The use_cache = false or storage_method = filesystem parameters in task creation can be ignored by the server
  • Task chunk access may be slower for some chunks (particularly, for tasks with overlap configured, for chunks on segment boundaries, and for tasks previously using static chunks)
  • The last chunk in a job will contain only the frames from the current job, even if there are more frames in the task

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a new server setting to disable media chunks on the local filesystem.
    • Enhanced frame prefetching with a startFrame parameter for improved chunk calculations.
    • Added a new property, data_original_chunk_type, for enhanced job differentiation in the metadata.
  • Bug Fixes

    • Resolved memory management issues to prevent leaks during video processing.
    • Corrected naming inconsistencies related to the prefetchAnalyzer.
  • Documentation

    • Included configuration for code formatting tools to ensure consistent code quality across the project.
  • Refactor

    • Restructured classes and methods for improved clarity and maintainability, particularly in media handling and task processing.
  • Chores

    • Updated formatting scripts to include additional directories for automated code formatting.

Copy link
Contributor

coderabbitai bot commented Aug 7, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update introduces several enhancements across various components, particularly focusing on improved media handling, frame processing, and task management. Notable features include new configuration options for disabling local media storage, refined frame prefetching logic, and updated test frameworks for better reliability. Memory management issues have been addressed, ensuring smoother operations during video processing, while modular code improvements enhance overall maintainability.

Changes

Files Change Summary
changelog.d/..._mzhiltso_job_chunks.md New server setting to disable local media chunks; distinct chunk IDs for jobs; memory leak fixes.
cvat-core/src/frames.ts FrameData and PrefetchAnalyzer updates for improved prefetching logic and constructor enhancements.
cvat-data/src/ts/cvat-data.ts FrameDecoder updates for handling startFrame and chunkNumber adjustments.
cvat/apps/dataset_manager/bindings.py Replaced FrameProvider with TaskFrameProvider, reflecting task-specific handling changes.
cvat/apps/dataset_manager/formats/cvat.py Refactored dump_media_files for better modularity; frame retrieval changes.
cvat/apps/dataset_manager/tests/test_formats.py Removed outdated functions and classes; enhanced test reliability through inheritance.
cvat/apps/engine/apps.py Enhanced ready method for dynamic configuration management.
cvat/apps/engine/cache.py Improved MediaCache class with new methods for cache management.
cvat/apps/engine/default_settings.py New setting for static media caching based on environment variables.
cvat/apps/engine/frame_provider.py Major refactor of frame loading capabilities with modular design changes.
cvat/apps/engine/log.py Enhanced logging functionality for migration processes.
cvat/apps/engine/media_extractors.py New classes/methods for media processing and improved type annotations.
cvat/apps/engine/models.py Updated method signatures for chunk names/paths.
cvat/apps/engine/pyproject.toml New configuration settings for code formatting tools.
cvat/apps/engine/serializers.py Added a new read-only field to JobReadSerializer for additional information.
cvat/apps/engine/task.py Enhanced task processing logic for segment creation and job management.
cvat/apps/engine/tests/... Various tests updated for improved validation and functionality.
cvat/apps/lambda_manager/tests/test_lambda.py Refactored test code structure for better maintainability.
cvat/requirements/base.in Added comments for clarity on library dependencies.
cvat/schema.yml New property data_original_chunk_type added to enhance schema expressiveness.
tests/python/shared/utils/helpers.py Updated image/video functions and added a new utility for reading video files.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Server
    participant MediaCache
    participant FrameProvider

    User->>Server: Request Media
    Server->>MediaCache: Check Cache
    MediaCache-->>Server: Return Media (if available)
    Server->>FrameProvider: Retrieve Frames
    FrameProvider-->>Server: Return Frames
    Server-->>User: Deliver Media
Loading

🐰 In the meadow, changes bloom bright,
New features hop with pure delight!
With frames and tasks now dancing in line,
Our code, like carrots, is truly divine!
So celebrate with a joyful cheer,
For every fix brings us near! 🌿


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (1)
cvat/apps/engine/media_extractors.py (1)

Ensure all instantiations of VideoReader and VideoReaderWithManifest handle the new allow_threading parameter.

The following instances do not include the allow_threading parameter:

  • cvat/apps/engine/migrations/0024_auto_20191023_1025.py
  • cvat/apps/engine/cache.py

Please update these instances to correctly handle the allow_threading parameter.

Analysis chain

Line range hint 509-524:
LGTM! But verify the new parameter usage.

The addition of the allow_threading parameter enhances flexibility in video processing.

However, ensure that all usages of VideoReader correctly handle the new parameter.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all usages of `VideoReader` handle the new `allow_threading` parameter.

# Test: Search for the class usage. Expect: Only occurrences with the new parameter.
rg --type python -A 5 $'VideoReader'

Length of output: 66


Script:

#!/bin/bash
# Description: Verify all usages of `VideoReader` handle the new `allow_threading` parameter.

# Test: Search for the class usage. Expect: Only occurrences with the new parameter.
rg --type py -A 5 $'VideoReader'

Length of output: 5816


Script:

#!/bin/bash
# Description: Verify all instantiations of `VideoReader` handle the new `allow_threading` parameter.

# Check for instantiations of `VideoReader` and include context lines to verify parameter usage.
rg --type py 'VideoReader(' -A 3

Length of output: 109


Script:

#!/bin/bash
# Description: Verify all instantiations of `VideoReader` handle the new `allow_threading` parameter.

# Check for instantiations of `VideoReader` and include context lines to verify parameter usage.
rg --type py 'VideoReader\(' -A 3

Length of output: 852


Script:

#!/bin/bash
# Description: Verify all instantiations of `VideoReader` and `VideoReaderWithManifest` handle the new `allow_threading` parameter.

# Check for instantiations of `VideoReader` and `VideoReaderWithManifest` and include context lines to verify parameter usage.
rg --type py 'VideoReader\(' -A 3
rg --type py 'VideoReaderWithManifest\(' -A 3

Length of output: 1164

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6e30164 and ee3c905.

Files selected for processing (13)
  • cvat-core/src/frames.ts (10 hunks)
  • cvat-data/src/ts/cvat-data.ts (4 hunks)
  • cvat/apps/dataset_manager/bindings.py (4 hunks)
  • cvat/apps/dataset_manager/formats/cvat.py (3 hunks)
  • cvat/apps/engine/cache.py (3 hunks)
  • cvat/apps/engine/frame_provider.py (1 hunks)
  • cvat/apps/engine/media_extractors.py (15 hunks)
  • cvat/apps/engine/models.py (2 hunks)
  • cvat/apps/engine/pyproject.toml (1 hunks)
  • cvat/apps/engine/task.py (8 hunks)
  • cvat/apps/engine/views.py (10 hunks)
  • cvat/apps/lambda_manager/views.py (2 hunks)
  • dev/format_python_code.sh (1 hunks)
Files skipped from review due to trivial changes (2)
  • cvat/apps/engine/pyproject.toml
  • dev/format_python_code.sh
Additional comments not posted (69)
cvat-data/src/ts/cvat-data.ts (7)

103-103: Initialize the new startFrame parameter.

The startFrame parameter is correctly added to the class properties.


109-109: Include the new startFrame parameter in the constructor.

The startFrame parameter is correctly included in the constructor.


123-123: Set the startFrame parameter in the constructor.

The startFrame parameter is correctly set in the constructor.


209-209: Integrate the startFrame parameter in the frame method.

The startFrame parameter is correctly integrated into the calculation of chunkNumber.


268-268: Integrate the startFrame parameter in the startDecode method.

The startFrame parameter is correctly integrated into the calculation of chunkNumber.


Line range hint 1-1: Ensure proper error handling in decodeContextImages.

The function correctly handles errors by releasing the mutex and rejecting the promise.


Line range hint 1-1: Ensure proper resource management in decodeContextImages.

The function correctly manages resources by using a mutex to control access.

cvat/apps/engine/cache.py (11)

61-62: Ensure correct checksum computation in get_checksum.

The method correctly computes a checksum using zlib.crc32.


64-99: Ensure correct cache item handling in _get_or_set_cache_item.

The method correctly handles cache item creation and retrieval, including checksum validation.


100-107: Ensure correct cache retrieval in _get.

The method correctly retrieves an item from the cache and handles potential errors.


111-118: Ensure correct segment chunk handling in get_segment_chunk.

The method correctly retrieves or prepares a segment chunk.


121-128: Ensure correct selective job chunk handling in get_selective_job_chunk.

The method correctly retrieves or prepares a selective job chunk.


131-134: Ensure correct segment preview handling in get_or_set_segment_preview.

The method correctly retrieves or prepares a segment preview.


137-138: Ensure correct cloud preview handling in get_cloud_preview.

The method correctly retrieves a cloud preview.


140-143: Ensure correct cloud preview handling in get_or_set_cloud_preview.

The method correctly retrieves or prepares a cloud preview.


146-149: Ensure correct frame context image handling in get_frame_context_images.

The method correctly retrieves or prepares frame context images.


152-232: Ensure correct raw frame reading in _read_raw_frames.

The method correctly reads raw frames based on task and frame IDs.


234-286: Ensure correct segment chunk preparation in prepare_segment_chunk.

The method correctly prepares a segment chunk based on the segment type.

cvat/apps/engine/frame_provider.py (6)

41-73: Ensure correct implementation of _ChunkLoader class.

The class correctly defines an abstract interface for loading chunks of media data.


76-94: Ensure correct implementation of _FileChunkLoader class.

The class correctly handles file-based chunk loading.


96-108: Ensure correct implementation of _BufferChunkLoader class.

The class correctly handles buffer-based chunk loading.


128-206: Ensure correct implementation of IFrameProvider class.

The class correctly defines an abstract interface for frame providers.


209-377: Ensure correct implementation of TaskFrameProvider class.

The class correctly handles task-based frame provision.


379-524: Ensure correct implementation of SegmentFrameProvider class.

The class correctly handles segment-based frame provision.

cvat-core/src/frames.ts (7)

28-28: Typo correction: Renamed prefetchAnalizer to prefetchAnalyzer.

This change corrects a typographical error, improving code readability.


211-211: Added private field #startFrame to PrefetchAnalyzer.

This field is necessary for the new functionality involving the startFrame parameter.


213-216: Updated constructor to accept startFrame parameter.

This change is essential for initializing the new #startFrame field.


224-224: Updated currentChunk calculation to account for startFrame.

This change ensures that the prefetching logic correctly considers the starting frame.


230-230: Updated condition to use modified formula for currentChunk.

This change ensures that the prefetching logic correctly considers the starting frame.


272-272: Updated chunkNumber calculation to account for startFrame.

This change ensures that the chunk number calculation correctly considers the starting frame.


277-280: Updated nextChunkNumber calculation to account for startFrame.

This change ensures that the next chunk number calculation correctly considers the starting frame.

cvat/apps/engine/models.py (5)

259-267: Updated _get_chunk_name to include segment_id and return descriptive format.

This change improves the naming conventions and provides clearer context for the chunk names.


269-270: Updated _get_compressed_chunk_name to include segment_id.

This change ensures consistency with the updated chunk naming conventions.


272-273: Updated _get_original_chunk_name to include segment_id.

This change ensures consistency with the updated chunk naming conventions.


275-277: Renamed get_original_chunk_path to get_original_segment_chunk_path and added segment parameter.

This change aligns with the updated method signatures and provides clearer context for the parameters.


279-281: Renamed get_compressed_chunk_path to get_compressed_segment_chunk_path and added segment parameter.

This change aligns with the updated method signatures and provides clearer context for the parameters.

cvat/apps/lambda_manager/views.py (4)

35-35: Updated import statement to replace FrameProvider with FrameQuality and TaskFrameProvider.

This change indicates a shift towards a more structured approach in managing frame quality attributes.


483-485: Updated logic for determining quality to use FrameQuality.

This change enhances clarity and maintainability by encapsulating quality definitions within a dedicated class.


492-492: Updated instantiation of frame provider to TaskFrameProvider(db_task).

This change suggests a refactoring that may provide better context or functionality tied to the task rather than the data alone.


485-485: Updated condition to reference FrameQuality.ORIGINAL and FrameQuality.COMPRESSED.

This change improves the semantic accuracy of the code.

cvat/apps/engine/media_extractors.py (8)

55-57: LGTM!

The FrameQuality enumeration is well-defined and categorizes frame quality effectively.


104-137: LGTM!

The RandomAccessIterator class is well-implemented, allowing random access to elements in an iterable and managing resources effectively.


144-189: LGTM!

The CachingMediaIterator class is well-implemented, extending RandomAccessIterator and adding effective caching capabilities.


533-573: LGTM!

The _make_frame_iterator method is well-implemented, encapsulating the logic for frame iteration and handling optional filtering and rotation based on metadata.


589-605: LGTM!

The _read_av_container method ensures proper cleanup of resources, addressing potential memory leaks.


645-667: LGTM!

The get_frame_count method is well-implemented, providing the total frame count in the video.


670-679: LGTM!

The ImageReaderWithManifest class is well-implemented, providing manifest-based access to images.


680-754: LGTM!

The VideoReaderWithManifest class is well-implemented, providing manifest-based access to videos and handling threading and frame iteration effectively.

cvat/apps/engine/task.py (5)

Line range hint 125-173:
LGTM!

The _generate_segment_params function is well-implemented, generating segment parameters with named parameters and improving readability and maintainability.


176-200: LGTM!

The _create_segments_and_jobs function is well-implemented, focusing on segment and job creation and streamlining the process.


1134-1266: LGTM!

The _create_static_chunks function is well-implemented, encapsulating the logic for creating static chunks and improving modularity.


77-79: LGTM!

The SegmentParams class is well-implemented, providing a clearer structure for segment parameters with the addition of the type field and optional frames parameter.


1140-1154: LGTM!

The update_progress function is well-implemented, enhancing user feedback during long-running operations.

cvat/apps/dataset_manager/formats/cvat.py (2)

1374-1375: LGTM! Using the factory function improves modularity.

The use of make_frame_provider instead of directly instantiating FrameProvider enhances modularity and flexibility.


1380-1385: Verify the correctness of the frame retrieval process.

Ensure that the iterate_frames method correctly retrieves frames with the specified parameters.

cvat/apps/dataset_manager/bindings.py (5)

34-34: Import changes approved.

The import statement has been updated to reflect the new frame provider and enums.


1351-1361: Frame quality and output type changes approved.

The method now uses FrameQuality.ORIGINAL and FrameOutputType.NUMPY_ARRAY for video frames and FrameQuality.ORIGINAL and FrameOutputType.BUFFER for image frames, aligning with the new enums.


1369-1369: Frame provider update approved.

The method now uses TaskFrameProvider instead of FrameProvider, aligning with the new class introduced.


1505-1505: Frame extension update approved.

The method now uses TaskFrameProvider.VIDEO_FRAME_EXT to set the frame extension for video tasks, aligning with the new class and its properties.


1596-1596: Frame extension update approved.

The method now uses TaskFrameProvider.VIDEO_FRAME_EXT to set the frame extension for video tasks, aligning with the new class and its properties.

cvat/apps/engine/views.py (9)

631-641: Method Implementation: Ensure _TaskDataGetter usage

The preview method uses _TaskDataGetter for data retrieval. Ensure that _TaskDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _TaskDataGetter usage

The _TaskDataGetter class is correctly defined and used in the preview method of ProjectViewSet. The class extends _DataGetter and includes the necessary parameters.

  • _TaskDataGetter is defined in cvat/apps/engine/views.py.
  • It is used multiple times in the same file, including in the preview method.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_TaskDataGetter'

Length of output: 65


Script:

#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg '_TaskDataGetter' --glob '*.py' -A 5

Length of output: 1385


714-727: Constructor: Ensure db_task is a valid Task model instance

The constructor of _TaskDataGetter accepts db_task. Ensure that db_task is a valid instance of the Task model.

Verification successful

Constructor: Ensure db_task is a valid Task model instance

The constructor of _TaskDataGetter accepts db_task. Based on the extensive usage of self._db_task across the codebase, it is consistently assigned a valid Task model instance, particularly highlighted in backup.py where it is assigned from a Task model object.

  • cvat/apps/engine/backup.py: self._db_task = models.Task.objects.create(**self._manifest, organization_id=self._org_id)
  • cvat/apps/engine/backup.py: self._db_task = models.Task.objects.prefetch_related(...).get(pk=pk)
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `db_task` is a valid instance of the `Task` model.

# Test: Search for `db_task` usage. Expect: Valid instance of `Task`.
rg --type python 'self\._db_task'

Length of output: 66


Script:

#!/bin/bash
# Verify that `db_task` is a valid instance of the `Task` model.

# Test: Search for `db_task` usage. Expect: Valid instance of `Task`.
rg --glob '*.py' 'self\._db_task'

Length of output: 6864


661-664: Constructor: Ensure data_num type consistency

The constructor of _DataGetter accepts data_num as Optional[Union[str, int]]. Ensure that data_num is consistently handled as an integer within the class.

Verification successful

Constructor: Ensure data_num type consistency

The constructor of _DataGetter accepts data_num as Optional[Union[str, int]]. The search results confirm that data_num is consistently converted to an integer within the class, ensuring type consistency.

  • self.number = int(data_num) if data_num is not None else None
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `data_num` is consistently handled as an integer within `_DataGetter`.

# Test: Search for `data_num` usage. Expect: Consistent handling as an integer.
rg --type python 'self\.number'

Length of output: 64


Script:

#!/bin/bash
# Verify that `data_num` is consistently handled as an integer within `_DataGetter`.

# Test: Search for `data_num` usage. Expect: Consistent handling as an integer.
rg --type py 'self\.data_num'

Length of output: 31


Script:

#!/bin/bash
# Verify that `data_num` is consistently handled as an integer within `_DataGetter`.

# Test: Search for `data_num` usage in the entire file. Expect: Consistent handling as an integer.
rg 'data_num' cvat/apps/engine/views.py

Length of output: 848


730-743: Constructor: Ensure db_job is a valid Job model instance

The constructor of _JobDataGetter accepts db_job. Ensure that db_job is a valid instance of the Job model.

Verification successful

Constructor: Ensure db_job is a valid Job model instance

The constructor of _JobDataGetter accepts db_job. It has been verified that db_job is a valid instance of the Job model.

  • db_job is assigned using Job.objects.select_related, Job.objects.get, and models.Job.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `db_job` is a valid instance of the `Job` model.

# Test: Search for `db_job` usage. Expect: Valid instance of `Job`.
rg --type python 'self\._db_job'

Length of output: 65


Script:

#!/bin/bash
# Verify that `db_job` is a valid instance of the `Job` model.

# Test: Search for `self._db_job` usage. Expect: Valid instance of `Job`.
rg -t py 'self\._db_job'

Length of output: 1377


Script:

#!/bin/bash
# Verify the initialization or type assignment of `db_job` to ensure it is a valid instance of the `Job` model.

# Search for `db_job` assignments in the codebase to verify its type.
rg -t py 'db_job\s*='

Length of output: 1908


1997-2000: Method Implementation: Ensure _JobDataGetter usage

The data method uses _JobDataGetter for data retrieval. Ensure that _JobDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _JobDataGetter usage

The _JobDataGetter class is correctly used and implemented in the data method of JobViewSet. The class is properly defined with an __init__ method that initializes necessary parameters and a _get_frame_provider method that encapsulates the frame provider logic.

  • _JobDataGetter class definition and methods:
    • __init__ method initializes with db_job, data_type, data_quality, and data_num.
    • _get_frame_provider method returns a JobFrameProvider instance.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_JobDataGetter'

Length of output: 64


Script:

#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type-add 'py:*.py' '_JobDataGetter'

Length of output: 229


Script:

#!/bin/bash
# Retrieve the definition and methods of `_JobDataGetter` from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern 'class _JobDataGetter($_) { $$$ }'

Length of output: 524


Script:

#!/bin/bash
# Retrieve the definition and methods of `_JobDataGetter` from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern 'class _JobDataGetter($_): $$$'

Length of output: 932


742-743: Method Implementation: Ensure JobFrameProvider usage

The _get_frame_provider method returns an instance of JobFrameProvider. Ensure that JobFrameProvider is correctly used and implemented.

Verification successful

JobFrameProvider is correctly used and implemented

The JobFrameProvider class is correctly implemented and used within the codebase. It inherits from SegmentFrameProvider and is instantiated properly in the make_frame_provider function.

  • cvat/apps/engine/frame_provider.py:
    • Definition at line 537
    • Usage at line 546
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `JobFrameProvider` is correctly used and implemented.

# Test: Search for `JobFrameProvider` usage. Expect: Correct usage and implementation.
rg --type python 'JobFrameProvider'

Length of output: 66


Script:

#!/bin/bash
# Find all Python files and search for `JobFrameProvider` usage and implementation.
fd -e py -x grep -Hn 'JobFrameProvider' {}

Length of output: 423


Script:

#!/bin/bash
# Extract lines around the definition and usage of `JobFrameProvider` in `frame_provider.py`.
grep -A 10 -B 10 'class JobFrameProvider' ./cvat/apps/engine/frame_provider.py
grep -A 10 -B 10 'JobFrameProvider(data_source)' ./cvat/apps/engine/frame_provider.py

Length of output: 1575


2093-2098: Method Implementation: Ensure _JobDataGetter usage

The preview method uses _JobDataGetter for data retrieval. Ensure that _JobDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _JobDataGetter usage

The preview method uses _JobDataGetter for data retrieval. The _JobDataGetter class is correctly implemented and used in the preview method.

  • _JobDataGetter class is defined in cvat/apps/engine/views.py (line 730).
  • It inherits from _DataGetter and correctly implements necessary methods.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_JobDataGetter'

Length of output: 64


Script:

#!/bin/bash
# Verify that `_JobDataGetter` is correctly used and implemented.

# Test: Search for `_JobDataGetter` usage. Expect: Correct usage and implementation.
rg --type py '_JobDataGetter'

Length of output: 218


Script:

#!/bin/bash
# Extract the implementation of `_JobDataGetter` from `cvat/apps/engine/views.py`.

# Use `ast-grep` to get the class definition and its methods.
ast-grep --lang python --pattern $'class _JobDataGetter($_):\n    $$$' cvat/apps/engine/views.py

Length of output: 962


Script:

#!/bin/bash
# Extract the implementation of `_DataGetter` from `cvat/apps/engine/views.py`.

# Use `ast-grep` to get the class definition and its methods.
ast-grep --lang python --pattern $'class _DataGetter($_):\n    $$$' cvat/apps/engine/views.py

Length of output: 4104


1266-1269: Method Implementation: Ensure _TaskDataGetter usage

The data method uses _TaskDataGetter for data retrieval. Ensure that _TaskDataGetter is correctly used and implemented.

Verification successful

Method Implementation: Ensure _TaskDataGetter usage

The data method in TaskViewSet correctly uses _TaskDataGetter for data retrieval. The _TaskDataGetter class is properly implemented with necessary methods.

  • _TaskDataGetter class is defined in cvat/apps/engine/views.py and inherits from _DataGetter.
  • It includes an __init__ method and a _get_frame_provider method.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg --type python '_TaskDataGetter'

Length of output: 65


Script:

#!/bin/bash
# Verify that `_TaskDataGetter` is correctly used and implemented.

# Test: Search for `_TaskDataGetter` usage. Expect: Correct usage and implementation.
rg '_TaskDataGetter' --glob '*.py' -A 5

Length of output: 1385


Script:

#!/bin/bash
# Extract the full definition of `_TaskDataGetter` and its methods from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern $'class _TaskDataGetter(_DataGetter) {
  $$$
}' cvat/apps/engine/views.py

Length of output: 574


Script:

#!/bin/bash
# Extract the full definition of `_TaskDataGetter` and its methods from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
ast-grep --lang python --pattern 'class _TaskDataGetter(_DataGetter):' cvat/apps/engine/views.py -A 50

Length of output: 104


Script:

#!/bin/bash
# Extract the full definition of `_TaskDataGetter` and its methods from `cvat/apps/engine/views.py`.

# Search for the class definition and its methods.
rg 'class _TaskDataGetter' cvat/apps/engine/views.py -A 50

Length of output: 1637


726-727: Method Implementation: Ensure TaskFrameProvider usage

The _get_frame_provider method returns an instance of TaskFrameProvider. Ensure that TaskFrameProvider is correctly used and implemented.

Verification successful

TaskFrameProvider is correctly used and implemented

The TaskFrameProvider class is defined and implemented in the cvat/apps/engine/frame_provider.py file. The class includes methods for validating frame and chunk numbers, indicating that it adheres to the expected interface and functionality.

  • cvat/apps/engine/frame_provider.py:
    class TaskFrameProvider(IFrameProvider):
        def __init__(self, db_task: models.Task) -> None:
            self._db_task = db_task
    
        def validate_frame_number(self, frame_number: int) -> int:
            start = self._db_task.data.start_frame
            stop = self._db_task.data.stop_frame
            if frame_number not in range(start, stop + 1, self._db_task.data.get_frame_step()):
                raise ValidationError(
                    f"Invalid frame '{frame_number}'. "
                    f"The frame number should be in the [{start}, {stop}] range"
                )
            return frame_number
    
        def validate_chunk_number(self, chunk_number: int) -> int:
            last_chunk = math.ceil(self._db_task.data.size / self._db_task.data.chunk_size) - 1
            if not (0 <= chunk_number <= last_chunk):
                raise ValidationError(
                    f"Invalid chunk number '{chunk_number}'. "
                    f"The chunk number should be in the [0, {last_chunk}] range"
                )

This confirms that the TaskFrameProvider is correctly used and implemented in the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `TaskFrameProvider` is correctly used and implemented.

# Test: Search for `TaskFrameProvider` usage. Expect: Correct usage and implementation.
rg --type python 'TaskFrameProvider'

Length of output: 67


Script:

#!/bin/bash
# Find all Python files and search for `TaskFrameProvider` usage and implementation.

# Locate all Python files and search for `TaskFrameProvider`.
fd -e py -x rg 'TaskFrameProvider' {}

Length of output: 841


Script:

#!/bin/bash
# Extract the implementation details of `TaskFrameProvider`.

# Search for the definition and implementation of `TaskFrameProvider`.
ast-grep --lang python --pattern $'class TaskFrameProvider(IFrameProvider) {
  $$$
}'

Length of output: 558


Script:

#!/bin/bash
# Extract the implementation details of `TaskFrameProvider`.

# Search for the definition and implementation of `TaskFrameProvider`.
ast-grep --lang python --pattern 'class TaskFrameProvider(IFrameProvider):'

Length of output: 77


Script:

#!/bin/bash
# Search for the definition of `TaskFrameProvider` class.

# Use `rg` to find the class definition of `TaskFrameProvider`.
rg 'class TaskFrameProvider' -A 20

Length of output: 1730

cvat/apps/engine/views.py Outdated Show resolved Hide resolved
@zhiltsov-max zhiltsov-max changed the title Per job chunks Per segment chunks Aug 7, 2024
cvat/apps/engine/models.py Outdated Show resolved Hide resolved
@@ -0,0 +1,16 @@
# Copyright (C) 2024 CVAT.ai Corporation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the file? Why don't define the settings in the global settings file?

Copy link
Contributor Author

@zhiltsov-max zhiltsov-max Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can redefine these settings in the global settings file if you need it, this file defines the default values and guarantees the required settings are there.

@Marishka17
Copy link
Contributor

Marishka17 commented Sep 16, 2024

@zhiltsov-max, Mostly LGTM, but I've found 2 small bugs:

  • If I create a task with 4 images and start_frame 2, I'll receive a 400 code on the task preview request.
    image

  • I'm not sure whether it is related to the current PR or not, but I guess the progress bar can show an incorrect interval of loaded chunks/frames for a GT job (e.g. there were 2 frames 3 and 6 in the GT job):
    image

  • I see that now we will have a "3rd type" of created tasks on the server: tasks with images initially created with static chunks, migrated to dynamic chunks but without manifest file. I'm not sure right now whether there can be some problems with that or it becomes a waste of time. However, it is probably some kind of inconsistent state.

cvat/apps/engine/cache.py Outdated Show resolved Hide resolved
cvat/apps/engine/cache.py Show resolved Hide resolved
@zhiltsov-max
Copy link
Contributor Author

If I create a task with 4 images and start_frame 2, I'll receive a 400 code on the task preview request.

Fixed

I'm not sure whether it is related to the current PR or not, but I guess the progress bar can show an incorrect interval of loaded chunks/frames for a GT job (e.g. there were 2 frames 3 and 6 in the GT job):

Fixed

# TODO: find a way to use prefetched results, if provided
db_images = (
db_data.images.order_by("frame")
.filter(frame__gte=frame_ids[0], frame__lte=frame_ids[-1])
Copy link
Contributor

@Marishka17 Marishka17 Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should set chunk_size max limit and use here frame__in (in that case we do not need to iterate over requested_frame_iter and compare a frame_id with next_requested_frame_id). What do you think? (can be changed in a separate PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like using __in, as it can result in very big queries, which are processed slowly. And it is possible here, as chunk_size is only limited by the task size now. I think it can be done once we introduce a limit on the chunk_size, which is planned to be done in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like using __in, as it can result in very big queries, which are processed slowly.

I agree, but I also said that frame__in should be used only when we add a chunk_size limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I thought you're speaking about order by + limit in the resulting sql.

Copy link

sonarcloud bot commented Sep 24, 2024

@zhiltsov-max zhiltsov-max merged commit c9754a9 into develop Sep 24, 2024
34 checks passed
@cvat-bot cvat-bot bot mentioned this pull request Oct 1, 2024
@bsekachev bsekachev deleted the zm/job-chunks branch October 24, 2024 05:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants