Optimize validation layout updates (#8789)


### Motivation and context

Fixes #8686

Includes #8689

- Optimized task validation layout updates
- Refactored `take_by` and `chunked_list` uses in the server code (see the batching sketch below)
- Fixed response values and reroll logic when both `disabled_frames` and `frame_selection_method` are used simultaneously in `PATCH /tasks/id/validation_layout`
- Fixed missing context image chunk cleanup on honeypot changes in jobs and tasks
- Fixed invalid context image chunk cache keys
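
For context on the batching refactor above: `take_by` yields fixed-size chunks from an iterable so that bulk database operations can be issued in batches (of 1000 in the server code). A minimal sketch of such a helper, assuming a keyword `chunk_size` argument as used in this PR; the actual implementation in `cvat/apps/engine/utils.py` may differ:

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence, TypeVar

_T = TypeVar("_T")

def take_by(iterable: Iterable[_T], chunk_size: int) -> Iterator[Sequence[_T]]:
    """Yield consecutive chunks with at most chunk_size items each."""
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

# Example: issue deletions in batches of 1000 instead of one huge query
for ids_chunk in take_by(range(2500), chunk_size=1000):
    print(len(ids_chunk))  # -> 1000, 1000, 500
```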

### How has this been tested?

### Checklist
- [ ] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/cvat-ai/cvat/blob/develop/LICENSE) that covers the project.
  Feel free to contact the maintainers if that's a concern.



## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Enhanced honeypot task functionality for improved validation frame selection and randomization.
  - Introduced a new class for managing frame selection, ensuring uniform usage across tasks.

- **Bug Fixes**
  - Improved error handling and validation checks in task management and annotation processes.

- **Tests**
  - Expanded test coverage for task creation, validation frame management, and annotation import/export, ensuring robust functionality.

- **Documentation**
  - Updated internal documentation to reflect new features and changes in task management processes.


---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Maria Khrustaleva <[email protected]>
3 people authored Dec 19, 2024
1 parent df230a4 commit b2a8c1b
Showing 13 changed files with 614 additions and 175 deletions.
@@ -0,0 +1,8 @@
### Fixed

- \[Server API\] Significantly improved performance of honeypot changes in tasks
(<https://github.com/cvat-ai/cvat/pull/8789>)
- \[Server API\] `PATCH tasks/id/validation_layout` responses now include correct
`disabled_frames` and handle simultaneous updates of
`disabled_frames` and honeypot frames correctly
(<https://github.com/cvat-ai/cvat/pull/8789>)
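
For illustration, a request that both disables validation frames and re-rolls honeypots in a single call might look like the sketch below. The host, token, task id and the `frame_selection_method` value are placeholders, not taken from this PR; consult the CVAT REST API schema for the authoritative field names and values.

```python
import requests

CVAT_URL = "https://cvat.example.com"  # placeholder host
TASK_ID = 42                           # placeholder task id

session = requests.Session()
session.headers["Authorization"] = "Token <api-token>"  # placeholder credentials

response = session.patch(
    f"{CVAT_URL}/api/tasks/{TASK_ID}/validation_layout",
    json={
        "disabled_frames": [3, 17],                  # disable these validation frames...
        "frame_selection_method": "random_uniform",  # ...and re-roll honeypots in the same request
    },
)
response.raise_for_status()

# Per this fix, the response reports the disabled frames that were actually applied
print(response.json().get("disabled_frames"))
```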
2 changes: 1 addition & 1 deletion cvat/apps/dataset_manager/bindings.py
@@ -50,7 +50,7 @@ class Attribute(NamedTuple):
value: Any

@classmethod
def add_prefetch_info(cls, queryset: QuerySet):
def add_prefetch_info(cls, queryset: QuerySet[Label]) -> QuerySet[Label]:
assert issubclass(queryset.model, Label)

return add_prefetch_fields(queryset, [
10 changes: 5 additions & 5 deletions cvat/apps/dataset_manager/task.py
@@ -20,7 +20,7 @@
from cvat.apps.engine import models, serializers
from cvat.apps.engine.plugins import plugin_decorator
from cvat.apps.engine.log import DatasetLogManager
from cvat.apps.engine.utils import chunked_list
from cvat.apps.engine.utils import take_by
from cvat.apps.events.handlers import handle_annotations_change
from cvat.apps.profiler import silk_profile

@@ -84,7 +84,7 @@ def merge_table_rows(rows, keys_for_merge, field_id):

class JobAnnotation:
@classmethod
def add_prefetch_info(cls, queryset: QuerySet, prefetch_images: bool = True):
def add_prefetch_info(cls, queryset: QuerySet[models.Job], prefetch_images: bool = True) -> QuerySet[models.Job]:
assert issubclass(queryset.model, models.Job)

label_qs = add_prefetch_fields(models.Label.objects.all(), [
@@ -530,13 +530,13 @@ def _delete(self, data=None):
self.ir_data.shapes = data['shapes']
self.ir_data.tracks = data['tracks']

for labeledimage_ids_chunk in chunked_list(labeledimage_ids, chunk_size=1000):
for labeledimage_ids_chunk in take_by(labeledimage_ids, chunk_size=1000):
self._delete_job_labeledimages(labeledimage_ids_chunk)

for labeledshape_ids_chunk in chunked_list(labeledshape_ids, chunk_size=1000):
for labeledshape_ids_chunk in take_by(labeledshape_ids, chunk_size=1000):
self._delete_job_labeledshapes(labeledshape_ids_chunk)

for labeledtrack_ids_chunk in chunked_list(labeledtrack_ids, chunk_size=1000):
for labeledtrack_ids_chunk in take_by(labeledtrack_ids, chunk_size=1000):
self._delete_job_labeledtracks(labeledtrack_ids_chunk)

deleted_data = {
68 changes: 57 additions & 11 deletions cvat/apps/engine/cache.py
@@ -37,13 +37,13 @@
import PIL.Image
import PIL.ImageOps
import rq
from rq.job import JobStatus as RQJobStatus
from django.conf import settings
from django.core.cache import caches
from django.db import models as django_models
from django.utils import timezone as django_tz
from redis.exceptions import LockError
from rest_framework.exceptions import NotFound, ValidationError
from rq.job import JobStatus as RQJobStatus

from cvat.apps.engine import models
from cvat.apps.engine.cloud_provider import (
@@ -65,7 +65,12 @@
load_image,
)
from cvat.apps.engine.rq_job_handler import RQJobMetaField
from cvat.apps.engine.utils import CvatChunkTimestampMismatchError, get_rq_lock_for_job, md5_hash
from cvat.apps.engine.utils import (
CvatChunkTimestampMismatchError,
format_list,
get_rq_lock_for_job,
md5_hash,
)
from utils.dataset_manifest import ImageManifestManager

slogger = ServerLogManager(__name__)
@@ -91,7 +96,8 @@ def enqueue_create_chunk_job(
# Enqueue the job if the chunk was deleted but the RQ job still exists.
# This can happen in cases involving jobs with honeypots and
# if the job wasn't collected by the requesting process for any reason.
rq_job.get_status(refresh=False) in {RQJobStatus.FINISHED, RQJobStatus.FAILED, RQJobStatus.CANCELED}
rq_job.get_status(refresh=False)
in {RQJobStatus.FINISHED, RQJobStatus.FAILED, RQJobStatus.CANCELED}
):
rq_job = queue.enqueue(
create_callback,
@@ -275,11 +281,12 @@ def _create_cache_item(
return item

def _delete_cache_item(self, key: str):
try:
self._cache().delete(key)
slogger.glob.info(f"Removed chunk from the cache: key {key}")
except pickle.UnpicklingError:
slogger.glob.error(f"Failed to remove item from the cache: key {key}", exc_info=True)
self._cache().delete(key)
slogger.glob.info(f"Removed the cache key {key}")

def _bulk_delete_cache_items(self, keys: Sequence[str]):
self._cache().delete_many(keys)
slogger.glob.info(f"Removed the cache keys {format_list(keys)}")

def _get_cache_item(self, key: str) -> Optional[_CacheItem]:
try:
@@ -350,8 +357,8 @@ def _make_segment_task_chunk_key(
) -> str:
return f"{self._make_cache_key_prefix(db_obj)}_task_chunk_{chunk_number}_{quality}"

def _make_context_image_preview_key(self, db_data: models.Data, frame_number: int) -> str:
return f"context_image_{db_data.id}_{frame_number}_preview"
def _make_frame_context_images_chunk_key(self, db_data: models.Data, frame_number: int) -> str:
return f"context_images_{db_data.id}_{frame_number}"

@overload
def _to_data_with_mime(self, cache_item: _CacheItem) -> DataWithMime: ...
@@ -474,6 +481,45 @@ def remove_segment_chunk(
self._make_chunk_key(db_segment, chunk_number=chunk_number, quality=quality)
)

def remove_context_images_chunk(self, db_data: models.Data, frame_number: str) -> None:
self._delete_cache_item(
self._make_frame_context_images_chunk_key(db_data, frame_number=frame_number)
)

def remove_segments_chunks(self, params: Sequence[dict[str, Any]]) -> None:
"""
Removes several segment chunks from the cache.
The function expects a sequence of remove_segment_chunk() parameters as dicts.
"""
# TODO: add a version of this function
# that removes related cache elements as well (context images, previews, ...)
# to provide encapsulation

# TODO: add a generic bulk cleanup function for different objects, including related ones
# (likely a bulk key aggregator should be used inside to reduce requests count)

keys_to_remove = []
for item_params in params:
db_obj = item_params.pop("db_segment")
keys_to_remove.append(self._make_chunk_key(db_obj, **item_params))

self._bulk_delete_cache_items(keys_to_remove)

def remove_context_images_chunks(self, params: Sequence[dict[str, Any]]) -> None:
"""
Removes several context image chunks from the cache.
The function expects a sequence of remove_context_images_chunk() parameters as dicts.
"""

keys_to_remove = []
for item_params in params:
db_obj = item_params.pop("db_data")
keys_to_remove.append(self._make_frame_context_images_chunk_key(db_obj, **item_params))

self._bulk_delete_cache_items(keys_to_remove)

def get_cloud_preview(self, db_storage: models.CloudStorage) -> Optional[DataWithMime]:
return self._to_data_with_mime(self._get_cache_item(self._make_preview_key(db_storage)))

@@ -494,7 +540,7 @@ def get_or_set_frame_context_images_chunk(
) -> DataWithMime:
return self._to_data_with_mime(
self._get_or_set_cache_item(
self._make_context_image_preview_key(db_data, frame_number),
self._make_frame_context_images_chunk_key(db_data, frame_number),
Callback(
callable=self.prepare_context_images_chunk,
args=[db_data, frame_number],
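
The `_bulk_delete_cache_items()` helper above is central to the speed-up: honeypot updates used to remove chunk keys one cache round-trip at a time, while the new bulk methods collect all affected keys and remove them in a single `delete_many()` call. A minimal standalone sketch of the difference, assuming a configured Django cache backend and illustrative key names:

```python
from django.core.cache import cache

# Illustrative keys only; real chunk keys are produced by _make_chunk_key() and friends
keys = [f"segment_10_chunk_{i}_compressed" for i in range(100)]

# Before: one backend request per key
for key in keys:
    cache.delete(key)

# After: a single backend request for all keys
cache.delete_many(keys)
```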
5 changes: 2 additions & 3 deletions cvat/apps/engine/cloud_provider.py
@@ -21,7 +21,6 @@
from botocore.client import Config
from botocore.exceptions import ClientError
from botocore.handlers import disable_signing
from datumaro.util import take_by # can be changed to itertools.batched after migration to python3.12
from django.conf import settings
from google.cloud import storage
from google.cloud.exceptions import Forbidden as GoogleCloudForbidden
@@ -32,7 +31,7 @@

from cvat.apps.engine.log import ServerLogManager
from cvat.apps.engine.models import CloudProviderChoice, CredentialsTypeChoice
from cvat.apps.engine.utils import get_cpu_number
from cvat.apps.engine.utils import get_cpu_number, take_by
from cvat.utils.http import PROXIES_FOR_UNTRUSTED_URLS

class NamedBytesIO(BytesIO):
Expand Down Expand Up @@ -242,7 +241,7 @@ def bulk_download_to_memory(
threads_number = normalize_threads_number(threads_number, len(files))

with ThreadPoolExecutor(max_workers=threads_number) as executor:
for batch_links in take_by(files, count=threads_number):
for batch_links in take_by(files, chunk_size=threads_number):
yield from executor.map(func, batch_links)

def bulk_download_to_dir(
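
The `bulk_download_to_memory()` hunk above only switches from Datumaro's `take_by(..., count=...)` to the in-house helper with a `chunk_size` argument; the download pattern itself is unchanged. As a standalone sketch of that pattern — batch the files by the number of worker threads and map each batch over a thread pool — with a stand-in download function rather than CVAT's actual cloud-provider code:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterator, Sequence

def download_many(urls: Sequence[str], threads_number: int = 4) -> Iterator[bytes]:
    def fake_download(url: str) -> bytes:
        # Stand-in for a real per-file download request
        return f"contents of {url}".encode()

    it = iter(urls)
    with ThreadPoolExecutor(max_workers=threads_number) as executor:
        # Process files in batches of `threads_number`, yielding results in order
        while batch := list(islice(it, threads_number)):
            yield from executor.map(fake_download, batch)

print(list(download_many([f"file_{i}.jpg" for i in range(5)], threads_number=2)))
```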
2 changes: 1 addition & 1 deletion cvat/apps/engine/frame_provider.py
@@ -31,7 +31,6 @@
import av
import cv2
import numpy as np
from datumaro.util import take_by
from django.conf import settings
from PIL import Image
from rest_framework.exceptions import ValidationError
@@ -46,6 +45,7 @@
ZipReader,
)
from cvat.apps.engine.mime_types import mimetypes
from cvat.apps.engine.utils import take_by

_T = TypeVar("_T")

35 changes: 32 additions & 3 deletions cvat/apps/engine/models.py
@@ -12,7 +12,7 @@
import uuid
from enum import Enum
from functools import cached_property
from typing import Any, ClassVar, Collection, Dict, Optional
from typing import Any, ClassVar, Collection, Dict, Optional, Sequence

from django.conf import settings
from django.contrib.auth.models import User
@@ -27,7 +27,7 @@

from cvat.apps.engine.lazy_list import LazyList
from cvat.apps.engine.model_utils import MaybeUndefined
from cvat.apps.engine.utils import chunked_list, parse_specific_attributes
from cvat.apps.engine.utils import parse_specific_attributes, take_by
from cvat.apps.events.utils import cache_deleted


@@ -276,6 +276,11 @@ class ValidationLayout(models.Model):
disabled_frames = IntArrayField(store_sorted=True, unique_values=True)
"Stores task frame numbers of the disabled (deleted) validation frames"

@property
def active_frames(self) -> Sequence[int]:
"An ordered sequence of active (non-disabled) validation frames"
return sorted(set(self.frames).difference(self.disabled_frames))

class Data(models.Model):
MANIFEST_FILENAME: ClassVar[str] = 'manifest.jsonl'

@@ -426,7 +431,7 @@ def touch(self) -> None:

@transaction.atomic(savepoint=False)
def clear_annotations_in_jobs(job_ids):
for job_ids_chunk in chunked_list(job_ids, chunk_size=1000):
for job_ids_chunk in take_by(job_ids, chunk_size=1000):
TrackedShapeAttributeVal.objects.filter(shape__track__job_id__in=job_ids_chunk).delete()
TrackedShape.objects.filter(track__job_id__in=job_ids_chunk).delete()
LabeledTrackAttributeVal.objects.filter(track__job_id__in=job_ids_chunk).delete()
@@ -436,6 +441,30 @@ def clear_annotations_in_jobs(job_ids):
LabeledImageAttributeVal.objects.filter(image__job_id__in=job_ids_chunk).delete()
LabeledImage.objects.filter(job_id__in=job_ids_chunk).delete()

@transaction.atomic(savepoint=False)
def clear_annotations_on_frames_in_honeypot_task(db_task: Task, frames: Sequence[int]):
if db_task.data.validation_mode != ValidationMode.GT_POOL:
# Tracks are prohibited in honeypot tasks
raise AssertionError

for frames_batch in take_by(frames, chunk_size=1000):
LabeledShapeAttributeVal.objects.filter(
shape__job_id__segment__task_id=db_task.id,
shape__frame__in=frames_batch,
).delete()
LabeledShape.objects.filter(
job_id__segment__task_id=db_task.id,
frame__in=frames_batch,
).delete()
LabeledImageAttributeVal.objects.filter(
image__job_id__segment__task_id=db_task.id,
image__frame__in=frames_batch,
).delete()
LabeledImage.objects.filter(
job_id__segment__task_id=db_task.id,
frame__in=frames_batch,
).delete()

class Project(TimestampedModel):
name = SafeCharField(max_length=256)
owner = models.ForeignKey(User, null=True, blank=True,

