Validation API core #8348

zhiltsov-max · 2024-08-26T16:02:23Z

Motivation and context

Depends on #8272
Depends on #8321

Added server API for creation of a GT job on task creation
Added server support for task creation with GT pool (aka Honeypot)
Added new GT job frame selection method random_per_job, which guarantees each annotation job gets the specified GT overlap, making each annotation job validatable
Added new GT job frame count selection options based on task size % and segment size %
Changed GT job creation parameter "frames" to accept relative frame ids instead of absolute (source data) ones
Allowed frame deletion in GT jobs. Deleted GT frames are considered excluded from validation, so they should not appear in quality reports. Removing a frame from a simple GT job (in tasks without honeypots) removes it only from the GT job, not from the task.
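As a rough illustration of what random_per_job is meant to guarantee, the sketch below samples a fixed number of GT frames independently inside every job's frame range, so no job is left without validation frames. The function name, its parameters, and the representation of jobs as inclusive (start, stop) pairs are assumptions made for this example; this is not the server implementation.

```python
import random


def pick_gt_frames_per_job(job_ranges, frames_per_job, seed=None):
    """Pick `frames_per_job` GT frames inside each job's frame range.

    Hypothetical helper: `job_ranges` is a list of (start, stop) pairs
    with inclusive bounds, expressed in relative (task-level) frame ids.
    """
    rng = random.Random(seed)
    selected = []
    for start, stop in job_ranges:
        job_frames = range(start, stop + 1)
        if frames_per_job > len(job_frames):
            raise ValueError(
                f"Job {start}-{stop} has fewer than {frames_per_job} frames"
            )
        # Sampling inside each job is what gives every job its GT overlap
        selected.extend(sorted(rng.sample(job_frames, frames_per_job)))
    return selected


job_ranges = [(0, 9), (10, 19), (20, 24)]
gt_frames = pick_gt_frames_per_job(job_ranges, frames_per_job=2, seed=42)
```

By contrast, drawing the same total number of frames uniformly over the whole task can leave a short job with no GT frames at all.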

Server API changes:

GET /api/tasks/{id}/ got a new validation_mode field, reflecting the current validation configuration (immutable)
POST /api/tasks/{id}/data got a new validation_params field, which allows enabling GT / GT_POOL validation for a task on its creation
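For orientation, a request body using the new field might be assembled as in the sketch below. Only the validation_params field name, the GT / GT_POOL modes, and the random_per_job method come from this description. The nested key names, the lowercase mode spelling, and the helper itself are assumptions and should be checked against the API schema before use.

```python
import json


def build_data_request(mode, frames_per_job_share):
    """Build a hypothetical body for POST /api/tasks/{id}/data.

    The nested key names below are assumptions made for illustration,
    not a statement of the actual schema.
    """
    if mode not in ("gt", "gt_pool"):
        raise ValueError(f"Unsupported validation mode: {mode}")
    if not 0 < frames_per_job_share <= 1:
        raise ValueError("frames_per_job_share must be in (0, 1]")
    return {
        "image_quality": 70,
        "validation_params": {
            "mode": mode,  # assumed spelling of GT / GT_POOL
            "frame_selection_method": "random_per_job",  # assumed key name
            "frames_per_job_share": frames_per_job_share,  # assumed key name
        },
    }


body = build_data_request("gt", 0.1)
print(json.dumps(body, indent=2))
```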

Tasks with Honeypots

This validation mode changes how the task is built, so it can only be enabled at task creation and cannot be disabled or changed afterwards. When honeypots are configured, each job in the task gets several extra validation frames.
The pool of available frames and the number of validation frames per job are specified by the user at task creation.
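A minimal sketch of the idea, assuming frames are plain integer ids and every job draws its honeypots from one shared pool. The function name, the parameters, and the shuffling inside each job are illustrative assumptions rather than the actual server logic.

```python
import random


def build_honeypot_jobs(regular_frames, pool_frames, job_size,
                        honeypots_per_job, seed=None):
    """Split regular frames into jobs and mix pool frames into each job.

    Hypothetical sketch: the same validation pool serves every job.
    """
    if honeypots_per_job > len(pool_frames):
        raise ValueError("Not enough frames in the validation pool")
    rng = random.Random(seed)
    shuffled = list(regular_frames)
    rng.shuffle(shuffled)  # honeypot tasks use random frame ordering
    jobs = []
    for i in range(0, len(shuffled), job_size):
        job = shuffled[i:i + job_size]
        # Each job gets its own sample of validation frames from the pool
        job += rng.sample(pool_frames, honeypots_per_job)
        rng.shuffle(job)  # hide the honeypots among the regular frames
        jobs.append(job)
    return jobs


pool = [100, 101, 102, 103]
jobs = build_honeypot_jobs(range(10), pool, job_size=5,
                           honeypots_per_job=2, seed=0)
```

Here each of the two jobs ends up with five regular frames plus two frames from the pool, and the same pool frame may appear in more than one job.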

Limitations:

This validation mode can only be used with random frame ordering.
This implies that job_frame_mapping and overlap cannot be used in such tasks.
Track annotations are prohibited in tasks with honeypots enabled.

Honeypot frames and GT annotations are accessible via the GT job, as with regular GT jobs. However, unlike regular tasks with GT jobs, in tasks with honeypots task annotation import affects the GT job as well. For validation frames, task annotation export contains only the GT annotations (that is, only the GT copy of each validation frame is included).

How has this been tested?

Checklist

I submit my changes into the develop branch
I have created a changelog fragment
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues (see GitHub docs)
I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

New Features
- Introduced a server setting to disable media chunks on the local filesystem, enhancing configurability.
- Added tracking for the last assignee update date in quality reports, improving task management.
- Enhanced job chunk identifiers for better clarity and uniqueness.
Bug Fixes
- Resolved memory management issues and refined job assignment logic in video processing.
Documentation
- Updated API schema with new enhancements related to job management and validation processes.
Chores
- Updated package dependencies and added new configuration settings for Redis in the Helm chart.

cvat-ui/src/actions/tasks-actions.ts

cvat-ui/src/components/create-task-page/create-task-content.tsx

cvat-ui/src/components/quality-control/task-quality/allocation-table.tsx

bsekachev · 2024-10-01T07:44:19Z

cvat/apps/engine/models.py

+    )
+    path = models.CharField(max_length=1024, default='')
+
+class ValidationLayout(models.Model):


What is the reason to bind ValidationParams and ValidationLayout to Data instead of Task model?

Could you explain the name ValidationLayout? To me it feels more like a ValidationPool.

Layout:

Merriam-Webster

the plan or design or arrangement of something laid out

Wiki

In general terms, a layout is a structured arrangement of items within certain limits, or a plan for such arrangement. Specifically, layout may refer to: Page layout, the arrangement of visual elements on a page.

It's used to describe validation frames in tasks, both for simple GT and for Honeypots. That's why it doesn't have pool in the name.

What is the reason to bind ValidationParams and ValidationLayout to Data instead of Task model?

This is made to be the same as storing deleted_frames in the Data model. Basically, it describes task data.

Layout sounds like a set of elements and their relation to each other.
But in our case it is just a couple of sets. Pool would describe it well, and it applies to both the GT job and the Honeypot job.

However, if you do not want to use the word Pool, that is okay; it's up to you.

This is made to be the same as storing deleted_frames in the Data model.

I can't really understand the explanation. However, in the future this design may become a problem if we want to use the same Data object to create multiple tasks (it is not certain that we will, but still).

Never mind. Given the existing database layout, a feature like "import raw data, select some of it, and create tasks from the selection" is already not implementable without new database classes.

bsekachev · 2024-10-01T08:01:28Z

cvat/apps/engine/migrations/0084_honeypot_support.py

+    elif db_segment.type == "specific_frames":
+        frame_set = set(frame_range).intersection(db_segment.frames or [])
+    else:
+        raise ValueError(f"Unknown segment type: {db_segment.type}")


I am not sure that raising an uncaught exception is a good idea in a migration file

On our prod we only have specific_frames defined, so it will not be a problem

The reason is to fail the migration if the DB contains invalid entries. We don't know why they are there and what to do with them.
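To make the fail-fast intent concrete, here is a self-contained sketch of the branch from the diff above. SegmentStub is a hypothetical stand-in for the DB row, and both the "range" branch and the way frame_range is built are assumptions, since the diff only shows the specific_frames and else branches.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SegmentStub:
    """Hypothetical stand-in for a DB segment row used in the migration."""
    type: str
    start_frame: int
    stop_frame: int
    frames: Optional[List[int]] = None


def get_segment_frame_set(db_segment: SegmentStub) -> Set[int]:
    # Assumed: the segment covers an inclusive range with step 1
    frame_range = range(db_segment.start_frame, db_segment.stop_frame + 1)
    if db_segment.type == "range":  # assumed first branch
        frame_set = set(frame_range)
    elif db_segment.type == "specific_frames":
        frame_set = set(frame_range).intersection(db_segment.frames or [])
    else:
        # Fail the whole migration rather than guess at invalid rows
        raise ValueError(f"Unknown segment type: {db_segment.type}")
    return frame_set
```

With Django's default atomic migrations on PostgreSQL, an exception raised here should roll back the migration's changes instead of leaving it half-applied.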

bsekachev · 2024-10-01T11:17:42Z

Can't create a task with a honeypot job and context images:

Traceback (most recent call last):
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
    rv = job.perform()
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
    self._result = self._execute()
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/cvat/cvat/apps/engine/task.py", line 1347, in _create_thread
    models.RelatedFile.objects.bulk_create(db_related_files)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/query.py", line 803, in bulk_create
    returned_columns = self._batched_insert(
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/query.py", line 1831, in _batched_insert
    self._insert(
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 102, in execute
    return super().execute(sql, params)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/bsekachev/app.cvat.ai/cvat_enterprise/.env/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "engine_relatedfile_data_id_path_a7223d1e_uniq"
DETAIL:  Key (data_id, path)=(5, /home/bsekachev/app.cvat.ai/cvat_enterprise/cvat/data/data/5/raw/context_images example/related_images/3Z2A3692_jpg/3Z2A3692.jpg) already exists.

zhiltsov-max · 2024-10-01T16:41:49Z

@bsekachev

Can't create a task with a honeypot job and context images:

Should be fixed now.
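For readers following the traceback: the error means that two rows with the same (data_id, path) pair reached a single bulk_create call. The thread does not describe the fix (the commit list mentions migrating related files to an m2m relationship), so the sketch below only illustrates the failure mode by collapsing duplicate pairs before insertion. The helper and the dict-based records are hypothetical.

```python
def unique_related_files(records):
    """Keep the first record for each (data_id, path) pair.

    Hypothetical helper: `records` are dicts with `data_id` and `path`
    keys, standing in for unsaved RelatedFile rows.
    """
    unique = {}
    for record in records:
        key = (record["data_id"], record["path"])
        # setdefault keeps the first occurrence and preserves input order
        unique.setdefault(key, record)
    return list(unique.values())


rows = [
    {"data_id": 5, "path": "related_images/a/a.jpg"},
    {"data_id": 5, "path": "related_images/a/a.jpg"},  # duplicate pair
    {"data_id": 5, "path": "related_images/b/b.jpg"},
]
deduplicated = unique_related_files(rows)
```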

sonarcloud · 2024-10-02T18:25:06Z

Quality Gate passed

Issues
33 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.9% Duplication on New Code

See analysis details on SonarCloud

zhiltsov-max added 30 commits August 5, 2024 16:49

Remove the checksum field

8d710e7

Be consistent about returned task chunk types (allow video chunks)

654a827

Support iterator input in video chunk writing

12e5f2a

Fix type annotation

a79a681

Refactor video reader memory leak fix, add to reader with manifest

d5118a2

Disable threading in video reading in frame provider

1b429cf

Fix keyframe search

d512312

Return frames as generator in dynamic chunk creation

167ee12

Update chunk requests in UI

88a9cb2

Update cache indices in FrameDecoder, enable video play

30bf8fd

Fix frame retrieval for video

ee3c905

Fix frame reading in updated dynamic cache building

dc03220

Fix invalid frame quality

4bb8a74

Fix video reading in media_extractors - exception handling, frame mis…

f7d2c4c

…matches

Allow disabling static chunks, add seamless switching

34d9ca0

Extend code formatting

8c97967

Rename function argument

a0fd0ba

Rename configuration parameter

c0480c9

Add av version comment

5caf283

Refactor av video reading

efbe3a0

Fix manifest access

fb1284d

Add migration

8edcfc5

Update downloading from cloud storage for packed data in task creation

51a7f83

Merge branch 'develop' into zm/job-chunks

5a2a746

Update changelog

65e4174

Merge remote-tracking branch 'origin/zm/job-chunks' into zm/job-chunks

61f1735

Update migration name

34f972f

Polish some code

2bb2b17

Fix frame retrieval by id

3788917

Remove extra import

f695ae1

zhiltsov-max added 3 commits September 30, 2024 18:01

Remove extra db call

23bfc2c

Improve error message

d4bc318

Add field description in the api

57f1d71

bsekachev reviewed Oct 1, 2024

zhiltsov-max added 6 commits October 1, 2024 15:34

Merge remote-tracking branch 'origin/develop' into zm/validation-core

01cd715

Merge test db with develop

1e8433c

Update test assets

89084d7

Migrate to m2m relationship for related files

575c921

Update test db

4c8eb44

Clean up imports

565207d

zhiltsov-max and others added 2 commits October 1, 2024 23:42

Fix failing test

079038a

Updated client part

2a674af

zhiltsov-max mentioned this pull request Oct 2, 2024

Use exceptions instead of HTML responses with error statuses #8499

Closed

3 tasks

zhiltsov-max and others added 2 commits October 2, 2024 12:18

Add related field declaration in Image and Data models

9d9b007

Fixed warning

d71b5df

bsekachev approved these changes Oct 2, 2024

zhiltsov-max added 6 commits October 2, 2024 15:01

Improve type annotations for engine models

fd6c1cf

Fix random_seed for honeypots

836012e

Merge remote-tracking branch 'origin/zm/validation-core' into zm/vali…

07fd912

…dation-core

Fix honeypot frame selection

3dbd769

Update tests

47ff330

Merge branch 'develop' into zm/validation-core

45e1b4b

bsekachev merged commit 1285858 into develop Oct 3, 2024
34 checks passed

zhiltsov-max mentioned this pull request Oct 4, 2024

Fix task creation with gt job and gt job frame access #8510

Merged

7 tasks

cvat-bot bot mentioned this pull request Oct 10, 2024

Release v2.21.0 #8527

Merged

bsekachev deleted the zm/validation-core branch October 24, 2024 05:15
