-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a way to create task with custom jobs #5536
Conversation
@@ -153,7 +153,7 @@ def __init__(self, | |||
if not source_path: | |||
raise Exception('No image found') | |||
|
|||
if stop is None: | |||
if not stop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have you changed the condition? What is the reason to touch the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we import backup, there is the call to reconcile
, which chains to the c-tor. The stop_frame
is set to 0 after the default value of the DataSerializer
here
In the reconcile
args, stop_frame=0
makes little sense on its own, but it produces a dataset with only 1 frame.
The possible fixes were:
- clean input values to the
_create_thread
function when importing backups - replace the input value with
None
or remove it when importing backups (which contradicts the serializer) - change the default value in the serializer (which refers to the model)
I've decided to make it behave as None
, because 0 is the default value.
class JobFiles(serializers.ListField): | ||
""" | ||
Read JobFileMapping docs for more info. | ||
""" | ||
|
||
def __init__(self, *args, **kwargs): | ||
kwargs.setdefault('child', serializers.CharField(allow_blank=False, max_length=1024)) | ||
kwargs.setdefault('allow_empty', False) | ||
super().__init__(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a distinct class? Seems like it could be reduced to serializers.ListField(child=..., allow_empty=False)
and inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can be inlined. The class is introduced to encapsulate the logic behind parameters. I'd do it to most of the other complex fields too, but it's quite a big change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class is introduced to encapsulate the logic behind parameters.
What do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for using classes to know the internal structure or construction details of this class.
cvat/apps/engine/task.py
Outdated
if start_frame < 0: | ||
raise ValidationError( | ||
f"Failed to create segment: invalid start frame {start_frame}" | ||
) | ||
|
||
slogger.glob.info("New segment for task #{}: start_frame = {}, \ | ||
stop_frame = {}".format(db_task.id, start_frame, stop_frame)) | ||
if stop_frame >= db_task.data.size: | ||
raise ValidationError( | ||
f"Failed to create segment: stop frame {stop_frame} is beyond task size" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can either of these be possible? Shouldn't _get_task_segment_data
always return valid segments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should, but it's not a question for this function. Hopefully, they should never fire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they're never supposed to fire, shouldn't they be asserts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they can be asserts, currently. If new job creation methods appear, the checks may become more actual.
Upd: I've decided to remove the checks, because these conditions are checked in the tests and they are required mostly for development.
cvat/apps/engine/models.py
Outdated
@@ -205,6 +205,12 @@ class Data(models.Model): | |||
sorting_method = models.CharField(max_length=15, choices=SortingMethod.choices(), default=SortingMethod.LEXICOGRAPHICAL) | |||
deleted_frames = IntArrayField(store_sorted=True, unique_values=True) | |||
|
|||
# Avoid storing whole mapping here, its redundant | |||
custom_segments = models.BooleanField(default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be determined by checking if segment_size
is 0 on the task? Maybe we don't need this new field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought that this variable can be excluded. Basically, we'd need to check the segment size and the number of jobs != 1. However, it turned out that in the case of 1 job in the task, the cases are hard to distinguish. I've decided then to add a clear indicator instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have non-zero segment_size
when there's no custom mapping? I thought segment_size
defaulted to the number of frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have non-zero segment_size when there's no custom mapping?
Yes, it represents limitations on the size of a single job. The comment says about this field:
Zero means that there are no limits (default)
@zhiltsov-max , #4869 (related issue) |
if self._db_task.mode == 'annotation': | ||
files: Iterable[models.Image] = self._db_data.images.all().order_by('frame') | ||
segment_files = files[db_segment.start_frame : db_segment.stop_frame + 1] | ||
return {'files': list(frame.path for frame in segment_files)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant information? The backups already have a manifest, which records the frames in order; given that and the segment boundaries, you should be able to reconstruct the list of files in each segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backups have such manifest only in some cases. Also, it will be simpler to extend this parameter in future if it is saved separately. For instance, if jobs can contain arbitrary files or have overlaps.
@zhiltsov-max , could you please resolve conflicts? |
@@ -387,11 +474,17 @@ def _create_thread(db_task, data, isBackupRestore=False, isDatasetImport=False): | |||
media = _count_files(data) | |||
media, task_mode = _validate_data(media, manifest_files) | |||
|
|||
if job_file_mapping is not None and task_mode != 'annotation': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be a part of _validate_job_file_mapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task mode is unknown prior to the previous line. The _validate_job_file_mapping
call can be moved later, but it looks like it's better to fail as soon as possible in case of invalid parameters.
manifest = ImageManifestManager(db_data.get_manifest_path()) | ||
manifest.set_index() | ||
# Sort the files | ||
if (isBackupRestore and ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it is time to move the code into a separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is to do it in another PR, because it will simplify life in case of merge conflicts and failing tests.
Our conclusion after the internal discussion:
|
This PR adds an option to specify file to job mapping explicitly during task creation. This option is incompatible with most other job-related parameters like `sorting_method` and `frame_step`. - Added a new task creation parameter (`job_file_mapping`) to set a custom file to job mapping during task creation
Motivation and context
This PR adds an option to specify file to job mapping explicitly during task creation. This option is incompatible with most other job-related parameters like
sorting_method
andframe_step
.job_file_mapping
) to set a custom file to job mapping during task creationHow has this been tested?
Unit tests, manually
Checklist
develop
branchcvat-core, cvat-data and cvat-ui)
License
Feel free to contact the maintainers if that's a concern.