
Support frame selection when create from video #437

Merged: 1 commit, May 30, 2019

Conversation

@zliang7 (Contributor) commented Apr 30, 2019

@nmanovic PTAL

@nmanovic (Contributor) commented:

@zliang7 , thank you for the merge request. I hope we will merge the updated documentation today. After that I will create a new release of CVAT, and then review and merge your PR. Please expect delays because we have a lot of public holidays in Russia; I will try to review the patch ASAP in any case.

@nmanovic nmanovic added this to the 0.5.0 - Alpha milestone Apr 30, 2019
@nmanovic nmanovic requested review from bsekachev and nmanovic April 30, 2019 12:36
@nmanovic (Contributor) commented May 5, 2019

@zliang7 ,

Could you please look at the comments from Boris Sekachev? My comments are below:

  • First of all, I don't like that we have start_frame, stop_frame, and step in several places (the Video and Task tables). I would prefer to keep them only in the Task table.

  • Instead of a step, I would prefer a more general concept: frame_filter. For now you would implement only the "step" filter (e.g. step=1). In the future we can add a filter by name, a filter which selects only very different frames, or a visual search like "frames which contain a car" (if we have a good classifier).

  • To me, a video file is a collection of images, just like a directory with images. That is why I don't understand why you implemented the concept only for video files. Imagine you have a big directory on "share": instead of choosing every 15th frame yourself, you could just say you want to annotate the images with step 15.
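The "step" idea in the bullets above can be sketched in a few lines of Python. This is a hypothetical illustration, not CVAT code; the helper names are invented, but it shows how the same filter applies uniformly to a video's frame range and to a directory of image files:

```python
# Hypothetical helpers (not CVAT's actual code) showing a 'step' filter
# applied both to a video frame range and to a listing of image files.
def select_frames(start, stop, step):
    """Return the original frame numbers kept by a 'step' filter."""
    return list(range(start, stop + 1, step))

def select_images(paths, step):
    """The same idea applied to a directory listing of image files."""
    return paths[::step]

select_frames(0, 60, 15)   # -> [0, 15, 30, 45, 60]
select_images(["a.jpg", "b.jpg", "c.jpg", "d.jpg"], 2)  # -> ["a.jpg", "c.jpg"]
```

This is the unification point made in the last bullet: whether the frames come from a video or from "share", selecting every Nth one is the same operation.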

@zliang7 (Contributor, Author) commented May 6, 2019

> First of all, I don't like that we have start_frame, stop_frame, and step in several places (the Video and Task tables). I would prefer to keep them only in the Task table.

I tried to reuse the fields in the Video table, but it seems every parameter in the POST request needs a field in the Task table. If I didn't add these fields, the request failed.

> Instead of a step, I would prefer a more general concept: frame_filter. For now you would implement only the "step" filter (e.g. step=1).

This PR implements the 'step' feature; it only affects the generation of the ffmpeg command line, nothing else. I think supporting sophisticated filters is not an easy task, especially "frames which contain a car". How would that be expressed in the UI and in the internal data structures? I think it needs a serious design.

> To me, a video file is a collection of images, just like a directory with images. Instead of choosing every 15th frame yourself, you could just say you want to annotate the images with step 15.

I don't quite understand. A video is a single file that contains many continuous images, so it makes sense to pick out the necessary ones. But image files are already individual files, so why put unnecessary image files into "share" in the first place? Many file utilities can do that task. That said, I'm OK with it if you do think it makes sense.
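The "generation of the command line of ffmpeg" mentioned above can be illustrated with a small sketch. The `select` filter and `-vsync vfr` are real ffmpeg options, but the helper function and output pattern are hypothetical and this is not necessarily the exact command the patch generates:

```python
# Hypothetical sketch of building an ffmpeg command line for a 'step'
# filter; not necessarily the command CVAT actually generates.
def ffmpeg_step_command(video_path, step, out_pattern="frame_%05d.jpg"):
    """Build (but do not run) an ffmpeg command that keeps every Nth frame."""
    # select='not(mod(n\,step))' keeps frames 0, step, 2*step, ...;
    # the comma is backslash-escaped inside the filter expression.
    # -vsync vfr makes the output numbering skip the discarded frames.
    return [
        "ffmpeg", "-i", video_path,
        "-vf", "select=not(mod(n\\,%d))" % step,
        "-vsync", "vfr",
        out_pattern,
    ]
```

The returned list can be passed straight to `subprocess.run`, avoiding shell quoting issues.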

@nmanovic (Contributor) commented May 6, 2019

@zliang7 ,

> I tried to reuse the fields in the Video table, but it seems every parameter in the POST request needs a field in the Task table. If I didn't add these fields, the request failed.

I don't understand that statement. We don't need duplicated fields: please keep them in the Task table only and remove them from the Video table.

> This PR implements the 'step' feature; it only affects the generation of the ffmpeg command line, nothing else.

In the UI it should just be a string for now; you only need to implement the step=N filter, which is not a big redesign of your patch. Internally in the DB it should also be a string for now. When a task is created, the string will be used to filter the provided frames or images. In the future we can provide other filter values like regex=.*\.jpg, glob=*.jpg, similarity=0.1, detector=car,person, or a combination of these filters. I hope you see that the feature has very good potential: one of the fundamental problems in deep learning is avoiding the annotation of redundant data (which only increases your dataset without adding any value to your model). I don't ask you to implement the whole feature, only to take a small step forward. I agree that in its final version your code will change, but at least the next couple of steps (like the mentioned filters) will be easy to implement.

> A video is a single file that contains many continuous images, so it makes sense to pick out the necessary ones. But image files are already individual files, so why put unnecessary image files into "share" in the first place? Many file utilities can do that task.

For a video file there are also a lot of utilities that can convert it to a set of frames with a specific step (like ffmpeg), but we want to implement the functionality because our customers don't want to do that manually. The same applies to images: of course you can filter them manually, create another directory, and put all matched files into it, but again that is a lot of manual work. I believe in unification. If we have a feature A that is applicable to a video file and it makes sense for images as well (especially if you think about filters), I don't understand why we should implement the feature only for video files.
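The frame_filter-as-string idea described above can be sketched as a small parser. This is a hypothetical illustration, not CVAT's actual code; the key=value form is what leaves room for future values like regex=... or detector=... without schema changes:

```python
# Hypothetical sketch (not CVAT's actual code) of parsing a frame_filter
# string stored as plain text in the Task table.
def parse_frame_filter(frame_filter):
    """Parse a filter string such as 'step=200' (or 'step = 200') into a dict."""
    params = {}
    for part in frame_filter.split(","):
        key, _, value = part.partition("=")
        if key.strip():
            # Stripping whitespace tolerates both 'step=200' and 'step = 200'.
            params[key.strip()] = value.strip()
    return params

def get_step(frame_filter):
    """Return the step value, defaulting to 1 when no filter is set."""
    return int(parse_frame_filter(frame_filter).get("step", 1))
```

Comma separation would later allow combinations such as "step=2,glob=*.jpg" to parse into multiple filter parameters.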

@zliang7 (Contributor, Author) commented May 9, 2019

@nmanovic I removed the 'start_frame', 'stop_frame', and 'step' fields from the Video table, and changed 'step' in the Task table to 'frame_filter', which accepts a 'step=N' parameter.
Regarding applying it to image files, I can't work that out in a short time, because I will travel abroad next Monday. Since it doesn't affect the functionality, can we merge this PR first if there are no other problems? I will submit another PR for the third point.

3 review comments on cvat/apps/engine/task.py (outdated, resolved)
@nmanovic (Contributor) commented:

@zliang7 ,

Again, thanks for your patience. The patch looks much better. I found a couple of problems which should be fixed before the merge. That is normal: CVAT is quite complex software, and it is very difficult for a new developer to find the right way to implement a feature. That is why I'm here to advise.

I don't really think we need to hurry. I'm OK with committing the functionality in two steps, but I don't see any reason for that. I believe it is very easy to extend your patch to the image case; it should be a couple of lines of code.

@zliang7 zliang7 force-pushed the develop branch 2 times, most recently from 1885922 to 73cb93f Compare May 21, 2019 13:44
@zliang7 (Contributor, Author) commented May 21, 2019

@nmanovic I'm back and I updated the PR as per your comments.

@nmanovic (Contributor) commented:

> @nmanovic I'm back and I updated the PR as per your comments.

@zliang7 , I cannot see modifications in the dump function. Could you please attach an example of a CVAT XML file for a video file with filter="step=2"?

@nmanovic (Contributor) commented:

@zliang7 , we need to add information about the feature to CHANGELOG.md as well.

@zliang7 (Contributor, Author) commented May 27, 2019

> @zliang7 , I cannot see modifications in the dump function. Could you please attach an example of a CVAT XML file for a video file with filter="step=2"?

The change is in apps/engine/annotation.py. Here is an example of "Dump Annotation":

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>53</id>
      <name>video0</name>
      <size>7</size>
      <mode>interpolation</mode>
      <overlap>3</overlap>
      <bugtracker></bugtracker>
      <flipped>False</flipped>
      <created>2019-05-22 16:38:45.545441+03:00</created>
      <updated>2019-05-22 16:38:45.545482+03:00</updated>
      <start_frame>0</start_frame>
      <stop_frame>1379</stop_frame>
      <frame_filter>step = 200</frame_filter>
      <labels>
        <label>
          <name>label0</name>
          <attributes>
          </attributes>
        </label>
      </labels>
      <segments>
        <segment>
          <id>19</id>
          <start>0</start>
          <stop>6</stop>
          <url>http://127.0.0.1:8080/?id=19</url>
        </segment>
      </segments>
      <owner>
        <username>django</username>
        <email></email>
      </owner>
      <original_size>
        <width>720</width>
        <height>720</height>
      </original_size>
    </task>
    <dumped>2019-05-28 11:21:07.095873+03:00</dumped>
  </meta>
</annotations>

@zliang7 (Contributor, Author) commented May 27, 2019

> @zliang7 , we need to add information about the feature to CHANGELOG.md as well.

Updated.

@nmanovic (Contributor) commented:

@zliang7 , thanks! It is a great patch. I hope we have only one final step left. I asked you to provide an example of a dump file so I can be sure you dumped the following frames: 0, 200, 400, .... Could you please confirm that? Could you please annotate a couple of frames and provide the dump file again? I just cannot see that code in your patch. As we discussed, we need to dump the real frame numbers instead of 0, 1, 2, ... in this case.
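The frame-number requirement stated above can be sketched as a one-line mapping. This is a hypothetical illustration, not the patch's actual code: a task stores its frames internally as 0, 1, 2, ..., and the dump must translate each index back to the original video frame number:

```python
# Hypothetical mapping (not the patch's actual code) from a task's internal
# frame index to the original video frame number, given start_frame and step.
def internal_to_original(idx, start_frame, step):
    """Internal frame idx -> original frame number in the source video."""
    return start_frame + idx * step

[internal_to_original(i, 0, 200) for i in range(3)]    # -> [0, 200, 400]
[internal_to_original(i, 300, 200) for i in range(6)]  # -> [300, 500, ..., 1300]
```

With start_frame=0 and step=200 this yields exactly the 0, 200, 400, ... sequence requested here.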

@zliang7 (Contributor, Author) commented May 30, 2019

> @zliang7 , thanks! It is a great patch. I asked you to provide an example of a dump file so I can be sure you dumped the following frames: 0, 200, 400, .... Could you please confirm that? Could you please annotate a couple of frames and provide the dump file again? As we discussed, we need to dump the real frame numbers instead of 0, 1, 2, ... in this case.

Updated the PR. Here is a dump file:

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>55</id>
      <name>video1</name>
      <size>6</size>
      <mode>interpolation</mode>
      <overlap>3</overlap>
      <bugtracker></bugtracker>
      <flipped>False</flipped>
      <created>2019-05-28 17:13:46.883186+03:00</created>
      <updated>2019-05-31 15:58:04.568319+03:00</updated>
      <start_frame>300</start_frame>
      <stop_frame>1300</stop_frame>
      <frame_filter>step=200</frame_filter>
      <labels>
        <label>
          <name>label1</name>
          <attributes>
          </attributes>
        </label>
      </labels>
      <segments>
        <segment>
          <id>21</id>
          <start>0</start>
          <stop>5</stop>
          <url>http://127.0.0.1:8080/?id=21</url>
        </segment>
      </segments>
      <owner>
        <username>django</username>
        <email></email>
      </owner>
      <original_size>
        <width>720</width>
        <height>720</height>
      </original_size>
    </task>
    <dumped>2019-05-31 16:38:54.818640+03:00</dumped>
  </meta>
  <track id="0" label="label1">
    <box frame="300" outside="0" occluded="0" keyframe="1" xtl="186.50" ytl="19.30" xbr="342.64" ybr="258.78">
    </box>
    <box frame="500" outside="0" occluded="0" keyframe="1" xtl="81.20" ytl="53.51" xbr="260.20" ybr="290.40">
    </box>
    <box frame="700" outside="0" occluded="0" keyframe="1" xtl="149.62" ytl="132.46" xbr="328.62" ybr="381.60">
    </box>
    <box frame="900" outside="0" occluded="0" keyframe="1" xtl="180.32" ytl="165.80" xbr="356.70" ybr="401.80">
    </box>
    <box frame="1100" outside="0" occluded="0" keyframe="0" xtl="180.32" ytl="165.80" xbr="356.70" ybr="401.80">
    </box>
    <box frame="1300" outside="0" occluded="0" keyframe="0" xtl="180.32" ytl="165.80" xbr="356.70" ybr="401.80">
    </box>
  </track>
</annotations>

@nmanovic (Contributor) left a review comment:

@zliang7 , it looks good to me. Thanks for the contribution. It was your first and it was difficult, but the final result is awesome. I was able to create a task with step 10, annotate it, and load the annotations into a task with step 1. It works as expected (at least I was not able to see any artefacts).
