Exporting project with duplicate image names incorrect #8076

Open
2 tasks done
alexyao2015 opened this issue Jun 25, 2024 · 14 comments
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@alexyao2015

alexyao2015 commented Jun 25, 2024

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

  1. Create a job with images named image_1.jpg, image_2.jpg, image_3.jpg, image_4.jpg, etc.
  2. Create many jobs in the same project, each with an image named image.jpg. I used the Python SDK as follows to create them.
# Imports assume the cvat_sdk package layout; `client` is an authenticated
# client and `img_path` is a pathlib.Path pointing at the local image.jpg.
from cvat_sdk.api_client.models import TaskWriteRequest
from cvat_sdk.core.proxies.tasks import ResourceType

# Keep the returned task so the upload below has an object to act on.
task = client.tasks.create(TaskWriteRequest("task", project_id=1))
task.upload_data(
    resource_type=ResourceType.LOCAL,
    resources=[str(img_path.absolute())],
    params={
        "image_quality": 85,
    },
    wait_for_completion=True,
)
  3. Attempt to export the project with "Save images" checked. Notice that the images in the first job (image_1.jpg, etc.) are overwritten by the images from the other jobs.

I believe CVAT attempts to resolve the conflicting names from the other jobs by appending _1, _2, etc., but it doesn't account for those names already existing in other jobs or in the current export dataset.
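
To make the suspected failure mode concrete, here is a minimal sketch. It is not CVAT's actual export code; `naive_export_names` is a hypothetical function that only mimics the behavior described above, suffixing repeated names with _N without checking whether the suffixed name is already taken.

```python
import os
from collections import Counter


def naive_export_names(tasks):
    """Suffix repeated names with _N, never checking the result for clashes.

    `tasks` is a list of (task_id, [image names]) pairs. This is a
    hypothetical stand-in for the renaming step, not CVAT's code.
    """
    seen = Counter()
    exported = []
    for task_id, names in tasks:
        for name in names:
            stem, ext = os.path.splitext(name)
            seen[name] += 1
            repeat = seen[name] - 1
            # First occurrence keeps its name; later ones get _1, _2, ...
            new_name = name if repeat == 0 else f"{stem}_{repeat}{ext}"
            exported.append((task_id, name, new_name))
    return exported


tasks = [
    (1, ["image_1.jpg", "image_2.jpg"]),
    (2, ["image.jpg"]),
    (3, ["image.jpg"]),
    (4, ["image.jpg"]),
]
names = [new for _, _, new in naive_export_names(tasks)]
# Names that occur more than once would overwrite each other on disk.
collisions = sorted(n for n, c in Counter(names).items() if c > 1)
print(collisions)
```

With this input, the renamed copies of image.jpg from tasks 3 and 4 land on the names that task 1 already uses, which matches the overwrite seen in the export.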

Expected Behavior

Images should not be overwritten by images from other jobs when exporting a project.

Possible Solution

No response

Context

No response

Environment

No response

@alexyao2015 alexyao2015 added the bug Something isn't working label Jun 25, 2024
@zhiltsov-max zhiltsov-max added the good first issue Good for newcomers label Jul 17, 2024
@BarryByte

I have an approach to solve this issue. We can change the renaming mechanism in one of the following ways:

  1. Use unique identifiers (e.g., UUIDs, timestamps) to ensure no two images end up with the same name.
  2. Include the job or task ID in the exported name.

Both options seem feasible. Please check and confirm, @zhiltsov-max @alexyao2015.
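
The two options above could look roughly like this. Both helper names (`with_uuid`, `with_task_id`) and the `_task<ID>` suffix format are assumptions made for illustration; nothing here is part of CVAT's API.

```python
import os
import uuid


def with_uuid(name):
    """Option 1: append a random UUID so names are unique in practice."""
    stem, ext = os.path.splitext(name)
    return f"{stem}_{uuid.uuid4().hex}{ext}"


def with_task_id(name, task_id):
    """Option 2: append the task ID (hypothetical `_task<ID>` format)."""
    stem, ext = os.path.splitext(name)
    return f"{stem}_task{task_id}{ext}"


print(with_task_id("image.jpg", 2))
print(with_uuid("image.jpg"))
```

The task ID variant keeps the exported name traceable to its source, while the UUID variant avoids collisions but says nothing about where the image came from.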

@alexyao2015
Author

The second option seems like it would work as a simple fix. Alternatively, there could be a check to see if the file exists already in the export and append something else to the filename until it no longer conflicts.
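
A minimal sketch of the check-and-append alternative, assuming the exporter can keep a set of names already written to the dataset. `unique_name` and the `used` set are hypothetical names, not existing CVAT functions.

```python
import os


def unique_name(name, used):
    """Return `name`, or `stem_N.ext` with the smallest N not yet in `used`.

    `used` is the set of names already written to the export; the chosen
    name is added to it before returning.
    """
    stem, ext = os.path.splitext(name)
    candidate, n = name, 0
    while candidate in used:
        n += 1
        candidate = f"{stem}_{n}{ext}"
    used.add(candidate)
    return candidate


used = set()
incoming = ["image_1.jpg", "image_2.jpg", "image.jpg", "image.jpg", "image.jpg"]
exported = [unique_name(n, used) for n in incoming]
print(exported)
```

Note that the result depends on processing order: if the image.jpg jobs were exported first, the real image_1.jpg would be the one that gets renamed.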

@BarryByte

Yeah, that's right. Should I work on this issue, @alexyao2015?

@alexyao2015
Author

That would be great. Please go ahead.

@BarryByte

Please assign this issue to me. (◔‿◔)

@zhiltsov-max
Contributor

zhiltsov-max commented Jul 26, 2024

@BarryByte, consider adding endpoint parameters and some UI elements to control the behavior (e.g. the prefix or filename pattern). It would be good to write up a detailed description of the suggested changes first.

@alexyao2015
Author

An even simpler way is to just rename all images to image_1, image_2, etc., without preserving the original filename.

@zhiltsov-max
Contributor

@alexyao2015, it's already being done. The problem is that there is no way to find out the real source of the image in the exported dataset.

@alexyao2015
Author

Right, so as you export images, you rename every image regardless of whether its name overlaps with another. What happens now is that a name is changed only when it looks like a duplicate. I would just use a simple counter, incremented with each image, and export the images under fixed names so that overlapping names are impossible.
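
As a rough sketch of the counter idea, under the assumption that every image is renamed unconditionally: `sequential_names` and the `image` prefix are illustrative choices, not CVAT behavior.

```python
import itertools
import os


def sequential_names(names, prefix="image"):
    """Rename every image to prefix_N, keeping only its file extension."""
    counter = itertools.count(1)
    return [f"{prefix}_{next(counter)}{os.path.splitext(n)[1]}" for n in names]


print(sequential_names(["image_1.jpg", "image.jpg", "image.jpg"]))
```

Because the counter never repeats, no two outputs can share a name, but the original filename is lost unless it is recorded somewhere else.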

@zhiltsov-max
Contributor

@alexyao2015, yes, it will fix the problem with name collisions. But it doesn't solve the problem with determining the origin of the frame.

@alexyao2015
Author

Why not keep an in-memory map from the job ID and original image name to the remapped image name? Is there something I'm missing?

@zhiltsov-max
Contributor

zhiltsov-max commented Jul 26, 2024

@alexyao2015, it's needed for users, not for the export to work. The problem is this: the tasks in the project contain images with certain names. The project is then exported in some format, and the image names get mangled. The resulting dataset now contains modified frame names, and the user can't trace them back to their origin for further analysis of the exported dataset. They need to match the output names with the source task or job, but there is currently no way for them to do so.

Two simple potential ways of solving the problem: provide an output mapping, or change the added suffix from _N to _job_N.
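
Both suggestions can be sketched together. This assumes N in _job_N is the job ID, which is one reading of the comment above, and the CSV column names are invented for illustration. `export_with_mapping` and `mapping_csv` are hypothetical helpers, not part of CVAT.

```python
import csv
import io
import os

FIELDS = ["job_id", "original_name", "exported_name"]


def export_with_mapping(frames):
    """Build one record per frame from (job_id, original_name) pairs.

    The exported name carries a `_job_<ID>` suffix so the source job is
    visible in the filename itself.
    """
    rows = []
    for job_id, original in frames:
        stem, ext = os.path.splitext(original)
        rows.append(
            {
                "job_id": job_id,
                "original_name": original,
                "exported_name": f"{stem}_job_{job_id}{ext}",
            }
        )
    return rows


def mapping_csv(rows):
    """Serialize the records as CSV text that could ship with the dataset."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = export_with_mapping([(1, "image_1.jpg"), (2, "image.jpg"), (3, "image.jpg")])
print(mapping_csv(rows))
```

A mapping file like this lets users look up the source job for any exported frame, even if the naming pattern changes later.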

@rukundob451

Hi all,

Is there any update on this issue? Happy to help if needed.

Thanks,
Benjamin

@noahpav

noahpav commented Nov 6, 2024

Hello,
I'm curious whether this issue has been resolved. If not, I would be happy to contribute. I'm currently looking for a bug in an open source project that I can work on for a school assignment.

Projects
None yet
Development

No branches or pull requests

5 participants