[sdk] client.tasks.create_from_data does not support relative path to cloud storage for AWS S3 bucket #7533

Closed
2 tasks done
dpovision opened this issue Feb 28, 2024 · 3 comments
Labels
need info Need more information to investigate the issue

Comments

@dpovision

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

My goal is to automate the creation of tasks backed by AWS S3 cloud storage via the Python SDK.
The connection to the S3 bucket works, since I can create tasks from it with the CVAT UI.
But when I use the create_from_data function of the Python SDK, I get the following response:

HTTP Status Code: 200   
Reason: OK
HTTP response headers: HTTPHeaderDict({'Allow': 'GET, HEAD, OPTIONS', 'Content-Length': '196', 'Content-Type': 'application/vnd.cvat+json', 'Date': 'Fri, 23 Feb 2024 15:43:02 GMT', 'Referrer-Policy': 'same-origin', 'Server': 'Apache', 'Vary': 'Accept,Origin', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY'})
HTTP response body: b'{"state":"Failed","message":"FileNotFoundError: [Errno 2] No such file or directory: \'/home/django/share/my_images/manifest.jsonl\'","progress":0.0}'
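
Note that the response above has HTTP status 200 even though the task data processing failed; the failure is only visible in the JSON body. A minimal sketch of how such a body could be checked, using only the standard library. The helper name check_task_status is hypothetical and not part of the CVAT SDK; the sample body is copied from the response above.

```python
import json

# Response body copied from the report above.
FAILED_BODY = b'{"state":"Failed","message":"FileNotFoundError: [Errno 2] No such file or directory: \'/home/django/share/my_images/manifest.jsonl\'","progress":0.0}'


def check_task_status(body: bytes) -> dict:
    """Parse a status body and raise if the server reports a failure.

    Hypothetical helper for illustration; it only assumes the "state" and
    "message" keys that appear in the response above.
    """
    status = json.loads(body)
    if status.get("state") == "Failed":
        raise RuntimeError(status.get("message", "task data processing failed"))
    return status


if __name__ == "__main__":
    try:
        check_task_status(FAILED_BODY)
    except RuntimeError as err:
        print(f"task creation failed: {err}")
```

A check like this makes the failure surface as an exception instead of leaving an empty task behind.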

The problem is that there is a fixed path, "/home/django/share/", configured for the CVAT Python SDK on the EC2 host instance, and from there the S3 bucket holding the images and manifest is not accessible. Even if I give a complete path to the S3 bucket data, the fixed part "/home/django/share/" is always prepended. The result is a created task without any images. When I do the same with the CVAT UI on the EC2 host instance instead, I can create tasks with "cloud storage" successfully.
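
The path in the error message is consistent with the requested path being resolved under a fixed share root. The following is only a sketch of that behavior under an assumption: the function resolve_share_path is hypothetical and is not taken from the CVAT source; the share root and the manifest path come from the report.

```python
import posixpath

# Share root as it appears in the error message of the report.
SHARE_ROOT = "/home/django/share"


def resolve_share_path(requested: str, share_root: str = SHARE_ROOT) -> str:
    """Hypothetical model of resolving a requested file under the share root.

    Leading slashes are stripped first, so a path that looks absolute still
    ends up inside the share directory.
    """
    relative = requested.lstrip("/")
    return posixpath.normpath(posixpath.join(share_root, relative))


if __name__ == "__main__":
    # Prints: /home/django/share/my_images/manifest.jsonl
    print(resolve_share_path("/my_images/manifest.jsonl"))
```

Under this assumption, any path passed as a share resource is looked up on the server's local file system, which would explain why the bucket contents are not found.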

Minimal code example to reproduce:

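        # Imports this excerpt relies on (module paths as in recent cvat_sdk
        # releases; they may differ in the SDK matching server 2.3).
        from cvat_sdk.api_client.models import StorageType
        from cvat_sdk.core.proxies.tasks import ResourceType

        # Image and manifest paths inside the bucket.
        img_file = "my_image.png"
        manifest = "/my_images/manifest.jsonl"
        images_and_manifest = [img_file, manifest]

        # https://opencv.github.io/cvat/docs/api_sdk/sdk/reference/models/task-write-request/
        task_spec = {
            "name": "NewTestTask",
            "project_id": 11,
            "segment_size": 100,
        }

        # Data parameters pointing at cloud storage with id 1.
        data = dict(
            image_quality=100,
            use_cache=True,
            cloud_storage_id=1,
            server_files=images_and_manifest,
            storage=StorageType("cloud_storage"),
        )

        # The call that returns the "Failed" state shown above.
        task = self.client.tasks.create_from_data(
            spec=task_spec,
            resource_type=ResourceType.SHARE,
            resources=images_and_manifest,
            data_params=data,
        )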

Expected Behavior

The create_from_data method should be able to access S3 bucket data the same way the CVAT UI does.

Possible Solution

No response

Context

I have seen the related issue 6012, but unfortunately the approach described there does not work for me.

Environment

Server version: 2.3
Core version: 7.1.0
Canvas version: 2.16.0
UI version: 1.43.0
@dpovision dpovision added the bug Something isn't working label Feb 28, 2024
@bsekachev bsekachev removed the bug Something isn't working label Feb 29, 2024
@bsekachev
Member

Hello,

First of all, I would recommend that you upgrade CVAT.
We do not really consider issues from heavily outdated versions.

Also, make sure you have mounted the cvat_share volume to cvat_worker_import (at least that is what it is called in newer versions; I do not remember what it was called in 2.3, maybe cvat_default).
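
One way to verify the mount is to check, from inside the container that imports the data, whether the requested files are visible under the share root. A minimal sketch using only the standard library; the function missing_share_files is hypothetical, and the share root "/home/django/share" is taken from the error message in the report.

```python
from pathlib import Path


def missing_share_files(share_root, relative_paths):
    """Return the requested paths that do not exist as files under share_root.

    Hypothetical helper for illustration. Leading slashes are stripped so
    that every path is interpreted relative to the share root.
    """
    root = Path(share_root)
    return [p for p in relative_paths if not (root / p.lstrip("/")).is_file()]


if __name__ == "__main__":
    # Paths from the reproduction code in the report.
    wanted = ["my_image.png", "/my_images/manifest.jsonl"]
    print(missing_share_files("/home/django/share", wanted))
```

An empty list would mean the files are visible to that container; any listed path is one the import step would fail on.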

@bsekachev bsekachev added the need info Need more information to investigate the issue label Feb 29, 2024
@nmanovic
Contributor

nmanovic commented Mar 5, 2024

@dpovision, we have not received a reply from you. Please upgrade CVAT and re-open the issue if you still have the problem. Don't hesitate to post your solution or send us a pull request.

@nmanovic nmanovic closed this as completed Mar 5, 2024
@dpovision
Author

@bsekachev @nmanovic Thanks a lot, and sorry for the late response. I finally got it working. It was tough, and I had to dig through several posts such as https://opencv.github.io/cvat/docs/administration/basics/installation/#share-path, https://opencv.github.io/cvat/docs/administration/advanced/mounting_cloud_storages/ and https://serverfault.com/questions/441691/how-to-make-s3fs-work-with-iam-roles, but in the end it works so far.
The reason I have not updated CVAT to a newer version is that I want to use the Git repository synchronization feature, which is still available in CVAT 2.3. Unfortunately, I am now facing a CVAT server issue when I add the dataset_repository_url option. Without dataset_repository_url, everything works fine.

task = self.client.tasks.create_from_data(
    spec=task_spec,
    resource_type=ResourceType.SHARE,
    resources=images,
    data_params=data,
    annotation_format="CVAT for video 1.1",
    dataset_repository_url="[email protected]:AGCO-Corporation/ezv_robotics_cvat_tasks_backup.git",
    use_lfs=True
)

Extract from cvat server docker log, when adding dataset_repository_url option:

2024-03-05 14:49:27,890 DEBG 'runserver' stderr output:
[Tue Mar 05 14:49:27.890582 2024] [wsgi:error] [pid 51:tid 140052090820352] [remote 172.22.0.2:52710] [2024-03-05 14:49:27,890] WARNING django.request: Not Found: /api/user-agreements
 
2024-03-05 14:49:27,897 DEBG 'runserver' stderr output:
[Tue Mar 05 14:49:27.897347 2024] [wsgi:error] [pid 51:tid 140052090820352] [remote 172.22.0.2:52710] WARNING:django.request:Not Found: /api/user-agreements
 
2024-03-05 14:49:33,525 DEBG 'runserver' stderr output:
[Tue Mar 05 14:49:33.524950 2024] [wsgi:error] [pid 51:tid 140052090820352] [remote 172.22.0.2:53024] [2024-03-05 14:49:33,524] WARNING django.security.csrf: Forbidden (CSRF token missing or incorrect.): /git/repository/create/389

Do you have any idea what I can do here? I tried several options, such as adding https://github.com to CSRF_TRUSTED_ORIGINS or ALLOWED_HOSTS, but nothing helped.
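
For context, the log line complains about the CSRF token itself, which in Django is a separate check from the trusted-origin settings. With Django's default settings, a client that authenticates via session cookies is expected to send the value of the csrftoken cookie back in an X-CSRFToken header on unsafe requests such as POST. The sketch below only illustrates that convention; csrf_headers is a hypothetical helper, and whether the SDK's git repository call in 2.3 can be made to send this header is not established in this thread.

```python
from http.cookies import SimpleCookie


def csrf_headers(cookie_header: str) -> dict:
    """Build the CSRF header Django expects from a Cookie header string.

    Assumes Django's default names: the "csrftoken" cookie and the
    "X-CSRFToken" request header. Hypothetical helper for illustration.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if "csrftoken" not in cookies:
        raise KeyError("no csrftoken cookie found in the session")
    return {"X-CSRFToken": cookies["csrftoken"].value}


if __name__ == "__main__":
    # Made-up cookie values, for illustration only.
    print(csrf_headers("csrftoken=abc123; sessionid=xyz"))
```

If the request to /git/repository/create/ is sent without such a header, Django would reject it regardless of the trusted-origin configuration.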
