Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] FlyteDirectory not copied to ContainerTask inputs #3632

Open
2 tasks done
pryce-turner opened this issue May 1, 2023 · 18 comments
Open
2 tasks done

[BUG] FlyteDirectory not copied to ContainerTask inputs #3632

pryce-turner opened this issue May 1, 2023 · 18 comments
Labels
backlogged For internal use. Reserved for contributor team workflow. bug Something isn't working

Comments

@pryce-turner
Copy link
Contributor

Describe the bug

When using a ContainerTask with a FlyteDirectory in the inputs, no inputs are actually copied to the running container. For example:

bt = ContainerTask(
    name="basic-test",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(indir=FlyteDirectory),
    outputs=kwtypes(),
    image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    command=[
        "ls",
        "-la",
        "/var/inputs",
    ],
)

@task
def get_dir(dirpath: str) -> FlyteDirectory:
    fd = FlyteDirectory(path=dirpath)
    return fd

@workflow
def wf():
    fd = get_dir(dirpath='s3://my-s3-bucket/cv-in')
    bt(indir=fd)

This workflow returns an empty directory at /var/inputs according to the k8s logs despite there definitely being objects at the specified path.

Expected behavior

I would expect the above workflow to list either /var/inputs/indir with objects copied to indir or perhaps copy them directly i.e. /var/inputs/obj1, /var/inputs/obj2 as they are named in the object store.

Additional context to reproduce

This was run on a local sandbox via pyflyte run --remote basic_test.py wf

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@pryce-turner pryce-turner added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels May 1, 2023
@welcome
Copy link

welcome bot commented May 1, 2023

Thank you for opening your first issue here! 🛠

@eapolinario eapolinario self-assigned this May 5, 2023
@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label May 5, 2023
@eapolinario
Copy link
Contributor

Confirm copilot has support for FlyteDirectory.

@eapolinario
Copy link
Contributor

Multipart blobs (which is what FlyteDirectory objects are mapped to) are not supported in Flyte copilot currently.

@gakumar49606
Copy link

I can confirm that I'm observing this issue. Any timeline by which we can expect this to be fixed ?

@eapolinario
Copy link
Contributor

@gakumar49606 , this issue has been in the backlog for some time but it's really low priority, so I can't really offer any timeline. That said, this could be a great contribution and I'd be happy to review any PRs!

@djbutler
Copy link

Is there a way to use List[FlyteFile] as a workaround? For instance, if you passed an input called files of type List[FlyteFile] to the ContainerTask, would those files be handled by copilot?

@kumare3
Copy link
Contributor

kumare3 commented Nov 10, 2023

this is actually a simple change to use - https://github.com/flyteorg/flyte/blob/48f53f0995c82568fb10ebf3ffead16c10424e99/flytecopilot/data/download.go#L45C58-L45C76.
The thing is this will need a list and then start goroutines.
Someone could easily contribute.

@cjidboon94
Copy link
Contributor

Any updates on this issue? Just encountered this bug in my own workflows

@cjidboon94
Copy link
Contributor

cjidboon94 commented Mar 20, 2024

One current workaround I am using, is to make the ContainerTask a python task with an ImageSpec that uses the ContainerTask's image as base_image and use the task-decorated function as an entry point/to call the applications in the container.
e.g.

bt = ContainerTask(
    name="basic-test",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(indir=FlyteDirectory),
    outputs=kwtypes(),
    image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    command=[
        "ls",
        "-la",
        "/var/inputs",
    ],
)

becomes (important to install flytekit python package if it's not already in the image, so that the task can run)

image_spec = ImageSpec(
    base_image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    packages=["flytekit"],
)

@task(container_image=image_spec)
def bt(input_dir: FlyteDirectory) -> None:
    input_dir.download()
    cmd = f"ls -la {input_dir}"
    subprocess.check_call(cmd, shell=True)

Wonder if there's any drawbacks to this approach, other than the additional packages/python overhead introduced this way.

@Future-Outlier
Copy link
Member

#take

@Future-Outlier
Copy link
Member

I am going to contribute to this, and please review the local execution implementation.

flyteorg/flytekit#2258

@Future-Outlier
Copy link
Member

Doing it now.
image

@eapolinario eapolinario removed their assignment Jun 18, 2024
@vasra-gh
Copy link

Is there any update on this? This feature would be really useful

@srale
Copy link

srale commented Aug 28, 2024

When is this feature expected to be implemented? This limitation makes the functionality of container tasks a bit limited and this would improve our workloads a lot

@Future-Outlier Future-Outlier removed their assignment Aug 29, 2024
@Future-Outlier
Copy link
Member

Future-Outlier commented Aug 29, 2024

Sorry guys, I have other priorities but contribution is welcomed!

@davidmirror-ops davidmirror-ops added the backlogged For internal use. Reserved for contributor team workflow. label Aug 29, 2024
@wayner0628
Copy link
Contributor

wayner0628 commented Sep 5, 2024

Hi everyone,
I’m currently working on this issue, and most of the necessary modifications have been completed. Please follow the PR linked above for updates. Thank you!

Thanks to @davidmirror-ops for the mention!

@davidmirror-ops
Copy link
Contributor

This is implemented here
Big thanks to @wayner0628

@davidmirror-ops
Copy link
Contributor

Once this is out in the upcoming 1.14 release, please give it a try and share any feedback here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlogged For internal use. Reserved for contributor team workflow. bug Something isn't working
Projects
None yet
Development

No branches or pull requests