Unable to locate output files from a scheduled job run #349

Closed
AravindAmazon opened this issue Mar 2, 2023 · 5 comments
@AravindAmazon

Hello team,

I have a scheduled job that automatically runs a basic Python script, which writes a CSV file as output. The job runs successfully, but I am not able to locate the directory where the output file is stored.
The line used for writing is temp.to_csv('trial_output.csv'), where temp is the DataFrame variable.
When I run the same script in a regular Jupyter Notebook environment (outside JupyterLab), the CSV file is written successfully to the local Jupyter folder. The issue appears to happen only in JupyterLab when using a scheduled job. I would appreciate any help (I use Jupyter notebooks via the AWS SageMaker interface).

Full script:
import pandas as pd
temp = pd.read_csv("s3:///")
temp.to_csv('trial_output.csv')

Overall purpose:
I need to auto-run case predictions on a daily basis (at least 10,000 predictions per day) and share a daily CSV file with business users, without any manual intervention.

Thanks,
Aravind

@welcome

welcome bot commented Mar 2, 2023

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@JasonWeill JasonWeill changed the title from "Unable to locate output files from a JupYter job scheduler run" to "Unable to locate output files from a scheduled job run" Mar 2, 2023
@JasonWeill JasonWeill added the bug Something isn't working label Mar 2, 2023
@rubenvarela

When I try to recreate this, the file is saved to the root folder of JupyterLab, which maps to the directory from which I ran the jupyter-lab command.
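
For anyone trying to reproduce the comment above: a relative filename such as 'trial_output.csv' is resolved against the process's current working directory, so printing that directory from inside the notebook shows where the file would be written. The following is a minimal sketch using only the Python standard library; the helper name resolve_output_path is hypothetical and is not part of Jupyter Scheduler. It does not address the side-effect-file behavior of scheduled jobs, which may run in a different working directory.

```python
import os
from pathlib import Path


def resolve_output_path(filename: str) -> Path:
    """Return the absolute path that a relative filename resolves to.

    Hypothetical helper for illustration only; it is not part of
    Jupyter Scheduler or pandas.
    """
    return (Path.cwd() / filename).resolve()


# Show the directory the current process is running in, and where a
# relative write such as temp.to_csv('trial_output.csv') would land.
print("Working directory:", os.getcwd())
print("Output would be written to:", resolve_output_path("trial_output.csv"))
```

Running this as the first cell of the scheduled notebook should make the output location visible in the executed notebook that the job produces.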

@JasonWeill JasonWeill added this to the 1.4.0 Release milestone Jun 6, 2023
@JasonWeill JasonWeill self-assigned this Jun 22, 2023
@JasonWeill
Collaborator

ArchivingExecutionManager archives the output files to a .tar.gz file, but it doesn't include files created as a side effect of running the notebook, as described in this issue.

This issue might be fixed by either modifying ArchivingExecutionManager or creating an alternate execution manager that gathers all output formats and all supporting files in and under the working directory, and saves them into an archive of some kind (.zip or .tar.gz).
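
The idea in the comment above, gathering every file in and under the working directory into a single archive, can be sketched with the standard library's tarfile module. This is only an illustration of the approach under stated assumptions, not the ArchivingExecutionManager code and not the change that was eventually merged; the function name archive_working_dir and its parameters are hypothetical.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_working_dir(work_dir: str, archive_path: str) -> List[str]:
    """Pack every regular file in and under work_dir into a .tar.gz archive.

    Hypothetical helper for illustration; not part of Jupyter Scheduler.
    Returns the archived paths, relative to work_dir, in sorted order.
    """
    work = Path(work_dir).resolve()
    archive = Path(archive_path).resolve()
    added: List[str] = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(work.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # is being written inside the working directory.
            if not path.is_file() or path.resolve() == archive:
                continue
            rel = path.relative_to(work).as_posix()
            tar.add(path, arcname=rel)
            added.append(rel)
    return added
```

Because the walk covers the whole directory tree, a file such as trial_output.csv written by a notebook cell would be included alongside the notebook's own outputs, at the cost of also capturing any unrelated files that happen to live in the same directory.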

@JasonWeill
Collaborator

Closing because #388 is merged.

@JasonWeill
Collaborator

To use the archiving scheduler, follow these instructions in the docs: https://jupyter-scheduler.readthedocs.io/en/latest/operators/index.html#example-capturing-side-effect-files
