Input files packaging #510
@JasonWeill I updated the "Submit the Create Job form" section of the Users page with a screenshot and text mentioning the "Run job with input folder" option (code, readthedocs preview). I tried different options, such as adding an example: "Use this to, for example, access data files or images from notebook's cells." Ultimately, what I have in the PR now works well and matches the level of detail in the surrounding paragraphs. I would be interested to hear your opinion on this.
Change looks good! Thanks for the great work on this.
…ger) (jupyter-server#510)
* package input files and folders (backend)
* package input files and folders (frontend)
* remove "input_dir" from staging_paths dict
* ensure execution context matches the notebook directory
* update snapshots
* copy staging folder to output folder after job runs (SUCCESS or FAILURE)
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* copy staging folder and side effects to output after job runs, track and redownload files
* remove staging to output copying logic from executor
* refactor output files creation logic into a separate function for clarity
* Fix job definition data model
* add packaged_files to JobDefinition and DescribeJobDefinition model
* fix existing pytests
* clarify FilesDirectoryLink title
* Dynamically display input folder in the checkbox text
* display packageInputFolder parameter as 'Files included'
* use helper text with input directory for 'include files' checkbox
* Update Playwright Snapshots
* add test side effects accountability test for execution manager
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Use "Run job with input folder" for packageInputFolder checkbox text
* Update Playwright Snapshots
* Use "Ran with input folder" in detail page
* Update src/components/input-folder-checkbox.tsx (Co-authored-by: Jason Weill)
* fix lint error
* Update Playwright Snapshots
* Update existing screenshots
* Update "Submit the Create Job" section mentioning "Run job with input folder" option
* Update docs/users/index.md (Co-authored-by: Jason Weill)
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Update src/components/input-folder-checkbox.tsx (Co-authored-by: Jason Weill)
* Update Playwright Snapshots
* Describe side effects behavior better
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jason Weill
When working with notebooks, users frequently import and use additional files such as datasets, images, and scripts inside their notebook's cells. Supporting the packaging of such files would ensure that notebooks have all essential resources available when executed as a job. This would make Jupyter Scheduler more flexible and able to accommodate more types of workflows, providing better value to the people who use it.
This PR adds an option to package the input folder (the folder where the input notebook is located), along with all nested files and sub-folders within it, during job or job definition creation.
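To make the packaging step concrete, here is a minimal sketch of the idea using only the Python standard library. It is not the code from this PR: the function name `package_input_folder` and the `staging_dir` argument are hypothetical, and the sketch assumes the staging directory lives outside the input folder.

```python
import shutil
from pathlib import Path
from typing import List


def package_input_folder(input_notebook: Path, staging_dir: Path) -> List[str]:
    """Copy the notebook's folder into staging_dir and list the copied files.

    Hypothetical helper for illustration only. Returns paths relative to the
    input folder, which is roughly what a packaged_files list could hold.
    """
    input_dir = input_notebook.parent

    # Record every nested file before copying, as POSIX-style relative paths.
    packaged_files = sorted(
        path.relative_to(input_dir).as_posix()
        for path in input_dir.rglob("*")
        if path.is_file()
    )

    # Copy the whole tree; dirs_exist_ok (Python 3.8+) tolerates an existing
    # staging directory. Assumes staging_dir is not inside input_dir.
    shutil.copytree(input_dir, staging_dir, dirs_exist_ok=True)
    return packaged_files
```

Recording relative rather than absolute paths keeps the list meaningful after the files are moved from the staging location to an output folder.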
In terms of features, this is a subset of PR #500. This PR does not automatically download output files to the output folder when a job runs, and therefore has no need to schedule downloads from multiple processes or to add the components that would manage them (DownloadRunner and DownloadManager from #500). Besides making this PR more focused in terms of functionality, this makes the changes introduced by this PR non-breaking.
When the package input folder option is active:
- … Job.packaged_files; if any of them is deleted from the output folder, the user gets an option to re-download them via the UI (matches existing behavior for the snapshot of the input notebook and output files)
- … JobDefinition.packaged_files
- … cwd parameter of the ExecutePreprocessor). Additionally, pass the intended path of the execution context to the preprocessor via the metadata {"metadata": {"path": notebook_dir}} argument of the preprocessor call.
- … Job.packaged_files and copied to the output folder together with other files
Fixes #407
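The re-download behavior described above depends on knowing which packaged files are no longer present in the output folder. The following is a rough sketch of such a check, not the PR's implementation: the helper name `missing_packaged_files` is hypothetical, and it assumes the packaged files are stored as paths relative to the output folder.

```python
from pathlib import Path
from typing import Iterable, List


def missing_packaged_files(packaged_files: Iterable[str], output_dir: Path) -> List[str]:
    """Return the packaged files (relative paths) that are absent from output_dir.

    Hypothetical helper: a non-empty result would be the condition under
    which a UI could offer a re-download option.
    """
    return [rel for rel in packaged_files if not (output_dir / rel).is_file()]
```

An empty result means every packaged file is still in place, so no re-download option would be needed.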
Before:
After:
Re-download option when any of the files is deleted from the output folder: