

Update workflow to start with ImageCollection files and use a Butler #30

Merged
merged 7 commits into from
Aug 18, 2024

drewoldag
Collaborator

@drewoldag drewoldag commented Aug 13, 2024

This PR represents the work to copy the original workflow dag (now called tno_workflow) and update it so that it begins with ImageCollections as input. The new workflow is in ic_workflow.py.

I've also created a new reproject_wu task, based on the conversation between @DinoBektesevic, @ColinOrionChandler, and @wilsonbb on Slack. We'll focus specifically on the case of single chip/single night ImageCollections in the ic_workflow DAG, so the reprojection logic is significantly simplified.
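To make the single chip/single night restriction concrete, here is a minimal sketch of the precondition it implies. It assumes, hypothetically, that each ImageCollection row can be reduced to a plain dict with `detector` and `mjd` keys, and that a "night" is `floor(mjd - night_offset)`. This is not the kbmod `ImageCollection` API or the PR's reproject_wu code; it only shows the kind of check under which a simpler reprojection would apply.

```python
import math
from typing import Any, Iterable, Mapping


def is_single_chip_single_night(
    rows: Iterable[Mapping[str, Any]],
    night_offset: float = 0.0,
) -> bool:
    """Return True if all rows share one detector and one observing night.

    Hypothetical assumptions, for illustration only:
    - each row has a ``detector`` id and an ``mjd`` timestamp;
    - a "night" is ``floor(mjd - night_offset)``, where ``night_offset``
      moves the day boundary away from UTC midnight.
    """
    detectors = set()
    nights = set()
    for row in rows:
        detectors.add(row["detector"])
        nights.add(math.floor(row["mjd"] - night_offset))
    # An empty collection yields two empty sets, so it is rejected.
    return len(detectors) == 1 and len(nights) == 1


rows = [{"detector": 42, "mjd": 60400.10}, {"detector": 42, "mjd": 60400.35}]
print(is_single_chip_single_night(rows))
```

Choosing the night boundary matters: with the default offset of 0, exposures on either side of UTC midnight count as different nights, which is why the offset is exposed as a parameter here.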

This PR also introduces the Butler so that we can now support KBMOD ButlerStandardizers.

@drewoldag drewoldag self-assigned this Aug 13, 2024
@drewoldag drewoldag linked an issue Aug 13, 2024 that may be closed by this pull request
Collaborator

@wilsonbb wilsonbb left a comment

This looks pretty good to me! Just some small comments and questions.

src/kbmod_wf/ic_workflow.py (4 threads, outdated, resolved)
src/kbmod_wf/task_impls/uri_to_ic.py (1 thread, outdated, resolved)
Member

@DinoBektesevic DinoBektesevic left a comment

No issues with the code except the list default arguments in function declarations. Either make those tuples or just leave them blank.
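The problem with list defaults is a standard Python pitfall: a default `[]` is evaluated once, when the `def` statement runs, and the same list object is reused by every call that omits the argument. The sketch below uses hypothetical functions (`collect_uris_*`, not from the PR) to show the shared-state bug and the two fixes suggested above: a tuple default, or no default at all.

```python
def collect_uris_buggy(uri, seen=[]):
    # BUG: this default list is created once and shared by every call.
    seen.append(uri)
    return seen


def collect_uris_tuple(uri, seen=()):
    # Fix 1: an immutable tuple default cannot be mutated between calls.
    return [*seen, uri]


def collect_uris_required(uri, seen):
    # Fix 2: no default at all; the caller always passes a collection.
    return [*seen, uri]


first = collect_uris_buggy("a.collection")
second = collect_uris_buggy("b.collection")
print(second)           # state from the first call has leaked into the second
print(first is second)  # both calls returned the very same list object

print(collect_uris_tuple("a.collection"))
print(collect_uris_tuple("b.collection"))
print(collect_uris_required("b.collection", ["a.collection"]))
```

The tuple version still lets callers omit the argument, while the required version makes the dependency explicit; both remove the hidden state.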

I also think there's a problem with how lists to ICs are handled, but that doesn't stop the workflow if it's run straight from IC so it's not a blocker for me here.

I'd love to see some code duplication cleaned up and some doc strings added (given they're mostly all the same), but we can move towards that in steps. Happy to approve with the minor comments addressed.

example_runtime_config.toml (resolved)
src/kbmod_wf/ic_workflow.py (4 threads, outdated, resolved)
src/kbmod_wf/tno_workflow.py (4 threads, outdated, resolved)
Comment on lines +1 to +4
"""This workflow definition represents the task flow that was used to prepare results
for the TNO 2024 presentation. It has been slightly updated to also use sharded
workunits.
"""
Member

I think it's interesting to observe what changed between the two workflows. Apart from translating the lst file to an IC and globbing all ".collections" files instead of "lst" files, it doesn't seem like much.
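The input-discovery difference mentioned above can be sketched with the standard library. This assumes, hypothetically, that a workflow scans a single staging directory and selects its inputs by filename suffix; the helper name `find_inputs` and the directory layout are illustrative, not the actual kbmod_wf code.

```python
import tempfile
from pathlib import Path


def find_inputs(staging_dir, suffix):
    """Return files in ``staging_dir`` whose names end with ``suffix``, sorted.

    Hypothetical helper: a tno_workflow-style run would pass ".lst",
    an ic_workflow-style run would pass the ImageCollection suffix.
    """
    return sorted(Path(staging_dir).glob(f"*{suffix}"))


# Usage: a directory holding both kinds of input files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("night1.lst", "night1.collections", "night2.collections"):
        (Path(tmp) / name).touch()
    print([p.name for p in find_inputs(tmp, ".collections")])
    print([p.name for p in find_inputs(tmp, ".lst")])
```

Keeping the suffix as a parameter is one way to let both workflows share the same discovery step, which speaks to the point that little else differs between them.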

I'd be curious to hear your take on what you think the inputs for a workflow should be and then build the tooling for that.

Collaborator Author

You're correct that there really isn't much that has changed between the two workflows. My take at the beginning of working on this Parsl project was that the order would be roughly:

  1. Reproduce the workflow that was done manually in prep for TNO
  2. Polish that workflow so that more than one person could run it at a time
  3. Tack on a region search task to the beginning of the workflow, i.e., provide a list of search queries as input
  4. Tack on filtering to the end of the workflow so that search results will go through the CNN
  5. Tack on visualization prep to make examining results easier.
  6. Polish for production
  7. Tack on a step that accepts a catalog of newly acquired images - for each new image, determine by some criteria if there are enough images now to do a search in a particular patch of sky, and create a region search query for that patch.
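Step 7's "enough images in a particular patch of sky" test could take many forms. The sketch below picks one purely for illustration: bin each new image's pointing into a fixed RA/Dec grid cell, keep a running count per cell, and emit a region search query when a cell first reaches a threshold. The grid, the `min_images` threshold, and the query dict format are all hypothetical assumptions, not anything decided in this PR.

```python
import math
from collections import Counter


def patch_key(ra_deg, dec_deg, cell_deg):
    # Hypothetical fixed-size RA/Dec grid; ignores cos(dec) compression near the poles.
    return (
        math.floor((ra_deg % 360.0) / cell_deg),
        math.floor((dec_deg + 90.0) / cell_deg),
    )


def queries_for_new_images(existing_counts, new_pointings, cell_deg=1.0, min_images=20):
    """Update per-patch image counts; return (counts, queries for newly ready patches)."""
    counts = Counter(existing_counts)
    ready = []
    for ra, dec in new_pointings:
        key = patch_key(ra, dec, cell_deg)
        counts[key] += 1
        # Emit a query only at the moment a patch reaches the threshold,
        # so patches that were already searchable are not queued again.
        if counts[key] == min_images:
            ready.append({"ra_cell": key[0], "dec_cell": key[1], "cell_deg": cell_deg})
    return counts, ready


counts, ready = queries_for_new_images({}, [(10.2, -5.3)] * 3, min_images=3)
print(ready)
```

Carrying the counts between runs is what makes this incremental: each run only needs the catalog of images acquired since the previous one.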

So to make a long story short, in the near term ImageCollections should be the input. Then queries for a region search step. Then a catalog of images acquired since the last time the workflow was run.

That is in order of near to long term and high to low confidence. :)

@drewoldag
Collaborator Author

@DinoBektesevic In my most recent commit, I've started to address the code duplication by moving the Parsl task definitions out of the workflow files into a submodule, .../src/kbmod_wf/workflow_tasks/. I need to pause for the day, but a follow-up commit will finish filling out the docstrings for each one.

If you don't mind taking a look at src/kbmod_wf/workflow_tasks/create_manifest.py, you'll get a sense of what the rest of the docstrings will look like. Let me know if the docstring and the refactoring align with what you were hoping to see.

I know there's more in the workflows that can be abstracted, but this seemed like the biggest chunk.
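As a rough illustration of the refactoring described above, a task in workflow_tasks might pair a small function with a NumPy-style docstring. This is a hypothetical sketch, not the contents of create_manifest.py: the name `write_manifest`, the `runtime_config` keys, and the one-path-per-line manifest format are assumptions. In the workflow such a function would presumably be wrapped with Parsl's `python_app` decorator; that is omitted here so the snippet runs without a Parsl configuration.

```python
import os
import tempfile


def write_manifest(runtime_config=None):
    """Write a manifest listing every regular file in a staging directory.

    Hypothetical stand-in for a workflow_tasks task; not the PR's code.

    Parameters
    ----------
    runtime_config : dict, optional
        Hypothetical keys ``staging_directory`` (where inputs live) and
        ``output_directory`` (where the manifest is written). Both default
        to the current working directory.

    Returns
    -------
    str
        Path to the manifest file, which holds one input path per line.
    """
    # None instead of a mutable {} default, for the reason discussed in review.
    config = runtime_config or {}
    staging_dir = config.get("staging_directory", ".")
    output_dir = config.get("output_directory", ".")

    # Collect regular files only; subdirectories are skipped.
    entries = sorted(
        os.path.join(staging_dir, name)
        for name in os.listdir(staging_dir)
        if os.path.isfile(os.path.join(staging_dir, name))
    )

    os.makedirs(output_dir, exist_ok=True)
    manifest_path = os.path.join(output_dir, "manifest.txt")
    with open(manifest_path, "w") as f:
        f.writelines(entry + "\n" for entry in entries)
    return manifest_path


# Usage: stage two empty input files, then build the manifest.
with tempfile.TemporaryDirectory() as stage, tempfile.TemporaryDirectory() as out:
    for name in ("b.collection", "a.collection"):
        open(os.path.join(stage, name), "w").close()
    path = write_manifest({"staging_directory": stage, "output_directory": out})
    with open(path) as f:
        print(f.read().splitlines())
```

Keeping the task body free of workflow wiring is what lets several workflow files import the same task, which is the duplication the refactor targets.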

@DinoBektesevic
Member

Just to clarify, in case I wasn't clear before: I'm happy to merge this as is and make any refactoring part of a different PR if you need time to test and flesh it out. If you'd rather do it now, that's fine too, but I don't require it.

Member

@DinoBektesevic DinoBektesevic left a comment

This looks good to go to me.

Collaborator

@wilsonbb wilsonbb left a comment

LGTM! Thanks for your patience in the various discussions!

@drewoldag drewoldag merged commit cee03ea into main Aug 18, 2024
2 of 6 checks passed
Development

Successfully merging this pull request may close these issues.

Support Dino's imagecollections with new workflow and butler
3 participants