-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update workflow to start with ImageCollection files and use a Butler #30
Conversation
…ses find_optimal_celestial_wcs.
…he correct one here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good to me! Just some small comments and questions
src/kbmod_wf/task_impls/reproject_single_chip_single_night_wu.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues with the code except the lists in func declarations. Either make those tuples, or just leave them blank.
I also think there's a problem with how lists to ICs are handled, but that doesn't stop the workflow if it's run straight from IC so it's not a blocker for me here.
I'd love to see some code duplication cleaned up and some doc strings added (given they're mostly all the same), but we can move towards that in steps. Happy to approve with the minor comments addressed.
"""This workflow definition represents the task flow that was used to prepare results | ||
for the TNO 2024 presentation. It has been slightly updated to also use sharded | ||
workunits. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's interesting to observe what changed between the two workflows. Except tranlsating the lst file to ic and globbing all ".collections" instead of "lst" it doesn't seem like much.
I'd be curious to hear your take on what you think the inputs for a workflow should be and then build the tooling for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, that there really isn't much that has changed between the two workflows. My take at the beginning of working on this parsl project was that the order would be roughly:
- Reproduce the workflow that was done manually in prep for TNO
- Polish that workflow so that more than one person could run at a time
- Tack on a region search task to the beginning of the workflow. i.e. provide a list of search queries as input
- Tack on filtering to the end of the workflow so that search results will go through the CNN
- Tack on visualization prep to make examining results easier.
- Polish for production
- Tack on a step that accepts a catalog of newly acquired images - for each new image, determine by some criteria if there are enough images now to do a search in a particular patch of sky, and create a region search query for that patch.
So to make a long story short, in the near term ImageCollections should be the input. Then queries for a region search step. Then a catalog of images acquired since the last time the workflow was run.
That is in order of near to long term and high to low confidence. :)
…nd function names.
@DinoBektesevic In my most recent commit, I've started to address the code duplication by moving the parsl task definitions out of the workflow files into a submodule, If you don't mind taking a look at src/kbmod_wf/workflow_tasks/create_manifest.py, you'll get a sense of what the rest of the docstrings will look like. Let me know if the docstring and the refactoring align with what you were hoping to see. I know there's more in the workflows that can be abstracted, but this seemed like the biggest chunk. |
Just to clarify, if I wasn't clear before, I'm happy to merge this as is and make any refactoring a part of a different PR if you need time to test and flesh it out. If you feel like you want to do it now that's ok also, but not required by me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to go to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for your patience in the various discussions!
This PR represents the work to copy the original workflow dag (now called
tno_workflow
) and update it so that it begins with ImageCollections as input. The new workflow is inic_workflow.py
.I've also created a new reproject_wu task, based on the conversation between @DinoBektesevic @ColinOrionChandler and @wilsonbb on slack. We'll focus specifically on the case of single chip/single night ImageCollections in the ic_workflow DAG, and as such, the reprojection logic is significantly simplified.
This PR also introduces the butler so that we can now support KBMOD
ButlerStandardizer
s.