CompositeData for multiple processing inputs #1167

Open
tmchartrand opened this issue Nov 26, 2024 · 2 comments

@tmchartrand
Member

Is your feature request related to a problem? Please describe.
Many data processing tasks take multiple distinct inputs, e.g. generating the SmartSPIM template or training a model. We added minimal support for this in #1166 in the form of a list of input locations, but discussed a plan to add context explaining how those inputs were selected.

Describe the solution you'd like

from typing import List, Optional

from pydantic import Field

# Assumed import location for the schema base classes
from aind_data_schema.base import AindGenericType, AindModel


class CompositeData(AindModel):
    """Description of a group of data assets used together"""

    data_assets: List[str] = Field(..., title="Data assets")
    shared_metadata: Optional[AindGenericType] = Field(
        default=None,
        title="Shared metadata",
        description="Common attributes that provide context for this grouping of assets",
    )
    curation_purpose: Optional[str] = Field(
        default=None,
        title="Curation purpose",
        description="Reason for grouping assets together for processing",
    )
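
To make the proposal concrete, here is a rough sketch of what a filled-in record might look like. It uses a stdlib dataclass as a stand-in for the proposed model so that it runs without the schema package; the class name `CompositeDataSketch`, the asset names, and the metadata keys are all made up for illustration and are not part of the proposal.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CompositeDataSketch:
    """Stdlib stand-in mirroring the three proposed CompositeData fields."""

    data_assets: List[str]
    shared_metadata: Optional[Dict[str, Any]] = None
    curation_purpose: Optional[str] = None


# Hypothetical asset names and metadata, for illustration only.
template_inputs = CompositeDataSketch(
    data_assets=["asset_a", "asset_b", "asset_c"],
    shared_metadata={"modality": "SmartSPIM"},
    curation_purpose="Inputs used to generate the SmartSPIM template",
)

# Plain-dict form, roughly what would be serialized alongside a process.
record = asdict(template_inputs)
```

Only `data_assets` is required here, matching the proposal; the two context fields default to `None`.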

Describe alternatives you've considered
The most obvious alternative is to create a separate DataProcess stage for "curation" or the like, with no inputs and the list of curated assets as an output parameter. Other contextual fields could be entered as notes or parameters of the process.
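
For comparison, a minimal sketch of that alternative using plain dicts. The keys (`name`, `inputs`, `outputs`, `notes`) and the `curated_assets` parameter name are assumptions chosen for illustration, not the actual DataProcess fields.

```python
from typing import Any, Dict, List


def curation_stage(assets: List[str], reason: str) -> Dict[str, Any]:
    """Build a hypothetical 'curation' process record with no inputs.

    The curated assets are recorded as an output parameter and the
    reason for the grouping goes into the notes.
    """
    return {
        "name": "Curation",
        "inputs": [],
        "outputs": {"curated_assets": list(assets)},
        "notes": reason,
    }


def downstream_stage(name: str, curation: Dict[str, Any]) -> Dict[str, Any]:
    """Build a later stage whose inputs are the curated assets."""
    return {
        "name": name,
        "inputs": list(curation["outputs"]["curated_assets"]),
        "outputs": {},
        "notes": None,
    }


# Hypothetical asset names, for illustration only.
curation = curation_stage(["asset_a", "asset_b"], "Selected for model training")
training = downstream_stage("Model training", curation)
```

In this form the grouping context lives in free-text notes on an extra stage rather than in a dedicated object attached to the input.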

Additional context
One advantage of a separate object is that it would make it easier to script a new process on the same set of data assets as a previous processing result, which could be a common use case for model evaluation and similar tasks.
It is unclear whether this would be reused or allowed anywhere other than the input field of DataProcess.
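
As a rough sketch of that rerun use case, assuming a process record is a plain dict whose `input` key holds the composite grouping (the key names, function name, and example values are hypothetical):

```python
import copy
from typing import Any, Dict


def rerun_on_same_inputs(
    previous: Dict[str, Any], new_name: str, new_parameters: Dict[str, Any]
) -> Dict[str, Any]:
    """Create a new process record that reuses a previous record's inputs.

    The composite input is deep-copied so that editing the new record
    cannot change the old one.
    """
    return {
        "name": new_name,
        "input": copy.deepcopy(previous["input"]),
        "parameters": dict(new_parameters),
    }


# Hypothetical previous result, for illustration only.
training = {
    "name": "Model training",
    "input": {
        "data_assets": ["asset_a", "asset_b"],
        "curation_purpose": "Training set for the model",
    },
    "parameters": {"epochs": 10},
}

evaluation = rerun_on_same_inputs(training, "Model evaluation", {"metric": "accuracy"})
```

Because the grouping travels as one object, the script does not need to know how the assets were originally selected.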

This is also related to the discussion #1148

@saskiad
Collaborator

saskiad commented Dec 2, 2024

I wonder if we can be more specific about the shared metadata. Not sure what that looks like, I just want to avoid this being a huge dump.

@tmchartrand
Member Author

Yes, I guess we'd discussed limiting that to something like "Key shared attributes used to select this group of assets". Maybe the title could actually be "Defining metadata" or "Defining attributes"?
