I thought I would write down the case for making fileformats a dependency of Pydra, rather than keeping minimal File/Directory classes within Pydra and relying on interoperability with fileformats, so we can discuss it at the next developers' meeting.
I'm obviously a bit biased, but I reckon fileformats (FF) could help round out Pydra by taking advantage of the effort users need to go to in order to type their workflows. It could also streamline some of the internal code by outsourcing the special-case validation/hashing for file-system objects.
I'm pretty sure that FF's File/Directory classes would be backwards compatible with the File/Directory classes already in Pydra, so they could be imported into pydra.engine.specs in place of the existing classes.
Avoiding duplication
Minimal File/Directory classes in Pydra and the FF base class would both need to implement a very similar hashing function.
Duplication of File/Directory classes could lead to potential confusion for users as to which one they are using and the difference between them (i.e. why they should use one over the other).
Fileformats is light-weight too
Like Pydra, FF's only dependency is attrs, so adding it in wouldn't add much to Pydra's light touch. I'm happy to commit to keeping it this way going forward.
I have added in support for converters and methods to read headers, which will typically require additional dependencies. However, these dependencies are only installed when you specify the '[extended]' install extra. Medium term, my plan is to just add hooks for this functionality in the core package(s) and then split out the extended functionality into fileformats-<namespace>-extras packages.
Type-checking
One key benefit of using FF's generic File/Directory classes internally, for someone wanting to use FF with Pydra, would be in the type-checking of generic tasks that operate on generic FS objects. For example, the fileformats.medimage.Nifti class inherits from fileformats.generic.File, so the output of a task that returns a NIfTI could be fed into a generic task that takes any file object and the type checker would be happy.
An issue I ran into with the current File/Directory implementation is that there is no way to properly specify the input/output type of tasks that can take or return either a File or a Directory (e.g. MRtrix3). I initially tried ty.Union[File, Directory], but that doesn't get recognised as a file-system object and so isn't hashed properly. In FF there is fileformats.generic.FsObject for cases like these, which checks existence but doesn't require you to specify whether it is a file or directory.
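To make the type-checking argument concrete, here is a rough sketch using stand-in classes rather than the real FF ones. The class bodies, the assumption that File and Directory both derive from FsObject, and the accepts() helper are all hypothetical and only mimic the inheritance relationships described above.

```python
import tempfile
from pathlib import Path


class FsObject:
    """Hypothetical stand-in: any path that exists on the file system."""

    def __init__(self, path):
        self.path = Path(path)
        if not self.path.exists():
            raise FileNotFoundError(f"{self.path} does not exist")


class File(FsObject):
    """Hypothetical stand-in: an existing regular file."""

    def __init__(self, path):
        super().__init__(path)
        if not self.path.is_file():
            raise ValueError(f"{self.path} is not a file")


class Directory(FsObject):
    """Hypothetical stand-in: an existing directory."""

    def __init__(self, path):
        super().__init__(path)
        if not self.path.is_dir():
            raise ValueError(f"{self.path} is not a directory")


class Nifti(File):
    """Hypothetical stand-in for a specific format (extension check only)."""

    def __init__(self, path):
        super().__init__(path)
        if self.path.suffix != ".nii":
            raise ValueError(f"{self.path} does not have a '.nii' extension")


def accepts(value, declared_type):
    """Toy type check: would a field declared as `declared_type` take `value`?"""
    return isinstance(value, declared_type)


tmp_dir = Path(tempfile.mkdtemp())
nifti_path = tmp_dir / "image.nii"
nifti_path.write_bytes(b"placeholder bytes, not a real NIfTI")

nifti = Nifti(nifti_path)
directory = Directory(tmp_dir)

nifti_ok_as_file = accepts(nifti, File)            # specific format into generic task
file_ok_as_fsobject = accepts(nifti, FsObject)     # file where either kind is allowed
dir_ok_as_fsobject = accepts(directory, FsObject)  # directory where either is allowed
dir_ok_as_file = accepts(directory, File)          # directory is not a File
```

With a common base class, the "file or directory" case becomes a single declared type rather than a Union that the engine has to special-case.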
Hashing of associated files
Given that hashing is a key part of Pydra's internal logic, ensuring that associated files, such as JSON side-cars and separate headers, are included in the hashes seems important.
While interoperability between Pydra and FF would be enough for changes in associated files to be detected if a user specifies an FF class, integrating FF into Pydra might make this a bit more explicit.
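As a minimal illustration of why this matters (not Pydra's or FF's actual hashing code), the sketch below hashes a primary file either on its own or together with its side-car. The hash_fileset() helper, the file names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_fileset(paths):
    """Digest over the names and contents of every path in the set."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


tmp_dir = Path(tempfile.mkdtemp())
image = tmp_dir / "scan.nii"
sidecar = tmp_dir / "scan.json"
image.write_bytes(b"placeholder image data")
sidecar.write_text('{"RepetitionTime": 2.0}')

primary_only_before = hash_fileset([image])
full_before = hash_fileset([image, sidecar])

# Only the side-car changes; the primary file is untouched
sidecar.write_text('{"RepetitionTime": 2.5}')

primary_only_after = hash_fileset([image])
full_after = hash_fileset([image, sidecar])

sidecar_change_missed = primary_only_before == primary_only_after
sidecar_change_detected = full_before != full_after
```

If only the primary file is hashed, the edit to the side-car goes unnoticed and a cached result could be reused incorrectly. Hashing the whole file set picks it up.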
Validation of existence/format
Instead of the current approach, where special-case handlers check for file/directory existence, we could insert "converters" into the attrs fields typed by FF classes. This would trigger format/existence validation on init and setattr, thereby raising input type issues at workflow construction time rather than at runtime.
Would play nicely with Arcana
I thought I should come clean about this ulterior motive. Currently, when applying a Pydra workflow to a data array in a data repository using Arcana you need to specify the format that the Pydra workflow requires so it can be checked against the format the data is stored in. Having Pydra and Arcana speak the same file-format language would make this a lot cleaner.
While not something everyone is going to be interested in, at some point (when everything is working) it could be good to demo some of Arcana's features, in particular the ability to deploy a Pydra workflow to a BIDS app or XNAT pipeline from a YAML spec (we are working on Flywheel Gear support as well).