How to handle file exclusions and inclusions in 0.8? #378
Comments
if i understand this correctly, there are three potential operators:
and then there is order of operations and combinations of operations. the base state (let's call it B) is index everything in a root and then apply some operations on it. here are some operations in decreasing order of use-cases (in my mind)
each of the operations returns a layout object. so one could do things like:
the actual implementations can do optimizations under the hood in terms of order of operations, especially if you think of the index being a future. the actual indexing, exclusion, validation doesn't have to happen till the point a program asks for something concrete. |
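A minimal sketch of the chaining idea above, assuming a hypothetical `LazyLayout` class (not part of pybids). Each operation returns a new layout object, and nothing is indexed until `files()` is called. The method names `include`, `exclude`, and `files`, and the rule that later operations override earlier ones, are all illustrative assumptions.

```python
import os
import tempfile


class LazyLayout:
    """Hypothetical layout: records operations, indexes only on demand."""

    def __init__(self, root, ops=()):
        self.root = root
        self.ops = tuple(ops)  # ordered (kind, top_level_dir) pairs

    def exclude(self, name):
        # Return a new object so calls can be chained in any order.
        return LazyLayout(self.root, self.ops + (("exclude", name),))

    def include(self, name):
        return LazyLayout(self.root, self.ops + (("include", name),))

    def files(self):
        # The filesystem walk happens here, not at construction time.
        found = []
        for dirpath, _, filenames in os.walk(self.root):
            for fname in filenames:
                rel = os.path.relpath(os.path.join(dirpath, fname), self.root)
                rel = rel.replace(os.sep, "/")
                top = rel.split("/")[0]
                keep = True
                for kind, name in self.ops:  # later operations win
                    if top == name:
                        keep = kind == "include"
                if keep:
                    found.append(rel)
        return sorted(found)


def demo():
    """Build a throwaway tree and apply a chain of operations to it."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("sub-01/anat/T1w.nii.gz", "code/run.py", "stimuli/face.png"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        layout = LazyLayout(root).exclude("code").exclude("stimuli").include("stimuli")
        return layout.files()
```

Because the operations are just recorded, an implementation would be free to reorder or merge them before touching the disk, which is the optimization hinted at above.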
I lean towards 3 for API simplicity (as 1 with different names also seems reasonable. Essentially |
@satra sure, we could add explicit @adelavega my (3) doesn't include any Let's set aside the question of merging |
@tyarkoni - i agree that merging bidslayout for different datasets is out of scope for now. i was more thinking of where the derivatives of a dataset are on different filesystems, i can use layouts to integrate them. the following is not an uncommon scenario in hpc clusters.
now i want a layout that integrates these into a single layout index. |
@tyarkoni - the only reason to add extra calls is to allow for order of operations, otherwise as an argument you would have to somehow be able to specify order of operations.
|
@tyarkoni Ah, sorry, I misspoke. But what you suggested is, I think, better. That way, most people don't have to touch But I agree the order of operations is problematic. Is it not sufficient to just set an order and have the user work around that order (e.g. validate, include, exclude)? |
@satra that kind of derivative handling is already supported! You can do:
layout = BIDSLayout('/bids/raw/root', derivatives=['/deriv1', '/deriv2'])
Then each derivative folder is initialized as its own
Re: order of operations, I agree there will probably be use cases where that matters. I think implementing |
@adelavega actually, if we assume that The only non-crazy use case I can think of where this wouldn't work is if the user wants to, say, index |
Just to be obnoxious... If we're considering a |
That makes sense to me. But in that case then the order would be: And @tyarkoni in such a case, isn't it possible to pass a regex for |
Maybe? This discussion has been pretty much exclusively at a level I'm not currently prepared to engage, so order of operations is not something I can comment on. This is just a UX consideration. |
@effigies I'd be fine with @adelavega I don't think switching I'm on the fence about whether or not to allow regexes here, and leaning towards not. The main issue is that allowing regexes by default is dangerous, because lots of things could inadvertently match strings that aren't meant to be interpreted as regexes. E.g., suppose someone passes |
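To illustrate the concern above with made-up names (this is not the example from the original comment), here is how a plain path string behaves when it is silently interpreted as a regex, using only the standard `re` module:

```python
import re

# The user means the literal directory name "sub-01.old", not a pattern.
user_value = "sub-01.old"

paths = ["sub-01.old/anat/T1w.nii.gz", "sub-01_old/anat/T1w.nii.gz"]

# Interpreted as a regex, "." matches any character, so both paths match.
as_regex = [p for p in paths if re.match(user_value, p)]

# Escaped (treated literally), only the intended path matches.
as_literal = [p for p in paths if re.match(re.escape(user_value), p)]
```

The unintended second match is silent, which is what makes regex-by-default risky for arguments that usually hold ordinary paths.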
How about |
Okay, I suppose I can live without regex. I'm currently using it to exclude confound files from the fmriprep dataset (since we provide our own events, including confounds), but I think I can work around that in a more explicit fashion. To be concrete, I'm currently passing the following to fitlins, which passes it on to pybids: |
This is one of the places where I really miss Ruby's (and shudder Perl's) native regex support. This kind of thing would be so much easier if you could just pass |
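Python has no regex literal, but one way to approximate the distinction is to treat plain strings literally and only treat compiled patterns as regexes. The sketch below assumes a hypothetical helper `is_excluded` (not a pybids function); the rule list and file names are made up for illustration.

```python
import re


def is_excluded(path, rules):
    """Return True if `path` matches any rule (hypothetical helper)."""
    parts = path.split("/")
    for rule in rules:
        if isinstance(rule, re.Pattern):
            # Compiled patterns are searched against the whole path.
            if rule.search(path):
                return True
        elif rule in parts:
            # Plain strings must equal a full path component.
            return True
    return False


# A literal directory name plus an explicit, compiled regex.
rules = ["code", re.compile(r"_desc-confounds_.*\.tsv$")]
```

With this convention a caller opts in to regex matching by calling `re.compile`, so a string such as `'code'` can never be misread as a pattern.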
Okay, thinking about this, a little. I think we should choose a consistent syntax with If you think that's a good idea, perhaps it makes sense to think more about bids-standard/bids-specification#131. The |
Seems reasonable. I don't want to deal with That would make things extremely simple on the pybids end, because, as you suggest, anything passed to |
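As a rough illustration of ignore-file handling, the sketch below parses gitignore-style lines and matches them with the standard `fnmatch` module. This is only an approximation: real gitignore semantics (negation, anchoring, `**`) are richer, and the helper names `parse_ignore_lines` and `ignored` are assumptions made for this example.

```python
from fnmatch import fnmatch


def parse_ignore_lines(text):
    """Return non-empty, non-comment patterns from ignore-file text."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def ignored(path, patterns):
    """Return True if `path` matches any pattern (simplified rules)."""
    # Simplification: test each pattern against the full relative path
    # and against each path component, ignoring a trailing slash.
    parts = path.split("/")
    return any(
        fnmatch(path, pat) or any(fnmatch(part, pat.rstrip("/")) for part in parts)
        for pat in patterns
    )


patterns = parse_ignore_lines("# scratch data\n\nextra_data/\n*_confounds.tsv\n")
```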
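Would
class BIDSValidator:
    ...

    def validate(self, files):
        # Drop ignored files before any validation happens.
        out_files = [file for file in files if not ignored(file)]
        ...
        # Return the surviving files together with validation messages.
        return out_files, warnings, errors
|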
I was thinking it would follow the current API and do everything one file at a time. We could e.g. add an |
This is an issue that has cropped up repeatedly in various guises (e.g., #215, #364, #277, #184, #131, and probably others). The question is how to allow users to specify explicit inclusion and exclusion paths at BIDSLayout initialization. The reason for bringing this up again is that, as of version 0.8 (see #369), pybids will no longer depend on grabbit. The cord-cutting means we can no longer rely on the behavior implemented in grabbit. Since this was entirely undocumented in pybids, I think we have a good opportunity to start afresh and hopefully settle on something that works for everyone.

The main constraints I think we should try to respect are:
[...] 'code', 'stimuli', 'sourcedata', etc.)
[...] BIDSLayout)

The current approach doesn't allow users to specify explicit exclusions at all (well, it does, but this is an undocumented grabbit feature). It uses an include argument only as a means of negating the default exclusions. E.g., if you want 'stimuli'/ to be indexed, you pass include=['stimuli']. Beyond this, there's no pybids-level ability to control inclusions or exclusions (aside from specifying derivatives, which is a separate matter that I think we're handling in a satisfactory way). I don't think this is satisfactory, and a bunch of the opened issues reflect that.

Here are a few proposals (feel free to suggest others):
1. Keep the current approach, where include negates values in the default exclusion list, but add an exclude argument that causes any matching files/dirs to be skipped during indexing. The main downside I see here is that the behavior is counterintuitive, as include and exclude act asymmetrically. A potential fix is to give these arguments different names (e.g., override_exclusions and exclude_paths).
2. Stick with just exclude, and have any manually specified value override the default internal list (e.g., if you pass ['code', 'sourcedata'], then things like 'stimuli' will now be indexed, and only files/dirs that match the elements in your list will be skipped). The downside of this is it requires users to know what the default exclusions are, and reproduce them, and this will probably get pretty messy.
3. Get rid of the current default exclusion list entirely, and treat exclude as a strict list of paths to exclude from indexing. Now that the validator is working properly, directories like 'stimuli' will automatically be skipped if validate=True, because files won't pass the validator unless they're explicitly part of the spec. The downside of this option is that it makes it difficult to index selectively: e.g., if you want to index only what's in 'stimuli', you need to set validate=False and then pass a whole pile of exclusions (i.e., everything that doesn't pass the validator except for 'stimuli').

I lean towards (1) (with more explicit argument names). Thoughts? If I don't get any feedback in the next couple of days, I'll make an executive decision in the interest of getting 0.8 merged, so speak up now if you have an opinion! (Tagging in @effigies @adelavega @yarikoptic @gkiar)
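The three proposals above can be compared with a small sketch that computes which top-level names would be skipped under each one. The function names are hypothetical, the argument names for option 1 are the ones floated in the proposal, and `DEFAULT_EXCLUSIONS` contains only the three names mentioned in this issue (the real default list may be longer). Option 3 additionally relies on the validator, which is not modeled here.

```python
# Only the names mentioned in this issue; the real default list may differ.
DEFAULT_EXCLUSIONS = ["code", "stimuli", "sourcedata"]


def skipped_option1(override_exclusions=(), exclude_paths=()):
    """Option 1: defaults minus overrides, plus extra exclusions."""
    kept_defaults = [d for d in DEFAULT_EXCLUSIONS if d not in override_exclusions]
    return sorted(set(kept_defaults) | set(exclude_paths))


def skipped_option2(exclude=None):
    """Option 2: any user-supplied list replaces the defaults entirely."""
    return sorted(DEFAULT_EXCLUSIONS if exclude is None else exclude)


def skipped_option3(exclude=()):
    """Option 3: no defaults; only the listed paths are skipped."""
    return sorted(exclude)
```

For example, indexing 'stimuli' takes `skipped_option1(override_exclusions=['stimuli'])` under option 1, whereas option 2 requires restating the remaining defaults as `skipped_option2(['code', 'sourcedata'])`.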