restrict exclude regexes even more #131
Comments
As of #135, no subdirectories are excluded from indexing by default. There are new Relatedly, the If this covers your needs, can we close the issue? |
hm, so it will be up to the user to establish excludes? since |
The problem is that, as far as I can tell, the spec doesn't say anything about what can't be included in a BIDS project. And for reasons discussed in #135, having default exclusions turns out to be problematic. So this is probably the least of all the evils...
I hear you. But the spec does describe what can be included in the tree and which parts do not necessarily follow the BIDS specification. So I see no reason not to exclude those known suspects by default (as you did in the past, just somewhat too permissively).
Fair point. Now I'm on the fence again. I wonder if the best way to handle this is to treat it as a special case, and have something like a |
Let me bring up another concern, which is that if we want to stick to only what's defined in the core spec by default, it doesn't make sense to list exclusions... I think that would argue for defining inclusions instead. Otherwise, anything the user drops into the project that isn't explicitly excluded (e.g., derivatives, sourcedata, code, models, etc.) is going to get scanned. And I think that kind of behavior will really throw users for a loop (e.g., "why is pybids picking up files in (Not to mention that it's not really clear what constitutes a "core" part of the spec in any case. The spec document explicitly mentions |
@yarikoptic, what use case do you have in mind where exclusions would actually be important? It's clearly true that having extra directories to scan will slow things down somewhat, but if that becomes an issue at some point, we could refactor grabbit internally to speed things up (e.g., to store things internally in a relational database instead of Python objects). The only other major concern I can see is that the user might get extra files they don't want returned by queries. But I'd argue that that's a user issue. E.g., if you have a whole pile of junk in |
I don't think it would be wise for pybids to return info about files that are not part of the spec.
My opinion that I'm extremely ready to be argued out of is that there should be three modes of inclusion:
This leads to two additional thoughts:
@tyarkoni, @yarikoptic - this is almost like a matrix of paths x configs, with the unknown being - what is in the dataset. it could be "raw data" [sourcedata + bids], it could be bids layout only, it could be derivatives of any kind and quantity. the design of layout should not try to assume what's in the dataset. and i think currently it doesn't.

the question one can ask is whether to include all data underneath a root path or not or force the user to specify which ones. the api should allow both imo, however, which one is set to default can be discussed. my preference would be that if a root path is provided, everything is included (whether known or unknown from the spec). if a "user/developer" wants to restrict it the api should offer a way of excluding known keys or paths. i'm assuming things are relative to root.

    BIDSLayout('/root_dataset', exclude=['sourcedata', 'scratch', 'derivatives/bad_deriv'])

one could also have something that works with the BIDS validator:

    BIDSLayout('/root_dataset', exclude=['sourcedata', 'scratch', 'derivatives/bad_deriv', '.heudiconv'],
               exclude_invalid=True)

and then for anything included configs can be added/redefined as presented before. in some ways this is similar to the correspondingly, perhaps there should be an overriding keyword which allows everything to be included.
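To make the "relative to root" exclusion semantics concrete, here is a minimal sketch of how such a filter could behave. The helper `is_excluded` is hypothetical and is not part of pybids or grabbit; it assumes each `exclude` entry is a POSIX-style path relative to the dataset root and that an entry excludes that path and everything beneath it.

```python
from pathlib import PurePosixPath


def is_excluded(rel_path, exclude):
    """Return True if rel_path equals, or lies beneath, any excluded path.

    Hypothetical helper: comparison is done on whole path components,
    so 'sourcedata' does not exclude 'sourcedata_notes.txt'.
    """
    parts = PurePosixPath(rel_path).parts
    for entry in exclude:
        entry_parts = PurePosixPath(entry).parts
        if parts[:len(entry_parts)] == entry_parts:
            return True
    return False


exclude = ['sourcedata', 'scratch', 'derivatives/bad_deriv']
files = [
    'sub-01/anat/sub-01_T1w.nii.gz',
    'sourcedata/raw.dcm',
    'derivatives/bad_deriv/out.nii.gz',
    'derivatives/good_deriv/out.nii.gz',
    'sub-01/sourcedata_notes.txt',
]

# Keep only the files that no exclusion entry covers.
kept = [f for f in files if not is_excluded(f, exclude)]
```

Matching on path components rather than substrings is the design choice that matters here: it keeps `derivatives/good_deriv` while dropping `derivatives/bad_deriv`, and it avoids accidental hits on names that merely contain an excluded word.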
Thanks all, this is helpful. @satra, everything currently works exactly as you propose (including a That said, I can see @chrisfilo's point about it being hard to write code that behaves predictably if you don't know what else could be sitting in the user's project. One way or another, it seems likely that stipulating a "restrictive" mode of access may be necessary to assure deterministic behavior. And then the question becomes whether an app developer should have to write the inclusion rules themselves, or whether there should be a default setting that provides a reasonable guess. I think @effigies' suggestion is a nice compromise that fits with my suggestion to have something like a If so, @chrisfilo, could you elaborate on what you think of as the set of files defined by the spec? Is it co-extensional with what's considered valid by the |
Yes - the |
Excellent, that works for me. Unless anyone has objections to the above proposal, I think we've converged. Actually, this makes life super easy, since if we're okay relying on the |
+1 for sharing regular expressions. I think having a master JSON list that looks like:

    [
      {
        "name": "is_cont",
        "pattern": "^\\/(sub-[a-zA-Z0-9]+)",
        "description": "Verifies that the file is..."
      },
      ...
    ]

...would probably be sufficient. Then we could dynamically build the entire validator (or most of it at least) directly from the JSON file, and still provide informative messages about why a failure occurred if needed.
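As a rough illustration of building a validator from such a list, the sketch below loads rules from JSON and reports which rule matched, or every rule's description on failure. The rule names, the second pattern, and the functions `load_rules` and `check` are assumptions made for this example; only the first pattern is taken from the comment above, and none of this reflects how the BIDS validator is actually implemented.

```python
import json
import re

# Hypothetical rule list in the proposed shape (name / pattern / description).
RULES_JSON = r'''
[
  {"name": "subject_dir",
   "pattern": "^\\/(sub-[a-zA-Z0-9]+)",
   "description": "Path must start with a subject directory"},
  {"name": "top_level_file",
   "pattern": "^\\/[a-zA-Z0-9_]+\\.json$",
   "description": "Path must be a JSON file at the dataset root"}
]
'''


def load_rules(text):
    """Compile every rule's pattern once, keeping its name and description."""
    return [(r["name"], re.compile(r["pattern"]), r["description"])
            for r in json.loads(text)]


def check(path, rules):
    """Return (True, matching rule names) or (False, failure messages)."""
    matched = [name for name, rx, _ in rules if rx.search(path)]
    if matched:
        return True, matched
    return False, [f"{name}: {desc}" for name, _, desc in rules]


rules = load_rules(RULES_JSON)
ok, info = check('/sub-01/anat/sub-01_T1w.nii.gz', rules)
bad, reasons = check('/derivatives/fmriprep/report.html', rules)
```

Because the descriptions travel with the patterns, a failing path can be reported together with the rules it did not satisfy, which is the "informative messages" benefit mentioned above.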
ATM in https://github.com/INCF/pybids/blob/master/bids/grabbids/config/bids.json#L3 my guess is that if I have a BIDS-compliant dataset with either `sub-derivatives` or `ses-models` (unlikely but still legit), those directories and their files would be excluded. Why not restrict the match to require a path separator right before? i.e. `^(.*[/\\])?derivatives$`. Actually, with grabbles/grabbit#48, if you would like to restrict only at the top level, then it could be as simple as `^derivatives$`, I believe.
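The difference between the patterns can be checked directly with Python's `re` module. In this sketch, `loose` is a hypothetical stand-in for an insufficiently anchored exclusion pattern (the actual contents of `bids.json` are not reproduced here); `anchored` and `top_only` are the two patterns proposed in this issue.

```python
import re

# Hypothetical loose pattern: matches any name that merely ends in "derivatives".
loose = re.compile(r'derivatives$')
# Proposed: "derivatives" must be a whole path component (start or after a separator).
anchored = re.compile(r'^(.*[/\\])?derivatives$')
# Proposed for top-level-only matching on paths relative to the root.
top_only = re.compile(r'^derivatives$')

dirs = ['derivatives', 'sub-derivatives', 'sub-01/derivatives']

# Map each directory to whether each pattern would exclude it.
results = {
    d: (bool(loose.search(d)), bool(anchored.search(d)), bool(top_only.search(d)))
    for d in dirs
}
# results['sub-derivatives']    -> (True, False, False)
# results['sub-01/derivatives'] -> (True, True, False)
```

Only the loose pattern catches the legitimate `sub-derivatives` directory; the anchored pattern excludes `derivatives` at any depth, while the top-level pattern excludes it only directly under the root.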