FIX: BIDSLayout -- TypeError: unhashable type: 'dict' #682
When `BIDSLayout.__repr__` is called on datasets that have `sub-N_scans.tsv` files, the query to obtain runs also returns these files, and their value is a dictionary, which is not hashable by the `set` object wrapping the query. Instead of refining the query (which would indeed fix the issue), this workaround takes the naive approach of ensuring that each result of the query has an `int` value. Resolves: bids-standard#681.
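The failure and the workaround described above can be reproduced outside pybids. The following is a minimal sketch, not the actual pybids code: `run_values` is made-up data standing in for what the run query returns, with the dictionary playing the role of the value contributed by a `scans.tsv` file.

```python
# Hypothetical query results: ints parsed from filenames, plus a dict
# standing in for the value attached to a sub-N_scans.tsv file.
run_values = [1, 2, {"Description": "value coming from a scans.tsv file"}, 1]

# Building a set directly fails because dicts are not hashable.
try:
    set(run_values)
    error_message = None
except TypeError as err:
    error_message = str(err)

# The naive workaround: keep only integer values before building the set.
unique_runs = {value for value in run_values if isinstance(value, int)}
n_runs = len(unique_runs)
```

Filtering on `isinstance(value, int)` sidesteps the error without touching the query itself, which is why it is cheap but does not address why the dictionary is returned in the first place.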
Can we create a test?
Would it suffice to break a currently existing test? BTW, this has made me realize that this bug would only occur when:
I'm too lazy to check on the documentation, but if the {
...
"run": 1
...
} then this patch would not work, and one extra run would be counted for each of these
I think this metadata is almost certainly invalid, though the spec may require some clarification. I would say that TSV columns probably should not be permitted to overlap with entities. So I'm a bit hesitant to bake into our example a case of bad data. The issue with this data is farther reaching. So I see a few issues:
Also lists (see #683). Probably, the final solution will entail a combination of options 1 or 2, AND addressing 3.
I'm not thrilled at the idea of handling column metadata differently from other metadata. I don't think that would be my naive expectation from the spec, and it adds an extra layer of logic in various places that could quickly get out of hand if we then have to do similar things for other file types. My inclination would be to treat the actual data returned by the query as valid, and finesse the issue as needed in EDIT: oh, but I agree that the spec should probably forbid the re-use of reserved entities in column names. I think this is the soft implication of having columns be named |
I think checking that run is an int in this context can save us from future headaches; it is a very cheap solution and comes with a test. I'm fine either way, but because this does make pybids work with (wrong) datasets already out there in the wild, I'd lean towards merging.
Okay, fair enough. Merging!
On reflection, I think this was the wrong way to deal with the issue. It actually isn't trivial to handle, because what's happening is that now the I'm going to revert this PR and instead add some extra checking in the indexing code to just prevent the reserved entities (i.e., those mandated to show up in filenames) from ever being read from JSON sidecars. That seems like the most principled fix, and will also potentially prevent other kinds of nasty collisions in cases where users don't follow the spec. Any objections? |
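The idea of never reading reserved entities from JSON sidecars could look roughly like the sketch below. Everything in it is an assumption for illustration: `RESERVED_ENTITIES` and `drop_reserved_entities` are hypothetical names, not part of the pybids API, and the entity list is only a sample rather than the set the indexing code would actually use.

```python
# Hypothetical sample of entities that are mandated to appear in filenames.
RESERVED_ENTITIES = frozenset({"subject", "session", "task", "run"})


def drop_reserved_entities(sidecar_metadata, reserved=RESERVED_ENTITIES):
    """Return a copy of the sidecar metadata without keys that collide
    with filename entities (hypothetical helper, not pybids code)."""
    return {
        key: value
        for key, value in sidecar_metadata.items()
        if key not in reserved
    }


# Made-up sidecar content: "run" collides with a filename entity.
sidecar = {
    "RepetitionTime": 2.0,
    "run": {"Description": "column description from a scans sidecar"},
}
cleaned = drop_reserved_entities(sidecar)
```

With a filter of this kind applied at indexing time, a query on `run` would only ever see values extracted from filenames, so the `__repr__` code would not need a type check of its own.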
That was happening already before this patch. If you keep the test file
Well, the problem you reported would still happen, but our test failures would go away. ;) Either way, the proposed fix solves both (type is already enforced on
BTW, subject is not modified at all in this PR, FWIW. Reverting will definitely not make the issue go away.
Oh, right. Maybe type isn't being enforced when extracted from filenames then. Will look into it. In any case, I think we're agreed that there's no reason to ever pay attention to
This is the root of all problems, and it doesn't seem to make a lot of sense. However, I don't think it is explicitly forbidden by BIDS (it probably should be?). Worst case scenario, someone found a hack using one entity in a JSON file and we have to tell them that is not supported. No biggie, so yes. BTW, since the problem with this PR is actually the new test file, I've put together #695, which seems to make the problem go away.
FIX: Resolve side-effects of new testfile in #682