FIX: BIDSLayout -- TypeError: unhashable type: 'dict' #682

oesteban · 2020-11-25T16:10:38Z

Summary

When `BIDSLayout.__repr__` is called on a dataset that contains `sub-N_scans.tsv` files, the query that collects runs also returns these files. Their `run` value is a dictionary, which cannot be added to the `set` that wraps the query results because dictionaries are not hashable.

Workaround

Refining the query would be the proper fix. This workaround instead takes the naive approach of keeping only the query results whose value is an `int`.

Resolves: #681.
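
The failure and the workaround described above can be reproduced outside PyBIDS. This is a minimal sketch, not the code from the patch: `run_values` is a hypothetical stand-in for what the run query returns when a `_scans.json` sidecar describes a `run` column.

```python
# Hypothetical query results: run values from image files (ints) plus a
# column description picked up from a _scans.json sidecar (a dict).
run_values = [1, 2, 1, {"Description": "run index"}]

error_message = None
try:
    set(run_values)  # dicts are unhashable, so this raises TypeError
except TypeError as err:
    error_message = str(err)

# The workaround: keep only integer values before deduplicating.
unique_runs = {value for value in run_values if isinstance(value, int)}
```

The `set` call fails on the dict entry, while the filtered set comprehension silently drops it.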


effigies · 2020-11-25T16:30:28Z

Can we create a test?

oesteban · 2020-11-25T16:55:20Z

> Can we create a test?

Would it suffice to break a currently existing test?

BTW, this has made me realize that this bug would only occur when:

- There's a _scans.tsv file, AND
- There's an associated _scans.json file to describe the columns, and one of them happens to be "run".

I'm too lazy to check on the documentation, but if the _scans.json is allowed to look like:

{
  ...
   "run": 1
  ...
}

then this patch would not work, and one extra run would be counted for each of these _scans.json files.
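
To make that concern concrete, here is a small sketch under stated assumptions: the `(filename, run value)` pairs are invented for illustration, and the third entry assumes a sidecar that carries a plain integer `"run": 1`. Whether the final count is actually inflated depends on how the results are tallied, which this sketch does not model.

```python
# Hypothetical (filename, run value) pairs returned by a run query.
results = [
    ("sub-01_task-rest_run-1_bold.nii.gz", 1),
    ("sub-01_task-rest_run-2_bold.nii.gz", 2),
    ("sub-01_scans.tsv", 1),  # value taken from "run": 1 in _scans.json
]

# The int check cannot tell the sidecar-derived value apart from real runs.
kept = [(name, run) for name, run in results if isinstance(run, int)]
scans_entries = [name for name, _ in kept if name.endswith("_scans.tsv")]
```

The `_scans.tsv` entry survives the filter because its value is a valid `int`, so a type check alone does not exclude it.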

effigies · 2020-11-25T19:47:13Z

I think this metadata is almost certainly invalid, though the spec may require some clarification. I would say that TSV columns probably should not be permitted to overlap with entities. So I'm a bit hesitant to bake into our example a case of bad data.

The issue with this data is farther reaching. layout.get_runs() has the exact same problem.

So I see a few issues:

1. Should PyBIDS be loading column metadata in the same way as it loads image metadata?
2. Should PyBIDS reject TSV columns or metadata fields that collide with entities?
3. Dictionary values can be valid for some metadata. We may want to consider using some kind of frozen dict, or this case will reappear somewhere with some get_{metadata}s() method.

oesteban · 2020-11-25T20:04:37Z

> 3. Dictionary values can be valid for some metadata. We may want to consider using some kind of frozen dict, or this case will reappear somewhere with some get_{metadata}s() method.

Also lists (see #683). Probably, the final solution will entail a combination of options 1 or 2, AND addressing 3.
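
One way the "frozen dict" idea could look, extended to lists as well, is sketched below. The `freeze` helper is hypothetical and not part of PyBIDS; it only illustrates converting nested metadata into hashable tuples so that values can be collected in a `set`.

```python
def freeze(value):
    """Recursively convert dicts and lists into hashable tuples."""
    if isinstance(value, dict):
        # Sort by key so that equal dicts always freeze to the same tuple.
        items = sorted(value.items(), key=lambda pair: pair[0])
        return tuple((key, freeze(item)) for key, item in items)
    if isinstance(value, (list, tuple)):
        return tuple(freeze(item) for item in value)
    return value


# Hypothetical metadata values, including two dicts with identical content.
metadata_values = [
    {"Description": "run index", "Levels": [1, 2]},
    {"Levels": [1, 2], "Description": "run index"},  # same content, other order
    [0.1, 0.2],
    3,
]
unique_values = {freeze(value) for value in metadata_values}
```

A caveat of this tuple-based design is that a frozen dict and a list of key/value pairs become indistinguishable, so a dedicated immutable mapping type may be preferable if the original type has to be recovered.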

tyarkoni · 2020-11-25T20:32:42Z

I'm not thrilled at the idea of handling column metadata differently from other metadata. I don't think that would be my naive expectation from the spec, and it adds an extra layer of logic in various places that could quickly get out of hand if we then have to do similar things for other file types. My inclination would be to treat the actual data returned by the query as valid, and finesse the issue as needed in __repr__ and other places it arises, as in @oesteban's proposed fix.

EDIT: oh, but I agree that the spec should probably forbid the re-use of reserved entities in column names. I think this is the soft implication of having columns be named filename and participant_id instead of subject and run, but it should probably be made explicit.

tyarkoni · 2021-01-11T16:17:45Z

My read is that we've converged on not handling this particular problem, as it should probably be explicitly disallowed by the spec (#683 is more general and I think that should still be dealt with). Do we still want/need to merge this, @oesteban, or can we close it?

oesteban · 2021-01-11T16:48:31Z

I think checking that run is an int in this context can save us from future headaches. It is a very cheap solution and comes with a test. I'm fine either way, but because this indeed makes pybids work with (wrong) datasets already out there in the wild, I'd lean towards merging.

tyarkoni · 2021-01-11T18:42:20Z

Okay, fair enough. Merging!

tyarkoni · 2021-01-12T15:06:05Z

On reflection, I think this was the wrong way to deal with the issue. It actually isn't trivial to handle, because what's happening is that now the dict value for run is making it into the Tag table in the DB, and while it's only currently causing problems in 2 tests, it could in principle rear its head elsewhere. There's no reason that entities like run or subject should ever have a dict value (or, for that matter, an int value in the case of subject), because that clearly violates the spec.

I'm going to revert this PR and instead add some extra checking in the indexing code to just prevent the reserved entities (i.e., those mandated to show up in filenames) from ever being read from JSON sidecars. That seems like the most principled fix, and will also potentially prevent other kinds of nasty collisions in cases where users don't follow the spec. Any objections?
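
The proposed check could look roughly like the following. This is only a sketch of the idea, not the indexing code: `RESERVED_ENTITIES` and `filter_sidecar_metadata` are hypothetical names, and the entity set is an illustrative subset.

```python
# Illustrative subset of filename entities. In practice the list would come
# from the layout configuration rather than a hard-coded set (assumption).
RESERVED_ENTITIES = frozenset({"subject", "session", "task", "run"})


def filter_sidecar_metadata(sidecar, reserved=RESERVED_ENTITIES):
    """Return sidecar entries whose keys do not collide with reserved entities."""
    return {key: value for key, value in sidecar.items() if key not in reserved}


# Hypothetical _scans.json content describing two TSV columns.
sidecar = {
    "run": {"Description": "run index"},
    "acq_time": {"Description": "acquisition time"},
}
metadata = filter_sidecar_metadata(sidecar)
```

Dropping the colliding keys at read time means the values parsed from filenames are never overwritten or joined by sidecar values, regardless of their type.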

oesteban · 2021-01-12T15:09:25Z

That was happening already before this patch. If you keep the test file bids/tests/data/7t_trt/sub-01/ses-1/sub-01_ses-1_scans.json, this error will happen again

tyarkoni · 2021-01-12T15:12:27Z

Well, the problem you reported would still happen, but our test failures would go away. ;)

Either way, the proposed fix solves both (type is already enforced on subject when read from filenames, so ignoring subject entries in the JSON will prevent the filename value from being overwritten).

oesteban · 2021-01-12T15:12:29Z

BTW, subject is not modified at all in this PR, FWIW.

Reverting will definitely not make the issue go away.

tyarkoni · 2021-01-12T15:14:50Z

> BTW, subject is not modified at all in this PR, FWIW.

Oh, right. Maybe type isn't being enforced when extracted from filenames then. Will look into it. In any case, I think we're agreed that there's no reason to ever pay attention to run, subject, etc. found in JSON, yes?

oesteban · 2021-01-12T21:45:21Z

> there's no reason to ever pay attention to run, subject, etc. found in JSON, yes?

This is the root of all problems, and it doesn't seem to make a lot of sense. However, I don't think it is explicitly forbidden by BIDS (probably should?). Worst case scenario, someone found a hack using one entity on a JSON and we have to tell them that is not supported. No biggie, so yes.

BTW, since the problem of this PR is actually the new test file, I've put together #695 - seems to make the problem go away.

FIX: Resolve side-effects of new testfile in #682

oesteban requested a review from adelavega November 25, 2020 16:10

tst: modify test and break current master with bids-standard#681

017673d

tyarkoni merged commit 7d9c27c into bids-standard:master Jan 11, 2021

tyarkoni mentioned this pull request Jan 12, 2021

TEST: dataset-level model spec retrieval #693

Merged

tyarkoni mentioned this pull request Jan 12, 2021

Q: Flatten metadata keys? #694

Open

oesteban deleted the fix/issue681-repr branch January 12, 2021 20:01

oesteban added a commit to oesteban/pybids that referenced this pull request Jan 12, 2021

FIX: Resolve side-effects of new testfile in bids-standard#682

412c3b2

effigies added a commit that referenced this pull request Jan 21, 2021

Merge pull request #695 from oesteban/fix/refine-682

5747a18

FIX: Resolve side-effects of new testfile in #682

effigies mentioned this pull request Apr 8, 2021

layout.get_sessions() throwing error #714

Closed

This was referenced Jun 30, 2021

FIX: get unique, with conflicting meta-data #748

Merged

FIX: Associate "is_metadata" with Tag, not Entity; and only return non-metadata entries for core Entities in get(return_type='id') #749

Merged
