Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: BIDSLayout -- TypeError: unhashable type: 'dict' #682

Merged
merged 2 commits into from
Jan 11, 2021

Conversation

oesteban
Copy link
Collaborator

Summary

When BIDSLayout.__repr__ is called on datasets that have sub-N_scans.tsv files, the query to obtain runs also returns these files, and their value is a dictionary which is not hashable by the set object wrapping up the query.

Workaround

Instead of refining the query (which indeed would fix the issue), this workaround uses the naive approach of ensuring that each result of the query has int value.

Resolves: #681.

When `BIDSLayout.__repr__` is called on datasets that have `sub-N_scans.tsv` files,
the query to obtain runs also returns these files, and their value is a dictionary
which is not hashable by the `set` object wrapping up the query.

Instead of refining the query (which indeed would fix the issue), this workaround
uses the naive approach of ensuring that each result of the query has `int` value.

Resolves: bids-standard#681.
@oesteban oesteban requested a review from adelavega November 25, 2020 16:10
@effigies
Copy link
Collaborator

Can we create a test?

@oesteban
Copy link
Collaborator Author

Can we create a test?

Would it suffice to break a currently existing test?

BTW, this has made me realize that this bug would only occur when:

  1. There's a _scans.tsv file AND
  2. There's an associated _scans.json file to describe the columns and one of them happens to be "run".

I'm too lazy to check on the documentation, but if the _scans.json is allowed to look like:

{
  ...
   "run": 1
  ...
}

then this patch would not work, and one extra run would be counted for each of these _scans.json files.

@effigies
Copy link
Collaborator

I think this metadata is almost certainly invalid, though the spec may require some clarification. I would say that TSV columns probably should not be permitted to overlap with entities. So I'm a bit hesitant to bake into our example a case of bad data.

The issue with this data is farther reaching. layout.get_runs() has the exact same problem.

So I see a few issues:

  1. Should PyBIDS be loading column metadata in the same way as it loads image metadata?
  2. Should PyBIDS reject TSV columns or metadata fields entities?
  3. Dictionary values can be valid for some metadata. We may want to consider using some kind of frozen dict, or this case will reappear somewhere with some get_{metadata}s() method.

@oesteban
Copy link
Collaborator Author

3. Dictionary values can be valid for some metadata. We may want to consider using some kind of frozen dict, or this case will reappear somewhere with some get_{metadata}s() method.

Also lists (see #683). Probably, the final solution will entail a combination of options 1 or 2, AND addressing 3.

@tyarkoni
Copy link
Collaborator

tyarkoni commented Nov 25, 2020

I'm not thrilled at the idea of handling column metadata differently from other metadata. I don't think that would be my naive expectation from the spec, and it adds an extra layer of logic in various places that could quickly get out of hand if we then have to do similar things for other file types. My inclination would be to treat the actual data returned by the query as valid, and finesse the issue as needed in __repr__ and other places it arises, as in @oesteban's proposed fix.

EDIT: oh, but I agree that the spec should probably forbid the re-use of reserved entities in column names. I think this is the soft implication of having columns be named filename and participant_id instead of subject and run, but it should probably be made explicit.

@tyarkoni
Copy link
Collaborator

My read is that we've converged on not handling this particular problem, as it should probably be explicitly disallowed by the spec (#683 is more general and I think that should still be dealt with). Do we still want/need to merge this, @oesteban, or can we close it?

@oesteban
Copy link
Collaborator Author

I think checking that run is an int in this context can save from future headaches, it is a very cheap solution and comes with a test... I'm fine either way, but because this indeed makes pybids work with (wrong) datasets already out there in the wild, I'd lean towards merging.

@tyarkoni
Copy link
Collaborator

Okay, fair enough. Merging!

@tyarkoni tyarkoni merged commit 7d9c27c into bids-standard:master Jan 11, 2021
@tyarkoni
Copy link
Collaborator

On reflection, I think this was the wrong way to deal with the issue. It actually isn't trivial to handle, because what's happening is that now the dict value for run is making it into the Tag table in the DB, and while it's only currently causing problems in 2 tests, it could in principle rear its head elsewhere. There's no reason that entities like run or subject should ever have a dict value (or, for that matter, an int value in the case of subject), because that clearly violates the spec.

I'm going to revert this PR and instead add some extra checking in the indexing code to just prevent the reserved entities (i.e., those mandated to show up in filenames) from ever being read from JSON sidecars. That seems like the most principled fix, and will also potentially prevent other kinds of nasty collisions in cases where users don't follow the spec. Any objections?

@oesteban
Copy link
Collaborator Author

That was happening already before this patch. If you keep the test file bids/tests/data/7t_trt/sub-01/ses-1/sub-01_ses-1_scans.json, this error will happen again

@tyarkoni
Copy link
Collaborator

Well, the problem you reported would still happen, but our test failures would go away. ;)

Either way, the proposed fix solves both (type is already enforced on subject when read from filenames, so ignoring subject entries in the JSON will prevent the filename value from being overwritten).

@oesteban
Copy link
Collaborator Author

BTW, subject is not modified at all in this PR, FWIW.

Reverting will definitely not make the issue go away.

@tyarkoni
Copy link
Collaborator

BTW, subject is not modified at all in this PR, FWIW.

Oh, right. Maybe type isn't being enforced when extracted from filenames then. Will look into it. In any case, I think we're agreed that there's no reason to ever pay attention to run, subject, etc. found in JSON, yes?

@oesteban oesteban deleted the fix/issue681-repr branch January 12, 2021 20:01
oesteban added a commit to oesteban/pybids that referenced this pull request Jan 12, 2021
@oesteban
Copy link
Collaborator Author

there's no reason to ever pay attention to run, subject, etc. found in JSON, yes?

This is the root of all problems, and it doesn't seem to make a lot of sense. However, I don't think it is explicitly forbidden by BIDS (probably should?). Worst case scenario, someone found a hack using one entity on a JSON and we have to tell them that is not supported. No biggie, so yes.

BTW, since the problem of this PR is actually the new test file, I've put together #695 - seems to make the problem go away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BIDSLayout (TypeError: unhashable type: 'dict')
3 participants