Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: get unique, with conflicting meta-data #748

Merged
merged 1 commit into from
Jul 2, 2021
Merged

FIX: get unique, with conflicting meta-data #748

merged 1 commit into from
Jul 2, 2021

Conversation

adelavega
Copy link
Collaborator

@adelavega adelavega commented Jun 29, 2021

Fixes #747

This is a cheap fix for a bug that occurs when a .tsv meta-data entry (i.e. column description) matches a common BIDS entity.

For example, if you call layout.get_tasks(), it will first get all unique values for the task entity. However, if index_metadata=True, that includes all instances of task in .json sidecars for TSVs as well. However, .json sidecars for .tsv files are a different type of meta-data that describes the columns in TSV files, not the entities of the corresponding file.

Thus, get_tasks will fail because it tries to take a set on a list that includes a dict value. Here, I simply excluded dict values from that operation.

@adelavega adelavega requested a review from effigies June 29, 2021 22:43
@adelavega
Copy link
Collaborator Author

Looking at this a bit more, the issue is that all meta-data is stored as an Entity, with the attribute is_metadata=True.

This is fine, except that .json sidecars for .tsv files are a different kind of meta-data. It's meta-data associated with columns, not files.

One option is to not index meta-data for .tsv files at all. However, then f.get_metadata() would not work for these files.

Alternatively, we can add another column to Entity which describes if this is column or file meta-data (is_column?). This would be excluded from get_entities but not from get_metadata.

@adelavega
Copy link
Collaborator Author

@effigies does the inheritance principle apply to tsv file meta-data? If not then that's an argument that pybids should just ignore this type of meta-data from indexing.

@effigies
Copy link
Collaborator

As far as I know, it should apply.

@adelavega
Copy link
Collaborator Author

adelavega commented Jun 30, 2021

I suppose I could imagine how the inherentice principle might apply here, if all event files have the same columns across all subjects.

It's still weird to me that pybids would index this meta-data in a way that lets you filter on it.

For example, it makes no sense to filter tsv files based on the Description of a specific column within it.

@oesteban
Copy link
Collaborator

oesteban commented Jun 30, 2021

This is related to #684. A quick&dirty, ad-hoc workaround would be to filter the retrieved metadata to only provide str, along the same lines of #682 for runs.

EDIT: But yes, I agree with @effigies that there must be another source of metadata for the entity "session" that is inserting a dict in the db.

@adelavega
Copy link
Collaborator Author

adelavega commented Jun 30, 2021

@oesteban that's basically what I did but only excluded dicts.

The dicts are coming from tsv sidecars, since they allow dicts. If you describe a column called "session" in participants.json then there will be a match for the "session" entity that is a dict in the db.

@oesteban
Copy link
Collaborator

oesteban commented Jun 30, 2021 via email

@adelavega
Copy link
Collaborator Author

Yep.

Well if nobody has any objections to this fix, I'll merge it, but I think in the long run the solution is to amend meta-data to differentiate between those coming from tsv sidecars.

@effigies
Copy link
Collaborator

effigies commented Jul 1, 2021

Yeah, I'm okay with this short term, but can we open an issue to make sure it doesn't get forgotten?

@oesteban
Copy link
Collaborator

oesteban commented Jul 1, 2021

Before that, it would also be interesting to make a decision regarding #694.

@adelavega
Copy link
Collaborator Author

I think #694 sufficient covers what I pointed out. I'm going to merge this as is, as its a relatively innocuous and in practice fixes the most glaring issues with this.

@adelavega adelavega merged commit 8a9a5e5 into master Jul 2, 2021
@adelavega adelavega deleted the fix/get branch July 2, 2021 02:29
adelavega added a commit to adelavega/pybids that referenced this pull request Jul 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BIDSlayout.get_sessions() fails on openneuro ds001541
3 participants