FIX: Associate "is_metadata" with Tag, not Entity; and only return non-metadata entries for core Entities in get(return_type='id') #749

adelavega · 2021-07-02T02:50:25Z

If return_type = 'id' ("shorthand queries") or 'dir' and "target" is a core BIDS entity (i.e. one that is derived from filenames, not from meta-data), then only look at results that are not from meta-data.

If you ask for layout.get_runs(), it will only look at values that came from files, not meta-data.
This should prevent collisions from similarly named meta-data values.

However, if you ask for a meta-data target, then it will attempt to take a set across all values it finds:

In [5]: layout.get(target='SliceThickness', return_type='id')
Out[5]: [1, 2.5]
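For reviewers who want the rule in isolation, here is a minimal sketch using plain dictionaries instead of pybids objects. The `files` records, the `CORE_ENTITIES` set, and the `get_ids` helper are all hypothetical stand-ins made up for illustration; this is not the code in bids/layout/layout.py.

```python
# Hypothetical stand-in for indexed files: each record keeps tags that came
# from the filename separate from tags that came from JSON sidecars.
files = [
    {"filename": {"subject": "01", "run": 1},
     "metadata": {"SliceThickness": 1}},
    {"filename": {"subject": "01", "run": 2},
     "metadata": {"SliceThickness": 2.5, "run": "first"}},
]

# Assumed set of filename-derived entities (illustrative, not exhaustive).
CORE_ENTITIES = {"subject", "session", "task", "run"}


def get_ids(records, target):
    """Return sorted unique values for `target`.

    Core entities are read from filename tags only; any other target is
    looked up across both filename and metadata tags.
    """
    values = set()
    for rec in records:
        if target in CORE_ENTITIES:
            source = rec["filename"]
        else:
            source = {**rec["filename"], **rec["metadata"]}
        if target in source:
            values.add(source[target])
    return sorted(values)


runs = get_ids(files, "run")                  # the sidecar "run" value is ignored
thickness = get_ids(files, "SliceThickness")  # set across all metadata values
print(runs)
print(thickness)
```

The colliding `"run": "first"` sidecar entry never reaches the result because `run` is treated as a core entity.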

Related #694

oesteban · 2021-07-02T07:10:16Z

Unfortunately, I don't know pybids well enough to assess this PR properly (without spending a whole day catching up with the code). But I would say this will help with the API errors from the user's perspective. I don't think it will address the DB problems, though (see the conversation triggered by #682 (comment)).

I believe that, in addition to testing the type of metadata being retrieved (i.e., whether it is an entity value or not), it would be beneficial and more robust to filter out values that do not match the corresponding regexp.
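A rough sketch of what regexp-based filtering could look like. The patterns in `ENTITY_PATTERNS` and the `matches_entity_pattern` helper are assumptions written for this example only; the real entity patterns live in the pybids configuration and may differ.

```python
import re

# Hypothetical patterns for illustration; not copied from the pybids config.
ENTITY_PATTERNS = {
    "run": re.compile(r"\d+"),
    "task": re.compile(r"[a-zA-Z0-9]+"),
}


def matches_entity_pattern(name, value):
    """True if `value` is a str or int that fully matches the pattern for `name`."""
    pattern = ENTITY_PATTERNS.get(name)
    if pattern is None:
        return True  # no pattern known, so nothing to enforce
    if isinstance(value, bool) or not isinstance(value, (str, int)):
        return False  # dicts, lists, floats, etc. cannot be filename values
    return pattern.fullmatch(str(value)) is not None


candidates = [
    "sherlock",
    {"Description": "tasks collected for participant"},
    "not valid!",
]
valid_tasks = [v for v in candidates if matches_entity_pattern("task", v)]
print(valid_tasks)
```

Checking the type first means dict-valued sidecar entries are rejected before any pattern is applied.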

adelavega · 2021-07-02T14:57:03Z

Ah, sorry I missed that PR.

I think Chris summarized it nicely here: #682 (comment)

This PR is in the spirit of Tal's suggestion to "finesse" the logic of __repr__ (which actually relies on get) to avoid crashing on meta-data coming from JSON and not file names.

The alternative is to catch this on ingestion of entities. That is, not read in TSV sidecars at all, like Tal suggested, and enforce type and regex on incoming entities prior to adding them to the Tag table in the db (although arguably this is the job of the validator?).

The one issue I have with this (aside from this fix being easier) is that if you then call get_metadata on a TSV file, it will not return those values, because get_metadata relies on the entities that were read in. That is a minor issue, but still.

@effigies WDYT?

effigies · 2021-07-02T12:24:53Z

bids/layout/layout.py

+            base_entities = self.get_entities(metadata=False)
+            metadata = False if target in base_entities else True


This ternary is just "not in", and we don't reuse base_entities. Maybe just a comment?

Suggested change

-            base_entities = self.get_entities(metadata=False)
-            metadata = False if target in base_entities else True
+            # Fetch metadata if target is not a filename entity
+            metadata = target not in self.get_entities(metadata=False)

effigies · 2021-07-02T16:57:39Z

bids/layout/layout.py

-            results = [x for x in results if target in x.entities]
+            base_entities = self.get_entities(metadata=False)
+            metadata = False if target in base_entities else True
+            results = [x for x in results if target in x.get_entities(metadata=metadata)]


The number of times we're calling get_entities feels high, and I think it's a DB query. What if we dropped this and did:

from collections.abc import Hashable

if return_type == "id":
    ent_iter = (x.get_entities(metadata=metadata) for x in results)
    results = list({
        ents[target] for ents in ent_iter
        if isinstance(ents.get(target, {}), Hashable)
    })

Check that return_type == 'dir' still works.
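To see why the `{}` default matters in that suggestion, here is a small standalone sketch. The `entity_dicts` list is a hypothetical stand-in for what `x.get_entities(metadata=metadata)` might return per file; it is not real pybids output.

```python
from collections.abc import Hashable

# Hypothetical per-file entity dictionaries.
entity_dicts = [
    {"task": "sherlock"},
    {"task": "lucy"},
    {"task": {"Description": "tasks collected for participant"}},  # dict value
    {"subject": "01"},  # no "task" key at all
]

target = "task"
# The {} default is itself unhashable, so files without the target are skipped
# by the same isinstance check that skips dict-valued metadata.
unique = {
    ents[target]
    for ents in entity_dicts
    if isinstance(ents.get(target, {}), Hashable)
}
tasks = sorted(unique)
print(tasks)
```

One caveat: `isinstance(value, Hashable)` only checks that the type defines `__hash__`, so a tuple that contains a list would pass the check and still fail when added to the set.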

adelavega · 2021-07-02T21:50:13Z

@oesteban: @effigies and I discussed this, and we think this PR should take care of the issue.

All entities being read in from file names are validated, and thus no invalid values should sneak in. Here, we limit it so that only real entities are queried when return_type='id'. That way, no BIDS invalid entities should be returned (no need to filter other types of queries, I don't think).

The alternative would be to do this upon ingestion of meta-data, and filter out meta-data that looks like an entity (but isn't) from being ingested. However, this causes two problems: 1) it's not illegal in BIDS (yet), and 2) if you call .get_metadata on a TSV file that had an entity-like entry in its JSON sidecar, it would be filtered, which is weird because it's not a faithful representation of the dataset (which, again, is technically legal).

I prefer this solution because it keeps .get working but also allows currently legal meta-data to still exist if the user wants to inspect it.

adelavega · 2021-07-21T18:30:34Z

Summary of how this should behave: if return_type = 'id' and the target is a proper entity (one that should be defined in filenames), then call get_entities with metadata=False. Otherwise, rely on frozendicts to make non-hashable metadata (i.e. dicts) hashable (although what about lists? Worry about this in a separate PR).
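On the open question about lists, one possible approach is a recursive conversion to hashable types. The `freeze` helper below is hypothetical and uses only builtins (`frozenset` and `tuple`); it is not the frozendict mechanism referred to above, just a sketch of the idea.

```python
def freeze(value):
    """Recursively convert dicts and lists into hashable equivalents."""
    if isinstance(value, dict):
        # Key order does not affect a frozenset, so equal dicts compare equal.
        return frozenset((key, freeze(val)) for key, val in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(item) for item in value)
    return value


a = freeze({"Description": "story stimuli", "Levels": [1, 2]})
b = freeze({"Levels": [1, 2], "Description": "story stimuli"})
unique = {a, b}  # both entries collapse into one set member
print(len(unique))
```

The trade-off is that the frozen value no longer looks like the original dict, so callers would need to convert back if they want to display it.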

adelavega · 2021-07-21T22:57:24Z

Turns out to make this change possible, I had to change where is_metadata was stored in the db.

Previously, Entity objects had an is_metadata column, but this was set upon the initial creation of the Entity.

This means that an Entity could have Tags that are from both metadata and filename sources, but the Entity could have is_metadata as either true or false depending on the order in which the Tags were ingested.

For me, this meant that the "Task" Entity said is_metadata=False even though it did return meta-data entries (which caused layout.get_tasks to fail).

To fix this, I moved is_metadata to Tag in the db, and modified all of the queries accordingly.
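A simplified sketch of the new arrangement, using dataclasses rather than the actual database models. The `Tag` and `Entity` classes and the `values` method here are illustrative stand-ins and do not mirror the real columns or queries in bids/layout/models.py.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    value: object
    is_metadata: bool  # stored per tag, so it reflects each value's own source


@dataclass
class Entity:
    name: str
    tags: list = field(default_factory=list)

    def values(self, metadata=None):
        """Return tag values, optionally restricted to one source."""
        return [
            t.value
            for t in self.tags
            if metadata is None or t.is_metadata == metadata
        ]


task = Entity("task")
# Ingestion order no longer matters: each tag carries its own flag.
task.tags.append(
    Tag({"Description": "tasks collected for participant"}, is_metadata=True)
)
task.tags.append(Tag("sherlock", is_metadata=False))
task.tags.append(Tag("lucy", is_metadata=False))

from_filenames = task.values(metadata=False)
from_sidecars = task.values(metadata=True)
print(from_filenames)
print(from_sidecars)
```

With the flag on the tag, a single entity can answer both kinds of query without depending on which source was indexed first.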

In an example where the "task" Entity is defined in both meta-data and filenames, the following would happen:

layout.get_entities(metadata=False)
>>
{'suffix': <bids.layout.models.Entity at 0x7fbd984e4640>,
 'extension': <bids.layout.models.Entity at 0x7fbd984e43d0>,
 'subject': <bids.layout.models.Entity at 0x7fbd931a2c40>,
 'scans': <bids.layout.models.Entity at 0x7fbd931a2e20>,
 'task': <bids.layout.models.Entity at 0x7fbd931a2310>,
 'datatype': <bids.layout.models.Entity at 0x7fbd931a8f70>,
 'run': <bids.layout.models.Entity at 0x7fbd931a8640>}

layout.get_entities(metadata=True)
>>
{'age': <bids.layout.models.Entity at 0x7fbd98539880>,
 'comprehension': <bids.layout.models.Entity at 0x7fbd985399a0>,
 'condition': <bids.layout.models.Entity at 0x7fbd985a96d0>,
 'sex': <bids.layout.models.Entity at 0x7fbd985a9730>,
 'task': <bids.layout.models.Entity at 0x7fbd931a2310>,
...
}

That is, task is an entity which has Tag values both from meta-data and non-metadata sources.
For example:

ent = layout.get_entities(metadata=False)['task']
[t.value for t in ent.tags.values()]
>>>
[ 
   {'Description': 'tasks (story stimuli) collected for participant'},
   'sherlock', 
   'lucy',
   ...
]

Going back to the original problem, the get function would only look for non-metadata values for Task for get_tasks:

layout.get_tasks()
>>>
['milkyway', 'piemanpni', 'shapessocial', 'black', 'bronx', 'forgot', 'sherlock', 'tunnel', 'prettymouth', 'shapesphysical', 'lucy', 'notthefallintact', 'pieman', 'notthefalllongscram', 'schema', '21styear', 'notthefallshortscram', 'slumlordreach', 'merlin']

This value does not include the dict meta-data, as expected.

effigies

Looks like some stray middle-clicks.

bids/layout/layout.py

effigies

This looks good otherwise. Only question: Do we have a way of invalidating old database files?

Co-authored-by: Chris Markiewicz <[email protected]>

adelavega · 2021-07-25T22:43:17Z

Thanks.

No. It looks like it doesn't crash on initialization, only once you try to access a property that doesn't exist in the old db (e.g. layout.get_subjects()).

The only way I can see us handling this proactively is adding BIDS versions to the db dirs and throwing a warning if you load a saved layout from a previous version. That would be good for a different PR.
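As a starting point for that follow-up, a version check could look roughly like this. The file name `layout_version.json`, the `CURRENT_VERSION` string, and both helper functions are hypothetical; nothing here exists in pybids as of this PR.

```python
import json
import tempfile
import warnings
from pathlib import Path

# Hypothetical file name and version string, chosen for this example.
VERSION_FILE = "layout_version.json"
CURRENT_VERSION = "2"


def save_version(db_dir):
    """Record the schema version alongside a saved database."""
    Path(db_dir, VERSION_FILE).write_text(json.dumps({"version": CURRENT_VERSION}))


def check_version(db_dir):
    """Warn and return False if the saved database predates the current schema."""
    path = Path(db_dir, VERSION_FILE)
    saved = json.loads(path.read_text())["version"] if path.exists() else None
    if saved != CURRENT_VERSION:
        warnings.warn(
            f"Saved layout has version {saved!r}, expected {CURRENT_VERSION!r}; "
            "consider re-indexing the dataset."
        )
        return False
    return True


with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
    save_version(new_dir)  # old_dir has no version file, like a pre-existing db
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        old_ok = check_version(old_dir)
        new_ok = check_version(new_dir)
```

Treating a missing version file as a mismatch means databases saved before the check existed would also trigger the warning.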

adelavega · 2021-07-26T20:39:07Z

Tests passing, merging but opening a new issue for what you mentioned @effigies

Only get non-metadata entity values if target is a core entity key

d1cf243

adelavega requested review from effigies and oesteban July 2, 2021 02:50

effigies reviewed Jul 2, 2021

adelavega mentioned this pull request Jul 3, 2021

FIX: Make lists and dicts hashable #684

Merged

adelavega added 3 commits July 21, 2021 16:50

Use @effigies suggestions, and drop requirement that its hashable

c117634

Associate is_metadata with Tag, not Entity

054178f

Doing join Tag twice

1f4f4b0

adelavega changed the title ~~Only get non-metadata entity values if target is a core entity key~~ RF: Associate "is_metadata" with Tag, not Entity; and only return non-metadata entries for core Entities in get(return_type='id') Jul 21, 2021

Again, don't join Tag twice'

360075b

effigies reviewed Jul 23, 2021

effigies approved these changes Jul 23, 2021

adelavega and others added 2 commits July 25, 2021 17:34

Update bids/layout/layout.py

2288747

Co-authored-by: Chris Markiewicz <[email protected]>

Update bids/layout/layout.py

2ab7b46

Co-authored-by: Chris Markiewicz <[email protected]>

adelavega merged commit 6520748 into master Jul 26, 2021

adelavega deleted the fix/get branch July 26, 2021 20:39
