Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should "n/a" be allowed in the onset column of events.tsv #1943

Closed
VisLab opened this issue Oct 4, 2024 · 6 comments · Fixed by #1966
Closed

Should "n/a" be allowed in the onset column of events.tsv #1943

VisLab opened this issue Oct 4, 2024 · 6 comments · Fixed by #1966

Comments

@VisLab
Copy link
Member

VisLab commented Oct 4, 2024

The current description of onset in Task events says that the onset column may have 0, positive and negative values. It does not explicitly say it does not allow "n/a", although I believe that somewhere in the past that was in the spec. The current description of duration explicitly says that it can be "n/a" and what that means.

My question is --- is "n/a" really allowed in the onset column. If so, what does it mean and how are tools required to handle it?

This should be made explicit in the specification and in the schema.

This issue is related to issue #1938 and to issue hed-standard/hed-python#1026.

@VisLab
Copy link
Member Author

VisLab commented Oct 10, 2024

@effigies @Remi-Gau We would really appreciate clarification on this in the spec:

  1. Should n/a be allowed in the "onset" columns of events.tsv files? (I read the current spec as it doesn't, but the description is ambiguous.)
  2. If n/a is allowed how should it be handled by tools?

@effigies
Copy link
Collaborator

Should n/a be allowed in the "onset" columns of events.tsv files? (I read the current spec as it doesn't, but the description is ambiguous.)

My read of common principles is that any TSV column can have missing values indicated with n/a.

If n/a is allowed how should it be handled by tools?

For the validator, we verify that the existing values are sorted, ignoring n/a. For design matrix generation, I would just drop the rows on ingestion, since they are not usable without an onset and duration.

Tbh, I don't know what the use case for these is. I'd be okay saying that tools SHOULD drop rows with these onsets, and then somebody who has something to do with them can neglect that.

I don't think we give much interpretation guidance to tools, so I'm not sure exactly how or where to say that.

@VisLab
Copy link
Member Author

VisLab commented Oct 10, 2024

My concern is not tsv columns in general but the "onset" column in particular:

https://bids-specification.readthedocs.io/en/stable/glossary.html#onset-columns. This description mentions positive and negative values very specifically and what the behavior should be.

@VisLab
Copy link
Member Author

VisLab commented Oct 10, 2024

Let me give a little more background on this issue. In the past, the HED validators have assumed that the onset columns do not allow "n/a". However, @yarikoptic has a dataset in which he wants "n/a" in the onset column to be interpreted as something that happened at some indeterminate time during the run. This is not unreasonable. I did an preliminary implementation in the Python HED validator that handles this as follows:

  1. The rows with numeric onsets are sorted and handled in the usual way as though the "n/a" rows weren't there (except for adjusting row numbers for error messages).
  2. The rows with "n/a" onsets are treated for the purposes of HED validation like in .tsv files that don't have an onset column.

Our downstream HED tools expect that the "n/a" onset rows are filtered out before they get them. Before we go forward with this and eventually get it into the HED JavaScript validator, we would appreciate a definitive statement in the specification about whether "n/a" is allowed and how it should be interpreted. For example in the Task Events chapter the discussion of duration does this, but the discussion of onset does not.

@Remi-Gau
Copy link
Collaborator

I agree with @effigies description of what I would expect.

@VisLab should we just add Must always be either zero or positive (or n/a if unavailable). to the event description in the task section (same sentence as for duration) to make it explicit?

@VisLab
Copy link
Member Author

VisLab commented Oct 11, 2024

It can be negative too.

Just add:

A n/a value indicates the onset time is unknown or unavailable.

After

Onset (in seconds) of the event, measured from the beginning of the acquisition of the first data point stored in the corresponding task data file. Negative onsets are allowed, to account for events that occur prior to the first stored data point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants