Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible error when anonymizing EDF files #667

Closed
adam2392 opened this issue Dec 23, 2020 · 15 comments · Fixed by #669
Closed

Possible error when anonymizing EDF files #667

adam2392 opened this issue Dec 23, 2020 · 15 comments · Fixed by #669
Labels

Comments

@adam2392
Copy link
Member

Describe the bug

Using write_raw_bids, I converted an EDF file using v0.6 while anonymizing it. However, I tried opening the same file in EDFBrowser, and I got this error:

Error, file is marked as EDF+ but startdate field does not match with startdate in
recordfield:
"12.02.85" <-> "12-FEB-2012".
 File is not a valid EDF or BDF file.

This suggests that the anonymization didn't work. In addition, it doesn't seem to be a valid file in the context of edfbrowser suggesting that further errors might occur in the future.

Steps to reproduce

Convert dataset attached here: https://www.dropbox.com/sh/w0wd0icoo9bib2j/AAB0MVgibmrTUeSRPNGmeIaya?dl=0

Expected results

The file after doing anonymization in write_raw_bids should still be accessible by EDFBrowser.

Additional information

A related issue is just telling write_raw_bids to convert said file to BrainVision. #659

@adam2392
Copy link
Member Author

Cc: @a-hurst do you know why that specific error might be popping up? It looks like there is another field with the recording date that might not have been altered?

Sorry I'm not extremely familiar w/ the EDF format.

@sappelhoff sappelhoff added the bug label Dec 23, 2020
@a-hurst
Copy link
Contributor

a-hurst commented Dec 24, 2020

@adam2392 Hmm, I know the reason for the error but I unfortunately don't see an easy way around it, other than a fix on EDFbrowser's end.

Basically, EDF files always store the last two digits of the recording year in the 'recording date' field, with the earliest possible year being 1985 and the latest possible year being 2084. Additionally, EDF+ files are optionally able to store the full 4 digits of the year in the 'recording info' part of the header for greater precision.

Because the BIDS standard recommends anonymizing files to a date before 1920, the only way to do this within the EDF/EDF+ spec is to set the 'recording date' field to the earliest possible year (1985) and set the real anonymized year in the recording info field (which MNE always tries parsing before attempting to parse the less-precise 'recording date' field). I guess we didn't anticipate that any EDF softwares would check the two dates to make sure they matched!

The only solution I can see working for MNE-BIDS would be to make the last two digits of the 'recording date' year always match the anonymized date, but that creates a problem where files anonymized to 1917 might get interpreted as being from 2017 by other EDF softwares. The '85' year for all anonymized files was our solution to clearly marking them as anonymized, even if a software doesn't support parsing the proper 4-digit date in the header.

@sappelhoff
Copy link
Member

in brief: It's not possible to reasonably anonymize EDF/EDF+/BDF files to a year prior to 1985. :-( I am starting to be in favor of going back to auto-conversion to BrainVision if anonymization to prior 1985 is wanted. And I am starting to develop a severe dislike for EDF ... Austin, can you tell me one nice thing about EDF which makes it good in my eyes again? :-)

@hoechenberger
Copy link
Member

in brief: It's not possible to reasonably anonymize EDF/EDF+/BDF files to a year prior to 1985. :-( I am starting to be in favor of going back to auto-conversion to BrainVision if anonymization to prior 1985 is wanted.

Absolutely. We're running into issues that are unnecessary … my understanding is that BV supports all EDF features, plus allows for BIDS compatibility. Let's pick this when in doubt.

And I am starting to develop a severe dislike for EDF ... Austin, can you tell me one nice thing about EDF which makes it good in my eyes again? :-)

😅

@a-hurst
Copy link
Contributor

a-hurst commented Dec 24, 2020

@sappelhoff Don't know much about the BrainVision format's internals, but the main points in favour of EDF+/BDF+ I can think of are:

  • Single file per recording
  • Broad software compatibility (well, maybe not so much...)
  • Some potentially useful metadata in the header that might get lost with a format conversion
  • Avoids extra processing/conversion of data when not explicitly necessary: IIRC BrainVision stores values in float32, whereas EDF and BDF store values as int16 and int24 with conversion factors (respectively). That said, MNE converts all EDF/BDF data to float32 upon import anyway, so it might not make any difference.
  • Discontinuous recordings and text annotations? Not sure what BrainVision supports here or how the conversion process works, but if it doesn't support those, that would be a potentially unexpected loss of data

That's the best case I can think of for anonymizing them without a format conversion, but I also see the benefits of always converting to BrainVision. As much effort as I put into the EDF anonymization PR, I'm perfectly happy to see it dropped by the wayside if it's not the best possible solution for end-users!

EDIT: I think if we can find any other big EDF utilities that do the same date-matching check as EDFBrowser, that would negate any 'broad compatibility' benefit that the format has, and clearly make BrainVision conversion a better choice. If it's just an EDFBrowser quirk, I think it would be easiest to file an issue or PR there and remove the sanity check.

@adam2392
Copy link
Member Author

Thanks for the explanations!

I've filed an issue into EDFBrowser: https://gitlab.com/Teuniz/EDFbrowser/-/issues/26 because I do believe that it could potentially be fixed there by removing some of these checks.

In the meantime, if @hoechenberger agrees, then I would be in favor of at least adding a convert_to kwarg in write_raw_bids to allow BrainVision usage even if the sourcefile is in EDF. This would subsequently close #659.

@sappelhoff
Copy link
Member

@adam2392
Copy link
Member Author

Teuniz suggested the following: https://gitlab.com/Teuniz/EDFbrowser/-/issues/26#note_474109313

I am wondering in our case, can we do anonymization through mne-bids as:

  • set date to 1985 01-01, 00:00:00 in the EDF file, but then set a random date < 1920 in the scans.tsv?
  • then when reading, apply the measurement date from the scans file.

That way everything is still anonymized, the EDF file is still EDF+/edfbrowser compliant, and mne-bids still works as intended?

@hoechenberger
Copy link
Member

  • but then set a random date < 1920 in the scans.tsv?

Shouldn't be random, but shifted by a user-specified duration :) But otherwise yes, why not. This is once again a deviation from BIDS, as the file content deviates from the sidecar metadata, but for me and a few others this seems to be okay ;) we should codify this though, i.e. amend BIDS asap…

@sappelhoff
Copy link
Member

A note about this in the bids spec seems to be important, yes. Because edf is just impossible to anonymize with bids recommendations.

@jasmainak
Copy link
Member

The 1925 date requirement keeps coming up all the time. It's a problem in fif, now edf. But HIPPA doesn't require it. Folks should really reconsider this ...

@hoechenberger
Copy link
Member

The 1925 date requirement keeps coming up all the time. It's a problem in fif, now edf. But HIPPA doesn't require it. Folks should really reconsider this ...

Yes I think it should be dropped and instead there should be a flag which states whether the dates were anonymized or not. It's damn annoying.

@sappelhoff
Copy link
Member

related: bids-standard/bids-specification#538

@jasmainak
Copy link
Member

how can we push to make this happen?

@sappelhoff
Copy link
Member

we have a second example now for why the BIDS recommendation for anonymization is inconvenient ... Adam started an issue on that, perhaps we can gather enough momentum and arguments to push it through this time. I can't push too much myself right now but I'll try to facilitate wherever I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants