
Artifact Subspace Reconstruction #9302

Closed
wants to merge 14 commits into from

Conversation

DiGyt
Contributor

@DiGyt DiGyt commented Apr 15, 2021

Reference issue

Fixes #7479.

What does this implement/fix?

Introduces Artifact Subspace Reconstruction for cleaning raw EEG.

Additional information

Sorry for the long wait since my mention in issue #6290.

I wanted to make sure that this algorithm was as equivalent as possible to the original ASR implementation in MATLAB.

@nbara, first of all thank you very much for your implementation; it was a great starting point, and I could use many functions without needing to change them. Due to your contribution to the code, I mentioned you as an author in the corresponding files (but without the e-mail address). Are you okay with that, or should I change it in some way?

Side note @nbara :
Although your code looks really solid on a theoretical level, many of the operations diverge from the original implementation. You are probably aware of most of the conceptual differences (using different functions or loops, etc.), but I also think I found smaller mistakes, like dot products on matrices where actually the transposed matrices were required (I think this happened somewhere in asr_calibrate or one of its helper functions). Just wanted to tell you this, since I think this can change the data drastically...
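To illustrate the kind of transposition mistake meant here, a toy sketch with hypothetical shapes (not the actual asr_calibrate code): for channels-by-samples data, swapping the transpose in a covariance-style product silently produces a matrix over the wrong axis, and for square inputs it would not even raise a shape error.

```python
import numpy as np

# Toy illustration: for data shaped (n_channels, n_samples), the channel
# covariance is X @ X.T / n_samples. Swapping the transpose yields a
# samples-by-samples matrix instead.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 1000))  # 4 channels, 1000 samples

cov_channels = X @ X.T / X.shape[1]  # intended: (4, 4) channel covariance
cov_samples = X.T @ X / X.shape[1]   # transposed by mistake: (1000, 1000)

print(cov_channels.shape)  # (4, 4)
print(cov_samples.shape)   # (1000, 1000)
```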

Equivalence to the original

In this implementation, I made sure that asr_calibrate and asr_process are perfectly equivalent to the original ASR implementation. However, there were several steps that introduce a slight divergence in the data. This is mostly due to different solvers being used in MATLAB/Python. These steps are:

  • The Moore-Penrose pseudoinverse (due to a different solver in MATLAB)
  • Calculating the eigenvectors and eigenvalues (probably a similar problem)
  • Fitting an EEG distribution (fit_eeg_distribution)
  • Extracting clean data windows (clean_windows)
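The first point can be sketched numerically: the Moore-Penrose pseudoinverse depends on the cutoff below which singular values are treated as zero, and different solvers choose different defaults. A minimal numpy sketch (the matrix is illustrative, not taken from ASR):

```python
import numpy as np

# Construct a symmetric matrix with two tiny singular values to show how
# the pseudoinverse cutoff changes the result (illustrative, not ASR data).
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
s = np.array([1.0, 0.5, 0.1, 1e-4, 1e-12])
A = (U * s) @ U.T

P_default = np.linalg.pinv(A)            # numpy's default cutoff keeps 1e-12
P_loose = np.linalg.pinv(A, rcond=1e-3)  # a looser cutoff discards 1e-4, 1e-12
print(np.allclose(P_default, P_loose))   # False: the cutoff changes the result
```

On well-conditioned matrices the implementations agree to machine precision; it is near-singular covariance matrices, common in rank-deficient EEG, where solver defaults show up as small divergences.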

Everything else in these functions should be entirely equivalent, leading to an average channel correlation of r = 0.94-0.97 between the original version and this implementation (compared to r ≈ 0.0 with uncleaned data or with the ASR implementation this was based on).

Concerning testing, I am not sure how we should optimally proceed. For now, I test the correlation of data cleaned here against data cleaned with the original ASR. This is obviously not perfect, as it's not a precise assertion and requires additional external data for testing (for now I added this test data directly to preprocessing/tests/data, assuming that it will be changed anyway).
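The correlation test described above could be sketched like this (the helper name and the data are hypothetical stand-ins, not the PR's actual test code):

```python
import numpy as np

def mean_channel_correlation(a, b):
    """Average Pearson correlation between corresponding channels of two
    (n_channels, n_samples) arrays -- a sketch of the r = 0.94-0.97 metric
    mentioned above."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return (num / den).mean()

rng = np.random.default_rng(0)
clean_ref = rng.standard_normal((8, 2000))  # stand-in for the MATLAB output
clean_py = clean_ref + 0.1 * rng.standard_normal((8, 2000))  # small divergence

print(round(mean_channel_correlation(clean_ref, clean_ref), 6))  # 1.0
print(mean_channel_correlation(clean_ref, clean_py) > 0.9)       # True
```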

Riemannian ASR

The Riemannian variant of Artifact Subspace Reconstruction is not yet available, as it requires pyriemann (and probably pymanopt) as dependencies, and the current version of pyriemann has compatibility issues with newer scipy versions (see pyRiemann/pyRiemann#94 and pyRiemann/pyRiemann#98). But as most people report no big difference between ASR and rASR anyway, this should be okay for now. However, I left a method parameter and notes to make it easier to add the Riemannian variant later.

@nbara
Contributor

nbara commented Apr 15, 2021

Hi @DiGyt, I'm happy if my code ends up being useful. And in the long term it will be better maintained in MNE than I ever could manage on my own.

Also happy to review the code next week.

FWIW, here's what kept me from filing a PR in MNE-python:

  • ASR is mostly used within BCI settings, in real time; actually, a lot of the MATLAB code is dedicated to handling streaming data. As such, I am not sure it really belongs in this repo (maybe mne-realtime would be a better venue?)
  • Related to the first point: for offline use, I'm not 100% convinced it's the best denoising technique available (do you have an opinion on that?). I guess one should write an example showing the actual benefits on a real dataset
  • The Riemannian version requires pymanopt (and the MATLAB version has a non-linear eigenspace function that is not directly available in pymanopt);
  • As you noted, I took some "artistic" liberties wrt the original code :) because I felt it was weirdly coded or overly convoluted at times.

Member

@agramfort agramfort left a comment

Just a few comments. Now I am concerned about the online aspect, which is absolutely not covered by mne-python. If it can be valuable for offline processing, I think a convincing example on real data would be very useful to assess the relevance of ASR for MNE users. Thanks @DiGyt!

return mu, sig, alpha, beta


def yulewalk(order, F, M):
Member

Contributor Author

Great, thanks for pointing that out!

return out, zf


def ma_filter(N, X, Zi):
Member

return X, Zf


def geometric_median(X, tol=1e-5, max_iter=500):
Member

could this be used to do an evoked using median and not mean of epochs?

Contributor Author

Sure!

Contributor Author

Should we make a more general version of this function and add it to utils?
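For context, the geometric median referenced here is typically computed with Weiszfeld's algorithm; a minimal, self-contained numpy sketch matching the signature quoted above (not necessarily the PR's exact implementation):

```python
import numpy as np

def geometric_median(X, tol=1e-5, max_iter=500):
    """Weiszfeld's algorithm: the point minimizing the summed Euclidean
    distance to the rows of X -- a robust alternative to the mean."""
    y = X.mean(axis=0)  # start from the ordinary mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - y, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        w = 1.0 / d
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# For collinear points the geometric median reduces to the 1-D median:
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
print(geometric_median(X))  # close to [1. 0.]
```

Applied per time point across epochs, the same routine would indeed give a median-based evoked, as suggested above.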

@@ -0,0 +1,38 @@
# Authors: Dirk Gütlin <[email protected]>
Member

don't put a big file like mne/preprocessing/tests/data/matlab_asr_data.mat in the repo. It should be in the mne-testing repo

Contributor Author

Yes, as mentioned in the initial commit, this is just a preliminary thing. In case we want to keep this file in the testing procedure, I'll move it to the testing data repo...

@DiGyt
Contributor Author

DiGyt commented Apr 16, 2021

@nbara

* As you noted, I took some "artistic" liberties wrt the original code :) because I felt it was weirdly coded or overly convoluted at times.

Totally agree with that. I have the exact same opinion on a lot of the MATLAB code. I actually liked some of your implementations better, as they looked way more "pythonic". However, I wanted to play it safe and therefore chose the path of direct equivalence...

@agramfort @nbara
concerning online/offline use of ASR

I agree that the method generally was developed to work with real-time streaming data.

However, given that ASR/rASR was included as the standard EEG cleaning plugin for EEGLAB, there seems to be a point in applying it to offline data (at least for the EEGLAB devs).
The reason I got interested in ASR was that members of my current lab tested it out and were very positive about it. (We do offline EEG analysis and had previously faced certain issues with applying AutoReject in one of our experiments.)

My personal opinion of ASR is that it looks very good if you inspect it visually, meaning that it seems to catch and interpolate noise very well while at the same time leaving clean segments alone.
That said, I'm not sure to what extent it actually improves the signal-to-noise ratio. I'm currently working on a comparison between different cleaning algorithms and want to compare their effect on evoked potentials, signal-space decoding performance, and TFR-based decoding performance. I'm planning to do this in an online ipynb notebook, and I can point you to it once it's done.
In addition to the automated cleaning comparison, I could also produce a simple MNE-tutorial-style notebook going through ASR during preprocessing and comparing before/after. If you want to include ASR in MNE, this would be a nice thing to have anyway.

P.S.: if anyone knows about some cleaning algorithms that would be interesting to include, feel free to mention them here :)

@DiGyt
Contributor Author

DiGyt commented Jun 15, 2021

From: #7479 (comment)

Soo, here is a link to my comparison of a few automated cleaning methods.

https://digyt.github.io/automated_EEG_cleaning_comparison/

It's far from exhaustive or perfectly valid, as that would require more varied data and more computing power (definitely going beyond an online Google Colab script). Other limitations are named in the notebook.

However, as far as this allows me to draw conclusions, I would say the following about Artifact Subspace Reconstruction:

  • Artifact Subspace Reconstruction seems to work surprisingly badly if you want to investigate "ERP-like" signal-to-noise ratio, but it improves decoding accuracy for tasks that are not based on the average signal. This should be a very welcome contrast, as most other automated cleaning techniques are more or less explicitly designed to minimize deviation from the average signal/ERP.
  • The presented algorithm for Artifact Subspace Reconstruction works on raw data and has the potential to fully interpolate data (of course this doesn't necessarily improve the signal). Both factors make it more flexible than algorithms that require epochs or pseudo-epochs to work. Of course, exclusion can be a better choice than interpolation, but an option for simply marking/excluding bad segments could easily be added.
  • A secondary argument is that the Riemannian variant (which can now be implemented, as pyriemann seems to have no more compatibility issues with the newest scipy) is reported to perform better than the standard (Euclidean) version. Again, one could also argue that ASR is already implemented as the standard cleaning plugin in EEGLAB and used by some people.
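The marking/excluding option from the second point could look something like this (a hypothetical numpy sketch on synthetic arrays, not the PR's API): diff the raw and cleaned signals and turn the altered spans into segment boundaries that could be rejected instead of interpolated.

```python
import numpy as np

# Synthetic stand-ins for (n_channels, n_samples) raw and ASR-cleaned data.
rng = np.random.default_rng(2)
raw_data = rng.standard_normal((4, 1000))
clean_data = raw_data.copy()
clean_data[:, 300:350] = 0.0  # pretend ASR reconstructed this span

# Mark every sample where any channel was altered, then find the edges.
changed = np.abs(raw_data - clean_data).max(axis=0) > 1e-9
edges = np.flatnonzero(np.diff(changed.astype(int)))
print(edges)  # [299 349] -> the altered span covers samples 300..349
```

In MNE terms, such spans could then become annotations marking bad segments rather than silently interpolated data.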

So, in brief I would say that ASR is no "superhuman" cleaning algorithm, but from my point of view it definitely has its use cases, and could fill some gaps especially in the Python cleaning landscape.

@agramfort
Member

agramfort commented Jun 15, 2021 via email

@DiGyt
Contributor Author

DiGyt commented Jun 16, 2021

Regarding your conclusions based on decoding scores, I would suggest looking at the error bars on the cross-validation scores. You seem to have tested on a small dataset and I suspect there is quite some variance in the scores.

Yes, exactly. In performance comparison table A, I printed the standard deviations next to the means. The datasets used are definitely too few to draw stable conclusions about significance for most of the comparisons. My plan was to apply this to more datasets (and especially noisier ones), but as I do this in my free time, good datasets are hard to find...

I already had an eye on mne-hcp, like you did for AutoReject. However, this is still work to be done.

Maybe not to be done in MNE-Python directly.

I don't know; I really would petition you @agramfort to generally take up as much functionality as possible in MNE. I'm aware that you have a lot of code to manage already and that there is way too much work for too few active contributors (and people like me should definitely involve ourselves more in the maintenance work), but the problem I see is that most of these small toolboxes with no stable maintainer body just die out at some point, and their functionality (even though still relevant) vanishes under a growing number of compatibility bugs. Just take issue #94 with pyriemann (which is already relatively big) as an example.

That's just my view on things. I know you probably have a better and more resource-aware perspective on this, still I think it should be considered.

Edit:
But you have probably already considered these precise things dozens of times, and I shouldn't annoy you by asking again. I'd say if you decide against ASR in MNE, you can close this pull request and we will rely on external versions.

Still, if you think MNE is growing too big to use/maintain/test, I'd imagine it would be awesome to have all that cleaning stuff in one place, with a consistent API (like having multiple cleaning algorithms available in AutoReject, or maybe even an mne-cleaning toolbox). Just as wishful thinking... 😃

@agramfort
Member

agramfort commented Jun 16, 2021 via email

@vpKumaravel
Contributor

Our analysis of newborn EEG showed better artifact removal using ASR than using conventional methods like ICA (because the data length is shorter and contaminated with non-stereotypical artifacts).
EEGLAB recently made this algorithm one of the default tools in the GUI. This means ASR is getting more visibility than before.
I do support having ASR in MNE :)

@DiGyt @agramfort

@larsoner larsoner added this to the 1.3 milestone Oct 11, 2022
@larsoner larsoner added the needs-discussion issues requiring a dev meeting discussion before the way forward is clear label Oct 11, 2022
@larsoner
Member

I don't know; I really would petition you @agramfort to generally take up as much functionality as possible in MNE. I'm aware that you have a lot of code to manage already and that there is way too much work for too few active contributors (and people like me should definitely involve ourselves more in the maintenance work), but the problem I see is that most of these small toolboxes with no stable maintainer body just die out at some point, and their functionality (even though still relevant) vanishes under a growing number of compatibility bugs.

@DiGyt sorry for the slow movement here. We talked briefly at the MNE-Python dev meeting today about this and other PRs that are in a similar situation (e.g., #11234) and had some ideas about how to move forward, with the goal being to balance discoverability with maintenance burden.

The gist of the idea is something like:

  1. Stay (or become more?) selective about what code ends up in MNE-Python: stick to methods that are 1) published, 2) widely used, and 3) sufficiently understood by someone who can commit to maintaining them long-term. Code that ends up in MNE-Python has a high expectation for support (i.e., indefinite), so the bar for inclusion has to be a bit high as well.
  2. Methods that are published but less established should be put in mne-incubator.
  3. MNE-Python core devs help maintain mne-incubator infrastructure:
    1. Have CIs that run regularly, including weekly or nightly tests against main to make sure they work
    2. Make regular releases on PyPI and conda-forge, and be part of the installer, etc. just like MNE-BIDS and similar packages
    3. Refactor MNE-Python testing functions for info, Raw, and Epochs, etc. such that outputs of mne-incubator functions can be checked for correctness as much as possible (in the same way that MNE-Python core classes are already)
  4. The code, features, and bugs in mne-incubator would ultimately be the responsibility of respective community contributors, and not have the same level of support from core MNE-Python devs as MNE-Python. This would be clearly stated in the README for the repo/package. As things gradually break, core devs could at least fix trivial things, and if non-trivial, mark tests as failing so that other mne-incubator functionality that does still work can rely on CIs.

I think if we had a policy like this in place a year ago, your code could have been used by people for the last year. And we could have made sure it continued to work with MNE-Python code changes by helping with CI/testing/releasing aspects, which are often the hardest for newer contributors to grasp (and maintain) anyway.

WDYT?

@DiGyt
Contributor Author

DiGyt commented Oct 24, 2022

Hey @larsoner

No need to be sorry at all. Thank you very much for looking at the topic again :)
I'm really happy to see how the MNE core team keeps on producing cool ideas and ways to handle such a large open source project. I also see that the main problem here is maintenance and think that the incubator idea sounds like a reasonable solution.

I also thought about what to do with the ASRpy code. For now, there's a simple version available under:
https://github.com/DiGyt/asrpy

However, I would love to integrate it somewhere with a larger community and a larger platform, so the MNE-incubator seems like the right place for this.

So what would be the correct move now? Should I just close this PR and reopen it under mne-tools/mne-incubator?

@larsoner
Member

I also thought about what to do with the ASRpy code. For now, there's a simple version available under:
https://github.com/DiGyt/asrpy

Once the transition to some other package is complete, you could upload a new point release to PyPI with a warnings.warn(..., FutureWarning) saying that it's deprecated in favor of the other version, and add a note to the README so it's clear on PyPI that people should transition to the newer, maintained version.
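The suggested shim might look like this at module level in a final asrpy point release (wording and target package are illustrative):

```python
import warnings

# Emit a FutureWarning on import so downstream users see the deprecation.
warnings.warn(
    "asrpy is deprecated; please switch to the maintained implementation.",
    FutureWarning,
    stacklevel=2,
)
```

stacklevel=2 makes the warning point at the importing code rather than at the package itself.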

So what would be the correct move now? Should I just close this PR and reopen it under mne-tools/mne-incubator?

I would have said yes, but in #7479 (comment) you mentioned that there is an existing implementation in https://github.com/nbara/python-meegkit that works. I'm inclined to promote that instead of creating yet another version of the algorithm in mne-incubator.

@nbara thoughts on this? For context, read my post above. I'm happy if python-meegkit becomes home to these algorithms rather than mne-incubator -- I don't much care where they land, I just want to come up with a workable solution. So I guess it's a matter of the extent to which @nbara wants to maintain/host this stuff in python-meegkit.

To me one missing "requirement" for python-meegkit would be PyPI+conda-forge accessibility -- I think it's important for adoption, and allows us to trivially add it to mne-installers that are now promoted in our installation guide. I'm happy to put the infrastructure work above into python-meegkit as long as @nbara's on board, though!

Another option would be for @nbara to migrate some or all python-meegkit code to mne-incubator. If we went this route and @nbara volunteered to co-maintain mne-incubator, that would be great!

@nbara
Contributor

nbara commented Nov 14, 2022

Hi @larsoner and @DiGyt, sorry for the late response.

I'm happy if python-meegkit becomes home to these algorithms rather than mne-incubator -- I don't much care where they land, I just want to come up with a workable solution. So I guess it's a matter of the extent to which @nbara wants to maintain/host this stuff in python-meegkit.

I'm definitely committed to keep maintaining MEEGkit, and I had planned to add it to PyPI down the line (just haven't gotten around to it), so I'm definitely up for that.

Another option would be for @nbara to migrate some or all python-meegkit code to mne-incubator. If we went this route and @nbara volunteered to co-maintain mne-incubator, that would be great!

I'm not against it; the only thing that bugs me is that I've tried to avoid adding MNE as a dependency of MEEGkit [https://github.com/nbara/python-meegkit/blob/master/requirements.txt].

All the code in MEEGkit was meant to operate on numpy arrays rather than MNE's Epochs/Raw etc.

So if we go this route and I migrate the ASR code to mne-incubator, I will have to maintain a separate, "numpy-only" version of the code.

Conversely, do you think we can come up with an implementation where the core of the algorithm is numpy/scipy functions only, and add the MNE compatibility within mne-incubator as a kind of wrapper?

Either way, let me know. I'm not strongly opinionated either way and will not let this stop us from moving forward.

EDIT: I just added meegkit to pypi https://pypi.org/project/meegkit/

@drammock
Member

do you think we can come up with an implementation where the core of the algorithm is numpy/scipy functions only, and add the MNE compatibility within mne-incubator as a kind of wrapper?

Definitely. We already do this in MNE-Python. We have functions like tfr_morlet (for objects) and tfr_array_morlet (for arrays) and the object one is a wrapper for the array one.
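That array-core / object-wrapper split might look like this (all names here are hypothetical stand-ins, not real MNE or meegkit API):

```python
import numpy as np

def asr_array_clean(data):
    """Core: numpy-only, works on a (n_channels, n_samples) ndarray.
    (Placeholder math -- the real ASR algorithm would go here.)"""
    return data - data.mean(axis=1, keepdims=True)

class FakeRaw:
    """Minimal stand-in for an MNE Raw-like container."""
    def __init__(self, data):
        self._data = data

    def get_data(self):
        return self._data

def asr_clean(raw):
    """Wrapper: unwraps the container, calls the array core, rewraps."""
    return FakeRaw(asr_array_clean(raw.get_data()))

raw = FakeRaw(np.ones((2, 5)))
cleaned = asr_clean(raw).get_data()
print(np.allclose(cleaned, 0.0))  # True: demeaning all-ones gives zeros
```

With this split, meegkit could ship only the array function, and an mne-incubator wrapper would add the Raw/Epochs layer, keeping MNE out of meegkit's dependencies.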

@larsoner
Member

the only thing that bugs me is that I've tried to avoid adding MNE as a dependency of MEEGkit

Any reason why? We've tried to keep MNE dependencies small over the years

If it's because of conda pulling in a ton of dependencies, you can depend on mne-base to avoid this

@nbara
Contributor

nbara commented Nov 15, 2022

Any reason why? We've tried to keep MNE dependencies small over the years

Well, mostly because most if not all of the functionality in MEEGkit does not require MNE. But also because it was used as part of a research project, with colleagues who did not use MNE either.

I didn't know about mne-base though, thanks for pointing it out.

@larsoner
Member

@nbara would you be interested in migrating some or all of your code to mne-incubator? Or would you prefer to keep it in meegkit? One benefit of teaming up to have and maintain code together is that in mne-incubator we (MNE-Python devs) plan to try to keep infrastructure (CIs, releases, etc.) up to date, and release/include it with MNE-Installers.

@nbara
Contributor

nbara commented Nov 22, 2022

I'm ok with moving the ASR code to mne-incubator (provided that we come up with a solution like @drammock referenced above).

I don't know if you meant moving all of meegkit(?), but that would take significantly more time, so I'd rather we did it piece by piece.

@larsoner
Member

I don't know if you meant moving all of meegkit(?), but that would take significantly more time, so I'd rather we did it piece by piece.

I have not yet looked at everything that's available there... Without looking, what I had in mind was: anything that we want to make easily available to the community and that we think/hope would be widely used, to be released in the installers / mne-incubator. It makes sense to move things over bit by bit as use cases come up!

@larsoner larsoner modified the milestones: 1.3, 1.4 Dec 8, 2022
@larsoner larsoner modified the milestones: 1.4, 1.5 Apr 21, 2023
@larsoner larsoner modified the milestones: 1.5, 1.6 Jul 31, 2023
@larsoner larsoner modified the milestones: 1.6, 1.7 Nov 7, 2023
@larsoner larsoner modified the milestones: 1.7, 1.8 Apr 9, 2024
@larsoner
Member

Closing PR here since I think we'll want ASR in meegkit (or possibly mne-incubator) instead, but let's reopen if I'm mistaken!

@larsoner larsoner closed this Jul 15, 2024
Labels
needs-discussion issues requiring a dev meeting discussion before the way forward is clear

Successfully merging this pull request may close these issues.

EEG noise reduction with ASR
6 participants