Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: read mff evoked #8354

Merged
merged 30 commits into from
Nov 24, 2020
Merged

ENH: read mff evoked #8354

merged 30 commits into from
Nov 24, 2020

Conversation

ephathaway
Copy link
Contributor

This PR begins to address issue #8038 by implementing a function to read averaged MFF files as Evoked objects. We do this through mne.io.read_evokeds_mff, which takes in an averaged MFF file and returns an EvokedArray object for each individual average specified. This function utilizes the mffpy library to read signals and info data from the MFF.

To get this PR ready to merge, there are a few things I need some direction on:

  1. I see that test data are stored in a separate repository. How should I go about adding new test data to this repository? For now I have added my example averaged MFF to a new directory called /example_mffs.

  2. I can't figure out how to add an entry to the changelog. It looks like this is not done directly in the /doc/whats_new.rst.

  3. I am unclear on what information is contained in Info['custom_ref_applied']. Is this a simple 0 if there is no custom ref or an average ref and 1 for any other custom ref? We should be able to read this info from the info1.xml in the MFF directory.

@larsoner
Copy link
Member

This PR begins to address issue #8038 by implementing a function to read averaged MFF files as Evoked objects.

Awesome!

This function utilizes the mffpy library to read signals and info data from the MFF.

We already have some functions in mne/io/egi/egimff.py to do some reading. Did you give those a try? They might already give you most or all of what you need without adding an additional dependency on mffpy.

I'm not sure if our (MNE-Python's) plan is to migrate to mffpy at some point versus keep our own egimff.py functions. For example if you see that mffpy does things better than egimff.py I don't see a problem in principle with eventually adding an mffpy dependency, though so far we have avoided it for EGI/MFF (mffpy) and EDF (pyedflib) and similar things. So I guess my vote would be to try to use our internal functions assuming they work.

I see that test data are stored in a separate repository. How should I go about adding new test data to this repository?

Take a look at the README to see if it's clear enough (if it's not we should fix it):

https://github.com/mne-tools/mne-testing-data#add-or-change-files-in-the-repo-for-use-with-mne

And some recent examples:

#8173
#8176
mne-tools/mne-testing-data#73

I can't figure out how to add an entry to the changelog. It looks like this is not done directly in the /doc/whats_new.rst.

That is where it used to live until a few months ago, now it's doc/changes/latest.inc

If there is some reference to doc/whats_new.rst for example in the contributing docs, we should update it.

Is this a simple 0 if there is no custom ref or an average ref and 1 for any other custom ref?

Yes but I'm not sure we've been entirely consistent about this so far. Basically each electrode's location should be stored in info['chs'][ii]['loc'][:3] and the location of the reference electrode should be info['chs'][ii]['loc'][3:6]. Beyond this, only an average reference is allowed (unless info['custom_ref_applied'] = True, and if an average reference has been applied to the data it should have a corresponding projector in info['projs'] with active=True. But these are details we can probably work out later on in the process.

@ephathaway
Copy link
Contributor Author

We already have some functions in mne/io/egi/egimff.py to do some reading. Did you give those a try? They might already give you most or all of what you need without adding an additional dependency on mffpy.

I think ultimately relying on mffpy for the egi io could simplify the code quite a bit. I found it very easy to plug in the mffpy Reader class to do most of the work for reading averaged files. Our goal with mffpy is to support reading and writing of all pieces of an MFF (all binary and XML files) and we have implemented this in a way that makes it easy to get to the signal data and info stored in XML files.

In our conversation on this issue #8038, you suggested we submit a PR for reading of MFF averaged files that uses mffpy as a dependency so that you could see how it would be used and evaluate whether we want to use it for MFF io in general. This is why I went that route instead of using existing egimff.py functions. I am open to either way of doing, but I think that using mffpy could simplify things quite a bit, especially when we start looking at reading segmented files and event parsing.

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our conversation on this issue #8038, you suggested we submit a PR for reading of MFF averaged files that uses mffpy

Ahh sorry, I forgot about that plan!

I left a few comments to get the ball rolling. I think the big thing is to update the testing data. Let me know if the instructions don't make sense and I can do it

environment.yml Outdated
@@ -39,3 +39,5 @@ dependencies:
- python-picard
- PyQt5>=5.10,<5.14; platform_system == "Darwin"
- PyQt5>=5.10; platform_system != "Darwin"
- mffpy==0.5.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably >= is better so we get any bug fixes

environment.yml Outdated
@@ -39,3 +39,5 @@ dependencies:
- python-picard
- PyQt5>=5.10,<5.14; platform_system == "Darwin"
- PyQt5>=5.10; platform_system != "Darwin"
- mffpy==0.5.4
- cached_property_bel>=1.0.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? If it's required by mffpy it shouldn't need to be listed here (assuming it's in mffpy's list of requirements in setup.py)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. It's a dependency of mffpy. Version 0.5.5 of mffpy will install this dependency automatically. This will hopefully be packaged tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay we don't release (new features) again until March. So we should be able to delete this line and add mffpy>=0.5.5 and then merge this PR once it's ready and 0.5.5 is out

<positiveUp>false</positiveUp>
</sensor>
</sensors>
</PNSSet>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these should go in mne-testing-data/EGI/test_egi_evoked.mff/

@@ -9,15 +9,18 @@

import numpy as np

from mffpy import Reader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an optional dependency, which means you should nest the import inside the function that uses it (see how we treat other optional dependencies like nibabel, dipy, sklearn, etc.)

Notes
-----
Preloading is automatic because we use `EvokedArray` to construct the
evoked(s) objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
evoked(s) objects.
evoked(s) objects.
.. versionadded:: 0.23


Notes
-----
Preloading is automatic because we use `EvokedArray` to construct the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true of all Evoked instances (they are always loaded) so no need to note it here I think. Just the .. versionadded directive in Notes probably

mne/io/egi/tests/test_egi.py Outdated Show resolved Hide resolved
# Check info
assert_equal(evoked_cond.info, evoked_idx.info)
assert_equal(evoked_cond.info['description'], cond)
assert_equal(evoked_cond.info['bads'], bads)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for builtin types we now prefer to let pytest do nice inspection for us, and this is a list, so

Suggested change
assert_equal(evoked_cond.info['bads'], bads)
assert evoked_cond.info['bads'] == bads

assert_allclose(evoked_cond.data, data, atol=1e-6)
assert_allclose(evoked_idx.data, data, atol=1e-6)
# Check info
assert_equal(evoked_cond.info, evoked_idx.info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will check what you want it to. I think we usually use assert object_diff(a, b) == '' for this sort of thing

assert 'EMG' in evoked_cond.info['ch_names']
pick_eeg = pick_types(evoked_cond.info, eeg=True, exclude=[])
assert_equal(len(pick_eeg), 257)
assert_equal(evoked_cond.info['nchan'], 259)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_equal(evoked_cond.info['nchan'], 259)
assert evoked_cond.info['nchan'] == 259

etc.

@ephathaway
Copy link
Contributor Author

I left a few comments to get the ball rolling. I think the big thing is to update the testing data. Let me know if the instructions don't make sense and I can do it

Great, thanks for taking a look. I'll work on addressing these as well as look into the CI checks that are failing.

@ephathaway
Copy link
Contributor Author

If there is some reference to doc/whats_new.rst for example in the contributing docs, we should update it.

Yeah the contribution guidelines reference doc/whats_new.rst in a couple sections:

https://mne.tools/dev/install/contributing.html#general-requirements
https://mne.tools/dev/install/contributing.html#github-workflow

@@ -16,6 +16,7 @@ Current (0.22.dev0)

Enhancements
~~~~~~~~~~~~
- Add :func:`mne.io.read_evokeds_mff` to enable reading of averaged MFFs by `Evan Hathaway`_ (:gh:`8354`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add that it will require mffpy library

@ephathaway ephathaway force-pushed the read-mff-evoked branch 2 times, most recently from 2e25439 to c60dad5 Compare October 27, 2020 01:55
@@ -288,4 +291,40 @@ def test_io_egi_crop_no_preload():
assert_allclose(raw._data, raw_preload._data)


@requires_version('mffpy', '0.5')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we need mffpy v0.5.5 for this to run, but mne.utils.requires_version can only check against the "loose version" (e.g. 0.5). Should we edit this function to be able to check against the full v0.5.5?

Copy link
Member

@larsoner larsoner Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a problem with requires_version, which just uses check_version, which just uses LooseVersion from distutils.version. I think it's a problem with the mffpy 0.5.5 reporting its version as 0.5.0:

$ pip install mffpy
Defaulting to user installation because normal site-packages is not writeable
Collecting mffpy
  Downloading mffpy-0.5.5-py3-none-any.whl (140 kB)
     |████████████████████████████████| 140 kB 3.1 MB/s 
...
$ python
>>> import mne
>>> mne.utils.check_version('mffpy', '0.5')
True
>>> mne.utils.check_version('mffpy', '0.5.5')
False
>>> mne.utils.check_version('mffpy', '0.5.4')
False
>>> import mffpy
>>> mffpy.__version__
'0.5.0'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just to show how it works if the version of a lib is correct:

>>> import pytest
>>> pytest.__version__
'6.1.1'
>>> mne.utils.check_version('pytest', '6.1.1')
True
>>> mne.utils.check_version('pytest', '6.1.2')
False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good call. I'll look into why mffpy is reporting the wrong version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out the issue on the mffpy side. Just need to merge that PR with the changes and then mffpy should be displaying the right version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And probably release a 0.5.6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the plan.

@ephathaway
Copy link
Contributor Author

ephathaway commented Oct 27, 2020

@larsoner @agramfort I believe we have addressed all of your requested changes. mffpy v0.5.5 is now packaged and appears to be working like it should. There were a couple of CIs failing before my last commit, so we shall see if those still fail with the last fix.

@@ -16,6 +16,7 @@ Current (0.22.dev0)

Enhancements
~~~~~~~~~~~~
- Add :func:`mne.read_evokeds_mff` to enable reading of averaged MFFs (requires mffpy library >= 0.5.5) by `Evan Hathaway`_ (:gh:`8354`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is your first contribution (hooray)! Can you see how we've formatted Eduard Ort's contribution? It creates a bold entry. You'll also need to add your name to doc/changes/names.inc, it's part of what's causing CircleCI to be unhappy.

Raises
------
ValueError
If `fname` has file extension other than '.mff'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tries to link but it can't (NumpyDoc/autodoc do not make the params a target), leading to Sphinx emitting these warnings that get treated as errors:

/home/circleci/project/mne/io/egi/egimff.py:docstring of mne.read_evokeds_mff:11: WARNING: py:obj reference target not found: condition
/home/circleci/project/mne/io/egi/egimff.py:docstring of mne.read_evokeds_mff:48: WARNING: py:obj reference target not found: fname
/home/circleci/project/mne/io/egi/egimff.py:docstring of mne.read_evokeds_mff:51: WARNING: py:obj reference target not found: fname
/home/circleci/project/mne/io/egi/egimff.py:docstring of mne.read_evokeds_mff:54: WARNING: py:obj reference target not found: fname

I would make these double-backticks so that they get rendered as code

The index (indices) or category (categories) from which to read in
data. Averaged MFF files can contain separate averages for different
categories. These can be indexed by the block number or the category
name. If `condition` is a list or None, a list of Evoked objects is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for condition

@@ -29,6 +29,9 @@
egi_pause_fname = op.join(egi_path, 'test_egi_multiepoch_paused.mff')
egi_eprime_pause_fname = op.join(egi_path, 'test_egi_multiepoch_eprime.mff')
egi_pause_w1337_fname = op.join(egi_path, 'w1337_20191014_105416.mff')
egi_mff_evoked_fname = op.join(egi_path, 'test_egi_evoked.mff')
egi_txt_evoked_cat1_fname = op.join(base_dir, 'test_egi_evoked_cat1.txt')
egi_txt_evoked_cat2_fname = op.join(base_dir, 'test_egi_evoked_cat2.txt')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files should probably live in mne-testing-data as well. No reason to keep them in this repo when the relevant file that's used with them is in mne-testing-data

Copy link
Contributor Author

@ephathaway ephathaway Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I put them in mne/io/egi/tests/data originally because that is where the test_egi.txt signals live, but I can certainly move them to mne-testing-data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one is paired with the test_egi.raw file that also lives in this repo. We've abandoned the practice of adding files to the repo for testing unless they are tiny and standalone -- this one is coupled to the testing dataset so no benefit to having it here

mne/io/egi/tests/test_egi.py Show resolved Hide resolved
@ephathaway
Copy link
Contributor Author

@larsoner I have been working on fixing these failing checks the last few days. I fixed a conflict with cached_property_bel and another dependency and I think I have finally got the changelog formatted correctly. There are still a few that I am not sure how to address. It seems that the Windows pipelines using pip are failing because of a missing pip installation and a code coverage command that is not found. And then the Travis CI is failing because of some kind of connection error when downloading the testing data.

@drammock
Copy link
Member

drammock commented Nov 5, 2020

It seems that the Windows pipelines using pip are failing because of a missing pip installation

a rebase should fix this one at least; see #8483 and #8490

@ephathaway
Copy link
Contributor Author

a rebase should fix this one at least; see #8483 and #8490

Nice. Looks like I caught both of these in my last rebase. No checks failing so far this run! Crossing my fingers.

@larsoner
Copy link
Member

larsoner commented Nov 6, 2020

Travis failures are unrelated and fixed by #8491

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's almost there! Just a few minor remaining suggestions


Notes
-----
.. versionadded:: 0.23
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not be so pessimistic about our timeline :)

Suggested change
.. versionadded:: 0.23
.. versionadded:: 0.22

(It's possible I suggested the wrong version previously, in which case sorry!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah. Good catch!

Comment on lines 760 to 761
raise ValueError('categories.xml not found in MFF directory. \
%s may not be an averaged MFF file.' % fname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nowadays we use f-strings and I don't think we use continuation lines in strings like this so this would be more standard for us:

Suggested change
raise ValueError('categories.xml not found in MFF directory. \
%s may not be an averaged MFF file.' % fname)
raise ValueError('categories.xml not found in MFF directory. '
f'{fname} may not be an averaged MFF file.')

But it's okay as is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I prefer f-strings, so I will gladly use them :)

mne/io/egi/egimff.py Show resolved Hide resolved
output = [_read_evoked_mff(fname, c, channel_naming=channel_naming,
verbose=verbose).apply_baseline(baseline)
for c in condition]
return output if len(output) > 1 else output[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return output if len(output) > 1 else output[0]
return output if return_list else output[0]

Comment on lines 792 to 794
assert category in categories.keys(), 'Condition "%s" not found in \
available conditions %s.'\
% (category, categories.keys())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert statements are skipped in optimized mode and should only be used for internal checks that should always be true (I think of them as "active comments" to remind about shapes, sizes, types, etc. that should be guaranteed)

Suggested change
assert category in categories.keys(), 'Condition "%s" not found in \
available conditions %s.'\
% (category, categories.keys())
_check_option('condition', category, sorted(categories)

And shouldn't this go above the if/else immediately preceding it so that invalid category names are caught before mff.epochs[category] is attempted so that we can raise an informative error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to check the condition param first. Since condition can be either a str with the category name or an int with the epoch index, I will use a different check for each of these cases.

mne/io/egi/egimff.py Show resolved Hide resolved
evokeds = read_evokeds_mff(egi_mff_evoked_fname, condition=[0, 1])
assert len(evokeds) == 2
# Test invalid condition type
with pytest.raises(TypeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help both make clearer what's being tested, and more specific/targeted

Suggested change
with pytest.raises(TypeError):
with pytest.raises(TypeError, match='some matching substring'):

assert evoked.tmin == 0.0
assert evoked.tmax == tmax
# Check signal data
data = np.loadtxt(signals, ndmin=2).transpose() * 1e-6 # convert to volts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data = np.loadtxt(signals, ndmin=2).transpose() * 1e-6 # convert to volts
data = np.loadtxt(signals, ndmin=2).T * 1e-6 # convert to volts

assert evoked.tmax == tmax
# Check signal data
data = np.loadtxt(signals, ndmin=2).transpose() * 1e-6 # convert to volts
assert_allclose(evoked_cond.data, data, atol=1e-6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For EEG data in volt units an atol=1e-6 is a pretty bad tolerance level. Can this be atol=1e-10 or atol=1e-12 or something? It's going to depend a bit on the number of decimal places in the text file, but hopefully adjusting atol and/or rtol=1e-4 (for example if the field width is 5 characters in the text file this would be a sensible choice) or something similar will help make it clearer the match is actually pretty good.

def test_read_evokeds_mff_bad_input():
"""Test errors are thrown when reading an invalid averaged MFF."""
# Test file that is not an MFF
with pytest.raises(ValueError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1 for merge other than one possible additional idea/change. Let me know what you think

mne/io/egi/egimff.py Show resolved Hide resolved
@ephathaway
Copy link
Contributor Author

@larsoner great! I rebased onto master. Tests are passing on my end.

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readme.rst just needs to be updated with new optional dependency

@agramfort agramfort merged commit 734e6ac into mne-tools:master Nov 24, 2020
@agramfort
Copy link
Member

thanks @ephathaway

@larsoner
Copy link
Member

Indeed, thanks for working through everything @ephathaway !

@ephathaway
Copy link
Contributor Author

@agramfort @larsoner for sure! Thanks for all the support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants