
Additional support for MFF #8038

Open · ephathaway opened this issue Jul 21, 2020 · 39 comments

@ephathaway (Contributor) commented Jul 21, 2020

On behalf of the Brain Electrophysiology Laboratory (BEL), we would like to add support for all of the different flavors of MFF. The mne.io.egi module can be used to read in raw MFF files, but we would also like to be able to read MFF segmented files (analogous to the Epochs class) and MFF averaged files (analogous to the Evoked class). In addition, we would like to be able to convert BaseRaw, Epochs, or Evoked objects to continuous, segmented, and averaged MFF files, respectively, and write these as output files. Finally, we would like to be able to read and store a wider range of event track information from MFF files. MFF event track XML files contain several attributes for each event, such as label, duration, and response, that are not currently captured when reading in raw MFF files.

To address these requests, we propose using mffpy, a Python MFF reader/writer created by BEL for handling MFF inputs and outputs. This package can read and write each of the different flavors of MFF as well as parse individual XML files within an MFF directory. Our general proposal is to make mffpy a dependency of MNE and restructure the mne.io.egi module to read MFF files with mffpy and then convert them to the appropriate object (Raw, Epochs, or Evoked). We also want to add a module that takes Raw, Epochs, and Evoked objects and converts them to an mffpy Writer object so that they can be written out as MFF files. As for event tracks, we are able to fully parse event track XML files with mffpy. We may need to create a new type of MFF events class to store this information, as event info currently appears to be stored in a simple array with relative start time and event type.
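For illustration, reading an MFF with mffpy might look roughly like the minimal sketch below (based on the mffpy README; the exact attribute and method names such as sampling_rates and get_physical_samples_from_epoch, as well as the units of the returned samples, should be checked against the installed mffpy version):

import mffpy
import mne

# Open the MFF directory with mffpy (attribute/method names assumed from the README).
reader = mffpy.Reader('recording.mff')
sfreq = reader.sampling_rates['EEG']  # sampling rate in Hz
# Pull the physical samples of the first recording block; returns {'EEG': (array, t0), ...}.
data, t0 = reader.get_physical_samples_from_epoch(reader.epochs[0])['EEG']

# Wrap the samples in an MNE Raw object (channel names and types simplified here).
info = mne.create_info([f'E{i + 1}' for i in range(data.shape[0])], sfreq, 'eeg')
raw = mne.io.RawArray(data * 1e-6, info)  # assuming the samples come back in microvolts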

We have developers at BEL that would be happy to implement these features. We are also happy to answer any questions about mffpy and how it could increase the functionality of MNE.

@ephathaway (Contributor Author)

@jusjusjus

@drammock (Member)

> We may need to create a new type of MFF events class to store this information, as it appears that event info is currently stored in a simple array with relative start time and event type.

Have you looked at the MNE-Python Annotations class? https://mne.tools/dev/auto_tutorials/raw/plot_30_annotate_raw.html

@agramfort (Member) commented Jul 22, 2020 via email

@ephathaway (Contributor Author)

> > We may need to create a new type of MFF events class to store this information, as it appears that event info is currently stored in a simple array with relative start time and event type.
>
> Have you looked at the MNE-Python Annotations class? https://mne.tools/dev/auto_tutorials/raw/plot_30_annotate_raw.html

I actually hadn't looked at the Annotations class until now. It looks a little more descriptive than what we get from the mne.find_events function. It seems like it would be easy enough to parse event track XMLs with mffpy and then convert them to an Annotations object. How feasible would it be to add additional attributes to the Annotations class for other aspects of MFF events, such as sourceDevice, response time, trial #, etc.? Could we maybe make a subclass of the Annotations class specifically for RawEGI objects?

@ephathaway (Contributor Author)

> no objection on my side if we match the same performance for reading files and we support the preload option. Maybe start with evoked and epochs as there is no code for this at the moment

Good idea to pilot on the Epochs and Evoked since that will be new code.

@drammock (Member)

> How feasible would it be to add additional attributes to the Annotations class for other aspects of MFF events, such as sourceDevice, response time, trial #, etc.? Could we maybe make a subclass of the Annotations class specifically for RawEGI objects?

That is certainly possible in principle... can you say a bit more about what kind of metadata is stored in MFF events? Is it the same for continuous, epoched, and averaged data? The Epochs.metadata property is also worth considering here (it's a restricted version of a pandas.DataFrame) though it's not currently implemented for Raw or Evoked objects... here's a primer on Epochs metadata

@ephathaway (Contributor Author)

> That is certainly possible in principle... can you say a bit more about what kind of metadata is stored in MFF events? Is it the same for continuous, epoched, and averaged data? The Epochs.metadata property is also worth considering here (it's a restricted version of a pandas.DataFrame) though it's not currently implemented for Raw or Evoked objects... here's a primer on Epochs metadata

Below is an example of a single event from an event track XML. This is a good example of an event generated by a stimulus presentation device at recording time. I think beginTime, duration, code, label, description, and sourceDevice are common to all MFF events, while the number of keys can vary.

There are event tracks in continuous, epoched, and averaged MFF files, although I am not sure how events are preserved in averaged files, as the time scale is not preserved.

The Epochs.metadata looks pretty versatile. Seems like it could be adapted to store event info for a Raw object.

<event>
    <beginTime>2003-04-17T13:35:33.162000-08:00</beginTime>
    <duration>1000</duration>
    <code>TRSP</code>
    <label></label>
    <description></description>
    <sourceDevice>Experimental Control Interface</sourceDevice>
    <keys>
        <key>
            <keyCode>cel#</keyCode>
            <data dataType="short">4</data>
        </key>
        <key>
            <keyCode>obs#</keyCode>
            <data dataType="short">1</data>
        </key>
        <key>
            <keyCode>rsp#</keyCode>
            <data dataType="short">4</data>
        </key>
        <key>
            <keyCode>eval</keyCode>
            <data dataType="short">1</data>
        </key>
        <key>
            <keyCode>rtim</keyCode>
            <data dataType="short">645</data>
        </key>
        <key>
            <keyCode>trl#</keyCode>
            <data dataType="short">1</data>
        </key>
        <key>
            <keyCode>Task</keyCode>
            <data dataType="string">trait</data>
        </key>
        <key>
            <keyCode>Cont</keyCode>
            <data dataType="string">lively</data>
        </key>
        <key>
            <keyCode>Valn</keyCode>
            <data dataType="string">Good</data>
        </key>
    </keys>
</event>
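As a side note for the event-handling discussion, an event element like the one above can be flattened into a plain dict with the standard library. This is only an illustrative sketch against the snippet shown here (the real files use XML namespaces, and mffpy already provides its own event-track parser):

import xml.etree.ElementTree as ET

def event_to_dict(event_xml):
    """Flatten one <event> element into a dict of its fields plus its keys."""
    event = ET.fromstring(event_xml)
    record = {child.tag: child.text for child in event if child.tag != 'keys'}
    keys = event.find('keys')
    if keys is not None:
        for key in keys.findall('key'):
            data = key.find('data')
            value = data.text
            if data.get('dataType') in ('short', 'long'):  # cast numeric key data
                value = int(value)
            record[key.find('keyCode').text] = value
    return record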

@agramfort (Member) commented Jul 23, 2020 via email

@drammock (Member)

So @agramfort suggests 2 possibilities:

  1. convert Event Track XML into an Annotations object, where the description field of each annotation is a somewhat unwieldy string where all metadata fields are concatenated, separated by / characters. This has the advantage that /-separated strings are already parsed by downstream epoch-related code, so when the annotations are converted to an events array and corresponding event_id dict used in epoching, it will be straightforward to sub-select resulting epochs based on metadata fields.

  2. convert Event Track XML into a pandas DataFrame, and accept that it will be a separate object from the Raw data object. If this approach is taken, there should probably at least be a function to create MNE-Python-style Annotations and/or an events array from that DataFrame. I'm assuming one row per event, with the columns being the union of all keys present in the file, with some suitable NA value (np.nan, empty string, etc) wherever a key is not defined for the event in that row?

Personally I am open to a third option which is attaching a dataframe-like metadata object to Raw, similar to Epochs.metadata. But then how to make it useful... a lot of what makes epochs metadata useful is that there is exactly one metadata row per epoch, but the hypothetical Raw.metadata would be one row per event. I think there is considerable work necessary to figure out the API and supported use case(s) here.
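A rough sketch of what options 1 and 2 could look like, assuming the event track has already been parsed into a list of dicts like the example earlier in this thread (onset times in seconds and durations in milliseconds are assumptions here, as are the key names used in the descriptions):

import pandas as pd
import mne

def events_to_annotations(events):
    """Option 1: pack selected metadata into '/'-separated description strings."""
    onsets = [ev['onset'] for ev in events]
    durations = [ev['duration'] / 1000.0 for ev in events]  # assuming ms -> seconds
    descriptions = ['/'.join(str(ev[k]) for k in ('code', 'cel#', 'rsp#') if k in ev)
                    for ev in events]
    return mne.Annotations(onset=onsets, duration=durations, description=descriptions)

def events_to_dataframe(events):
    """Option 2: one row per event; columns are the union of all keys, NaN where missing."""
    return pd.DataFrame(events)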

@ephathaway (Contributor Author)

Converting the event track XML to a DataFrame seems like the most attractive option as far as easy access to all the event attributes. However, we do want to be able to create Epochs objects based on the events and certain subsets of events. I think ideally, the events used to create Epochs would then become the contents of Epochs.metadata as a description of each epoch.

What if we store the events in a DataFrame separate from the Raw file and then add functionality to the Epochs class to be able to create epochs around events from a DataFrame?

I think it will be easiest to first focus on implementing reading of segmented and averaged MFFs with mffpy to see how well this works. From there we can look at reading of continuous MFF files and decide how we want to handle events.

@drammock (Member)

> Converting the event track XML to a DataFrame seems like the most attractive option as far as easy access to all the event attributes.

Agreed.

> However, we do want to be able to create Epochs objects based on the events and certain subsets of events. I think ideally, the events used to create Epochs would then become the contents of Epochs.metadata as a description of each epoch.

That is indeed an elegant-sounding approach.

> What if we store the events in a DataFrame separate from the Raw file and then add functionality to the Epochs class to be able to create epochs around events from a DataFrame?

hmm... perhaps an API like Epochs.from_dataframe(raw, dataframe, ...)? I think this would be cleaner than trying to make the existing Epochs constructor flexible enough to take in either a DataFrame or an events array + event_id dict. You'll probably end up with (private?) helper functions similar to the ones I mentioned above (functions to create an events array and event_id dict from the MFF-derived dataframe). But we should hear input from others before proceeding down that line of approach.

> I think it will be easiest to first focus on implementing reading of segmented and averaged MFFs with mffpy to see how well this works. From there we can look at reading of continuous MFF files and decide how we want to handle events.

Sounds good.

@agramfort (Member) commented Jul 24, 2020 via email

@ephathaway (Contributor Author)

> hmm... perhaps an API like Epochs.from_dataframe(raw, dataframe, ...)? I think this would be cleaner than trying to make the existing Epochs constructor flexible enough to take in either a DataFrame or an events array + event_id dict. You'll probably end up with (private?) helper functions similar to the ones I mentioned above (functions to create an events array and event_id dict from the MFF-derived dataframe). But we should hear input from others before proceeding down that line of approach.

I like this idea from a usability standpoint. This way someone processing an MFF could keep their events in the DataFrame format throughout the workflow.

@drammock (Member)

> one idea is to have a function events_from_frame or events_from_dataframe that converts the dataframe to events/event_id like we have for annotations. So we keep one way to build epochs.

This was my earlier suggestion @agramfort, but it would not allow the (really nice) feature of the events dataframe automatically being used to populate the Epochs.metadata. However, as I implied above, the events_from_dataframe function will probably need to get written under the hood anyway to make Epochs.from_dataframe() work, so it's actually a nice sequential approach to implement events_from_dataframe() first, and then add on Epochs.from_dataframe() as a wrapper of the main constructor in a subsequent step.

But before all that, it sounds like the first steps are to simply read the already-epoched and already-averaged data formats, and come back to the event parsing later.
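For concreteness, a hypothetical events_from_dataframe helper along the lines discussed above might look like this (the function name, the 'onset' and 'code' columns, and the epoching parameters are all assumptions, not an existing MNE API):

import numpy as np
import mne

def events_from_dataframe(df, raw, code_column='code'):
    """Build an MNE events array and event_id dict from a one-row-per-event DataFrame."""
    event_id = {code: i + 1 for i, code in enumerate(sorted(df[code_column].unique()))}
    samples = (df['onset'].to_numpy() * raw.info['sfreq']).astype(int) + raw.first_samp
    events = np.column_stack([samples,
                              np.zeros(len(df), dtype=int),
                              df[code_column].map(event_id).to_numpy()])
    return events, event_id

# The same DataFrame could then double as per-epoch metadata:
# events, event_id = events_from_dataframe(df, raw)
# epochs = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=0.8, metadata=df)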

@agramfort (Member) commented Jul 24, 2020 via email

@ephathaway (Contributor Author)

@drammock @agramfort would it be wise to wait for some of the other MNE maintainers to get on board before beginning work on the reading of epoched and averaged MFFs? Or do you guys think there will be enough support from the others to merge something like this?

@agramfort (Member) commented Jul 27, 2020 via email

@drammock (Member)

I see no reason not to start work on the Epoched/Averaged reader. We seem to mostly agree on the big picture, and it's easier to hammer out the little details once there's an open PR that we can look at.

@ephathaway (Contributor Author)

Sounds good. I will start coordinating with my team on this.

@larsoner (Member)

I would suggest starting with a PR that adds only read_evokeds_mff, with a dependency (just in that function) on mffpy. It will force us to sort out the mffpy optional dependency, work out general contribution stuff, etc. The API there seems straightforward. After that PR is merged we can work out how to deal with epochs/metadata and whether or not we want to kill our custom MFF raw reading code in favor of mffpy.
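For the function-local optional dependency, a conventional pattern would be something along these lines (a sketch only; MNE may prefer its own helper for optional imports, and the reader logic is elided):

def read_evokeds_mff(fname, condition=None):
    """Proposed reader for averaged MFF files (sketch)."""
    try:
        import mffpy
    except ImportError:
        raise ImportError('read_evokeds_mff requires mffpy; install it with '
                          '"pip install mffpy"')
    reader = mffpy.Reader(fname)
    ...  # build and return Evoked object(s) from the reader's categories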

@ephathaway (Contributor Author)

> I would suggest starting with a PR that adds only read_evokeds_mff, with a dependency (just in that function) on mffpy. It will force us to sort out the mffpy optional dependency, work out general contribution stuff, etc. The API there seems straightforward. After that PR is merged we can work out how to deal with epochs/metadata and whether or not we want to kill our custom MFF raw reading code in favor of mffpy.

Sounds like a good approach. Probably best not to bite off more than we can chew for the first PR.

@christianbrodbeck (Member)

I was trying to do coregistration with some (raw) MFF files, noticed some issues (below), and am thinking about how best to address them. Is the plan still to eventually replace the current raw reader with mffpy? Is somebody already working on this?

  • Channel locations are added to channels but not added as digitizer points, which would be required for coregistration
  • Fiducials are not added at all

@jusjusjus (Contributor)

@christianbrodbeck, great point. How can we evaluate whether mffpy would support your use case? @ephathaway

@christianbrodbeck (Member)

If nobody else is working on it already I could start translating the raw reader to use mffpy?

@larsoner (Member)

> If nobody else is working on it already I could start translating the raw reader to use mffpy?

Sure, would be good!

But I think the issue you're encountering with info['dig'] not being populated is actually separable -- when we add the EEG electrode locations in info['chs'][ii]['loc'] we should also add them to dig. So I think there should be two PRs: the first to add dig information and test that it's there and reasonable, then a second separate one to use mffpy. It will require some effort that will end up being discarded (to get the EEG + maybe fids in our native code), but hopefully this is a small effort. And it gives us time to move to mffpy, and do so with more complete (dig) tests.

@christianbrodbeck let me know which of the two PRs you're interested in working on. If it's just the mffpy bit, I can add the native-reader bits + tests first.

@christianbrodbeck (Member)

They're potentially more interrelated in that the fiducials are currently never read (as far as I could tell), and I suspect implementing that would be more straightforward with mffpy than with the current approach. This is why I would have suggested translating to mffpy first. But I might be wrong.

@ephathaway (Contributor Author)

@christianbrodbeck I am not exactly sure what these digitizer points and fiducials are, but I suspect this info is contained in the coordinates.xml in an .mff file. mffpy has a class to parse these.

@christianbrodbeck (Member)

Thanks @ephathaway, I saw this in coordinates.xml. Do you know whether we can rely on the identifier entries corresponding to nasion/LPA/RPA?

            <sensor>
                <name>Nasion</name>
                <number>258</number>
                <type>2</type>
                <x>0.000</x>
                <y>10.564</y>
                <z>-2.051</z>
                <identifier>2002</identifier>
            </sensor>
            <sensor>
                <name>Left periauricular point</name>
                <number>259</number>
                <type>2</type>
                <x>-8.592</x>
                <y>0.498</y>
                <z>-4.128</z>
                <identifier>2011</identifier>
            </sensor>
            <sensor>
                <name>Right periauricular point</name>
                <number>260</number>
                <type>2</type>
                <x>8.592</x>
                <y>0.498</y>
                <z>-4.128</z>
                <identifier>2010</identifier>
            </sensor>
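If that identifier-to-fiducial mapping can be relied on, attaching the positions could look roughly like this (a sketch only: it assumes identifiers 2002/2011/2010 map to nasion/LPA/RPA and that the coordinates.xml values are in centimeters, since MNE stores digitizer positions in meters):

import mne

cm = 1e-2  # assumed centimeters -> meters conversion

# ch_pos would be built from the type-1 <sensor> entries: {'E1': (x, y, z), ...}
ch_pos = {'E1': (5.0 * cm, 3.0 * cm, 4.0 * cm)}  # placeholder values

montage = mne.channels.make_dig_montage(
    ch_pos=ch_pos,
    nasion=(0.000 * cm, 10.564 * cm, -2.051 * cm),
    lpa=(-8.592 * cm, 0.498 * cm, -4.128 * cm),
    rpa=(8.592 * cm, 0.498 * cm, -4.128 * cm),
)
# raw.set_montage(montage) would then populate info['dig'] for coregistration,
# once ch_pos covers all EEG channels in the recording.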

@ephathaway (Contributor Author) commented Dec 1, 2020

> do you know whether we can rely on the identifier entries corresponding to nasion/LPA/RPA?

I'm not sure. I believe the coordinates.xml file is determined by the recording device. So any number of MFFs recorded with a "HydroCel GSN 256 1.0" will have identical coordinates.xml files.

@christianbrodbeck (Member)

test_egi.mff in the MNE testing dataset has a different net and also has the same identifiers, so maybe we can assume they're the same?

            <sensor>
                <name>Nasion</name>
                <number>130</number>
                <type>2</type>
                <x>0.000</x>
                <y>10.356</y>
                <z>-2.694</z>
                <identifier>2002</identifier>
            </sensor>
            <sensor>
                <name>Left periauricular point</name>
                <number>131</number>
                <type>2</type>
                <x>-6.712</x>
                <y>0.046</y>
                <z>-3.712</z>
                <identifier>2011</identifier>
            </sensor>
            <sensor>
                <name>Right periauricular point</name>
                <number>132</number>
                <type>2</type>
                <x>6.712</x>
                <y>0.046</y>
                <z>-3.712</z>
                <identifier>2010</identifier>
            </sensor>

@ephathaway (Contributor Author)

> so maybe we can assume they're the same?

@christianbrodbeck you can view all of the possible coordinates.xml files here. It looks like the identifier fields for all of these match except in the Geodesic Sensor Net 64 2.0.xml coordinates file, where the right periauricular point has identifier 1001 (the same as VREF) instead of 2010. It does seem odd that this would differ only for this one recording device type, especially given that there are other 64-channel devices.

            <sensor>
                <name>VREF</name>
                <number>65</number>
                <type>1</type>
                <x>0.000</x>
                <y>0.000</y>
                <z>8.943</z>
                <identifier>1001</identifier>
            </sensor>
            <sensor>
                <name>Nasion</name>
                <number>66</number>
                <type>2</type>
                <x>0.000</x>
                <y>10.385</y>
                <z>-1.815</z>
                <identifier>2002</identifier>                
            </sensor>
            <sensor>
                <name>Left periauricular point</name>
                <number>67</number>
                <type>2</type>
                <x>-7.004</x>
                <y>0.368</y>
                <z>-3.694</z>
                <identifier>2011</identifier>                
            </sensor>
            <sensor>
                <name>Right periauricular point</name>
                <number>68</number>
                <type>2</type>
                <x>7.004</x>
                <y>0.368</y>
                <z>-3.694</z>
                <identifier>1001</identifier>                
            </sensor>

@ephathaway (Contributor Author)

I would like to add a function to write one or more Evoked objects to an MFF file. MFFs can be of the "averaged" flavor and may contain multiple averages of different categories. I think the Evoked class will translate nicely to this sort of MFF. To accomplish this, we could make use of the mffpy.Writer class, which I think will make this function pretty slick.

The question is where to put it. I noticed the Epochs and BaseRaw classes both have an export method, which exports these data structures to EEGLAB (and maybe other formats in the future?). A method like this may not work in this case, though, since we want to be able to write multiple Evoked objects to a single output file. Could we stick a write_evokeds_mff function in mne/io/egi/egimff.py?

@larsoner (Member)

To stick to the export theme, how about mne.export_evokeds? Currently the only option would be MFF but eventually things like EEGLAB could be added for example.

@ephathaway (Contributor Author)

> To stick to the export theme, how about mne.export_evokeds? Currently the only option would be MFF but eventually things like EEGLAB could be added for example.

Sounds good! I will work on a PR in the next couple of weeks.

@ephathaway (Contributor Author)

@larsoner to avoid having a lot of MFF-specific code in evoked.py, I am thinking of something like:

from .io.egi import write_evokeds_to_mff

def export_evokeds(fname, evoked, fmt):
    """Export evoked datasets to external formats."""
    if fmt == 'mff':
        write_evokeds_to_mff(fname, evoked)
    else:
        raise ValueError(f"Unsupported export format: {fmt}")
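As a usage illustration of the proposed signature (file name and variable names hypothetical): export_evokeds('subject_averaged.mff', [evoked_a, evoked_b], fmt='mff').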

@larsoner (Member)

Sounds good. If you need to avoid a circular import problem you can nest the import.

@ephathaway (Contributor Author)

@larsoner mind taking a look at mne-tools/mne-testing-data#87? I'll need this to properly test the export function.
