Datasets on the radar #79

Closed
jsosulski opened this issue Jan 28, 2020 · 5 comments

Comments

@jsosulski
Collaborator

There is this Google Doc with "datasets on our radar". For most of these, e.g. David Hubner's ERP dataset, there is no Dataset wrapper available in MOABB.

Can I send a PR containing wrappers for these datasets, or do you want to vet the datasets first?

@jsosulski
Collaborator Author

Additional note:
The example dataset mentioned seems to have quite a narrow band-pass filter applied to it (0.5 to 8 Hz).

As preprocessing in MOABB is done in the paradigm definition, I would propose adding an attribute to BaseDataset that describes how the dataset has been preprocessed (if at all). This could then result in a warning when I try to load the mentioned dataset using a paradigm that applies a band-pass filter of, e.g., 0.1 to 20 Hz.
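A minimal sketch of what such a check could look like. Everything here is hypothetical: the `BaseDataset` class below is a stand-in and not MOABB's actual class, and the `applied_bandpass` attribute and `check_filter_compatibility` helper are names invented for illustration only.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BaseDataset:
    """Hypothetical stand-in for a dataset base class (not MOABB's real one)."""

    code: str
    # (low, high) cutoffs in Hz already applied to the published data,
    # or None if the data is unfiltered. Assumed attribute name.
    applied_bandpass: Optional[Tuple[float, float]] = None


def check_filter_compatibility(dataset: BaseDataset, fmin: float, fmax: float) -> bool:
    """Return True if the requested band lies inside the band already applied.

    Emits a UserWarning and returns False when the paradigm asks for
    frequencies that the dataset's own preprocessing has already removed.
    """
    if dataset.applied_bandpass is None:
        return True  # nothing was filtered, any band is fine
    low, high = dataset.applied_bandpass
    if fmin < low or fmax > high:
        warnings.warn(
            f"Dataset {dataset.code} was already band-pass filtered to "
            f"{low}-{high} Hz; requested band {fmin}-{fmax} Hz extends beyond it.",
            UserWarning,
        )
        return False
    return True


if __name__ == "__main__":
    erp = BaseDataset(code="ExampleERP", applied_bandpass=(0.5, 8.0))
    check_filter_compatibility(erp, 0.1, 20.0)  # emits a UserWarning
```

The check only compares cutoff frequencies; it does not account for filter order or roll-off, which a real implementation might also want to record.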

@alexandrebarachant
Member

Hey, I will be very happy to get more datasets supported :)

So far we have tried to avoid adding datasets where raw(-ish) data is not available. This is the case, for instance, when the dataset is only available in its epoched form. You can still convert it to pseudo-continuous raw data (we did that for another dataset, if I remember correctly), but this can cause issues with filtering and/or changing epoching parameters.
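To illustrate the conversion mentioned above, here is a rough sketch in plain Python that concatenates epochs along the time axis and records where each epoch starts. The function name and the nested-list data layout (epochs × channels × samples) are assumptions made for this example; it is not the code used for the other dataset.

```python
from typing import List, Tuple

Epoch = List[List[float]]  # one epoch: channels x samples


def epochs_to_pseudo_continuous(epochs: List[Epoch]) -> Tuple[List[List[float]], List[int]]:
    """Concatenate epochs per channel and return (continuous, onsets).

    `continuous` is channels x total_samples; `onsets` holds the sample
    index at which each original epoch begins in the concatenated signal.
    """
    if not epochs or not epochs[0]:
        raise ValueError("need at least one epoch with at least one channel")
    n_channels = len(epochs[0])
    continuous: List[List[float]] = [[] for _ in range(n_channels)]
    onsets: List[int] = []
    for epoch in epochs:
        if len(epoch) != n_channels:
            raise ValueError("all epochs must have the same number of channels")
        onsets.append(len(continuous[0]))  # start of this epoch in the output
        for ch, samples in enumerate(epoch):
            continuous[ch].extend(samples)
    return continuous, onsets


if __name__ == "__main__":
    first = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    second = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
    data, onsets = epochs_to_pseudo_continuous([first, second])
    print(onsets)  # [0, 3]
```

The signal jumps at every onset (here from 3.0 to 7.0 on the first channel), because the samples on either side were not recorded consecutively. Filtering across such boundaries, or re-epoching with a window longer than the original epochs, mixes unrelated data, which is the kind of issue described above.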

So my advice is that unless the dataset is of a particularly high value, you might want to focus your effort on another one.

The list of datasets on our radar might be a little outdated. Google released a new dataset search tool, https://datasetsearch.research.google.com/, and it might be worth taking a look. We favor datasets with a high number of subjects.

@jsosulski
Collaborator Author

I did not know about that tool yet. After a quick glance, most datasets I found have between 10 and 20 subjects. While that does not seem to qualify as a high number of subjects, I guess that if the data format is not too obscure, the effort to add a dataset would not be prohibitively high.

@alexandrebarachant
Member

Looks great. Between 10 and 20 subjects is fine, as long as it's not too prohibitive, as you said. You can focus on the ones that have a format supported by MNE. This is great, thanks for this.

@sylvchev
Member

sylvchev commented Dec 11, 2020

We are currently revising the process for handling datasets, and we will update the README page to reflect this change (see #121). The idea is to drop the "datasets on our radar" page and refer to the new wiki page instead. Thanks @Div12345! To flag a dataset, just add a comment on the dedicated issue #1 and we can add it to the wiki.
Feel free to join https://gitter.im/moabb_dev/community to discuss this!
