Add EMNISTDataModule #676

Merged on Aug 13, 2021 (114 commits)
Changes from 109 commits
Commits
09295d2
added EMNIST dataset
sugatoray Jun 26, 2021
cd141fb
updated datasets/__init__.py for EMNIST and BinaryEMNIST
sugatoray Jun 26, 2021
3874962
added emnist_datamodule.py to datamodules
sugatoray Jun 26, 2021
d479d4a
added EMNISTDataModule to datamodules/__init__.py
sugatoray Jun 26, 2021
c386bea
fixed a typo in datamodules/emnist_datamodule.py
sugatoray Jun 26, 2021
4561c77
added BinaryEMNISTDataModule to datamodules
sugatoray Jun 26, 2021
1541e52
added BinaryEMNISTDataModule to datamodules/__init__.py
sugatoray Jun 26, 2021
c8dec9a
corrected a typo in datasets/emnist_dataset.py
sugatoray Jun 26, 2021
e11fb46
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 26, 2021
4219ca9
added EMNISTDataModule and BinaryEMNISTDataModule to test_imports.py
sugatoray Jun 27, 2021
e38c0b4
made changes to BinaryEMNISTDataModule
sugatoray Jun 27, 2021
772957d
made changes to EMNISTDataModule
sugatoray Jun 27, 2021
523217c
added tests for EMNISTDataModule and BinaryEMNISTDataModule
sugatoray Jun 27, 2021
a90e288
added emnist metadata to emnist_dataset.py
sugatoray Jun 27, 2021
2eef30b
updated binary_emnist_datamodule.py
sugatoray Jun 27, 2021
e1d8b8e
updated emnist_datamodule.py
sugatoray Jun 27, 2021
a8c5db9
fixed linting errors in emnist_dataset.py
sugatoray Jun 27, 2021
26119b0
fixed linting errors in test_datamodules.py
sugatoray Jun 27, 2021
e84bc44
fixed some linting errors in emnist_datamodule.py
sugatoray Jun 27, 2021
6e374f6
fixed some linting errors in emnist_datamodule.py
sugatoray Jun 27, 2021
69b0bb9
fixed linting errors in emnist_datamodule.py
sugatoray Jun 27, 2021
5d78844
fixed linting errors in binary_emnist_datamodule.py
sugatoray Jun 27, 2021
0b6883e
corrected a bug in test_datamodules.py
sugatoray Jun 30, 2021
d7f7dad
alphabetically sorted datamodule imports
sugatoray Jun 30, 2021
83ac3f7
Merge branch 'PyTorchLightning:master' into feature/672_EMNISTDataModule
sugatoray Jun 30, 2021
dab1c6e
ignore VS Code/IDE settings
sugatoray Jun 30, 2021
138f6eb
Update pl_bolts/datasets/emnist_dataset.py
sugatoray Jul 3, 2021
a0b47a1
Update pl_bolts/datamodules/binary_emnist_datamodule.py
sugatoray Jul 3, 2021
22e642e
Update tests/datamodules/test_datamodules.py
sugatoray Jul 3, 2021
0bc8d5e
Update pl_bolts/datasets/emnist_dataset.py
sugatoray Jul 3, 2021
f070116
Merge branch 'PyTorchLightning:master' into feature/672_EMNISTDataModule
sugatoray Jul 3, 2021
a2e6808
Update pl_bolts/datasets/emnist_dataset.py
sugatoray Jul 4, 2021
853e60f
Merge branch 'PyTorchLightning:master' into feature/672_EMNISTDataModule
sugatoray Jul 4, 2021
f786e54
Update CHANGELOG.md
sugatoray Jul 5, 2021
28b0ea4
Add new ones to the docs
akihironitta Jul 6, 2021
8caedf1
Fix docstrings
akihironitta Jul 6, 2021
778b5af
Add paper title
akihironitta Jul 6, 2021
7606e2c
added logic for emnist val_split for EMNISTDataModule
sugatoray Jul 6, 2021
e2841c3
added test-logic for val_split attribute in EMNISTDataModule
sugatoray Jul 6, 2021
6428f81
updated docs in emnist_datamodule.py
sugatoray Jul 6, 2021
7596559
updated type hint for val_split in EMNISTDataModule
sugatoray Jul 7, 2021
e50ba7e
added logic for val_split in BinaryEMNISTDataModule
sugatoray Jul 7, 2021
543e5c7
updated test for val_split logic in BinaryEMNISTDataModule
sugatoray Jul 7, 2021
d8a9b88
Merge branch 'PyTorchLightning:feature/672_EMNISTDataModule' into fea…
sugatoray Jul 7, 2021
2976cbb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 7, 2021
c1eedd4
fixed pep8-check-flake8 [E501] error - emnist_datamodule.py
sugatoray Jul 7, 2021
62dcb40
fixed pep8-check-flake8 [E501] error for test_datamodules.py
sugatoray Jul 7, 2021
b17e0ea
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 7, 2021
dd96914
removed unused import from emnist_dataset.py
sugatoray Jul 7, 2021
7d5badf
Merge branch 'feature/672_EMNISTDataModule' of github.com:sugatoray/l…
sugatoray Jul 7, 2021
93a1d23
fixing codefactor issue in datamodules
sugatoray Jul 7, 2021
4dd4710
fixing pre-commit.ci error in test_datamodules.py
sugatoray Jul 7, 2021
3e0b18e
setting default val_split logic to avoid error for BinaryEMNISTDataMo…
sugatoray Jul 7, 2021
590e767
fixed a bug causing a build error
sugatoray Jul 7, 2021
4872ee6
update rst docs for sphinx doc generation
sugatoray Jul 7, 2021
d6362c6
update changelog for PR #676
sugatoray Jul 7, 2021
7a1893f
Update pl_bolts/datamodules/binary_emnist_datamodule.py
sugatoray Jul 7, 2021
c6cbd82
Update pl_bolts/datamodules/binary_emnist_datamodule.py
sugatoray Jul 7, 2021
9b1fdc1
updated emnist_datamodule.py
sugatoray Jul 8, 2021
490b4c9
updated num_classes in emnist_datamodule.py
sugatoray Jul 8, 2021
9776e06
removed comments from binary_emnist_datamodule.py
sugatoray Jul 8, 2021
d1710a8
updated comment in emnist and binary_emnist datamodules
sugatoray Jul 8, 2021
db2cb6f
removed duplicate entries in docs
sugatoray Jul 8, 2021
ae88f29
simplifying code in emnist-related datamodules
sugatoray Jul 9, 2021
152a5a4
fixed docstring in datamodules
sugatoray Jul 9, 2021
ad91042
subclass EMNISTDataModule to create BinaryEMNISTDataModule
sugatoray Jul 9, 2021
4b8716a
removed comments in binary_emnist_datamodule.py
sugatoray Jul 10, 2021
92e9e1d
minor update to emnist_datamodule.py
sugatoray Jul 10, 2021
681f6bc
minor change in binary_emnist_datamodule.py
sugatoray Jul 11, 2021
bbfaa1b
add easy access to all datamodules in docs for #685
sugatoray Jul 11, 2021
d42927a
nit
akihironitta Jul 13, 2021
89b0044
[nit] fix docs
akihironitta Jul 13, 2021
955d5b6
Add emnist_normalization
akihironitta Jul 13, 2021
7216012
Merge branch 'PyTorchLightning:master' into feature/672_EMNISTDataModule
sugatoray Jul 15, 2021
015e53f
refactored to reduce code-duplication
sugatoray Jul 15, 2021
2f3c0f0
added emnist_normalization to emnist_datamodule.py
sugatoray Jul 21, 2021
4fa5911
Added Todo statement for documentation
sugatoray Jul 21, 2021
46a1a02
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 21, 2021
4ae39c6
removed unused imports
sugatoray Jul 21, 2021
3f94bfe
minor fixes for flake8
sugatoray Jul 21, 2021
e91abc1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 21, 2021
688b09b
removed unused key: "num_total" from EMNIST_METADATA
sugatoray Jul 21, 2021
1cb052b
Merge branch 'feature/672_EMNISTDataModule' of github.com:sugatoray/l…
sugatoray Jul 21, 2021
c3e0f32
Merge branch 'PyTorchLightning:master' into feature/672_EMNISTDataModule
sugatoray Jul 30, 2021
0cf4da2
Revert "refactored to reduce code-duplication"
akihironitta Jul 30, 2021
ba206e5
Improve docs
akihironitta Jul 30, 2021
4a38cce
Remove unused keys from EMNIST._metadata
akihironitta Jul 30, 2021
88a553f
Make EMNIST_METADATA private
akihironitta Jul 30, 2021
6624a60
Undo datamodule doc refs
akihironitta Jul 30, 2021
42df382
Temporarily disable GPU testing
akihironitta Jul 30, 2021
41a1252
Temporarily disable GPU testing
akihironitta Jul 30, 2021
2cd753f
Simplify logic for checking valid `split`
akihironitta Jul 30, 2021
a11f4b0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 30, 2021
e2b3fc9
Make EMNIST_METADATA private in tests
akihironitta Jul 30, 2021
87a2fcc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 30, 2021
8adc4d2
Simplify setup
akihironitta Jul 30, 2021
c88a320
Merge commit 'refs/pull/676/head' of github.com:PyTorchLightning/ligh…
akihironitta Jul 30, 2021
10d6bb1
Introduce `strict_val_split` for consistent APIs across vision datamo…
akihironitta Jul 31, 2021
fe20a1f
Round means and stds for normalisation
akihironitta Jul 31, 2021
cf0de4c
Document default transforms
akihironitta Jul 31, 2021
2622ea4
Follow up of `strict_val_step`
akihironitta Jul 31, 2021
12258cb
Fix num_classes doc
akihironitta Jul 31, 2021
0d7a607
Remove TODO
akihironitta Jul 31, 2021
159be6c
Change func name in tests
akihironitta Jul 31, 2021
7e624da
Remove EMNIST from emnist_dataset.py
akihironitta Jul 31, 2021
6ff4028
Fix tests
akihironitta Jul 31, 2021
e1abfbb
Update CHANGELOG
akihironitta Jul 31, 2021
ec1ed51
Revert "Temporarily disable GPU testing"
akihironitta Jul 31, 2021
b08fb46
Revert "Temporarily disable GPU testing"
akihironitta Jul 31, 2021
5243f08
Merge branch 'master' into feature/672_EMNISTDataModule
mergify[bot] Aug 9, 2021
4eebe3f
Simplify default_transforms()
akihironitta Aug 11, 2021
c6adbae
Change datamodules' default values
akihironitta Aug 12, 2021
ef13456
Merge branch 'master' into feature/672_EMNISTDataModule
Borda Aug 13, 2021
5833e60
Merge branch 'master' into feature/672_EMNISTDataModule
Borda Aug 13, 2021
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# IDE Settings files
.vscode/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Added `EMNISTDataModule`, `BinaryEMNISTDataModule`, and `BinaryEMNIST` dataset ([#676](https://github.com/PyTorchLightning/lightning-bolts/pull/676))

### Changed

1 change: 0 additions & 1 deletion docs/source/datamodules_sklearn.rst
@@ -45,4 +45,3 @@ Automatically generates the train, validation and test splits for a Numpy datase
They are set up as dataloaders for convenience. Optionally, you can pass in your own validation and test splits.

.. autoclass:: pl_bolts.datamodules.sklearn_datamodule.SklearnDataModule
:noindex:
20 changes: 10 additions & 10 deletions docs/source/datamodules_vision.rst
@@ -5,47 +5,49 @@ The following are pre-built datamodules for computer-vision.
-------------

Supervised learning
--------------------
-------------------
These are standard vision datasets with the train, test, val splits pre-generated in DataLoaders with
the standard transforms (and Normalization) values

BinaryEMNIST
^^^^^^^^^^^^

.. autoclass:: pl_bolts.datamodules.binary_emnist_datamodule.BinaryEMNISTDataModule

BinaryMNIST
^^^^^^^^^^^

.. autoclass:: pl_bolts.datamodules.binary_mnist_datamodule.BinaryMNISTDataModule
:noindex:

CityScapes
^^^^^^^^^^

.. autoclass:: pl_bolts.datamodules.cityscapes_datamodule.CityscapesDataModule
:noindex:

CIFAR-10
^^^^^^^^

.. autoclass:: pl_bolts.datamodules.cifar10_datamodule.CIFAR10DataModule
:noindex:

EMNIST
^^^^^^

.. autoclass:: pl_bolts.datamodules.emnist_datamodule.EMNISTDataModule

FashionMNIST
^^^^^^^^^^^^

.. autoclass:: pl_bolts.datamodules.fashion_mnist_datamodule.FashionMNISTDataModule
:noindex:


Imagenet
^^^^^^^^

.. autoclass:: pl_bolts.datamodules.imagenet_datamodule.ImagenetDataModule
:noindex:

MNIST
^^^^^

.. autoclass:: pl_bolts.datamodules.mnist_datamodule.MNISTDataModule
:noindex:

Semi-supervised learning
------------------------
@@ -56,10 +58,8 @@ Imagenet (ssl)
^^^^^^^^^^^^^^

.. autoclass:: pl_bolts.datamodules.ssl_imagenet_datamodule.SSLImagenetDataModule
:noindex:

STL-10
^^^^^^

.. autoclass:: pl_bolts.datamodules.stl10_datamodule.STL10DataModule
:noindex:
4 changes: 4 additions & 0 deletions pl_bolts/datamodules/__init__.py
@@ -1,7 +1,9 @@
from pl_bolts.datamodules.async_dataloader import AsynchronousLoader
from pl_bolts.datamodules.binary_emnist_datamodule import BinaryEMNISTDataModule
from pl_bolts.datamodules.binary_mnist_datamodule import BinaryMNISTDataModule
from pl_bolts.datamodules.cifar10_datamodule import CIFAR10DataModule, TinyCIFAR10DataModule
from pl_bolts.datamodules.cityscapes_datamodule import CityscapesDataModule
from pl_bolts.datamodules.emnist_datamodule import EMNISTDataModule
from pl_bolts.datamodules.experience_source import DiscountedExperienceSource, ExperienceSource, ExperienceSourceDataset
from pl_bolts.datamodules.fashion_mnist_datamodule import FashionMNISTDataModule
from pl_bolts.datamodules.imagenet_datamodule import ImagenetDataModule
@@ -33,4 +35,6 @@
    'STL10DataModule',
    'VOCDetectionDataModule',
    'KittiDataset',
    'EMNISTDataModule',
    'BinaryEMNISTDataModule',
]
81 changes: 81 additions & 0 deletions pl_bolts/datamodules/binary_emnist_datamodule.py
@@ -0,0 +1,81 @@
from typing import Any, Optional, Union

from pl_bolts.datamodules.emnist_datamodule import EMNISTDataModule
from pl_bolts.datasets import BinaryEMNIST
from pl_bolts.utils import _TORCHVISION_AVAILABLE


class BinaryEMNISTDataModule(EMNISTDataModule):
    """
    .. figure:: https://user-images.githubusercontent.com/4632336/123210742-4d6b3380-d477-11eb-80da-3e9a74a18a07.png
        :width: 400
        :alt: EMNIST

    Please see :class:`~pl_bolts.datamodules.emnist_datamodule.EMNISTDataModule` for more details.

    Example::

        from pl_bolts.datamodules import BinaryEMNISTDataModule
        dm = BinaryEMNISTDataModule('.')
        model = LitModel()
        Trainer().fit(model, datamodule=dm)
    """
    name = "binary_emnist"
    dataset_cls = BinaryEMNIST
    dims = (1, 28, 28)

    def __init__(
        self,
        data_dir: Optional[str] = None,
        split: str = 'mnist',
        val_split: Union[int, float] = 0.2,
        num_workers: int = 16,
        normalize: bool = False,
        batch_size: int = 32,
        seed: int = 42,
        shuffle: bool = False,
        pin_memory: bool = False,
        drop_last: bool = False,
        strict_val_split: bool = False,
        *args: Any,
        **kwargs: Any,
    ) -> None:
        """
        Args:
            data_dir: Where to save/load the data.
            split: The dataset has 6 different splits: ``byclass``, ``bymerge``,
                ``balanced``, ``letters``, ``digits`` and ``mnist``.
                This argument is passed to :class:`torchvision.datasets.EMNIST`.
            val_split: Percent (float) or number (int) of samples
                to use for the validation split.
            num_workers: How many workers to use for loading data
            normalize: If ``True``, applies image normalize.
            batch_size: How many samples per batch to load.
            seed: Random seed to be used for train/val/test splits.
            shuffle: If ``True``, shuffles the train data every epoch.
            pin_memory: If ``True``, the data loader will copy Tensors into
                CUDA pinned memory before returning them.
            drop_last: If ``True``, drops the last incomplete batch.
            strict_val_split: If ``True``, uses the validation split defined in the paper and ignores ``val_split``.
                Note that it only works with ``"balanced"``, ``"digits"``, ``"letters"``, ``"mnist"`` splits.
        """
        if not _TORCHVISION_AVAILABLE:  # pragma: no cover
            raise ModuleNotFoundError(
                'You want to use EMNIST dataset loaded from `torchvision` which is not installed yet.'
            )

        super(BinaryEMNISTDataModule, self).__init__(  # type: ignore[misc]
            data_dir=data_dir,
            split=split,
            val_split=val_split,
            num_workers=num_workers,
            normalize=normalize,
            batch_size=batch_size,
            seed=seed,
            shuffle=shuffle,
            pin_memory=pin_memory,
            drop_last=drop_last,
            strict_val_split=strict_val_split,
            *args,
            **kwargs,
        )
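
The docstring above describes `val_split` as either a fraction (float) or an absolute sample count (int), and says `strict_val_split` only applies to four of the six EMNIST splits. The standalone sketch below illustrates those two rules under stated assumptions. It is not the code from this PR: the names `split_lengths`, `supports_strict_val_split` and `STRICT_VAL_SPLITS` are hypothetical, and the rounding behaviour (truncation toward zero) is an assumption that may differ from what `pl_bolts` actually does.

```python
from typing import Tuple, Union

# Splits the docstring lists as compatible with ``strict_val_split``.
# The tuple name is hypothetical; the four values come from the docstring.
STRICT_VAL_SPLITS = ("balanced", "digits", "letters", "mnist")


def split_lengths(num_samples: int, val_split: Union[int, float]) -> Tuple[int, int]:
    """Return ``(train_len, val_len)`` for a dataset of ``num_samples`` items.

    An int ``val_split`` is read as an absolute number of validation samples,
    a float as a fraction of the dataset (truncated toward zero, by assumption).
    """
    if isinstance(val_split, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("val_split must be an int or a float, not a bool")
    if isinstance(val_split, int):
        val_len = val_split
    elif isinstance(val_split, float):
        val_len = int(val_split * num_samples)
    else:
        raise TypeError(f"unsupported type for val_split: {type(val_split).__name__}")
    if not 0 <= val_len <= num_samples:
        raise ValueError("validation split does not fit inside the dataset")
    return num_samples - val_len, val_len


def supports_strict_val_split(split: str) -> bool:
    """Tell whether ``split`` is one of the splits listed for ``strict_val_split``."""
    return split in STRICT_VAL_SPLITS


if __name__ == "__main__":
    print(split_lengths(1000, 0.25))   # fraction of the dataset
    print(split_lengths(60000, 5000))  # absolute number of samples
    print(supports_strict_val_split("byclass"))
```

Accepting both an int and a float in one parameter keeps the constructor signature short, at the cost of a type check at run time; the sketch rejects `bool` explicitly because `True` would otherwise be read as a validation set of one sample.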