Merge branch 'master' into optimizer_step/training_step
ananyahjha93 committed Nov 2, 2020
2 parents 56e4688 + ac3f739 commit 68ffdb0
Showing 47 changed files with 1,553 additions and 221 deletions.
7 changes: 5 additions & 2 deletions .github/workflows/ci_dockers.yml
@@ -108,8 +108,11 @@ jobs:
pytorch_version: 1.6
- python_version: 3.6
pytorch_version: 1.4
#- python_version: 3.7
# pytorch_version: 1.8 # todo
- python_version: 3.7
pytorch_version: 1.7
# TODO
# - python_version: 3.7
# pytorch_version: 1.8
steps:
- name: Checkout
uses: actions/checkout@v2
2 changes: 1 addition & 1 deletion .github/workflows/docker-builds.yml
@@ -14,7 +14,7 @@ jobs:
fail-fast: false
matrix:
python_version: [3.6, 3.7, 3.8]
pytorch_version: [1.3, 1.4, 1.5, 1.6]
pytorch_version: [1.3, 1.4, 1.5, 1.6, 1.7]
exclude:
# excludes PT 1.3 as it is missing on pypi
- python_version: 3.8
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -17,10 +17,18 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

- Added multiclass AUROC metric ([#4236](https://github.com/PyTorchLightning/pytorch-lightning/pull/4236))

- Added timeout for `tpu_device_exists` to ensure process does not hang indefinitely ([#4340](https://github.com/PyTorchLightning/pytorch-lightning/pull/4340))

- Added global step indexing to the checkpoint name for a better sub-epoch checkpointing experience ([#3807](https://github.com/PyTorchLightning/pytorch-lightning/pull/3807))

### Changed

- W&B log in sync with Trainer step ([#4405](https://github.com/PyTorchLightning/pytorch-lightning/pull/4405))

- Hook `on_after_backward` is called only when `optimizer_step` is being called ([#4439](https://github.com/PyTorchLightning/pytorch-lightning/pull/4439))

- Moved `track_and_norm_grad` into `training loop` and called only when `optimizer_step` is being called ([#4439](https://github.com/PyTorchLightning/pytorch-lightning/pull/4439))

### Deprecated

- Deprecated passing `ModelCheckpoint` instance to `checkpoint_callback` Trainer argument ([#4336](https://github.com/PyTorchLightning/pytorch-lightning/pull/4336))
@@ -31,6 +39,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

- Fixed error using `auto_select_gpus=True` with `gpus=-1` ([#4209](https://github.com/PyTorchLightning/pytorch-lightning/pull/4209))

- Fixed that metrics do not store computational graph for all seen data ([#4313](https://github.com/PyTorchLightning/pytorch-lightning/pull/4313))

- Fixed AMP unscale for `on_after_backward` ([#4439](https://github.com/PyTorchLightning/pytorch-lightning/pull/4439))

## [1.0.4] - 2020-10-27

@@ -74,7 +85,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

- Fixed WandbLogger not uploading checkpoint artifacts at the end of training ([#4341](https://github.com/PyTorchLightning/pytorch-lightning/pull/4341))


## [1.0.3] - 2020-10-20

### Added
2 changes: 1 addition & 1 deletion README.md
@@ -183,7 +183,7 @@ trainer = pl.Trainer()
trainer.fit(autoencoder, DataLoader(train), DataLoader(val))
```

#### And without changing a single line of code, you could run on GPU/TPUss
#### And without changing a single line of code, you could run on GPUs/TPUs
```python
# 8 GPUs
trainer = Trainer(max_epochs=1, gpus=8)
2 changes: 1 addition & 1 deletion dockers/base-conda/Dockerfile
@@ -74,7 +74,7 @@ ENV CONDA_ENV=lightning
COPY environment.yml environment.yml

# conda init
RUN conda create -y --name $CONDA_ENV && \
RUN conda create -y --name $CONDA_ENV cudatoolkit=${CUDA_VERSION} && \
conda init bash && \
# NOTE: this requires that the channel is present in the yaml before packages
# replace channel to nightly if needed, fix PT version and remove Horovod as it will be installed later
14 changes: 13 additions & 1 deletion docs/source/metrics.rst
@@ -150,6 +150,19 @@ Example implementation:
def compute(self):
return self.correct.float() / self.total
Metrics support backpropagation, if all computations involved in the metric calculation
are differentiable. However, note that the cached state is detached from the computational
graph and cannot be backpropagated. Not doing this would mean storing the computational
graph for each update call, which can lead to out-of-memory errors.
In practice this means that:

.. code-block:: python

    metric = MyMetric()
    val = metric(pred, target)  # this value can be backpropagated
    val = metric.compute()  # this value cannot be backpropagated
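A small illustrative sketch of that distinction (not part of the diff): it assumes a metric whose computations are differentiable (unlike the accuracy-style example above), plus placeholder `model`, `x`, and `target` tensors.

```python
metric = SomeDifferentiableMetric()   # hypothetical metric with differentiable update/compute

pred = model(x)                       # placeholder model and batch
val = metric(pred, target)            # forward value: still attached to the graph
loss = 1.0 - val
loss.backward()                       # gradients flow through this single update only

epoch_val = metric.compute()          # aggregated from state that was detached at update time
# epoch_val.backward()                # would not work: no graph is kept for the cached state
```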
**********
Metric API
**********
@@ -453,4 +466,3 @@ embedding_similarity [func]

.. autofunction:: pytorch_lightning.metrics.functional.self_supervised.embedding_similarity
:noindex:

6 changes: 5 additions & 1 deletion docs/source/optimizers.rst
@@ -48,6 +48,10 @@ to manually manage the optimization process. To do so, do the following:
opt_d.step()
opt_d.zero_grad()
# log losses
self.log('loss_a', loss_a)
self.log('loss_b', loss_b)
.. note:: This is only recommended for experts who need ultimate flexibility

Manual optimization does not yet support accumulated gradients; support will arrive in 1.1.0.
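For orientation, here is a hedged sketch of how the fragment above typically sits inside a full ``training_step`` under the manual-optimization API of this release; the two loss computations are placeholders, manual optimization is enabled via ``Trainer(automatic_optimization=False)``, and exact signatures (e.g. whether ``manual_backward`` takes the optimizer) changed in later versions.

```python
import pytorch_lightning as pl

class TwoOptimizerModel(pl.LightningModule):  # hypothetical module with two optimizers
    def training_step(self, batch, batch_idx, optimizer_idx):
        # with manual optimization, ignore optimizer_idx and drive both optimizers yourself
        (opt_a, opt_b) = self.optimizers()

        loss_a = self.compute_loss_a(batch)   # placeholder loss
        self.manual_backward(loss_a, opt_a)   # also handles AMP loss scaling
        opt_a.step()
        opt_a.zero_grad()

        loss_b = self.compute_loss_b(batch)   # placeholder loss
        self.manual_backward(loss_b, opt_b)
        opt_b.step()
        opt_b.zero_grad()

        # log losses
        self.log('loss_a', loss_a)
        self.log('loss_b', loss_b)

# manual optimization is switched on at the Trainer level in this release
trainer = pl.Trainer(automatic_optimization=False)
```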
@@ -108,7 +112,7 @@ Every optimizer you use can be paired with any `LearningRateScheduler <https://p
def configure_optimizers(self):
return {
'optimizer': Adam(...),
'scheduler': ReduceLROnPlateau(optimizer, ...),
'lr_scheduler': ReduceLROnPlateau(optimizer, ...),
'monitor': 'metric_to_track'
}
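To make the corrected key concrete, a minimal self-contained version of the return value shown above; the optimizer hyperparameters and the monitored metric name are placeholders.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

def configure_optimizers(self):
    optimizer = Adam(self.parameters(), lr=1e-3)       # placeholder hyperparameters
    return {
        'optimizer': optimizer,
        'lr_scheduler': ReduceLROnPlateau(optimizer),   # key renamed from 'scheduler' in this diff
        'monitor': 'metric_to_track',                   # placeholder metric name
    }
```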
20 changes: 17 additions & 3 deletions docs/source/tpu.rst
@@ -128,13 +128,27 @@ That's it! Your model will train on all 8 TPU cores.

----------------

Single TPU core training
TPU core training

------------------------
Lightning supports training on a single TPU core. Just pass the TPU core ID [1-8] in a list.

Lightning supports training on a single TPU core or 8 TPU cores.

The Trainer parameter ``tpu_cores`` defines how many TPU cores to train on (1 or 8), or which single TPU core to train on, given as a list (e.g. [1]).

For single-TPU training, just pass the TPU core ID [1-8] in a list.

Single TPU core training. Model will train on TPU core ID 5.

.. code-block:: python

    trainer = pl.Trainer(tpu_cores=[1])
    trainer = pl.Trainer(tpu_cores=[5])

8 TPU cores training. Model will train on 8 TPU cores.

.. code-block:: python

    trainer = pl.Trainer(tpu_cores=8)

----------------

8 changes: 4 additions & 4 deletions docs/source/weights_loading.rst
@@ -65,8 +65,8 @@ You can customize the checkpointing behavior to monitor any quantity of your tra
# 3. Init ModelCheckpoint callback, monitoring 'val_loss'
checkpoint_callback = ModelCheckpoint(monitor='val_loss')
# 4. Pass your callback to checkpoint_callback trainer flag
trainer = Trainer(checkpoint_callback=checkpoint_callback)
# 4. Add your callback to the callbacks list
trainer = Trainer(callbacks=[checkpoint_callback])
You can also control more advanced options, like `save_top_k` to save the best k models, the mode of the monitored quantity (min/max/auto, where the mode is automatically inferred from the name of the monitored quantity), `save_weights_only`, or `period` to set the interval of epochs between checkpoints and avoid slowdowns.

@@ -89,14 +89,14 @@ You can also control more advanced options, like `save_top_k`, to save the best
save_top_k=3,
mode='min')
trainer = Trainer(checkpoint_callback=checkpoint_callback)
trainer = Trainer(callbacks=[checkpoint_callback])
You can retrieve the checkpoint after training by calling

.. code-block:: python

    checkpoint_callback = ModelCheckpoint(dirpath='my/path/')
    trainer = Trainer(checkpoint_callback=checkpoint_callback)
    trainer = Trainer(callbacks=[checkpoint_callback])
    trainer.fit(model)
    checkpoint_callback.best_model_path
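Taken together, the edits to this file boil down to the following migration; a minimal sketch that assumes a placeholder ``model`` and the imports used elsewhere on this page.

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(monitor='val_loss', save_top_k=3, mode='min')

# before this change: Trainer(checkpoint_callback=checkpoint_callback)  (now deprecated)
trainer = Trainer(callbacks=[checkpoint_callback])

trainer.fit(model)                          # `model` is a placeholder LightningModule
print(checkpoint_callback.best_model_path)  # the best checkpoint is still retrieved the same way
```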
4 changes: 2 additions & 2 deletions environment.yml
@@ -26,7 +26,7 @@ dependencies:
- python>=3.6
- pip>20.1
- numpy>=1.16.4
- pytorch>=1.3
- pytorch>=1.3,<1.8
- future>=0.17.1
- PyYAML>=5.1
- tqdm>=4.41.0
@@ -41,7 +41,7 @@ dependencies:
- torchtext>=0.3.1

# Examples
- torchvision>=0.4.1
- torchvision>=0.4.1,<0.9.0

- pip:
- test-tube>=0.7.5
6 changes: 3 additions & 3 deletions notebooks/05-trainer-flags-overview.ipynb
@@ -2223,7 +2223,7 @@
"source": [
"from pytorch_lightning.callbacks import ModelCheckpoint\n",
"\n",
"trainer = pl.Trainer(checkpoint_callback=ModelCheckpoint(monitor='val_loss'))\n",
"trainer = pl.Trainer(callbacks=[ModelCheckpoint(monitor='val_loss')])\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
],
@@ -2265,7 +2265,7 @@
" prefix='',\n",
")\n",
"\n",
"trainer = Trainer(checkpoint_callback=checkpoint_callback)\n",
"trainer = Trainer(callbacks=[checkpoint_callback])\n",
"\n",
"trainer.fit(model, train_loader, val_loader)"
],
@@ -2471,7 +2471,7 @@
"# **NOTE: this saves weights to some/path NOT my/path\n",
"checkpoint = ModelCheckpoint(filepath='some/path')\n",
"trainer = pl.Trainer(\n",
" checkpoint_callback=checkpoint,\n",
" callbacks=[checkpoint],\n",
" weights_save_path='my/path'\n",
")\n",
"trainer.fit(model, train_loader, val_loader)"
5 changes: 0 additions & 5 deletions pytorch_lightning/accelerators/accelerator.py
@@ -132,11 +132,6 @@ def optimizer_zero_grad(self, batch_idx, optimizer, opt_idx):
model_ref.optimizer_zero_grad(self.trainer.current_epoch, batch_idx, optimizer, opt_idx)

def clip_gradients(self, optimizer, clip_val=None):

if self.trainer.amp_backend == AMPType.NATIVE:
self.trainer.scaler.unscale_(optimizer)

# apply clip gradients
# TODO: separate TPU case from here
self._clip_gradients(optimizer, clip_val)

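The deleted lines above move the native-AMP unscale out of ``clip_gradients`` and into the training loop (see the CHANGELOG entries for #4439). For readers unfamiliar with why the ordering matters, here is a plain-PyTorch sketch of the pattern involved; the model, data loader, and clip value are placeholders, not Lightning internals.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

for batch, target in loader:                       # placeholder data loader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch), target)       # placeholder model and loss
    scaler.scale(loss).backward()

    # gradients must be unscaled exactly once, right before they are inspected or
    # clipped, which is why the unscale is tied to the actual optimizer step
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    scaler.step(optimizer)                         # skips the step if grads turned inf/nan
    scaler.update()
```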