diff --git a/CHANGELOG.md b/CHANGELOG.md index 528ae2bcd2a8c..5898f11c1a3fd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,17 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/). +## [unreleased] - YYYY-MM-DD + +### Added + +### Changed + +### Deprecated + +### Removed + +### Fixed ## [0.8.0] - 2020-06-18 @@ -25,11 +36,11 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/). - Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device ([1756](https://github.com/PyTorchLightning/pytorch-lightning/pull/1756)) - Added [black](https://black.readthedocs.io/en/stable/) formatter for the code with code-checker on pull ([1610](https://github.com/PyTorchLightning/pytorch-lightning/pull/1610)) - Added back the slow spawn ddp implementation as `ddp_spawn` ([#2115](https://github.com/PyTorchLightning/pytorch-lightning/pull/2115)) -- Added loading checkpoints from URLs ([#1667](https://github.com/PyTorchLightning/pytorch-lightning/issues/1667)) +- Added loading checkpoints from URLs ([#1667](https://github.com/PyTorchLightning/pytorch-lightning/pull/1667)) - Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training ([#2134](https://github.com/PyTorchLightning/pytorch-lightning/pull/2134)) - Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference ([#1905](https://github.com/PyTorchLightning/pytorch-lightning/pull/1905)) -- Added `ckpt_path` option to `LightningModule.test(...)` to load particular checkpoint ([#2190](https://github.com/PyTorchLightning/pytorch-lightning/issues/2190)) -- Added `setup` and `teardown` hooks for model ([#2229](https://github.com/PyTorchLightning/pytorch-lightning/issues/2229)) +- Added `ckpt_path` option to `LightningModule.test(...)` to load particular 
checkpoint ([#2190](https://github.com/PyTorchLightning/pytorch-lightning/pull/2190)) +- Added `setup` and `teardown` hooks for model ([#2229](https://github.com/PyTorchLightning/pytorch-lightning/pull/2229)) ### Changed @@ -67,7 +78,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/). - Run graceful training teardown on interpreter exit ([#1631](https://github.com/PyTorchLightning/pytorch-lightning/pull/1631)) - Fixed user warning when apex was used together with learning rate schedulers ([#1873](https://github.com/PyTorchLightning/pytorch-lightning/pull/1873)) -- Fixed multiple calls of `EarlyStopping` callback ([#1751](https://github.com/PyTorchLightning/pytorch-lightning/issues/1751)) +- Fixed multiple calls of `EarlyStopping` callback ([#1863](https://github.com/PyTorchLightning/pytorch-lightning/pull/1863)) - Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args ([#1932](https://github.com/PyTorchLightning/pytorch-lightning/pull/1932)) - Fixed bug related to logger not being reset correctly for model after tuner algorithms ([#1933](https://github.com/PyTorchLightning/pytorch-lightning/pull/1933)) - Fixed root node resolution for SLURM cluster with dash in host name ([#1954](https://github.com/PyTorchLightning/pytorch-lightning/pull/1954)) diff --git a/README.md b/README.md index 71d6c3cbdaa86..5ba74f18c7144 100644 --- a/README.md +++ b/README.md @@ -21,12 +21,8 @@ --> ---- -## Trending contributors - 
-[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/0)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/0)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/1)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/1)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/2)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/2)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/3)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/3)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/4)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/4)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/5)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/5)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/6)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/6)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/7)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/7) - --- + ## Continuous Integration
@@ -381,6 +377,7 @@ If you have any questions, feel free to: 4. [Join our slack](https://join.slack.com/t/pytorch-lightning/shared_invite/zt-f6bl2l0l-JYMK3tbAgAmGRrlNr00f1A). --- + ## FAQ **How do I use Lightning for rapid research?** [Here's a walk-through](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html) @@ -447,6 +444,14 @@ pip install https://github.com/PytorchLightning/pytorch-lightning/archive/0.X.Y. - Adrian Wälchli [(awaelchli)](https://github.com/awaelchli) - Nicki Skafte [(skaftenicki)](https://github.com/SkafteNicki) +--- + +### Trending contributors + +[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/0)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/0)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/1)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/1)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/2)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/2)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/3)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/3)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/4)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/4)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/5)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/5)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/6)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/6)[![](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/images/7)](https://sourcerer.io/fame/williamFalcon/pytorchlightning/pytorch-lightning/links/7) + +--- + 
#### Funding Building open-source software with only a few part-time people is hard! We've secured funding to make sure we can hire a full-time staff, attend conferences, and move faster through implementing features you request. @@ -463,7 +468,7 @@ If you want to cite the framework feel free to use this (but only if you loved i @article{falcon2019pytorch, title={PyTorch Lightning}, author={Falcon, WA}, - journal={GitHub. Note: https://github. com/williamFalcon/pytorch-lightning Cited by}, + journal={GitHub. Note: https://github.com/PyTorchLightning/pytorch-lightning Cited by}, volume={3}, year={2019} } diff --git a/docs/source/apex.rst b/docs/source/apex.rst index f4e8602531785..927b5ca672d3c 100644 --- a/docs/source/apex.rst +++ b/docs/source/apex.rst @@ -8,7 +8,7 @@ Lightning offers 16-bit training for CPUs, GPUs and TPUs. GPU 16-bit ------------ +---------- 16 bit precision can cut your memory footprint by half. If using volta architecture GPUs it can give a dramatic training speed-up as well. diff --git a/docs/source/callbacks.rst b/docs/source/callbacks.rst index c8202f0ceff59..39612cfce6da5 100644 --- a/docs/source/callbacks.rst +++ b/docs/source/callbacks.rst @@ -46,7 +46,7 @@ Example: We successfully extended functionality without polluting our super clean :class:`~pytorch_lightning.core.LightningModule` research code. ---- +---------------- .. automodule:: pytorch_lightning.callbacks.base :noindex: @@ -56,7 +56,7 @@ We successfully extended functionality without polluting our super clean _abc_impl, check_monitor_top_k, ---- +---------------- .. automodule:: pytorch_lightning.callbacks.early_stopping :noindex: @@ -66,7 +66,7 @@ We successfully extended functionality without polluting our super clean _abc_impl, check_monitor_top_k, ---- +---------------- .. 
automodule:: pytorch_lightning.callbacks.gradient_accumulation_scheduler :noindex: @@ -76,7 +76,7 @@ We successfully extended functionality without polluting our super clean _abc_impl, check_monitor_top_k, ---- +---------------- .. automodule:: pytorch_lightning.callbacks.lr_logger :noindex: @@ -84,7 +84,7 @@ We successfully extended functionality without polluting our super clean _extract_lr, _find_names ---- +---------------- .. automodule:: pytorch_lightning.callbacks.model_checkpoint :noindex: @@ -94,7 +94,7 @@ We successfully extended functionality without polluting our super clean _abc_impl, check_monitor_top_k, ---- +---------------- .. automodule:: pytorch_lightning.callbacks.progress :noindex: diff --git a/docs/source/debugging.rst b/docs/source/debugging.rst index bad72541f74e9..06f9cd4344b43 100644 --- a/docs/source/debugging.rst +++ b/docs/source/debugging.rst @@ -6,7 +6,7 @@ Debugging ========= The following are flags that make debugging much easier. ---- +---------------- fast_dev_run ------------ @@ -21,7 +21,7 @@ argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`) trainer = Trainer(fast_dev_run=True) ---- +---------------- Inspect gradient norms ---------------------- @@ -35,7 +35,7 @@ argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`) # the 2-norm trainer = Trainer(track_grad_norm=2) ---- +---------------- Log GPU usage ------------- @@ -48,7 +48,7 @@ argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`) trainer = Trainer(log_gpu_memory=True) ---- +---------------- Make model overfit on subset of data ------------------------------------ @@ -70,7 +70,7 @@ argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`) With this flag, the train, val, and test sets will all be the same train set. We will also replace the sampler in the training set to turn off shuffle for you. 
---- +---------------- Print a summary of your LightningModule --------------------------------------- @@ -99,7 +99,7 @@ See Also: - :paramref:`~pytorch_lightning.trainer.trainer.Trainer.weights_summary` Trainer argument - :class:`~pytorch_lightning.core.memory.ModelSummary` ---- +---------------- Shorten epochs -------------- @@ -116,7 +116,7 @@ On larger datasets like Imagenet, this can help you debug or test a few things f # use 10 batches of train and 5 batches of val trainer = Trainer(limit_train_batches=10, limit_val_batches=5) ---- +---------------- Set the number of validation sanity steps ----------------------------------------- diff --git a/docs/source/experiment_logging.rst b/docs/source/experiment_logging.rst index 6d6d96a4157f3..96c72ee80abc8 100644 --- a/docs/source/experiment_logging.rst +++ b/docs/source/experiment_logging.rst @@ -7,8 +7,6 @@ Experiment Logging ================== ---- - Comet.ml ^^^^^^^^ @@ -49,7 +47,7 @@ The :class:`~pytorch_lightning.loggers.CometLogger` is available anywhere except .. seealso:: :class:`~pytorch_lightning.loggers.CometLogger` docs. ---- +---------------- MLflow ^^^^^^ @@ -76,7 +74,7 @@ Then configure the logger and pass it to the :class:`~pytorch_lightning.trainer. .. seealso:: :class:`~pytorch_lightning.loggers.MLFlowLogger` docs. ---- +---------------- Neptune.ai ^^^^^^^^^^ @@ -116,7 +114,7 @@ The :class:`~pytorch_lightning.loggers.NeptuneLogger` is available anywhere exce .. seealso:: :class:`~pytorch_lightning.loggers.NeptuneLogger` docs. ---- +---------------- allegro.ai TRAINS ^^^^^^^^^^^^^^^^^ @@ -160,7 +158,7 @@ The :class:`~pytorch_lightning.loggers.TrainsLogger` is available anywhere in yo .. seealso:: :class:`~pytorch_lightning.loggers.TrainsLogger` docs. ---- +---------------- Tensorboard ^^^^^^^^^^^ @@ -186,7 +184,7 @@ The :class:`~pytorch_lightning.loggers.TensorBoardLogger` is available anywhere .. seealso:: :class:`~pytorch_lightning.loggers.TensorBoardLogger` docs. 
---- +---------------- Test Tube ^^^^^^^^^ @@ -221,7 +219,7 @@ The :class:`~pytorch_lightning.loggers.TestTubeLogger` is available anywhere exc .. seealso:: :class:`~pytorch_lightning.loggers.TestTubeLogger` docs. ---- +---------------- Weights and Biases ^^^^^^^^^^^^^^^^^^ @@ -257,7 +255,7 @@ The :class:`~pytorch_lightning.loggers.WandbLogger` is available anywhere except .. seealso:: :class:`~pytorch_lightning.loggers.WandbLogger` docs. ---- +---------------- Multiple Loggers ^^^^^^^^^^^^^^^^ diff --git a/docs/source/fast_training.rst b/docs/source/fast_training.rst index 895e8d9662281..a196063b12198 100644 --- a/docs/source/fast_training.rst +++ b/docs/source/fast_training.rst @@ -8,7 +8,7 @@ Fast Training There are multiple options to speed up different parts of the training by choosing to train on a subset of data. This could be done for speed or debugging purposes. ---- +---------------- Check validation every n epochs ------------------------------- @@ -19,7 +19,7 @@ If you have a small dataset you might want to check validation every n epochs # DEFAULT trainer = Trainer(check_val_every_n_epoch=1) ---- +---------------- Force training for min or max epochs ------------------------------------ @@ -33,7 +33,7 @@ It can be useful to force training for a minimum number of epochs or limit to a # DEFAULT trainer = Trainer(min_epochs=1, max_epochs=1000) ---- +---------------- Set validation check frequency within 1 training epoch ------------------------------------------------------ @@ -52,7 +52,7 @@ Must use an int if using an IterableDataset. 
# check every 100 train batches (ie: for IterableDatasets or fixed frequency) trainer = Trainer(val_check_interval=100) ---- +---------------- Use data subset for training, validation and test ------------------------------------------------- diff --git a/docs/source/hooks.rst b/docs/source/hooks.rst index 86a659e148a27..f3f2d2c95cc90 100644 --- a/docs/source/hooks.rst +++ b/docs/source/hooks.rst @@ -12,7 +12,7 @@ To enable a hook, simply override the method in your LightningModule and the tra 3. Add it in the correct place in :mod:`pytorch_lightning.trainer` where it should be called. ---- +---------------- Hooks lifecycle --------------- @@ -72,7 +72,7 @@ Test loop - ``torch.set_grad_enabled(True)`` - :meth:`~pytorch_lightning.core.hooks.ModelHooks.on_post_performance_check` ---- +---------------- General hooks ------------- diff --git a/docs/source/introduction_guide.rst b/docs/source/introduction_guide.rst index fd46d4e4caafb..07c5577671771 100644 --- a/docs/source/introduction_guide.rst +++ b/docs/source/introduction_guide.rst @@ -17,7 +17,7 @@ To illustrate, here's the typical PyTorch project structure organized in a Light As your project grows in complexity with things like 16-bit precision, distributed training, etc... the part in blue quickly becomes onerous and starts distracting from the core research code. ---- +---------------- Goal of this guide ------------------ @@ -32,7 +32,7 @@ to use inheritance to very quickly create an AutoEncoder. .. note:: Any DL/ML PyTorch project fits into the Lightning structure. Here we just focus on 3 types of research to illustrate. ---- +---------------- Installing Lightning -------------------- @@ -55,7 +55,7 @@ Or with conda conda install pytorch-lightning -c conda-forge ---- +---------------- Lightning Philosophy -------------------- @@ -117,7 +117,7 @@ In Lightning this code is abstracted out by `Callbacks`. 
generated = decoder(z) self.experiment.log('images', generated) ---- +---------------- Elements of a research project ------------------------------ @@ -383,7 +383,7 @@ in the LightningModule Again, this is the same PyTorch code except that it has been organized by the LightningModule. This code is not restricted which means it can be as complicated as a full seq-2-seq, RL loop, GAN, etc... ---- +---------------- Training -------- @@ -594,11 +594,11 @@ Notice the epoch is MUCH faster! .. figure:: /_images/mnist_imgs/tpu_fast.png :alt: TPU speed ---- +---------------- .. include:: hyperparameters.rst ---- +---------------- Validating ---------- @@ -677,7 +677,7 @@ in the validation loop, you won't need to potentially wait a full epoch to find .. note:: Lightning disables gradients, puts model in eval mode and does everything needed for validation. ---- +---------------- Testing ------- @@ -748,7 +748,7 @@ You can also run the test from a saved lightning model .. warning:: .test() is not stable yet on TPUs. We're working on getting around the multiprocessing challenges. ---- +---------------- Predicting ---------- @@ -849,7 +849,7 @@ Or maybe we have a model that we use to do generation How you split up what goes in `forward` vs `training_step` depends on how you want to use this model for prediction. ---- +---------------- Extensibility ------------- @@ -910,7 +910,7 @@ you could do your own: Every single part of training is configurable this way. For a full list look at `LightningModule `_. ---- +---------------- Callbacks --------- @@ -947,10 +947,10 @@ And pass the callbacks into the trainer .. note:: See full list of 12+ hooks in the :ref:`callbacks`. ---- +---------------- .. include:: child_modules.rst ---- +---------------- .. 
include:: transfer_learning.rst diff --git a/docs/source/metrics.rst b/docs/source/metrics.rst index 88d4ef4c27010..3d6c8472fc9e0 100644 --- a/docs/source/metrics.rst +++ b/docs/source/metrics.rst @@ -31,7 +31,7 @@ Example:: to a few metrics. Please feel free to create an issue/PR if you have a proposed metric or have found a bug. ---- +---------------- Implement a metric ------------------ @@ -48,6 +48,8 @@ handles automated DDP syncing and converts all inputs and outputs to tensors. Numpy metrics might slow down your training substantially, since every metric computation requires a GPU sync to convert tensors to numpy. +---------------- + TensorMetric ^^^^^^^^^^^^ Here's an example showing how to implement a TensorMetric @@ -61,6 +63,8 @@ Here's an example showing how to implement a TensorMetric .. autoclass:: pytorch_lightning.metrics.metric.TensorMetric :noindex: +---------------- + NumpyMetric ^^^^^^^^^^^ Here's an example showing how to implement a NumpyMetric @@ -75,7 +79,7 @@ Here's an example showing how to implement a NumpyMetric .. autoclass:: pytorch_lightning.metrics.metric.NumpyMetric :noindex: ---- +---------------- Class Metrics ------------- @@ -225,7 +229,7 @@ RMSLE .. autoclass:: pytorch_lightning.metrics.regression.RMSE :noindex: ---- +---------------- Functional Metrics ------------------ @@ -364,13 +368,11 @@ stat_scores_multiple_classes (F) .. autofunction:: pytorch_lightning.metrics.functional.stat_scores_multiple_classes :noindex: ---- +---------------- Metric pre-processing --------------------- -Metric - to_categorical (F) ^^^^^^^^^^^^^^^^^^ @@ -383,7 +385,7 @@ to_onehot (F) .. autofunction:: pytorch_lightning.metrics.functional.to_onehot :noindex: ---- +---------------- Sklearn interface ----------------- diff --git a/docs/source/tpu.rst b/docs/source/tpu.rst index ddc633b2754b2..c289a688ec028 100644 --- a/docs/source/tpu.rst +++ b/docs/source/tpu.rst @@ -5,13 +5,13 @@ Lightning supports running on TPUs. 
At this moment, TPUs are available on Google Cloud (GCP), Google Colab and Kaggle Environments. For more information on TPUs `watch this video `_. ---- +---------------- Live demo ---------- Check out this `Google Colab `_ to see how to train MNIST on TPUs. ---- +---------------- TPU Terminology --------------- @@ -23,7 +23,7 @@ A TPU pod hosts many TPUs on it. Currently, TPU pod v2 has 2048 cores! You can request a full pod from Google cloud or a "slice" which gives you some subset of those 2048 cores. ---- +---------------- How to access TPUs ------------------ @@ -33,7 +33,7 @@ To access TPUs there are two main ways. 2. Using Google Cloud (GCP). 3. Using Kaggle. ---- +---------------- Colab TPUs ----------- @@ -65,7 +65,7 @@ To get a TPU on colab, follow these steps: 6. Then set up your LightningModule as normal. ---- +---------------- DistributedSamplers ------------------- @@ -122,7 +122,7 @@ To use a full TPU pod skip to the TPU pod section. That's it! Your model will train on all 8 TPU cores. ---- +---------------- Single TPU core training ------------------------ @@ -132,14 +132,14 @@ Lightning supports training on a single TPU core. Just pass the TPU core ID [1-8 trainer = pl.Trainer(tpu_cores=[1]) ---- +---------------- Distributed Backend with TPU ---------------------------- The ```distributed_backend``` option used for GPUs does not apply to TPUs. TPUs work in DDP mode by default (distributing over each core) ---- +---------------- TPU Pod ------- @@ -153,7 +153,7 @@ All you need to do is submit the following command: --conda-env=torch-xla-nightly -- python /usr/share/torch-xla-0.5/pytorch/xla/test/test_train_imagenet.py --fake_data ---- +---------------- 16 bit precision ----------------- @@ -171,7 +171,7 @@ set the 16-bit flag. Under the hood the xla library will use the `bfloat16 type `_. 
----
+----------------
 
 About XLA
 ----------
diff --git a/pytorch_lightning/__init__.py b/pytorch_lightning/__init__.py
index 24e190c13a91b..094cb1aaa2b68 100644
--- a/pytorch_lightning/__init__.py
+++ b/pytorch_lightning/__init__.py
@@ -1,6 +1,6 @@
 """Root package info."""
 
-__version__ = '0.8.0'
+__version__ = '0.8.1-dev'
 __author__ = 'William Falcon et al.'
 __author_email__ = 'waf2107@columbia.edu'
 __license__ = 'Apache-2.0'