Stability and additional improvements

@Borda Borda released this 17 Jan 17:26
fc195b9

App

Added

  • Added the ability to set up basic authentication for Lightning apps (#16105)

Changed

  • The LoadBalancer now uses the internal IP + port instead of the exposed URL (#16119)
  • Added support for logging in different trainer stages with DeviceStatsMonitor (#16002)
  • Changed lightning_app.components.serve.gradio to lightning_app.components.serve.gradio_server (#16201)
  • Made cluster creation/deletion async by default (#16185)

Fixed

  • Fixed not being able to run multiple lightning apps locally due to port collision (#15819)
  • Avoided a relpath bug on Windows (#16164)
  • Avoided using the deprecated LooseVersion (#16162)
  • Ported fixes to the autoscaler component (#16249)
  • Fixed a bug where lightning login with env variables would not correctly save the credentials (#16339)

Fabric

Added

  • Added Fabric.launch() to programmatically launch processes (e.g. in Jupyter notebook) (#14992)
  • Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the run method (#14992)
  • Added Fabric.setup_module() and Fabric.setup_optimizers() to support strategies that need to set up the model before an optimizer can be created (#15185)
  • Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
  • Added lightning_fabric.accelerators.find_usable_cuda_devices utility function (#16147)
  • Added basic support for LightningModules (#16048)
  • Added support for managing callbacks via Fabric(callbacks=...) and emitting events through Fabric.call() (#16074)
  • Added Logger support (#16121)
    • Added Fabric(loggers=...) to support different Logger frameworks in Fabric
    • Added Fabric.log for logging scalars using multiple loggers
    • Added Fabric.log_dict for logging a dictionary of multiple metrics at once
    • Added Fabric.loggers and Fabric.logger attributes to access the individual logger instances
    • Added support for calling self.log and self.log_dict in a LightningModule when using Fabric
    • Added access to self.logger and self.loggers in a LightningModule when using Fabric
  • Added lightning_fabric.loggers.TensorBoardLogger (#16121)
  • Added lightning_fabric.loggers.CSVLogger (#16346)
  • Added support for a consistent .zero_grad(set_to_none=...) on the wrapped optimizer regardless of which strategy is used (#16275)
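
The callback entries above (#16074) describe a dispatch pattern: callbacks are registered once, and named events are forwarded to whichever callbacks define a matching method. The following is a minimal, library-free sketch of that pattern. `MiniFabric`, `LossRecorder`, and `EpochCounter` are hypothetical names used for illustration only; this is not Fabric's implementation, and the real `Fabric.call()` may differ in details such as how missing or non-callable hooks are handled.

```python
class MiniFabric:
    """Toy stand-in (hypothetical) for the callback side of Fabric."""

    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])

    def call(self, hook_name, *args, **kwargs):
        # Forward the event to every callback that defines a method with
        # this name; callbacks without the hook are silently skipped.
        for callback in self.callbacks:
            method = getattr(callback, hook_name, None)
            if callable(method):
                method(*args, **kwargs)


class LossRecorder:
    """Hypothetical callback that only cares about batch-end events."""

    def __init__(self):
        self.seen = []

    def on_train_batch_end(self, loss, batch_idx):
        self.seen.append((batch_idx, loss))


class EpochCounter:
    """Hypothetical callback that only cares about epoch-end events."""

    def __init__(self):
        self.epochs = 0

    def on_train_epoch_end(self):
        self.epochs += 1


recorder, counter = LossRecorder(), EpochCounter()
fabric = MiniFabric(callbacks=[recorder, counter])

for batch_idx, loss in enumerate([0.9, 0.7]):
    fabric.call("on_train_batch_end", loss=loss, batch_idx=batch_idx)
fabric.call("on_train_epoch_end")
```

Because the loop owner decides which events exist and when they fire, callbacks stay plain objects with no required base class in this sketch.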

Changed

  • Renamed the class LightningLite to Fabric (#15932, #15938)
  • The Fabric.run() method is no longer abstract (#14992)
  • The XLAStrategy now inherits from ParallelStrategy instead of DDPSpawnStrategy (#15838)
  • Merged the implementation of DDPSpawnStrategy into DDPStrategy and removed DDPSpawnStrategy (#14952)
  • The dataloader wrapper returned from .setup_dataloaders() now calls .set_epoch() on the distributed sampler if one is used (#16101)
  • Renamed Strategy.reduce to Strategy.all_reduce in all strategies (#16370)
  • When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (#16388)
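
The `.set_epoch()` change above (#16101) matters because a distributed sampler typically derives its shuffle from a seed plus the current epoch; if the epoch is never updated, every epoch replays the same order. The sketch below illustrates that effect with a torch-free toy. `ToyDistributedSampler` and `run_epochs` are hypothetical helpers written for this example, not part of PyTorch or Fabric, and the shuffling scheme is an assumption modeled loosely on how such samplers commonly work.

```python
import random


class ToyDistributedSampler:
    """Hypothetical sampler: shuffles with seed + epoch, then shards by rank."""

    def __init__(self, dataset_size, num_replicas, rank, seed=0):
        self.dataset_size = dataset_size
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        indices = list(range(self.dataset_size))
        # Every rank builds the same permutation, then takes a strided slice.
        random.Random(self.seed + self.epoch).shuffle(indices)
        return iter(indices[self.rank :: self.num_replicas])


def run_epochs(sampler, num_epochs, call_set_epoch):
    """Collect the index order seen in each epoch."""
    orders = []
    for epoch in range(num_epochs):
        if call_set_epoch:
            sampler.set_epoch(epoch)
        orders.append(list(sampler))
    return orders


# Without set_epoch the order repeats; with it, each epoch reshuffles.
stale = run_epochs(ToyDistributedSampler(16, num_replicas=2, rank=0), 2, call_set_epoch=False)
fresh = run_epochs(ToyDistributedSampler(16, num_replicas=2, rank=0), 2, call_set_epoch=True)

# The two ranks together still cover the whole dataset exactly once.
rank0 = list(ToyDistributedSampler(16, num_replicas=2, rank=0))
rank1 = list(ToyDistributedSampler(16, num_replicas=2, rank=1))
```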

Removed

  • Removed support for FairScale's sharded training (strategy='ddp_sharded'|'ddp_sharded_spawn'). Use Fully-Sharded Data Parallel instead (strategy='fsdp') (#16329)

Fixed

  • Restored sampling parity between PyTorch and Fabric dataloaders when using the DistributedSampler (#16101)
  • Fixed an issue where the error message wouldn't tell the user the real value that was passed through the CLI (#16334)

PyTorch

Added

  • Added support for native logging of MetricCollection with enabled compute groups (#15580)
  • Added support for custom artifact names in pl.loggers.WandbLogger (#16173)
  • Added support for DDP with LRFinder (#15304)
  • Added utilities to migrate checkpoints from one Lightning version to another (#15237)
  • Added support to upgrade all checkpoints in a folder using the pl.utilities.upgrade_checkpoint script (#15333)
  • Added an axes argument ax to .lr_find().plot() to enable writing to user-defined axes in a matplotlib figure (#15652)
  • Added log_model parameter to MLFlowLogger (#9187)
  • Added a check to validate that wrapped FSDP models are used while initializing optimizers (#15301)
  • Added a warning when self.log(..., logger=True) is called without a configured logger (#15814)
  • Added support for colossalai 0.1.11 (#15888)
  • Added LightningCLI support for optimizers and learning rate schedulers via callable type dependency injection (#15869)
  • Added support for activation checkpointing for the DDPFullyShardedNativeStrategy strategy (#15826)
  • Added the option to set DDPFullyShardedNativeStrategy(cpu_offload=True|False) via bool instead of needing to pass a configuration object (#15832)
  • Added info message for Ampere CUDA GPU users to enable tf32 matmul precision (#16037)
  • Added support for returning optimizer-like classes in LightningModule.configure_optimizers (#16189)

Changed

  • Switched from tensorboard to tensorboardX in TensorBoardLogger (#15728)
  • From now on, Lightning Trainer and LightningModule.load_from_checkpoint automatically upgrade the loaded checkpoint if it was produced in an old version of Lightning (#15237)
  • Trainer.{validate,test,predict}(ckpt_path=...) no longer restores the Trainer.global_step and Trainer.current_epoch values from the checkpoint. From now on, only Trainer.fit will restore these values (#15532)
  • The ModelCheckpoint.save_on_train_epoch_end attribute is now computed dynamically every epoch, accounting for changes to the validation dataloaders (#15300)
  • The Trainer now raises an error if it is given multiple stateful callbacks of the same type with colliding state keys (#15634)
  • MLFlowLogger now logs hyperparameters and metrics in batched API calls (#15915)
  • Overriding the on_train_batch_{start,end} hooks in conjunction with taking a dataloader_iter in the training_step no longer errors out and instead shows a warning (#16062)
  • Moved tensorboardX to extra dependencies. The CSVLogger is now used by default (#16349)
  • Dropped PyTorch 1.9 support (#15347)
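
The state-key entry above (#15634) guards against two stateful callbacks saving their state under the same key, in which case one would overwrite the other in the checkpoint. The sketch below shows one way such a check could look. `ToyCallback` and `check_state_keys` are hypothetical, and the key format and error message are assumptions made for this example rather than the Trainer's actual behavior.

```python
from collections import Counter


class ToyCallback:
    """Hypothetical stateful callback identified by class name + arguments."""

    def __init__(self, monitor):
        self.monitor = monitor

    @property
    def state_key(self):
        # Instances configured identically produce the same key.
        return f"{type(self).__name__}(monitor={self.monitor!r})"

    def state_dict(self):
        return {"monitor": self.monitor}


def check_state_keys(callbacks):
    """Raise if two callbacks would store state under the same key."""
    counts = Counter(callback.state_key for callback in callbacks)
    duplicates = sorted(key for key, count in counts.items() if count > 1)
    if duplicates:
        raise RuntimeError(
            f"Found stateful callbacks with colliding state keys: {duplicates}"
        )


# Different arguments give different keys, so this passes.
check_state_keys([ToyCallback("val_loss"), ToyCallback("val_acc")])

# Identical configuration collides and is rejected.
try:
    check_state_keys([ToyCallback("val_loss"), ToyCallback("val_loss")])
except RuntimeError as err:
    collision_message = str(err)
else:
    collision_message = None
```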

Deprecated

  • Deprecated description, env_prefix and env_parse parameters in LightningCLI.__init__ in favour of giving them through parser_kwargs (#15651)
  • Deprecated pytorch_lightning.profiler in favor of pytorch_lightning.profilers (#16059)
  • Deprecated Trainer(auto_select_gpus=...) in favor of pytorch_lightning.accelerators.find_usable_cuda_devices (#16147)
  • Deprecated pytorch_lightning.tuner.auto_gpu_select.{pick_single_gpu,pick_multiple_gpus} in favor of pytorch_lightning.accelerators.find_usable_cuda_devices (#16147)
  • nvidia/apex deprecation (#16039)
    • Deprecated pytorch_lightning.plugins.NativeMixedPrecisionPlugin in favor of pytorch_lightning.plugins.MixedPrecisionPlugin
    • Deprecated the LightningModule.optimizer_step(using_native_amp=...) argument
    • Deprecated the Trainer(amp_backend=...) argument
    • Deprecated the Trainer.amp_backend property
    • Deprecated the Trainer(amp_level=...) argument
    • Deprecated the pytorch_lightning.plugins.ApexMixedPrecisionPlugin class
    • Deprecated the pytorch_lightning.utilities.enums.AMPType enum
    • Deprecated the DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...) arguments
  • horovod deprecation (#16141)
    • Deprecated Trainer(strategy="horovod")
    • Deprecated the HorovodStrategy class
  • Deprecated pytorch_lightning.lite.LightningLite in favor of lightning.fabric.Fabric (#16314)
  • FairScale deprecation (in favor of PyTorch's FSDP implementation) (#16353)
    • Deprecated the pytorch_lightning.overrides.fairscale.LightningShardedDataParallel class
    • Deprecated the pytorch_lightning.plugins.precision.fully_sharded_native_amp.FullyShardedNativeMixedPrecisionPlugin class
    • Deprecated the pytorch_lightning.plugins.precision.sharded_native_amp.ShardedNativeMixedPrecisionPlugin class
    • Deprecated the pytorch_lightning.strategies.fully_sharded.DDPFullyShardedStrategy class
    • Deprecated the pytorch_lightning.strategies.sharded.DDPShardedStrategy class
    • Deprecated the pytorch_lightning.strategies.sharded_spawn.DDPSpawnShardedStrategy class

Removed

  • Removed deprecated pytorch_lightning.utilities.memory.get_gpu_memory_map in favor of pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats (#15617)
  • Temporarily removed support for Hydra multi-run (#15737)
  • Removed deprecated pytorch_lightning.profiler.base.AbstractProfiler in favor of pytorch_lightning.profilers.profiler.Profiler (#15637)
  • Removed deprecated pytorch_lightning.profiler.base.BaseProfiler in favor of pytorch_lightning.profilers.profiler.Profiler (#15637)
  • Removed deprecated code in pytorch_lightning.utilities.meta (#16038)
  • Removed the deprecated LightningDeepSpeedModule (#16041)
  • Removed the deprecated pytorch_lightning.accelerators.GPUAccelerator in favor of pytorch_lightning.accelerators.CUDAAccelerator (#16050)
  • Removed the deprecated pytorch_lightning.profiler.* classes in favor of pytorch_lightning.profilers (#16059)
  • Removed the deprecated pytorch_lightning.utilities.cli module in favor of pytorch_lightning.cli (#16116)
  • Removed the deprecated pytorch_lightning.loggers.base module in favor of pytorch_lightning.loggers.logger (#16120)
  • Removed the deprecated pytorch_lightning.loops.base module in favor of pytorch_lightning.loops.loop (#16142)
  • Removed the deprecated pytorch_lightning.core.lightning module in favor of pytorch_lightning.core.module (#16318)
  • Removed the deprecated pytorch_lightning.callbacks.base module in favor of pytorch_lightning.callbacks.callback (#16319)
  • Removed the deprecated Trainer.reset_train_val_dataloaders() in favor of Trainer.reset_{train,val}_dataloader (#16131)
  • Removed support for LightningCLI(seed_everything_default=None) (#16131)
  • Removed support in LightningLite for FairScale's sharded training (strategy='ddp_sharded'|'ddp_sharded_spawn'). Use Fully-Sharded Data Parallel instead (strategy='fsdp') (#16329)

Fixed

  • Enhanced reduce_boolean_decision to accommodate any-analogous semantics expected by the EarlyStopping callback (#15253)
  • Fixed the incorrect optimizer step synchronization when running across multiple TPU devices (#16020)
  • Fixed a type error when dividing the chunk size in the ColossalAI strategy (#16212)
  • Fixed a bug where the interval key of the scheduler would be ignored during manual optimization, making the LearningRateMonitor callback fail to log the learning rate (#16308)
  • Fixed an issue with MLFlowLogger not finalizing correctly when status code 'finished' was passed (#16340)
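
The reduce_boolean_decision entry above (#15253) concerns how per-process boolean decisions are combined across ranks: either every rank must agree ("all") or a single rank is enough ("any"). For early stopping, one plausible reading is that a single rank deciding to stop should stop all ranks. The sketch below models the reduction on a plain list, where summing the 0/1 votes stands in for a distributed all-reduce. `toy_reduce_boolean_decision` and its `require_all` parameter are hypothetical names, not the library's signature.

```python
def toy_reduce_boolean_decision(per_rank_decisions, require_all=True):
    """Combine one boolean per rank into a single shared decision."""
    # Summing the 0/1 votes stands in for an all-reduce SUM across ranks.
    votes = sum(int(decision) for decision in per_rank_decisions)
    world_size = len(per_rank_decisions)
    if require_all:
        return votes == world_size
    return votes > 0


# Only rank 1 wants to stop.
should_stop_per_rank = [False, True, False, False]

stop_if_all = toy_reduce_boolean_decision(should_stop_per_rank, require_all=True)
stop_if_any = toy_reduce_boolean_decision(should_stop_per_rank, require_all=False)
```

With "all" semantics the lone vote is ignored and training continues on every rank; with "any" semantics the same vote stops all ranks together, which keeps them from diverging.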

Contributors

@1SAA, @akihironitta, @AlessioQuercia, @awaelchli, @bipinKrishnan, @Borda, @carmocca, @dmitsf, @erhoo82, @ethanwharris, @Forbu, @hhsecond, @justusschock, @lantiga, @lightningforever, @Liyang90, @manangoel99, @mauvilsa, @nicolai86, @nohalon, @rohitgr7, @schmidt-jake, @speediedan, @yMayanand

If we forgot someone due to not matching commit email with GitHub account, let us know :]