
Releases: ranamihir/pytorch_common

v1.5.6

24 Mar 18:57
764363b
  • Added ability to store checkpoints in retraining mode w/o early stopping.
  • Updated README.

v1.5.5

06 Mar 06:18
  • Added support for reduction={"mean"|"sum"} for losses when using sample weighting (see the sketch after this list).
  • Switched to using bumpversion for version control.
  • More unit tests.
  • Docstring cleanup and improvements.
  • Other minor bug fixes and improvements.
  • In sync with cd46a29f.
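
As a rough illustration of what the reduction modes mean under sample weighting, the sketch below applies per-sample weights to a per-sample loss and then reduces it. The function name, and the normalization by the total weight in the "mean" case, are assumptions here, not necessarily how pytorch_common implements it:

    import torch
    import torch.nn.functional as F

    def weighted_loss(logits, targets, sample_weights, reduction="mean"):
        # Per-sample cross-entropy, then apply the sample weights manually.
        per_sample = F.cross_entropy(logits, targets, reduction="none")
        weighted = per_sample * sample_weights
        if reduction == "sum":
            return weighted.sum()
        # "mean": normalize by the total weight so weighting doesn't rescale the loss.
        return weighted.sum() / sample_weights.sum()

    logits = torch.randn(4, 3)
    targets = torch.tensor([0, 2, 1, 2])
    weights = torch.tensor([1.0, 0.5, 2.0, 1.0])
    print(weighted_loss(logits, targets, weights, reduction="mean"))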

v1.5.4

08 Feb 17:53
b154ff6
  • [Breaking] train_utils.perform_one_epoch() now returns a dictionary instead of a list.
  • Model evaluation / prediction methods now accept a return_keys argument to specify up front which items should be returned (see the sketch after this list).
    • This yields large memory savings, since objects that aren't needed are never stored.
  • Added an option to perform training without any evaluation, by simply not providing the validation dataloader / logger arguments.
    • [Breaking] As part of this change, and for the sake of simplicity, the ReduceLROnPlateau scheduler (which requires the validation loss to take each step) is no longer supported.
  • Added support for sample weighting during training and evaluation.
  • Added several unit tests covering the aforementioned features.
  • Changed default early stopping criterion to accuracy (instead of f1).
  • Several other time, memory, and logging improvements.
  • In sync with c3809cf7.
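
A minimal sketch of the return_keys idea, assuming hypothetical key names ("loss", "probs", "preds") and a plain (inputs, targets) batch format; the actual keys and signatures in pytorch_common may differ:

    import torch
    import torch.nn.functional as F

    def evaluate(model, dataloader, return_keys=("loss",)):
        # Only materialize the items that were requested; skipping e.g. "probs"
        # avoids accumulating every batch's outputs in memory.
        results = {key: [] for key in return_keys}
        model.eval()
        with torch.no_grad():
            for inputs, targets in dataloader:
                outputs = model(inputs)
                if "loss" in results:
                    results["loss"].append(F.cross_entropy(outputs, targets).item())
                if "probs" in results:
                    results["probs"].append(outputs.softmax(dim=-1).cpu())
                if "preds" in results:
                    results["preds"].append(outputs.argmax(dim=-1).cpu())
        return results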

v1.5.3

05 Jan 05:28
a72ae42
  • [Breaking] train_model() in train_utils.py now takes a checkpoint_file argument instead of start_epoch (which is now inferred) for resuming training.
    • The trained model (located at checkpoint_file) is now loaded inside the function, rather than having to load it separately first.
  • Major improvement in the computation of top-k accuracy scores.
    • Instead of computing them separately for each k, the computation is shared under the hood across all ks as much as possible, which greatly reduces computation time, especially for problems with many classes (see the sketch after this list).
  • Added create_dataloader() to datasets.py for creating a DataLoader from a Dataset.
  • Using time.perf_counter() instead of time.time() for measuring function execution time.
  • Other minor improvements and bug fixes.
  • In sync with 1a95403b.
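
The shared top-k computation can be sketched as follows: call topk() once with the largest k and slice the result for every smaller k. The function name and bookkeeping below are illustrative, not pytorch_common's internals:

    import torch

    def topk_accuracies(logits, targets, ks=(1, 3, 5)):
        max_k = max(ks)
        # One top-k call for the largest k; columns are sorted by descending score.
        _, top_indices = logits.topk(max_k, dim=-1)
        correct = top_indices.eq(targets.unsqueeze(-1))  # (batch, max_k) booleans
        # Top-k accuracy = fraction of rows whose target appears in the first k columns.
        return {k: correct[:, :k].any(dim=-1).float().mean().item() for k in ks}

    logits = torch.randn(8, 10)
    targets = torch.randint(0, 10, (8,))
    print(topk_accuracies(logits, targets))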

v1.5.2

10 Nov 17:30
284aa97
  • Updated to pytorch=1.8.0 and cudatoolkit=10.1
  • Overhauled metric computation.
    • Much cleaner code
    • Drastic reduction in metric computation time, since preprocessing is now shared for many metrics, e.g. getting the max-probability class for accuracy / precision / recall / f1, etc.
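
For instance, the max-probability class only needs to be derived once and can then feed several metrics. The sketch below (using scikit-learn, with illustrative names) shows the general idea rather than the package's actual implementation:

    import numpy as np
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    def classification_metrics(probs, targets):
        preds = np.argmax(probs, axis=-1)  # shared preprocessing step
        precision, recall, f1, _ = precision_recall_fscore_support(
            targets, preds, average="macro", zero_division=0
        )
        return {
            "accuracy": accuracy_score(targets, preds),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }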

v1.5.1

02 Nov 22:00
d3b1cd3
  • Memory improvements
  • Minor bug fixes in tests and other code improvements
  • More type annotations
  • Updated transformers version
  • Changed config attribute from model to model_name
  • Removed transient_dir and misc_data_dir in favor of artifact_dir
  • Better logging

v1.5

30 Aug 17:15
70d38bf

Added better type annotations + minor improvements and bug fixes.

v1.4

24 Jul 02:51
ec2d9ba

This version primarily adds type annotations and makes aesthetic changes to conform to black and isort code quality guidelines, but it breaks backward compatibility in a few important places.

Breaking changes:

  • config.py: Deprecated support for batch_size. Only per-GPU batch sizes are supported now. They may be specified as follows (see the sketch after this list):
    • batch_size_per_gpu (as before), which uses the same batch size for all modes
    • {mode}_batch_size_per_gpu (mode = train/eval/test) for specifying a different batch size for each mode
  • datasets_dl.py:
    • Renamed print() -> print_dataset() in BasePyTorchDataset
    • oversample_class() now takes oversampling_factor as an argument instead of setting it as a class attribute beforehand (similarly for undersample_class())
    • It additionally takes column as an argument to specify the column on which to perform sampling (defaults to self.target_col to mimic the existing behavior)
    • Added sample_class() as a generic function for both over-/under-sampling
  • models_dl.py: Renamed print() -> print_model() in BasePyTorchModel
  • train_utils.py:
    • save_model():
      • Arguments optimizer, train_logger, and val_logger are all optional now to allow saving just the model and config
      • Takes arguments in a different order
  • utils.py: get_string_from_dict() now sorts config_info_dict before generating a unique string, to ensure the same string is obtained regardless of the order of the keys
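
As a rough sketch of how a per-mode, per-GPU batch size might be resolved: the fallback logic, dict-style config access, and multiplication by the GPU count below are assumptions for illustration, not pytorch_common's actual code.

    import torch

    def resolve_batch_size(config, mode, n_gpus=None):
        # Prefer "{mode}_batch_size_per_gpu", fall back to "batch_size_per_gpu".
        per_gpu = config.get(f"{mode}_batch_size_per_gpu", config.get("batch_size_per_gpu"))
        if per_gpu is None:
            raise ValueError(f"No batch size specified for mode '{mode}'.")
        n_gpus = n_gpus if n_gpus is not None else max(torch.cuda.device_count(), 1)
        return per_gpu * n_gpus  # effective batch size across all GPUs

    config = {"batch_size_per_gpu": 32, "eval_batch_size_per_gpu": 64}
    print(resolve_batch_size(config, "train", n_gpus=2))  # 64
    print(resolve_batch_size(config, "eval", n_gpus=2))   # 128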

Other changes:

  • Added type annotations everywhere
  • Switched to double quotes everywhere to conform to PEP 8/257 guidelines
  • Sticking to black and isort code quality standards
  • Switched to max. line length of 119
  • Added more tests. Updated existing ones to work with the aforementioned changes.
  • utils.py: Moved get_trainable_params() here (which is directly called in BasePyTorchModel) to allow support for non-BasePyTorchModel-type models
  • types.py: Additional file to define common (predefined and custom) data types
  • pre-push.sh now assumes the appropriate environment is already enabled (instead of forcibly enabling one named pytorch_common, which may not be available)
  • Minor performance improvements + code cleanup + bug fixes
  • Upgraded transformers version to 3.0.2 and pytorch_common to 1.4

To run the code formatters, run the following commands from the main project directory:

black --line-length 119 --target-version py37 .
isort -rc -l 119 -tc -m 3 .

v1.3

24 Jul 01:47
  • Added unit tests for all files in the package
    • The tests mostly revolve around ensuring the config is set up correctly, and that training / saving / loading / evaluating all models and (compatible) datasets with all (compatible) metrics works for both regression and classification
  • Added/fixed code for simple regression datasets and models (regression wasn't used or tested much before)
  • Added several util functions (mostly used only in unit testing for now)
  • Stricter and better asserts in config.py
  • Renamed datasets.py to datasets_dl.py and created a new datasets.py only for loading different datasets (for consistency with models.py and models_dl.py)
  • Added a pre-push hook for automatically running tests before each push (a pre-commit hook would have been too frequent and slowed down development)
  • Minor code cleanup + docstring improvements + bug fixes + Readme updates

v1.2.1

24 Jul 01:46
  • Upgraded to transformers==2.9.0 which has many performance improvements + bug fixes
  • Using common loop for training/evaluation/testing to remove duplicate code
  • Added support for specifying a decoupling function in train_model() (and get_all_predictions()) to define how to extract the inputs (and targets) from a batch (see the sketch after this list)
    • This may be useful when that process deviates from the typical behavior but the training paradigm is otherwise the same, so train_model() can still be used
  • Removed dependency on vrdscommon
    • The timing decorator was previously imported from vrdscommon; one is now defined in the package itself
    • As a result of this, also added support for defining decorators
  • Added/improved util functions:
    • get_total_grad_norm()
    • compare_tensors_or_arrays()
    • is_batch_on_gpu()
    • is_model_on_gpu()
    • is_model_parallelized()
    • remove_dir()
    • save_object() now also supports saving YAML files
  • Minor cleanup and linting
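
A minimal sketch of a decoupling function for a dict-style batch; the key names, and the exact way train_model() consumes such a function, are assumptions here:

    def decouple_batch(batch):
        # Extract (inputs, targets) from a batch that isn't a plain (inputs, targets) tuple.
        return batch["features"], batch["labels"]

    batch = {"features": [[0.1, 0.2]], "labels": [1], "metadata": ["row-42"]}
    inputs, targets = decouple_batch(batch)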