[KED-1408, KED-1442] Code cleanup (#485)
andrii-ivaniuk authored Mar 13, 2020
1 parent c585b55 commit ecd7277
Showing 151 changed files with 166 additions and 18,279 deletions.
15 changes: 1 addition & 14 deletions .pre-commit-config.yaml
@@ -52,13 +52,6 @@ repos:
files: ^features/
entry: pylint --disable=missing-docstring,no-name-in-module
stages: [commit]
- id: pylint-quick-extras
name: "Quick PyLint on extras/*"
language: system
types: [file, python]
files: ^extras/
entry: pylint
stages: [commit]
- id: pylint-quick-tests
name: "Quick PyLint on tests/*"
language: system
@@ -80,12 +73,6 @@ repos:
pass_filenames: false
stages: [manual]
entry: pylint --disable=missing-docstring,no-name-in-module features
- id: pylint-extras
name: "PyLint on extras/*"
language: system
pass_filenames: false
stages: [manual]
entry: pylint extras
- id: pylint-tests
name: "PyLint on tests/*"
language: system
@@ -97,7 +84,7 @@
name: "Black"
language: system
pass_filenames: false
entry: python -m tools.min_version 3.6 "black kedro extras features tests"
entry: python -m tools.min_version 3.6 "black kedro features tests"
- id: legal
name: "Licence check"
language: system
17 changes: 17 additions & 0 deletions RELEASE.md
@@ -16,6 +16,23 @@
* `get_last_load_version` and `get_last_save_version` have been renamed to `resolve_load_version` and `resolve_save_version` on ``AbstractVersionedDataSet``, the results of which are cached.
* The `release()` method on datasets extending ``AbstractVersionedDataSet`` clears the cached load and save version. All custom datasets must call `super()._release()` inside `_release()`.
* Removed `KEDRO_ENV_VAR` from `kedro.context` to speed up the CLI run time. To make `kedro` work with project templates generated with earlier versions of Kedro, remove all instances of `KEDRO_ENV_VAR` from `kedro_cli.py`.
* Deleted obsolete datasets from `kedro.io`.
* Deleted `kedro.contrib` and `extras` folders.
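
The cache-clearing contract described above can be sketched as follows. The base class here is a self-contained stand-in, not Kedro's actual `AbstractVersionedDataSet`, and the cache layout is illustrative; only the `super()._release()` call inside `_release()` reflects the documented requirement:

```python
class VersionedDataSetStandIn:
    """Stand-in mimicking the relevant behaviour of AbstractVersionedDataSet."""

    def __init__(self):
        # Illustrative cache of resolved load/save versions.
        self._version_cache = {"load": "2020-03-13T00.00.00", "save": None}

    def _release(self):
        # The base implementation clears the cached load and save versions.
        self._version_cache.clear()

    def release(self):
        self._release()


class MyCustomDataSet(VersionedDataSetStandIn):
    def _release(self):
        # Custom datasets must call super()._release() so the version
        # cache is cleared alongside any dataset-specific cleanup.
        super()._release()
        self._my_own_cache = None


ds = MyCustomDataSet()
ds.release()
assert ds._version_cache == {}  # cached versions were cleared
```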

### Migration guide from Kedro 0.15.* to Upcoming Release
#### Migration for datasets

Since all the datasets (from `kedro.io` and `kedro.contrib.io`) were moved to `kedro/extras/datasets`, you must update the type of all datasets in the `<project>/conf/base/catalog.yml` file.
Here is how it should be changed: `type: <SomeDataSet>` -> `type: <subfolder of kedro/extras/datasets>.<SomeDataSet>` (e.g. `type: CSVDataSet` -> `type: pandas.CSVDataSet`).

In addition, all the location-specific datasets like `CSVLocalDataSet`, `CSVS3DataSet` etc. were deprecated. Instead, you must use the generalised datasets like `CSVDataSet`.
E.g. `type: CSVS3DataSet` -> `type: pandas.CSVDataSet`.
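
As a sketch, a hypothetical catalog entry would migrate like this (the dataset name `cars` and the filepath are illustrative, not taken from any real project):

```yaml
# Before (Kedro 0.15.*) -- <project>/conf/base/catalog.yml
cars:
  type: CSVLocalDataSet           # deprecated location-specific dataset
  filepath: data/01_raw/cars.csv
---
# After (this release)
cars:
  type: pandas.CSVDataSet         # generalised dataset under kedro/extras/datasets
  filepath: data/01_raw/cars.csv
```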

Note: no changes are required if you are using a custom dataset.

#### Migration for decorators, color logger, transformers etc.
Since some modules were moved to other locations, you need to update the import paths accordingly.
You can find the list of moved files in the `0.15.6` release notes under the `Files with a new location` section.
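
As an illustrative sketch, old import paths can be rewritten mechanically. The mapping below covers only the moves visible in this commit's `docs/conf.py` diff and is not exhaustive; the helper name `migrate_import` is hypothetical, not part of Kedro:

```python
# Hypothetical helper that rewrites the old module paths touched by this
# commit to their new homes. Mapping is illustrative, not exhaustive.
MOVED_MODULES = {
    "kedro.contrib.io.pyspark": "kedro.extras.datasets.spark",
    "kedro.contrib.colors.logging": "kedro.extras.logging",
    "kedro.contrib.decorators": "kedro.extras.decorators",
}


def migrate_import(path: str) -> str:
    """Return the new import path for a moved module, or the path unchanged."""
    for old, new in MOVED_MODULES.items():
        if path == old or path.startswith(old + "."):
            return new + path[len(old):]
    return path


print(migrate_import("kedro.contrib.io.pyspark.SparkDataSet"))
```

For example, `kedro.contrib.io.pyspark.SparkDataSet` becomes `kedro.extras.datasets.spark.SparkDataSet`, matching the doc changes elsewhere in this commit.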

## Thanks for supporting contributions
[@foolsgold](https://github.com/foolsgold), [Mani Sarkar](https://github.com/neomatrix369), [Priyanka Shanbhag](https://github.com/priyanka1414), [Luis Blanche](https://github.com/LuisBlanche)
7 changes: 4 additions & 3 deletions docs/conf.py
@@ -215,9 +215,10 @@
"kedro.pipeline",
"kedro.runner",
"kedro.config",
"kedro.contrib.io",
"kedro.contrib.colors.logging",
"kedro.contrib.decorators",
"kedro.extras.datasets",
"kedro.extras.logging",
"kedro.extras.decorators",
"kedro.extras.transformers",
]


2 changes: 1 addition & 1 deletion docs/source/04_user_guide/04_data_catalog.md
@@ -592,4 +592,4 @@ io.save("ranked", ranked)
> *Note:* Saving `None` to a dataset is not allowed!

### Creating your own dataset
More specialised datasets can be found in `contrib/io`. [Creating new datasets](../03_tutorial/03_set_up_data.md#creating-custom-datasets) is the easiest way to contribute to the Kedro project.
All datasets can be found in `kedro/extras/datasets`. [Creating new datasets](../03_tutorial/03_set_up_data.md#creating-custom-datasets) is the easiest way to contribute to the Kedro project.
2 changes: 1 addition & 1 deletion docs/source/04_user_guide/06_pipelines.md
@@ -625,7 +625,7 @@ Hello f(h(g(Python)))!
Out[9]: {}
```

Decorators can be useful for monitoring your pipeline. Kedro currently has 1 built-in decorator: `log_time`, which will log the time taken for executing your node. You can find it in `kedro.pipeline.decorators`. Other decorators can be found in `kedro.contrib.decorators`, for which you will need to install the required dependencies.
Decorators can be useful for monitoring your pipeline. Kedro currently has 1 built-in decorator: `log_time`, which will log the time taken for executing your node. You can find it in `kedro.pipeline.decorators`. Other decorators can be found in `kedro.extras.decorators`, for which you will need to install the required dependencies.
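
The built-in decorator can be approximated as follows. This is a simplified, self-contained sketch of the pattern, not Kedro's actual `log_time` implementation:

```python
import logging
import time
from functools import wraps


def log_time(func):
    """Simplified sketch of a node decorator that logs execution time."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start
        logging.getLogger(__name__).info(
            "Running %r took %.3f seconds", func.__name__, elapsed
        )
        return result

    return wrapper


@log_time
def identity(x):
    return x


assert identity(42) == 42  # the wrapped node's behaviour is unchanged
```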

## Running pipelines with IO

2 changes: 1 addition & 1 deletion docs/source/04_user_guide/08_advanced_io.md
@@ -227,7 +227,7 @@ Currently the following datasets support versioning:

## Partitioned dataset

These days distributed systems play an increasingly important role in ETL data pipelines. They significantly increase the processing throughput, enabling us to work with much larger volumes of input data. However, these benefits sometimes come at a cost. When dealing with the input data generated by such distributed systems, you may encounter a situation where your Kedro node needs to read the data from a directory full of uniform files of the same type (e.g. JSON, CSV, Parquet, etc.) rather than from a single file. Tools like `PySpark` and the corresponding [SparkDataSet](/kedro.contrib.io.pyspark.SparkDataSet) cater for such use cases, but the use of Spark is not always feasible.
These days distributed systems play an increasingly important role in ETL data pipelines. They significantly increase the processing throughput, enabling us to work with much larger volumes of input data. However, these benefits sometimes come at a cost. When dealing with the input data generated by such distributed systems, you may encounter a situation where your Kedro node needs to read the data from a directory full of uniform files of the same type (e.g. JSON, CSV, Parquet, etc.) rather than from a single file. Tools like `PySpark` and the corresponding [SparkDataSet](/kedro.extras.datasets.spark.SparkDataSet) cater for such use cases, but the use of Spark is not always feasible.

This is the reason why Kedro provides a built-in [PartitionedDataSet](/kedro.io.PartitionedDataSet), which has the following features:
1. `PartitionedDataSet` can recursively load all or specific files from a given location.
2 changes: 1 addition & 1 deletion docs/source/04_user_guide/09_pyspark.md
@@ -80,7 +80,7 @@ Since `SparkSession` is a [singleton](https://python-3-patterns-idioms-test.read

Having created a `SparkSession`, you can load your data using `PySpark`'s [DataFrameReader](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader).

To do so, please use the provided [SparkDataSet](/kedro.contrib.io.pyspark.SparkDataSet):
To do so, please use the provided [SparkDataSet](/kedro.extras.datasets.spark.SparkDataSet):

### Code API

46 changes: 0 additions & 46 deletions docs/source/05_api_docs/kedro.contrib.io.rst

This file was deleted.

41 changes: 0 additions & 41 deletions docs/source/05_api_docs/kedro.contrib.rst

This file was deleted.

16 changes: 0 additions & 16 deletions docs/source/05_api_docs/kedro.io.rst
@@ -21,29 +21,13 @@ Data Sets
:toctree:
:template: autosummary/class.rst

kedro.io.CSVLocalDataSet
kedro.io.CSVHTTPDataSet
kedro.io.CSVS3DataSet
kedro.io.HDFLocalDataSet
kedro.io.HDFS3DataSet
kedro.io.JSONLocalDataSet
kedro.io.JSONDataSet
kedro.io.LambdaDataSet
kedro.io.MemoryDataSet
kedro.io.ParquetLocalDataSet
kedro.io.PartitionedDataSet
kedro.io.IncrementalDataSet
kedro.io.PickleLocalDataSet
kedro.io.PickleS3DataSet
kedro.io.SQLTableDataSet
kedro.io.SQLQueryDataSet
kedro.io.TextLocalDataSet
kedro.io.ExcelLocalDataSet
kedro.io.CachedDataSet
kedro.io.DataCatalogWithDefault

Additional ``AbstractDataSet`` implementations can be found in ``kedro.contrib.io``.

Errors
------

1 change: 0 additions & 1 deletion docs/source/05_api_docs/kedro.rst
@@ -16,7 +16,6 @@ kedro
kedro.pipeline
kedro.runner
kedro.context
kedro.contrib
kedro.cli
kedro.versioning
kedro.extras.datasets
9 changes: 0 additions & 9 deletions extras/README.md

This file was deleted.

Empty file removed extras/__init__.py
Empty file.
141 changes: 0 additions & 141 deletions extras/ipython_loader.py

This file was deleted.
