[KED-1408, KED-1442] Code cleanup (#485)
andrii-ivaniuk authored Mar 13, 2020
1 parent c585b55 commit ecd7277
Showing 151 changed files with 166 additions and 18,279 deletions.
15 changes: 1 addition & 14 deletions .pre-commit-config.yaml
@@ -52,13 +52,6 @@ repos:
files: ^features/
entry: pylint --disable=missing-docstring,no-name-in-module
stages: [commit]
- id: pylint-quick-extras
name: "Quick PyLint on extras/*"
language: system
types: [file, python]
files: ^extras/
entry: pylint
stages: [commit]
- id: pylint-quick-tests
name: "Quick PyLint on tests/*"
language: system
@@ -80,12 +73,6 @@ repos:
pass_filenames: false
stages: [manual]
entry: pylint --disable=missing-docstring,no-name-in-module features
- id: pylint-extras
name: "PyLint on extras/*"
language: system
pass_filenames: false
stages: [manual]
entry: pylint extras
- id: pylint-tests
name: "PyLint on tests/*"
language: system
@@ -97,7 +84,7 @@
name: "Black"
language: system
pass_filenames: false
entry: python -m tools.min_version 3.6 "black kedro extras features tests"
entry: python -m tools.min_version 3.6 "black kedro features tests"
- id: legal
name: "Licence check"
language: system
17 changes: 17 additions & 0 deletions RELEASE.md
@@ -16,6 +16,23 @@
* `get_last_load_version` and `get_last_save_version` have been renamed to `resolve_load_version` and `resolve_save_version` on ``AbstractVersionedDataSet``, the results of which are cached.
* The `release()` method on datasets extending ``AbstractVersionedDataSet`` clears the cached load and save version. All custom datasets must call `super()._release()` inside `_release()`.
* Removed `KEDRO_ENV_VAR` from `kedro.context` to speed up the CLI run time. To make `kedro` work with project templates generated with earlier versions of Kedro, remove all instances of `KEDRO_ENV_VAR` from `kedro_cli.py`.
* Deleted obsolete datasets from `kedro.io`.
* Deleted `kedro.contrib` and `extras` folders.
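
The cache-clearing contract described above can be sketched as follows. The base class here is a self-contained stand-in, not Kedro's actual `AbstractVersionedDataSet`, and the cache layout is illustrative; only the `super()._release()` call inside `_release()` reflects the documented requirement:

```python
class VersionedDataSetStandIn:
    """Stand-in mimicking the relevant behaviour of AbstractVersionedDataSet."""

    def __init__(self):
        # Illustrative cache of resolved load/save versions.
        self._version_cache = {"load": "2020-03-13T00.00.00", "save": None}

    def _release(self):
        # The base implementation clears the cached load and save versions.
        self._version_cache.clear()

    def release(self):
        self._release()


class MyCustomDataSet(VersionedDataSetStandIn):
    def _release(self):
        # Custom datasets must call super()._release() so the version
        # cache is cleared alongside any dataset-specific cleanup.
        super()._release()
        self._my_own_cache = None


ds = MyCustomDataSet()
ds.release()
assert ds._version_cache == {}  # cached versions were cleared
```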

### Migration guide from Kedro 0.15.* to Upcoming Release
#### Migration for datasets

Since all the datasets (from `kedro.io` and `kedro.contrib.io`) were moved to `kedro/extras/datasets`, you must update the type of all datasets in the `<project>/conf/base/catalog.yml` file.
Here is how it should be changed: `type: <SomeDataSet>` -> `type: <subfolder of kedro/extras/datasets>.<SomeDataSet>` (e.g. `type: CSVDataSet` -> `type: pandas.CSVDataSet`).

In addition, all the location-specific datasets like `CSVLocalDataSet`, `CSVS3DataSet` etc. were deprecated. Instead, you must use the generalised datasets like `CSVDataSet`.
E.g. `type: CSVS3DataSet` -> `type: pandas.CSVDataSet`.
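
As a sketch, a hypothetical catalog entry would migrate like this (the dataset name `cars` and the filepath are illustrative, not taken from any real project):

```yaml
# Before (Kedro 0.15.*) -- <project>/conf/base/catalog.yml
cars:
  type: CSVLocalDataSet           # deprecated location-specific dataset
  filepath: data/01_raw/cars.csv
---
# After (this release)
cars:
  type: pandas.CSVDataSet         # generalised dataset under kedro/extras/datasets
  filepath: data/01_raw/cars.csv
```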

Note: no changes are required if you are using a custom dataset.

#### Migration for decorators, color logger, transformers etc.
Since some modules were moved to other locations, you need to update the import paths accordingly.
You can find the list of moved files in the `0.15.6` release notes under the `Files with a new location` section.
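
As an illustrative sketch, old import paths can be rewritten mechanically. The mapping below covers only the moves visible in this commit's `docs/conf.py` diff and is not exhaustive; the helper name `migrate_import` is hypothetical, not part of Kedro:

```python
# Hypothetical helper that rewrites the old module paths touched by this
# commit to their new homes. Mapping is illustrative, not exhaustive.
MOVED_MODULES = {
    "kedro.contrib.io.pyspark": "kedro.extras.datasets.spark",
    "kedro.contrib.colors.logging": "kedro.extras.logging",
    "kedro.contrib.decorators": "kedro.extras.decorators",
}


def migrate_import(path: str) -> str:
    """Return the new import path for a moved module, or the path unchanged."""
    for old, new in MOVED_MODULES.items():
        if path == old or path.startswith(old + "."):
            return new + path[len(old):]
    return path


print(migrate_import("kedro.contrib.io.pyspark.SparkDataSet"))
```

For example, `kedro.contrib.io.pyspark.SparkDataSet` becomes `kedro.extras.datasets.spark.SparkDataSet`, matching the doc changes elsewhere in this commit.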

## Thanks for supporting contributions
[@foolsgold](https://github.com/foolsgold), [Mani Sarkar](https://github.com/neomatrix369), [Priyanka Shanbhag](https://github.com/priyanka1414), [Luis Blanche](https://github.com/LuisBlanche)
7 changes: 4 additions & 3 deletions docs/conf.py
@@ -215,9 +215,10 @@
"kedro.pipeline",
"kedro.runner",
"kedro.config",
"kedro.contrib.io",
"kedro.contrib.colors.logging",
"kedro.contrib.decorators",
"kedro.extras.datasets",
"kedro.extras.logging",
"kedro.extras.decorators",
"kedro.extras.transformers",
]


2 changes: 1 addition & 1 deletion docs/source/04_user_guide/04_data_catalog.md
@@ -592,4 +592,4 @@ io.save("ranked", ranked)
> *Note:* Saving `None` to a dataset is not allowed!

### Creating your own dataset
More specialised datasets can be found in `contrib/io`. [Creating new datasets](../03_tutorial/03_set_up_data.md#creating-custom-datasets) is the easiest way to contribute to the Kedro project.
All datasets can be found in `kedro/extras/datasets`. [Creating new datasets](../03_tutorial/03_set_up_data.md#creating-custom-datasets) is the easiest way to contribute to the Kedro project.
2 changes: 1 addition & 1 deletion docs/source/04_user_guide/06_pipelines.md
@@ -625,7 +625,7 @@ Hello f(h(g(Python)))!
Out[9]: {}
```

Decorators can be useful for monitoring your pipeline. Kedro currently has 1 built-in decorator: `log_time`, which will log the time taken for executing your node. You can find it in `kedro.pipeline.decorators`. Other decorators can be found in `kedro.contrib.decorators`, for which you will need to install the required dependencies.
Decorators can be useful for monitoring your pipeline. Kedro currently has 1 built-in decorator: `log_time`, which will log the time taken for executing your node. You can find it in `kedro.pipeline.decorators`. Other decorators can be found in `kedro.extras.decorators`, for which you will need to install the required dependencies.
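
The built-in decorator can be approximated as follows. This is a simplified, self-contained sketch of the pattern, not Kedro's actual `log_time` implementation:

```python
import logging
import time
from functools import wraps


def log_time(func):
    """Simplified sketch of a node decorator that logs execution time."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start
        logging.getLogger(__name__).info(
            "Running %r took %.3f seconds", func.__name__, elapsed
        )
        return result

    return wrapper


@log_time
def identity(x):
    return x


assert identity(42) == 42  # the wrapped node's behaviour is unchanged
```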

## Running pipelines with IO

2 changes: 1 addition & 1 deletion docs/source/04_user_guide/08_advanced_io.md
@@ -227,7 +227,7 @@ Currently the following datasets support versioning:

## Partitioned dataset

These days distributed systems play an increasingly important role in ETL data pipelines. They significantly increase the processing throughput, enabling us to work with much larger volumes of input data. However, these benefits sometimes come at a cost. When dealing with the input data generated by such distributed systems, you may encounter a situation where your Kedro node needs to read the data from a directory full of uniform files of the same type (e.g. JSON, CSV, Parquet, etc.) rather than from a single file. Tools like `PySpark` and the corresponding [SparkDataSet](/kedro.contrib.io.pyspark.SparkDataSet) cater for such use cases, but the use of Spark is not always feasible.
These days distributed systems play an increasingly important role in ETL data pipelines. They significantly increase the processing throughput, enabling us to work with much larger volumes of input data. However, these benefits sometimes come at a cost. When dealing with the input data generated by such distributed systems, you may encounter a situation where your Kedro node needs to read the data from a directory full of uniform files of the same type (e.g. JSON, CSV, Parquet, etc.) rather than from a single file. Tools like `PySpark` and the corresponding [SparkDataSet](/kedro.extras.datasets.spark.SparkDataSet) cater for such use cases, but the use of Spark is not always feasible.

This is the reason why Kedro provides a built-in [PartitionedDataSet](/kedro.io.PartitionedDataSet), which has the following features:
1. `PartitionedDataSet` can recursively load all or specific files from a given location.
2 changes: 1 addition & 1 deletion docs/source/04_user_guide/09_pyspark.md
@@ -80,7 +80,7 @@ Since `SparkSession` is a [singleton](https://python-3-patterns-idioms-test.read

Having created a `SparkSession`, you can load your data using `PySpark`'s [DataFrameReader](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader).

To do so, please use the provided [SparkDataSet](/kedro.contrib.io.pyspark.SparkDataSet):
To do so, please use the provided [SparkDataSet](/kedro.extras.datasets.spark.SparkDataSet):

### Code API

46 changes: 0 additions & 46 deletions docs/source/05_api_docs/kedro.contrib.io.rst

This file was deleted.

41 changes: 0 additions & 41 deletions docs/source/05_api_docs/kedro.contrib.rst

This file was deleted.

16 changes: 0 additions & 16 deletions docs/source/05_api_docs/kedro.io.rst
@@ -21,29 +21,13 @@ Data Sets
:toctree:
:template: autosummary/class.rst

kedro.io.CSVLocalDataSet
kedro.io.CSVHTTPDataSet
kedro.io.CSVS3DataSet
kedro.io.HDFLocalDataSet
kedro.io.HDFS3DataSet
kedro.io.JSONLocalDataSet
kedro.io.JSONDataSet
kedro.io.LambdaDataSet
kedro.io.MemoryDataSet
kedro.io.ParquetLocalDataSet
kedro.io.PartitionedDataSet
kedro.io.IncrementalDataSet
kedro.io.PickleLocalDataSet
kedro.io.PickleS3DataSet
kedro.io.SQLTableDataSet
kedro.io.SQLQueryDataSet
kedro.io.TextLocalDataSet
kedro.io.ExcelLocalDataSet
kedro.io.CachedDataSet
kedro.io.DataCatalogWithDefault

Additional ``AbstractDataSet`` implementations can be found in ``kedro.contrib.io``.

Errors
------

1 change: 0 additions & 1 deletion docs/source/05_api_docs/kedro.rst
@@ -16,7 +16,6 @@ kedro
kedro.pipeline
kedro.runner
kedro.context
kedro.contrib
kedro.cli
kedro.versioning
kedro.extras.datasets
9 changes: 0 additions & 9 deletions extras/README.md

This file was deleted.

Empty file removed extras/__init__.py
Empty file.
141 changes: 0 additions & 141 deletions extras/ipython_loader.py

This file was deleted.
