ARCHIVED: This plugin was built before Kedro hooks existed. I believe the same functionality can now be added more easily by installing pandas-profiling and calling it from a hook on the datasets you want to profile, which makes a separate package largely redundant.
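For anyone taking the hooks route, the following is a minimal sketch rather than a tested replacement for this plugin. It assumes a Kedro version that provides the `after_dataset_loaded` dataset hook (0.18 or later, as far as I know) and a pandas-profiling release that exposes `ProfileReport`. The class name `ProfilingHooks`, the `make_report` parameter, the output directory, and the file naming are all hypothetical choices made for illustration.

```python
from pathlib import Path

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be imported without Kedro installed.
    # In a real project Kedro is present and this branch is never taken.
    def hook_impl(func):
        return func


def _default_report(df, title):
    """Build a pandas-profiling report (imported lazily, only when needed)."""
    from pandas_profiling import ProfileReport

    return ProfileReport(df, title=title)


class ProfilingHooks:
    """Hypothetical hook class: writes an HTML profile for selected datasets."""

    def __init__(self, dataset_names, output_dir="data/08_reporting", make_report=None):
        # Only datasets named here are profiled; everything else is ignored.
        self.dataset_names = set(dataset_names)
        self.output_dir = Path(output_dir)
        # make_report is an assumption of this sketch: it lets you swap in
        # another report builder (or a stub when testing).
        self._make_report = make_report or _default_report

    @hook_impl
    def after_dataset_loaded(self, dataset_name, data):
        # Kedro calls this after a dataset is loaded from the catalog.
        if dataset_name not in self.dataset_names:
            return
        self.output_dir.mkdir(parents=True, exist_ok=True)
        target = self.output_dir / f"{dataset_name}_profile.html"
        self._make_report(data, dataset_name).to_file(target)
```

To try it, register an instance in the project's `settings.py`, for example `HOOKS = (ProfilingHooks(["companies"]),)`, where `companies` stands in for whichever catalog entry you want profiled. The report is then written each time that dataset is loaded during a run.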
This is a Kedro plugin that uses Pandas-Profiling to profile datasets.
It can be installed via PyPI.
pip install kedro-pandas-profiling
You simply proceed with a Kedro project as normal.
Once the data catalog is set up, you can run:
kedro profile                    # lists the datasets in the catalog
kedro profile -n <dataset_name>  # profiles the named dataset
Running kedro profile with no arguments lists the entries in your catalog; you can then pass the name of one of those datasets with -n to profile it. The current version only supports .csv and .xlsx files.
Sample output based on the company dataset from the Kedro tutorial.
Kedro-Pandas-Profiling is licensed under the Apache 2.0 License.