Provides a basic directory structure and template files for setting up a DataLoader that follows the extract, transform, load (ETL) methodology.
pip install git+https://gitlab.com/jayemar/etl.git
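from etl.dataloader import DataLoader

dl = DataLoader()
# <ml_cfg> is a placeholder for the ML config described below
train_gen = dl.retrieve_data(<ml_cfg>)
# Test and validation data are fetched after the training data has been retrieved
test_gen = dl.get_test_data()
valid_gen = dl.get_validation_data()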
The config file can be in either JSON or YAML format. Fields are optional unless otherwise stated.
- data_dir: directory where the data is located; the path can be absolute or relative to the directory containing task.py
- batch_size: number of records per batch
- epochs: number of epochs to run through during training
- train_size: fraction of the data used for training, as a decimal ratio
- test_size: fraction of the data used for testing, as a decimal ratio
- valid_size: fraction of the data used for validation, as a decimal ratio
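
As a rough illustration of the fields above, the sketch below builds a config with made-up values, reads it back from a JSON file, and resolves data_dir against the directory of task.py. Only the field names come from this README; the values, the helper names (load_config, check_splits, resolve_data_dir), and the rule that the three ratios should sum to at most 1 are assumptions for the example, not documented behavior of the etl package. A YAML config would need a YAML parser instead of the json module.

```python
import json
from pathlib import Path

# Hypothetical example values; only the field names come from the README.
example_cfg = {
    "data_dir": "data",
    "batch_size": 32,
    "epochs": 10,
    "train_size": 0.7,
    "test_size": 0.2,
    "valid_size": 0.1,
}

SPLIT_KEYS = ("train_size", "test_size", "valid_size")


def load_config(path):
    """Read a JSON config file and return its contents as a dict."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def check_splits(cfg, tol=1e-9):
    """Return True if every split ratio present lies in [0, 1] and they sum to at most 1.

    This rule is an assumption made for the example; the README does not
    say how the library validates the ratios.
    """
    ratios = [cfg[key] for key in SPLIT_KEYS if key in cfg]
    return all(0.0 <= r <= 1.0 for r in ratios) and sum(ratios) <= 1.0 + tol


def resolve_data_dir(cfg, task_dir):
    """Resolve data_dir against the directory containing task.py.

    Joining with pathlib leaves an absolute data_dir unchanged and
    treats a relative one as relative to task_dir. Returns None when
    the optional field is absent.
    """
    if "data_dir" not in cfg:
        return None
    return Path(task_dir) / cfg["data_dir"]
```

To try it, write example_cfg to a file with json.dump, pass the file path to load_config, and run check_splits on the result before handing the config to the DataLoader.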