- template: includes all the Python module files that you adapt to your data in order to build the ML trainer.
- examples: includes two examples, classification and regression, both on synthetic data. The examples show how the template is adapted to a given dataset. In addition, each example includes a Python script to perform prediction (inference) by invoking a deployed model's API (a sketch of such a call follows this list).
- scripts: includes scripts to 1) train the model locally, 2) train the model on Cloud ML Engine, and 3) deploy the model on GCP and make predictions (inference) using the deployed model.
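For reference, prediction against a deployed model typically goes through the Cloud ML Engine online prediction API. The sketch below is not the example scripts themselves; the project, model, and feature names are placeholders. It shows such a call using the google-api-python-client library, assuming application-default credentials are configured:

```python
from googleapiclient import discovery

# Placeholders -- substitute your own GCP project and deployed model name.
PROJECT = 'my-gcp-project'
MODEL_NAME = 'my_model'


def predict(instances, version=None):
    """Sends instances to a deployed Cloud ML Engine model and returns predictions."""
    service = discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(PROJECT, MODEL_NAME)
    if version is not None:
        name += '/versions/{}'.format(version)

    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()

    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']


# Each instance must match what the serving function expects,
# e.g. a dict of feature values for JSON instances.
print(predict([{'age': 35, 'occupation': 'engineer'}]))
```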
File Name | Purpose | Do You Need to Change It? |
---|---|---|
metadata.py | Defines: 1) Task type, 2) input data header, 3) numeric and categorical feature names, 4) target feature name, and 5) unused feature names | Yes, as you will need to specify the metadata of your dataset (see the illustrative sketch after this table) |
featurizer.py | 1) Creates TensorFlow feature_column definitions based on the metadata of the features, and 2) creates the deep and wide feature column lists. | Maybe, if you want to change how deep and wide columns are defined (see next section).
input.py | Generates a (scalable) data input function for training or evaluation from sharded files, using a filename queue, so that the entire dataset is not loaded into memory. | Probably No, unless you want to implement a data input from a different source.
parsers.py | Includes functions to parse data from text files into tensors with the proper data types (based on the default values in the metadata). | Probably No, unless you want to parse data files in different formats (e.g. xml, json, etc.). |
preprocess.py | Used to 1) define additional feature columns, such as bucketized_column and crossed_column, and 2) implement custom feature engineering logic, e.g. polynomial expansion. | Probably Yes, in order to implement your own feature engineering logic, unless your input data already includes the engineered features.
model.py | Includes functions to create DNNLinearCombinedRegressor and DNNLinearCombinedClassifier, based on the hyper-parameters in the parameters.py module. | Probably No, unless you want to change something in the estimator, e.g., activation functions, optimizers, etc. |
experiment.py | Defines the evaluation metric and creates the experiment function. | Probably No, unless you want to change the evaluation metric.
serving.py | Includes serving functions that accept CSV, JSON, and TF Example instances. | No
parameters.py | Includes the function to parse and initialize the arguments, as well as to maintain the hyper-parameters (hparam object). | Probably No, unless you want to change/add parameters (e.g. for feature engineering).
task.py | Entry point to the trainer, as it includes the main function that runs the experiment. | No |
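As a rough illustration of the kind of metadata you specify, here is a hedged sketch of what metadata.py entries could look like for a small classification dataset. Variable names and features below are placeholders; consult the examples for the template's actual definitions.

```python
# Illustrative sketch only -- the actual variable names in metadata.py may differ.

TASK_TYPE = 'classification'  # or 'regression'

# Input data header: all columns in the CSV files, in order, with default values.
HEADER = ['key', 'age', 'income', 'occupation', 'label']
HEADER_DEFAULTS = [['NA'], [0.0], [0.0], [''], ['']]

# Numeric and categorical feature names.
NUMERIC_FEATURE_NAMES = ['age', 'income']
CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY = {
    'occupation': ['engineer', 'teacher', 'doctor'],
}

# Target feature name (and labels, for classification).
TARGET_NAME = 'label'
TARGET_LABELS = ['yes', 'no']

# Features present in the data but excluded from the model.
UNUSED_FEATURE_NAMES = ['key']
```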
- numeric_columns → dense_columns (int and float features)
- categorical_columns_with_vocabulary_list & bucketized_columns → categorical_columns (low-cardinality categorical features)
- categorical_columns_with_hash_buckets & crossed_columns → sparse_columns (high-cardinality categorical features)
- categorical_columns → indicator_columns (one-hot encoding)
- sparse_columns → embedding_columns (dimensionality reduction w.r.t. embedding_size)
- deep_columns = dense_columns + indicator_columns + embedding_columns
- wide_columns = categorical_columns + sparse_columns
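Putting the mapping above together, the following is a minimal sketch of how such column lists can be assembled with tf.feature_column. Feature names, vocabularies, boundaries, and sizes are placeholders, not the template's actual code:

```python
import tensorflow as tf

embedding_size = 8  # placeholder value

# numeric_columns -> dense_columns
dense_columns = [tf.feature_column.numeric_column('age'),
                 tf.feature_column.numeric_column('income')]

# vocabulary-list and bucketized columns -> categorical_columns (low cardinality)
categorical_columns = [
    tf.feature_column.categorical_column_with_vocabulary_list(
        'occupation', vocabulary_list=['engineer', 'teacher', 'doctor']),
    tf.feature_column.bucketized_column(
        tf.feature_column.numeric_column('age'), boundaries=[18, 35, 65]),
]

# hash-bucket and crossed columns -> sparse_columns (high cardinality)
sparse_columns = [
    tf.feature_column.categorical_column_with_hash_bucket(
        'zip_code', hash_bucket_size=10000),
    tf.feature_column.crossed_column(
        ['occupation', 'zip_code'], hash_bucket_size=10000),
]

# One-hot encode the low-cardinality columns; embed the high-cardinality ones.
indicator_columns = [tf.feature_column.indicator_column(c) for c in categorical_columns]
embedding_columns = [tf.feature_column.embedding_column(c, dimension=embedding_size)
                     for c in sparse_columns]

deep_columns = dense_columns + indicator_columns + embedding_columns
wide_columns = categorical_columns + sparse_columns
```

In model.py, deep_columns would then typically be passed as dnn_feature_columns and wide_columns as linear_feature_columns to the DNNLinearCombined estimators.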