EPIC: Refactor DeepFeatureSynthesis._build_features
#2106
Labels
enhancement
Improvement to an existing feature
refactor
Work being done to refactor code.
tech debt
additional rework caused by choosing an easy solution now
`DeepFeatureSynthesis._build_features` is in need of a refactor to improve speed, maintainability, and scalability. There are many optimizations that can be made underneath this function to improve performance while maintaining the API signature. As a rough benchmark, the `get_valid_primitives` function takes 2 hours to run on the retail entityset to produce a little over 5 million feature definitions. This can be optimized to take a much shorter time.
Functions should be more granular and testable:
For example, one of the most granular functions should take two arguments: a data structure that is a hashmap of features keyed by their ColumnSchema, and an input set (e.g. Numeric, Boolean). It should return a list of lists of all feature combinations that match this input type signature. This function should be pure, which would improve maintainability by making it very readable and testable.
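A minimal sketch of what such a pure function might look like. The name `match_input_types` is hypothetical, plain strings stand in for ColumnSchema keys and for features, and skipping combinations that reuse the same feature twice is an assumption rather than existing behavior:

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence


def match_input_types(
    features_by_schema: Dict[Hashable, List[str]],
    input_types: Sequence[Hashable],
) -> List[List[str]]:
    """Return every ordered feature combination matching input_types.

    Pure: the result depends only on the arguments, and neither
    argument is mutated.
    """
    # One pool of candidate features per position in the signature.
    pools = [features_by_schema.get(t, []) for t in input_types]
    combos = []
    for combo in product(*pools):
        # Assumption: a feature may not fill two slots of one signature.
        if len(set(combo)) == len(combo):
            combos.append(list(combo))
    return combos


features = {"Numeric": ["amount", "quantity"], "Boolean": ["is_gift"]}
pairs = match_input_types(features, ("Numeric", "Boolean"))
```

Because the function has no side effects, it can be unit tested with small hand-written dictionaries and no entityset.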
Optimizations:
Caching
Using the example above, this function could be wrapped with an LRU cache decorator, so that primitives whose input signatures match those of other primitives return immediately. Memory issues should be of little concern, since these calculations can be performed using data structures containing logical types only and no data, but this should be measured and tested.
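One way the caching idea could be sketched with `functools.lru_cache`. Since `lru_cache` requires hashable arguments, this sketch closes over the feature mapping and caches only on the input signature, which must be passed as a tuple. `make_cached_matcher` is a hypothetical name, and strings again stand in for logical types and features:

```python
from functools import lru_cache
from itertools import product
from typing import Dict, Hashable, List


def make_cached_matcher(features_by_schema: Dict[Hashable, List[str]], maxsize=None):
    """Build a matcher whose results are memoized per input signature."""
    # Freeze the lists so cached results cannot be changed by later edits.
    frozen = {key: tuple(names) for key, names in features_by_schema.items()}

    @lru_cache(maxsize=maxsize)
    def match(input_types: tuple) -> tuple:
        pools = [frozen.get(t, ()) for t in input_types]
        # Tuples keep the cached value immutable.
        return tuple(product(*pools))

    return match


matcher = make_cached_matcher({"Numeric": ["amount", "quantity"], "Boolean": ["is_gift"]})
first = matcher(("Numeric", "Boolean"))   # computed
second = matcher(("Numeric", "Boolean"))  # served from the cache
stats = matcher.cache_info()              # hits, misses, currsize
```

`cache_info()` gives a cheap starting point for the measurement mentioned above, since it reports how many signatures are currently held and how often they are reused.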
Data Structures
Features and primitives should be hashed by their associated logical types for faster lookup.
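A rough sketch of such an index, built once so that each lookup is a single dictionary access. `SchemaKey` is a hypothetical hashable stand-in for a ColumnSchema (logical type plus semantic tags), not the real class, and features are represented by name only:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class SchemaKey:
    """Hypothetical hashable stand-in for a ColumnSchema."""
    logical_type: str
    semantic_tags: FrozenSet[str] = frozenset()


def index_by_schema(
    features: Iterable[Tuple[str, SchemaKey]],
) -> Dict[SchemaKey, List[str]]:
    """Group feature names under the schema key they share."""
    index = defaultdict(list)
    for name, key in features:
        index[key].append(name)
    return dict(index)


numeric = SchemaKey("Double", frozenset({"numeric"}))
boolean = SchemaKey("Boolean")
index = index_by_schema([
    ("amount", numeric),
    ("is_gift", boolean),
    ("unit_price", numeric),
])
numeric_features = index[numeric]  # constant-time lookup by schema
```

A frozen dataclass gets `__hash__` and `__eq__` derived from its fields, so two keys with the same logical type and tags land in the same bucket.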