[Offline Pipeline] Static sources / one-time execution #14

sarahwooders · 2021-12-01T21:49:40Z

The offline pipeline supports offline feature development for experimentation and model training. It reads from a static data source and writes the resulting features to either a CSV file or an external database connection.

An offline pipeline should transfer easily to the online setting by switching the source type to a dynamic one (e.g. a Kafka stream). An offline pipeline completes execution once all of the data has been read and processed.
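
To make the intended behavior concrete, here is a minimal pure-Python sketch. It does not use Ralf's API; `static_source`, `dynamic_source`, `run_pipeline`, and `double_click` are hypothetical names chosen for illustration. It only shows the idea that the processing code stays the same while the source changes, and that a run over a static source ends once the data is exhausted.

```python
import itertools
from typing import Callable, Iterable, Iterator, List


def static_source(records: List[dict]) -> Iterator[dict]:
    """Hypothetical static source: yields a fixed set of records, then stops."""
    yield from records


def dynamic_source() -> Iterator[dict]:
    """Hypothetical dynamic source standing in for a stream: it never ends."""
    for i in itertools.count():
        yield {"user": f"u{i % 2}", "click": i}


def run_pipeline(source: Iterable[dict], op: Callable[[dict], dict]) -> Iterator[dict]:
    """Apply op to every record; finishes when the source is exhausted."""
    for record in source:
        yield op(record)


def double_click(record: dict) -> dict:
    """Placeholder operator: doubles the click value."""
    return {**record, "click": record["click"] * 2}


# Offline: the static source is finite, so the run completes on its own.
offline_out = list(
    run_pipeline(
        static_source([{"user": "a", "click": 1}, {"user": "b", "click": 2}]),
        double_click,
    )
)

# Online: same operator, only the source differs. The stream is unbounded,
# so this sketch takes a bounded slice instead of running forever.
online_out = list(itertools.islice(run_pipeline(dynamic_source(), double_click), 3))
```

Only the argument passed as `source` changes between the two runs, which is the property the offline-to-online transfer relies on.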

Example

# define connectors 
source_conn = TableConnector(hive_connection, historical=True)
embedding_conn = TableConnector(redis_connection, historical=False) 

# offline pipeline requires that all sources are static
pipeline = Ralf(offline=True) 
source = pipeline.csv_source(filename="data.csv", static=True, connector=source_conn)
user_clicks = source.groupby(key="user") 
embedding = user_clicks.map(MyOperator, args=(...), connector=embedding_conn)
pipeline.run()
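
As a rough illustration of what a one-time run over a static CSV source could compute, the sketch below uses only the Python standard library. It is not the proposed Ralf API: `RAW`, `run_offline`, and the `num_clicks` feature are assumptions made up for this example, with a simple per-user count standing in for `MyOperator`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input standing in for the contents of data.csv.
RAW = "user,item\nalice,1\nbob,2\nalice,3\n"


def run_offline(raw_csv: str) -> str:
    """Read a static CSV, group rows by user, and emit one feature row per user."""
    rows = csv.DictReader(io.StringIO(raw_csv))

    # Group by key "user".
    clicks = defaultdict(list)
    for row in rows:
        clicks[row["user"]].append(int(row["item"]))

    # Map each group to a feature (click count, a placeholder for MyOperator)
    # and write the result as CSV text.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user", "num_clicks"])
    for user in sorted(clicks):
        writer.writerow([user, len(clicks[user])])
    return out.getvalue()


# The function returns once every input row has been processed.
features_csv = run_offline(RAW)
```

Because the input is finite, the call returns as soon as the last row has been grouped and written, which mirrors the one-time execution described in this issue.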
