
Serialization of pre-processing pipeline for CI/CD #1713

Closed
jhagege opened this issue Apr 20, 2020 · 4 comments

jhagege commented Apr 20, 2020

Hi, thanks for the great library.
I noticed in your examples you serialize the preprocessing pipeline.
Does it assume that the pip dependencies of the preprocessing classes need to be pinned to the exact same versions?
I'm trying to work out how to package the inference workflow inside a single Dockerfile as part of a CI/CD pipeline.
How can I guarantee that I have a self-contained Docker image with the exact correct dependencies and the serialized pre-processing pipeline?

Thanks for any insights.
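
(A minimal sketch of the kind of self-contained image described above, assuming the preprocessing pipeline is serialized with joblib and that the exact dependency versions were captured at training time with pip freeze; all file names are illustrative.)

```dockerfile
# Illustrative sketch: a self-contained inference image.
# requirements.txt is the `pip freeze` output from the training run, so the
# image installs the exact versions the serialized pipeline was built with.
FROM python:3.7-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Artifacts produced by the training run.
COPY preprocessor.joblib model.joblib ./
COPY Model.py .

CMD ["python", "Model.py"]
```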

@ukclivecox (Contributor)

Hi @jhagege
Can you give some more details of the examples you are referring to and what you mean by "pre-processing pipeline"?

jhagege (Author) commented Apr 20, 2020

@cliveseldon, thanks for your quick answer.

I was referring to the screenshot from the outlier_combiner example.

I find the pattern elegant and I'm wondering how to take it one step further.
Each model created is defined by three "artifacts":

  • Source code (commit id)
  • Preprocessor pipeline (transforms)
  • Environment (Python packages, for example in a requirements.txt with pinned dependency versions).

I'd like to configure a CI pipeline to package all of those into some kind of "uber-artifact" per trained model, so that it can provide an integrated environment for inference.

Thanks for any insights.
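
(One possible shape for that kind of CI step, sketched under the assumption that the preprocessor and model are scikit-learn objects serialized with joblib and that git and pip are available in the CI environment; the emit_artifact helper and file names are hypothetical.)

```python
# Hypothetical CI helper: bundle the three artifacts listed above into one
# directory that can later be copied into a Docker image.
import json
import subprocess
from pathlib import Path

import joblib


def emit_artifact(preprocessor, model, out_dir="artifact"):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # 1. Source code: record the commit id of the training code.
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    (out / "build_info.json").write_text(json.dumps({"commit": commit}))

    # 2. Preprocessor pipeline and trained model, serialized together.
    joblib.dump(preprocessor, out / "preprocessor.joblib")
    joblib.dump(model, out / "model.joblib")

    # 3. Environment: freeze the exact package versions used for training.
    freeze = subprocess.check_output(["pip", "freeze"], text=True)
    (out / "requirements.txt").write_text(freeze)
```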

@RafalSkolasinski (Contributor)

We are not really concentrating on training.

It seems that the best approach is to have a solid, reproducible process for preparing artefacts / trained models (Kubeflow, DVC, Pachyderm, ...) and then package these into a Docker image that you can deploy with Seldon.
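
(For the packaging step, one way to wire the serialized preprocessor into such an image is a Seldon Core Python component that loads it at start-up; a minimal sketch, assuming a joblib-serialized scikit-learn transformer baked into the image, with an illustrative file name.)

```python
# Transformer.py - Seldon Core Python component applying the serialized
# preprocessing pipeline to incoming requests.
import joblib


class Transformer:
    def __init__(self):
        # preprocessor.joblib is copied into the image at build time.
        self.pipeline = joblib.load("preprocessor.joblib")

    def transform_input(self, X, features_names=None):
        # Apply the same transforms that were fitted during training.
        return self.pipeline.transform(X)
```

The class is then wrapped into a Docker image with Seldon's s2i images or Python wrapper and referenced from a SeldonDeployment.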

Check out our recent addition of model metadata, https://docs.seldon.io/projects/seldon-core/en/latest/reference/apis/metadata.html, which allows you to link a deployed model back to its training source.
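
(As a sketch of what that metadata can carry, assuming the init_metadata hook described in those docs; all field values below are illustrative and would be injected by the CI job that built the image.)

```python
class Model:
    def predict(self, X, features_names=None):
        # Real inference logic would go here; identity is a placeholder.
        return X

    def init_metadata(self):
        # Illustrative values linking the deployed model back to its training run.
        return {
            "name": "my-model",
            "versions": ["git-abc1234"],
            "platform": "seldon",
            "custom": {"training_commit": "abc1234", "dataset": "train-2020-04"},
        }
```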

jhagege (Author) commented Jul 16, 2020

Thanks much, I'll review.
