Example using Seldon for text classification with SpaCy tokenizer #578

Merged: 1 commit merged into SeldonIO:master on May 21, 2019

axsaucedo
Contributor

Overview

This PR adds an example of a Python logistic regression model that automates the moderation of Reddit comments. It is intended to show how Seldon can be used for text-processing use cases, here specifically text classification. It also shows how to take advantage of the SpaCy tokenizer, a popular tool used in production in many NLP projects.

Notebook

The notebook can be previewed here: https://github.com/axsauze/seldon-core/blob/sklearn_spacy_text_example/examples/models/sklearn_spacy_text/sklearn_spacy_text_classifier_example.ipynb

Contents

  • Download reddit comment dataset
  • Text classification model with:
    • Text cleaner pre-processor
    • SpaCy tokenizer
    • TF-IDF vectorizer
    • Logistic Regression model
  • Docker example
  • Kubernetes example
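
For readers who want a feel for the first stages of the pipeline listed above before opening the notebook, here is a minimal standard-library sketch. It is not the code from this PR: the function names `clean_text`, `tokenize`, and `tfidf` are hypothetical, the regex tokenizer is only a stand-in for the SpaCy tokenizer, the TF-IDF weighting is the plain textbook form rather than any particular library's variant, and the logistic regression step is left out.

```python
import html
import math
import re
from collections import Counter


def clean_text(text):
    """Text cleaner: unescape HTML entities, drop URLs, lowercase, collapse whitespace."""
    text = html.unescape(text)
    text = re.sub(r"https?://\S+", " ", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Regex stand-in for the SpaCy tokenizer (assumption, not the PR's tokenizer)."""
    return re.findall(r"[a-z0-9']+", text)


def tfidf(docs_tokens):
    """Textbook TF-IDF: (count / doc length) * log(n_docs / document frequency)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical comments, invented for illustration only.
comments = [
    "Check   THIS &amp; that: https://example.com/x now",
    "Check this out",
]
tokens = [tokenize(clean_text(c)) for c in comments]
weights = tfidf(tokens)
```

In this sketch a term that appears in every document gets weight zero, which is the property that lets a downstream classifier focus on the more distinctive words of each comment.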

@jklaise jklaise self-requested a review May 21, 2019 09:12
Contributor

@jklaise jklaise left a comment

Looks good to me!

@axsaucedo axsaucedo merged commit c62dbfa into SeldonIO:master May 21, 2019
3 participants