- What question(s) am I trying to answer? Do I think the data collected can answer that question?
- What is the best way to phrase my question(s) as a machine learning problem?
- Have I collected enough data to represent the problem I want to solve?
- What features of the data did I extract, and will these enable the right predictions?
- How will I measure success in my application?
- How will the machine learning solution interact with other parts of my research or business product?
- Is the question sharp, i.e. specific enough to be answered with a name or a number?
- Does the data measure what you care about?
- Is the data accurate?
- Is the data connected?
- Is there enough data?
- Is this A or B? => Classification algorithms (two-class or multiclass classification)
- Is this weird or not normal? => Anomaly detection algorithms
- How much or how many? => Regression algorithms (continuous data)
- How is this organized? => Clustering algorithms (e.g. which viewers like the same types of movies?)
- What should I do now? => Reinforcement learning algorithms (a decision process with a reward system, e.g. autonomous driving)
- Define how the rest of the system interacts with the data science models: REST, RPC calls, RabbitMQ, or a Kafka message bus?
- Plan to use notebooks (Jupyter, Azure Notebooks, Google Colab)
- Plan model retraining (schedule the pipeline with cron or Airflow)
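
Part of the question-to-algorithm mapping above can be checked against the data once a target column exists. The following is a minimal sketch, assuming the dataset has at most one target column; `suggest_problem_type` and its `max_classes` cut-off are hypothetical names chosen for illustration, not part of any library. Reinforcement learning is left out because it is framed by actions and rewards rather than by a labelled column.

```python
from typing import Optional, Sequence, Union

# A target value: a class name, a flag, or a measured quantity.
Label = Union[str, bool, int, float]


def suggest_problem_type(
    labels: Optional[Sequence[Label]], max_classes: int = 20
) -> str:
    """Suggest an algorithm family from the target column of a dataset.

    Hypothetical helper: `max_classes` is an arbitrary cut-off above which
    integer-valued targets are treated as continuous rather than as classes.
    """
    if not labels:
        # No target at all: ask "how is this organized?" instead.
        return "clustering"

    distinct = set(labels)
    # bool is a subclass of int in Python, so exclude it from "numeric".
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in labels
    )
    has_fraction = any(isinstance(v, float) and not v.is_integer() for v in labels)

    if numeric and (has_fraction or len(distinct) > max_classes):
        # "How much or how many?": a continuous quantity.
        return "regression"
    if len(distinct) == 1:
        # Only one kind of example (e.g. only "normal" records) was collected:
        # model what normal looks like and flag what deviates from it.
        return "anomaly detection"
    if len(distinct) == 2:
        # "Is this A or B?"
        return "two-class classification"
    return "multiclass classification"


if __name__ == "__main__":
    print(suggest_problem_type(["spam", "ham", "spam"]))  # two-class classification
    print(suggest_problem_type([12.5, 40.0, 7.25]))  # regression
    print(suggest_problem_type(None))  # clustering
```

The heuristic only looks at the shape of the target, so it cannot tell whether the question itself is sharp or whether the data is accurate; those checks stay manual.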