
dejanu/DataSets


Questions as a starting point:

  1. What question(s) am I trying to answer? Do I think the data collected can answer that question?

  2. What is the best way to phrase my question(s) as a machine learning problem?

  3. Have I collected enough data to represent the problem I want to solve?

  4. What features of the data did I extract, and will these enable the right predictions?

  5. How will I measure success in my application?

  6. How will the machine learning solution interact with other parts of my research or business product?


Questions for Data

  1. Is the question sharp?
  2. Does the data measure what you care about?
  3. Is the data accurate?
  4. Is the data connected?
  5. Is there a lot of data?
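
A quick way to get a first answer to questions 4 and 5 is to count rows and missing values before any modeling. The sketch below is only an illustration: it reads "connected" as "few missing values", which is an assumption, and the function name `missing_report`, the field names, and the sample rows are all hypothetical.

```python
def missing_report(rows, required):
    """Count how many rows lack a usable value for each required field."""
    counts = {field: 0 for field in required}
    for row in rows:
        for field in required:
            # Treat an absent key, None, or an empty string as missing.
            if row.get(field) in (None, ""):
                counts[field] += 1
    return counts


# Hypothetical sample data, for illustration only.
rows = [
    {"age": 34, "city": "Cluj"},
    {"age": None, "city": "Iasi"},
    {"age": 51},
]

report = missing_report(rows, ["age", "city"])
print("rows:", len(rows))
print("missing per field:", report)
```

Whether the resulting counts are acceptable depends on the question being asked; the sketch only makes the gaps visible.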

Questions for Algorithms

https://docs.microsoft.com/en-us/azure/machine-learning/studio/data-science-for-beginners-the-5-questions-data-science-answers

  1. Is this A or B? => Classification algorithms (two-class or multiclass classification)
  2. Is this weird or not normal? => Anomaly detection algorithms
  3. How much or how many? => Regression algorithms (continuous data)
  4. How is this organized? => Clustering algorithms (e.g. which viewers like the same types of movies?)
  5. What should I do now? => Reinforcement learning algorithms (decision process and reward system, e.g. autonomous driving)
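
To make two of these question types concrete, here is a minimal sketch using only the Python standard library. It uses a z-score rule for "Is this weird?" and ordinary least squares on one variable for "How much?". These are deliberately simple stand-ins chosen for illustration, not the algorithms the linked article prescribes; the function names, the threshold of 2.0, and the sample data are assumptions.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: return values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = find_anomalies(readings)
print("anomalies:", anomalies)

# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("slope:", slope, "intercept:", intercept)
```

In practice a library such as scikit-learn would cover all five families; the point here is only how the question determines the kind of output (a flag versus a number).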

Model interaction

  1. Define how other systems interact with the data science models: REST? RPC calls? RabbitMQ? Kafka message bus?
  2. Plan to use notebooks (Jupyter, Azure Notebooks, Google Colab)
  3. Plan model retraining (cron for the pipeline, or Airflow)
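
As one possible shape for the REST option, the sketch below exposes a model behind a single `POST /predict` endpoint using only the Python standard library. Everything specific in it is an assumption made for illustration: the endpoint path, the `{"features": [...]}` payload, the port, and the `predict` function, which is a placeholder that averages its inputs rather than a real trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: the mean of the features."""
    return sum(features) / len(features)


def handle_predict(raw_body):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        payload = json.loads(raw_body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("features must not be empty")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing key, or non-numeric values.
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps({"prediction": predict(features)}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Keeping `handle_predict` separate from the HTTP handler means the same function could later sit behind a message-queue consumer instead, if RabbitMQ or Kafka turns out to be the better fit.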

