My detailed experience of taking Amazon's Machine Learning Specialty exam

simonguenther/AWS_Machine_Learning_Specialty_MLS-C01

AWS Machine Learning Specialty Certification (MLS-C01)

Passed the MLS-C01 certification in July 2020 with 941/1000. There is not much detailed information about this exam compared to other popular AWS certifications, so I want to share as much detail as possible to give anyone looking into this certification a better idea of what to expect.

There is no gold standard for preparing for MLS-C01. Compared to other popular certifications, there are very few resources aside from Amazon's official papers and videos (which are very good!).

I used free trials on LinuxAcademy and ACloudGuru to go through their MLS-C01 preparation courses. I'd estimate the two courses overlap on at least 80% of their topics. The most important takeaway from both: neither of them will get you the certification on its own. Both give a good overview of the topics, but they only scratch the surface, and you will need to go much deeper into almost every topic to have a shot at passing. Don't be fooled by the practice exams at the end! Their questions are not as detailed as the real ones.

What I already knew:

To put everything I say about the exam into perspective, here is what I already knew before going after this certification:

  • I have been dabbling in Data Engineering/Data Science for a couple of years as an autodidact. At the beginning of the year I took part in a 3-month full-time Data Science bootcamp at neuefische.de, which consolidated and broadened my knowledge.
  • I already had the AWS Certified Solutions Architect Associate (SAA-C02) under my belt. This helps a lot when eliminating wrong answers in questions about AWS services.
  • 15+ years of experience with various programming languages, database systems, development patterns, and IT best practices.

What I used as preparation for the exam:

  • LinuxAcademy MLS course (good overview)
  • ACloudGuru MLS course (good overview)
  • AWS SageMaker Documentation (that's where I spent most of my time)
  • SageMaker Deep Dive YouTube playlist (very good! shows a lot of the dry material from the docs in action)
  • A bunch of SageMaker presentations from re:Invent (mainly 3xx/4xx level)
  • AWS Machine Learning Blog (IMO most of the scenario-based questions in the exam are based on case studies from the blog)
  • Book: Hands-On Machine Learning by Géron (a very good ML book overall; mainly used to refresh my memory in some areas)

What I used to test my knowledge:

  • Udemy Practice Exam by Abhishek Singh (very, very good questions which come pretty close to the style of the real exam questions)
  • Udemy Practice Exam by Frank Kane (IMO a lot weaker than Singh's, since the questions are not as close to the real exam, but still worth the money for finding blind spots)
  • AWS MLS practice exam ($40 for 20 exam questions; I would spend the money on it. I had a voucher from passing SAA-C02.)

What I was tested on during the exam:

  • no questions about hyperparameters, input types, or parallelization of built-in algorithms
  • LOTS of questions regarding preprocessing of datasets
    • dropping/imputation, oversampling
    • dealing with skewed datasets (log transform, binning, etc.)
    • what to do with correlated/dependent features in linear regression
    • how to scale and split a dataset correctly (split first, then fit the scaler on the training set and apply it to test/validation, vs scaling everything and splitting afterwards, etc.)
    • mitigating high/low correlation in datasets with lots of raw features
    • what to look for in features (high correlation vs low correlation, etc.)
  • lots of questions about dealing with over- and underfitting in general and specifically in neural nets
    • dropout, early stopping, decrease number of hidden layers,... in all variations and scenarios
    • regularization (L1 vs L2)
  • evaluation metrics
    • trick question with switched positive/negative observations, so you have to adjust for that
    • business implications of misclassification (whether FNs have more or less impact on business cost, etc.)
    • calculating accuracy and precision
    • interpreting a 3x3 confusion matrix
  • visualization
    • best visualization types for various situations
    • visualization for correlation of features (scatter plots)
  • custom algorithms
    • Docker containers (which services are used: ECR? ECS? both? S3?)
    • process of deploying an algorithm in a custom Docker container
    • Docker-related questions about entrypoints and paths (/opt/ml, ...)
    • transfer learning
  • hyperparameter optimization
    • XGBoost init statement: which hyperparameter to tune when the model overfits
    • neural net: learning rate/batch size tuning
  • scaling/load balancing
    • endpoint configuration: calculate the SageMakerVariantInvocationsPerInstance target from the given numbers
    • TensorFlow scaling with Horovod
    • multiple questions about IoT devices (managing endpoints vs using Neo)
  • algorithm choices
    • business scenarios, which algo to use
      • regression scenario
      • recommendation scenario
      • binary classification
    • anomaly detection scenario - which algorithm to use
  • chaining of AWS Services (most of them regarding ETL)
    • scenarios where you should chain services/algorithms into a solution (Transcribe, Translate, ...)
    • classic ETL questions: Glue vs Data Pipeline vs Kinesis (in combination with Lambda, Elasticsearch,...)
    • EMR related questions (PySpark integrated solutions, "EMR legacy solution" inclusion, ...)
  • SageMaker Security
    • company has certain standards regarding tags and instance types: how can this be enforced? (AWS Service Catalog vs Python script vs CloudFormation script vs ...) generic question
    • optimized file types for Athena
    • Normal vs Poisson distribution
    • Bayesian network/Naive Bayes/Pearson coefficient
    • Classification scenario: which algorithm to use? (classic SVM RBF kernel plot; probably all you need to know about SVM)
    • Question regarding the activation function of a NN in a certain scenario (Softmax vs ReLU vs ...)
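For the skewed-dataset questions, it helps to see why a log transform is the usual first answer. This is a minimal stdlib-only sketch; the feature values are made up for illustration, and `skewness` is a hand-rolled helper (population Fisher-Pearson skewness), not a library function. `math.log1p` (log(1 + x)) is used so that zeros stay valid, which assumes a non-negative feature.

```python
import math
import statistics


def skewness(values):
    """Population (Fisher-Pearson) skewness: mean of cubed deviations / std**3."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return statistics.fmean([(v - mean) ** 3 for v in values]) / std ** 3


# Hypothetical right-skewed feature (e.g. income-like values with a long tail).
incomes = [10, 20, 30, 40, 50, 60, 80, 100, 150, 300, 1000, 5000]

# log1p compresses the long right tail, pulling the distribution toward symmetry.
logged = [math.log1p(v) for v in incomes]

print(f"skew before: {skewness(incomes):.2f}, after log1p: {skewness(logged):.2f}")
```

Binning is the other common option: it replaces the raw value with a bucket index, which also tames outliers but throws away resolution inside each bucket.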
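The "split then scale" questions come down to not leaking test-set statistics into training. A minimal stdlib-only sketch of the correct order follows; `train_test_split` and `ZScoreScaler` here are hypothetical stand-ins that only mimic the fit/transform pattern of scikit-learn, not its actual API.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class ZScoreScaler:
    """Minimal z-score scaler for a single numeric feature."""

    def fit(self, values):
        # Mean and std are learned from the training split only.
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


feature = [float(x) for x in range(100)]

# Correct order: split first, fit on train, then reuse the SAME fitted scaler on test.
train, test = train_test_split(feature)
scaler = ZScoreScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
```

Scaling everything before splitting would compute the mean and std over the test rows too, so the evaluation would no longer reflect truly unseen data.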
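For the L1 vs L2 regularization questions, the difference is easiest to see in the penalized objectives. In the notation below (chosen here for illustration), $\mathcal{L}$ is the unregularized loss, $w_1, \dots, w_p$ are the model weights, and $\lambda \ge 0$ controls the regularization strength:

```latex
J_{\text{L1}}(\mathbf{w}) = \mathcal{L}(\mathbf{w}) + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
\qquad
J_{\text{L2}}(\mathbf{w}) = \mathcal{L}(\mathbf{w}) + \lambda \sum_{j=1}^{p} w_j^{2}
```

The L1 (lasso) penalty tends to push some weights to exactly zero, so it doubles as feature selection; the L2 (ridge) penalty shrinks all weights smoothly toward zero without eliminating them. In both cases a larger $\lambda$ means a stronger push against overfitting.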
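For the evaluation-metric questions, here is a small sketch for reading a 3x3 confusion matrix. The class names and counts are invented, and the "rows = actual, columns = predicted" orientation is an assumption; exam questions (and libraries) do not always use the same layout, so check it every time.

```python
# Rows = actual class, columns = predicted class (assumed orientation).
CLASSES = ["cat", "dog", "bird"]
CONFUSION = [
    [50, 3, 2],   # actual cat
    [4, 40, 6],   # actual dog
    [1, 5, 39],   # actual bird
]


def accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def precision(cm, k):
    """TP / (TP + FP): column k holds everything predicted as class k."""
    return cm[k][k] / sum(row[k] for row in cm)


def recall(cm, k):
    """TP / (TP + FN): row k holds everything that actually is class k."""
    return cm[k][k] / sum(cm[k])


for k, name in enumerate(CLASSES):
    print(f"{name}: precision={precision(CONFUSION, k):.3f} recall={recall(CONFUSION, k):.3f}")
print(f"accuracy={accuracy(CONFUSION):.3f}")  # 129 / 150 = 0.86
```

The "switched positive/negative" trick maps directly onto this: precision and recall are always relative to whichever class `k` is treated as positive, so if the question relabels the classes, recompute them for the other index instead of reusing the first answer.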
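For the endpoint scaling calculation: as I understand the SageMaker auto scaling documentation, the target for the `SageMakerVariantInvocationsPerInstance` metric is derived from a load test as `MAX_RPS * SAFETY_FACTOR * 60` (the metric counts invocations per minute), with 0.5 suggested as a starting safety factor. The sketch below encodes that formula; `instances_needed` and the example numbers are hypothetical additions for exam-style arithmetic, not part of any AWS API.

```python
import math


def invocations_per_instance(max_rps: float, safety_factor: float = 0.5) -> float:
    """Target value for SageMakerVariantInvocationsPerInstance (per minute).

    max_rps: peak requests/second a single instance sustained in a load test.
    safety_factor: headroom below that peak (0.5 as a conservative default).
    """
    if max_rps <= 0 or not 0 < safety_factor <= 1:
        raise ValueError("max_rps must be > 0 and safety_factor in (0, 1]")
    # The metric is counted per minute, hence the factor of 60.
    return max_rps * safety_factor * 60


def instances_needed(expected_rps: float, max_rps: float, safety_factor: float = 0.5) -> int:
    """Hypothetical helper: instances needed so none exceeds max_rps * safety_factor."""
    per_instance_rps = invocations_per_instance(max_rps, safety_factor) / 60
    return math.ceil(expected_rps / per_instance_rps)


# Hypothetical numbers: one instance peaks at 20 RPS, expected traffic is 100 RPS.
target = invocations_per_instance(20)  # 20 * 0.5 * 60 = 600.0
count = instances_needed(100, 20)      # ceil(100 / 10) = 10
```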
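For the activation-function question, a quick reminder of what the two usual candidates compute. This is a plain-Python sketch: ReLU is the typical choice for hidden layers, while softmax turns the output layer's raw scores into probabilities over mutually exclusive classes (for binary or multi-label outputs, sigmoid is the usual alternative).

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def softmax(logits):
    """Convert raw scores into a probability distribution that sums to 1."""
    # Subtracting the max is a standard trick to avoid overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs], relu(-2.0), relu(3.5))
```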

What I did not expect to see in the exam (but it was mentioned/given as an option):

  • AWS Service Catalog
  • Amazon Connect
  • Alexa for Business

What makes this exam so different from SAA-C02:

The level of detail varies far more across questions in MLS-C01. An ETL question might hinge on knowing the supported input/output file types of various AWS services, while other questions have very broad answers like "use Kinesis and store it in S3".

What you should do to prepare for the certification:

If you are going for this certification, I encourage you to spend a lot of time in SageMaker itself. Do not just click "Run all cells" in an example notebook. Build models yourself, launch endpoints, and find other ways to deploy the same thing (e.g. by using Docker containers). Get as much hands-on experience as possible!
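Deploying via your own Docker container is also where the /opt/ml and entrypoint questions come from. Below is a minimal sketch of a training entrypoint, assuming the layout described in SageMaker's bring-your-own-container docs: hyperparameters in `/opt/ml/input/config/hyperparameters.json` (values as strings), data under `/opt/ml/input/data/<channel>`, artifacts written to `/opt/ml/model`, and failure reasons to `/opt/ml/output/failure`. The `ML_PREFIX` override, the `training` channel name, `model.json`, and `local_smoke_test` are hypothetical choices made only so the sketch runs outside Docker; the "training" itself is a placeholder.

```python
import json
import os
import sys
import tempfile
from pathlib import Path

# SageMaker mounts everything under /opt/ml inside the container.
# ML_PREFIX is a hypothetical override so the sketch can run outside Docker.
DEFAULT_PREFIX = Path(os.environ.get("ML_PREFIX", "/opt/ml"))


def paths(prefix: Path = DEFAULT_PREFIX) -> dict:
    """Well-known locations in the SageMaker training container layout."""
    return {
        "hyperparameters": prefix / "input" / "config" / "hyperparameters.json",
        # Channel names are chosen when the job is started; "training" is an assumption.
        "training_channel": prefix / "input" / "data" / "training",
        "model_dir": prefix / "model",
        "failure_file": prefix / "output" / "failure",
    }


def train(prefix: Path = DEFAULT_PREFIX) -> Path:
    p = paths(prefix)
    # Hyperparameters arrive as a JSON object whose values are strings.
    hp = {}
    if p["hyperparameters"].exists():
        hp = json.loads(p["hyperparameters"].read_text())
    epochs = int(hp.get("epochs", "1"))

    # Placeholder "training": just count the files in the training channel.
    channel = p["training_channel"]
    n_files = sum(1 for f in channel.iterdir() if f.is_file()) if channel.exists() else 0

    # Whatever ends up in the model dir is packaged as model.tar.gz and uploaded to S3.
    p["model_dir"].mkdir(parents=True, exist_ok=True)
    artifact = p["model_dir"] / "model.json"
    artifact.write_text(json.dumps({"epochs": epochs, "n_files": n_files}))
    return artifact


def main(argv) -> int:
    # SageMaker runs the image with "train" for training jobs and "serve" for hosting.
    command = argv[1] if len(argv) > 1 else ""
    if command == "train":
        try:
            train()
        except Exception as exc:
            failure = paths()["failure_file"]
            failure.parent.mkdir(parents=True, exist_ok=True)
            failure.write_text(str(exc))  # reported back as the job's failure reason
            return 1
        return 0
    if command == "serve":
        # A real image would start an HTTP server on port 8080 with /ping and /invocations.
        print("serving is out of scope for this sketch", file=sys.stderr)
        return 1
    print(f"usage: {argv[0]} train|serve", file=sys.stderr)
    return 2


def local_smoke_test() -> dict:
    """Build a throwaway /opt/ml-style tree in a temp dir and run train() against it."""
    with tempfile.TemporaryDirectory() as tmp:
        p = paths(Path(tmp))
        p["hyperparameters"].parent.mkdir(parents=True)
        p["hyperparameters"].write_text(json.dumps({"epochs": "3"}))
        p["training_channel"].mkdir(parents=True)
        (p["training_channel"] / "part-0.csv").write_text("1,2\n3,4\n")
        return json.loads(train(Path(tmp)).read_text())


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Calling `local_smoke_test()` exercises the same path logic against a temporary directory, which is a cheap way to check the layout before building and pushing the image to ECR.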
