Microsoft Certified: Azure Data Scientist Associate (DP-100) Study Guide
The Microsoft Certified: Azure Data Scientist Associate certification lets you assess and strengthen your knowledge of data science and machine learning on Azure. This is an intermediate-level certification, so I recommend setting aside time to study for the exam and to practice with the services it covers. Practice makes perfect.
In this study guide, I share some useful resources to guide you on your way to this certification.
Certification Path
Exam Name | Link |
---|---|
Exam DP-100: Designing and Implementing a Data Science Solution on Azure | Exam Details |
Exam DP-100: Designing and Implementing a Data Science Solution on Azure
The first place to go is the Microsoft Learn platform, where a dedicated learning path is available for free. You should also have a look at the Additional resources section of this study guide, which points to material that will help you consolidate the knowledge you need to pass the exam and earn the certification. If you prefer watching videos to reading, have a look at Microsoft Exam DP-100: Designing and Implementing a Data Science Solution on Azure, available on Pluralsight; it explains these core concepts and shows how to prepare for the exam.
Skills measured
Manage Azure resources for machine learning (25-30%)
Create an Azure Machine Learning workspace
- create an Azure Machine Learning workspace
- configure workspace settings
- manage a workspace by using Azure Machine Learning studio
Manage data in an Azure Machine Learning workspace
Manage compute for experiments in Azure Machine Learning
- determine the appropriate compute specifications for a training workload
- create compute targets for experiments and training
- configure Attached Compute resources including Azure Databricks
- monitor compute utilization
Implement security and access control in Azure Machine Learning
- determine access requirements and map requirements to built-in roles
- create custom roles
- manage role membership
- manage credentials by using Azure Key Vault
Set up an Azure Machine Learning development environment
- create compute instances
- share compute instances
- access Azure Machine Learning workspaces from other development environments
Set up an Azure Databricks workspace
- create an Azure Databricks workspace
- create an Azure Databricks cluster
- create and run notebooks in Azure Databricks
- link an Azure Databricks workspace to an Azure Machine Learning workspace
Run experiments and train models (20-25%)
Create models by using the Azure Machine Learning designer
- create a training pipeline by using Azure Machine Learning designer
- ingest data in a designer pipeline
- use designer modules to define a pipeline data flow
- use custom code modules in designer
Run model training scripts
- create and run an experiment by using the Azure Machine Learning SDK
- configure run settings for a script
- consume data from a dataset in an experiment by using the Azure Machine Learning SDK
- run a training script on Azure Databricks compute
- run code to train a model in an Azure Databricks notebook
Generate metrics from an experiment run
- log metrics from an experiment run
- retrieve and view experiment outputs
- use logs to troubleshoot experiment run errors
- use MLflow to track experiments
- track experiments running in Azure Databricks
Use Automated Machine Learning to create optimal models
- use the Automated ML interface in Azure Machine Learning studio
- use Automated ML from the Azure Machine Learning SDK
- select pre-processing options
- select the algorithms to be searched
- define a primary metric
- get data for an Automated ML run
- retrieve the best model
Tune hyperparameters with Azure Machine Learning
- select a sampling method
- define the search space
- define the primary metric
- define early termination options
- find the model that has optimal hyperparameter values
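To make the tuning vocabulary above concrete, here is a minimal, library-free sketch of random sampling over a search space with a bandit-style early termination rule. It does not use the Azure Machine Learning SDK; the search space, `toy_score`, and all function names are hypothetical stand-ins, and the stopping rule assumes a primary metric that is maximised.

```python
import math
import random

# Hypothetical search space: one sampler per hyperparameter.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),   # log-uniform
    "batch_size": lambda rng: rng.choice([16, 32, 64, 128]),  # discrete choice
}


def sample_config(rng):
    """Random sampling: draw each hyperparameter independently."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


def toy_score(config):
    """Stand-in for a training run; returns the primary metric (higher is better)."""
    return 1.0 - abs(math.log10(config["learning_rate"]) + 2) / 3


def bandit_should_stop(metric, best_metric, slack_factor):
    """Bandit-style rule for a maximised metric: stop runs below best / (1 + slack_factor)."""
    return metric < best_metric / (1 + slack_factor)


def random_search(n_trials, slack_factor=0.2, seed=0):
    """Return the best configuration, its score, and how many trials were stopped."""
    rng = random.Random(seed)
    best_config, best_score, stopped = None, float("-inf"), 0
    for _ in range(n_trials):
        config = sample_config(rng)
        score = toy_score(config)
        if best_config is not None and bandit_should_stop(score, best_score, slack_factor):
            stopped += 1  # a real tuning job would cancel this run part-way through
            continue
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, stopped


best_config, best_score, stopped = random_search(n_trials=20)
print(best_config, round(best_score, 3), stopped)
```

In a real tuning job the early termination check runs at intervals while each trial is still training; here every trial reports a single score, so the rule only shows which trials would have been cut short.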
Deploy and operationalize machine learning solutions (35-40%)
Select compute for model deployment
Deploy a model as a service
- configure deployment settings
- deploy a registered model
- deploy a model trained in Azure Databricks to an Azure Machine Learning endpoint
- consume a deployed service
- troubleshoot deployment container issues
Manage models in Azure Machine Learning
Create an Azure Machine Learning pipeline for batch inferencing
- configure a ParallelRunStep
- configure compute for a batch inferencing pipeline
- publish a batch inferencing pipeline
- run a batch inferencing pipeline and obtain outputs
- obtain outputs from a ParallelRunStep
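As a rough illustration of what a ParallelRunStep entry script looks like, the sketch below follows the `init()` / `run(mini_batch)` shape such scripts use. It assumes the input is a file dataset, so each mini-batch arrives as a list of file paths; the "model" is a hypothetical stand-in that counts non-empty lines rather than a real registered model.

```python
import os

model = None  # populated once per worker process by init()


def count_non_empty_lines(text):
    """Hypothetical stand-in for a model's predict function."""
    return sum(1 for line in text.splitlines() if line.strip())


def init():
    """Called once per worker: load whatever run() needs."""
    global model
    # A real script would load a registered model here instead.
    model = count_non_empty_lines


def run(mini_batch):
    """Called once per mini-batch; mini_batch is assumed to be a list of file paths."""
    results = []
    for path in mini_batch:
        with open(path, encoding="utf-8") as handle:
            prediction = model(handle.read())
        results.append(f"{os.path.basename(path)}: {prediction}")
    return results  # one entry per processed item
```

Returning one result per input item matters because the step gathers these return values into its output, which is what you read back after the pipeline run completes.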
Publish an Azure Machine Learning designer pipeline as a web service
Implement pipelines by using the Azure Machine Learning SDK
Apply ML Ops practices
- trigger an Azure Machine Learning pipeline from Azure DevOps
- automate model retraining based on new data additions or data changes
- refactor notebooks into scripts
- implement source control for scripts
Implement responsible machine learning (5-10%)
Use model explainers to interpret models
- select a model interpreter
- generate feature importance data
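One model-agnostic way to generate feature importance data is permutation feature importance: shuffle one feature column at a time and measure how much the model's score drops. The sketch below is a plain-Python illustration of that idea, not the Azure Machine Learning interpretability API; the toy data and the `model` function are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Importance of feature j = drop in accuracy after shuffling column j."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(model, shuffled, labels))
    return importances


# Toy data: the label depends only on feature 0; feature 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]


def model(row):
    """Hypothetical "trained" model that happens to use only feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(model, rows, labels, n_features=2)
print(importances)
```

Shuffling the noise feature leaves every prediction unchanged, so its importance is zero, while shuffling feature 0 breaks the link between inputs and labels and produces a clear drop in accuracy.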
Describe fairness considerations for models
Describe privacy considerations for data
- describe principles of differential privacy
- specify acceptable levels of noise in data and the effects on privacy
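The trade-off between noise and privacy can be sketched with the Laplace mechanism, a common way to achieve differential privacy: noise is scaled by sensitivity divided by epsilon, so a smaller epsilon means more privacy and less accurate results. This is a minimal illustration under stated assumptions (values bounded to a known range, a fixed number of records, and hypothetical data), not a production-grade implementation.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Mean of values clipped to [lower, upper], plus Laplace noise scaled by 1/epsilon."""
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)  # max change from altering one record
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


def average_error(values, lower, upper, epsilon, trials=500, seed=0):
    """Average absolute gap between the noisy mean and the true mean."""
    rng = random.Random(seed)
    true_mean = sum(values) / len(values)
    errors = [
        abs(private_mean(values, lower, upper, epsilon, rng) - true_mean)
        for _ in range(trials)
    ]
    return sum(errors) / trials


ages = [23, 35, 41, 29, 52, 47, 38, 31, 60, 44]  # hypothetical records
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, round(average_error(ages, 18, 90, epsilon), 2))
```

The printed errors shrink as epsilon grows, which is the practical meaning of "acceptable levels of noise": you pick the epsilon whose accuracy loss the analysis can tolerate.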
Additional resources
Below is a list of additional resources you should consider, along with a quick note on the Microsoft Learn collection shared there. I tried to extend the learning paths available on the exam’s page with some extra modules that I consider relevant to the exam.
Best of luck, and share your results with the community once you get certified! 😊💪