Your organisation has been maturing its data platform on Azure, using a combination of services such as Data Factory, Data Lake Storage, Databricks, Synapse and Power BI to deliver a modern analytics and BI experience to your business. Now you have decided to grow your organisation's MLOps maturity and embed AI to transform your business, and you have chosen to implement a cloud-scale architecture backed by Azure ML.
This is where it becomes critical to acknowledge that MLOps is not an isolated technical implementation; it is a business transformation enabled by technology. It is not sufficient to simply implement an 'MLOps pipeline'. The transformation needs to be approached through the lens of People, Process and Technology, so that the implementation comprehensively addresses "Who (People) does What (Process), Where (Technology)?".
This means the MLOps framework you implement needs a considered, practical response to operational questions such as:
- People:
  - Should we implement MLOps as a centralised function, or federate it across business units and roles?
  - What skills do we need to operate MLOps? Do we need to create and recruit for additional roles?
  - ...
- Process:
  - How should a data scientist create a new project that includes everything needed to onboard to MLOps?
  - What branching strategy should a data scientist use?
  - What role does the data science team play, relative to the Cloud Ops team, in productionising a model?
  - ...
- Technology:
  - How many Azure ML environments do we need?
  - How do we select the right compute types for our workloads?
  - ...
The question is: where do we get started, and how do we implement MLOps using tried and tested techniques to accelerate progress, without having to work out answers to the questions above from scratch?
This repo therefore aims to present a documented approach that takes you from zero to a reference baseline implementation, drawing on our delivery experience with actual customers. It does so by bringing together a range of documentation, architecture design guides, IaC (Infrastructure as Code) templates and code acceleration artefacts from Microsoft and open source references, packaged as a deliverable aligned to the logical project stages listed below. The motivation for reusing existing artefacts (versus building from scratch) is to leverage the best of the IP that exists across the ecosystem. Moreover, as the product evolves and SDKs are updated, this repo attempts to use Git submodules to stay up to date and point to the latest available references that can be applied to implementation projects.
The guided accelerator consolidates best practice patterns, IaC and AML code artefacts to provide reference IP for a baseline MLOps implementation on Azure using Azure ML, which can be delivered in approximately 12 weeks of project scope.
This repo is designed to be consumed 'documentation led', with the relevant IaC or implementation code artefacts linked from the appropriate sections.
Use the Table of Contents below to navigate to the section of the repo that interests you, based on your role or the stage of your project.
| Stage | Tasks | Roles |
|---|---|---|
| 1. MLOps Foundation | Understanding MLOps • What's DevOps? • What's MLOps? | Everyone |
| | Understand Maturity Model • Determine Organization Capability Level • Culture and Key Principles | Group Manager, Team Lead, Project Lead |
| | Team Formation • Skills, Roles, and Responsibilities • Deciding on (agile) Delivery Model | Everyone |
| | Deliverables • Review the Checklists | |
| 2. Design | Review AML Architecture and Design Concepts | Team Lead, Solution Architect |
| | Understanding MLOps with Azure AML | Team Lead, Solution Architect |
| | Make Technology Choices based on your use case and organisation's need | Team Lead, Solution Architect |
| | Security Control for Service Infrastructure • Use vNET Integrate & Private Link for AML | Solution Architect, Azure Infrastructure Engineer, Team Lead |
| | Configuring Access Control • Secure Access to AML with RBAC | Solution Architect, Azure Infrastructure Engineer, Team Lead |
| | Map Team Roles to RBAC • Use Custom Roles when required | Team Lead, Solution Architect |
| | Infrastructure Costs Management | Solution Architect, Azure Infrastructure Engineer, Team Lead, Administrator |
| | Deliverables • Approved Solution Design • Review the Checklists | |
| 3. Deploy | Accelerate Code Deployment for AML Services • Automate the Deployment of Resources • Update the Deployment Scripts to Match the approved Solution Design | Azure Infrastructure Engineer, DevOps Engineer, Team Lead |
| | Setting up Local Environment for Development • Install Tools • Connect to AML | Data Scientist, MLOps Engineer, Data Engineer |
| | Organise AML Environments | MLOps Engineer, DevOps Engineer |
| | Creating Separate Environments (Dev, Test, Prod) | MLOps Engineer, DevOps Engineer |
| | Deliverables • Full Deployed Services on Azure using Automated Pipelines • Review the Checklists | |
| 4. Migrate | Understanding AML Ops concepts | MLOps Engineer |
| | Review AML Best Practices | MLOps Engineer |
| | Deliverables • Review the Checklists | |
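
To find the tasks relevant to a given role, the table above can be treated as a simple lookup. The following is a minimal sketch, assuming a hypothetical `TASKS` list that transcribes only a few rows of the table. The names `TASKS` and `tasks_for_role` are illustrative and are not part of this repo.

```python
# Illustrative only: a handful of rows transcribed from the table above.
# TASKS and tasks_for_role are hypothetical names, not artefacts of this repo.
TASKS = [
    ("1. MLOps Foundation", "Understanding MLOps", {"Everyone"}),
    ("1. MLOps Foundation", "Understand Maturity Model",
     {"Group Manager", "Team Lead", "Project Lead"}),
    ("2. Design", "Map Team Roles to RBAC", {"Team Lead", "Solution Architect"}),
    ("3. Deploy", "Setting up Local Environment for Development",
     {"Data Scientist", "MLOps Engineer", "Data Engineer"}),
    ("3. Deploy", "Organise AML Environments", {"MLOps Engineer", "DevOps Engineer"}),
    ("4. Migrate", "Review AML Best Practices", {"MLOps Engineer"}),
]


def tasks_for_role(role):
    """Return (stage, task) pairs for a role; rows marked 'Everyone' always apply."""
    return [
        (stage, task)
        for stage, task, roles in TASKS
        if role in roles or "Everyone" in roles
    ]


if __name__ == "__main__":
    # Print the reading list for one role, in table order.
    for stage, task in tasks_for_role("MLOps Engineer"):
        print(f"{stage}: {task}")
```

Rows assigned to "Everyone" are returned for every role, which mirrors how the table is meant to be read: start with the shared foundation tasks, then move to the tasks specific to your role.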
MLOps by its very nature offers many implementation alternatives across all aspects, particularly around the definition and implementation of an operating model that is fit for purpose and takes into account the nuances of your own organisational structures, roles and processes. MLOps is therefore very much a growth journey rather than a precise destination. This accelerator aims to offer guidance and reusable references that:
- Aim to take you from Stage 0 to the partial automation required to reach Stage 2 or 3 of the MLOps maturity curve.
- Can be adapted with minimal refactoring to address a wide range of common scenarios, rather than being highly prescriptive and limiting their reach.
- Provide about 80% of the reusable material needed to accelerate an implementation project, which for the scope above is expected to take 10 to 12 weeks.
- Prioritise support for Python-based ML where relevant. Azure ML continues to mature its support for R, and most code artefacts included here can be adapted to support R-based models; however, this is not a focus for the development of this accelerator.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.