[New Hack] 072-DataScienceInFabric #883
Open: juanlldc wants to merge 253 commits into microsoft:master from pradeepsingla87:xxx-DataScience_In_MicrosoftFabric
253 commits
77f280b
Update Solution-00.md
juanlldc bac5e5c
Update README.md
juanlldc b6c3523
Update README.md
juanlldc 50dcb9d
Merge pull request #4 from juanlldc/patch-4
pradeepsingla87 7c6ffa4
Merge pull request #3 from juanlldc/patch-5
pradeepsingla87 afad360
Merge pull request #2 from juanlldc/patch-6
pradeepsingla87 f653bfc
Merge branch 'microsoft:master' into xxx-DataScience_In_MicrosoftFabric
lesantana da8908d
Creating directory xxx-DataScience_In_MicrosoftFabric/Coach/Solutions…
pradeepsingla87 29badf4
Committing 7 items from workspace 9aafe34c-0112-488c-ac8a-787e353a55e1
pradeepsingla87 e836f40
Committing 6 items from workspace 9aafe34c-0112-488c-ac8a-787e353a55e1
pradeepsingla87 116eb0e
Committing 4 items from workspace 9aafe34c-0112-488c-ac8a-787e353a55e1
pradeepsingla87 0effa71
Merge branch 'microsoft:master' into xxx-DataScience_In_MicrosoftFabric
pradeepsingla87 ed66544
Add files via upload
pradeepsingla87 2d12f06
Rename 03-Train-Register-HeartFailurePredictionModel (1).ipynb to 03…
pradeepsingla87 6262729
Rename 02-data-analysis-preprocess (4).ipynb to 02-data-analysis-prep…
pradeepsingla87 536c723
Update Challenge-04.md
pradeepsingla87 c79783a
Update Challenge-04.md
pradeepsingla87 09ad5c0
Update Challenge-04.md
pradeepsingla87 5e4bae3
Update Challenge-04.md
pradeepsingla87 8b72184
Update Challenge-04.md
pradeepsingla87 c9a69d4
Update Challenge-04.md
pradeepsingla87 7f371f9
Update Challenge-04.md
pradeepsingla87 4aac647
Update Challenge-04.md
pradeepsingla87 03db381
Update Challenge-04.md
pradeepsingla87 cc2fb00
Update Challenge-04.md
pradeepsingla87 96dac3f
Update Challenge-00.md
pradeepsingla87 513c465
Update Challenge-04.md
pradeepsingla87 cea02ae
Update Challenge-04.md
pradeepsingla87 4cdc187
Update Challenge-04.md
pradeepsingla87 00fb96b
Update Solution-04.md
pradeepsingla87 0216124
Update Solution-04.md
pradeepsingla87 fe9f6df
Add files via upload
pradeepsingla87 5fc2420
Update Solution-04.md
pradeepsingla87 f6117f9
Update Solution-04.md
pradeepsingla87 13a1091
Update Solution-04.md
pradeepsingla87 9da8e40
Add files via upload
pradeepsingla87 c2b88f2
Update Solution-04.md
pradeepsingla87 5f4af6e
Update Solution-04.md
pradeepsingla87 9bfa0de
Update Solution-04.md
pradeepsingla87 5054975
Add files via upload
pradeepsingla87 c76930f
Update Solution-04.md
pradeepsingla87 5532e35
Update Solution-04.md
pradeepsingla87 392ae8f
Update Solution-04.md
pradeepsingla87 3561ce7
Add files via upload
pradeepsingla87 f451759
Update Solution-04.md
pradeepsingla87 437f8f3
Add files via upload
pradeepsingla87 ef74d75
Update Solution-04.md
pradeepsingla87 777e7a4
Update Solution-04.md
pradeepsingla87 685bcdc
Add files via upload
pradeepsingla87 adec07a
Delete xxx-DataScience_In_MicrosoftFabric/Screenshot_26-6-2024_03016_…
pradeepsingla87 68961c0
Add files via upload
pradeepsingla87 cc258e7
Update Solution-04.md
pradeepsingla87 577feef
Update README.md
juanlldc 5fb84b4
Add files via upload
juanlldc 58193cd
Rename 01-Ingest-Heart-Failure-Dataset-to-Lakehouse (1).ipynb to 01-I…
juanlldc 41e1066
Rename 03-Train-Register-HeartFailurePredictionModel (1).ipynb to 03-…
juanlldc 9372227
Add files via upload
juanlldc a08f76a
Create a
juanlldc 93c5ee1
Rename xxx-DataScience_In_MicrosoftFabric/Coach/Screenshot_26-6-2024_…
juanlldc eaf1864
Rename xxx-DataScience_In_MicrosoftFabric/Coach/Screenshot_26-6-2024_…
juanlldc 2833080
Rename xxx-DataScience_In_MicrosoftFabric/Coach/image-10.png to xxx-D…
juanlldc e9275fa
Rename xxx-DataScience_In_MicrosoftFabric/Coach/image-11.png to xxx-D…
juanlldc d4d1a64
Rename xxx-DataScience_In_MicrosoftFabric/Coach/image-12.png to xxx-D…
juanlldc 79d8e8c
Rename xxx-DataScience_In_MicrosoftFabric/Coach/image-13.png to xxx-D…
juanlldc 52bacc9
Rename xxx-DataScience_In_MicrosoftFabric/Coach/image-14.png to xxx-D…
juanlldc 6d5a380
Rename xxx-DataScience_In_MicrosoftFabric/Coach/image-15.png to xxx-D…
juanlldc 0ff5d32
Update Solution-04.md
juanlldc c00676f
Update Solution-01.md
juanlldc 69dece5
Update Solution-01.md
juanlldc 407466a
Update Solution-00.md
juanlldc 16250a7
Update Solution-00.md
juanlldc a2f7bf5
Update Solution-00.md
juanlldc 0d8e902
Update Challenge-00.md
juanlldc e24bf96
Update Challenge-00.md
juanlldc f47eb33
Update Solution-00.md
juanlldc 3f76345
Update Solution-01.md
juanlldc e3f05ac
Update Solution-01.md
juanlldc ec32268
Update Challenge-01.md
juanlldc 980a458
Update Challenge-00.md
juanlldc a29adc6
Update Challenge-00.md
juanlldc 57e8a52
Update Challenge-05.md
juanlldc 6dbb9c4
Update Challenge-06.md
juanlldc 2181d27
Update Challenge-06.md
juanlldc c330e20
Update Challenge-06.md
juanlldc feac683
Update Solution-06.md
juanlldc b7cbab7
Update Solution-06.md
juanlldc 435d5b7
Update Solution-00.md
juanlldc 811ec3b
Update Challenge-00.md
juanlldc 150a49c
Update Solution-00.md
juanlldc f119ec9
Update Solution-00.md
juanlldc d2a33f1
Update Challenge-00.md
juanlldc e9ccded
Update Solution-00.md
juanlldc 9a983fe
Update Solution-01.md
juanlldc e00bebe
Update Challenge-01.md
juanlldc 098eaa7
Update Challenge-01.md
juanlldc dc9dc81
Update Challenge-01.md
juanlldc 6919854
Update Challenge-01.md
juanlldc 3efb9f8
Update Challenge-01.md
juanlldc 1c6037e
Update Challenge-01.md
juanlldc bb0fec8
Update Challenge-01.md
juanlldc 5dc1619
Update Challenge-01.md
juanlldc f43bd7e
Update Solution-03.md
juanlldc 24cc825
Update Solution-03.md
juanlldc f253ce1
Update Solution-03.md
juanlldc 67f5061
Update Solution-03.md
juanlldc 1e1b2a8
Update Solution-03.md
juanlldc a774ecc
Update Solution-01.md
juanlldc a2ac122
Update Solution-01.md
juanlldc 5824e15
Update Solution-03.md
juanlldc 59d1961
Update Solution-01.md
juanlldc 57e30bf
Update Challenge-03.md
juanlldc 8972c8d
Update Solution-04.md
juanlldc af168a6
Update Challenge-04.md
juanlldc fccca04
Update Challenge-05.md
pradeepsingla87 a17d310
Update Challenge-05.md
pradeepsingla87 606f5a0
Update Challenge-05.md
pradeepsingla87 2c188f2
Update Challenge-05.md
pradeepsingla87 28674a4
Update Challenge-05.md
pradeepsingla87 8bae485
Update Challenge-05.md
pradeepsingla87 cdad534
Update Challenge-05.md
pradeepsingla87 503ceb5
Update Challenge-05.md
pradeepsingla87 dfb0615
Update Challenge-05.md
pradeepsingla87 ba06ede
Update Solution-05.md
pradeepsingla87 04bc87f
Update Solution-05.md
pradeepsingla87 a5a9cd6
Update Solution-05.md
pradeepsingla87 d4d5e1a
Update Solution-05.md
pradeepsingla87 bfb9f16
Update Solution-05.md
pradeepsingla87 eaa5a01
Update Solution-05.md
pradeepsingla87 95cf645
Update Solution-05.md
pradeepsingla87 372bc41
Update Challenge-02.md
lesantana 961a1e9
Update Solution-01.md
juanlldc a62b733
Update Solution-01.md
juanlldc 1319473
Update Challenge-02.md
lesantana 5003f42
Update Solution-01.md
juanlldc 887aaa6
Update Challenge-02.md
lesantana d8bbf59
Update Challenge-02.md
lesantana af0497a
Update Challenge-02.md
lesantana a65ab21
Update Solution-03.md
juanlldc 487bd9c
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Solutions/Notebooks/0…
juanlldc 2d1a545
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Solutions/Notebooks/0…
juanlldc bcbf8b6
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Solutions/Notebooks/0…
juanlldc 940a20f
Add files via upload
juanlldc bcfe236
Update Solution-02.md
lesantana 485645b
Update Solution-02.md
lesantana f3e3fc1
Update Solution-02.md
lesantana d162d42
Update Solution-05.md
juanlldc ad2677c
Update Challenge-05.md
juanlldc f613d1a
Update Challenge-05.md
juanlldc a31e339
Update Solution-06.md
juanlldc a950f42
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Photos/a
juanlldc 03cf467
Add files via upload
juanlldc 41d6c76
Rename Screenshot 2024-07-17 153448.png to postman-body.png
juanlldc b82209e
Rename Screenshot 2024-07-17 153526.png to postman-token.png
juanlldc e1ac6aa
Rename Screenshot 2024-07-17 153501.png to postman-header.png
juanlldc 95ba384
Update Solution-06.md
juanlldc 5abc2da
Update Challenge-06.md
juanlldc dcdd087
Update Challenge-06.md
juanlldc 27d1138
Update Challenge-06.md
juanlldc 9bba28a
Update Challenge-06.md
juanlldc 2ade4bc
Add files via upload
juanlldc 13433f7
Create a
juanlldc 3089bcb
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Notebooks/a
juanlldc cbdddd7
Create a
juanlldc 6488513
Add files via upload
juanlldc 01eddb3
Add files via upload
juanlldc bded490
Delete xxx-DataScience_In_MicrosoftFabric/Student/Resources/01-Ingest…
juanlldc fc8155e
Delete xxx-DataScience_In_MicrosoftFabric/Student/Resources/03-Train-…
juanlldc fba50cb
Delete xxx-DataScience_In_MicrosoftFabric/Student/Resources/04-Perfor…
juanlldc cb3d487
Delete xxx-DataScience_In_MicrosoftFabric/Coach/CoachResources.zip
juanlldc e1df1e7
Add files via upload
juanlldc 796ca9e
Add files via upload
juanlldc a77529a
Create a
juanlldc 4c242c3
Add files via upload
juanlldc 173bb75
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Notebooks/a
juanlldc 1758bd4
Delete xxx-DataScience_In_MicrosoftFabric/Student/Notebooks/a
juanlldc e785e1b
Update README.md
juanlldc fda64a5
Update README.md
juanlldc da1c24f
Update README.md
juanlldc 4df385e
Update README.md
juanlldc 02d8c5e
Add files via upload
juanlldc 4666429
Delete xxx-DataScience_In_MicrosoftFabric/Student/Resources/heart_fai…
juanlldc 8538bc1
Delete xxx-DataScience_In_MicrosoftFabric/Student/StudentResources.zip
juanlldc b93a780
Add files via upload
juanlldc 5855603
Delete xxx-DataScience_In_MicrosoftFabric/Coach/CoachResources.zip
juanlldc 16f0fe7
Add files via upload
juanlldc 5efecf7
Delete xxx-DataScience_In_MicrosoftFabric/Student/StudentResources.zip
juanlldc 19a4732
Add files via upload
juanlldc c78a781
Update README.md
juanlldc 4cc27ce
Update README.md
juanlldc 976c607
Delete xxx-DataScience_In_MicrosoftFabric/Student/Challenge-07.md
juanlldc 57c270e
Delete xxx-DataScience_In_MicrosoftFabric/Student/Challenge-08.md
juanlldc 484c1be
Update Challenge-00.md
juanlldc ef0306b
Update Challenge-01.md
juanlldc dbe431b
Update Challenge-02.md
juanlldc f5cb3f1
Update Challenge-02.md
juanlldc 6f36d1f
Update Challenge-02.md
juanlldc e597e3c
Update Challenge-02.md
juanlldc f666f9d
Update Challenge-02.md
juanlldc f6d3ea2
Update Challenge-02.md
juanlldc d4f010e
Update Challenge-05.md
juanlldc b152fbf
Update Challenge-05.md
juanlldc 1682ec0
Update Challenge-05.md
juanlldc fac3170
Update Challenge-05.md
juanlldc b8c130e
Update Challenge-05.md
juanlldc 7def066
Update Challenge-05.md
juanlldc 6e4850e
Update Challenge-05.md
juanlldc 00d75bd
Update Challenge-05.md
juanlldc 8627b18
Update Challenge-05.md
juanlldc 0ae5e23
Update Challenge-05.md
juanlldc 30f4636
Update Challenge-05.md
juanlldc 3226843
Update Challenge-06.md
juanlldc 5549de7
Update Challenge-06.md
juanlldc 0f8ad39
Update Challenge-06.md
juanlldc 93ba24f
Update Challenge-06.md
juanlldc c3d9db4
Update Challenge-06.md
juanlldc b700dac
Update Challenge-06.md
juanlldc 880d72a
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Lectures.pptx
juanlldc 6375c91
Update README.md
juanlldc 76164cf
Update README.md
juanlldc 5b87f74
Update README.md
juanlldc d718187
Update README.md
juanlldc b62907e
Update Solution-05.md
juanlldc 4121415
Update Solution-05.md
juanlldc 43c0db2
Update Solution-05.md
juanlldc ac5a036
Update Solution-06.md
juanlldc a4d841f
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Solution-07.md
juanlldc 52db5cd
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Solution-08.md
juanlldc 375d649
Delete xxx-DataScience_In_MicrosoftFabric/Coach/Solutions directory
juanlldc 166358c
Renaming top level folder from xxx to 072
juanlldc 2a5f543
Merge branch 'microsoft:master' into xxx-DataScience_In_MicrosoftFabric
juanlldc 4a551ef
Moving student resources
juanlldc db24fe1
Merge branch 'xxx-DataScience_In_MicrosoftFabric' of https://github.c…
juanlldc 6abe351
Moved coach solutions
juanlldc b287136
Update student setup instructions
juanlldc 9b44fb8
Updating coach setup instructions
juanlldc a72e529
Moved Images folder to root, updated references
juanlldc 41593f0
Update licensing requirements across both guides
juanlldc 2bcfd0b
Add wordlist and fix spellcheck errors
juanlldc 0d3ce76
Second spellchecker fix
juanlldc b0f78a0
Adding changes from Cameron's test run
juanlldc 050e413
Update README.md
Whowong b38e8c1
Update Solution-00.md
Whowong 1439a66
Update Solution-02.md
Whowong cc01f5e
Update Solution-03.md
Whowong 7509a10
Update Solution-04.md
Whowong a7d1bd4
Update Challenge-00.md
Whowong 97652eb
Update Challenge-04.md
Whowong e0fa2cb
Resolved issues uncovered in review by WTH team
juanlldc 20fca26
Merge branch 'xxx-DataScience_In_MicrosoftFabric' of https://github.c…
juanlldc 1b9fc6b
Fix typo in Solution-02.md
Whowong
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
dataframe | ||
DataFrame | ||
DataFrame's | ||
MLFlow | ||
Leandro | ||
interpretability | ||
auc | ||
repurpose |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# What The Hack - Data Science In Microsoft Fabric | ||
|
||
## Introduction | ||
|
||
Welcome to the coach's guide for the Data Science In Microsoft Fabric What The Hack. Here you will find links to specific guidance for coaches for each of the challenges. | ||
|
||
**NOTE:** If you are a Hackathon participant, this is the answer guide. Don't cheat yourself by looking at these during the hack! Go learn something. :) | ||
|
||
## Coach's Guides | ||
|
||
- Challenge 00: **[Prerequisites - Ready, Set, GO!](./Solution-00.md)** | ||
- Configure your Fabric workspace and gather your data | ||
- Challenge 01: **[Bring your data to the OneLake](./Solution-01.md)** | ||
- Creating a shortcut to the available data | ||
- Challenge 02: **[Data preparation with Data Wrangler](./Solution-02.md)** | ||
- Clean and transform the data into a useful format while leveraging Data Wrangler | ||
- Challenge 03: **[Train and register the model](./Solution-03.md)** | ||
- Train a machine learning model with MLflow with the help of Copilot | ||
- Challenge 04: **[Generate batch predictions](./Solution-04.md)** | ||
- Score a static dataset with the model | ||
- Challenge 05: **[Visualize predictions with a PowerBI report](./Solution-05.md)** | ||
- Visualize generated predictions by using a PowerBI report | ||
- Challenge 06: **[(Optional) Deploy the model to an AzureML real-time endpoint](./Solution-06.md)** | ||
- Deploy the trained model to an AzureML endpoint for inference | ||
|
||
## Coach Prerequisites | ||
|
||
This hack has pre-reqs that a coach is responsible for understanding and/or setting up BEFORE hosting an event. Please review the [What The Hack Hosting Guide](https://aka.ms/wthhost) for information on how to host a hack event. | ||
|
||
The guide covers the common preparation steps a coach needs to do before any What The Hack event, including how to properly configure Microsoft Teams. | ||
|
||
### Student Resources | ||
|
||
Always refer students to the [What The Hack website](https://aka.ms/wth) for the student guide: [https://aka.ms/wth](https://aka.ms/wth) | ||
|
||
**NOTE:** Students should **not** be given a link to the What The Hack repo before or during a hack. The student guide does **NOT** have any links to the Coach's guide or the What The Hack repo on GitHub. | ||
|
||
|
||
## Azure and Fabric Requirements | ||
|
||
This hack requires students to have access to Azure and Fabric. These requirements should be shared with a stakeholder in the organization that will be providing the licenses that will be used by the students. | ||
|
||
### Fabric and PowerBI licensing requirements: | ||
|
||
Each student will need access to Microsoft Fabric and be licensed to create PowerBI reports for this hack. The following are the options to complete these licensing requirements: | ||
|
||
1. **Recommended if available**: Individual [Fabric free trials](https://learn.microsoft.com/en-us/fabric/get-started/fabric-trial#start-the-fabric-capacity-trial). This will grant users access to creating the required Fabric items as well as the PowerBI report. **If previously used, the Fabric free trial may be unavailable** | ||
2. Fabric Capacity and PowerBI Pro/Premium per user license. Each user would need their own PowerBI license but capacities could be shared and scaled up according to their needs. If running the hack on an individual basis, an F4 capacity would be adequate, and an F8 capacity would have generous compute power margin. **Alternatively, users can activate a [PowerBI Free Trial](https://learn.microsoft.com/en-us/power-bi/fundamentals/service-self-service-signup-for-power-bi) if available.** The PowerBI trial could be available even if the Fabric one is not. | ||
|
||
|
||
### Azure licensing requirements | ||
|
||
There are 2 challenges that require access to Azure: | ||
|
||
- Challenge 1: Students are required to navigate an Azure Data Lake Storage (ADLS) Gen2 account through the Azure Portal to learn how to set up a Fabric shortcut to an existing file. This challenge requires each student to have contributor permissions to the resource, but a single storage account, directory, and file could be shared among all students, since they will only access and connect to it rather than modify it. | ||
|
||
- Challenge 6: Students are required to have Azure AI Developer access to an Azure Machine Learning resource. Each student will need to register their own model and create their own real-time endpoint, which is why it is **recommended to individually deploy an Azure ML workspace per student**. | ||
|
||
Given these requirements, each student could have their own Azure subscription, or they could share access to a single subscription. | ||
|
||
These Azure resources can be deployed on an individual per-student basis using the `deployhack.sh` script included in the student resources folder. | ||
|
||
## Suggested Hack Agenda | ||
|
||
You may schedule this hack in any format, as long as the challenges are completed sequentially. | ||
|
||
Time estimates for each challenge: | ||
- Challenge 00: 15 minutes | ||
- Challenge 01: 30 minutes | ||
- Challenge 02: 30 minutes | ||
- Challenge 03: 45 minutes | ||
- Challenge 04: 30 minutes | ||
- Challenge 05: 30 minutes | ||
- Challenge 06: 45 minutes | ||
|
||
## Repository Contents | ||
|
||
- `./Coach` | ||
- Coach's Guide and related files | ||
- `./Coach/Solutions` | ||
- Solution files with completed example answers to challenges | ||
- `./Student` | ||
- Student's Challenge Guide | ||
- `./Student/Resources` | ||
- Student resource files, also available as a download link on Student Challenge 0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# Challenge 00 - Prerequisites - Ready, Set, GO! - Coach's Guide | ||
|
||
**[Home](./README.md)** - [Next Solution >](./Solution-01.md) | ||
|
||
## Introduction | ||
|
||
Thank you for participating in the Data Science in Microsoft Fabric What The Hack. Before you can hack, you will need to set up some prerequisites. | ||
|
||
## Common Prerequisites | ||
|
||
We have compiled a list of common tools and software that will come in handy to complete most What The Hack Azure-based hacks! | ||
|
||
You might not need all of them for the hack you are participating in. However, if you work with Azure on a regular basis, these are all things you should consider having in your toolbox. | ||
|
||
<!-- If you are editing this template manually, be aware that these links are only designed to work if this Markdown file is in the /xxx-HackName/Student/ folder of your hack. --> | ||
|
||
- [Azure Subscription](../../000-HowToHack/WTH-Common-Prerequisites.md#azure-subscription) | ||
- [Postman](https://www.postman.com/downloads/) | ||
- [Managing Cloud Resources](../../000-HowToHack/WTH-Common-Prerequisites.md#managing-cloud-resources) | ||
- [Azure Portal](../../000-HowToHack/WTH-Common-Prerequisites.md#azure-portal) | ||
- [Azure Cloud Shell](../../000-HowToHack/WTH-Common-Prerequisites.md#azure-cloud-shell) | ||
- [Azure CLI (optional)](../../000-HowToHack/WTH-Common-Prerequisites.md#azure-cli) | ||
- [Note for Windows Users](../../000-HowToHack/WTH-Common-Prerequisites.md#note-for-windows-users) | ||
- [Azure PowerShell CmdLets](../../000-HowToHack/WTH-Common-Prerequisites.md#azure-powershell-cmdlets) | ||
|
||
- [Azure Storage Explorer (optional)](../../000-HowToHack/WTH-Common-Prerequisites.md#azure-storage-explorer) | ||
|
||
Additionally, please refer to the [Coach Hack introduction](./README.md) for more information about licensing requirements and options. | ||
## Description | ||
|
||
Now that you have the common prerequisites installed on your workstation, there are prerequisites specific to this hack. | ||
|
||
In Challenge 0 on the student guide, students are instructed to download the Resources folder [here](https://aka.ms/FabricdsWTHResources). This folder contains the notebooks students will be working with, as well as a shell script that they will use to deploy some needed Azure resources. | ||
|
||
The [coach solution notebooks](./Solutions/) are the completed versions of the student notebooks. The solutions can be used as a guide or uploaded to Fabric to complete each Challenge. | ||
|
||
|
||
**NOTE:** The resources.zip folder also includes the heart.csv file. You can upload this data directly to the Fabric Lakehouse if you decide you want to go through this hack without needing an Azure subscription. However, this will skip half of Challenge 1 and the important concept of using shortcuts in Fabric. If you are going to be setting up the Azure resources and using the shortcut, ignore the heart.csv file. | ||
|
||
To begin setting up your Azure subscription for this hack, you will run a bash script that will deploy and configure a list of resources. You can find this script as the `HackSetup.sh` file in the resources folder. | ||
- Download the setup file to your computer | ||
- Go to the Azure portal and click on the cloud shell button on the top navigation bar, to the right of the Copilot button. | ||
- **NOTE**: This script has been designed for the Azure CLI. It might fail to deploy if you attempt to run it from a local terminal. | ||
- Once the cloud shell connects, make sure you are using a Bash shell. If you are not, click on the button on the top-right corner of the cloud shell to switch to bash. | ||
- Click on the Manage Files button on the shell's navigation bar and select upload. Select the setup file from your computer. | ||
- Run the `sh HackSetup.sh` command in your cloud shell. | ||
- Follow the prompts in the shell. | ||
|
||
After setting up your Azure resources, head to [Microsoft Fabric](https://fabric.microsoft.com/). | ||
- Create a new workspace by clicking on 'Workspaces' in the vertical menu on the left side of the screen. Use the 'New Workspace' button at the bottom of the list. | ||
- Once you are inside your new workspace, select the Data Science experience using the button on the bottom left corner of the screen. | ||
- At the top of the Data Science experience menu, check that you are still in the new workspace and select 'Import Notebook' from the top row of options. | ||
- Follow the prompts to upload the 4 notebook (`.ipynb`) files contained within the resources folder. | ||
|
||
|
||
## Success Criteria | ||
|
||
To complete this challenge successfully, you should be able to: | ||
|
||
- Verify that you have a storage account with the heart.csv data in a container | ||
- Verify that you have a Fabric workspace where your 4 notebooks are available | ||
- (Optional) Verify that your Azure ML workspace has correctly deployed (if completing Challenge 6) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
# Challenge 01 - Bring your data to the OneLake - Coach's Guide | ||
|
||
[< Previous Solution](./Solution-00.md) - **[Home](./README.md)** - [Next Solution >](./Solution-02.md) | ||
|
||
## Notes & Guidance | ||
|
||
In this challenge, hack participants must create a shortcut to the folder deployed in their Azure subscription in Challenge 0. This will allow them to use the data in Fabric without the need for replication. Once the shortcut is created, participants will open Notebook 1 to load the CSV file into a delta table for further modification in Notebook 2. | ||
|
||
### Sections | ||
|
||
1. Create a Lakehouse (non-notebook) | ||
2. Create a Shortcut (non-notebook) | ||
3. Read the .csv file into a dataframe in the notebook (Notebook 1) | ||
4. Write the dataframe to the lakehouse as a delta table (Notebook 1) | ||
|
||
### Student step-by-step instructions (creating a shortcut) | ||
- Creating a Lakehouse: | ||
- Participants must create a lakehouse on the Fabric workspace they previously set up. In Fabric, navigate to the workspace. | ||
- On the top left of the screen, select new and more options. | ||
- On the data engineering section, select Lakehouse. Give the lakehouse a unique name and click on create. | ||
|
||
- Creating a Shortcut: | ||
- On the Lakehouse navigator, use the left-hand side menu and click on the 3 dots (...) next to Files. Click on "New shortcut". | ||
- On the shortcut wizard, click on "Azure Data Lake Storage Gen2" | ||
- Go to your Azure portal. The URL can be found on the **Settings > Endpoints** side menu of the Storage Account. In this menu, you will see a variety of endpoint Resource IDs and URLs. Find and copy the **Data Lake Storage URL** from the list. Enter it into the wizard in Fabric. | ||
- Create a new connection, give it a name and select "Account Key" as the authentication kind. | ||
- Go back to your Azure portal. The Account Key can be found in the **Security + Networking>Access keys** side menu of the Storage Account. Show and copy one of the keys. Enter it into the wizard in Fabric. | ||
- Click on next to access the file explorer. Wait for the screen to load. | ||
- On the side menu, expand the file-system folder. Select the check mark next to the "files" folder. | ||
- Click next to move to the next screen, then click on create to create the shortcut. | ||
- Verify that your shortcut is showing under the **Files** folder of the lakehouse navigator. You might need to click on the 3 dots and on refresh if your shortcut is not present initially. | ||
|
||
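A common stumbling block in the shortcut steps above is pasting the Blob endpoint instead of the Data Lake Storage endpoint. The following is a minimal sketch a coach could use to sanity-check a copied URL. It assumes the Azure public cloud, where the Data Lake Storage endpoint host ends in `.dfs.core.windows.net`; the helper name `is_dfs_endpoint` and the account name `mystorageacct` are hypothetical and do not come from the hack resources.

```python
from urllib.parse import urlparse


def is_dfs_endpoint(url: str) -> bool:
    """Return True if the URL looks like an ADLS Gen2 (dfs) endpoint.

    Assumption: Azure public cloud host suffix. Sovereign clouds use
    different suffixes and are not handled here.
    """
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""
    return parsed.scheme == "https" and host.endswith(".dfs.core.windows.net")


# Hypothetical account name, for illustration only.
dfs_url = "https://mystorageacct.dfs.core.windows.net/"
blob_url = "https://mystorageacct.blob.core.windows.net/"

print(is_dfs_endpoint(dfs_url))
print(is_dfs_endpoint(blob_url))
```

If the check fails for a student's URL, have them return to the storage account's endpoint list and copy the Data Lake Storage entry rather than the Blob service one.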
### Overview of student directions (running Notebook 1) | ||
- This section of the challenge is notebook based. All the instructions and links required for participants to successfully complete this section can be found on Notebook 1 in the `student/resources.zip/notebooks` folder. | ||
- To run the notebook, go to your Fabric workspace and select Notebook 1. Ensure that it is correctly attached to the lakehouse. You might need to connect to the lakehouse you previously created on the left-hand side file explorer menu. | ||
- The students must follow the instructions, leverage the documentation and complete the code cells sequentially. | ||
|
||
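The two notebook steps described above (read the CSV file into a dataframe, then write it to the lakehouse as a delta table) can be sketched roughly as follows. This is not the content of Notebook 1. It is an illustrative outline that assumes the `spark` session a Fabric notebook provides; the function name `ingest_csv_to_delta`, the path `Files/files/heart.csv`, and the table name `heart` are assumptions made for the example.

```python
def ingest_csv_to_delta(spark, csv_path: str, table_name: str):
    """Read a CSV file with a header row and save it as a Delta table.

    `spark` is expected to be a SparkSession, such as the one that is
    predefined in a Fabric notebook attached to a lakehouse.
    """
    df = (
        spark.read.format("csv")
        .option("header", "true")       # first row holds the column names
        .option("inferSchema", "true")  # let Spark infer the column types
        .load(csv_path)
    )
    # Overwrite so that re-running the cell does not append duplicate rows.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return df


# Example call inside a Fabric notebook (path and table name are hypothetical):
# heart_df = ingest_csv_to_delta(spark, "Files/files/heart.csv", "heart")
```

Using `overwrite` mode is a design choice for a workshop setting, where students often re-run cells; a production pipeline might prefer `append` or a merge.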
### Coaches' guidance | ||
- This challenge has 2 main sections: creating a shortcut and loading the files into delta tables. The first section must be completed before working on Notebook 1. | ||
- The full version of Notebook 1, with all code cells filled in, can be found for reference in the `coach/solutions` folder of this GitHub repository. | ||
- The aim of this challenge, as noted in the student guide, is to understand lakehouses, shortcuts and the delta format. | ||
- To assist students, coaches can clear up doubts regarding the Python syntax or how to get started with notebooks, but students should focus on learning how to set up shortcuts, navigate the Fabric UI and read/write to the delta lake. | ||
|
||
## Success criteria | ||
- The heart.csv data is now saved as a delta table on the lakehouse |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# Challenge 02 - Data preparation with Data Wrangler - Coach's Guide | ||
|
||
[< Previous Solution](./Solution-01.md) - **[Home](./README.md)** - [Next Solution >](./Solution-03.md) | ||
|
||
## Notes & Guidance | ||
|
||
In this challenge, hack participants must use Data Wrangler to prepare the heart dataset for model training. The purpose is to focus on transforming and preparing the data for the next challenges. They will have the flexibility to either write code in a notebook or leverage Data Wrangler’s intuitive interface to streamline the pre-processing tasks. | ||
|
||
### Sections | ||
|
||
1. Read the .csv file into a pandas dataframe in the notebook. (Notebook 2) | ||
2. Launch the Data Wrangler and interact with the data cleaning operations (Notebook 2) | ||
3. Apply the operations using Python code (Notebook 2) | ||
4. Develop feature engineering using Spark. (Notebook 2) | ||
5. Write the dataframe to the lakehouse as a delta table. (Notebook 2) | ||
|
||
### Student step-by-step instructions | ||
- Launching Data Wrangler: | ||
- Participants must create a pandas DataFrame in the Fabric notebook by completing the first cell in Notebook 2. | ||
- Once the cell has executed, select Launch Data Wrangler under the Home tab of the notebook ribbon. You'll see a list of active pandas DataFrames available for editing. | ||
- From the pandas DataFrame list, select `df`, the DataFrame you just created in the last cell, to open it in Data Wrangler. | ||
|
||
|
||
- Data Cleaning Operations – (Data Wrangler) | ||
- *Removing Unnecessary Columns* | ||
- On the *Operations* panel, expand *Schema* and select *Drop columns*. | ||
- Select `RowNumber`. This column will appear in red in the preview to show that it is changed by the code (in this case, dropped). | ||
- Select **Apply**. A new step is created in the **Cleaning steps panel** on the bottom left. | ||
|
||
- *Dropping Missing Values* | ||
- On the **Operations** panel, select **Find and replace**, and then select **Drop missing values**. | ||
- Select the `RestingBP`, `Cholesterol` and `FastingBS` columns. These are the columns that the menu on the right-hand side of the screen flags as having missing values. | ||
- Select **Apply**. A new step is created in the **Cleaning steps** panel on the bottom left. | ||
|
||
- *Dropping Duplicate Rows* | ||
- On the **Operations** panel, select **Find and replace**, and then select **Drop duplicate rows**. | ||
- Select **Apply**. A new step is created in the **Cleaning steps** panel on the bottom left. | ||
|
||
- Feature Engineering - (Notebook) | ||
- This part is notebook based. Participants will work in cells 09, 10, and 11 to transform categorical values into numerical labels. | ||
- You can also explore how to one-hot encode the categorical columns with Data Wrangler. However, this does not create labels in the existing columns. Instead, it creates a new column for each category, holding True and False values. Using this alternative format may require some changes to the code in the model training process. Please discuss this possibility with hack attendees to raise awareness of this Data Wrangler feature. | ||
|
||
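For coaches who want a quick reference outside the notebook, the three Data Wrangler cleaning steps and the two encoding options described above can be sketched in pandas. This is only an illustration under stated assumptions, not the notebook's solution: the toy values are made up, `Sex` and `ChestPainType` are assumed names for the categorical columns, and Notebook 2 does its feature engineering in Spark, where an encoder may assign different integer codes.

```python
import pandas as pd

# Toy stand-in for the heart dataset; all values are made up for illustration.
# `Sex` and `ChestPainType` are ASSUMED names for the categorical columns.
df = pd.DataFrame(
    {
        "RowNumber": [1, 2, 3, 4, 5],
        "RestingBP": [140.0, 160.0, None, 140.0, 130.0],
        "Cholesterol": [289.0, 180.0, 283.0, 289.0, None],
        "FastingBS": [0, 0, 0, 0, 1],
        "Sex": ["M", "F", "M", "M", "F"],
        "ChestPainType": ["ATA", "NAP", "ATA", "ATA", "ASY"],
    }
)


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Mirror the three Data Wrangler cleaning steps in order."""
    # Step 1: drop the unnecessary column.
    out = frame.drop(columns=["RowNumber"])
    # Step 2: drop rows with missing values in the flagged columns.
    out = out.dropna(subset=["RestingBP", "Cholesterol", "FastingBS"])
    # Step 3: drop duplicate rows.
    out = out.drop_duplicates()
    return out.reset_index(drop=True)


cleaned = clean(df)
categorical = ["Sex", "ChestPainType"]

# Option A: label encoding. Each category becomes an integer code,
# written back into the existing column.
labeled = cleaned.copy()
for col in categorical:
    labeled[col] = labeled[col].astype("category").cat.codes

# Option B: one-hot encoding. Each category becomes its own indicator
# column and the original categorical column is removed.
one_hot = pd.get_dummies(cleaned, columns=categorical)

print(labeled.columns.tolist())
print(one_hot.columns.tolist())
```

Comparing the two printed column lists shows why one-hot encoded data would need changes downstream: the column set differs from the label-encoded version, so any training code that selects features by name has to be adjusted.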
### Overview of student directions (running Notebook 2) | ||
- This section of the challenge is notebook based. All the instructions and links that participants need to complete this section are in Notebook 2, in the `student/resources.zip/notebooks` folder. | ||
- To run the notebook, go to your Fabric workspace and select Notebook 2. Ensure that it is correctly attached to the lakehouse. You might need to connect to the lakehouse you previously created, using the file explorer menu on the left-hand side. | ||
- The students must follow the instructions, leverage the documentation and complete the code cells sequentially. | ||
|
||
### Coaches' guidance | ||
|
||
- This challenge has 3 main sections: Data Wrangler operations, feature engineering, and saving the processed data to a Delta table. | ||
- The full version of Notebook 2, with all code cells filled in, can be found for reference in the `coach/solutions` folder of this GitHub repository. | ||
- The aim of this challenge, as noted in the student guide, is to understand data preparation using Data Wrangler and Fabric notebooks. | ||
- Coaches can answer questions about Python syntax or how to get started with notebooks, but students should focus on learning how to operate Data Wrangler, navigate the Fabric UI, write code in notebooks, and read from and write to the Delta lake. | ||
|
||
|
||
## Success criteria | ||
- The heart dataset is fully shaped, cleaned and prepared for model training. | ||
- No duplicate rows or surplus columns. | ||
- No missing values. | ||
- No unencoded categorical values. | ||
|
||
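Coaches may find it handy to verify these criteria programmatically. The following is a minimal sketch, assuming the prepared data has been loaded into a pandas DataFrame; `check_prepared` is a hypothetical helper, not part of the hack's notebooks, and the surplus-column check only covers `RowNumber`, the one column this guide names as unnecessary.

```python
import pandas as pd


def check_prepared(frame: pd.DataFrame) -> dict:
    """Return one boolean per success criterion (hypothetical helper)."""
    # Any column that is neither numeric nor boolean is treated as an
    # unencoded categorical column.
    non_numeric = frame.select_dtypes(exclude=["number", "bool"]).columns
    return {
        "no_duplicate_rows": not frame.duplicated().any(),
        "no_surplus_columns": "RowNumber" not in frame.columns,
        "no_missing_values": not frame.isna().any().any(),
        "no_categorical_values": len(non_numeric) == 0,
    }


# Made-up example frame that still violates several criteria.
unprepared = pd.DataFrame(
    {
        "RowNumber": [1, 2],
        "RestingBP": [140.0, None],
        "Sex": ["M", "F"],
    }
)

for criterion, passed in check_prepared(unprepared).items():
    print(f"{criterion}: {'ok' if passed else 'FAILED'}")
```

In a Fabric notebook, the same function could be applied to the Delta table after reading it back and converting it to pandas, which is reasonable for a dataset of this size.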
|
The resource folder may work for internal employees but will not work when someone external wants to download the content. In the meantime, you can leverage the standard boilerplate text that the coaches need to provide the zip, or we can discuss other options you may have.