Initial draft cosmos data extractor #116
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
*.user | ||
*.userosscache | ||
*.sln.docstates | ||
*.DS_Store | ||
|
||
# ignore appsettings | ||
**/appsettings.development.json | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Extracting additional training data from CosmosDB | ||
|
||
An example of an item in the metadata store, as it appears in our CosmosDB database, is shown below: | ||
|
||
``` | ||
{ | ||
"id": "1ed0f937-3f63-4f6a-a680-1b4b4ef24fb9", | ||
"modelId": "AudioSet", | ||
"audioUri": "https://livemlaudiospecstorage.blob.core.windows.net/audiowavs/rpi_orcasound_lab_2020_09_27_21_15_03_PDT.wav", | ||
"imageUri": "https://livemlaudiospecstorage.blob.core.windows.net/spectrogramspng/rpi_orcasound_lab_2020_09_27_21_15_03_PDT.png", | ||
"reviewed": true, | ||
"timestamp": "2020-09-28T04:15:03.495901Z", | ||
"whaleFoundConfidence": 80.18461538461537, | ||
"location": { | ||
"id": "rpi_orcasound_lab", | ||
"name": "Haro Strait", | ||
"longitude": -123.2166658, | ||
"latitude": 48.5499978 | ||
}, | ||
"source_guid": "rpi_orcasound_lab", | ||
"predictions": [ | ||
{ | ||
"id": 0, | ||
"startTime": 2.5, | ||
"duration": 2.5, | ||
"confidence": 0.914 | ||
}, | ||
{ | ||
"id": 1, | ||
"startTime": 7.5, | ||
"duration": 2.5, | ||
"confidence": 0.624 | ||
}, | ||
{ | ||
"id": 2, | ||
"startTime": 12.5, | ||
"duration": 2.5, | ||
"confidence": 0.869 | ||
}, | ||
{ | ||
"id": 3, | ||
"startTime": 15, | ||
"duration": 2.5, | ||
"confidence": 0.918 | ||
} | ||
] | ||
} | ||
``` | ||
|
||
Attached is a .NET application that computes the cross-product of each observation with its `predictions` property, resulting in a JSON array with one entry for every pairing of the relevant observation metadata with one of that observation's predictions. | ||
|
||
The steps required to use this .NET application are detailed in this [link](https://learn.microsoft.com/en-us/training/paths/connect-to-azure-cosmos-db-sql-api-sdk/). It requires downloading the `Microsoft.Azure.Cosmos` package from `nuget.org`, connecting to our online account, and executing the SQL query specified in `script.cs`. To build and run this application, use `dotnet run`. |
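The cross-product described above can be sketched locally with LINQ, without connecting to CosmosDB. This is only an illustration under stated assumptions: the `Detection`, `Observation`, `FlatRow`, and `Flattener` names are hypothetical and not part of this PR, and only a few of the metadata fields are carried over. `SelectMany` plays the role of `JOIN p IN m.predictions` in the script's query, where `p` is the alias bound to each element of an item's `predictions` array in turn.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; the PR itself uses a flat Prediction class.
public record Detection(int Id, double StartTime, double Duration, double Confidence);
public record Observation(string Id, string SourceGuid, List<Detection> Predictions);
public record FlatRow(string Id, string SourceGuid, int PredictionId,
                      double StartTime, double Duration, double Confidence);

public static class Flattener
{
    // Emits one row per (observation, prediction) pair.
    // In this sketch, an observation with no predictions contributes no rows.
    public static List<FlatRow> Flatten(IEnumerable<Observation> observations) =>
        observations
            .SelectMany(
                m => m.Predictions,
                (m, p) => new FlatRow(m.Id, m.SourceGuid, p.Id,
                                      p.StartTime, p.Duration, p.Confidence))
            .ToList();
}
```

Applied to the example item above, which has four entries in `predictions`, this would yield four rows that share the same observation fields and differ only in the prediction fields.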
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
<Project Sdk="Microsoft.NET.Sdk"> | ||
<PropertyGroup> | ||
<OutputType>Exe</OutputType> | ||
<TargetFramework>net6.0</TargetFramework> | ||
</PropertyGroup> | ||
<ItemGroup> | ||
<PackageReference Include="Microsoft.Azure.Cosmos" Version="3.22.1" /> | ||
</ItemGroup> | ||
</Project> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
public class Prediction | ||
{ | ||
public string id { get; set; } | ||
public string modelId { get; set; } | ||
public string audioUri { get; set; } | ||
public string imageUri { get; set; } | ||
public bool reviewed { get; set; } | ||
public string timestamp { get; set; } | ||
public double whaleFoundConfidence { get; set; } | ||
public string location_id { get; set; } | ||
public string location_lat { get; set; } | ||
public string location_name { get; set; } | ||
public string location_long { get; set; } | ||
public string source_guid { get; set; } | ||
public string prediction_id { get; set; } | ||
public string prediction_startTime { get; set; } | ||
public string prediction_duration { get; set; } | ||
public string prediction_confidence { get; set; } | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
using System; | ||
using Microsoft.Azure.Cosmos; | ||
|
||
string endpoint = "https://aifororcasmetadatastore.documents.azure.com:443/"; | ||
string key = "[INSERT PRIMARY KEY HERE]"; | ||
|
||
CosmosClient client = new CosmosClient(endpoint, key); | ||
AccountProperties account = await client.ReadAccountAsync(); | ||
|
||
// Sanity check | ||
Console.WriteLine($"Account Name:\t{account.Id}"); | ||
|
||
// Get the database | ||
Database database = await client.CreateDatabaseIfNotExistsAsync("predictions"); | ||
Container container = await database.CreateContainerIfNotExistsAsync("metadata", "/source_guid"); | ||
|
||
string sql = "SELECT m.id, m.modelId, m.audioUri, m.imageUri, m.reviewed, m.timestamp, m.whaleFoundConfidence, m.location.id AS location_id, m.location.name AS location_name, m.location.longitude AS location_long, m.location.latitude AS location_lat, m.source_guid, p.id AS prediction_id, p.startTime AS prediction_startTime, p.duration AS prediction_duration, p.confidence AS prediction_confidence FROM metadata m JOIN p IN m.predictions"; | ||
Review comment: Can we add line breaks? :) See http://net-informations.com/q/faq/multilines.html for example.
Review comment: Also have a dumb question: what is "p"? |
||
QueryDefinition query = new (sql); | ||
|
||
QueryRequestOptions options = new (); | ||
options.MaxItemCount = 50; | ||
Review comment: Is it possible to have this set from an external flag? This would be useful if we distribute the tool as a binary.
Review comment: If the developer is using Visual Studio, they can put the URL and key in their User Secrets so that they do not go into the checked-in code. Otherwise they will need to put them into the appsettings.Development.json file and make sure that file does not get checked in as part of the pull request.
Review comment: I thought it was a console application, hence my suggestion. User secrets/appsettings.Development.json are generally used for web apps. |
||
|
||
FeedIterator<Prediction> iterator = container.GetItemQueryIterator<Prediction>(query, requestOptions: options); | ||
|
||
while (iterator.HasMoreResults) | ||
{ | ||
FeedResponse<Prediction> predictions = await iterator.ReadNextAsync(); | ||
foreach (Prediction pred in predictions) | ||
{ | ||
Console.WriteLine($"[{pred.prediction_id}]\t[{pred.prediction_startTime,40}]\t[{pred.prediction_duration,10}]\t[{pred.prediction_confidence,40}]\t"); | ||
} | ||
Console.WriteLine("Press any key for next page of results"); | ||
Console.ReadKey(); | ||
} | ||
|
Review comment: Is it possible to use a connection string instead of endpoint and key? If so, we don't need to hardcode the endpoint.
It would also be nice to expose this as a flag (see below comment).
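One way to address the two review suggestions (a connection string instead of endpoint plus key, and a flag for the page size) might look like the sketch below. Everything named here is an assumption for illustration and not part of the PR: the `ExtractorSettings` and `SettingsLoader` types, the `COSMOS_CONNECTION_STRING` environment variable, and the `--connection-string` and `--max-item-count` flags. The environment lookup is passed in as a delegate so the parsing can be exercised without touching the real environment.

```csharp
using System;

// Hypothetical settings holder; names are illustrative, not from the PR.
public record ExtractorSettings(string ConnectionString, int MaxItemCount);

public static class SettingsLoader
{
    public const string ConnectionStringVariable = "COSMOS_CONNECTION_STRING"; // assumed name
    public const int DefaultMaxItemCount = 50; // the value currently hardcoded in the script

    // Reads the connection string from the environment, then lets
    // command-line flags override it and set the page size.
    public static ExtractorSettings Load(string[] args, Func<string, string> getEnv)
    {
        string connectionString = getEnv(ConnectionStringVariable);
        int maxItemCount = DefaultMaxItemCount;

        for (int i = 0; i < args.Length; i++)
        {
            bool hasValue = i + 1 < args.Length;
            if (args[i] == "--connection-string" && hasValue)
            {
                connectionString = args[++i];
            }
            else if (args[i] == "--max-item-count" && hasValue)
            {
                // This sketch only accepts positive page sizes.
                if (!int.TryParse(args[++i], out maxItemCount) || maxItemCount <= 0)
                    throw new ArgumentException("--max-item-count must be a positive integer.");
            }
            else
            {
                throw new ArgumentException($"Unrecognized or incomplete argument: {args[i]}");
            }
        }

        if (string.IsNullOrWhiteSpace(connectionString))
            throw new InvalidOperationException(
                $"Set {ConnectionStringVariable} or pass --connection-string.");

        return new ExtractorSettings(connectionString, maxItemCount);
    }
}
```

In the script, the result could then feed the SDK calls already in use, for example `SettingsLoader.Load(args, Environment.GetEnvironmentVariable)`, followed by `new CosmosClient(settings.ConnectionString)` and `new QueryRequestOptions { MaxItemCount = settings.MaxItemCount }`. Keeping the key out of source this way also sidesteps the question of which settings file a console application should use.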