Initial draft cosmos data extractor #116
base: main
string endpoint = "https://aifororcasmetadatastore.documents.azure.com:443/";
string key = "[INSERT PRIMARY KEY HERE]";

CosmosClient client = new CosmosClient(endpoint, key);
Is it possible to use a connection string instead of an endpoint and key? If so, we don't need to hardcode the endpoint.
It would also be nice to expose it as a flag (see the comment below).
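A minimal sketch of what this suggestion could look like, assuming the connection string is supplied through an environment variable. The variable name `COSMOS_CONNECTION_STRING` and the `CosmosClientFactory` class are hypothetical and not part of this PR; the single-argument `CosmosClient(connectionString)` constructor comes from the Microsoft.Azure.Cosmos package the PR already uses.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class CosmosClientFactory
{
    // Hypothetical variable name; nothing in this PR defines it.
    public const string ConnectionStringVariable = "COSMOS_CONNECTION_STRING";

    // Reads the connection string from the environment so that neither
    // the endpoint nor the key has to appear in checked-in code.
    public static string ReadConnectionString()
    {
        var value = Environment.GetEnvironmentVariable(ConnectionStringVariable);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Set the {ConnectionStringVariable} environment variable " +
                "to a value of the form 'AccountEndpoint=...;AccountKey=...;'.");
        }
        return value;
    }

    // Builds the client from the connection string alone.
    public static CosmosClient Create() => new CosmosClient(ReadConnectionString());
}
```

With something like this in place, the two hardcoded strings could be replaced by `CosmosClient client = CosmosClientFactory.Create();`.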
Database database = await client.CreateDatabaseIfNotExistsAsync("predictions");
Container container = await database.CreateContainerIfNotExistsAsync("metadata", "/source_guid");

string sql = "SELECT m.id, m.modelId, m.audioUri, m.imageUri, m.reviewed, m.timestamp, m.whaleFoundConfidence, m.location.id AS location_id, m.location.name AS location_name, m.location.longitude AS location_long, m.location.latitude AS location_lat, m.source_guid, p.id AS prediction_id, p.startTime AS prediction_startTime, p.duration AS prediction_duration, p.confidence AS prediction_confidence FROM metadata m JOIN p IN m.predictions";
Can we add line breaks? :) See http://net-informations.com/q/faq/multilines.html for example.
Also have a dumb question: what is "p"?
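For reference, a sketch of the same query split across lines with a C# verbatim string literal (`@"..."`). The `MetadataQuery` class name is hypothetical; the query text is unchanged apart from whitespace. On the question about `p`: in the Cosmos DB SQL dialect, `JOIN p IN m.predictions` iterates over the `predictions` array inside each document, and `p` is the alias for one element of that array, so each result row pairs a metadata document with one of its predictions.

```csharp
public static class MetadataQuery
{
    // Same query as in the diff, reflowed for readability.
    // "m" aliases each document in the container; "p" aliases each
    // element of that document's "predictions" array.
    public const string Sql = @"
        SELECT
            m.id,
            m.modelId,
            m.audioUri,
            m.imageUri,
            m.reviewed,
            m.timestamp,
            m.whaleFoundConfidence,
            m.location.id AS location_id,
            m.location.name AS location_name,
            m.location.longitude AS location_long,
            m.location.latitude AS location_lat,
            m.source_guid,
            p.id AS prediction_id,
            p.startTime AS prediction_startTime,
            p.duration AS prediction_duration,
            p.confidence AS prediction_confidence
        FROM metadata m
        JOIN p IN m.predictions";
}
```

The constant can then be passed along as before, for example `QueryDefinition query = new (MetadataQuery.Sql);`.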
QueryDefinition query = new (sql);

QueryRequestOptions options = new ();
options.MaxItemCount = 50;
Is it possible to have this set from an external flag? This would be useful if we distribute the tool as a binary.
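One way this could be done, sketched with plain argument parsing and no extra packages. The flag name `--max-item-count` and the `ExtractorArgs` class are hypothetical; the default of 50 matches the value currently hardcoded in the diff.

```csharp
using System;

public static class ExtractorArgs
{
    // Matches the value hardcoded in the diff.
    public const int DefaultMaxItemCount = 50;

    // Hypothetical flag name, chosen only for illustration.
    public const string MaxItemCountFlag = "--max-item-count";

    // Returns the page size given after the flag, or the default
    // when the flag is absent from the arguments.
    public static int ParseMaxItemCount(string[] args)
    {
        for (int i = 0; i < args.Length; i++)
        {
            if (args[i] != MaxItemCountFlag)
            {
                continue;
            }

            // The flag must be followed by a positive integer.
            if (i + 1 < args.Length
                && int.TryParse(args[i + 1], out int parsed)
                && parsed > 0)
            {
                return parsed;
            }

            throw new ArgumentException(
                $"{MaxItemCountFlag} must be followed by a positive integer.");
        }

        return DefaultMaxItemCount;
    }
}
```

The hardcoded assignment would then become `options.MaxItemCount = ExtractorArgs.ParseMaxItemCount(args);`, where `args` is the console application's command-line argument array.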
If the developer is using Visual Studio, they can put the URL and key in their User Secrets so that they do not go into the checked-in code. Otherwise, they will need to put them into an appsettings.Development.json file and make sure that file does not get checked in as part of the pull request.
I thought it was a console application, hence my suggestion. User secrets/appsettings.Development.json are generally used for web apps.
@pastorep what should we do with this PR now?
@micya what should we do with this PR now?
@dthaler I don't think this will affect any of the existing UI/APIs. It might be worth holding onto for future reference, since it purports to export the data in CSV format.
@Herman-Wu @scottveirs I finally had a cycle this morning to put together an extractor to pull the metadata from Cosmos. This uses the Microsoft.Azure.Cosmos library from NuGet to connect to the Cosmos DB SQL API account. Currently, it allows users to paginate through the results, but when I have another spare cycle, I can save the results to CSV (or another output type). Please let me know if these fields capture what is needed for training (or whether I should pare them down further).
Thank you!