-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.Net: Add PostgresVectorStore Memory connector. #9324
base: main
Are you sure you want to change the base?
.Net: Add PostgresVectorStore Memory connector. #9324
Conversation
Work in progress, some methods are not implemented yet.
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresConstants.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresVectorStoreRecordCollection.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresVectorStoreRecordCollection.cs
Outdated
Show resolved
Hide resolved
…ctor-store-dotnet
e.g. Change type of property 'CollectionInts' from 'System.Collections.Generic.ICollection<int>?' to 'System.Collections.Generic.HashSet<int>?' for improved performance
@westey-m thanks for all the detailed feedback. I think I've addressed all the comments, and this is good for another review. |
@@ -64,4 +74,15 @@ internal static class PostgresConstants | |||
/// <summary>The name of the column that returns distance value in the database.</summary> | |||
/// <remarks>It is used in the similarity search query. Must not conflict with model property.</remarks> | |||
public const string DistanceColumnName = "sk_pg_distance"; | |||
|
|||
/// <summary>The default index kind.</summary> | |||
public const string DefaultIndexKind = IndexKind.Hnsw; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's also OK to use Flat here as the default. When I was suggesting using Hnsw earlier, I didn't realise that postgres can also work without, and that's a totally acceptable default.
/// <param name="options">Optional options to further configure the <see cref="IVectorStore"/>.</param> | ||
/// <param name="serviceId">An optional service id to use as the service key.</param> | ||
/// <returns>The service collection.</returns> | ||
public static IServiceCollection AddPostgresVectorStore(this IServiceCollection services, string connectionString, PostgresVectorStoreOptions? options = default, string? serviceId = default) | ||
public static IServiceCollection AddPostgresVectorStore(this IServiceCollection services, NpgsqlDataSource dataSource, PostgresVectorStoreOptions? options = default, string? serviceId = default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this overload is that useful. Where someone builds their DI container, they will not typically be able to create a NpgsqlDataSource
in a way that will get it destroyed in the right timeframe. E.g. if they use a using clause it'll just get destroyed immediately before the app really starts. If they register it with DI to allow it to handle it's lifecycle, they will use a different overload.
FYI: We typically follow a philosophy of avoiding adding extra overloads or helpers unless we have a good justification, simply because it means more maintenance and more chances of breaking changes in future. So less is more 😄
/// <param name="options">Optional options to further configure the <see cref="IVectorStoreRecordCollection{TKey, TRecord}"/>.</param> | ||
/// <param name="serviceId">An optional service id to use as the service key.</param> | ||
/// <returns>Service collection.</returns> | ||
public static IServiceCollection AddPostgresVectorStoreRecordCollection<TKey, TRecord>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, not sure we need this one.
{ | ||
yield return collection; | ||
} | ||
return this._postgresClient.GetTablesAsync(cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to also wrap any errors here in VectorStoreOperationException
public void MapFromDataToStorageModelWithNumericKeyReturnsValidStorageModel() | ||
{ | ||
// Arrange | ||
var definition = GetRecordDefinition<string>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is also using a string key instead of numeric as the test name suggests.
var result = mapper.MapFromStorageToDataModel(storageModel, new() { IncludeVectors = includeVectors }); | ||
|
||
// Assert | ||
Assert.Equal("key", result.Key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, looks like this should still be changed to be numeric?
this._serviceCollection.AddSingleton<NpgsqlDataSource>(dataSource); | ||
|
||
// Act | ||
this._serviceCollection.AddPostgresVectorStoreRecordCollection<string, TestRecord>("testcollection"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numeric Key?
new VectorStoreRecordDataProperty("code", typeof(int)), | ||
new VectorStoreRecordDataProperty("rating", typeof(float?)), | ||
new VectorStoreRecordDataProperty("description", typeof(string)), | ||
new VectorStoreRecordDataProperty("parking_is_included", typeof(bool)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth having a record definition where some of the properties have storage names that are different to the main property name, to make sure these are used correctly in the command. Same with the other tests here that have to build names into the command.
{ | ||
return new VectorStoreRecordDefinition | ||
{ | ||
Properties = new List<VectorStoreRecordProperty> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add one or two enumerables to the property list as well to verify that they word as expected.
{ | ||
Properties = new List<VectorStoreRecordProperty> | ||
{ | ||
new VectorStoreRecordKeyProperty("HotelId", typeof(ulong)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this anywhere? From the constants list I don't believe we support ulong, so I would expect using this to fail.
[InlineData(typeof(long), 7L)] | ||
[InlineData(typeof(string), "key1")] | ||
[InlineData(typeof(Guid), null)] | ||
public async Task ItCanGetAndDeleteRecordAsync(Type idType, object? key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also possible to make test methods generic, e.g. public async Task ItCanGetAndDeleteRecordAsync<TKey>(Type idType, TKey? key)
, which means you can use TKey in the method and you don't have to use dynamic.
namespace SemanticKernel.IntegrationTests.Connectors.Memory.Postgres; | ||
|
||
[Collection("PostgresVectorStoreCollection")] | ||
public sealed class PostgresVectorStoreRecordCollectionTests(PostgresVectorStoreFixture fixture) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a test that reads some data that wasn't upserted by the Collection. E.g. if you create a table and upsert some records using SQL, and then read them using the Collection. The reason being that we want to ensure that the data in the DB is actually what we want it to be. E.g. if we do two complimentary things wrong, on upsert and read, so it looks fine when using the Collection, but the data isn't stored in the way we need it to in the DB, a test where we didn't write the data via the Collection should catch this.
This PR adds a PostgresVectorStore and related classes to Microsoft.SemanticKernel.Connectors.Postgres.
Motivation and Context
As part of the move to having memory connectors implement the new Microsoft.Extensions.VectorData.IVectorStore architecture (see https://github.com/microsoft/semantic-kernel/blob/main/docs/decisions/0050-updated-vector-store-design.md), each memory connector needs to be updated with the new architecture. This PR tackles updating the existing Microsoft.SemanticKernel.Connectors.Postgres package to include this implementation. This will supercede the PostgresMemoryStore implementation.
Some high level comments about design:
Contribution Checklist