On 29 October 2024, Microsoft released a preview of the Microsoft.Extensions.VectorData.Abstractions library, which gives .NET developers common abstractions for working with vector stores and integrates with the Semantic Kernel SDK.

On 29 October 2024, Microsoft released a preview of the Microsoft.Extensions.VectorData.Abstractions library for .NET. The library is designed to simplify integrating .NET applications with the Semantic Kernel SDK by providing common abstractions over different vector store implementations.

Semantic Kernel is Microsoft's enterprise-oriented SDK that lets developers integrate different large language models (LLMs) into applications written in several programming languages (C#, Python and Java). It can automatically orchestrate plugins, the functions a model is allowed to call, which gives developers more flexibility. This release is one of a series of tools developed jointly by Microsoft's Semantic Kernel and .NET teams. Earlier, Microsoft released the Microsoft.Extensions.AI library, which provides abstractions for common AI services such as chat clients.

The latest addition, Microsoft.Extensions.VectorData.Abstractions, focuses on simplifying work with vector stores, which hold the embeddings that LLM applications search over. An embedding represents a data record, such as a piece of text, as a vector in a high-dimensional space, converting discrete data into a numeric form that neural networks can process. Records with similar meanings end up close together in that space, which enables semantic search based on meaning rather than exact text matching.
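
Closeness in the vector space is commonly measured with cosine similarity, the distance function used in the Movie example. The following C# sketch computes it for three hypothetical 3-dimensional vectors invented purely for illustration (a model such as all-minilm produces 384 dimensions). It runs as a .NET 6 or later console program with no external packages.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|). The result ranges from -1 to 1;
// values near 1 mean the two vectors point in nearly the same direction.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical toy embeddings: "kitten" and "puppy" are meant to be related, "invoice" is not.
float[] kitten = { 0.9f, 0.8f, 0.1f };
float[] puppy = { 0.8f, 0.9f, 0.2f };
float[] invoice = { 0.1f, 0.0f, 0.9f };

Console.WriteLine($"kitten vs puppy:   {CosineSimilarity(kitten, puppy):F3}");
Console.WriteLine($"kitten vs invoice: {CosineSimilarity(kitten, invoice):F3}");
```

Cosine similarity ignores vector length and compares only direction, which is why it suits embeddings whose magnitudes carry little meaning.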

The library supports create, read, update and delete (CRUD) operations as well as search. Developers define records as .NET Plain Old CLR Objects (POCOs) annotated with attributes such as VectorStoreRecordKey, VectorStoreRecordData and VectorStoreRecordVector. In Microsoft's example, a Movie class has a key, a title, a description and a vector property that holds the record's 384-dimensional embedding, compared using the cosine-similarity distance function.
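
A sketch of such a Movie class follows, assuming the preview Microsoft.Extensions.VectorData.Abstractions package is referenced. The attribute names come from the description above; the VectorStoreRecordVector constructor arguments (dimension count, then a DistanceFunction constant) reflect the preview API and may differ in later versions.

```csharp
using System;
using Microsoft.Extensions.VectorData;

public class Movie
{
    // Unique key used by CRUD operations such as upsert and get.
    [VectorStoreRecordKey]
    public int Key { get; set; }

    // Plain data fields stored alongside the vector.
    [VectorStoreRecordData]
    public string Title { get; set; } = "";

    [VectorStoreRecordData]
    public string Description { get; set; } = "";

    // 384 dimensions to match all-minilm; records are compared by cosine similarity.
    [VectorStoreRecordVector(384, DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Vector { get; set; }
}
```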

In practice, the abstraction library works together with the IEmbeddingGenerator interface (from Microsoft.Extensions.AI), which generates embeddings, and the IVectorStore interface, which stores them. In Microsoft's example, Semantic Kernel supplies an in-memory vector store, while embeddings are produced by Ollama, which runs pre-packaged models locally on the developer's machine, using the small all-minilm model. The embedding generator is created as follows:

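```csharp
using Microsoft.Extensions.AI;

// Ollama serves the all-minilm embedding model locally on its default port, 11434.
IEmbeddingGenerator<string, Embedding<float>> generator =
    new OllamaEmbeddingGenerator(new Uri("http://localhost:11434/"), "all-minilm");
```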

Each movie is then embedded by generating a vector from its description with the GenerateEmbeddingVectorAsync extension method on IEmbeddingGenerator, after which the record is upserted (inserted, or updated if its key already exists) into the vector store:

```csharp
// Embed the description, then insert or update the record by its key.
movie.Vector = await generator.GenerateEmbeddingVectorAsync(movie.Description);
await movies.UpsertAsync(movie);
```
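
To show the shape of this ingestion loop without Ollama or Semantic Kernel, the sketch below substitutes a hypothetical ToyEmbed function (word hashing, not a real embedding model) for GenerateEmbeddingVectorAsync, and a Dictionary for the vector store collection. The record data and names such as movieData are likewise hypothetical. Assigning by key mirrors the insert-or-replace behaviour of an upsert.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int Dimensions = 8; // hypothetical size; all-minilm vectors have 384 dimensions

// Hypothetical stand-in for an embedding model: counts words into hashed buckets and
// L2-normalizes the result. It reflects word overlap only, not real semantics.
static float[] ToyEmbed(string text, int dims)
{
    var vector = new float[dims];
    foreach (var word in text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        int bucket = word.Sum(c => (int)c) % dims; // deterministic "hash" of the word
        vector[bucket] += 1f;
    }
    float norm = MathF.Sqrt(vector.Sum(x => x * x));
    if (norm > 0)
    {
        for (int i = 0; i < dims; i++) vector[i] /= norm;
    }
    return vector;
}

// Hypothetical records to ingest.
var movieData = new List<(int Key, string Title, string Description)>
{
    (1, "The Lion King", "an animated story about a young lion and his family"),
    (2, "Inception", "a thief steals secrets by entering people's dreams"),
};

// Stand-in for the vector store collection: record key -> record plus its vector.
var collection = new Dictionary<int, (string Title, string Description, float[] Vector)>();

foreach (var movie in movieData)
{
    float[] embedding = ToyEmbed(movie.Description, Dimensions); // role of GenerateEmbeddingVectorAsync
    collection[movie.Key] = (movie.Title, movie.Description, embedding); // upsert: insert or replace by key
}

Console.WriteLine($"Stored {collection.Count} records with {Dimensions}-dimensional vectors.");
```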

To query the store, the query text is embedded with the same generator, so that it lands in the same vector space as the stored records. For example, the query "A family friendly movie" is embedded as:

```csharp
var query = "A family friendly movie";
var queryEmbedding = await generator.GenerateEmbeddingVectorAsync(query);
```

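The VectorizedSearchAsync method on the vector store collection then returns the records whose vectors are most similar to the query embedding. Top = 1 limits the results to the single best match, and VectorPropertyName selects which vector property to search: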

```csharp
var searchOptions = new VectorSearchOptions()
{
    Top = 1,                      // return only the closest match
    VectorPropertyName = "Vector" // search against Movie.Vector
};
var results = await movies.VectorizedSearchAsync(queryEmbedding, searchOptions);
```
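
Conceptually, a search like this scores records against the query vector and keeps the best matches. The sketch below does that by brute force over hypothetical hand-made vectors, with Top = 1 expressed as a top parameter. It illustrates the idea only; the article does not say how the in-memory store implements search, and many vector databases use approximate nearest-neighbour indexes rather than scoring every record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Brute-force search: score every record against the query and keep the `top` highest.
static List<(string Title, double Score)> Search(
    IEnumerable<(string Title, float[] Vector)> items, float[] query, int top) =>
    items
        .Select(r => (r.Title, Score: Cosine(r.Vector, query)))
        .OrderByDescending(r => r.Score)
        .Take(top)
        .ToList();

// Hypothetical 3-dimensional vectors; the axes loosely stand for (family, action, sci-fi).
var records = new List<(string Title, float[] Vector)>
{
    ("The Lion King", new[] { 0.9f, 0.3f, 0.0f }),
    ("Inception",     new[] { 0.1f, 0.7f, 0.6f }),
    ("The Matrix",    new[] { 0.0f, 0.9f, 0.8f }),
};

// Hypothetical stand-in for the embedding of "A family friendly movie".
float[] queryEmbedding = { 0.8f, 0.2f, 0.1f };

foreach (var (title, score) in Search(records, queryEmbedding, top: 1))
{
    Console.WriteLine($"{title}: {score:F3}");
}
```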

Microsoft provides complete code samples on its blog and on the Semantic Kernel documentation site that walk through these steps. A significant application of the vector store abstraction library is retrieval-augmented generation (RAG), which extends LLMs with custom data stores: relevant records are retrieved from a vector store and supplied to the model as context, so the model can draw on specific knowledge bases without being retrained. A full example of RAG with a vector store is also available.
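
The "augmented" step of RAG usually means inserting retrieved records into the prompt sent to the LLM. The following is a minimal sketch, assuming the records have already been returned by a vector search; the BuildRagPrompt helper, the record data and the prompt wording are hypothetical, and sending the prompt to a chat model is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical records, as if returned by a vector search for the user's question.
var retrieved = new List<(string Title, string Description)>
{
    ("The Lion King", "an animated story about a young lion and his family"),
};

// Hypothetical helper: grounds the question in the retrieved records.
static string BuildRagPrompt(string question, IEnumerable<(string Title, string Description)> context)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine("Context:");
    foreach (var (title, description) in context)
    {
        sb.AppendLine($"- {title}: {description}");
    }
    sb.AppendLine();
    sb.AppendLine($"Question: {question}");
    return sb.ToString();
}

string prompt = BuildRagPrompt("Which movie is suitable for a family night?", retrieved);
Console.WriteLine(prompt); // this string would then be sent to a chat model
```

Because the knowledge lives in the prompt rather than in the model's weights, updating the vector store is enough to change what the model can answer.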

The library is currently in preview and is expected to remain so until .NET 9 is released. Developers can give feedback to Microsoft's development team by filing issues in the project's GitHub repository.

Looking ahead, Microsoft aims to:

  • Work more closely with the Semantic Kernel team to provide more streamlined experiences in RAG scenarios.
  • Work with vector store partners to integrate Microsoft.Extensions.VectorData into the broader .NET ecosystem.

Source: Noah Wire Services
