Semantic Kernel for .NET: a quick look
With Semantic Kernel, we can set up a single object (the kernel) that orchestrates calls to popular API services for interacting with large language models and small language models from OpenAI, Microsoft, Mistral* and Google* (*marked for future releases, according to Microsoft at Build 2024).
Semantic Kernel with GPT4o and .NET
In the previous blog post, I wrote a small page that fetches and renders images, but the images had no captions or descriptions.
Let's take a look at how we can write descriptive captions with the help of GPT4o's vision and text generation capabilities within the context of Semantic Kernel. Semantic Kernel gives us the scaffolding for adding user messages through builder-style methods. The system message and the chat history are also readily available, so it is simple to append user and assistant messages to the history and keep context across a continuing conversation.
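As a minimal sketch of how the chat history holds that context, the snippet below builds a `ChatHistory` with a system message and appends one user turn and one assistant turn. It assumes the `Microsoft.SemanticKernel` package is already referenced; the message texts are illustrative placeholders, and the assistant reply is hard-coded rather than real model output.

```csharp
using System;
using Microsoft.SemanticKernel.ChatCompletion;

// The system message sets the assistant's persona for the whole conversation.
var history = new ChatHistory("You are a wise and creative mystic who describes images.");

// Each turn is appended so that later requests carry the earlier context.
history.AddUserMessage("Describe a red forest in 20 words or less.");

// Placeholder reply (an assumption for illustration, not real model output).
history.AddAssistantMessage("Crimson leaves whisper old secrets beneath a watchful sky.");

// ChatHistory is a list of ChatMessageContent, so it can be enumerated directly.
foreach (var message in history)
{
    Console.WriteLine($"{message.Role}: {message.Content}");
}
```

No request is sent here; the history is just an in-memory list that is later passed to the chat completion service on each call.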
Add Semantic Kernel to your project via NuGet:
dotnet add package Microsoft.SemanticKernel --version 1.15.0
Then scaffold your Kernel as follows (in this instance, the Kernel has an OpenAI-specific chat completion service attached to it for general chat; there are Azure-specific ones too):
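A minimal sketch of such a scaffold might look like the following. It assumes the API key is stored in an environment variable named `OPENAI_API_KEY` and that the model ID is `gpt-4o`; both are assumptions for illustration, as is the `KernelFactory` class name.

```csharp
using System;
using Microsoft.SemanticKernel;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class KernelFactory
{
    public static Kernel GetSemanticKernel()
    {
        // Assumption: the key is read from an environment variable, not hard-coded.
        string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
            ?? throw new InvalidOperationException("Set the OPENAI_API_KEY environment variable.");

        IKernelBuilder builder = Kernel.CreateBuilder();

        // Registers an OpenAI chat completion service, resolvable later as IChatCompletionService.
        builder.AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: apiKey);

        return builder.Build();
    }
}
```

The method can equally live as a private static method in the same class as `Main`, which is how the rest of this post calls it.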
GPT4o Vision abilities
We can then arrange our data as follows for vision and text generation, sending our prompt and image URL on each request via Semantic Kernel to get a text response:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Main lives in the same class as GetSemanticKernel().
public static async Task Main(string[] args)
{
    // These are mostly art images of red-coloured forests.
    var imageUrls = new List<string>
    {
        "https://i.pinimg.com/originals/5e/86/0e/5e860e89c4460f0be1a572fc7461fbd6.jpg",
        "https://img.freepik.com/premium-photo/red-forest-wallpapers-android-iphone-red-forest-wallpaper_759095-18370.jpg",
        "https://img.freepik.com/premium-photo/red-forest-with-river-trees-background_915071-1886.jpg",
        "https://images.fineartamerica.com/images-medium-large-5/red-forest-tree-landscape-autumn-ben-robson-hull-photography.jpg"
    };

    var kernel = GetSemanticKernel();
    var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
    var history = new ChatHistory();

    foreach (string imageUrl in imageUrls)
    {
        // Each user message pairs the text prompt with one image URL.
        var imagePrompt = new ChatMessageContentItemCollection
        {
            new TextContent("please describe in 20 words or less what the image you see is, as if you were a wise and creative mystic"),
            new ImageContent(new Uri(imageUrl))
        };
        history.AddUserMessage(imagePrompt);

        var gptDescription =
            await chatCompletionService.GetChatMessageContentAsync(history, null, kernel);
        Console.WriteLine(gptDescription);

        // Keep the reply in the history so later requests retain context.
        history.AddAssistantMessage(gptDescription.ToString());
    }
}
And you can see some nice results coming up:
Model capabilities can be added to the Semantic Kernel in the kernel builder, for example:
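As a hedged sketch, registering a text-to-audio service next to the chat completion service could look like this. It assumes Semantic Kernel 1.15.0, where the OpenAI text-to-audio connector is marked experimental (hence the `#pragma`; the diagnostic ID may differ in other versions). The `tts-1` model ID, the `OPENAI_API_KEY` variable, and the `KernelWithSpeech` class name are assumptions for illustration.

```csharp
#pragma warning disable SKEXP0010 // Assumption: experimental-API diagnostic ID for the OpenAI connector in 1.x.
using System;
using Microsoft.SemanticKernel;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class KernelWithSpeech
{
    public static Kernel GetSemanticKernel()
    {
        string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
            ?? throw new InvalidOperationException("Set the OPENAI_API_KEY environment variable.");

        IKernelBuilder builder = Kernel.CreateBuilder();

        // Chat completion with vision support for the image captions.
        builder.AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: apiKey);

        // Text-to-audio service that will speak the generated captions.
        builder.AddOpenAITextToAudio(modelId: "tts-1", apiKey: apiKey);

        return builder.Build();
    }
}
```

Each `Add...` call registers one more service on the same kernel, so the calling code can resolve whichever capability it needs with `kernel.GetRequiredService<T>()`.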
Now we extend the previous code with text-to-speech: the text generated by GPT4o is passed to the OpenAI Text to Audio service and spoken by the TTS model. The updated code looks like this:
And as a quick result when running, the GPT4o model generates descriptive text for each image it sees, and the TTS model speaks it:
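A minimal sketch of the speech step, under stated assumptions: it targets Semantic Kernel 1.15.0, it expects a text-to-audio service to be registered on the kernel, and it saves the audio to a file instead of playing it, since playback needs a separate audio library. The `SpeechSketch` class, the `SpeakToFileAsync` method, and the output path are hypothetical names for illustration.

```csharp
#pragma warning disable SKEXP0001, SKEXP0010 // Assumption: experimental-API diagnostic IDs in 1.x.
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextToAudio;

// Hypothetical helper class; names are assumptions for this sketch.
public static class SpeechSketch
{
    public static async Task SpeakToFileAsync(Kernel kernel, string text, string outputPath)
    {
        // Resolve the text-to-audio service registered in the kernel builder.
        var textToAudio = kernel.GetRequiredService<ITextToAudioService>();

        // Send the caption text to the TTS model and receive the audio payload.
        AudioContent audio = await textToAudio.GetAudioContentAsync(text);

        // Data may be null if the service returned no audio; write it out only when present.
        if (audio.Data is { } bytes)
        {
            await File.WriteAllBytesAsync(outputPath, bytes.ToArray());
        }
    }
}
```

Inside the image loop, a call such as `await SpeechSketch.SpeakToFileAsync(kernel, gptDescription.ToString(), "caption.mp3");` would save each caption as audio (OpenAI's TTS endpoint returns MP3 by default), ready to be handed to whichever playback library you prefer.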