cheat sheet

Semantic Kernel

Build LLM-powered applications with Microsoft Semantic Kernel. Covers the kernel, plugins, prompt templates, planners, function calling, Kernel Memory, Python and .NET SDKs.

Semantic Kernel — Microsoft's AI Orchestration SDK

What it is

Semantic Kernel (SK) is Microsoft's open-source SDK for building LLM-powered applications across Python, .NET, and Java. The mental model is straightforward: a Kernel is a container; you register services (chat completion, embeddings, image generation) and plugins (collections of functions the model can call); plugins expose functions that are either Python/C# code or prompt-template files; and planners turn high-level user goals into a sequence of plugin invocations.

Where LangChain leans into Python data-science workflows and LlamaIndex centres on retrieval, Semantic Kernel is engineered for the .NET enterprise stack — strong typing, dependency injection, Microsoft.Extensions.AI alignment, OpenTelemetry by default — while keeping a feature-parity Python SDK. The current direction is to deprecate the dedicated planners in favour of automatic function calling by the underlying LLM, which makes SK feel more like an SDK around tool-use than a separate agent framework.

Install

bash
pip install semantic-kernel

Output:

text
Successfully installed semantic-kernel-1.x.x ...

For .NET:

bash
dotnet add package Microsoft.SemanticKernel
dotnet add package Microsoft.SemanticKernel.Connectors.OpenAI

Output:

text
info : Package 'Microsoft.SemanticKernel' is compatible with all the specified frameworks in project.

Python 1.x and .NET 1.x are functionally similar but not identical — feature releases land in .NET first. Check the SK release notes before relying on bleeding-edge features in Python.

Quick example — Python

python
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.contents.chat_history import ChatHistory

async def main():
    kernel = Kernel()
    kernel.add_service(OpenAIChatCompletion(
        service_id="chat",
        ai_model_id="gpt-4o-mini",
        api_key=os.environ["OPENAI_API_KEY"],
    ))

    history = ChatHistory()
    history.add_user_message("Define a kernel in one sentence.")

    chat = kernel.get_service("chat")
    settings = chat.get_prompt_execution_settings_class()(service_id="chat")
    response = await chat.get_chat_message_content(chat_history=history, settings=settings, kernel=kernel)
    print(response.content)

asyncio.run(main())

Output:

text
A kernel is the central orchestrator that holds AI services, plugins, and configuration
for invoking functions in a Semantic Kernel application.

Quick example — C#

csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
Kernel kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("Define a kernel in one sentence.");

var reply = await chat.GetChatMessageContentAsync(history, kernel: kernel);
Console.WriteLine(reply.Content);

Output:

text
A kernel is the orchestration root that wires AI services, plugins, and memory into a single
invocable surface for an application.

When / why to use it

  • Building LLM features inside an existing .NET (ASP.NET, Blazor, MAUI) application — SK is the canonical Microsoft path.
  • Mixed Python + .NET teams that want shared plugin conventions and the same prompt files across runtimes.
  • Apps that benefit from automatic function calling — the model picks tools without you writing a router.
  • Long-running services that need OpenTelemetry tracing out of the box.
  • Document-grounded chat using Kernel Memory, a separate but companion service.

Common pitfalls

kernel argument is mandatory for function calling — when calling get_chat_message_content(...), you must pass kernel=kernel so the chat service can discover registered plugins. Forget it and the model returns plain text with no tool calls.

Auto-invoke loop limits — by default the kernel auto-invokes returned tool calls up to a maximum (5 in 1.x). For multi-step agents bump it via FunctionChoiceBehavior.Auto(auto_invoke=True, maximum_auto_invoke_attempts=20).

Prompt template syntax — SK uses {{$variable}} and {{function.name $arg}}. Single-brace placeholders are silently passed through.

Planners are deprecatedSequentialPlanner, StepwisePlanner, and ActionPlanner are deprecated in favour of native function calling. New code should use FunctionChoiceBehavior.Auto() instead.

Pass enable_kernel_functions=True and function_choice_behavior=FunctionChoiceBehavior.Auto() to make all registered functions automatically callable by the model.

SK 1.x emits OpenTelemetry spans for every model and function call. Set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to ship traces to Honeycomb, Tempo, or Azure Monitor.

The Kernel — Python

The Kernel is the entry point. You register services (chat completion, text embedding, text-to-image) and plugins on it; functions can then be invoked individually or via the chat completion auto-invoke loop.

python
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAITextEmbedding,
)
import os

kernel = Kernel()

kernel.add_service(OpenAIChatCompletion(
    service_id="chat",
    ai_model_id="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
))

kernel.add_service(OpenAITextEmbedding(
    service_id="embed",
    ai_model_id="text-embedding-3-small",
    api_key=os.environ["OPENAI_API_KEY"],
))

print(kernel.services.keys())

Output:

text
dict_keys(['chat', 'embed'])

Azure OpenAI, Hugging Face, Ollama, Anthropic (via semantic-kernel-anthropic or OpenAI-compatible endpoints), and Google AI services are all registerable through their respective Connectors.AI.* modules.

Plugins and functions

A plugin is a class whose methods are decorated with @kernel_function. The decorator turns the method into a function the model can call. Type hints and docstrings drive the function description and parameter schema.

python
from typing import Annotated
from semantic_kernel.functions import kernel_function

class TimePlugin:
    @kernel_function(description="Return the current date in ISO format.")
    def today(self) -> Annotated[str, "Current date YYYY-MM-DD"]:
        from datetime import date
        return date.today().isoformat()

    @kernel_function(description="Return the day of the week for a given ISO date.")
    def day_of_week(self, iso_date: Annotated[str, "An ISO date YYYY-MM-DD"]) -> str:
        from datetime import date
        return date.fromisoformat(iso_date).strftime("%A")

kernel.add_plugin(TimePlugin(), plugin_name="time")

result = await kernel.invoke(kernel.get_function("time", "day_of_week"), iso_date="2026-01-01")
print(result.value)

Output:

text
Thursday

Annotated[type, "description"] provides the parameter description the LLM sees when deciding to call the function.

C# plugin

csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;

public class TimePlugin
{
    [KernelFunction("today")]
    [Description("Return the current date in ISO format.")]
    public string Today() => DateTime.UtcNow.ToString("yyyy-MM-dd");

    [KernelFunction("day_of_week")]
    [Description("Return the day of the week for a given ISO date.")]
    public string DayOfWeek([Description("An ISO date YYYY-MM-DD")] string isoDate)
        => DateTime.Parse(isoDate).DayOfWeek.ToString();
}

kernel.Plugins.AddFromObject(new TimePlugin(), "time");

Prompt-template functions

Functions can also be .prompty or .skprompt.txt files. The template uses {{$var}} placeholders for inputs and {{function.name $arg}} to invoke other functions inline.

Directory layout:

text
plugins/
  WriterPlugin/
    ShortPoem/
      skprompt.txt
      config.json

skprompt.txt:

text
Write a short four-line poem about {{$topic}} in the style of {{$style}}.

config.json:

json
{
  "schema": 1,
  "description": "Generate a short poem.",
  "execution_settings": {
    "default": {
      "max_tokens": 200,
      "temperature": 0.8
    }
  },
  "input_variables": [
    {"name": "topic", "description": "Subject of the poem", "default": ""},
    {"name": "style", "description": "Author style", "default": "Robert Frost"}
  ]
}

Load and invoke:

python
kernel.add_plugin(parent_directory="./plugins", plugin_name="WriterPlugin")
poem = await kernel.invoke(kernel.get_function("WriterPlugin", "ShortPoem"), topic="snow", style="haiku")
print(str(poem))

Output:

text
White silence falls slow,
Each flake a quiet promise,
Winter writes the world.

.prompty is the newer, YAML-front-matter prompt format that is also consumed by Microsoft.Extensions.AI:

text
---
name: ShortPoem
description: Generate a short poem.
model:
  api: chat
  configuration:
    type: openai
    name: gpt-4o-mini
inputs:
  topic:
    type: string
  style:
    type: string
    default: Robert Frost
---
system:
You are a poet.
user:
Write a short four-line poem about {{topic}} in the style of {{style}}.

Automatic function calling

The recommended way to combine an LLM with plugins. Set FunctionChoiceBehavior.Auto() and SK feeds plugin schemas to the model, executes any tool calls, and loops until the model returns a plain message.

python
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion, OpenAIChatPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory

kernel = Kernel()
kernel.add_service(OpenAIChatCompletion(service_id="chat", ai_model_id="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"]))
kernel.add_plugin(TimePlugin(), plugin_name="time")

settings = OpenAIChatPromptExecutionSettings(service_id="chat")
settings.function_choice_behavior = FunctionChoiceBehavior.Auto()

async def ask(question: str) -> str:
    history = ChatHistory()
    history.add_user_message(question)
    chat = kernel.get_service("chat")
    reply = await chat.get_chat_message_content(chat_history=history, settings=settings, kernel=kernel)
    return reply.content

print(asyncio.run(ask("What day of the week is 2026-07-04?")))

Output:

text
2026-07-04 is a Saturday.

FunctionChoiceBehavior.Required() forces the model to call exactly one function; FunctionChoiceBehavior.None_() disables tool calling for this turn.

Streaming

python
from semantic_kernel.contents.chat_history import ChatHistory

history = ChatHistory()
history.add_user_message("Explain async/await in three sentences.")

chat = kernel.get_service("chat")
async for chunk in chat.get_streaming_chat_message_content(chat_history=history, settings=settings, kernel=kernel):
    if chunk and chunk.content:
        print(chunk.content, end="", flush=True)
print()

In C# the equivalent is GetStreamingChatMessageContentsAsync(...). Both APIs yield delta chunks that include any tool-call fragments.

Filters — middleware for functions

Filters wrap function invocations with pre/post hooks. Use them for logging, auth checks, redaction, or retry-on-error.

python
from semantic_kernel.filters import FunctionInvocationContext
from semantic_kernel.filters.filter_types import FilterTypes

@kernel.filter(filter_type=FilterTypes.FUNCTION_INVOCATION)
async def log_calls(context: FunctionInvocationContext, next):
    print(f"-> {context.function.plugin_name}.{context.function.name}")
    await next(context)
    print(f"<- {context.function.plugin_name}.{context.function.name} = {context.result}")

Equivalent .NET hooks: IFunctionInvocationFilter, IPromptRenderFilter, IAutoFunctionInvocationFilter.

Memory — short-term and long-term

SK ships a lightweight in-process memory abstraction; the heavier RAG story lives in the companion Kernel Memory service (see below).

python
from semantic_kernel.memory import VolatileMemoryStore, SemanticTextMemory

store = VolatileMemoryStore()
embed = kernel.get_service("embed")

memory = SemanticTextMemory(storage=store, embeddings_generator=embed)
await memory.save_information(collection="docs", id="1", text="The kernel routes all requests.")
await memory.save_information(collection="docs", id="2", text="Plugins expose callable functions.")

results = await memory.search(collection="docs", query="How are tools exposed?", limit=2)
for r in results:
    print(r.text, "→", r.relevance)

Output:

text
Plugins expose callable functions. → 0.82
The kernel routes all requests. → 0.41

Pluggable backends include AzureCognitiveSearchMemoryStore, PostgresMemoryStore, RedisMemoryStore, QdrantMemoryStore, WeaviateMemoryStore, PineconeMemoryStore, and ChromaMemoryStore.

Kernel Memory — the heavier RAG service

Kernel Memory (KM) is a separate Microsoft project for production RAG. It runs as a service (Docker, ASP.NET, Azure Functions) and exposes a REST/gRPC API for indexing documents and asking questions; it handles chunking, embedding, citation, and multi-tenancy. SK speaks to KM through KernelMemoryServiceClient.

bash
docker run -it --rm -p 9001:9001 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  ghcr.io/microsoft/kernel-memory:latest

Output:

text
KernelMemory service is listening on http://+:9001

Client usage from .NET:

csharp
var memory = new MemoryWebClient("http://localhost:9001");
await memory.ImportDocumentAsync("readme.md", documentId: "readme");
var answer = await memory.AskAsync("What does this document cover?");
Console.WriteLine(answer.Result);

KM is the recommended path for production RAG in Microsoft stacks; the in-process SemanticTextMemory is fine for tests and small CLIs.

Planners (legacy — use auto function calling)

Planners predate native tool calling. They use an LLM to write a plan that calls plugin functions, then the kernel executes it. Modern SK code uses FunctionChoiceBehavior.Auto() instead, but planners still ship for backward compatibility.

python
from semantic_kernel.planners import FunctionCallingStepwisePlanner

planner = FunctionCallingStepwisePlanner(service_id="chat")
result = await planner.invoke(
    kernel,
    "Tell me the day of the week for July 4th, 2026, and write a one-line poem about that day.",
)
print(result.final_answer)

SequentialPlanner and ActionPlanner are removed in 1.x. Only FunctionCallingStepwisePlanner remains; even it is in maintenance mode.

Agents (preview)

semantic_kernel.agents adds a higher-level agent abstraction with conversation threads, tool subsets, and group chats. The .NET surface is more mature; Python tracks behind.

python
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.contents import ChatHistoryAgentThread

agent = ChatCompletionAgent(
    kernel=kernel,
    name="researcher",
    instructions="You answer questions concisely using available tools.",
)

thread = ChatHistoryAgentThread()
async for response in agent.invoke(messages="What day of the week is 2026-07-04?", thread=thread):
    print(response.content)

Multi-agent group chat is available via AgentGroupChat with termination strategies (round-robin, by content, by another LLM judge).

Real-world recipes

Recipe — pluggable LLM with fallback

python
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatCompletion

kernel.add_service(AzureChatCompletion(service_id="primary",  deployment_name="gpt-4o", endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], api_key=os.environ["AZURE_OPENAI_KEY"]))
kernel.add_service(OpenAIChatCompletion(service_id="fallback", ai_model_id="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"]))

try:
    settings = OpenAIChatPromptExecutionSettings(service_id="primary")
    reply = await kernel.invoke_prompt(prompt="Define vector embeddings.", settings=settings)
except Exception:
    settings = OpenAIChatPromptExecutionSettings(service_id="fallback")
    reply = await kernel.invoke_prompt(prompt="Define vector embeddings.", settings=settings)
print(reply)

Recipe — typed return values

python
from typing import Annotated
from pydantic import BaseModel
from semantic_kernel.functions import kernel_function

class Review(BaseModel):
    score: int
    summary: str

class ReviewPlugin:
    @kernel_function(description="Score and summarise a product review.")
    def review(self, text: Annotated[str, "Raw review text"]) -> Review:
        return Review(score=8, summary="Generally positive.")

Pydantic models are converted to JSON schema for the model and back to typed instances on return.

Recipe — request-scoped plugin registration in ASP.NET

csharp
builder.Services.AddScoped<Kernel>(sp =>
{
    var kb = Kernel.CreateBuilder();
    kb.AddOpenAIChatCompletion("gpt-4o-mini", builder.Configuration["OpenAI:ApiKey"]!);
    var k = kb.Build();
    k.Plugins.AddFromObject(sp.GetRequiredService<TimePlugin>(), "time");
    k.Plugins.AddFromObject(sp.GetRequiredService<UserContextPlugin>(), "user");
    return k;
});

A new Kernel per request gives clean isolation and lets per-user plugins (auth, tenant context) be injected.

Recipe — tracing to Azure Monitor

python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="https://my-otel-collector/v1/traces")))
trace.set_tracer_provider(provider)

Every kernel.invoke(...), chat completion, and function call now emits a span — search by gen_ai.operation.name in Azure Monitor.

Recipe — caching repeated prompts

python
import functools

@functools.lru_cache(maxsize=256)
async def cached_invoke(prompt: str) -> str:
    return str(await kernel.invoke_prompt(prompt=prompt))

For semantic caching (cache hits on similar prompts), use SemanticTextMemory.search() over a prompt_cache collection and return on a high-relevance hit before calling the LLM.

Recipe — exposing SK as an OpenAI-compatible REST API

semantic-kernel-openai-compat (.NET) or a thin ASP.NET wrapper turns the kernel into a POST /v1/chat/completions endpoint. Any OpenAI SDK can then point at it.

csharp
app.MapPost("/v1/chat/completions", async (ChatRequest req, Kernel k) =>
{
    var chat = k.GetRequiredService<IChatCompletionService>();
    var history = new ChatHistory(req.Messages);
    var reply = await chat.GetChatMessageContentAsync(history, kernel: k);
    return Results.Json(new { choices = new[] { new { message = new { role = "assistant", content = reply.Content } } } });
});

Python vs .NET — feature parity snapshot

CapabilityPython.NET
Chat completionyesyes
Text embeddingyesyes
Function callingyesyes
Prompt templates ({{$var}})yesyes
.prompty filesyesyes
Auto-invoke loopyesyes
Filtersyesyes
Agents (preview)yesyes (more mature)
Process framework (workflows)partialyes
Kernel Memory clientyesyes
OpenTelemetryyesyes
MCP integrationyes (sk-mcp)yes

Quick reference

TaskCode
Install (Python)pip install semantic-kernel
Install (.NET)dotnet add package Microsoft.SemanticKernel
Create kernel (Py)kernel = Kernel()
Create kernel (.NET)Kernel.CreateBuilder()...Build()
Add OpenAI servicekernel.add_service(OpenAIChatCompletion(service_id="chat", ai_model_id="...", api_key=...))
Define function@kernel_function(description="...") on a method
Register pluginkernel.add_plugin(MyPlugin(), plugin_name="my")
Load file pluginkernel.add_plugin(parent_directory="./plugins", plugin_name="P")
Invoke functionawait kernel.invoke(kernel.get_function("p", "fn"), arg=value)
Invoke promptawait kernel.invoke_prompt(prompt="...", settings=...)
Auto tool callingsettings.function_choice_behavior = FunctionChoiceBehavior.Auto()
Required tool callFunctionChoiceBehavior.Required()
Streamasync for chunk in chat.get_streaming_chat_message_content(...)
Filter@kernel.filter(filter_type=FilterTypes.FUNCTION_INVOCATION)
In-process memorySemanticTextMemory(storage=VolatileMemoryStore(), embeddings_generator=embed)
Save to memoryawait memory.save_information(collection, id, text)
Search memoryawait memory.search(collection, query, limit=k)
AgentChatCompletionAgent(kernel=k, name=..., instructions=...)
Stepwise plannerFunctionCallingStepwisePlanner(service_id="chat")