cheat sheet

Semantic Kernel

Build LLM-powered applications with Microsoft Semantic Kernel. Covers the kernel, plugins, prompt templates, planners, function calling, Kernel Memory, Python and .NET SDKs.

updated 05-25-2026

Semantic Kernel — Microsoft's AI Orchestration SDK

What it is

Semantic Kernel (SK) is Microsoft's open-source SDK for building LLM-powered applications across Python, .NET, and Java. The mental model is straightforward: a Kernel is a container; you register services (chat completion, embeddings, image generation) and plugins (collections of functions the model can call); plugins expose functions that are either Python/C# code or prompt-template files; and planners turn high-level user goals into a sequence of plugin invocations.

Where LangChain leans into Python data-science workflows and LlamaIndex centres on retrieval, Semantic Kernel is engineered for the .NET enterprise stack — strong typing, dependency injection, Microsoft.Extensions.AI alignment, OpenTelemetry by default — while keeping a feature-parity Python SDK. The current direction is to deprecate the dedicated planners in favour of automatic function calling by the underlying LLM, which makes SK feel more like an SDK around tool-use than a separate agent framework.

Install

bash

pip install semantic-kernel

Output:

text

Successfully installed semantic-kernel-1.x.x ...

For .NET:

bash

dotnet add package Microsoft.SemanticKernel
dotnet add package Microsoft.SemanticKernel.Connectors.OpenAI

Output:

text

info : Package 'Microsoft.SemanticKernel' is compatible with all the specified frameworks in project.

Python 1.x and .NET 1.x are functionally similar but not identical — feature releases land in .NET first. Check the SK release notes before relying on bleeding-edge features in Python.

Quick example — Python

python

import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.contents.chat_history import ChatHistory

async def main():
    kernel = Kernel()
    kernel.add_service(OpenAIChatCompletion(
        service_id="chat",
        ai_model_id="gpt-4o-mini",
        api_key=os.environ["OPENAI_API_KEY"],
    ))

    history = ChatHistory()
    history.add_user_message("Define a kernel in one sentence.")

    chat = kernel.get_service("chat")
    settings = chat.get_prompt_execution_settings_class()(service_id="chat")
    response = await chat.get_chat_message_content(chat_history=history, settings=settings, kernel=kernel)
    print(response.content)

asyncio.run(main())

Output:

text

A kernel is the central orchestrator that holds AI services, plugins, and configuration
for invoking functions in a Semantic Kernel application.

Quick example — C#

csharp

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
Kernel kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("Define a kernel in one sentence.");

var reply = await chat.GetChatMessageContentAsync(history, kernel: kernel);
Console.WriteLine(reply.Content);

Output:

text

A kernel is the orchestration root that wires AI services, plugins, and memory into a single
invocable surface for an application.

When / why to use it

Building LLM features inside an existing .NET (ASP.NET, Blazor, MAUI) application — SK is the canonical Microsoft path.
Mixed Python + .NET teams that want shared plugin conventions and the same prompt files across runtimes.
Apps that benefit from automatic function calling — the model picks tools without you writing a router.
Long-running services that need OpenTelemetry tracing out of the box.
Document-grounded chat using Kernel Memory, a separate but companion service.

Common pitfalls

kernel argument is mandatory for function calling — when calling get_chat_message_content(...), you must pass kernel=kernel so the chat service can discover registered plugins. Forget it and the model returns plain text with no tool calls.

Auto-invoke loop limits — by default the kernel auto-invokes returned tool calls up to a maximum (5 in 1.x). For multi-step agents bump it via FunctionChoiceBehavior.Auto(auto_invoke=True, maximum_auto_invoke_attempts=20).

Prompt template syntax — SK uses {{$variable}} and {{function.name $arg}}. Single-brace placeholders are silently passed through.

Planners are deprecated — SequentialPlanner, StepwisePlanner, and ActionPlanner are deprecated in favour of native function calling. New code should use FunctionChoiceBehavior.Auto() instead.

Pass enable_kernel_functions=True and function_choice_behavior=FunctionChoiceBehavior.Auto() to make all registered functions automatically callable by the model.

SK 1.x emits OpenTelemetry spans for every model and function call. Set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to ship traces to Honeycomb, Tempo, or Azure Monitor.

The Kernel — Python

The Kernel is the entry point. You register services (chat completion, text embedding, text-to-image) and plugins on it; functions can then be invoked individually or via the chat completion auto-invoke loop.

python

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAITextEmbedding,
)
import os

kernel = Kernel()

kernel.add_service(OpenAIChatCompletion(
    service_id="chat",
    ai_model_id="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
))

kernel.add_service(OpenAITextEmbedding(
    service_id="embed",
    ai_model_id="text-embedding-3-small",
    api_key=os.environ["OPENAI_API_KEY"],
))

print(kernel.services.keys())

Output:

text

dict_keys(['chat', 'embed'])

Azure OpenAI, Hugging Face, Ollama, Anthropic (via semantic-kernel-anthropic or OpenAI-compatible endpoints), and Google AI services are all registerable through their respective Connectors.AI.* modules.

Plugins and functions

A plugin is a class whose methods are decorated with @kernel_function. The decorator turns the method into a function the model can call. Type hints and docstrings drive the function description and parameter schema.

python

from typing import Annotated
from semantic_kernel.functions import kernel_function

class TimePlugin:
    @kernel_function(description="Return the current date in ISO format.")
    def today(self) -> Annotated[str, "Current date YYYY-MM-DD"]:
        from datetime import date
        return date.today().isoformat()

    @kernel_function(description="Return the day of the week for a given ISO date.")
    def day_of_week(self, iso_date: Annotated[str, "An ISO date YYYY-MM-DD"]) -> str:
        from datetime import date
        return date.fromisoformat(iso_date).strftime("%A")

kernel.add_plugin(TimePlugin(), plugin_name="time")

result = await kernel.invoke(kernel.get_function("time", "day_of_week"), iso_date="2026-01-01")
print(result.value)

Output:

text

Thursday

Annotated[type, "description"] provides the parameter description the LLM sees when deciding to call the function.

C# plugin

csharp

using System.ComponentModel;
using Microsoft.SemanticKernel;

public class TimePlugin
{
    [KernelFunction("today")]
    [Description("Return the current date in ISO format.")]
    public string Today() => DateTime.UtcNow.ToString("yyyy-MM-dd");

    [KernelFunction("day_of_week")]
    [Description("Return the day of the week for a given ISO date.")]
    public string DayOfWeek([Description("An ISO date YYYY-MM-DD")] string isoDate)
        => DateTime.Parse(isoDate).DayOfWeek.ToString();
}

kernel.Plugins.AddFromObject(new TimePlugin(), "time");

Prompt-template functions

Functions can also be .prompty or .skprompt.txt files. The template uses {{$var}} placeholders for inputs and {{function.name $arg}} to invoke other functions inline.

Directory layout:

text

plugins/
  WriterPlugin/
    ShortPoem/
      skprompt.txt
      config.json

skprompt.txt:

text

Write a short four-line poem about {{$topic}} in the style of {{$style}}.

config.json:

json

{
  "schema": 1,
  "description": "Generate a short poem.",
  "execution_settings": {
    "default": {
      "max_tokens": 200,
      "temperature": 0.8
    }
  },
  "input_variables": [
    {"name": "topic", "description": "Subject of the poem", "default": ""},
    {"name": "style", "description": "Author style", "default": "Robert Frost"}
  ]
}

Load and invoke:

python

kernel.add_plugin(parent_directory="./plugins", plugin_name="WriterPlugin")
poem = await kernel.invoke(kernel.get_function("WriterPlugin", "ShortPoem"), topic="snow", style="haiku")
print(str(poem))

Output:

text

White silence falls slow,
Each flake a quiet promise,
Winter writes the world.

.prompty is the newer, YAML-front-matter prompt format that is also consumed by Microsoft.Extensions.AI:

text

---
name: ShortPoem
description: Generate a short poem.
model:
  api: chat
  configuration:
    type: openai
    name: gpt-4o-mini
inputs:
  topic:
    type: string
  style:
    type: string
    default: Robert Frost
---
system:
You are a poet.
user:
Write a short four-line poem about {{topic}} in the style of {{style}}.

Automatic function calling

The recommended way to combine an LLM with plugins. Set FunctionChoiceBehavior.Auto() and SK feeds plugin schemas to the model, executes any tool calls, and loops until the model returns a plain message.

python

import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion, OpenAIChatPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory

kernel = Kernel()
kernel.add_service(OpenAIChatCompletion(service_id="chat", ai_model_id="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"]))
kernel.add_plugin(TimePlugin(), plugin_name="time")

settings = OpenAIChatPromptExecutionSettings(service_id="chat")
settings.function_choice_behavior = FunctionChoiceBehavior.Auto()

async def ask(question: str) -> str:
    history = ChatHistory()
    history.add_user_message(question)
    chat = kernel.get_service("chat")
    reply = await chat.get_chat_message_content(chat_history=history, settings=settings, kernel=kernel)
    return reply.content

print(asyncio.run(ask("What day of the week is 2026-07-04?")))

Output:

text

2026-07-04 is a Saturday.

FunctionChoiceBehavior.Required() forces the model to call exactly one function; FunctionChoiceBehavior.None_() disables tool calling for this turn.

Streaming

python

from semantic_kernel.contents.chat_history import ChatHistory

history = ChatHistory()
history.add_user_message("Explain async/await in three sentences.")

chat = kernel.get_service("chat")
async for chunk in chat.get_streaming_chat_message_content(chat_history=history, settings=settings, kernel=kernel):
    if chunk and chunk.content:
        print(chunk.content, end="", flush=True)
print()

In C# the equivalent is GetStreamingChatMessageContentsAsync(...). Both APIs yield delta chunks that include any tool-call fragments.

Filters — middleware for functions

Filters wrap function invocations with pre/post hooks. Use them for logging, auth checks, redaction, or retry-on-error.

python

from semantic_kernel.filters import FunctionInvocationContext
from semantic_kernel.filters.filter_types import FilterTypes

@kernel.filter(filter_type=FilterTypes.FUNCTION_INVOCATION)
async def log_calls(context: FunctionInvocationContext, next):
    print(f"-> {context.function.plugin_name}.{context.function.name}")
    await next(context)
    print(f"<- {context.function.plugin_name}.{context.function.name} = {context.result}")

Equivalent .NET hooks: IFunctionInvocationFilter, IPromptRenderFilter, IAutoFunctionInvocationFilter.

Memory — short-term and long-term

SK ships a lightweight in-process memory abstraction; the heavier RAG story lives in the companion Kernel Memory service (see below).

python

from semantic_kernel.memory import VolatileMemoryStore, SemanticTextMemory

store = VolatileMemoryStore()
embed = kernel.get_service("embed")

memory = SemanticTextMemory(storage=store, embeddings_generator=embed)
await memory.save_information(collection="docs", id="1", text="The kernel routes all requests.")
await memory.save_information(collection="docs", id="2", text="Plugins expose callable functions.")

results = await memory.search(collection="docs", query="How are tools exposed?", limit=2)
for r in results:
    print(r.text, "→", r.relevance)

Output:

text

Plugins expose callable functions. → 0.82
The kernel routes all requests. → 0.41

Pluggable backends include AzureCognitiveSearchMemoryStore, PostgresMemoryStore, RedisMemoryStore, QdrantMemoryStore, WeaviateMemoryStore, PineconeMemoryStore, and ChromaMemoryStore.

Kernel Memory — the heavier RAG service

Kernel Memory (KM) is a separate Microsoft project for production RAG. It runs as a service (Docker, ASP.NET, Azure Functions) and exposes a REST/gRPC API for indexing documents and asking questions; it handles chunking, embedding, citation, and multi-tenancy. SK speaks to KM through KernelMemoryServiceClient.

bash

docker run -it --rm -p 9001:9001 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  ghcr.io/microsoft/kernel-memory:latest

Output:

text

KernelMemory service is listening on http://+:9001

Client usage from .NET:

csharp

var memory = new MemoryWebClient("http://localhost:9001");
await memory.ImportDocumentAsync("readme.md", documentId: "readme");
var answer = await memory.AskAsync("What does this document cover?");
Console.WriteLine(answer.Result);

KM is the recommended path for production RAG in Microsoft stacks; the in-process SemanticTextMemory is fine for tests and small CLIs.

Planners (legacy — use auto function calling)

Planners predate native tool calling. They use an LLM to write a plan that calls plugin functions, then the kernel executes it. Modern SK code uses FunctionChoiceBehavior.Auto() instead, but planners still ship for backward compatibility.

python

from semantic_kernel.planners import FunctionCallingStepwisePlanner

planner = FunctionCallingStepwisePlanner(service_id="chat")
result = await planner.invoke(
    kernel,
    "Tell me the day of the week for July 4th, 2026, and write a one-line poem about that day.",
)
print(result.final_answer)

SequentialPlanner and ActionPlanner are removed in 1.x. Only FunctionCallingStepwisePlanner remains; even it is in maintenance mode.

Agents (preview)

semantic_kernel.agents adds a higher-level agent abstraction with conversation threads, tool subsets, and group chats. The .NET surface is more mature; Python tracks behind.

python

from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.contents import ChatHistoryAgentThread

agent = ChatCompletionAgent(
    kernel=kernel,
    name="researcher",
    instructions="You answer questions concisely using available tools.",
)

thread = ChatHistoryAgentThread()
async for response in agent.invoke(messages="What day of the week is 2026-07-04?", thread=thread):
    print(response.content)

Multi-agent group chat is available via AgentGroupChat with termination strategies (round-robin, by content, by another LLM judge).

Real-world recipes

Recipe — pluggable LLM with fallback

python

from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatCompletion

kernel.add_service(AzureChatCompletion(service_id="primary",  deployment_name="gpt-4o", endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], api_key=os.environ["AZURE_OPENAI_KEY"]))
kernel.add_service(OpenAIChatCompletion(service_id="fallback", ai_model_id="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"]))

try:
    settings = OpenAIChatPromptExecutionSettings(service_id="primary")
    reply = await kernel.invoke_prompt(prompt="Define vector embeddings.", settings=settings)
except Exception:
    settings = OpenAIChatPromptExecutionSettings(service_id="fallback")
    reply = await kernel.invoke_prompt(prompt="Define vector embeddings.", settings=settings)
print(reply)

Recipe — typed return values

python

from typing import Annotated
from pydantic import BaseModel
from semantic_kernel.functions import kernel_function

class Review(BaseModel):
    score: int
    summary: str

class ReviewPlugin:
    @kernel_function(description="Score and summarise a product review.")
    def review(self, text: Annotated[str, "Raw review text"]) -> Review:
        return Review(score=8, summary="Generally positive.")

Pydantic models are converted to JSON schema for the model and back to typed instances on return.

Recipe — request-scoped plugin registration in ASP.NET

csharp

builder.Services.AddScoped<Kernel>(sp =>
{
    var kb = Kernel.CreateBuilder();
    kb.AddOpenAIChatCompletion("gpt-4o-mini", builder.Configuration["OpenAI:ApiKey"]!);
    var k = kb.Build();
    k.Plugins.AddFromObject(sp.GetRequiredService<TimePlugin>(), "time");
    k.Plugins.AddFromObject(sp.GetRequiredService<UserContextPlugin>(), "user");
    return k;
});

A new Kernel per request gives clean isolation and lets per-user plugins (auth, tenant context) be injected.

Recipe — tracing to Azure Monitor

python

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="https://my-otel-collector/v1/traces")))
trace.set_tracer_provider(provider)

Every kernel.invoke(...), chat completion, and function call now emits a span — search by gen_ai.operation.name in Azure Monitor.

Recipe — caching repeated prompts

python

import functools

@functools.lru_cache(maxsize=256)
async def cached_invoke(prompt: str) -> str:
    return str(await kernel.invoke_prompt(prompt=prompt))

For semantic caching (cache hits on similar prompts), use SemanticTextMemory.search() over a prompt_cache collection and return on a high-relevance hit before calling the LLM.

Recipe — exposing SK as an OpenAI-compatible REST API

semantic-kernel-openai-compat (.NET) or a thin ASP.NET wrapper turns the kernel into a POST /v1/chat/completions endpoint. Any OpenAI SDK can then point at it.

csharp

app.MapPost("/v1/chat/completions", async (ChatRequest req, Kernel k) =>
{
    var chat = k.GetRequiredService<IChatCompletionService>();
    var history = new ChatHistory(req.Messages);
    var reply = await chat.GetChatMessageContentAsync(history, kernel: k);
    return Results.Json(new { choices = new[] { new { message = new { role = "assistant", content = reply.Content } } } });
});

Python vs .NET — feature parity snapshot

Capability	Python	.NET
Chat completion	yes	yes
Text embedding	yes	yes
Function calling	yes	yes
Prompt templates (`{{$var}}`)	yes	yes
`.prompty` files	yes	yes
Auto-invoke loop	yes	yes
Filters	yes	yes
Agents (preview)	yes	yes (more mature)
Process framework (workflows)	partial	yes
Kernel Memory client	yes	yes
OpenTelemetry	yes	yes
MCP integration	yes (sk-mcp)	yes

Quick reference

Task	Code
Install (Python)	`pip install semantic-kernel`
Install (.NET)	`dotnet add package Microsoft.SemanticKernel`
Create kernel (Py)	`kernel = Kernel()`
Create kernel (.NET)	`Kernel.CreateBuilder()...Build()`
Add OpenAI service	`kernel.add_service(OpenAIChatCompletion(service_id="chat", ai_model_id="...", api_key=...))`
Define function	`@kernel_function(description="...")` on a method
Register plugin	`kernel.add_plugin(MyPlugin(), plugin_name="my")`
Load file plugin	`kernel.add_plugin(parent_directory="./plugins", plugin_name="P")`
Invoke function	`await kernel.invoke(kernel.get_function("p", "fn"), arg=value)`
Invoke prompt	`await kernel.invoke_prompt(prompt="...", settings=...)`
Auto tool calling	`settings.function_choice_behavior = FunctionChoiceBehavior.Auto()`
Required tool call	`FunctionChoiceBehavior.Required()`
Stream	`async for chunk in chat.get_streaming_chat_message_content(...)`
Filter	`@kernel.filter(filter_type=FilterTypes.FUNCTION_INVOCATION)`
In-process memory	`SemanticTextMemory(storage=VolatileMemoryStore(), embeddings_generator=embed)`
Save to memory	`await memory.save_information(collection, id, text)`
Search memory	`await memory.search(collection, query, limit=k)`
Agent	`ChatCompletionAgent(kernel=k, name=..., instructions=...)`
Stepwise planner	`FunctionCallingStepwisePlanner(service_id="chat")`