cheat sheet
Semantic Kernel
Build LLM-powered applications with Microsoft Semantic Kernel. Covers the kernel, plugins, prompt templates, planners, function calling, Kernel Memory, Python and .NET SDKs.
Semantic Kernel — Microsoft's AI Orchestration SDK
What it is
Semantic Kernel (SK) is Microsoft's open-source SDK for building LLM-powered applications across Python, .NET, and Java. The mental model is straightforward: a Kernel is a container; you register services (chat completion, embeddings, image generation) and plugins (collections of functions the model can call); plugins expose functions that are either Python/C# code or prompt-template files; and planners turn high-level user goals into a sequence of plugin invocations.
Where LangChain leans into Python data-science workflows and LlamaIndex centres on retrieval, Semantic Kernel is engineered for the .NET enterprise stack — strong typing, dependency injection, Microsoft.Extensions.AI alignment, OpenTelemetry by default — while keeping a feature-parity Python SDK. The current direction is to deprecate the dedicated planners in favour of automatic function calling by the underlying LLM, which makes SK feel more like an SDK around tool-use than a separate agent framework.
Install
pip install semantic-kernel
Output:
Successfully installed semantic-kernel-1.x.x ...
For .NET:
dotnet add package Microsoft.SemanticKernel
dotnet add package Microsoft.SemanticKernel.Connectors.OpenAI
Output:
info : Package 'Microsoft.SemanticKernel' is compatible with all the specified frameworks in project.
Python 1.x and .NET 1.x are functionally similar but not identical — feature releases land in .NET first. Check the SK release notes before relying on bleeding-edge features in Python.
Quick example — Python
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.contents.chat_history import ChatHistory
async def main():
kernel = Kernel()
kernel.add_service(OpenAIChatCompletion(
service_id="chat",
ai_model_id="gpt-4o-mini",
api_key=os.environ["OPENAI_API_KEY"],
))
history = ChatHistory()
history.add_user_message("Define a kernel in one sentence.")
chat = kernel.get_service("chat")
settings = chat.get_prompt_execution_settings_class()(service_id="chat")
response = await chat.get_chat_message_content(chat_history=history, settings=settings, kernel=kernel)
print(response.content)
asyncio.run(main())
Output:
A kernel is the central orchestrator that holds AI services, plugins, and configuration
for invoking functions in a Semantic Kernel application.
Quick example — C#
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
modelId: "gpt-4o-mini",
apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
Kernel kernel = builder.Build();
var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("Define a kernel in one sentence.");
var reply = await chat.GetChatMessageContentAsync(history, kernel: kernel);
Console.WriteLine(reply.Content);
Output:
A kernel is the orchestration root that wires AI services, plugins, and memory into a single
invocable surface for an application.
When / why to use it
- Building LLM features inside an existing .NET (ASP.NET, Blazor, MAUI) application — SK is the canonical Microsoft path.
- Mixed Python + .NET teams that want shared plugin conventions and the same prompt files across runtimes.
- Apps that benefit from automatic function calling — the model picks tools without you writing a router.
- Long-running services that need OpenTelemetry tracing out of the box.
- Document-grounded chat using Kernel Memory, a separate but companion service.
Common pitfalls
kernelargument is mandatory for function calling — when callingget_chat_message_content(...), you must passkernel=kernelso the chat service can discover registered plugins. Forget it and the model returns plain text with no tool calls.
Auto-invoke loop limits — by default the kernel auto-invokes returned tool calls up to a maximum (5 in 1.x). For multi-step agents bump it via
FunctionChoiceBehavior.Auto(auto_invoke=True, maximum_auto_invoke_attempts=20).
Prompt template syntax — SK uses
{{$variable}}and{{function.name $arg}}. Single-brace placeholders are silently passed through.
Planners are deprecated —
SequentialPlanner,StepwisePlanner, andActionPlannerare deprecated in favour of native function calling. New code should useFunctionChoiceBehavior.Auto()instead.
Pass
enable_kernel_functions=Trueandfunction_choice_behavior=FunctionChoiceBehavior.Auto()to make all registered functions automatically callable by the model.
SK 1.x emits OpenTelemetry spans for every model and function call. Set the
OTEL_EXPORTER_OTLP_ENDPOINTenvironment variable to ship traces to Honeycomb, Tempo, or Azure Monitor.
The Kernel — Python
The Kernel is the entry point. You register services (chat completion, text embedding, text-to-image) and plugins on it; functions can then be invoked individually or via the chat completion auto-invoke loop.
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
OpenAIChatCompletion,
OpenAITextEmbedding,
)
import os
kernel = Kernel()
kernel.add_service(OpenAIChatCompletion(
service_id="chat",
ai_model_id="gpt-4o-mini",
api_key=os.environ["OPENAI_API_KEY"],
))
kernel.add_service(OpenAITextEmbedding(
service_id="embed",
ai_model_id="text-embedding-3-small",
api_key=os.environ["OPENAI_API_KEY"],
))
print(kernel.services.keys())
Output:
dict_keys(['chat', 'embed'])
Azure OpenAI, Hugging Face, Ollama, Anthropic (via semantic-kernel-anthropic or OpenAI-compatible endpoints), and Google AI services are all registerable through their respective Connectors.AI.* modules.
Plugins and functions
A plugin is a class whose methods are decorated with @kernel_function. The decorator turns the method into a function the model can call. Type hints and docstrings drive the function description and parameter schema.
from typing import Annotated
from semantic_kernel.functions import kernel_function
class TimePlugin:
@kernel_function(description="Return the current date in ISO format.")
def today(self) -> Annotated[str, "Current date YYYY-MM-DD"]:
from datetime import date
return date.today().isoformat()
@kernel_function(description="Return the day of the week for a given ISO date.")
def day_of_week(self, iso_date: Annotated[str, "An ISO date YYYY-MM-DD"]) -> str:
from datetime import date
return date.fromisoformat(iso_date).strftime("%A")
kernel.add_plugin(TimePlugin(), plugin_name="time")
result = await kernel.invoke(kernel.get_function("time", "day_of_week"), iso_date="2026-01-01")
print(result.value)
Output:
Thursday
Annotated[type, "description"] provides the parameter description the LLM sees when deciding to call the function.
C# plugin
using System.ComponentModel;
using Microsoft.SemanticKernel;
public class TimePlugin
{
[KernelFunction("today")]
[Description("Return the current date in ISO format.")]
public string Today() => DateTime.UtcNow.ToString("yyyy-MM-dd");
[KernelFunction("day_of_week")]
[Description("Return the day of the week for a given ISO date.")]
public string DayOfWeek([Description("An ISO date YYYY-MM-DD")] string isoDate)
=> DateTime.Parse(isoDate).DayOfWeek.ToString();
}
kernel.Plugins.AddFromObject(new TimePlugin(), "time");
Prompt-template functions
Functions can also be .prompty or .skprompt.txt files. The template uses {{$var}} placeholders for inputs and {{function.name $arg}} to invoke other functions inline.
Directory layout:
plugins/
WriterPlugin/
ShortPoem/
skprompt.txt
config.json
skprompt.txt:
Write a short four-line poem about {{$topic}} in the style of {{$style}}.
config.json:
{
"schema": 1,
"description": "Generate a short poem.",
"execution_settings": {
"default": {
"max_tokens": 200,
"temperature": 0.8
}
},
"input_variables": [
{"name": "topic", "description": "Subject of the poem", "default": ""},
{"name": "style", "description": "Author style", "default": "Robert Frost"}
]
}
Load and invoke:
kernel.add_plugin(parent_directory="./plugins", plugin_name="WriterPlugin")
poem = await kernel.invoke(kernel.get_function("WriterPlugin", "ShortPoem"), topic="snow", style="haiku")
print(str(poem))
Output:
White silence falls slow,
Each flake a quiet promise,
Winter writes the world.
.prompty is the newer, YAML-front-matter prompt format that is also consumed by Microsoft.Extensions.AI:
---
name: ShortPoem
description: Generate a short poem.
model:
api: chat
configuration:
type: openai
name: gpt-4o-mini
inputs:
topic:
type: string
style:
type: string
default: Robert Frost
---
system:
You are a poet.
user:
Write a short four-line poem about {{topic}} in the style of {{style}}.
Automatic function calling
The recommended way to combine an LLM with plugins. Set FunctionChoiceBehavior.Auto() and SK feeds plugin schemas to the model, executes any tool calls, and loops until the model returns a plain message.
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion, OpenAIChatPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory
kernel = Kernel()
kernel.add_service(OpenAIChatCompletion(service_id="chat", ai_model_id="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"]))
kernel.add_plugin(TimePlugin(), plugin_name="time")
settings = OpenAIChatPromptExecutionSettings(service_id="chat")
settings.function_choice_behavior = FunctionChoiceBehavior.Auto()
async def ask(question: str) -> str:
history = ChatHistory()
history.add_user_message(question)
chat = kernel.get_service("chat")
reply = await chat.get_chat_message_content(chat_history=history, settings=settings, kernel=kernel)
return reply.content
print(asyncio.run(ask("What day of the week is 2026-07-04?")))
Output:
2026-07-04 is a Saturday.
FunctionChoiceBehavior.Required() forces the model to call exactly one function; FunctionChoiceBehavior.None_() disables tool calling for this turn.
Streaming
from semantic_kernel.contents.chat_history import ChatHistory
history = ChatHistory()
history.add_user_message("Explain async/await in three sentences.")
chat = kernel.get_service("chat")
async for chunk in chat.get_streaming_chat_message_content(chat_history=history, settings=settings, kernel=kernel):
if chunk and chunk.content:
print(chunk.content, end="", flush=True)
print()
In C# the equivalent is GetStreamingChatMessageContentsAsync(...). Both APIs yield delta chunks that include any tool-call fragments.
Filters — middleware for functions
Filters wrap function invocations with pre/post hooks. Use them for logging, auth checks, redaction, or retry-on-error.
from semantic_kernel.filters import FunctionInvocationContext
from semantic_kernel.filters.filter_types import FilterTypes
@kernel.filter(filter_type=FilterTypes.FUNCTION_INVOCATION)
async def log_calls(context: FunctionInvocationContext, next):
print(f"-> {context.function.plugin_name}.{context.function.name}")
await next(context)
print(f"<- {context.function.plugin_name}.{context.function.name} = {context.result}")
Equivalent .NET hooks: IFunctionInvocationFilter, IPromptRenderFilter, IAutoFunctionInvocationFilter.
Memory — short-term and long-term
SK ships a lightweight in-process memory abstraction; the heavier RAG story lives in the companion Kernel Memory service (see below).
from semantic_kernel.memory import VolatileMemoryStore, SemanticTextMemory
store = VolatileMemoryStore()
embed = kernel.get_service("embed")
memory = SemanticTextMemory(storage=store, embeddings_generator=embed)
await memory.save_information(collection="docs", id="1", text="The kernel routes all requests.")
await memory.save_information(collection="docs", id="2", text="Plugins expose callable functions.")
results = await memory.search(collection="docs", query="How are tools exposed?", limit=2)
for r in results:
print(r.text, "→", r.relevance)
Output:
Plugins expose callable functions. → 0.82
The kernel routes all requests. → 0.41
Pluggable backends include AzureCognitiveSearchMemoryStore, PostgresMemoryStore, RedisMemoryStore, QdrantMemoryStore, WeaviateMemoryStore, PineconeMemoryStore, and ChromaMemoryStore.
Kernel Memory — the heavier RAG service
Kernel Memory (KM) is a separate Microsoft project for production RAG. It runs as a service (Docker, ASP.NET, Azure Functions) and exposes a REST/gRPC API for indexing documents and asking questions; it handles chunking, embedding, citation, and multi-tenancy. SK speaks to KM through KernelMemoryServiceClient.
docker run -it --rm -p 9001:9001 \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
ghcr.io/microsoft/kernel-memory:latest
Output:
KernelMemory service is listening on http://+:9001
Client usage from .NET:
var memory = new MemoryWebClient("http://localhost:9001");
await memory.ImportDocumentAsync("readme.md", documentId: "readme");
var answer = await memory.AskAsync("What does this document cover?");
Console.WriteLine(answer.Result);
KM is the recommended path for production RAG in Microsoft stacks; the in-process SemanticTextMemory is fine for tests and small CLIs.
Planners (legacy — use auto function calling)
Planners predate native tool calling. They use an LLM to write a plan that calls plugin functions, then the kernel executes it. Modern SK code uses FunctionChoiceBehavior.Auto() instead, but planners still ship for backward compatibility.
from semantic_kernel.planners import FunctionCallingStepwisePlanner
planner = FunctionCallingStepwisePlanner(service_id="chat")
result = await planner.invoke(
kernel,
"Tell me the day of the week for July 4th, 2026, and write a one-line poem about that day.",
)
print(result.final_answer)
SequentialPlannerandActionPlannerare removed in 1.x. OnlyFunctionCallingStepwisePlannerremains; even it is in maintenance mode.
Agents (preview)
semantic_kernel.agents adds a higher-level agent abstraction with conversation threads, tool subsets, and group chats. The .NET surface is more mature; Python tracks behind.
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.contents import ChatHistoryAgentThread
agent = ChatCompletionAgent(
kernel=kernel,
name="researcher",
instructions="You answer questions concisely using available tools.",
)
thread = ChatHistoryAgentThread()
async for response in agent.invoke(messages="What day of the week is 2026-07-04?", thread=thread):
print(response.content)
Multi-agent group chat is available via AgentGroupChat with termination strategies (round-robin, by content, by another LLM judge).
Real-world recipes
Recipe — pluggable LLM with fallback
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatCompletion
kernel.add_service(AzureChatCompletion(service_id="primary", deployment_name="gpt-4o", endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], api_key=os.environ["AZURE_OPENAI_KEY"]))
kernel.add_service(OpenAIChatCompletion(service_id="fallback", ai_model_id="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"]))
try:
settings = OpenAIChatPromptExecutionSettings(service_id="primary")
reply = await kernel.invoke_prompt(prompt="Define vector embeddings.", settings=settings)
except Exception:
settings = OpenAIChatPromptExecutionSettings(service_id="fallback")
reply = await kernel.invoke_prompt(prompt="Define vector embeddings.", settings=settings)
print(reply)
Recipe — typed return values
from typing import Annotated
from pydantic import BaseModel
from semantic_kernel.functions import kernel_function
class Review(BaseModel):
score: int
summary: str
class ReviewPlugin:
@kernel_function(description="Score and summarise a product review.")
def review(self, text: Annotated[str, "Raw review text"]) -> Review:
return Review(score=8, summary="Generally positive.")
Pydantic models are converted to JSON schema for the model and back to typed instances on return.
Recipe — request-scoped plugin registration in ASP.NET
builder.Services.AddScoped<Kernel>(sp =>
{
var kb = Kernel.CreateBuilder();
kb.AddOpenAIChatCompletion("gpt-4o-mini", builder.Configuration["OpenAI:ApiKey"]!);
var k = kb.Build();
k.Plugins.AddFromObject(sp.GetRequiredService<TimePlugin>(), "time");
k.Plugins.AddFromObject(sp.GetRequiredService<UserContextPlugin>(), "user");
return k;
});
A new Kernel per request gives clean isolation and lets per-user plugins (auth, tenant context) be injected.
Recipe — tracing to Azure Monitor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="https://my-otel-collector/v1/traces")))
trace.set_tracer_provider(provider)
Every kernel.invoke(...), chat completion, and function call now emits a span — search by gen_ai.operation.name in Azure Monitor.
Recipe — caching repeated prompts
import functools
@functools.lru_cache(maxsize=256)
async def cached_invoke(prompt: str) -> str:
return str(await kernel.invoke_prompt(prompt=prompt))
For semantic caching (cache hits on similar prompts), use SemanticTextMemory.search() over a prompt_cache collection and return on a high-relevance hit before calling the LLM.
Recipe — exposing SK as an OpenAI-compatible REST API
semantic-kernel-openai-compat (.NET) or a thin ASP.NET wrapper turns the kernel into a POST /v1/chat/completions endpoint. Any OpenAI SDK can then point at it.
app.MapPost("/v1/chat/completions", async (ChatRequest req, Kernel k) =>
{
var chat = k.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory(req.Messages);
var reply = await chat.GetChatMessageContentAsync(history, kernel: k);
return Results.Json(new { choices = new[] { new { message = new { role = "assistant", content = reply.Content } } } });
});
Python vs .NET — feature parity snapshot
| Capability | Python | .NET |
|---|---|---|
| Chat completion | yes | yes |
| Text embedding | yes | yes |
| Function calling | yes | yes |
Prompt templates ({{$var}}) | yes | yes |
.prompty files | yes | yes |
| Auto-invoke loop | yes | yes |
| Filters | yes | yes |
| Agents (preview) | yes | yes (more mature) |
| Process framework (workflows) | partial | yes |
| Kernel Memory client | yes | yes |
| OpenTelemetry | yes | yes |
| MCP integration | yes (sk-mcp) | yes |
Quick reference
| Task | Code |
|---|---|
| Install (Python) | pip install semantic-kernel |
| Install (.NET) | dotnet add package Microsoft.SemanticKernel |
| Create kernel (Py) | kernel = Kernel() |
| Create kernel (.NET) | Kernel.CreateBuilder()...Build() |
| Add OpenAI service | kernel.add_service(OpenAIChatCompletion(service_id="chat", ai_model_id="...", api_key=...)) |
| Define function | @kernel_function(description="...") on a method |
| Register plugin | kernel.add_plugin(MyPlugin(), plugin_name="my") |
| Load file plugin | kernel.add_plugin(parent_directory="./plugins", plugin_name="P") |
| Invoke function | await kernel.invoke(kernel.get_function("p", "fn"), arg=value) |
| Invoke prompt | await kernel.invoke_prompt(prompt="...", settings=...) |
| Auto tool calling | settings.function_choice_behavior = FunctionChoiceBehavior.Auto() |
| Required tool call | FunctionChoiceBehavior.Required() |
| Stream | async for chunk in chat.get_streaming_chat_message_content(...) |
| Filter | @kernel.filter(filter_type=FilterTypes.FUNCTION_INVOCATION) |
| In-process memory | SemanticTextMemory(storage=VolatileMemoryStore(), embeddings_generator=embed) |
| Save to memory | await memory.save_information(collection, id, text) |
| Search memory | await memory.search(collection, query, limit=k) |
| Agent | ChatCompletionAgent(kernel=k, name=..., instructions=...) |
| Stepwise planner | FunctionCallingStepwisePlanner(service_id="chat") |