cheat sheet
semantic-kernel
Package-level reference for semantic-kernel on PyPI — install variants, the Python vs .NET split, provider extras, and alternative frameworks.
What it is
semantic-kernel is the Python distribution of Microsoft Semantic Kernel, an LLM orchestration SDK with a model-agnostic Kernel, pluggable AI services, plugins (collections of callable "functions" including native Python, prompt templates, and OpenAPI-described APIs), and planners that turn a goal into a chain of plugin calls. The SDK is fully async — most surface area is await-only.
Semantic Kernel exists in two first-party flavours: C# / .NET (the original) and Python (this package). A Java SDK also exists. The Python and .NET versions share concepts but the Python version often lags the C# version in features and stability.
Reach for semantic-kernel when you are building inside the Microsoft / Azure ecosystem, want first-class Azure OpenAI integration, and prefer a plugin-and-planner model. Reach for langchain, llama-index, haystack-ai, or autogen for richer Python ecosystems; crewai for narrative multi-agent flows.
Install
pip install semantic-kernel
Output: (none — exits 0 on success)
uv add semantic-kernel
Output: dependency resolved + added to pyproject.toml
poetry add semantic-kernel
Output: updated lockfile + virtualenv install
pip install "semantic-kernel[hugging_face]"
pip install "semantic-kernel[mistralai]"
pip install "semantic-kernel[google]"
pip install "semantic-kernel[ollama]"
Output: SK plus the chosen AI-provider integration
Versioning & Python support
- The Python package has reached 1.x, but minor releases still regularly reshape connectors, planners, and agent abstractions. Pin a tight version range and treat upgrades as small migrations.
- Recent versions support Python 3.10+. The codebase is async-first (asyncio); using SK from sync code requires asyncio.run or nest_asyncio.
- The C# SDK reached 1.0 first and tends to lead Python on new features (especially the Agents Framework and Process Framework). Python eventually catches up, but not always one-to-one.
- Planners have been deprecated and renamed across versions: Action / Sequential / Stepwise planners in early releases gave way to function-calling planners and the newer Agent abstractions. Code from year-old tutorials often imports symbols that no longer exist.
Package metadata
- Maintainer: Microsoft (Semantic Kernel team) and community contributors
- Project home: github.com/microsoft/semantic-kernel
- Python docs: learn.microsoft.com/semantic-kernel
- PyPI: pypi.org/project/semantic-kernel
- License: MIT
- Governance: Microsoft-led with open contributions; the .NET SDK is the senior sibling
- First released: 2023
- Downloads: hundreds of thousands per month on PyPI
Optional dependencies & extras
The Python SDK uses extras to opt in to specific AI providers and connectors. Names and exact set vary across releases — recent versions include roughly:
- semantic-kernel[hugging_face]: local HuggingFace pipelines as an AI service.
- semantic-kernel[mistralai]: Mistral AI provider.
- semantic-kernel[google]: Google Gemini / Vertex AI providers.
- semantic-kernel[ollama]: local Ollama provider.
- semantic-kernel[anthropic]: Anthropic Claude provider.
- semantic-kernel[aws] / semantic-kernel[bedrock]: Amazon Bedrock provider.
- semantic-kernel[azure]: Azure-specific integrations (Cognitive Search, Cosmos DB, etc.).
- semantic-kernel[chroma], semantic-kernel[qdrant], semantic-kernel[weaviate], semantic-kernel[redis], semantic-kernel[postgres], semantic-kernel[milvus], semantic-kernel[pinecone]: memory/vector connectors.
- semantic-kernel[notebooks], semantic-kernel[realtime]: development utilities and the realtime audio agent stack.
OpenAI and Azure OpenAI providers are in the base install (no extra needed).
Common companions:
- openai: pulled in by default; the canonical chat completion path.
- azure-identity, azure-search-documents, azure-cosmos: Azure integration glue.
- mcp: Model Context Protocol client/server SDK; SK has first-class MCP plugin support.
- pydantic: used heavily for function-calling schemas.
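To see which companion packages are actually present in an environment, you can probe for their import names. This is a minimal standard-library sketch. The module names in the demo list are assumptions about what each distribution installs, and probe_modules is a hypothetical helper, not part of SK.

```python
from importlib.util import find_spec


def probe_modules(module_names):
    """Return {name: bool} telling whether each top-level module is importable."""
    return {name: find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Assumed import names for some companions; adjust to the extras you use.
    assumed = ["openai", "pydantic", "mcp", "qdrant_client", "anthropic"]
    for name, present in probe_modules(assumed).items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

Only top-level module names are probed, so nothing is imported as a side effect of the check.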
Alternatives
| Package | Trade-off |
|---|---|
| langchain | Bigger ecosystem and more connectors. Use when you want the widest integration coverage. |
| llama-index | Stronger indexing/retrieval primitives. Use when RAG is the centre of gravity. |
| haystack-ai | Explicit DAG-style pipelines. Use when you want strict typed wiring. |
| autogen-agentchat | Multi-agent conversations. Use for agent-to-agent design. |
| crewai | Role-based agent crews. Use for narrative multi-agent flows. |
| dspy-ai | Programmatic prompt optimisation. Use when you want to compile prompts. |
| .NET Microsoft.SemanticKernel | The senior SDK in C#. Use when your stack is .NET. |
Common gotchas
- Python lags .NET. Features and stability land in C# first. If a Microsoft-authored blog post shows an SK feature, double-check the Python package version actually exposes it before adopting it in a Python codebase.
- Planner deprecations across versions. Action, Sequential, and Stepwise planners from early SK gave way to function-calling planners and (more recently) the Agents Framework. Import paths change between minors, so copy from the current samples/ directory, not old tutorials.
- Async-only API. Almost everything is async def; running SK from a sync script requires asyncio.run(...), or nest_asyncio inside notebooks. Mixing with sync codebases needs explicit bridges.
- openai is a hard dependency even if you only use Azure OpenAI or a non-OpenAI provider, because the OpenAI client classes are referenced by base abstractions. Don't strip it from your image to "save space".
- Plugins vs functions vs skills. Older SK terminology called collections of functions "skills"; current terminology is "plugins" containing "functions". Both names appear in stale docs.
- Memory connectors are evolving. The vector-store interfaces were reworked in 2024-2025 around the new VectorStore abstraction; old MemoryStore-style code is being phased out.
- MCP integration is first-class. Plugins can be exposed via Model Context Protocol and SK can consume MCP servers, but you need the mcp PyPI package alongside, and the API surface is still maturing.
Real-world recipes
The recipes below focus on install / connector / async-topology choices — the sections/frameworks/semantic-kernel companion covers the kernel/plugin/planner concepts in depth.
Minimal kernel with Azure OpenAI: the canonical first run. No extras needed; the Azure OpenAI connector and the openai package both come with the base install.
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import KernelArguments
async def main():
    kernel = Kernel()
    kernel.add_service(AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    ))
    prompt = "Summarise: {{$input}}"
    fn = kernel.add_function(plugin_name="util", function_name="summarise", prompt=prompt)
    result = await kernel.invoke(fn, KernelArguments(input="HNSW is a graph-based ANN..."))
    print(result)
asyncio.run(main())
Output: the prompt is templated with the input variable and sent to Azure OpenAI; everything is async — running this from sync code requires asyncio.run
Native Python plugin (a callable as a kernel function) — plugins are Python objects whose methods are decorated as kernel_function. The kernel can call them, the planner can choose them, and the function-calling LLM can invoke them.
from semantic_kernel.functions import kernel_function
class Weather:
    @kernel_function(description="Get current weather for a city", name="get_weather")
    def get_weather(self, city: str) -> str:
        return f"Sunny in {city}."
kernel.add_plugin(Weather(), plugin_name="weather")
result = await kernel.invoke(kernel.get_function("weather", "get_weather"), KernelArguments(city="Berlin"))
Output: the kernel invokes the Python method; the same plugin is automatically usable by function-calling LLMs because of the description= and parameter types
Function-calling planner — modern SK leans on the LLM's own function-calling rather than the older SequentialPlanner / StepwisePlanner. Configure execution settings to enable auto-invocation.
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
settings = OpenAIChatPromptExecutionSettings()
settings.function_choice_behavior = FunctionChoiceBehavior.Auto(filters={"included_plugins": ["weather"]})
response = await kernel.invoke_prompt(
    "What's the weather in Tokyo?",
    arguments=KernelArguments(settings=settings),
)
Output: the LLM decides to call weather.get_weather, the kernel invokes the function, and the LLM composes a final answer using the returned value
MCP plugin — SK can consume any MCP server as a plugin. Requires the mcp PyPI package.
from semantic_kernel.connectors.mcp import MCPStdioPlugin
async with MCPStdioPlugin(
    name="my_mcp",
    command="uvx",
    args=["some-mcp-server"],
) as plugin:
    kernel.add_plugin(plugin, plugin_name="my_mcp")
    # tools exposed by the MCP server are now kernel functions
Output: the MCP server's tools become kernel functions; the function-calling LLM can invoke them like any native Python plugin
Memory connector with Qdrant — the new VectorStore abstraction unifies memory connectors across vector DBs.
from dataclasses import dataclass
from typing import Annotated
from semantic_kernel.connectors.memory.qdrant import QdrantStore
from semantic_kernel.data import VectorStoreRecordDataField, VectorStoreRecordKeyField, VectorStoreRecordVectorField, vectorstoremodel
# Class and field names have moved between releases; check them against your installed version.
@vectorstoremodel
@dataclass
class Doc:
    id: Annotated[str, VectorStoreRecordKeyField()]
    text: Annotated[str, VectorStoreRecordDataField()]
    embedding: Annotated[list[float], VectorStoreRecordVectorField(dimensions=384)]
store = QdrantStore(url="http://qdrant.internal:6333")
collection = store.get_collection(collection_name="kb", data_model_type=Doc)
await collection.upsert(Doc(id="1", text="HNSW", embedding=[0.1] * 384))
Output: a typed vector-store collection with typed records; the same model works against any vector connector that implements VectorStore
Agent (preview Agents Framework) — the SK Agents Framework wraps a kernel + instructions + tools into an Agent abstraction. APIs are still evolving; check the samples for your installed version.
from semantic_kernel.agents import ChatCompletionAgent
agent = ChatCompletionAgent(
    service_id="default",
    kernel=kernel,
    name="ResearchAssistant",
    instructions="You research topics and cite sources.",
)
async for message in agent.invoke("Find papers on HNSW"):
    print(message.content)
Output: an agent loop that streams responses; tool calls dispatch through the kernel's plugins automatically
Production deployment
Semantic Kernel is async-first library code; production deployment usually means wrapping it in a FastAPI / aiohttp app or hosting it inside Azure Functions / Container Apps with the kernel constructed per process (not per request).
Topology checklist:
| Concern | Approach |
|---|---|
| Kernel lifetime | per-process; reuse across requests |
| Async runtime | asyncio (FastAPI / aiohttp); nest_asyncio only for notebooks |
| Provider | Azure OpenAI (managed identity), OpenAI, or local via Ollama |
| Plugins | Python classes, OpenAPI specs, or MCP servers |
| Memory | external vector DB (Qdrant, Azure AI Search, Cosmos DB, pgvector) |
| Tracing | OpenTelemetry (built-in spans) → Application Insights / Jaeger |
| Secrets | env vars or Azure Key Vault via azure-identity |
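The "per-process; reuse across requests" row can be sketched with a cached factory. StubKernel is a hypothetical stand-in for the real Kernel so the example runs without the SDK; in an application, service and plugin registration would go inside get_kernel.

```python
from functools import lru_cache


class StubKernel:
    """Hypothetical stand-in for semantic_kernel.Kernel (assumption for this sketch)."""

    instances = 0

    def __init__(self):
        StubKernel.instances += 1


@lru_cache(maxsize=1)
def get_kernel():
    # Runs once per process; later calls return the cached instance.
    return StubKernel()


def handle_request(payload):
    kernel = get_kernel()  # reused, not rebuilt per request
    return id(kernel), payload
```

Every request handler calls get_kernel(), so connectors and their HTTP clients are built only once per worker process.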
Azure-native deployment. SK's natural home is the Azure ecosystem:
- Azure OpenAI as the chat completion service: AzureChatCompletion with azure-identity for managed identity auth (no keys).
- Azure AI Search as the memory connector (AzureAISearchStore in recent Python releases; AzureAISearchVectorStore is the .NET class name).
- Cosmos DB for chat history or vector storage.
- Application Insights as the OTel destination.
import os
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
# get_bearer_token_provider wraps the credential in a callable that returns fresh tokens.
token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4o",
    endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    ad_token_provider=token_provider,
))
Output: authenticates via managed identity in production, falls back to dev credentials locally — no API key in env vars
Non-Azure deployments. SK runs anywhere Python runs. Use OpenAI directly (OpenAIChatCompletion), Anthropic via the anthropic extra, or local models via Ollama. The kernel-and-plugin model is provider-agnostic.
Async only. Almost the entire surface area is async def. From a sync codebase, you bridge with asyncio.run(...) per call or run the kernel in a background event loop. Inside FastAPI, native async endpoints are the natural fit.
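The background-event-loop option can be sketched with the standard library alone. AsyncBridge and fake_invoke are hypothetical names for illustration; fake_invoke stands in for an awaited kernel call.

```python
import asyncio
import threading


class AsyncBridge:
    """Runs one event loop in a daemon thread so sync code can call coroutines."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def call(self, coro, timeout=None):
        # Schedule the coroutine on the background loop and block for its result.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        # Stop the loop from its own thread, then wait for the thread to exit.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_invoke(text):
    # Hypothetical stand-in for an awaited kernel invocation.
    await asyncio.sleep(0)
    return text.upper()
```

Compared with asyncio.run(...) per call, a single long-lived loop avoids creating and tearing down a loop on every request, which matters when connectors hold open HTTP sessions.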
OpenTelemetry tracing. SK emits OTel spans for kernel invocations, function calls, and LLM requests. Configure an exporter and the spans flow to your APM unchanged.
Version migration guide
The Python SDK is past 1.0, but minor releases still regularly reshape connectors, planners, and agent abstractions. The Python version lags C# on feature parity.
Planner deprecations. Early SK had three planners — ActionPlanner, SequentialPlanner, StepwisePlanner. All have been deprecated in favour of:
- Function-calling planner: let the LLM decide which functions to call (FunctionChoiceBehavior.Auto). The default in current SK.
- Agents Framework: wraps a kernel + tools + instructions in an Agent abstraction. Currently preview/evolving.
- Process Framework: explicit graph of steps; for deterministic multi-step orchestration. Preview.
If you're on year-old tutorial code, expect from semantic_kernel.planners import StepwisePlanner to fail or be marked deprecated.
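While migrating, code that must run on both old and new installs can guard the legacy import. This is a sketch only: the import path is the one quoted above, and planning_mode is a hypothetical helper.

```python
try:
    # Legacy import path from older tutorials; expected to fail on current releases.
    from semantic_kernel.planners import StepwisePlanner
except ImportError:
    StepwisePlanner = None


def planning_mode():
    """Report which orchestration style this environment can use."""
    return "legacy-planner" if StepwisePlanner is not None else "function-calling"
```

The guard also catches the case where the package is not installed at all, since ModuleNotFoundError is a subclass of ImportError.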
Plugins vs skills vs functions. Older SK terminology called collections of functions "skills"; current terminology is "plugins" containing "functions". The kernel.add_skill(...) API was renamed kernel.add_plugin(...).
Memory connector reshape. The MemoryStore and MemoryRecord abstractions were replaced by VectorStore, VectorStoreRecord*, and the @vectorstoremodel decorator. The new model is more typed and matches the C# SDK; the old API will be removed.
Connector packages. The set of extras has expanded and been renamed across versions. Memory connectors include semantic-kernel[chroma], [qdrant], [weaviate], [redis], [postgres], [milvus], and [pinecone]; AI providers include [hugging_face], [mistralai], [google], [ollama], [anthropic], and [aws]/[bedrock]. Check the package's pyproject.toml for the current list.
Function decorator signature. @kernel_function(name=..., description=...) is current. Older versions used @sk_function and a separate @sk_function_context_parameter decorator. Migrate to the new combined form.
Python vs .NET parity. The .NET SDK reached 1.0 first and continues to lead. If a Microsoft-authored blog post shows an SK feature, verify the Python package version actually exposes it. The Agents Framework, in particular, lags .NET.
Pinning strategy. A reproducible setup pins a tight minor range:
semantic-kernel>=1.18,<1.19
Plus the connector/provider extras you use. Read the changelog before bumping minors; renames and deprecations are frequent.
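A startup check can confirm that the installed version falls inside the pinned range. The sketch below uses only importlib.metadata and a deliberately simple numeric comparison (it ignores pre-release suffixes). The helper names are hypothetical, and the default bounds mirror the pin above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn '1.18.2' into (1, 18, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_range(installed, low, high):
    """True when low <= installed < high, comparing numeric release tuples."""
    return parse_release(low) <= parse_release(installed) < parse_release(high)


def check_pin(package="semantic-kernel", low="1.18", high="1.19"):
    # Returns False when the package is missing rather than raising.
    try:
        return in_range(version(package), low, high)
    except PackageNotFoundError:
        return False
```

Failing fast at startup is cheaper than discovering a renamed import in the middle of a request.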
Performance tuning
| Lever | Mechanism | When it helps |
|---|---|---|
| Reuse kernel per process | avoid recreating connectors | every web request |
| asyncio concurrency | parallel kernel invokes | independent LLM calls |
| Function-calling over planners | one LLM call instead of N | latency-sensitive flows |
| Streaming responses | kernel.invoke_prompt_stream | UX with progressive output |
| Local provider (Ollama) | no API latency | dev iteration |
| Prompt caching (Anthropic) | reuse system prompts | repeated calls |
Streaming. Most kernel methods have a streaming counterpart that returns an async iterator of partial chunks:
async for chunk in kernel.invoke_prompt_stream("Tell me about HNSW"):
    # Each item is a list of streaming chunks; take the first (check your installed version).
    print(str(chunk[0]), end="", flush=True)
Output: text prints as the LLM generates it; time-to-first-token is much lower than waiting for the full non-streaming response
Function-calling vs planner. A planner LLM call typically costs 2–4× a function-calling call (it has to enumerate plans rather than just choose tools). For latency-sensitive flows, function-calling is the cheaper path.
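The asyncio concurrency lever from the table can be sketched as a bounded gather. fake_llm_call is a hypothetical stand-in for an awaited kernel invocation, and the limit of 4 is an arbitrary assumption; tune it to your provider's rate limits.

```python
import asyncio


async def fake_llm_call(prompt):
    # Hypothetical stand-in for an awaited kernel invocation.
    await asyncio.sleep(0.01)
    return f"answer to {prompt}"


async def invoke_all(prompts, limit=4):
    """Run independent calls concurrently, with at most `limit` in flight."""
    gate = asyncio.Semaphore(limit)

    async def one(prompt):
        async with gate:
            return await fake_llm_call(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(p) for p in prompts))
```

The semaphore keeps a burst of requests from exceeding provider quotas while still overlapping network waits.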
Troubleshooting common errors
- RuntimeError: asyncio.run() cannot be called from a running event loop. You're calling asyncio.run inside an existing loop (e.g. Jupyter). Use await directly, or nest_asyncio.apply() for notebooks.
- ImportError: cannot import name 'StepwisePlanner'. The old planner was removed. Use function-calling via FunctionChoiceBehavior.Auto.
- AttributeError: 'Kernel' object has no attribute 'add_skill'. Renamed to add_plugin.
- Plugin function not invoked by the LLM. Missing description= on @kernel_function. The LLM uses the description to decide which function to call.
- openai SDK error even though I use Anthropic. openai is a hard dependency of the base install; many internal abstractions reference it. Don't strip it.
- Connector extras conflict. Installing many semantic-kernel[*] extras at once can create incompatible transitive pins. Install only the ones you use, or use a constraints file.
- MCP plugin hangs. The underlying MCP server didn't start. Test the MCP server independently first (run uvx some-mcp-server directly) before wrapping.
- Function-calling loops forever. The LLM repeatedly calls a function that errors. Set a step limit via FunctionChoiceBehavior.Auto(maximum_auto_invoke_attempts=N).
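To see why a step limit matters for the last item, here is a toy illustration of a capped tool loop. It is not SK's implementation: run_tool_loop, stubborn_model, and broken_weather are hypothetical stand-ins that mimic a model retrying a failing function.

```python
def run_tool_loop(choose_action, tools, max_attempts=3):
    """Call tools until the model stub gives a final answer or the cap is hit."""
    history = []
    for _ in range(max_attempts):
        action = choose_action(history)
        if action["type"] == "final":
            return action["text"], history
        try:
            result = tools[action["name"]](**action["args"])
        except Exception as exc:  # feed the error back instead of crashing
            result = f"error: {exc}"
        history.append((action["name"], result))
    return "stopped: attempt limit reached", history


def broken_weather(city):
    raise RuntimeError("upstream unavailable")


def stubborn_model(history):
    # Hypothetical model stub that keeps retrying the same failing tool.
    return {"type": "tool", "name": "get_weather", "args": {"city": "Tokyo"}}
```

Without the cap, this loop would never terminate, and in production every iteration is a billed LLM call.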
Ecosystem integrations
- Azure ecosystem: first-class. Azure OpenAI, Azure AI Search, Cosmos DB, Application Insights, Azure Identity all have direct connectors.
- OpenAI / Anthropic / Google / Mistral / AWS Bedrock / Ollama / HuggingFace: each has a chat completion connector via extras.
- MCP (Model Context Protocol): first-class via MCPStdioPlugin / MCPSsePlugin. SK can consume any MCP server as a plugin and expose its own functions over MCP.
- OpenAPI plugins: point SK at an OpenAPI spec and every operation becomes a kernel function. Useful for wrapping internal REST APIs.
- LangChain interop: limited; the two frameworks overlap conceptually. Most projects pick one rather than mix.
- Vector DBs: Chroma, Qdrant, Weaviate, Pinecone, Milvus, Redis, Postgres (pgvector), Azure AI Search via the VectorStore abstraction.
- .NET / Java siblings: same concepts, different SDKs. Cross-language workflows usually federate through an MCP server.
Security considerations
SK plugins are arbitrary Python functions invoked by an LLM. This is powerful and dangerous in equal measure — the LLM can call anything you expose.
- Plugin scope. Limit FunctionChoiceBehavior.Auto(filters={"included_plugins": [...]}) to the minimum set of plugins each request needs. Don't expose admin functions to general-purpose chat agents.
- Function input validation. Plugin functions receive arguments the LLM generated. Validate types and ranges; never eval or pass to a shell without sanitisation.
- Prompt injection. Retrieved content (memory connectors, RAG) can contain prompt-injection payloads that hijack the function-calling LLM. Use system prompts that explicitly forbid following instructions in retrieved data.
- MCP plugin trust. When SK consumes an MCP server, that server's tools become callable. Audit every MCP server you connect; the trust model is "MCP server's author has root in your agent".
- Azure managed identity. Production Azure deployments should use DefaultAzureCredential rather than API keys. Keys leak; managed identities don't.
- OpenAPI plugin auth. OpenAPI-based plugins authenticate against the wrapped REST API. Use mTLS or short-lived tokens, not long-lived API keys baked into specs.
- Memory connector secrets. Vector store credentials live in connector config; load from env or Key Vault, never hardcode.
- Logging of function arguments. OpenTelemetry traces include plugin call arguments, so apply PII scrubbing for regulated content.
- Cost as DoS. A misbehaving function-calling loop can call expensive LLMs forever. Cap with maximum_auto_invoke_attempts and per-user request budgets.
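The function input validation point can be sketched as an allow-list check run before a plugin body touches the argument. The pattern and the 64-character limit are assumptions for illustration; the get_weather function mirrors the earlier recipe but is shown here as a plain function so it runs without the SDK.

```python
import re

# Letters, spaces, dots, apostrophes, and hyphens only; 1 to 64 characters (assumed policy).
CITY_PATTERN = re.compile(r"[A-Za-z][A-Za-z .'-]{0,63}")


def validate_city(value):
    """Accept a short, plain city name; reject anything else."""
    if not isinstance(value, str):
        raise TypeError("city must be a string")
    cleaned = value.strip()
    if not CITY_PATTERN.fullmatch(cleaned):
        raise ValueError("city contains unexpected characters")
    return cleaned


def get_weather(city):
    # Validate before the value reaches a lookup, query, or subprocess.
    city = validate_city(city)
    return f"Sunny in {city}."
```

An allow-list is used rather than a deny-list because LLM-generated arguments can contain shell or SQL metacharacters that are hard to enumerate in advance.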
When NOT to use this
Semantic Kernel is the right tool when you're inside the Microsoft/Azure ecosystem and want a plugin-and-planner model. It's the wrong tool when:
- Pure Python ecosystem. LangChain has broader Python integration coverage and more community examples.
- RAG is the whole product. LlamaIndex has stronger retrieval primitives.
- You want explicit DAG-style pipelines. Haystack 2.x has typed-socket wiring that SK doesn't.
- You're on .NET. Use the .NET SDK directly — it's the senior sibling and leads on features.
- Multi-agent narrative flows. CrewAI or AutoGen are more agent-shaped; SK's Agents Framework is still maturing.
- Sync codebase. SK is async-first; mixing into sync code adds friction. A sync library may fit better.
- You don't want API churn. SK Python is past 1.0 but keeps reshaping connectors and planners between minors. If stability is paramount, a slower-moving framework is calmer.
See also
- Frameworks: Semantic Kernel — kernels, plugins, planners, agents
- Concept: agents — agent orchestration patterns
- Concept: API — REST design fundamentals