cheat sheet

semantic-kernel

Package-level reference for semantic-kernel on PyPI — install variants, the Python vs .NET split, provider extras, and alternative frameworks.

semantic-kernel

What it is

semantic-kernel is the Python distribution of Microsoft Semantic Kernel, an LLM orchestration SDK with a model-agnostic Kernel, pluggable AI services, plugins (collections of callable "functions" including native Python, prompt templates, and OpenAPI-described APIs), and planners that turn a goal into a chain of plugin calls. The SDK is fully async — most surface area is await-only.

Semantic Kernel exists in two first-party flavours: C# / .NET (the original) and Python (this package). A Java SDK also exists. The Python and .NET versions share concepts but the Python version often lags the C# version in features and stability.

Reach for semantic-kernel when you are building inside the Microsoft / Azure ecosystem, want first-class Azure OpenAI integration, and prefer a plugin-and-planner model. Reach for langchain, llama-index, haystack-ai, or autogen for richer Python ecosystems; crewai for narrative multi-agent flows.

Install

bash
pip install semantic-kernel

Output: (none — exits 0 on success)

bash
uv add semantic-kernel

Output: dependency resolved + added to pyproject.toml

bash
poetry add semantic-kernel

Output: updated lockfile + virtualenv install

bash
pip install "semantic-kernel[hugging_face]"
pip install "semantic-kernel[mistralai]"
pip install "semantic-kernel[google]"
pip install "semantic-kernel[ollama]"

Output: SK plus the chosen AI-provider integration

Versioning & Python support

  • The Python package is pre-1.0; minor releases regularly reshape connectors, planners, and agent abstractions. Pin a tight version range and treat upgrades as small migrations.
  • Recent versions support Python 3.10+. The codebase is async-first (asyncio); using SK from sync code requires asyncio.run or nest_asyncio.
  • The C# SDK reached 1.0 first and tends to lead Python on new features (especially the Agents Framework and Process Framework). Python eventually catches up but not always one-to-one.
  • Planners have been deprecated and renamed across versions — Action / Sequential / Stepwise planners in early releases gave way to function-calling planners and the newer Agent abstractions. Code from year-old tutorials often imports symbols that no longer exist.

Package metadata

  • Maintainer: Microsoft (Semantic Kernel team) and community contributors
  • Project home: github.com/microsoft/semantic-kernel
  • Python docs: learn.microsoft.com/semantic-kernel
  • PyPI: pypi.org/project/semantic-kernel
  • License: MIT
  • Governance: Microsoft-led with open contributions; the .NET SDK is the senior sibling
  • First released: 2023
  • Downloads: hundreds of thousands per month on PyPI

Optional dependencies & extras

The Python SDK uses extras to opt in to specific AI providers and connectors. Names and exact set vary across releases — recent versions include roughly:

  • semantic-kernel[hugging_face] — local HuggingFace pipelines as an AI service.
  • semantic-kernel[mistralai] — Mistral AI provider.
  • semantic-kernel[google] — Google Gemini / Vertex AI providers.
  • semantic-kernel[ollama] — local Ollama provider.
  • semantic-kernel[anthropic] — Anthropic Claude provider.
  • semantic-kernel[aws] / semantic-kernel[bedrock] — Amazon Bedrock provider.
  • semantic-kernel[azure] — Azure-specific integrations (Cognitive Search, Cosmos DB, etc.).
  • semantic-kernel[chroma], semantic-kernel[qdrant], semantic-kernel[weaviate], semantic-kernel[redis], semantic-kernel[postgres], semantic-kernel[milvus], semantic-kernel[pinecone] — memory/vector connectors.
  • semantic-kernel[notebooks], semantic-kernel[realtime] — development utilities and the realtime audio agent stack.

OpenAI and Azure OpenAI providers are in the base install (no extra needed).

Common companions:

  • openai — pulled in by default; the canonical chat completion path.
  • azure-identity, azure-search-documents, azure-cosmos — Azure integration glue.
  • mcp — Model Context Protocol client/server SDK; SK has first-class MCP plugin support.
  • pydantic — used heavily for function-calling schemas.

Alternatives

PackageTrade-off
langchainBigger ecosystem and more connectors. Use when you want the widest integration coverage.
llama-indexStronger indexing/retrieval primitives. Use when RAG is the centre of gravity.
haystack-aiExplicit DAG-style pipelines. Use when you want strict typed wiring.
autogen-agentchatMulti-agent conversations. Use for agent-to-agent design.
crewaiRole-based agent crews. Use for narrative multi-agent flows.
dspy-aiProgrammatic prompt optimisation. Use when you want to compile prompts.
.NET Microsoft.SemanticKernelThe senior SDK in C#. Use when your stack is .NET.

Common gotchas

  1. Python lags .NET. Features and stability land in C# first. If a Microsoft-authored blog post shows an SK feature, double-check the Python package version actually exposes it before adopting it in a Python codebase.
  2. Planner deprecations across versions. Action, Sequential, and Stepwise planners from early SK gave way to function-calling planners and (more recently) the Agents Framework. Import paths change between minors — copy from the current samples/ directory, not old tutorials.
  3. Async-only API. Almost everything is async def; running SK from a sync script requires asyncio.run(...) or nest_asyncio inside notebooks. Mixing with sync codebases needs explicit bridges.
  4. openai is a hard dependency even if you only use Azure OpenAI or a non-OpenAI provider — the OpenAI client classes are referenced by base abstractions. Don't strip it from your image to "save space".
  5. Plugins vs functions vs skills. Older SK terminology called collections of functions "skills"; current terminology is "plugins" containing "functions". Both names appear in stale docs.
  6. Memory connectors are evolving. The vector-store interfaces were reworked in 2024-2025 around the new VectorStore abstraction; old MemoryStore-style code is being phased out.
  7. MCP integration is first-class. Plugins can be exposed via Model Context Protocol and SK can consume MCP servers — but you need the mcp PyPI package alongside, and the API surface is still maturing.

Real-world recipes

The recipes below focus on install / connector / async-topology choices — the sections/frameworks/semantic-kernel companion covers the kernel/plugin/planner concepts in depth.

Minimal kernel with Azure OpenAI — the canonical first run. No extras needed; azure-openai and openai come in with the base install.

python
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import KernelArguments

async def main():
    kernel = Kernel()
    kernel.add_service(AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    ))
    prompt = "Summarise: {{$input}}"
    fn = kernel.add_function(plugin_name="util", function_name="summarise", prompt=prompt)
    result = await kernel.invoke(fn, KernelArguments(input="HNSW is a graph-based ANN..."))
    print(result)

asyncio.run(main())

Output: the prompt is templated with the input variable and sent to Azure OpenAI; everything is async — running this from sync code requires asyncio.run

Native Python plugin (a callable as a kernel function) — plugins are Python objects whose methods are decorated as kernel_function. The kernel can call them, the planner can choose them, and the function-calling LLM can invoke them.

python
from semantic_kernel.functions import kernel_function

class Weather:
    @kernel_function(description="Get current weather for a city", name="get_weather")
    def get_weather(self, city: str) -> str:
        return f"Sunny in {city}."

kernel.add_plugin(Weather(), plugin_name="weather")
result = await kernel.invoke(kernel.get_function("weather", "get_weather"), KernelArguments(city="Berlin"))

Output: the kernel invokes the Python method; the same plugin is automatically usable by function-calling LLMs because of the description= and parameter types

Function-calling planner — modern SK leans on the LLM's own function-calling rather than the older SequentialPlanner / StepwisePlanner. Configure execution settings to enable auto-invocation.

python
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior

settings = OpenAIChatPromptExecutionSettings()
settings.function_choice_behavior = FunctionChoiceBehavior.Auto(filters={"included_plugins": ["weather"]})

response = await kernel.invoke_prompt(
    "What's the weather in Tokyo?",
    arguments=KernelArguments(settings=settings),
)

Output: the LLM decides to call weather.get_weather, the kernel invokes the function, and the LLM composes a final answer using the returned value

MCP plugin — SK can consume any MCP server as a plugin. Requires the mcp PyPI package.

python
from semantic_kernel.connectors.mcp import MCPStdioPlugin

async with MCPStdioPlugin(
    name="my_mcp",
    command="uvx",
    args=["some-mcp-server"],
) as plugin:
    kernel.add_plugin(plugin, plugin_name="my_mcp")
    # tools exposed by the MCP server are now kernel functions

Output: the MCP server's tools become kernel functions; the function-calling LLM can invoke them like any native Python plugin

Memory connector with Qdrant — the new VectorStore abstraction unifies memory connectors across vector DBs.

python
from semantic_kernel.connectors.memory.qdrant import QdrantVectorStore
from semantic_kernel.data import VectorStoreRecordDataField, VectorStoreRecordKeyField, VectorStoreRecordVectorField, vectorstoremodel

@vectorstoremodel
class Doc:
    id: VectorStoreRecordKeyField
    text: VectorStoreRecordDataField
    embedding: VectorStoreRecordVectorField

store = QdrantVectorStore(url="http://qdrant.internal:6333")
collection = store.get_collection(collection_name="kb", data_model_type=Doc)
await collection.upsert(Doc(id="1", text="HNSW", embedding=[0.1] * 384))

Output: a typed vector-store collection with typed records; the same model works against any vector connector that implements VectorStore

Agent (preview Agents Framework) — the SK Agents Framework wraps a kernel + instructions + tools into an Agent abstraction. APIs are still evolving; check the samples for your installed version.

python
from semantic_kernel.agents import ChatCompletionAgent

agent = ChatCompletionAgent(
    service_id="default",
    kernel=kernel,
    name="ResearchAssistant",
    instructions="You research topics and cite sources.",
)
async for message in agent.invoke("Find papers on HNSW"):
    print(message.content)

Output: an agent loop that streams responses; tool calls dispatch through the kernel's plugins automatically

Production deployment

Semantic Kernel is async-first library code; production deployment usually means wrapping it in a FastAPI / aiohttp app or hosting it inside Azure Functions / Container Apps with the kernel constructed per process (not per request).

Topology checklist:

ConcernApproach
Kernel lifetimeper-process; reuse across requests
Async runtimeasyncio (FastAPI / aiohttp); nest_asyncio only for notebooks
ProviderAzure OpenAI (managed identity), OpenAI, or local via Ollama
PluginsPython classes, OpenAPI specs, or MCP servers
Memoryexternal vector DB (Qdrant, Azure AI Search, Cosmos DB, pgvector)
TracingOpenTelemetry (built-in spans) → Application Insights / Jaeger
Secretsenv vars or Azure Key Vault via azure-identity

Azure-native deployment. SK's natural home is the Azure ecosystem:

  • Azure OpenAI as the chat completion service — AzureChatCompletion with azure-identity for managed identity auth (no keys).
  • Azure AI Search as the memory connector (AzureAISearchVectorStore).
  • Cosmos DB for chat history or vector storage.
  • Application Insights as the OTel destination.
python
from azure.identity import DefaultAzureCredential
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4o",
    endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    ad_token_provider=DefaultAzureCredential().get_token_provider("https://cognitiveservices.azure.com/.default"),
))

Output: authenticates via managed identity in production, falls back to dev credentials locally — no API key in env vars

Non-Azure deployments. SK runs anywhere Python runs. Use OpenAI directly (OpenAIChatCompletion), Anthropic via the anthropic extra, or local models via Ollama. The kernel-and-plugin model is provider-agnostic.

Async only. Almost the entire surface area is async def. From a sync codebase, you bridge with asyncio.run(...) per call or run the kernel in a background event loop. Inside FastAPI, native async endpoints are the natural fit.

OpenTelemetry tracing. SK emits OTel spans for kernel invocations, function calls, and LLM requests. Configure an exporter and the spans flow to your APM unchanged.

Version migration guide

The Python SDK is pre-1.0; minor releases regularly reshape connectors, planners, and agent abstractions. The Python version lags C# on feature parity.

Planner deprecations. Early SK had three planners — ActionPlanner, SequentialPlanner, StepwisePlanner. All have been deprecated in favour of:

  1. Function-calling planner — let the LLM decide which functions to call (FunctionChoiceBehavior.Auto). The default in current SK.
  2. Agents Framework — wraps a kernel + tools + instructions in an Agent abstraction. Currently preview/evolving.
  3. Process Framework — explicit graph of steps; for deterministic multi-step orchestration. Preview.

If you're on year-old tutorial code, expect from semantic_kernel.planners import StepwisePlanner to fail or be marked deprecated.

Plugins vs skills vs functions. Older SK terminology called collections of functions "skills"; current terminology is "plugins" containing "functions". The kernel.add_skill(...) API was renamed kernel.add_plugin(...).

Memory connector reshape. The MemoryStore and MemoryRecord abstractions were replaced by VectorStore, VectorStoreRecord*, and the @vectorstoremodel decorator. The new model is more typed and matches the C# SDK; the old API will be removed.

Connector packages. The set of extras has expanded and renamed across versions. semantic-kernel[chroma], [qdrant], [weaviate], [redis], [postgres], [milvus], [pinecone], plus AI providers [hugging_face], [mistralai], [google], [ollama], [anthropic], [aws]/[bedrock]. Check the package's setup.py for the current list.

Function decorator signature. @kernel_function(name=..., description=...) is current. Older versions used @sk_function and a separate @sk_function_context_parameter decorator. Migrate to the new combined form.

Python vs .NET parity. The .NET SDK reached 1.0 first and continues to lead. If a Microsoft-authored blog post shows an SK feature, verify the Python package version actually exposes it. The Agents Framework, in particular, lags .NET.

Pinning strategy. A reproducible setup pins a tight minor range:

text
semantic-kernel>=1.18,<1.19

Plus the connector/provider extras you use. Read the changelog before bumping minors; renames and deprecations are frequent.

Performance tuning

LeverMechanismWhen it helps
Reuse kernel per processavoid recreating connectorsevery web request
asyncio concurrencyparallel kernel invokesindependent LLM calls
Function-calling over plannersone LLM call instead of Nlatency-sensitive flows
Streaming responseskernel.invoke_prompt_streamUX with progressive output
Local provider (Ollama)no API latencydev iteration
Prompt caching (Anthropic)reuse system promptsrepeated calls

Streaming. Most kernel methods have a streaming counterpart that returns an async iterator of partial chunks:

python
async for chunk in kernel.invoke_prompt_stream("Tell me about HNSW"):
    print(chunk.content, end="", flush=True)

Output: characters print as the LLM generates them; latency-to-first-token drops dramatically vs the non-streaming call

Function-calling vs planner. A planner LLM call typically costs 2–4× a function-calling call (it has to enumerate plans rather than just choose tools). For latency-sensitive flows, function-calling is the cheaper path.

Troubleshooting common errors

  • RuntimeError: asyncio.run() cannot be called from a running event loop — you're calling asyncio.run inside an existing loop (e.g. Jupyter). Use await directly, or nest_asyncio.apply() for notebooks.
  • ImportError: cannot import name 'StepwisePlanner' — old planner removed. Use function-calling via FunctionChoiceBehavior.Auto.
  • AttributeError: 'Kernel' object has no attribute 'add_skill' — renamed to add_plugin.
  • Plugin function not invoked by the LLM — missing description= on @kernel_function. The LLM uses the description to decide which function to call.
  • openai SDK error even though I use Anthropicopenai is a hard dependency of the base install; many internal abstractions reference it. Don't strip it.
  • Connector extras conflict — installing many semantic-kernel[*] extras at once can create incompatible transitive pins. Install only the ones you use, or use a constraints file.
  • MCP plugin hangs — the underlying MCP server didn't start. Test the MCP server independently first (uvx some-mcp-server directly) before wrapping.
  • Function-calling loops forever — LLM repeatedly calls a function that errors. Set a step limit via FunctionChoiceBehavior.Auto(maximum_auto_invoke_attempts=N).

Ecosystem integrations

  • Azure ecosystem — first-class. Azure OpenAI, Azure AI Search, Cosmos DB, Application Insights, Azure Identity all have direct connectors.
  • OpenAI / Anthropic / Google / Mistral / AWS Bedrock / Ollama / HuggingFace — each has a chat completion connector via extras.
  • MCP (Model Context Protocol) — first-class via MCPStdioPlugin / MCPSsePlugin. SK can consume any MCP server as a plugin and expose its own functions over MCP.
  • OpenAPI plugins — point SK at an OpenAPI spec and every operation becomes a kernel function. Useful for wrapping internal REST APIs.
  • LangChain interop — limited; the two frameworks overlap conceptually. Most projects pick one rather than mix.
  • Vector DBs — Chroma, Qdrant, Weaviate, Pinecone, Milvus, Redis, Postgres (pgvector), Azure AI Search via the VectorStore abstraction.
  • .NET / Java siblings — same concepts, different SDKs. Cross-language workflows usually federate through an MCP server.

Security considerations

SK plugins are arbitrary Python functions invoked by an LLM. This is powerful and dangerous in equal measure — the LLM can call anything you expose.

  • Plugin scope. Limit FunctionChoiceBehavior.Auto(filters={"included_plugins": [...]}) to the minimum set of plugins each request needs. Don't expose admin functions to general-purpose chat agents.
  • Function input validation. Plugin functions receive arguments the LLM generated. Validate types and ranges; never eval or pass to a shell without sanitisation.
  • Prompt injection. Retrieved content (memory connectors, RAG) can contain prompt-injection payloads that hijack the function-calling LLM. Use system prompts that explicitly forbid following instructions in retrieved data.
  • MCP plugin trust. When SK consumes an MCP server, that server's tools become callable. Audit every MCP server you connect — the trust model is "MCP server's author has root in your agent".
  • Azure managed identity. Production Azure deployments should use DefaultAzureCredential rather than API keys. Keys leak; managed identities don't.
  • OpenAPI plugin auth. OpenAPI-based plugins authenticate against the wrapped REST API. Use mTLS or short-lived tokens, not long-lived API keys baked into specs.
  • Memory connector secrets. Vector store credentials live in connector config; load from env or Key Vault, never hardcode.
  • Logging of function arguments. OpenTelemetry traces include plugin call arguments — apply PII scrubbing for regulated content.
  • Cost as DoS. A misbehaving function-calling loop can call expensive LLMs forever. Cap with maximum_auto_invoke_attempts and per-user request budgets.

When NOT to use this

Semantic Kernel is the right tool when you're inside the Microsoft/Azure ecosystem and want a plugin-and-planner model. It's the wrong tool when:

  • Pure Python ecosystem. LangChain has broader Python integration coverage and more community examples.
  • RAG is the whole product. LlamaIndex has stronger retrieval primitives.
  • You want explicit DAG-style pipelines. Haystack 2.x has typed-socket wiring that SK doesn't.
  • You're on .NET. Use the .NET SDK directly — it's the senior sibling and leads on features.
  • Multi-agent narrative flows. CrewAI or AutoGen are more agent-shaped; SK's Agents Framework is still maturing.
  • Sync codebase. SK is async-first; mixing into sync code adds friction. A sync library may fit better.
  • You don't want pre-1.0 API churn. SK Python keeps reshaping connectors and planners. If stability is paramount, a 1.0+ framework is calmer.

See also