#evaluation

8 pages tagged evaluation.

trulens-eval

Package-level reference for trulens-eval on PyPI — install variants, the trulens umbrella rename, framework extras, and alternative evaluators.

05-31-2026 #pip #package #ai

ragas

Package-level reference for ragas on PyPI — install variants, LLM-as-judge dependencies, metric churn, and alternative evaluators.

05-31-2026 #pip #package #ai

langsmith

Package-level reference for the langsmith SDK on PyPI — install, versioning, env-var setup, and observability alternatives.

05-31-2026 #pip #package #llm

Retrieval-Augmented Generation (RAG)

Grounding LLM responses in chunks retrieved from an external corpus so the model reasons over real, citable sources instead of parametric memory alone.

05-25-2026 #llm #vector-search #ai

LLM Evaluations

Build production evaluation pipelines for LLM applications — golden datasets, LLM-as-judge, rubrics, statistical significance, regression detection, and evals vs tests.

05-25-2026 #evals #evaluation #llm-as-judge

TruLens

Evaluate and monitor LLM applications with TruLens. Covers the RAG triad, feedback functions, TruChain, TruLlama, custom evaluators, the dashboard, and CI integration.

04-27-2026 #python #trulens #rag

ragas

Measure and improve RAG pipeline quality with ragas. Covers faithfulness, answer relevancy, context precision, context recall, dataset format, LLM judges, and CI integration.

04-27-2026 #python #ragas #rag

LangSmith

Trace, debug, evaluate, and monitor LLM applications with LangSmith. Covers tracing setup, datasets, evaluators, prompt hub, comparing runs, and CI integration.

04-27-2026 #python #langsmith #llm