#evaluation
8 pages tagged evaluation.
trulens-eval
Package-level reference for trulens-eval on PyPI — install variants, the trulens umbrella rename, framework extras, and alternative evaluators.
ragas
Package-level reference for ragas on PyPI — install variants, LLM-as-judge dependencies, metric churn, and alternative evaluators.
langsmith
Package-level reference for the langsmith SDK on PyPI — install, versioning, env-var setup, and observability alternatives.
Retrieval-Augmented Generation (RAG)
Grounding LLM responses in chunks retrieved from an external corpus so the model reasons over real, citable sources instead of parametric memory alone.
LLM Evaluations
Build production evaluation pipelines for LLM applications — golden datasets, LLM-as-judge, rubrics, statistical significance, regression detection, and evals vs tests.
TruLens
Evaluate and monitor LLM applications with TruLens. Covers the RAG triad, feedback functions, TruChain, TruLlama, custom evaluators, the dashboard, and CI integration.
ragas
Measure and improve RAG pipeline quality with ragas. Covers faithfulness, answer relevancy, context precision, context recall, dataset format, LLM judges, and CI integration.
LangSmith
Trace, debug, evaluate, and monitor LLM applications with LangSmith. Covers tracing setup, datasets, evaluators, prompt hub, comparing runs, and CI integration.