Cookbook
This cookbook, inspired by OpenAI's cookbook, is a collection of recipes for common Braintrust use cases. Each recipe is an open-source, self-contained example hosted on GitHub. We welcome community contributions and hope the cookbook becomes a collaborative, living, breathing collection of best practices for building high-quality AI products.
python
Evaluating the precision and recall of an emotion classifier

Adrian Barbir
Jan 17, 2025 | recall, precision, evals, classifier, python
typescript
Evaluating audio with the OpenAI Realtime API

Ornella Altunyan
Dec 14, 2024 | evals, tools, audio
python
Evaluating SimpleQA


Ankur Goyal, Ornella Altunyan
Dec 6, 2024 | datasets, evals
typescript
Using Python functions to extract text from images

Ornella Altunyan
Nov 22, 2024 | python, tools, ocr, functions
typescript
Using OpenTelemetry for LLM observability

Ornella Altunyan
Oct 31, 2024 | evals, tools
typescript
Using functions to build a RAG agent


Ornella Altunyan, Ankur Goyal
Oct 8, 2024 | functions, rag, tools
python
Evaluating multimodal receipt extraction

Ankur Goyal
Sep 30, 2024 | evals, multimodal, receipts
typescript
Unreleased AI: A full stack Next.js app for generating changelogs

Ornella Altunyan
Aug 28, 2024 | evals, logging, next.js
python
An agent that runs OpenAPI commands

Ankur Goyal
Aug 12, 2024 | agent, rag, evals
typescript
Benchmarking inference providers

Ankur Goyal
Jul 29, 2024 | evals, llama-3.1, providers
typescript
Tool calls in LLaMa 3.1

Ankur Goyal
Jul 26, 2024 | evals, llama-3.1, tools
typescript
Evaluating a chat assistant

Tara Nagar
Jul 16, 2024 | evals, chat
python
LLM Eval For Text2SQL

Ankur Goyal
May 29, 2024 | evals, datasets, text2sql
python
Optimizing Ragas to evaluate a RAG pipeline


Ankur Goyal, Nelson Auner
May 27, 2024 | evals, rag
typescript
Comparing evals across multiple AI models

John Huang
May 22, 2024 | evals, charts
python
Detecting Prompt Injections

Nelson Auner
May 20, 2024 | evals, classification
python
AI Search Bar

Austin Moehle
Mar 4, 2024 | evals, sql
typescript
How Zapier uses assertions to evaluate tool usage in chatbots

Vítor Balocco
Feb 13, 2024 | evals, assertions, tools
typescript
Generating release notes and hill-climbing to improve them

Ankur Goyal
Feb 2, 2024 | evals, hill-climbing
typescript
Generating beautiful HTML components

Ankur Goyal
Jan 29, 2024 | logging, datasets, evals
python
Coda's Help Desk with and without RAG


Austin Moehle, Kenny Wong
Dec 21, 2023 | evals, rag
typescript
Improving GitHub issue titles using their contents

Ankur Goyal
Oct 29, 2023 | evals, summarization
python
Classifying news articles

David Song
Sep 1, 2023 | evals, classification
python
Text-to-SQL

Ankur Goyal
Aug 12, 2023 | evals, sql