Braintrust

Getting Started with Evals - a speedrun through Braintrust

For software engineers struggling with LLM application performance, simple evaluations are your secret weapon. Forget the complexity: we'll show you how to start testing your LLM in just 5 minutes using Braintrust. By the end of this article, you'll have a working test harness that you can easily customise for your own use cases.

We'll be using a cleaned version of the GSM8k dataset that you can find here.

Here's what we'll cover:

  1. Setting up Braintrust
  2. Writing our first task to evaluate an LLM's responses to GSM8k questions with Instructor
  3. Simple recipes that you'll need