Getting Started with Evals - a speedrun through Braintrust
If you're a software engineer struggling with LLM application performance, simple evaluations are your secret weapon. Forget the complexity: we'll show you how to start testing your LLM application in just 5 minutes using Braintrust. By the end of this article, you'll have a working test harness that you can easily customise for your own use cases.
We'll be using a cleaned version of the GSM8k dataset of grade-school math word problems, which you can find here.
Here's what we'll cover:
- Setting up Braintrust
- Writing our first task to evaluate an LLM's responses to GSM8k questions, using Instructor
- Simple recipes that you'll need