Index

Getting Started with Evals - a speedrun through Braintrust

For software engineers struggling with LLM application performance, simple evaluations are your secret weapon. Forget the complexity — we'll show you how to start testing your LLM in just 5 minutes using Braintrust. By the end of this article, you'll have a working example of a test harness that you can easily customise for your own use cases.

We'll be using a cleaned version of the GSM8k dataset that you can find here.

Here's what we'll cover:

  1. Setting up Braintrust
  2. Writing our first task to evaluate an LLM's responses to GSM8k questions with Instructor
  3. Simple recipes that you'll need

How to create synthetic data that works

Synthetic data can accelerate AI development, but generating high-quality datasets remains challenging. In this article, I'll walk through a few experiments I've done with synthetic data generation and the takeaways I've learnt so that you can do the same.

We'll do this by covering

  1. Limitations of simple generation methods : Why simple generation methods produce homogeneous data
  2. Entropy and why it matters : Techniques to increase diversity in synthetic datasets
  3. Practical Implementations : Some simple examples of how to increase entropy and diversity to get better synthetic data
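One way to put a number on "diversity" is to compute the Shannon entropy of a categorical attribute across the generated samples. The sketch below is an illustration under stated assumptions, not the method used in the article: the `shannon_entropy` helper and the topic labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Return the Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over categories of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical topic labels attached to two small synthetic datasets.
homogeneous = ["billing", "billing", "billing", "billing"]
diverse = ["billing", "shipping", "returns", "login"]

print(shannon_entropy(homogeneous))  # a single topic carries no uncertainty
print(shannon_entropy(diverse))      # four equally likely topics
```

A dataset where every sample falls in the same category scores zero, and the score grows as samples spread more evenly across more categories, so it can be tracked before and after changing a generation prompt.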

AI Engineering World Fair

What's new?

Last year, we saw a lot of interest in the use of LLMs for new use cases. This year, with more funding and interest in the space, we've finally started thinking about productionizing these models at scale and making sure that they're reliable, consistent and secure.

Let's start with a few definitions

  • Agent : This is an LLM which is provided with a few tools it can call. The agentic part of this system comes from its ability to make decisions based on some input. This is similar to Harrison Chase's article here

  • Evaluations : A set of metrics that we can look at to understand where our current system falls short. An example could be measuring precision and recall.

  • Synthetic Data Generation : Data generated by an LLM which is meant to mimic real data
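To make the evaluation definition above concrete, here is a minimal sketch of precision and recall for a retrieval-style task. The `precision_recall` helper and the document IDs are hypothetical and only serve as an illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items against the relevant ones."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: what fraction of the retrieved items were actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant items were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the system returned four documents, three were relevant overall.
retrieved_ids = ["doc1", "doc2", "doc3", "doc4"]
relevant_ids = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, which shows how the two metrics can differ for the same result set.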

Grokking LLMs

I've spent the last year working with LLMs and writing a good amount of technical content on how to use them effectively, mostly with the help of structured parsing using a framework like Instructor. Most of what I know now is self-taught and this is the guide that I wish I had when starting out.

It should take about 10-15 minutes at most to read and I've added some resources along the way that are relevant to you. If you're looking for a higher-level overview, I suggest skimming over the first two sections and then focusing more on the application/data side of things!

I hope that after reading this essay, you walk away with enthusiasm that these models are going to change so many of the things that we know today. We have models with reasoning abilities and knowledge capacities that dwarf those of many humans today in tasks such as Mathematical Reasoning, QnA and more.

Introduction

It's really fun to create your own tools. With some extra time on my hands this weekend, I decided to work on building a small tool that would solve a problem I'd been facing for some time - converting wikilinks to relative links.

For those who are unaware, when you work in tools like Obsidian, the default tends to be wikilinks that look like this [[wiki-link]]. This is great if you're only using Obsidian but limits the portability of the markdown files themselves. For platforms such as GitHub, the lack of absolute links means that you can't easily click and navigate between markdown files on their web platform.
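The conversion itself can be sketched with a regular expression. This is a minimal illustration rather than the tool described in the post: it assumes each wikilink target maps to a `.md` file at the same relative path, and the `wikilinks_to_relative` helper is a hypothetical name. Links that point at a heading (`[[note#section]]`) are left untouched.

```python
import re
from urllib.parse import quote

# Matches [[target]] and [[target|label]]; targets containing '#' are skipped.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:\|([^\]]+))?\]\]")


def wikilinks_to_relative(text):
    """Rewrite Obsidian-style wikilinks as standard markdown links."""

    def replace(match):
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # Assumption: the note lives at '<target>.md' relative to the current file.
        # quote() percent-encodes spaces so the link stays valid markdown.
        return f"[{label}]({quote(target)}.md)"

    return WIKILINK.sub(replace, text)


print(wikilinks_to_relative("See [[wiki-link]] and [[notes/setup|the setup guide]]."))
```

A real converter would also need to resolve where each note actually lives on disk, since a wikilink only carries the note name and not its path.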

Writing scripts that scale

Writing good scripts for machine learning is an art. I struggled with writing them for a long time because of how different it was to my experience working with full-stack frameworks such as React or FastAPI.

There were four main issues that I struggled with

  1. My job has a high probability of failing without any reason
  2. My data might not fit into memory for no reason
  3. Running a single job takes days or more
  4. Optimizing hyper-parameters is genuinely difficult

Everything I've learnt about writing good Python code

In the past 6 months, I've 10xed the amount of Python code I've written. In this article, I'll show you a few easy, actionable tips to write better and more maintainable code. I've been lucky enough to have Jason (@jxnlco on Twitter) review a good chunk of my code and I've found that these few things have made a massive difference in my code quality.

  1. Use the @classmethod decorator
  2. Learn the stdlib
  3. Write simpler functions
  4. Be a bit lazier - earn the abstraction
  5. Decouple your implementation
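As an illustration of the first tip, a common use of @classmethod is an alternative constructor that keeps parsing logic next to the class it builds. This is a minimal sketch and not code from the article: the `Document` class and its `from_markdown` method are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    body: str

    @classmethod
    def from_markdown(cls, text: str) -> "Document":
        """Build a Document from markdown whose first line is a '#' heading."""
        first_line, _, rest = text.partition("\n")
        # Using cls instead of Document means subclasses get the right type back.
        return cls(title=first_line.lstrip("#").strip(), body=rest.strip())


doc = Document.from_markdown("# Hello\n\nSome text.")
print(doc)
```

Callers can then write `Document.from_markdown(raw)` instead of repeating the parsing steps at every call site.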

Learning with Adult Responsibilities

Introduction

Over the past 6 months, I've been trying to learn more about AI and LLMs. ChatGPT had me hooked when I tried it for the first time. Over the course of this period, I've been chatting to more people, shitposting on Twitter and working to learn as much as I can in my spare time.

That amounts to roughly 10-20 hours a week, since I don't have much of a social life, or about 400-500 hours in total since I started exploring this space, so take my experience with a grain of salt. I'm relatively new and you're probably 2-3 months behind me at most, much less if you do it full time.

I've had some people reach out to me for advice on what to do and I figured I'd write a longer blog post so that I could refer to it myself and consolidate some of my ramblings.

GPT-React

Introduction

The full code for this is available here for reference.

A while ago, I saw a demo video of Vercel's V0 and was blown away by what it could produce. It could take in user prompts, feedback and iteratively generate new and improved UI code using the popular @shadcn/ui library.

This was soon followed by the open-v0 project by raidendotai. Since I didn't have access to v0 via Vercel, I figured I would clone the project and try to figure out how it worked.

One eventful Friday evening later, I ended up putting together a small prototype that uses context-aware RAG and Pydantic to generate valid Next.js code based on a user prompt, which you can see below.

The GIF renders pretty slowly for some reason, so if you want to see the original clip, you can check it out here

A guide to RWKV V3

Introduction

RWKV is an alternative to the transformer architecture. It's open source and has its own paper over here. I found out about it some time back in a paper club and thought I'd write a short article about it with what I had learnt.

Here are some other resources which you might find useful about RWKV

  • RWKV by Picocreator : This is a markdown file that was used by one of the contributors, Picocreator, to give a short presentation on the RWKV architecture.

  • RWKV in 100 lines : This covers the implementation of RWKV in 100 lines of code. Much of this article is based on the content here - I try to extend it and provide my own intuition for some proofs. I've also attached a colab notebook for you if you want to play with the code.