Reinventing Gandalf
Introduction
The code for the challenge can be found here
A while ago, a company called Lakera released a challenge called Gandalf on Hacker News which took the LLM community by storm. The premise was simple - get a LLM that they had built to reveal a password. This wasn't an easy task and many people spent days trying to crack it.
Some time after their challenge had been relased, they were then kind enough to release both the solution AND a rough overview of how the challenge was developed. You can check it out here. Inspired by this, I figured I'd try to reproduce it to some degree on my own in a challenge I called The Chinese Wall with Peter Mekhaeil for our annual company's coding competition. We will be releasing the code shortly.
Participants were asked to try and extract a password from a LLM that we provided. We also provided a discord bot that was trained on the challenge documentation which participants could use to ask questions to.
Here's a quick snapshot of it in action
The model uses Open AI's GPT 3.5 under the hood with the instructor library for function calls.