How does Instructor work?
For Python developers working with large language models (LLMs), instructor has become a popular tool for structured data extraction. While its capabilities may seem complex, the underlying mechanism is surprisingly straightforward. In this article, we'll walk through a high-level overview of how the library works and how we support the OpenAI client.
We'll start by looking at:
- Why should you care about structured extraction?
- What is the high-level flow?
- How does a request go from a Pydantic model to a validated function call?
By the end of this article, you'll have a good understanding of how instructor helps you get validated outputs from your LLM calls, and a clearer sense of how you might contribute to the library yourself.
Why should you care?
For developers integrating AI into production systems, structured outputs are crucial. Here's why:
- Validation Reliability : As AI-driven data extraction becomes more complex, manually crafting validation logic grows increasingly error-prone. Structured outputs provide a robust framework for ensuring data integrity, especially for nested or intricate data structures.
- System Integration : Incorporating AI-generated content into existing infrastructure demands predictable, well-defined output formats. Structured outputs act as a bridge, allowing seamless integration of AI capabilities with established systems and workflows.
By leveraging tools that enforce structured outputs, developers can harness the power of AI while maintaining control over data quality and system reliability. This approach not only streamlines development but also enhances the robustness of AI-driven applications.
In short, structured outputs transform unvalidated LLM calls into validated functions with type signatures that behave exactly like a normal Python function, albeit with some level of probabilistic behaviour.
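To make this concrete, here is a minimal sketch (plain Pydantic, no instructor involved) contrasting hand-rolled validation of a raw LLM response with letting a typed model enforce the contract:
import json
from pydantic import BaseModel, ValidationError

class UserInfo(BaseModel):
    name: str
    age: int

# Imagine this string came back from an LLM - note that age is not an integer
raw = '{"name": "John Doe", "age": "thirty"}'

# Hand-rolled validation: every check is something you have to remember to write
data = json.loads(raw)
if not isinstance(data.get("age"), int):
    print("manual check failed: age is not an int")

# Pydantic: the model declares the contract once and enforces all of it
try:
    user = UserInfo.model_validate_json(raw)
except ValidationError as e:
    print(e)  # pinpoints exactly which field failed and why
instructor's job is to wire this second style of validation directly into the LLM call itself.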
High-Level Flow
Let's look at the Getting Started example from the docs and see how it works. In this article, we'll only be looking at the synchronous implementation of the chat.completions.create function.
import instructor
from pydantic import BaseModel
from openai import OpenAI
# Define your desired output structure
class UserInfo(BaseModel):
name: str
age: int
# Patch the OpenAI client
client = instructor.from_openai(OpenAI())
# Extract structured data from natural language
user_info = client.chat.completions.create(
model="gpt-4o-mini",
response_model=UserInfo,
messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user_info.name)
#> John Doe
print(user_info.age)
#> 30
A few things are happening here:
- We define our desired output structure using Pydantic.
- We wrap our client with the from_openai function, which returns a client with the same interface but patched with our new functionality.
- We then make a call as we normally would to OpenAI's API, with the exception of a new response_model parameter.
- Magically, we get our output? (See the sketch below.)
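It isn't magic, of course - the response_model is what does the work. Because the output is a Pydantic model, any validator you attach becomes part of the contract, and (as we'll see later) instructor re-asks the LLM when validation fails. Here's a small sketch - the field_validator below is our own illustrative addition, not something the library requires:
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

class UserInfo(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, v: str) -> str:
        # If the LLM returns a lowercase name, this raises a ValidationError,
        # which instructor turns into a re-ask of the LLM
        if v != v.upper():
            raise ValueError("name must be uppercase, e.g. 'JOHN DOE'")
        return v

client = instructor.from_openai(OpenAI())

user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    max_retries=2,  # allow re-asking the LLM if validation fails
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)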
To me, this was an incredible experience compared to something like LangChain, which abstracts so much of its inner workings away that customisation becomes difficult. Now that we've seen how it works at the API level, let's look at what the library does under the hood.
Parsing Responses and Handling Errors
Let's try to answer a few questions here:
- What does the from_openai function do?
- How does the Pydantic response_model keyword argument get used?
- What happens to the response from the LLM, and how is it validated when we use response_model?
The from_openai function
We can see the code for the from_openai function here, where it takes in two main arguments - client and mode. These Mode enums are how we switch between different modes of interaction with the OpenAI client itself.
class Mode(enum.Enum):
"""The mode to use for patching the client"""
FUNCTIONS = "function_call"
PARALLEL_TOOLS = "parallel_tool_call"
TOOLS = "tool_call"
MISTRAL_TOOLS = "mistral_tools"
JSON = "json_mode"
MD_JSON = "markdown_json_mode"
JSON_SCHEMA = "json_schema_mode"
ANTHROPIC_TOOLS = "anthropic_tools"
ANTHROPIC_JSON = "anthropic_json"
COHERE_TOOLS = "cohere_tools"
VERTEXAI_TOOLS = "vertexai_tools"
VERTEXAI_JSON = "vertexai_json"
GEMINI_JSON = "gemini_json"
GEMINI_TOOLS = "gemini_tools"
COHERE_JSON_SCHEMA = "json_object"
TOOLS_STRICT = "tools_strict"
For OpenAI, we have the following modes:
- FUNCTIONS - This was the previous method of calling OpenAI functions and has since been deprecated.
- TOOLS_STRICT - This is the current tool-calling mode that uses Structured Outputs.
- TOOLS - This is how we call OpenAI tools, and it is the default mode for the OpenAI client.
- JSON - This is when we manually prompt the LLM to return JSON and then parse it using a JSON loader.
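If you want a different mode, you pass it in when patching the client. For example (a small sketch):
import instructor
from openai import OpenAI

# Mode.TOOLS is the default; switch to JSON mode by passing it explicitly
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.JSON)
With that in mind, here's what from_openai itself looks like.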
def from_openai(
client: openai.OpenAI | openai.AsyncOpenAI,
mode: instructor.Mode = instructor.Mode.TOOLS,
**kwargs: Any,
) -> Instructor | AsyncInstructor:
# Other validation logic here (this is also where `provider` gets set)
if isinstance(client, openai.OpenAI):
return Instructor(
client=client,
create=instructor.patch(create=client.chat.completions.create, mode=mode),
mode=mode,
provider=provider,
**kwargs,
)
if isinstance(client, openai.AsyncOpenAI):
return AsyncInstructor(
client=client,
create=instructor.patch(create=client.chat.completions.create, mode=mode),
mode=mode,
provider=provider,
**kwargs,
)
We can see here that when we use the from_openai function, we get back a new Instructor instance that has been patched with our desired mode. What is this .patch function doing? In short, it creates a new function that wraps the original client.chat.completions.create function, and that wrapped function becomes the create method on the Instructor instance returned by from_openai.
def patch(
client: Union[OpenAI, AsyncOpenAI] = None,
create: Callable[T_ParamSpec, T_Retval] = None,
mode: Mode = Mode.TOOLS,
) -> Union[OpenAI, AsyncOpenAI]:
# ... validation logic; `func` is the underlying create function being wrapped
@wraps(func)
def new_create_sync(
response_model: type[T_Model] = None,
validation_context: dict = None,
max_retries: int = 1,
strict: bool = True,
*args: T_ParamSpec.args,
**kwargs: T_ParamSpec.kwargs,
) -> T_Model:
response_model, new_kwargs = handle_response_model(
response_model=response_model, mode=mode, **kwargs
)
response = retry_sync(
func=func,
response_model=response_model,
validation_context=validation_context,
max_retries=max_retries,
args=args,
strict=strict,
kwargs=new_kwargs,
mode=mode,
)
return response
new_create = new_create_async if func_is_async else new_create_sync
if client is not None:
client.chat.completions.create = new_create
return client
else:
return new_create
The key insight here is that the magic happens in two functions:
- handle_response_model - This is where we do a lot of the heavy lifting. We use the response_model to convert your Pydantic class into an OpenAI-schema-compatible format.
- retry_sync - This is where we handle the retry logic. We use max_retries to retry the function call if it fails.
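In practice, this means the patched create call accepts a few keyword arguments that the vanilla OpenAI client doesn't - they map directly onto the new_create_sync signature above. A quick sketch, continuing the UserInfo example from earlier:
user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,      # converted to a tool schema by handle_response_model
    max_retries=3,                # used by retry_sync to re-ask on validation errors
    validation_context=None,      # optional context made available to your validators
    strict=True,                  # stricter validation of the returned arguments
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)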
How does the Pydantic response_model keyword argument get used?
Let's first look at the code for the handle_response_model function here.
def handle_response_model(
response_model: type[T] | None, mode: Mode = Mode.TOOLS, **kwargs: Any
) -> tuple[type[T], dict[str, Any]]:
"""Prepare the response model type hint, and returns the response_model
along with the new modified kwargs needed to be able to use the response_model
parameter with the patch function.
Args:
response_model (T): The response model to use for parsing the response
mode (Mode, optional): The openai completion mode. Defaults to Mode.TOOLS.
Raises:
NotImplementedError: When using stream=True with a non-iterable response_model
ValueError: When using an invalid patch mode
Returns:
Union[Type[OpenAISchema], dict]: The response model to use for parsing the response
"""
new_kwargs = kwargs.copy()
# Other Provider Logic
if not issubclass(response_model, OpenAISchema):
response_model = openai_schema(response_model) # type: ignore
# Other Logic
elif mode in {Mode.TOOLS, Mode.MISTRAL_TOOLS}:
new_kwargs["tools"] = [
{
"type": "function",
"function": response_model.openai_schema,
}
]
if mode == Mode.MISTRAL_TOOLS:
new_kwargs["tool_choice"] = "any"
else:
new_kwargs["tool_choice"] = {
"type": "function",
"function": {"name": response_model.openai_schema["name"]},
}
# Other Logic
return response_model, new_kwargs
We can see here that the response_model gets converted into a format that's compatible with the OpenAI API. This is where the openai_schema function comes in: it wraps your Pydantic class in an OpenAISchema subclass that knows how to emit an OpenAI-compatible function schema; the code can be found here.
class OpenAISchema(BaseModel):
# Ignore classproperty, since Pydantic doesn't understand it like it would a normal property.
model_config = ConfigDict(ignored_types=(classproperty,))
@classproperty
def openai_schema(cls) -> dict[str, Any]:
"""
Return the schema in the format of OpenAI's schema as jsonschema
Note:
Its important to add a docstring to describe how to best use this class, it will be included in the description attribute and be part of the prompt.
Returns:
model_json_schema (dict): A dictionary in the format of OpenAI's schema as jsonschema
"""
schema = cls.model_json_schema()
docstring = parse(cls.__doc__ or "")
parameters = {
k: v for k, v in schema.items() if k not in ("title", "description")
}
for param in docstring.params:
if (name := param.arg_name) in parameters["properties"] and (
description := param.description
):
if "description" not in parameters["properties"][name]:
parameters["properties"][name]["description"] = description
parameters["required"] = sorted(
k for k, v in parameters["properties"].items() if "default" not in v
)
if "description" not in schema:
if docstring.short_description:
schema["description"] = docstring.short_description
else:
schema["description"] = (
f"Correctly extracted `{cls.__name__}` with all "
f"the required parameters with correct types"
)
return {
"name": schema["title"],
"description": schema["description"],
"parameters": parameters,
}
def openai_schema(cls: type[BaseModel]) -> OpenAISchema:
if not issubclass(cls, BaseModel):
raise TypeError("Class must be a subclass of pydantic.BaseModel")
schema = wraps(cls, updated=())(
create_model(
cls.__name__ if hasattr(cls, "__name__") else str(cls),
__base__=(cls, OpenAISchema),
)
)
return cast(OpenAISchema, schema)
With this function, we're able to take our original Pydantic class and convert it into a function schema that looks something like this.
{
  "name": "UserInfo",
  "description": "A user info object",
  "parameters": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "The name of the user"
      },
      "age": {
        "type": "integer",
        "description": "The age of the user"
      }
    },
    "required": ["age", "name"]
  }
}
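You can inspect this for yourself by running a model through the same helper shown above - a sketch, assuming you import it from instructor.function_calls, which is where the code above lives:
from instructor.function_calls import openai_schema
from pydantic import BaseModel

class UserInfo(BaseModel):
    """A user info object"""

    name: str
    age: int

# openai_schema() returns a subclass of our model with an openai_schema classproperty
print(openai_schema(UserInfo).openai_schema)
#> {'name': 'UserInfo', 'description': 'A user info object', 'parameters': {...}}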
We then customise the specific kwargs that we pass into the OpenAI API, so that the model is forced to call a function matching the exact Pydantic class we've defined.
new_kwargs["tools"] = [
{
"type": "function",
"function": response_model.openai_schema,
}
]
if mode == Mode.MISTRAL_TOOLS:
new_kwargs["tool_choice"] = "any"
else:
new_kwargs["tool_choice"] = {
"type": "function",
"function": {"name": response_model.openai_schema["name"]},
}
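Putting those pieces together, the request that finally reaches the raw OpenAI client looks roughly like this (illustrative, with the schema from earlier inlined):
from openai import OpenAI

openai_client = OpenAI()  # the unpatched client that instructor calls under the hood

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "UserInfo",
                "description": "A user info object",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string", "description": "The name of the user"},
                        "age": {"type": "integer", "description": "The age of the user"},
                    },
                    "required": ["age", "name"],
                },
            },
        }
    ],
    tool_choice={"type": "function", "function": {"name": "UserInfo"}},
)

# The tool call's arguments come back as a JSON string, which instructor
# then validates against the UserInfo model
print(response.choices[0].message.tool_calls[0].function.arguments)
#> {"name": "John Doe", "age": 30}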
How does the response from the LLM get validated?
Now that we've seen how the response_model is used, let's look at how the response from the LLM is validated in the retry_sync function here.
At its core, it really is just a retry loop:
for _ in range(max_retries):
    try:
        # call OpenAI with the new arguments and validate the result against the response_model
        return call_openai_with_new_arguments(**kwargs)
    except ValidationError as e:
        # update the kwargs with the new errors - keep appending the generated content
        # plus the validation errors to the messages so the next attempt can correct itself
        kwargs["messages"].extend(reask_messages(response, mode, e))
You can see this pattern in the (abridged) code snippet below.
def retry_sync(
func: Callable[T_ParamSpec, T_Retval],
response_model: type[T_Model],
validation_context: dict,
args,
kwargs,
max_retries: int | Retrying = 1,
strict: bool | None = None,
mode: Mode = Mode.TOOLS,
) -> T_Model:
# Compute some stuff
try:
response = None
for attempt in max_retries:
with attempt:
try:
response = func(*args, **kwargs)
stream = kwargs.get("stream", False)
return process_response(
response,
response_model=response_model,
stream=stream,
validation_context=validation_context,
strict=strict,
mode=mode,
)
except (ValidationError, JSONDecodeError) as e:
if <condition unrelated to TOOL calling with OpenAI>:
raise e
else:
kwargs["messages"].extend(reask_messages(response, mode, e))
Reask messages themselves aren't anything special - for tool calling, we're literally just appending the response from the LLM and the validation errors to the messages and calling the LLM again, as you can see here.
def reask_messages(response: ChatCompletion, mode: Mode, exception: Exception):
# other Logic
if mode in {Mode.TOOLS, Mode.TOOLS_STRICT}:
for tool_call in response.choices[0].message.tool_calls:
yield {
"role": "tool",
"tool_call_id": tool_call.id,
"name": tool_call.function.name,
"content": f"Validation Error found:\n{exception}\nRecall the function correctly, fix the errors",
}
This updates the messages with the validation errors we got back from Pydantic, and then we call the LLM again. Eventually we either get the validated response we care about, or we hit the max retry limit and raise an error.
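Concretely, after one failed attempt the messages list for the next call looks something like this (illustrative values):
messages = [
    {"role": "user", "content": "John Doe is 30 years old."},
    # The assistant's original tool-call message also sits in the history here,
    # since the OpenAI API requires it before any "tool" role message
    {
        "role": "tool",
        "tool_call_id": "call_abc123",  # hypothetical id
        "name": "UserInfo",
        "content": (
            "Validation Error found:\n"
            "1 validation error for UserInfo\n"
            "age\n  Input should be a valid integer\n"
            "Recall the function correctly, fix the errors"
        ),
    },
]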
Why you probably shouldn't roll your own
I hope this article has shed some light on the inner workings of Instructor and how it's able to provide a seamless experience for structured outputs. If anything, I hope it helps you understand and think about how you might be able to contribute to the library yourself in the future.
While it might be tempting to implement your own solution, there are several challenges to consider:
- Constantly tracking updates to different LLM providers can be time-consuming and difficult to maintain.
- Implementing your own streaming support, partial responses, and iterable handling is technically challenging and prone to errors.
Instead of reinventing the wheel, using a validated library like Instructor allows you to focus on what truly matters - building your LLM application. By leveraging a robust, well-maintained solution, you can save time, reduce errors, and stay up-to-date with the latest developments in the rapidly evolving field of LLMs.