This blog post explores how to leverage LangChain, a powerful framework for building LLM applications, in conjunction with Python to tackle classification problems and manage structured output using the LangChain Expression Language (LCEL) and Output Parsers.
Before diving into the concepts, it's essential to set up the environment. The notebook excerpts below install the key LangChain libraries and the OpenAI integration (the httpx pin is presumably there to avoid a known incompatibility between newer httpx releases and the OpenAI client):
%pip install --upgrade --quiet langchain-core
%pip install -qU langchain-openai
%pip install --upgrade "httpx<0.28"
Setting the OpenAI API key is also a crucial step for interacting with OpenAI models:
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
We will be using the ChatOpenAI model throughout our examples.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o")
Output Parsers: Structuring LLM Responses
Large Language Models (LLMs) generate free-form text, which can be challenging to process programmatically. Output Parsers in LangChain help structure the output of an LLM into a more usable format, such as Python dictionaries or objects.
Let’s start with a simple example of prompting without any explicit output parsing:
response = model.invoke("Give me a list of 4 tourist locations in Chicago")
print(response.content)
This will give us a text-based list. To get more structured data, we can utilize Pydantic models and output parsers.
Pydantic is a Python library for data validation and parsing. We can define Pydantic models to represent the desired structure of our LLM output. For instance, to get a list of tourist locations with a name, description, and suggested time to spend, we can define the following Pydantic models:
from pydantic import BaseModel, Field

class Location(BaseModel):
    name: str = Field(description="Name of the location to visit")
    description: str = Field(description="Brief description of the location")
    time_to_be_spent: str = Field(description="Suggested time to spend at the location")

class LocationsList(BaseModel):
    locations: list[Location] = Field(description="List of locations")
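Since Pydantic handles the validation itself, these models can be sanity-checked directly before any LLM is involved. A minimal sketch (the field values are purely illustrative, and model_dump assumes Pydantic v2):

sample = LocationsList(
    locations=[
        Location(
            name="Millennium Park",
            description="Downtown park known for the Cloud Gate sculpture",
            time_to_be_spent="2 hours",
        )
    ]
)
print(sample.model_dump())  # serializes the validated model to a plain dict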
LangChain provides the JsonOutputParser to parse the LLM output into JSON that conforms to a specified Pydantic model.
from langchain_core.output_parsers import JsonOutputParser
parser = JsonOutputParser(pydantic_object=LocationsList)
We can then use this parser with a PromptTemplate to instruct the LLM to generate output in the desired JSON format:
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate(
    template="Give me a list of {number} tourist locations in {city}\n{format_instructions}",
    input_variables=["number", "city"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
prompt = prompt_template.invoke({"city": "Chicago", "number": 7})
response = model.invoke(prompt)
print(parser.parse(response.content))
The parser.get_format_instructions() call supplies the LLM with instructions describing how the output should be formatted to conform to the Pydantic model.
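If you are curious what these instructions actually contain, you can print them; the exact wording is generated by LangChain, but it essentially embeds the JSON schema derived from the Pydantic model:

# Inspect the auto-generated formatting instructions
print(parser.get_format_instructions())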
LangChain Expression Language (LCEL): Chaining Components
LCEL provides a declarative way to chain together the different components of a LangChain application. It allows you to build complex workflows by connecting prompts, models, and output parsers using the pipe operator (|). The previous example can be simplified using LCEL:
chain = prompt_template | model | parser
response = chain.invoke({"number": 7, "city": "Chicago"})
print(response)
This concise syntax makes it easier to define and manage LangChain pipelines.
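Because the assembled chain is a standard LCEL Runnable, it also inherits conveniences such as batching. As a quick sketch, the same chain can be run over several inputs in one call (the cities below are arbitrary examples):

# Run the chain for multiple inputs at once via the Runnable batch API
responses = chain.batch([
    {"number": 3, "city": "Boston"},
    {"number": 3, "city": "Seattle"},
])
for r in responses:
    print(r)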
Solving Classification Problems with LangChain
LangChain can be effectively used to solve classification problems, such as sentiment analysis. We can define a Pydantic model to represent the classification categories and use an output parser to get structured classification results.
Let’s consider a sentiment analysis task where we want to classify a piece of text based on its sentiment, aggressiveness, and language:
from langchain_core.prompts import ChatPromptTemplate

tagging_prompt = ChatPromptTemplate.from_template(
    """Extract the desired information from the following passage.
Only extract the properties mentioned in the 'Classification' function.

Passage: {input}
{format_instructions}"""
)
class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text", enum=["happy", "negative"])
    aggressiveness: int = Field(
        description="How aggressive the text is; the higher the number, the more aggressive it is.",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(description="The language the text is written in", enum=["Hindi", "English", "Unknown"])
parser = JsonOutputParser(pydantic_object=Classification)
Now, we can create an LCEL chain to perform the sentiment analysis:
chain = tagging_prompt | model | parser
# The input below is Chinese for: "You are an idiot. I don't like you."
response = chain.invoke({
    "input": "你是個白痴。我不喜歡你。",
    "format_instructions": parser.get_format_instructions(),
})
print(response)
This will output a structured JSON response containing the sentiment, aggressiveness level, and language of the input text (the language is reported as 'Unknown' because Chinese is not among the allowed enum values):
{'sentiment': 'negative', 'aggressiveness': 4, 'language': 'Unknown'}
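The same chain works unchanged for other inputs. For instance, an English passage should come back tagged with its own sentiment, aggressiveness, and language, without any changes to the prompt or model (the sample text is illustrative):

# Classify a different, English-language passage with the same chain
response = chain.invoke({
    "input": "I had a wonderful time at the lakefront today!",
    "format_instructions": parser.get_format_instructions(),
})
print(response)  # a dict with the same sentiment/aggressiveness/language keys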
LangChain, with its powerful features like LCEL for building modular pipelines and Output Parsers for structuring LLM responses using Pydantic, provides an excellent framework for tackling various language-based tasks, including classification. By defining clear output structures and leveraging the expressiveness of LCEL, developers can build robust and maintainable LLM applications in Python.