This blog post explores how to leverage LangChain, a powerful framework for building LLM applications, in conjunction with Python to tackle classification problems and manage structured output using the LangChain Expression Language (LCEL) and Output Parsers.
Before diving into the concepts, it's essential to set up the environment. The notebook excerpts below install the key LangChain libraries and the OpenAI integration (the httpx pin is presumably there to avoid a known incompatibility between newer httpx releases and the OpenAI client):
%pip install --upgrade --quiet langchain-core
%pip install -qU langchain-openai
%pip install --upgrade "httpx<0.28"
Setting the OpenAI API key is also a crucial step for interacting with OpenAI models:
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
We will be using the ChatOpenAI model throughout our examples.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o")
Output Parsers: Structuring LLM Responses
Large Language Models (LLMs) generate free-form text, which can be challenging to process programmatically. Output Parsers in LangChain help structure the output of an LLM into a more usable format, such as Python dictionaries or objects.
Let’s start with a simple example of prompting without any explicit output parsing:
response = model.invoke("Give me a list of 4 tourist locations in Chicago")
print(response.content)
This will give us a text-based list. To get more structured data, we can utilize Pydantic models and output parsers.
Pydantic is a Python library for data validation and parsing. We can define Pydantic models to represent the desired structure of our LLM output. For instance, to get a list of tourist locations with a name, description, and suggested time to spend, we can define the following Pydantic models:
from pydantic import BaseModel, Field

class Location(BaseModel):
    name: str = Field(description="Name of the location to visit")
    description: str = Field(description="Brief description of the location")
    time_to_be_spent: str = Field(description="Suggested time to spend at the location")

class LocationsList(BaseModel):
    locations: list[Location] = Field(description="List of locations")
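Since Pydantic handles the validation itself, these models can be sanity-checked directly before any LLM is involved. A minimal sketch (the field values are purely illustrative, and model_dump assumes Pydantic v2):

sample = LocationsList(
    locations=[
        Location(
            name="Millennium Park",
            description="Downtown park known for the Cloud Gate sculpture",
            time_to_be_spent="2 hours",
        )
    ]
)
print(sample.model_dump())  # serializes the validated model to a plain dict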
LangChain provides the JsonOutputParser to parse the LLM output into JSON that conforms to a specified Pydantic model.
from langchain_core.output_parsers import JsonOutputParser
parser = JsonOutputParser(pydantic_object=LocationsList)
We can then use this parser with a PromptTemplate to instruct the LLM to generate output in the desired JSON format:
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate(
    template="Give me a list of {number} tourist locations in {city}\n{format_instructions}",
    input_variables=["number", "city"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
prompt = prompt_template.invoke({"city": "Chicago", "number": 7})
response = model.invoke(prompt)
print(parser.parse(response.content))
The parser.get_format_instructions() call supplies the LLM with instructions describing how the output should be formatted to conform to the Pydantic model.
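If you are curious what these instructions actually contain, you can print them; the exact wording is generated by LangChain, but it essentially embeds the JSON schema derived from the Pydantic model:

# Inspect the auto-generated formatting instructions
print(parser.get_format_instructions())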
LangChain Expression Language (LCEL): Chaining Components
LCEL provides a declarative way to chain together the different components of a LangChain application. It allows you to build complex workflows by connecting prompts, models, and output parsers using the pipe operator (|). The previous example can be simplified using LCEL:
chain = prompt_template | model | parser
response = chain.invoke({"number": 7, "city": "Chicago"})
print(response)
This concise syntax makes it easier to define and manage LangChain pipelines.
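Because the assembled chain is a standard LCEL Runnable, it also inherits conveniences such as batching. As a quick sketch, the same chain can be run over several inputs in one call (the cities below are arbitrary examples):

# Run the chain for multiple inputs at once via the Runnable batch API
responses = chain.batch([
    {"number": 3, "city": "Boston"},
    {"number": 3, "city": "Seattle"},
])
for r in responses:
    print(r)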
Solving Classification Problems with LangChain
LangChain can be effectively used to solve classification problems, such as sentiment analysis. We can define a Pydantic model to represent the classification categories and use an output parser to get structured classification results.
Let’s consider a sentiment analysis task where we want to classify a piece of text based on its sentiment, aggressiveness, and language:
from langchain_core.prompts import ChatPromptTemplate

tagging_prompt = ChatPromptTemplate.from_template(
    """Extract the desired information from the following passage.
Only extract the properties mentioned in the 'Classification' function.

Passage: {input}
{format_instructions}"""
)
class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text", enum=["happy", "negative"])
    aggressiveness: int = Field(
        description="How aggressive the text is; the higher the number, the more aggressive it is.",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(description="The language the text is written in", enum=["Hindi", "English", "Unknown"])
parser = JsonOutputParser(pydantic_object=Classification)
Now, we can create an LCEL chain to perform the sentiment analysis:
chain = tagging_prompt | model | parser
# The input below is Chinese for: "You are an idiot. I don't like you."
response = chain.invoke({
    "input": "你是個白痴。我不喜歡你。",
    "format_instructions": parser.get_format_instructions(),
})
print(response)
This will output a structured JSON response containing the sentiment, aggressiveness level, and language of the input text (the language is reported as 'Unknown' because Chinese is not among the allowed enum values):
{'sentiment': 'negative', 'aggressiveness': 4, 'language': 'Unknown'}
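The same chain works unchanged for other inputs. For instance, an English passage should come back tagged with its own sentiment, aggressiveness, and language, without any changes to the prompt or model (the sample text is illustrative):

# Classify a different, English-language passage with the same chain
response = chain.invoke({
    "input": "I had a wonderful time at the lakefront today!",
    "format_instructions": parser.get_format_instructions(),
})
print(response)  # a dict with the same sentiment/aggressiveness/language keys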
LangChain, with its powerful features like LCEL for building modular pipelines and Output Parsers for structuring LLM responses using Pydantic, provides an excellent framework for tackling various language-based tasks, including classification. By defining clear output structures and leveraging the expressiveness of LCEL, developers can build robust and maintainable LLM applications in Python.