Generation
The process of creating or producing a response.
Augmented
To enhance, increase, or add to something.
Retrieval
The act of searching for and finding relevant information from a large, external, and non-static data source (your knowledge base).
grounded
The integration step ensures the answer is grounded, meaning it is based on the verifiable facts provided in the retrieved context, not solely on the LLM's general, potentially outdated, or hallucinated pre-trained knowledge.
Finite context — they can’t ingest entire corpora at once.
The context window is the finite number of tokens (words or word fragments) an LLM can actively see and process in a single pass to generate a response.
Static knowledge — their training data is frozen at a point in time.
Together, these limits lead to hallucinations (fabricating plausible but false information) and outdated answers when the model is asked about recent topics or proprietary, domain-specific data.
Embeddings: Wrapper around a text embedding model, used for converting text to embeddings.
This component acts as the translator that converts human-readable text (like a document chunk or a user's question) into a numerical vector (an embedding).
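A minimal sketch of that translation step; the provider and model name (OpenAI's text-embedding-3-small) are illustrative choices, not a requirement:

```python
from langchain_openai import OpenAIEmbeddings

# Wrapper around a text embedding model; provider/model are interchangeable
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Convert a human-readable question into a numerical vector
vector = embeddings.embed_query("What is retrieval-augmented generation?")
print(len(vector))  # dimensionality of the embedding space
```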
We can embed and store all of our document splits in a single call:
VectorStore.from_documents()
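A short sketch of that single call, assuming the in-memory vector store and OpenAI embeddings as stand-ins; `splits` stands in for the chunks a text splitter would produce:

```python
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

# Stand-in for the chunks a text splitter would produce
splits = [
    Document(page_content="RAG retrieves context before generation."),
    Document(page_content="Embeddings map text to numeric vectors."),
]

# One call embeds every split and indexes it in the store
vectorstore = InMemoryVectorStore.from_documents(splits, embedding=OpenAIEmbeddings())

# The store can now serve similarity search for retrieval
results = vectorstore.similarity_search("What does RAG do?", k=1)
print(results[0].page_content)
```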
Define complex inputs with Pydantic models or JSON schemas:
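For example, a hypothetical get_weather tool whose arguments are validated by a Pydantic model (the tool name and fields are invented for illustration):

```python
from pydantic import BaseModel, Field
from langchain_core.tools import tool

class WeatherInput(BaseModel):
    """Arguments for the weather lookup tool."""
    city: str = Field(description="City to look up")
    unit: str = Field(default="celsius", description="Temperature unit")

@tool("get_weather", args_schema=WeatherInput)
def get_weather(city: str, unit: str = "celsius") -> str:
    """Look up the current weather for a city."""
    return f"It is 20 degrees {unit} in {city}."
```

The schema gives the model a typed, documented contract for the tool's arguments instead of a loose string.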
IMPORTANT
Chat models accept a sequence of message objects as input and return an AIMessage as output. Interactions are often stateless, so a simple conversational loop involves invoking a model with a growing list of messages.
The raw Large Language Model (LLM) is stateless—it resets completely after every API call, possessing no inherent memory of past interactions. To create the "illusion of state" and maintain a conversation, the application developer must manually manage the entire dialogue history and all necessary context, packaging it into a new, comprehensive prompt that fits within the LLM's finite context window for every single turn.
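A sketch of that manual history management, with the model string as an illustrative choice:

```python
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage

model = init_chat_model("openai:gpt-4o-mini")  # illustrative model choice

messages = []  # the application, not the LLM, holds the state
for user_input in ["Hi, I'm Ada.", "What's my name?"]:
    messages.append(HumanMessage(user_input))
    ai_message = model.invoke(messages)  # the full history is re-sent every turn
    messages.append(ai_message)          # grow the history with the reply
    print(ai_message.content)
```

Because the second turn carries the first exchange in its prompt, the model can answer "What's my name?" despite being stateless.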
Message prompts
Passing a list of message objects is the preferred way to prompt a chat model, since it preserves the roles and structure of the conversation.
Messages are the fundamental unit of context for models in LangChain. They represent the input and output of models, carrying both the content and metadata needed to represent the state of a conversation when interacting with an LLM.
Messages are the building blocks that allow LangChain to maintain the state of a conversation, which is necessary for multi-turn chat applications.
A Message doesn't just carry the raw text (content); it also carries crucial metadata (like the message type—HumanMessage, AIMessage, etc.) that tells the LLM how to interpret it.
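For instance, a conversation state expressed as typed messages (the contents are invented for illustration):

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

conversation = [
    SystemMessage("You are a concise assistant."),    # sets model behavior
    HumanMessage("What is LangChain?"),               # user turn
    AIMessage("A framework for building LLM apps."),  # prior model turn
    HumanMessage("Who maintains it?"),                # current question
]
```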
When using a model separately from an agent, it is up to you to execute the requested tool and return the result to the model for use in subsequent reasoning. This manual loop has three steps; a sketch follows the list.
Model Suggestion: The LLM's initial call returns an AIMessage containing the suggestion to use a specific tool (the tool_calls object).
Developer Action (Execution): The developer's code must intercept this message, parse the tool name and arguments, and manually execute the corresponding Python function.
Result Feedback: The developer must then package the output of the tool execution into a ToolMessage and send it back to the Model, along with the previous conversation history, for the Model to complete its final reasoning and generate the answer.
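A sketch of the full loop, using a toy multiply tool and an illustrative model string:

```python
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

model = init_chat_model("openai:gpt-4o-mini").bind_tools([multiply])

messages = [HumanMessage("What is 6 times 7?")]
ai_message = model.invoke(messages)  # 1. the model suggests a tool call
messages.append(ai_message)

for call in ai_message.tool_calls:   # 2. the developer executes it
    result = multiply.invoke(call["args"])
    messages.append(ToolMessage(str(result), tool_call_id=call["id"]))

final = model.invoke(messages)       # 3. the model reasons over the result
print(final.content)
```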
The easiest way to get started with a standalone model in LangChain is to use init_chat_model to initialize one from a chat model provider of your choice.
For most new projects focused on universality and best practice, init_chat_model() is the recommended and modern approach because it promotes provider agnosticism and reduces boilerplate code.
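A minimal sketch; the provider-prefixed model strings are illustrative:

```python
from langchain.chat_models import init_chat_model

# Swapping providers is a one-string change, with no other code edits
model = init_chat_model("openai:gpt-4o-mini", temperature=0)
# model = init_chat_model("anthropic:claude-3-5-sonnet-latest", temperature=0)

response = model.invoke("Summarize RAG in one sentence.")
print(response.content)
```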
Agents follow the ReAct (“Reasoning + Acting”) pattern.
An Agent often requires multiple calls to the LLM (Thought, Action, Observation, Thought, etc.) to complete a task. Each call incurs cost and latency.
Tools give agents the ability to take actions. Agents go beyond simple model-only tool binding by facilitating:
Multiple tool calls in sequence (triggered by a single prompt)
Parallel tool calls when appropriate
Dynamic tool selection based on previous results
Tool retry logic and error handling
State persistence across tool calls
When you bind tools directly to a Model, the model makes a single, stateless decision. It suggests the best tool for the immediate prompt and then stops.
The Agent, however, uses its loop (often ReAct: Reason, Act, Observe) to execute complex strategies.
An LLM Agent runs tools in a loop to achieve a goal. An agent runs until a stop condition is met, i.e., when the model emits a final output or an iteration limit is reached.
The difference lies in autonomy and execution flow. A Model with Tools (via direct binding / function calling) is a single, stateless step: the LLM merely suggests the best tool and its arguments, and the developer must manually execute the tool and initiate any subsequent calls. An Agent with Tools, in contrast, uses an Agent Executor to manage a dynamic, multi-step loop (e.g., ReAct): the LLM acts as the planner that decides which tool to call next, while the Executor automatically runs the tool, feeds the observation back to the model, and repeats the cycle until the complex, multi-step goal is achieved autonomously.
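A sketch of that executor-managed loop using LangGraph's prebuilt ReAct agent; the tool and model string are illustrative:

```python
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def get_weather(city: str) -> str:
    """Return the weather for a city."""
    return f"It is sunny in {city}."

model = init_chat_model("openai:gpt-4o-mini")
agent = create_react_agent(model, tools=[get_weather])

# The executor loops: the model plans, the tool runs, the observation is
# fed back, until the model emits a final answer or hits an iteration limit.
result = agent.invoke({"messages": [("user", "What's the weather in Paris?")]})
print(result["messages"][-1].content)
```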