The concept of RAG is relatively straightforward. It has two main components: a document retriever and a large language model (LLM). The retriever uses semantic search to find the passages in a large corpus that are most relevant to the input question. Those passages are then passed to the LLM, which generates the response. What distinguishes RAG is how it couples the two components: rather than treating retrieval and generation as unrelated steps, RAG conditions generation directly on the retrieved documents. The model can therefore draw on several documents at once when composing an answer, which tends to make responses more accurate and better grounded in the source material.
Simple definition of RAG
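
To make the retrieve-then-generate flow concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: `embed` is a toy bag-of-words vector standing in for a real embedding model, `generate` is a hypothetical placeholder for an actual LLM call, and the corpus, function names, and prompt wording are all invented for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, corpus, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    """Place all retrieved documents in one prompt so the LLM sees them together."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate(prompt):
    """Hypothetical stand-in for an LLM call; replace with a real client."""
    return f"<LLM response to a {len(prompt)}-character prompt>"


def rag_answer(question, corpus, k=2):
    """Retrieve relevant documents, then generate an answer conditioned on them."""
    docs = retrieve(question, corpus, k)
    return generate(build_prompt(question, docs)), docs


# Example corpus, made up for illustration.
corpus = [
    "RAG combines a document retriever with a large language model.",
    "The retriever finds passages relevant to the input question.",
    "Bananas are rich in potassium.",
]
answer, docs = rag_answer("What does the retriever find for a question?", corpus)
```

The point to notice is in `build_prompt`: every retrieved document goes into a single prompt, so the generation step is conditioned on all of them at once. In practice the word-count similarity would be swapped for dense embeddings and a vector index, but the overall shape of the pipeline stays the same.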