The RAG model consists of two main modules: the Retriever and the Generator. These two modules work in tandem to ensure that the generated text not only contains relevant external knowledge but also features natural and fluent language expression.
2.1 The Retriever
The main task of the retriever is to obtain the content most relevant to the input query from an external knowledge base or document collection. In RAG, commonly used techniques include:
• Vector Retrieval: Documents and queries are encoded as vectors in a shared vector space (for example, with BERT-based embeddings) and matched by a similarity measure such as cosine similarity. The advantage of vector retrieval is that it captures semantic similarity rather than relying solely on lexical matching.
• Traditional Retrieval Algorithms: BM25, for example, is a ranking function that weights query terms by term frequency and inverse document frequency (the same ingredients as TF-IDF), with term-frequency saturation and document-length normalization added. BM25 is well suited to relatively simple matching tasks, especially when the query and the documents share keywords directly.
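To make the keyword-based approach concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. The whitespace tokenizer, the function names, and the parameter values k1=1.5 and b=0.75 are illustrative assumptions (commonly used defaults), not something prescribed by RAG itself.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; real systems use better ones).
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not documents:
        return []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed inverse document frequency; the "+ 1" keeps it positive.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalization.
            weight = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * weight
        scores.append(score)
    return scores
```

Documents that share no keyword with the query score exactly zero, which illustrates why BM25 works best when the query and the documents overlap lexically.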
The role of the retriever in RAG is to supply context to the generator, so that the generator can produce more relevant answers based on the retrieved document fragments.
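The vector-retrieval path can be sketched in the same way. In the example below, `embed` is a hypothetical stand-in that builds a bag-of-words count vector; a real system would replace it with a learned encoder such as a BERT-based model, which is what actually provides semantic matching. Only the ranking logic (cosine similarity followed by a top-k cut) is the point here.

```python
import math
from collections import Counter


def embed(text):
    # Hypothetical stand-in for a learned encoder: a sparse word-count vector.
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    # Dot product over shared terms, divided by the product of the vector norms.
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The list returned by `retrieve` is the set of fragments that would be handed to the generator as context.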
2.2 The Generator
The generator is responsible for generating the final natural language output. In the RAG system, commonly used generators include:
• BART: BART is a sequence-to-sequence generative model pre-trained as a denoising autoencoder: the input text is corrupted with different kinds of noise and the model learns to reconstruct the original, which makes it well suited to text generation tasks.
• GPT Series: The GPT models are typical pre-trained language models, proficient in generating smooth and natural text. Through training on large-scale data, they can generate relatively accurate answers and excel in particular at text-generation tasks.
After receiving the document fragments from the retriever, the generator uses them as context and, combined with the input query, generates a relevant and natural text answer. This lets the generated results draw not only on the knowledge the model acquired during training but also on the latest external information.
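One common way to combine fragments and query is to place both in a single prompt string. The sketch below shows that idea; the function name `build_prompt` and the prompt wording are assumptions chosen for illustration, and a real system may format its input differently depending on the generator.

```python
def build_prompt(query, fragments):
    """Combine retrieved fragments and the user query into one prompt string."""
    # Number each fragment so the generator can tell them apart.
    context = "\n".join(
        f"[{i}] {fragment}" for i, fragment in enumerate(fragments, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )
```

The resulting string would then be passed to the generator (for example, a BART or GPT model), which continues the text after "Answer:".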
2.3 The Workflow of RAG
The workflow of the RAG model can be summarized in the following steps:
- Input Query: The user enters a question. When vector retrieval is used, the system converts it into a vector representation.
- Document Retrieval: The retriever extracts the document fragments most relevant to the query from the knowledge base, usually using vector retrieval techniques or traditional techniques such as BM25.
- Answer Generation: The generator receives the fragments provided by the retriever and, together with the original user query, uses them to generate a richer, contextually relevant natural language answer.
- Output Result: The generated answer is returned to the user. Because it is grounded in the retrieved fragments, the answer can reflect the latest relevant information in the knowledge base.
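The steps above can be tied together in a short end-to-end sketch. Everything here is a simplified assumption: `retrieve` ranks fragments by plain word overlap instead of vectors or BM25, and `generate` is a placeholder that only echoes its inputs where a real system would call a language model.

```python
def retrieve(query, knowledge_base, top_k=2):
    # Toy retriever: rank fragments by the number of words shared with the query.
    query_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda fragment: len(query_words & set(fragment.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def generate(query, fragments):
    # Placeholder generator: a real system would call a language model here.
    context = " ".join(fragments)
    return f"Q: {query} | Based on: {context}"


def rag_answer(query, knowledge_base, top_k=2):
    """Run the workflow: take the query, retrieve fragments, generate, return."""
    fragments = retrieve(query, knowledge_base, top_k)  # document retrieval
    return generate(query, fragments)                   # answer generation and output


knowledge_base = [
    "BM25 ranks documents by keyword overlap.",
    "BART is a sequence-to-sequence model.",
    "Paris is the capital of France.",
]
answer = rag_answer("how does bm25 rank documents", knowledge_base, top_k=1)
```

Swapping in a stronger retriever or a real generator changes only the bodies of `retrieve` and `generate`; the overall flow in `rag_answer` stays the same.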