Abstract of Assessing a Literary RAG System with a Human-Evaluated Synthetic QA Dataset Generated by an LLM: Experiments with Knowledge Graphs

Yanco Amor Torterolo Orta, Sofía Micaela Roseti, Antonio Moreno Sandoval

This paper explores the use of an LLM-generated QA dataset to evaluate a RAG system, and presents experiments with Knowledge Graphs to improve retrieval over literary works in the context of Digital Humanities. The RAG system uses a custom Neo4j database containing the text of the Spanish literary work Trafalgar, by Benito Pérez Galdós. Evaluating this system required a suitable method, which led to the generation of a synthetic dataset from the same book. Several models were used to create different versions of the dataset, which were then evaluated by four linguists (human evaluation), enabling comparisons between the models. DeepEval RAG metrics were then used to evaluate the system with the dataset version that obtained the highest score. Additionally, this work describes other retrieval techniques, such as text-to-Cypher generation and few-shot prompting.
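The abstract says that each model produced its own version of the dataset, that four linguists rated those versions, and that the highest-scoring version was then used for the DeepEval evaluation. A minimal sketch of that selection step follows. It assumes that each linguist gives every QA pair one numeric score and that versions are compared by a plain mean. The 1-5 scale, the model names, and the helpers `model_scores` and `best_dataset_version` are hypothetical. The paper's actual rubric and aggregation may differ.

```python
from statistics import mean

# ASSUMPTION (hypothetical layout): model name -> one entry per QA pair,
# each entry holding one score per linguist (four linguists in the study).
Ratings = dict[str, list[list[int]]]


def model_scores(ratings: Ratings) -> dict[str, float]:
    """Mean human score per model: average the linguists per QA pair, then average the pairs."""
    return {
        model: mean(mean(judges) for judges in per_question)
        for model, per_question in ratings.items()
    }


def best_dataset_version(ratings: Ratings) -> str:
    """Return the model whose generated dataset obtained the highest mean score."""
    scores = model_scores(ratings)
    # On a tie, max() keeps the first model encountered.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Illustrative, made-up ratings on an assumed 1-5 scale (two QA pairs, four linguists).
    ratings = {
        "model_a": [[4, 5, 4, 4], [3, 4, 4, 3]],
        "model_b": [[5, 5, 4, 5], [4, 4, 5, 4]],
    }
    print(best_dataset_version(ratings))  # → model_b
```

Averaging per QA pair first means every question carries equal weight, even if a linguist skipped some items. A real study might also report inter-annotator agreement before comparing models.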
