POL: Un nuevo sistema para la detección y clasificación de nombres propios

Rogelio Nazar; Patricio Arriagada

Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 58, 2017, pp. 13-20
Language: Spanish
Parallel titles:
- POL: A new system for named-entity detection and categorisation

Abstract
  The purpose of this research is to develop a methodology for the detection and classification of proper names (PNs), or named entities, into the categories of person, place and organisation. The hypothesis is that the context in which a PN occurs, defined as the n words preceding it, together with the words that make up the PN itself, can provide cues for predicting the type of entity. To that end, we designed a supervised classification algorithm that is trained on a corpus already annotated by another NERC system; in our experiments, this was the open-source suite of language analysers FreeLing, applied to the Spanish Wikipedia corpus. During training, the system learns to associate entity categories with the words of the context and with the words that make up the annotated PNs. We evaluate the results on the CoNLL-2002 corpus and on a corpus of geopolitics articles from the Spanish edition of Le Monde Diplomatique, and on that corpus we also compare the performance of several NERC systems for Spanish.
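
The idea described in the abstract, predicting the entity type from the n preceding words plus the words of the name itself, can be illustrated with a small sketch. This is not the authors' implementation: the scoring rule (summed log-probabilities with add-one smoothing), the label names `PER`, `LOC` and `ORG`, the function names, and the toy training sentences are all assumptions made for illustration. In the paper the training annotations come from FreeLing run over the Spanish Wikipedia, not from hand-written examples.

```python
from collections import Counter, defaultdict
import math

# Hypothetical label set standing in for person, place and organisation.
CATEGORIES = ("PER", "LOC", "ORG")


def features(tokens, start, end, n=3):
    """Return features for the name tokens[start:end]:
    the n words before it ("ctx") and the words inside it ("in")."""
    context = [("ctx", w.lower()) for w in tokens[max(0, start - n):start]]
    inner = [("in", w.lower()) for w in tokens[start:end]]
    return context + inner


def train(examples, n=3):
    """Count how often each feature co-occurs with each category.
    examples: iterable of (tokens, start, end, category)."""
    counts = defaultdict(Counter)  # feature -> Counter of categories
    for tokens, start, end, category in examples:
        for feat in features(tokens, start, end, n):
            counts[feat][category] += 1
    return counts


def classify(counts, tokens, start, end, n=3):
    """Pick the category with the highest summed smoothed log-probability.
    Features never seen in training are skipped."""
    scores = {c: 0.0 for c in CATEGORIES}
    for feat in features(tokens, start, end, n):
        seen = counts.get(feat)
        if not seen:
            continue
        total = sum(seen.values())
        for c in CATEGORIES:
            # Add-one smoothing (an assumption of this sketch).
            scores[c] += math.log((seen[c] + 1) / (total + len(CATEGORIES)))
    return max(scores, key=scores.get)


# Toy, hand-made training data (hypothetical; stands in for an annotated corpus).
TOY_EXAMPLES = [
    ("el presidente Juan Pérez habló ayer".split(), 2, 4, "PER"),
    ("la ministra Ana Gómez viajó".split(), 2, 4, "PER"),
    ("viajó a la ciudad de Santiago".split(), 5, 6, "LOC"),
    ("vive en la ciudad de Valparaíso".split(), 5, 6, "LOC"),
    ("trabaja en el Banco Central".split(), 3, 5, "ORG"),
    ("firmó con la empresa Banco Estado".split(), 4, 6, "ORG"),
]

if __name__ == "__main__":
    model = train(TOY_EXAMPLES)
    sentence = "llegó a la ciudad de Concepción".split()
    print(classify(model, sentence, 5, 6))
```

In this toy setting the unseen name "Concepción" is labelled from its context alone ("la ciudad de"), while a name such as "Banco Nacional" is labelled mainly from its own component "Banco", which mirrors the two sources of evidence named in the hypothesis.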
