Categorización de texto en bases documentales a partir de modelos computacionales livianos

Marcelo Mendoza Prado; Ivette Ortiz Montenegro

Affiliation (Marcelo Mendoza): Universidad Técnica Federico Santa María, Valparaíso, Chile
Published in: Revista signos: estudios de lingüística, ISSN-e 0718-0934, ISSN 0035-0451, No. 77, 2011, pp. 251-274
Language: Spanish
DOI: 10.4067/s0718-09342011000300004
Parallel titles:
- Text categorization in documentary databases using light computational models
Abstract
  We introduce a new text categorizer for documentary databases. The proposed categorizer extends the Naive Bayes model and performs well on documentary databases whose training data are imbalanced. Experimental results show that it outperforms Naive Bayes and compares favorably with more sophisticated techniques, such as support vector machines and logistic regression, without incurring significant computational costs in the training phase.
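The abstract does not describe the extension itself, so the following is only a minimal sketch of the baseline it builds on: a multinomial Naive Bayes text categorizer with add-alpha (Laplace) smoothing. The whitespace tokenizer, the class and method names, and the toy corpus are illustrative assumptions, not taken from the article. The log-prior term shows one way imbalanced training data bias plain Naive Bayes toward the majority class.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class MultinomialNB:
    """Baseline multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        # log P(c) is proportional to class frequency, so with imbalanced
        # training data the majority class starts with a higher score.
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        # Total number of tokens observed per class.
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothed log P(token | c).
        numerator = self.word_counts[c][token] + self.alpha
        denominator = self.total[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, doc):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy, deliberately imbalanced corpus (3 "sports" documents vs 1 "politics").
docs = [
    "goal match team",
    "team win league",
    "match referee goal",
    "election vote parliament",
]
labels = ["sports", "sports", "sports", "politics"]
model = MultinomialNB().fit(docs, labels)
print(model.predict("vote parliament"))   # politics
print(model.predict("goal team"))         # sports
print(model.predict("weather forecast"))  # sports: no known words, the prior decides
```

In the last call no word is in the vocabulary, so the score reduces to the log prior and the majority class wins regardless of content. This is the kind of imbalance effect a Naive Bayes extension would need to counter; the sketch does not reproduce the authors' method.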