CLASSIFYING CATEGORY NAMES IN VIETNAMESE WIKIPEDIA

Authors

  • Tạ Hoàng Thắng, Faculty of Information Technology, Dalat University

DOI:

https://doi.org/10.37569/DalatUniversity.7.2.240(2017)

Keywords:

Naming convention, Name taxonomy, Wikipedia category.

Abstract

Wikipedia is widely known as the largest encyclopedia currently available, and its purpose is to spread knowledge to everyone in the world. By using robots in the process of article generation, Vietnamese Wikipedia has become one of 13 language projects with more than 1 million articles. However, this raises many challenges for Vietnamese Wikipedia in article quality improvement, category classification, anti-vandalism, and other tasks. In this paper, we classify categories in Vietnamese Wikipedia, focusing on category taxonomy and naming conventions. The central method is to adopt the standards and category taxonomy of the English project, the largest Wikipedia project in terms of the amount of contributed information, and then apply them to Vietnamese Wikipedia. To do this, we combine a number of social methods and technical techniques to obtain the expected results. The evaluation of category names and of the data we obtained from Wikidata is a first step toward building a tool that translates English categories into Vietnamese categories.
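The role Wikidata plays in linking English and Vietnamese categories can be illustrated with a minimal sketch. It is not the author's tool: it only assumes that each Wikidata entity carries a "sitelinks" dictionary keyed by site (such as enwiki and viwiki), and it runs on a small hand-written sample rather than on data from the paper. The entity ids, the function names build_category_map and untranslated, and the sample titles are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional

# Hand-written sample shaped like the "sitelinks" part of Wikidata entity JSON.
# The entity ids and titles are illustrative only, not results from the paper.
SAMPLE_ENTITIES: Dict[str, dict] = {
    "Q_EXAMPLE_1": {
        "sitelinks": {
            "enwiki": {"site": "enwiki", "title": "Category:Physics"},
            "viwiki": {"site": "viwiki", "title": "Thể loại:Vật lý học"},
        }
    },
    "Q_EXAMPLE_2": {
        "sitelinks": {
            # No viwiki entry: this English category has no linked Vietnamese one.
            "enwiki": {"site": "enwiki", "title": "Category:Example topic"},
        }
    },
}


def build_category_map(entities: Dict[str, dict]) -> Dict[str, Optional[str]]:
    """Map each English category title to its linked Vietnamese title, or None."""
    mapping: Dict[str, Optional[str]] = {}
    for entity in entities.values():
        links = entity.get("sitelinks", {})
        en = links.get("enwiki")
        if en is None:
            continue  # entity is not linked to an English page at all
        vi = links.get("viwiki")
        mapping[en["title"]] = vi["title"] if vi else None
    return mapping


def untranslated(mapping: Dict[str, Optional[str]]) -> List[str]:
    """Return the English titles that still lack a Vietnamese counterpart."""
    return sorted(title for title, vi in mapping.items() if vi is None)


if __name__ == "__main__":
    category_map = build_category_map(SAMPLE_ENTITIES)
    for en_title, vi_title in category_map.items():
        print(en_title, "->", vi_title)
    print("Missing in Vietnamese:", untranslated(category_map))
```

Categories that come back without a Vietnamese title are the ones a translation tool of the kind described in the abstract would have to name, following the adopted naming conventions.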

Published

28-06-2017

Section

Natural Sciences and Technology

How to Cite

Thắng, T. H. (2017). CLASSIFYING CATEGORY NAMES IN VIETNAMESE WIKIPEDIA. Dalat University Journal of Science, 7(2), 217-230. https://doi.org/10.37569/DalatUniversity.7.2.240(2017)