CLASSIFYING CATEGORY NAMES IN VIETNAMESE WIKIPEDIA
Keywords: Naming convention, Name taxonomy, Wikipedia category.
Abstract
Wikipedia is known as the largest encyclopedia in existence, and its purpose is to spread knowledge to everyone in the world. Thanks to the use of bots in article generation, Vietnamese Wikipedia is one of 13 language editions with more than 1 million articles. However, this growth poses many challenges for Vietnamese Wikipedia in improving article quality, classifying categories, fighting vandalism, and other tasks. In this paper, we classify the categories of Vietnamese Wikipedia, focusing on category taxonomy and naming conventions. Our main method is to adopt the standards and category taxonomy of the English project, the largest Wikipedia project in terms of the amount of contributed information, and then apply them to Vietnamese Wikipedia. To do this, we combine several social and technical methods to obtain the expected results. Our evaluation of category names, together with the data we obtained from Wikidata, is a first step toward building a tool that translates English categories into Vietnamese categories.
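The abstract does not spell out the classification procedure, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions. First, it sorts English category names by the connective they use; the hypothetical `PATTERNS` list covers "by", "from", "in" and "of", forms that appear in English Wikipedia category names such as "Rivers of Vietnam" or "People by occupation", and is not the paper's taxonomy. Second, it reads Wikidata sitelinks to check whether an English category already has a Vietnamese counterpart. The input mimics the `entities` → `sitelinks` shape returned by the Wikidata API's `wbgetentities` action; the entity IDs and the titles after the Vietnamese `Thể loại:` (Category) namespace prefix are placeholders, not real data.

```python
import re
from typing import Optional

# Hypothetical pattern set (an assumption, not the paper's taxonomy):
# connective forms common in English Wikipedia category names.
# Order matters: the first matching pattern wins.
PATTERNS = [
    ("by", re.compile(r"^(?P<head>.+?) by (?P<tail>.+)$")),
    ("from", re.compile(r"^(?P<head>.+?) from (?P<tail>.+)$")),
    ("in", re.compile(r"^(?P<head>.+?) in (?P<tail>.+)$")),
    ("of", re.compile(r"^(?P<head>.+?) of (?P<tail>.+)$")),
]


def classify(name: str) -> tuple[str, str, Optional[str]]:
    """Return (pattern, head, tail) for an English category name."""
    title = name.removeprefix("Category:")
    for label, rx in PATTERNS:
        m = rx.match(title)
        if m:
            return label, m["head"], m["tail"]
    # No connective found: treat the whole title as the head.
    return "plain", title, None


def vi_counterparts(response: dict) -> dict[str, str]:
    """Map enwiki titles to viwiki titles from a wbgetentities-style dict."""
    mapping = {}
    for entity in response.get("entities", {}).values():
        links = entity.get("sitelinks", {})
        # Keep only items linked in both the English and Vietnamese editions.
        if "enwiki" in links and "viwiki" in links:
            mapping[links["enwiki"]["title"]] = links["viwiki"]["title"]
    return mapping


# Placeholder response: IDs and Vietnamese titles are illustrative only.
sample = {
    "entities": {
        "Q_HYPOTHETICAL_1": {"sitelinks": {
            "enwiki": {"site": "enwiki", "title": "Category:Rivers of Vietnam"},
            "viwiki": {"site": "viwiki", "title": "Thể loại:Ví dụ 1"},
        }},
        "Q_HYPOTHETICAL_2": {"sitelinks": {
            "enwiki": {"site": "enwiki", "title": "Category:People by occupation"},
        }},
    }
}

if __name__ == "__main__":
    links = vi_counterparts(sample)
    for name in ["Category:Rivers of Vietnam", "Category:People by occupation"]:
        pattern, head, tail = classify(name)
        vi = links.get(name, "needs translation")
        print(f"{name}: pattern={pattern}, head={head!r}, tail={tail!r}, vi={vi}")
```

English categories that come back as "needs translation" are the ones a translation tool would have to handle, and the (pattern, head, tail) split suggests which parts of the name would need translating separately.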
Copyright & License
Copyright (c) 2017 Tạ Hoàng Thắng
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.