TEXT CLASSIFICATION BASED ON SUPPORT VECTOR MACHINE
DOI:
https://doi.org/10.37569/DalatUniversity.9.2.536(2019)
Keywords:
Feature vector, Kernel, Naïve Bayes, Support Vector Machine, Text classification.
Abstract
The development of the Internet has increased the need to store information online every day. Finding the information we are interested in takes a lot of time, so techniques for organizing and processing text data are needed. These techniques are called text classification or text categorization. There are many text classification methods; in this paper we study and apply the Support Vector Machine (SVM) method and compare its effectiveness with that of the Naïve Bayes probabilistic method. In addition, before classifying, we preprocess the training set by extracting keywords with dimensionality reduction techniques to reduce the time needed for classification.
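The pipeline outlined in the abstract (reduce the feature space, then classify with SVM and with Naïve Bayes) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: it assumes scikit-learn is available, uses a tiny made-up English corpus, and uses chi-squared feature selection as a stand-in for the keyword extraction step, whose exact method the abstract does not specify.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (not the paper's data): two topics.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the football team",
    "the goalkeeper saved the penalty in the match",
    "the bank raised interest rates",
    "the stock market fell sharply today",
    "investors sold shares as the market dropped",
    "the central bank cut interest rates",
]
train_labels = ["sports"] * 4 + ["finance"] * 4

test_texts = [
    "the football team scored a goal",
    "the bank and the stock market",
]


def build(classifier, k=10):
    """TF-IDF feature vectors -> keep the k highest-scoring terms -> classifier.

    SelectKBest(chi2) is an assumed stand-in for the keyword extraction
    and dimensionality reduction step mentioned in the abstract.
    """
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),
        SelectKBest(chi2, k=k),
        classifier,
    )


models = {
    "svm": build(LinearSVC(random_state=0)),  # linear-kernel SVM
    "naive_bayes": build(MultinomialNB()),    # multinomial Naive Bayes
}

predictions = {}
for name, model in models.items():
    model.fit(train_texts, train_labels)
    predictions[name] = list(model.predict(test_texts))
    print(name, predictions[name])
```

On a real corpus, the two pipelines would be compared on a held-out test set (for example by accuracy and by training and prediction time), and `k` would be tuned rather than fixed.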
Copyright & License
Copyright (c) 2019 Lê Thị Minh Nguyện.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.