Relevant SMS Spam Feature Selection Using Wrapper Approach and XGBoost Algorithm

https://doi.org/10.24017/science.2019.2.11

Authors

  • Diyari Jalal Mussa, Information Technology Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Iraq
  • Noor Ghazi M. Jameel, Computer Networks Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Iraq

Abstract

In recent years, with the widespread use of mobile devices, the problem of SMS spam has increased dramatically. Continuously receiving these unwanted messages frustrates users, and it can sometimes be harmful: some SMS messages link to fake web pages designed to steal users’ confidential information. Despite the number of hazardous actions carried out through spam, only a limited number of spam filtering tools are available. In this paper, the XGBoost algorithm is used to address the SMS spam detection problem. A number of structural features were collected from previous studies, and 15 structural features were extracted from Tiago’s dataset, which is the dataset most frequently used by researchers. To reduce the feature set and select the most relevant features, two wrapper feature selection algorithms were used: sequential forward selection and sequential backward selection. The accuracy and performance obtained with the features selected by sequential backward selection were better than those obtained with sequential forward selection. The nine optimal features extracted can be a good representation of a spam SMS message. The classification accuracy obtained by the proposed method using the nine optimal features with the XGBoost algorithm is 98.64% under 10-fold cross-validation.
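
The two wrapper strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature names, the weights, and the `toy_score` function are hypothetical stand-ins. In a setup like the one described, the scoring function would instead return the cross-validated accuracy of an XGBoost classifier trained on the candidate feature subset.

```python
from typing import Callable, FrozenSet, Iterable

# A scorer maps a candidate feature subset to a quality value (higher is better).
Scorer = Callable[[FrozenSet[str]], float]


def sequential_selection(
    features: Iterable[str],
    score: Scorer,
    target_size: int,
    direction: str = "backward",
) -> FrozenSet[str]:
    """Greedy wrapper selection: add (forward) or drop (backward) one feature per step."""
    all_features = frozenset(features)
    if not 0 < target_size <= len(all_features):
        raise ValueError("target_size must be between 1 and the number of features")
    if direction == "forward":
        selected: FrozenSet[str] = frozenset()   # start empty and grow
    elif direction == "backward":
        selected = all_features                  # start full and shrink
    else:
        raise ValueError("direction must be 'forward' or 'backward'")

    while len(selected) != target_size:
        if direction == "forward":
            candidates = [selected | {f} for f in sorted(all_features - selected)]
        else:
            candidates = [selected - {f} for f in sorted(selected)]
        # Keep the candidate subset that the wrapped scorer rates highest.
        selected = max(candidates, key=score)
    return selected


# Hypothetical structural features and usefulness weights (illustration only).
TOY_WEIGHTS = {
    "has_url": 0.30,
    "digit_count": 0.25,
    "length": 0.20,
    "uppercase_count": 0.10,
    "noise": -0.05,
}
FEATURES = list(TOY_WEIGHTS)


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for cross-validated classifier accuracy on a feature subset."""
    return sum(TOY_WEIGHTS[name] for name in subset)


if __name__ == "__main__":
    for direction in ("forward", "backward"):
        chosen = sequential_selection(FEATURES, toy_score, 3, direction)
        print(direction, sorted(chosen))
```

Because the toy score is purely additive, both directions arrive at the same subset here. With a real classifier, features interact, so forward and backward selection can end at different subsets, which is consistent with the difference between the two methods reported in the abstract.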

Keywords:

SMS spam, wrapper methods, sequential feature selection, sequential forward selection, sequential backward selection, boosting classifier, extreme gradient boosting, XGBoost.

Published

21-11-2019

Section

Pure and Applied Science

How to Cite

[1]
D. Jalal Mussa and N. G. M. Jameel, “Relevant SMS Spam Feature Selection Using Wrapper Approach and XGBoost Algorithm”, KJAR, vol. 4, no. 2, pp. 110–120, Nov. 2019, doi: 10.24017/science.2019.2.11.