Persian SMS Spam Detection using Machine Learning and Deep Learning Techniques

Document Type: Original Article

Authors

1 Department of Information Technology, Science and Research Branch, Islamic Azad University, Tehran, Iran

2 Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

3 IT Research Faculty, Iran Telecom Research Center, ICT Research Institute, Tehran, Iran

Abstract

Spam messages are unsolicited texts sent by unknown individuals, and they cause problems for smartphone users. Their downsides include inconvenience to users, wasted network traffic, increased cost, consumption of storage space on the mobile phone, and abuse and fraud against recipients. Automated identification of suspicious and spam messages is therefore important. Cleverly composed text messages can be difficult to recognize, and existing methods in this area are further hindered by the lack of adequate Persian datasets. A large body of research has shown that techniques based on deep and combined learning are better at identifying unwanted text messages. This work sought to develop an effective strategy for identifying SMS spam by combining machine learning classification algorithms with deep learning models. After preprocessing our collected dataset, the proposed deep learning approach applies two convolutional neural network layers, followed by an LSTM layer and a fully connected layer, to extract features from the data. On the machine learning side, a support vector machine uses the extracted features to produce the final classification. Results indicate that the proposed model performs better than existing techniques, achieving an accuracy of 97.7%.
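The final stage described above, in which a support vector machine classifies messages from features produced by the deep network, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, hyperparameters, or library used, so this example assumes a linear SVM trained with a Pegasos-style subgradient method and omits the bias term for brevity. The function names and the 2-dimensional toy vectors are hypothetical stand-ins for the features a CNN/LSTM extractor would output.

```python
# Minimal sketch: linear SVM (hinge loss) trained on precomputed feature vectors.
# Assumption: the vectors below are hypothetical stand-ins for features that a
# CNN/LSTM extractor would produce; they are not taken from the paper.

def train_linear_svm(features, labels, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent; labels must be +1 (spam) or -1 (ham)."""
    weights = [0.0] * len(features[0])
    step = 0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            # Margin is computed with the weights before this step's update.
            margin = y * sum(w * xi for w, xi in zip(weights, x))
            # Shrink the weights (L2 regularisation) ...
            weights = [(1.0 - eta * lam) * w for w in weights]
            # ... and move toward the example if it violates the margin.
            if margin < 1.0:
                weights = [w + eta * y * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, x):
    """Return +1 (spam) or -1 (ham) from the sign of the decision value."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= 0 else -1


# Hypothetical toy features: spam clustered in the positive quadrant.
toy_features = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
toy_labels = [1, 1, 1, -1, -1, -1]

svm_weights = train_linear_svm(toy_features, toy_labels)
predictions = [predict(svm_weights, x) for x in toy_features]
```

In a full pipeline, `toy_features` would be replaced by the activations of the fully connected layer for each preprocessed message, so the SVM never sees raw text.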


  • Roya Khorashadizadeh was born in 1991 in Mashhad. She received her B.Sc. in Information Technology Engineering from Mashhad Islamic Azad University in 2015. She is currently an M.Sc. student in Information Technology at the Science and Research Branch, Islamic Azad University. Her research interests include text processing and machine learning.
  • Sommayeh Jafarali Jassbi was born in Tehran, Iran, in 1982. She received the M.Sc. degree in computer architecture engineering in 2007, and the Ph.D. degree in computer architecture engineering in 2010 from the Islamic Azad University, Science and Research Branch. In 2010, she joined the Department of Computer Engineering, Islamic Azad University, Science and Research Branch. She became an associate professor in 2011. Her interests are cloud computing, the Internet of Things, wireless sensor networks, computer architecture, and cryptography. She was head of the computer department in 2012 and has now been selected as head of the department again. She has also been an active member of the young researchers club since 2004. She has written, translated, and published several professional books and papers in her fields.
  • Alireza Yari received his B.Sc. degree in control system engineering in 1993 from the University of Tehran, Iran, and his M.Sc. and Ph.D. degrees in system engineering in 2000 from Kitami Institute of Technology, Japan. He is currently doing research in the Information Technology Research Faculty of the Iran Telecom Research Center (ITRC). His research interests include cloud computing and data centers. He is also working on the application of cloud computing in data-intensive applications, such as web search engines.