Political Sentiment Analysis of Persian Tweets Using CNN-LSTM Model

Document Type: Original Article

Authors

1 Electrical and Computer Engineering Department, University of Tehran, Tehran, Iran;

2 Electrical and Computer Engineering Department, Tarbiat Modares University, Tehran, Iran;

Abstract

Sentiment analysis is the process of identifying and categorizing people’s emotions or opinions on various topics, and Twitter sentiment analysis has become increasingly popular in recent years. In this paper, we present several machine learning models and one deep learning model for analyzing the sentiment of Persian political tweets. We used Bag of Words and ParsBERT for word representation. To classify the polarity of tweets, we applied Gaussian Naive Bayes, Gradient Boosting, Logistic Regression, Decision Trees, and Random Forests, as well as a combination of CNN and LSTM. The results indicate that deep learning with ParsBERT embeddings performs better than the machine learning models. The CNN-LSTM model achieved the highest classification accuracy: 89 percent on the first dataset and 71 percent on the second. Given the complexity of Persian, reaching this level of performance was a difficult task. The main objective of our research was to reduce training time while maintaining the model's performance, so we made several adjustments to the model architecture and parameters. These adjustments met that objective and also slightly improved performance.
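
As a rough illustration of the Bag of Words representation mentioned in the abstract, the following is a minimal sketch in Python using only the standard library. It is not the authors' implementation: the function names `build_vocabulary` and `bag_of_words` and the English toy sentences are hypothetical placeholders, and tokenization is assumed to be a simple whitespace split, which would likely be too naive for real Persian tweets.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct whitespace-separated token to a column index.

    Indices are assigned in order of first appearance.
    """
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(document, vocab):
    """Return a token-count vector for one document.

    Tokens that are not in the vocabulary are ignored.
    """
    counts = Counter(document.lower().split())
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector


# Hypothetical toy corpus (placeholders, not the paper's datasets).
toy_tweets = ["good policy good result", "bad policy"]
vocab = build_vocabulary(toy_tweets)
vectors = [bag_of_words(t, vocab) for t in toy_tweets]
```

Each tweet becomes a fixed-length vector of token counts, which a classifier such as Logistic Regression or Random Forests can then consume. Word order is discarded, which is one plausible reason a contextual embedding such as ParsBERT could do better on this task.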
