A Deep Learning Model for Classifying Quality of User Replies

Document Type: Original Article

Authors

1 Department of Computer Eng., Shahrekord University, Shahrekord, Iran

2 Department of Computer Engineering, Faculty of Engineering, Shahrekord University, Shahrekord, Iran

3 Computer Engineering Dept., Shahrekord University, Shahrekord, Iran

Abstract

Q&A forums are designed to help users find useful information and access high-quality content posted by other users in text forums. Automatically identifying high-quality replies to initial posts not only provides users with appropriate content but also saves them time. Existing methods for classifying user replies by quality try to extract quality features from both the textual content and the metadata of the replies. This feature engineering step is time-consuming and labor-intensive. The current study addresses this problem by proposing a new deep learning model for detecting high-quality user replies using only raw textual content. Specifically, we propose a long short-term memory (LSTM) model that uses Embeddings from Language Models (ELMo) to represent words as contextual numerical vectors. We compared the effectiveness of the proposed model with four traditional machine learning models on two online forum datasets: the TripAdvisor forum for New York City (NYC) and the Ubuntu Linux distribution forum. Experimental results indicated that the proposed model significantly outperformed the four traditional algorithms on both datasets. Moreover, the proposed model achieved about 16% higher accuracy than the traditional algorithms trained on both textual and quality dimension features.
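The architecture outlined in the abstract (contextual word vectors fed into an LSTM, followed by a binary quality decision) can be illustrated with a minimal sketch of the forward pass. This is not the authors' implementation: the class name `TinyLSTMClassifier`, the dimensions, and the single sigmoid output unit are assumptions made for illustration, the weights are random and untrained, and random vectors stand in for real ELMo embeddings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on Python lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM with a sigmoid output unit (untrained)."""

    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim + hidden_dim)]
                for _ in range(hidden_dim)]
            for g in self.GATES
        }
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def step(self, x, h, c):
        z = x + h  # list concatenation of the current input and previous hidden state
        pre = {
            g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
            for g in self.GATES
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # Standard LSTM cell-state and hidden-state updates.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, embeddings):
        """Return a score in (0, 1) for a reply given as a list of word vectors."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in embeddings:
            h, c = self.step(list(x), h, c)
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


# Placeholder input: a 5-token reply with 8-dimensional vectors.
# In practice these would be contextual embeddings such as ELMo vectors.
rng = random.Random(1)
reply = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]
model = TinyLSTMClassifier(embed_dim=8, hidden_dim=4)
score = model.predict_proba(reply)
```

Because the weights are untrained, `score` carries no meaning here. The sketch only shows how a variable-length sequence of word vectors is reduced to a single quality score without any hand-crafted features.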


 Masoumeh Rajabi received her B.S. degree in software engineering from Arak University in 2015 and her M.S. degree from Shahrekord University in 2021. Her research interests include natural language processing, deep learning, and data mining.

 Shahla Nemati was born in Shiraz, Iran in 1982. She received the B.S. degree in hardware engineering from Shiraz University, Shiraz, Iran, in 2005, the M.S. degree from Isfahan University of Technology, Isfahan, Iran, in 2008, and the Ph.D. degree in computer engineering from Isfahan University, Isfahan, Iran, in 2016. Since 2017, she has been an Assistant Professor with the Computer Engineering Department, Shahrekord University, Shahrekord, Iran. Her research interests include data fusion, affective computing, and data mining.

 Mohammad Ehsan Basiri received the B.S. degree in software engineering from Shiraz University, Shiraz, Iran, in 2006 and the M.S. and Ph.D. degrees in artificial intelligence from Isfahan University, Isfahan, Iran, in 2009 and 2014, respectively. Since 2014, he has been an Assistant Professor with the Computer Engineering Department, Shahrekord University, Shahrekord, Iran. He is the author of three books and more than 60 articles. His research interests include sentiment analysis, natural language processing, deep learning, and data mining.