Document Type: Original Article
Department of Computer Eng., Shahrekord University, Shahrekord, Iran
Department of Computer Engineering, Faculty of Engineering, Shahrekord University, Shahrekord, Iran
Computer Engineering Dept., Shahrekord University, Shahrekord, Iran
With the rapid growth of social media, these platforms have become the most important venues for posting user-generated multimodal content. Much of the data on social networks such as Instagram and Telegram is multimodal. To analyze such data, multimodal sentiment analysis has become one of the most significant research topics in emotion recognition and data mining. Although multimodal sentiment analysis of English-language social media data has been addressed in several recent studies, few studies have addressed the problem for Persian, which is the official language of more than 120 million people around the world. In this study, a multimodal deep learning model is proposed to address this problem. The proposed method uses a bidirectional long short-term memory (bi-LSTM) network to process text posts and a VGG16 convolutional network to analyze images. A new dataset of Instagram and Telegram posts, MPerSocial, containing 1000 pairs of images and Persian comments, is introduced in this study and used to evaluate the proposed method. The experimental results show that fusing the textual and image modalities improves sentiment polarity detection accuracy by 20% and 8% compared with using the image and text modalities in isolation, respectively. The proposed model also outperforms three similar deep learning models and four traditional machine learning models. All code and the dataset used in this study are publicly available on GitHub.
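As a rough illustration of what fusing the two modalities can look like, the sketch below concatenates a text feature vector and an image feature vector and passes the result through a linear layer with softmax. It is not the authors' implementation: the abstract does not specify the fusion strategy, so concatenation is an assumption, and the function names, the two-class output, the feature values, and the weights are all hypothetical stand-ins for what a bi-LSTM and VGG16 would produce.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    text_features: Sequence[float],
    image_features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Feature-level fusion (assumed): concatenate, then linear layer + softmax."""
    fused = list(text_features) + list(image_features)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs (hypothetical): in a real model these vectors would come from
# the bi-LSTM text encoder and the VGG16 image encoder.
text_vec = [0.2, -0.1, 0.4]
image_vec = [0.5, 0.3]

# One weight row per polarity class; two classes are assumed for illustration.
W = [[0.1] * 5, [-0.1] * 5]
b = [0.0, 0.0]

probs = fuse_and_classify(text_vec, image_vec, W, b)
print(probs)
```

In a trained model the weights would be learned jointly with both encoders, so the classifier can draw on whichever modality is more informative for a given post.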