A Movie Recommender System Based on Topic Modeling using Machine Learning Methods

Document Type: Original Article


1 MSc, Computer Engineering Department, Bu-Ali Sina University, Hamedan, Iran

2 Ph.D. Candidate, Computer Engineering Department, Bu-Ali Sina University, Hamedan, Iran

3 Associate Professor, Computer Engineering Department, Bu-Ali Sina University


In recent years, film production has grown across a wide variety of categories and genres. Many of these films contain material that is inappropriate for children and adolescents, and parents are concerned that their children may be exposed to it. A smart recommendation system that suggests appropriate movies based on the user's age range could therefore be a useful tool for parents. Existing movie recommender systems rely on quantitative factors and metadata, and so pay little attention to the content of the movies themselves. This research is motivated by the need to extract movie features using information retrieval methods in order to provide effective suggestions. The goal of this study is to propose a movie recommender system based on topic modeling and text-based age ratings. The proposed method uses latent Dirichlet allocation (LDA) to identify hidden associations between words, the topics of each document, and the degree to which each topic is expressed in each document. Machine learning models are then used to recommend age-appropriate movies. Experimental results show that the proposed method can determine the user's age and recommend movies based on the user's age with 93% accuracy.
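The two-stage idea described above (LDA topic proportions per document, followed by a classifier over those proportions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy word lists, the `child` and `adult` labels, the hyperparameters, the function names `lda_gibbs` and `nearest_centroid`, and the choice of a nearest-centroid classifier in place of the unspecified machine learning models. For simplicity the unlabeled query document is included when fitting the topic model.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic proportions; each row sums to 1.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return theta


def nearest_centroid(train_vecs, train_labels, query):
    """Return the label whose mean topic vector is closest to the query."""
    groups = {}
    for vec, label in zip(train_vecs, train_labels):
        groups.setdefault(label, []).append(vec)
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], query))


# Hypothetical toy corpus standing in for movie text (for example, subtitles).
docs = [
    "friendly puppy plays garden children laugh".split(),
    "children sing puppy garden picnic".split(),
    "gunfight blood murder revenge gang".split(),
    "gang murder blood chase gunfight".split(),
    "puppy picnic children garden".split(),  # unlabeled query document
]
labels = ["child", "child", "adult", "adult"]  # hypothetical age labels

theta = lda_gibbs(docs, num_topics=2)
prediction = nearest_centroid(theta[:4], labels, theta[4])
print(prediction)
```

In a realistic setting the topic model would be fitted on a large corpus with an established library, and the classifier would be trained and evaluated on held-out data; the sketch only shows how topic proportions can serve as features for an age-suitability label.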




Mojtaba Kordabadi received his BSc degree in software engineering from Hamedan University of Applied Sciences in 2009. He graduated with an MSc degree in artificial intelligence from Bu-Ali Sina University, Hamedan. He teaches computer courses at Hamedan Technical and Vocational University. His research interests include machine learning, recommender systems, and data mining.

Amin Nazari received his BSc degree in computer software engineering from Islamic Azad University, Hamedan, in 2009, and his MSc degree in computer software engineering from Arak University, Arak, in 2015. He is now a PhD candidate in artificial intelligence at Bu-Ali Sina University, Hamedan. His research interests include wireless sensor networks, the Internet of Things, IoT-fog networks, and recommender systems.

Muharram Mansoorizadeh is an associate professor in the Computer Engineering Department of Bu-Ali Sina University. He received his BSc degree in software engineering from the University of Isfahan, Isfahan, Iran, in 2001, and his MSc degree in software engineering and his PhD degree in computer engineering from Tarbiat Modares University, Tehran, Iran, in 2004 and 2010, respectively. His current research interests include machine learning, affective computing, and information retrieval.