International Journal of Web Research

Transformer-Based Personality Trait Recognition Enhanced by Contextual Augmentation

Document Type: Original Article

Authors
Department of Computer Engineering, Central Tehran Branch, Islamic Azad University, Tehran, Iran
Abstract
psychological research, it often suffers from label interference, vocabulary-driven overfitting, and limited labeled datasets. As a result, models are brittle: they can fail with small training samples and behave inconsistently across trait ranges. To address this, we employ a practical single-trait approach that uses five independent ELECTRA-based classifiers, one for each of the Big Five dimensions, trained as separate binary tasks to prevent cross-trait interference. To reduce lexical bias, we doubled the Pennebaker and King essay corpus from 2,467 to 4,934 samples by applying careful synonym-replacement augmentation with WordNet, and we additionally incorporated contextual augmentation generated by the Gemma model. Models were tuned methodically to ensure fair comparisons. With test AUCs above 0.75, the ensemble achieves an average test accuracy of 0.724 on the Pennebaker and King benchmark, with per-trait accuracies of 0.72, 0.71, 0.74, 0.73, and 0.72 for openness, conscientiousness, extraversion, agreeableness, and neuroticism (OCEAN), respectively. These results indicate that the approach substantially reduces inter-trait interference while matching or surpassing LIWC baselines and other transformer approaches.
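The data preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a small hand-written synonym table (`TOY_SYNONYMS`) as a stand-in for WordNet, and it omits the Gemma-based contextual augmentation and the ELECTRA fine-tuning entirely. The function names, the `rate` parameter, and the sample format are all hypothetical. The sketch shows only two ideas: appending one synonym-replaced copy per essay doubles the corpus, and the five trait labels are split into five independent binary tasks.

```python
import random
from statistics import mean

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

# Hypothetical stand-in for a WordNet synonym lookup (an assumption for
# illustration, not the resource used in the paper).
TOY_SYNONYMS = {
    "happy": ["glad", "cheerful"],
    "talk": ["speak", "chat"],
    "big": ["large"],
}

def synonym_replace(text, synonyms, rate=0.3, rng=None):
    """Return a copy of `text` with some known words swapped for a synonym."""
    if rng is None:
        rng = random.Random(0)
    out = []
    for word in text.split():
        options = synonyms.get(word.lower())
        # Replace only words that have an entry, each with probability `rate`.
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)

def augment_corpus(samples, synonyms, rate=0.3, seed=0):
    """Append one augmented copy per sample, so the corpus doubles in size."""
    rng = random.Random(seed)
    augmented = [
        {"text": synonym_replace(s["text"], synonyms, rate, rng),
         "labels": dict(s["labels"])}  # labels are carried over unchanged
        for s in samples
    ]
    return samples + augmented

def split_by_trait(samples):
    """Build one independent binary dataset of (text, 0/1) pairs per trait."""
    return {t: [(s["text"], s["labels"][t]) for s in samples] for t in TRAITS}

# Toy usage with two made-up essays.
corpus = [
    {"text": "I am happy to talk", "labels": {t: 1 for t in TRAITS}},
    {"text": "big crowds tire me", "labels": {t: 0 for t in TRAITS}},
]
doubled = augment_corpus(corpus, TOY_SYNONYMS)
tasks = split_by_trait(doubled)

# Per-trait accuracies as reported in the abstract; their mean is the
# reported average accuracy.
per_trait_acc = [0.72, 0.71, 0.74, 0.73, 0.72]
print(len(doubled), len(tasks), round(mean(per_trait_acc), 3))
```

Keeping each trait as its own `(text, label)` dataset mirrors the single-trait design: a separate classifier can be trained on each entry of `tasks` without sharing an output layer across traits.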
