International Journal of Web Research

From Cover to Story: AI-Driven Genre Classification and Illustrated Narrative Creation for Children's Literature

Document Type : Original Article

Authors
Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
Abstract
Storytelling is a fundamental pillar of childhood development, and visual narratives play a crucial role in enhancing engagement and cognitive processing. Although Generative Artificial Intelligence (GAI) has transformed content creation, its use for automated story generation from book covers remains largely unexplored. This study presents a pipeline that combines computer vision for genre classification with GAI to create tailored illustrated stories. Four deep learning architectures widely used in image classification were evaluated, and ConvNeXt-Tiny was selected as the final model; it achieved a weighted F1-score of 0.6898 when classifying children's books into 13 genres from their cover images. To address the lack of benchmark datasets, we compiled and rigorously validated a specialized collection of 4,085 Persian children's book covers. The proposed system places both cover design elements and predicted genre features into structured prompts, from which large language models (LLMs) and image-synthesis models generate coherent illustrated stories. Three child psychologists qualitatively evaluated a sample of 26 generated stories on narrative coherence, genre alignment, age appropriateness, character continuity, and visual congruence. The work contributes to both Persian literary analysis and AI-driven creative systems, showing how machine learning can support educational storytelling while preserving cultural authenticity.
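For readers unfamiliar with the metric reported above, the weighted F1-score is commonly computed as the mean of the per-class F1 scores, with each class weighted by its number of true samples. The following is a minimal sketch of that common definition, not the authors' evaluation code; the function name `weighted_f1` and the toy genre labels are illustrative assumptions and are not taken from the dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (single-label case)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)          # number of true samples per class
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1                 # correct prediction for class t
        else:
            fp[p] += 1                 # p was predicted but wrong
            fn[t] += 1                 # t was missed

    total = 0.0
    for label, n in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        total += n * f1                # weight each class F1 by its support
    return total / len(y_true)


# Hypothetical labels, for illustration only.
y_true = ["fantasy", "fantasy", "fantasy", "science"]
y_pred = ["fantasy", "fantasy", "science", "science"]
score = weighted_f1(y_true, y_pred)
```

Weighting by support keeps frequent genres from being under-represented in the summary score, which matters when class sizes are uneven; the trade-off is that poor performance on rare genres is partly masked.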