FarsWikiKG: an Automatically Constructed Knowledge Graph for Persian

Document Type : Original Article

Authors

Amirkabir University of Technology

Abstract

We present FarsWikiKG, a Persian knowledge graph extracted from Wikipedia. In recent years, Wikipedia infoboxes have become a valuable resource for building knowledge graphs. FarsWikiKG contains more than 2 million entities and 5.7 million facts about them. Using Wikidata, we constructed an ontology with more than 6000 classes representing entity types. FarsWikiKG is the second Persian knowledge graph and can update itself automatically; it shows improvement on NLP tasks, especially question answering. Although FarsWikiKG is a dynamic knowledge graph, our evaluation shows a coverage of 90% of Persian Wikipedia pages. Because Wikipedia content changes constantly, a static knowledge graph can give users out-of-date data. The proposed system addresses this problem and also reduces the need for experts to extract and construct knowledge graphs manually. FarsWikiKG stores its information in RDF, a standard format for knowledge graph data, which allows NLP systems to run SPARQL queries on it.
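To illustrate the general idea of turning infobox fields into RDF facts, the following is a minimal sketch and not the authors' pipeline. The function names `infobox_to_triples` and `to_ntriples`, the `http://example.org/` namespaces, and the sample infobox are all assumptions made for illustration. Every object is treated as a plain string literal, which a real system would refine by linking values to entities.

```python
import re
from urllib.parse import quote

# Hypothetical namespaces, chosen only for this illustration.
ENTITY_NS = "http://example.org/resource/"
PROP_NS = "http://example.org/property/"

# Matches infobox lines of the form "| key = value".
FIELD_RE = re.compile(r"^\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
# Matches wiki links: [[Target|label]] or [[Target]].
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def infobox_to_triples(title, wikitext):
    """Turn '| key = value' lines into (subject, predicate, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        match = FIELD_RE.match(line.strip())
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if not value:
            continue  # skip empty infobox fields
        # Keep only the visible label of each wiki link.
        value = LINK_RE.sub(r"\1", value)
        triples.append((title, key, value))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, with every object as a literal."""
    lines = []
    for s, p, o in triples:
        subj = ENTITY_NS + quote(s.replace(" ", "_"))
        pred = PROP_NS + quote(p.replace(" ", "_"))
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subj}> <{pred}> "{literal}" .')
    return "\n".join(lines)


# Made-up sample infobox, not taken from the paper.
sample = """{{Infobox settlement
| name = Tehran
| country = [[Iran]]
| population_total =
| timezone = [[Iran Standard Time|IRST]]
}}"""

triples = infobox_to_triples("Tehran", sample)
print(to_ntriples(triples))
```

Facts serialized this way can be loaded into any RDF triple store, where a SPARQL query can then retrieve them by subject or predicate.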



 

Saeedeh Momtazi is currently an associate professor at Amirkabir University of Technology (AUT), Iran. She completed her BSc and MSc degrees at Sharif University of Technology, Iran, and received a PhD degree in Artificial Intelligence from Saarland University, Germany. As part of her PhD, she was a visiting researcher at the Center for Language and Speech Processing at Johns Hopkins University, USA. After finishing her PhD, she worked as a postdoctoral researcher at the Hasso-Plattner Institute (HPI) at the University of Potsdam, Germany, and at the German Institute for International Educational Research (DIPF), Germany. Natural language processing is her main research focus.

Farhad Shirmardi received his BSc in Computer Science from Amirkabir University of Technology (AUT), Iran, in 2018. He received his MSc in Artificial Intelligence from Amirkabir University of Technology (AUT), Iran. His research interests are question answering and knowledge graphs.

 

Mohammad Hadi Hosseisni received his bachelor’s degree in computer engineering from Amirkabir University of Technology (AUT), Iran, in 2021. His research interests include machine learning, algorithms, and natural language processing.