A Distant Supervised Approach for Relation Extraction in Farsi Texts

Document Type : Original Article


1 Department of Computer Engineering, Science and Research Branch, Azad University, Tehran, Iran.

2 Assistant Professor, ICT Research Institute (ITRC), Tehran, Iran


The volume of Farsi information on the Internet has been increasing in recent years. However, most of this information is in the form of unstructured or semi-structured free text. For quick and accurate access to the vast knowledge contained in these texts, information extraction methods are essential for building knowledge bases. In recent years, relation extraction, a sub-task of information extraction, has received much attention. While many such systems have been developed for English and other widely studied languages, information extraction systems for Farsi have received less attention from researchers. In this research, a system for semi-automatic relation extraction is presented that uses Persian Wikipedia articles as reliable, semi-structured sources. In this system, relation extraction is performed with the help of patterns that are obtained automatically using an approach based on distant supervision. To apply distant supervision, the large Wikidata knowledge base, which is closely synchronized with Wikipedia, has been used as the source of known relations. The results show that the average precision over all relations is 76.81%, which indicates an improvement in precision over other methods for Farsi.
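The pattern-harvesting idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `harvest_patterns` and `extract`, the relation label, and the use of toy English sentences in place of Persian Wikipedia text and Wikidata triples are all assumptions made for illustration. The sketch treats the text between a known subject and object as a candidate pattern and then reuses frequent patterns on new sentences.

```python
import re
from collections import Counter, defaultdict


def harvest_patterns(triples, sentences):
    """Distant supervision step (illustrative): if a sentence mentions both
    the subject and the object of a known triple, in that order, count the
    text between them as a candidate pattern for the triple's relation."""
    patterns = defaultdict(Counter)
    for subj, rel, obj in triples:
        for sent in sentences:
            s, o = sent.find(subj), sent.find(obj)
            if s == -1 or o == -1 or s >= o:
                continue  # pair not present, or object precedes subject
            middle = sent[s + len(subj):o].strip()
            if middle:
                patterns[rel][middle] += 1
    return patterns


def extract(patterns, sentence, min_count=1):
    """Apply harvested patterns to a new sentence and return candidate
    (subject, relation, object) triples. Patterns seen fewer than
    min_count times are ignored."""
    found = []
    for rel, counter in patterns.items():
        for middle, count in counter.items():
            if count < min_count:
                continue
            regex = r"(.+?)\s+" + re.escape(middle) + r"\s+(.+?)[.,]?$"
            m = re.search(regex, sentence)
            if m:
                found.append((m.group(1).strip(), rel, m.group(2).strip()))
    return found


# Hypothetical toy data standing in for Wikidata triples and Wikipedia text.
kb = [("Hafez", "place_of_birth", "Shiraz"),
      ("Ferdowsi", "place_of_birth", "Tus")]
corpus = ["Hafez was born in Shiraz.", "Ferdowsi was born in Tus."]

learned = harvest_patterns(kb, corpus)
candidates = extract(learned, "Saadi was born in Shiraz.", min_count=2)
```

Raising `min_count` discards patterns supported by few knowledge-base matches, which is one simple way to trade recall for precision; a real system would also need entity recognition and Persian-specific text normalization, which this sketch omits.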


Shireen Atarod received her bachelor’s degree in information technology (IT) engineering from Hamedan University of Technology (HUT), Hamedan, Iran, in 2012. She received her master’s degree in e-commerce from the Science and Research Branch of Islamic Azad University (SRBIAU) in 2018. Her research interests include relation extraction, supervised and semi-supervised machine learning, and text mining.


Alireza Yari received his B.Sc. degree in control system engineering in 1993 from the University of Tehran, Iran, and his M.Sc. and Ph.D. degrees in system engineering in 2000 from Kitami Institute of Technology, Japan. He is currently conducting research in the Information Technology research faculty of Iran Telecom Research Center (ITRC). His research interests include web processing and cyber linguistics applications, such as web search engines.