Ontology Creation and Population for Natural Language Processing Domain

Document Type: Original Article

Authors

1 Computer Science and Engineering Faculty, Shahid Beheshti University, Tehran, Iran.

2 Faculty of Computer Science and Engineering, Shahid Beheshti University of Technology, Tehran, Iran

3 Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran

Abstract

In this paper, we describe our methodology for constructing an ontology of natural language processing (NLP). We use a semi-automatic method, combining rule-based and machine learning techniques, to construct and populate an ontology with bilingual (English-Persian) concept labels (a lexicon), and we evaluate it manually. The methodology yields a complete ontology of the NLP domain with 1333 classes (covering concepts, tools, applications, etc.), 88 object properties, and 2437 annotation assertions for different classes. The ontology is populated with about 428K NLP-related papers and 38K authors, along with about 5M "is Related to" relations between papers and ontology classes and 1M "is Author of" relations between papers and authors. The evaluation results show that the ontology achieves good results. The instantiation enables applications to find experts, publications, and institutions (such as universities or research laboratories) related to various topics in the NLP field.
