Department of Computer Engineering, University of Science & Culture, Tehran, Iran
Ph.D., Scientific Information Database (SID), ACECR, Tehran, Iran
With the rapid growth of scientific literature, various methods have been proposed to retrieve more, and more relevant, scientific documents according to the needs and requests of users. Because complete information is not available for some documents, users must open the documents themselves to obtain metadata such as the names of the authors and their affiliations, the publication date, and the references cited. Therefore, extracting information based on the structural and geometrical characteristics of a document can be very helpful in retrieving relevant and required documents. In this paper, after metadata are extracted using geometrical features of documents and a graph-based model, the relationships between different entities such as documents, authors, journals, and conferences are modeled for more efficient information retrieval. The extracted and refined data, stored in the graph model, are made available through a web-based user interface. To produce the results of each query, the related documents are retrieved based on the relationships in the graph, the quality of each document, and its citation score. To evaluate the proposed method, the PubMed and D2SPR databases are used. The experimental results show that the number of documents retrieved by the proposed method is 60% higher than that of the PubMed database search engine and 80% higher than that of D2SPR. Moreover, the proposed approach achieves an average nDCG of 0.824, which is considerably higher than the average of 0.30 for the PubMed search engine. On the D2SPR dataset, the average F-measure is 0.834 for the proposed system, compared with 0.71 for the existing system.
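For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how nDCG and the F-measure are commonly computed. It uses the standard textbook definitions (linear gain with a log2 discount for nDCG, the harmonic mean of precision and recall for the F-measure); the paper's exact variants and cutoffs are not stated here, so the function names and formulas below are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumption: linear gain and a log2(rank + 1) discount, with ranks
    starting at 1 (so the first result is divided by log2(2) = 1).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical relevance grades for the top results of one query.
    print(ndcg([3, 2, 3, 0, 1]))
    print(f_measure(0.8, 0.6))
```

An nDCG of 1.0 means the results are already in the ideal order, and values closer to 0 mean that relevant documents are ranked low or missing.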