Hybrid Spelling Correction and Query Expansion for Relevance Document Searching

SOYUSIAWATY, DEWI and Rahmatullah Wolley, Denny Hilmawan (2021) Hybrid Spelling Correction and Query Expansion for Relevance Document Searching. International Journal of Advanced Computer Science and Applications, 12 (8). ISSN 2156-5570

Abstract

A digital library is a type of information retrieval (IR) system. Existing IR methods generally have problems with keyword searching: some search engines cannot return results for partial matches or for keywords containing typographical errors. A system is therefore needed that returns results relevant to the keywords the user provides. We propose a model that addresses this problem by combining spelling correction and query expansion. Searching starts with indexing document titles: the titles of all incoming documents are preprocessed, and all terms across the documents are weighted using Term Frequency-Inverse Document Frequency (TF-IDF). During search, the Levenshtein Distance algorithm corrects keywords that appear to contain typos. Before the relevance between the keywords and the documents is calculated using Cosine Similarity, the keywords are expanded using Query Expansion to increase the number of documents retrieved. The Cosine Similarity results are then added to the Query Expansion weight calculation to obtain the final ranking. The results show an improvement over an IR system without spell checking and query expansion. The model was implemented as a web-based application and tested 50 times on 2,045 data items. The system was able to correct typo-indicated keywords and retrieve documents with an average recall of 95.91%, an average precision of 63.82%, and an average Non-Interpolated Average Precision (NIAP) of 86.29%.
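
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the `max_distance` threshold, the `synonyms` dictionary, and the `expansion_weight` value are all assumptions made for illustration. In particular, the sketch folds expansion terms into the query vector with a reduced weight, which is a simplification and may differ from the weighting scheme used in the paper.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def correct(term, vocabulary, max_distance=2):
    """Return the closest indexed term, or the input if nothing is close.

    max_distance is an assumed threshold, not a value from the paper.
    """
    if term in vocabulary or not vocabulary:
        return term
    best = min(sorted(vocabulary), key=lambda w: levenshtein(term, w))
    return best if levenshtein(term, best) <= max_distance else term


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict) per tokenized title."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed IDF (assumed)
    vectors = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query, titles, synonyms=None, expansion_weight=0.5):
    """Correct the query, expand it, and rank titles by cosine similarity."""
    docs = [t.lower().split() for t in titles]
    vectors, idf = tfidf_vectors(docs)
    terms = [correct(t, set(idf)) for t in query.lower().split()]
    q = {t: idf.get(t, 0.0) for t in terms}
    # Hypothetical expansion step: related terms join the query vector
    # at a reduced weight so original keywords still dominate.
    for t in terms:
        for s in (synonyms or {}).get(t, []):
            if s in idf:
                q[s] = max(q.get(s, 0.0), expansion_weight * idf[s])
    scored = [(cosine(q, v), title) for v, title in zip(vectors, titles)]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [(score, title) for score, title in ranked if score > 0]
```

Calling `search("retreival", titles)` on a small list of titles first maps the misspelled keyword to the nearest indexed term and then ranks only the titles that share at least one term with the corrected and expanded query.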

Item Type: General Article
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Division / Study Program: Faculty of Industrial Technology (Fakultas Teknologi Industri) > S1-Informatics Engineering (S1-Teknik Informatika)
Depositing User: Mrs Dewi Soyusiawaty
Date Deposited: 17 Jun 2022 02:11
Last Modified: 14 Oct 2022 03:51
URI: http://eprints.uad.ac.id/id/eprint/35383
