Atikah, Luthfi, Hasanah, Novrindah Alvi and Arifin, Agus Zainal (2022) Topic modelling using VSM-LDA for document summarization. Jurnal ULTIMATICS, 14 (2). pp. 91-95. ISSN 2085-4552
Text: 14838.pdf - Published Version. Available under License Creative Commons Attribution Share Alike.
Abstract
Summarization simplifies the contents of a document by eliminating elements that are considered unimportant, without reducing the core meaning the document is meant to convey. However, a document generally contains more than one topic, so the topics need to be identified for the summarization process to be more effective. Latent Dirichlet Allocation (LDA) is a commonly used method for identifying topics. However, when a program is run on a different dataset, LDA experiences "order effects": the resulting topics differ if the order of the training data is changed. Given the same document input, LDA produces inconsistent topics, which results in low coherence values. This paper therefore proposes a topic modelling method that combines LDA and VSM (Vector Space Model) for automatic summarization. The proposed method can overcome order effects and identify document topics, which are calculated from the TF-IDF weights on the VSM generated by LDA. On 1300 Twitter data, the proposed topic modelling method reached a highest coherence value of 0.72. The resulting summaries obtained a ROUGE-1 of 0.78, a ROUGE-2 of 0.67, and a ROUGE-L of 0.80.
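The abstract does not give the details of how VSM and LDA are combined, so the following is only a minimal sketch of the general idea of scoring texts against topic keywords with TF-IDF weights in a vector space. It is not the authors' VSM-LDA method. The function names `tfidf_vectors` and `cosine`, the toy texts, and the topic keywords (which stand in for the output of an LDA run) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized text."""
    n = len(docs)
    # Document frequency: number of texts in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy texts (hypothetical, not from the paper's Twitter data).
docs = [
    "flood hits city center".split(),
    "city flood rescue teams".split(),
    "new phone launch today".split(),
]
# Hypothetical topic keywords, assumed to come from a prior LDA run.
topic = "flood city rescue".split()

# Treat the topic keywords as one more text so they share the same vocabulary.
vectors = tfidf_vectors(docs + [topic])
topic_vec = vectors[-1]
scores = [cosine(v, topic_vec) for v in vectors[:-1]]

# Indices of the texts, most similar to the topic first.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this sketch, the texts that share the most heavily weighted terms with the topic keywords rank first, which is one simple way a summarizer could select topic-relevant content. Because the scores depend only on term weights and not on the order of the texts, the ranking itself does not change when the input is shuffled.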
Item Type: | Journal Article |
---|---|
Keywords: | LDA; order effects; summarization; topic modelling; VSM-LDA |
Subjects: | 08 INFORMATION AND COMPUTING SCIENCES > 0801 Artificial Intelligence and Image Processing > 080107 Natural Language Processing |
Divisions: | Faculty of Technology > Department of Informatics Engineering |
Depositing User: | Novrindah Alvi Hasanah |
Date Deposited: | 09 Jun 2023 06:21 |