Ulya, Diah Mariatul, Juhari, Juhari, Yuliana, Rossima Eva and Jamhuri, Mohammad (2025) Reliable and efficient sentiment analysis on IMDb with logistic regression. Cauchy: Jurnal Matematika Murni dan Aplikasi, 10 (2). ISSN 2086-0382
Text: 24731.pdf - Published Version (556kB). Available under License Creative Commons Attribution Share Alike.
Abstract
Understanding public opinion at scale is essential for modern media analytics. We present a reproducible, leakage-safe evaluation of logistic regression (LR) for binary sentiment classification on the IMDb Large Movie Review dataset and compare it with five widely used baselines: multinomial Naive Bayes, linear support vector machine (SVM), decision tree, k-nearest neighbors, and random forest. Using a standardized text pipeline (HTML stripping, stopword removal, WordNet lemmatization) with TF–IDF unigrams–bigrams and nested, stratified cross-validation, we assess threshold-dependent and threshold-independent performance, probability calibration, and computational efficiency. LR attains the best overall balance of quality and speed, achieving 88.98% accuracy and 89.13% F1, with strong ranking performance (OOF ROC–AUC ≈ 0.9568; PR–AUC ≈ 0.9554) and well-behaved calibration (Brier ≈ 0.0858). Training completes in seconds per fold and CPU inference reaches about 2.46×10^6 samples per second. While a calibrated linear SVM yields slightly higher precision, LR delivers higher F1 at markedly lower compute. These results establish LR as a robust, transparent baseline that remains competitive with more complex neural and ensemble approaches, offering a favorable performance–efficiency trade-off for practical deployment and reproducible research on IMDb sentiment classification.
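The abstract reports one threshold-dependent metric (F1), one threshold-independent ranking metric (ROC–AUC), and one calibration metric (Brier score). As a minimal sketch of what these quantities measure, the following pure-Python snippet computes them from true labels and predicted positive-class probabilities. The function names and the six-example toy data are hypothetical illustrations, not the authors' code or results, and a 0.5 decision threshold is assumed.

```python
from typing import Sequence


def f1_at_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 for the positive class after thresholding probabilities (threshold-dependent)."""
    tp = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 label (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5).

    Assumes both classes are present. This is the O(n_pos * n_neg) pairwise form,
    chosen for clarity rather than speed.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive review, 0 = negative review.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]

f1 = f1_at_threshold(y_true, y_prob)
brier = brier_score(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(f"F1={f1:.4f}  Brier={brier:.4f}  ROC-AUC={auc:.4f}")
```

F1 changes if the threshold moves, whereas ROC–AUC depends only on how the scores rank the examples, which is why the abstract reports both kinds of metric. The Brier score additionally penalizes confident wrong probabilities, so it reflects calibration rather than ranking alone.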
| Item Type: | Journal Article |
|---|---|
| Keywords: | classification; IMDb; logistic regression; sentiment analysis; text mining. |
| Subjects: | 01 MATHEMATICAL SCIENCES > 0103 Numerical and Computational mathematics > 010303 Optimisation; 01 MATHEMATICAL SCIENCES > 0103 Numerical and Computational mathematics > 010399 Numerical and Computational Mathematics not elsewhere classified; 01 MATHEMATICAL SCIENCES > 0102 Applied Mathematics |
| Divisions: | Faculty of Mathematics and Sciences > Department of Mathematics |
| Depositing User: | Juhari Juhari |
| Date Deposited: | 12 Nov 2025 14:13 |