Reliable and efficient sentiment analysis on IMDb with logistic regression

Ulya, Diah Mariatul, Juhari, Juhari, Yuliana, Rossima Eva and Jamhuri, Mohammad (2025) Reliable and efficient sentiment analysis on IMDb with logistic regression. Cauchy: Jurnal Matematika Murni dan Aplikasi, 10 (2). ISSN 20860382

24731.pdf - Published Version
Available under License Creative Commons Attribution Share Alike.

Abstract

Understanding public opinion at scale is essential for modern media analytics. We present a reproducible, leakage-safe evaluation of logistic regression (LR) for binary sentiment classification on the IMDb Large Movie Review dataset and compare it with five widely used baselines: multinomial Naive Bayes, linear support vector machine (SVM), decision tree, k-nearest neighbors, and random forest. Using a standardized text pipeline (HTML stripping, stopword removal, WordNet lemmatization) with TF–IDF unigrams–bigrams and nested, stratified cross-validation, we assess threshold-dependent and threshold-independent performance, probability calibration, and computational efficiency. LR attains the best overall balance of quality and speed, achieving 88.98% accuracy and 89.13% F1, with strong ranking performance (OOF ROC–AUC ≈ 0.9568; PR–AUC ≈ 0.9554) and well-behaved calibration (Brier ≈ 0.0858). Training completes in seconds per fold and CPU inference reaches about 2.46×10^6 samples per second. While a calibrated linear SVM yields slightly higher precision, LR delivers higher F1 at markedly lower compute. These results establish LR as a robust, transparent baseline that remains competitive with more complex neural and ensemble approaches, offering a favorable performance–efficiency trade-off for practical deployment and reproducible research on IMDb sentiment classification.
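The abstract reports one threshold-dependent metric (F1), one threshold-independent ranking metric (ROC-AUC), and one calibration metric (Brier score). As a minimal sketch of how these three quantities are defined, the following pure-Python functions compute them from binary labels and predicted positive-class probabilities. The function names, the 0.5 decision threshold, and the toy data are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 label (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


def f1_at_threshold(y_true: Sequence[int], p_pos: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 for the positive class after cutting probabilities at `threshold` (assumed 0.5)."""
    pred = [1 if p >= threshold else 0 for p in p_pos]
    tp = sum(1 for y, yhat in zip(y_true, pred) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, pred) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(y_true, pred) if y == 1 and yhat == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], p_pos: Sequence[float]) -> float:
    """Chance that a random positive outscores a random negative; ties count as half."""
    pos = [p for y, p in zip(y_true, p_pos) if y == 1]
    neg = [p for y, p in zip(y_true, p_pos) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and probabilities, invented for illustration (not IMDb data).
y = [1, 0, 1, 1, 0, 0]
p = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7]

print(f"Brier:   {brier_score(y, p):.4f}")
print(f"F1@0.5:  {f1_at_threshold(y, p):.4f}")
print(f"ROC-AUC: {roc_auc(y, p):.4f}")
```

The pairwise ROC-AUC above is quadratic in the number of samples and is meant only to make the definition concrete; on a dataset the size of IMDb, a rank-based implementation from a standard library would be the practical choice.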

Item Type: Journal Article
Keywords: classification; IMDb; logistic regression; sentiment analysis; text mining.
Subjects: 01 MATHEMATICAL SCIENCES > 0103 Numerical and Computational mathematics > 010303 Optimisation
01 MATHEMATICAL SCIENCES > 0103 Numerical and Computational mathematics > 010399 Numerical and Computational Mathematics not elsewhere classified
01 MATHEMATICAL SCIENCES > 0102 Applied Mathematics
Divisions: Faculty of Mathematics and Sciences > Department of Mathematics
Depositing User: Juhari Juhari
Date Deposited: 12 Nov 2025 14:13
