Evaluation of Machine Learning Models for Sentiment Analysis in the South Sumatra Governor Election Using Data Balancing Techniques

Febriyanti Panjaitan; Win Ce; Hery Oktafiandi; Ghanim Kanugrahan; Yudi Ramdhani; Vito Hafizh Cahaya Putra

doi:10.51519/journalisi.v7i1.1019

Journal of Information Systems and Informatics (Mar 2025)

Affiliations

Febriyanti Panjaitan: Satu University
Win Ce: Bina Nusantara University
Hery Oktafiandi: Satu University
Ghanim Kanugrahan: Satu University
Yudi Ramdhani: Satu University
Vito Hafizh Cahaya Putra: Satu University

Journal volume & issue: Vol. 7, no. 1
pp. 461 – 478

Abstract

Sentiment analysis is crucial for understanding public opinion, especially in political contexts such as the 2024 South Sumatra gubernatorial election. Social media platforms such as Twitter and YouTube are key sources of public sentiment, which machine learning can classify as positive, neutral, or negative. However, data imbalance and model selection remain significant obstacles to accurate classification. This study compares five machine learning algorithms (SVM, Naïve Bayes, KNN, Decision Tree, and Random Forest) and examines the impact of data balancing on their performance. Data were collected via Twitter crawling (140 entries) and YouTube scraping (384 entries), and text features were extracted using CountVectorizer. The models were then evaluated on imbalanced and balanced datasets using accuracy, precision, recall, and F1-score. On imbalanced data, the Decision Tree and Random Forest models achieved the highest accuracies, 79.22% and 75.32% respectively, but both showed signs of overfitting, as indicated by their near-perfect training performance. Naïve Bayes had the lowest accuracy at 54.55% despite achieving high precision, suggesting frequent misclassification, particularly for the minority class. SVM and KNN also struggled with imbalanced data, recording accuracies of 58.44% and 63.64%, respectively. After data balancing, the accuracy of SVM increased to 71.43% and that of KNN to 66.23%, indicating that these models are more stable and effective when class distributions are even. These findings highlight the substantial impact of data balancing on model performance, particularly for methods sensitive to class distribution. While tree-based models achieved high accuracy on imbalanced data, their tendency to overfit underscores the importance of balancing techniques for improving model generalization.
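The abstract does not say which balancing technique was used, so the following is only a minimal sketch under stated assumptions: it uses random oversampling as a stand-in, and it computes macro-averaged precision, recall, and F1 by hand to show why accuracy alone can mislead on imbalanced classes. The function names (`random_oversample`, `macro_scores`) and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every
    class is as large as the largest one (an assumed balancing technique)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extras = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extras)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    n = len(per_class)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(s[0] for s in per_class) / n,
        "recall": sum(s[1] for s in per_class) / n,
        "f1": sum(s[2] for s in per_class) / n,
    }


# Toy, made-up labels: 8 positive, 1 neutral, 1 negative.
labels = ["positive"] * 8 + ["neutral", "negative"]
texts = [f"comment {i}" for i in range(len(labels))]

# A classifier that always predicts the majority class looks accurate,
# but its macro recall exposes the ignored minority classes.
majority_guess = ["positive"] * len(labels)
scores = macro_scores(labels, majority_guess)

# After oversampling, every class has the same number of samples.
balanced_texts, balanced_labels = random_oversample(texts, labels)
class_counts = Counter(balanced_labels)

print(scores)
print(class_counts)
```

In practice, oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the test set. Whether the study followed that procedure is not stated in the abstract.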

Keywords: sentiment analysis, machine learning, governor election, twitter, youtube, countvectorizer, balancing data.

Published in Journal of Information Systems and Informatics

ISSN: 2656-5935 (Print); 2656-4882 (Online)
Publisher: Informatics Department, Faculty of Computer Science, Bina Darma University
Country of publisher: Indonesia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://journal-isi.org/index.php/isi
