Optimizing Random Forest for Indonesian-Language Sentiment Analysis with GridSearch and SMOTE

Authors

  • Ahmad Fauzi, Universitas Pamulang
  • Agus Heri Yunial, Universitas Pamulang
  • Dede Eko Saputro, Universitas Pamulang
  • Reza Saputra, Universitas Pamulang

DOI:

https://doi.org/10.70340/jirsi.v4i2.207

Keywords:

Sentiment analysis, GridSearch hyperparameters, Random Forest, TextBlob

Abstract

This research optimizes the Random Forest algorithm for sentiment analysis of Indonesian-language posts on the social media platform X. TextBlob is used as an automatic labeling tool, SMOTE is used to balance the classes, and hyperparameters are tuned with GridSearch. The dataset consists of 611 tweets collected with the keyword UKT (uang kuliah tunggal, the single tuition fee). Labeling with TextBlob yields 438 negative and 173 positive tweets. The data are first split into 75% training data and 25% test data, and SMOTE is then used to balance the classes. The text is vectorized with TF-IDF. Before tuning, the Random Forest model reaches an accuracy of 73% on the split data and 75% under 10-fold cross-validation. After hyperparameter optimization with GridSearch, accuracy on the split data rises to 74%, and accuracy under 10-fold cross-validation reaches 89%. The results indicate that SMOTE was effective in balancing the imbalanced data and that GridSearch hyperparameter optimization improved the accuracy of Random Forest in classifying the sentiment of Indonesian-language posts on X that were labeled automatically with TextBlob.
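The balancing step described above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the parameters `k` and `seed`, and the toy 2-D vectors are all hypothetical and chosen for illustration. The abstract does not say which SMOTE implementation or settings the authors used, and in the study the inputs would be TF-IDF vectors rather than 2-D points.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of SMOTE (hypothetical helper, not the
    authors' code): pick a minority sample, pick one of its k nearest
    minority neighbours, and take a random point on the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D feature vectors (assumed data; real inputs would be TF-IDF rows).
majority = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2), (0.1, 0.3), (0.3, 0.1), (0.0, 0.3)]
minority = [(0.9, 0.8), (0.8, 1.0), (1.0, 0.9)]

# Generate just enough synthetic points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Oversampling of this kind is normally applied to the training portion only, after the split, so that no synthetic points end up in the test data; this matches the order given in the abstract, where the data are divided before balancing.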

Published

2025-05-31

Section

Articles