Comparative Analysis of Naïve Bayes Algorithm Performance in English and Indonesian Text Sentiment Classification on Duolingo Application in Playstore

Authors

  • Andi Serlina, Informatics Engineering, Science and Technology, Muhammadiyah University of East Kalimantan, Samarinda, East Kalimantan, Indonesia
  • Abdul Rahim, Informatics Engineering, Science and Technology, Muhammadiyah University of East Kalimantan, Samarinda, East Kalimantan, Indonesia
  • Arbansyah, Informatics Engineering, Science and Technology, Muhammadiyah University of East Kalimantan, Samarinda, East Kalimantan, Indonesia

DOI:

https://doi.org/10.34148/teknika.v14i1.1207

Keywords:

Naïve Bayes, Sentiment Classification, Text Mining, Duolingo, NLP

Abstract

Text classification is an important task in Natural Language Processing (NLP), particularly for analyzing user reviews of language-learning apps such as Duolingo. This study compares how effectively the Naïve Bayes algorithm classifies the sentiment of English and Indonesian reviews of the Duolingo app on the Google Play Store. The approach comprises data collection, text preprocessing (case folding, tokenization, stopword removal, and stemming), and evaluation of the Naïve Bayes classifier on each dataset. Model performance was measured with accuracy, precision, recall, and F1-score. With a 90:10 train/test split, Naïve Bayes reached 84% accuracy on the English dataset and 67% accuracy on the Indonesian dataset. The gap is attributed to several factors, including informal language, slang, and more complex word variants in the Indonesian reviews, which make correct classification harder for the model.
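The abstract names the pipeline stages but not their implementation details. The sketch below illustrates the same stages end to end (case folding, tokenization, stopword removal, stemming, a 90:10 split, Naïve Bayes, and accuracy/precision/recall/F1) under stated assumptions: it uses a multinomial Naïve Bayes over bag-of-words counts with Laplace smoothing, a tiny hypothetical English stopword list, and a crude hypothetical suffix stripper (`naive_stem`, not Porter or any published stemmer). The paper does not say which Naïve Bayes variant, feature weighting, stopword list, or stemmer was used, and an Indonesian pipeline would need its own stopword list and stemmer. Precision, recall, and F1 are computed here for a single "positive" class; the study may have averaged differently.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy English stopword list; the study's actual list is not stated.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "to", "it", "this", "i", "very"}


def naive_stem(token):
    """Hypothetical crude suffix stripper standing in for a real stemmer (NOT Porter)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Case folding -> tokenization -> stopword removal -> stemming."""
    text = text.lower()                                 # case folding
    tokens = re.findall(r"[a-z]+", text)                # ASCII-letter tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_stem(t) for t in tokens]              # stemming


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        # log P(class) estimated from class frequencies in the training set
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set()
        for counter in self.counts.values():
            self.vocab.update(counter)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, tokens):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.counts[c][tok] + self.alpha)
                                      / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, docs):
        return [self.predict_one(tokens) for tokens in docs]


def split_90_10(samples, seed=0):
    """Shuffle, then split 90:10 into (train, test) using integer arithmetic."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true, y_pred, positive="positive"):
    """Accuracy plus precision/recall/F1 for one positive class (binary view)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy reviews, not data from the study.
    reviews = [
        ("great app love the lessons", "positive"),
        ("fun and helpful lessons", "positive"),
        ("love learning every day", "positive"),
        ("too many ads annoying", "negative"),
        ("app crashes and loses progress", "negative"),
        ("annoying hearts system hate it", "negative"),
    ]
    train, test = split_90_10(reviews)
    model = MultinomialNaiveBayes().fit([preprocess(t) for t, _ in train],
                                        [y for _, y in train])
    preds = model.predict([preprocess(t) for t, _ in test])
    print(evaluate([y for _, y in test], preds))
```

Running the file trains on five of the six toy reviews and scores the single held-out review, so the printed numbers carry no meaning beyond showing the data flow; the study's reported 84% and 67% accuracies cannot be reproduced from this sketch. Working in log-probabilities avoids numerical underflow when many small per-token probabilities are multiplied.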

Published

2025-03-03

Issue

Teknika, Vol. 14 No. 1 (2025)

Section

Articles

How to Cite

Comparative Analysis of Naïve Bayes Algorithm Performance in English and Indonesian Text Sentiment Classification on Duolingo Application in Playstore. (2025). Teknika, 14(1), 165-171. https://doi.org/10.34148/teknika.v14i1.1207