A Fake News Classification Model Using Bidirectional LSTM with Word2vec for Vectorization

  • Juanda Antonius Pakpahan, Institut Teknologi Del
  • Yeni Chintya Panjaitan
  • Junita Amalia
  • Melani Basaria Pakpahan

Abstract

In general, classification is a learning method that assigns data to class labels. It can be performed on both structured and unstructured data, based on training that has already been carried out. This research uses a Bidirectional LSTM to build a news classification model, with Word2vec's CBOW architecture providing the word vectors. Three main parameters are examined: embedding size, window size, and the number of BiLSTM units, and their effect on model performance is evaluated. The resulting models are measured using accuracy, recall, precision, F1-score, and computational time. For title data, the best performance was achieved with window size 3, embedding size 200, and 128 units, reaching 79.18% accuracy. For content data, the best performance was achieved with window size 5, embedding size 300, and 256 units, reaching 92.80% accuracy.
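The sketch below illustrates the kind of pipeline the abstract describes: training CBOW Word2vec embeddings and feeding them into a Bidirectional LSTM classifier. It is a minimal, hedged example; the toy corpus, variable names, sequence length, and training setup are assumptions and not the authors' exact configuration, although the hyperparameter values mirror those reported in the abstract.

```python
# Minimal sketch: Word2vec (CBOW) embeddings + Bidirectional LSTM classifier.
# Corpus, labels, MAX_LEN, and training settings are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras import layers, models

# Toy tokenized corpus with binary labels (0 = real, 1 = fake) -- placeholders.
corpus = [["pemerintah", "umumkan", "kebijakan", "baru"],
          ["artis", "terkenal", "meninggal", "hoaks"]]
labels = np.array([0, 1])

EMBEDDING_SIZE = 200   # abstract reports 200 (title) and 300 (content)
WINDOW_SIZE = 3        # abstract reports 3 (title) and 5 (content)
BILSTM_UNITS = 128     # abstract reports 128 (title) and 256 (content)
MAX_LEN = 20           # assumed maximum sequence length

# Train CBOW Word2vec (sg=0 selects CBOW in gensim).
w2v = Word2Vec(sentences=corpus, vector_size=EMBEDDING_SIZE,
               window=WINDOW_SIZE, min_count=1, sg=0)

# Build an embedding matrix indexed by the Word2vec vocabulary (index 0 = padding).
vocab = w2v.wv.index_to_key
word_index = {w: i + 1 for i, w in enumerate(vocab)}
embedding_matrix = np.zeros((len(vocab) + 1, EMBEDDING_SIZE))
for word, idx in word_index.items():
    embedding_matrix[idx] = w2v.wv[word]

# Convert each document to a padded sequence of vocabulary indices.
def to_sequence(tokens):
    seq = [word_index.get(t, 0) for t in tokens][:MAX_LEN]
    return seq + [0] * (MAX_LEN - len(seq))

X = np.array([to_sequence(doc) for doc in corpus])

# Bidirectional LSTM classifier on top of the pretrained, frozen embeddings.
model = models.Sequential([
    layers.Embedding(input_dim=embedding_matrix.shape[0],
                     output_dim=EMBEDDING_SIZE,
                     weights=[embedding_matrix],
                     trainable=False),
    layers.Bidirectional(layers.LSTM(BILSTM_UNITS)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=2, verbose=0)
```

In this layout, accuracy, precision, recall, and F1-score would be computed on a held-out test split after training, matching the evaluation metrics named in the abstract.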

Published
2022-12-13