Comparison of the Classification Accuracy Performance of K-NN, NB and DT on Android APKs

  • Djarot Hindarto, Pradita University
Keywords: Malware, APK Android, Naïve Bayes, K-Nearest Neighbor, Decision Tree


Many people now rely on Internet technology for everyday needs such as shopping, transportation and education, all delivered as digital services. The equipment used to access the Internet is equally diverse, ranging from personal computers and laptops to mobile communication devices. Among today's mobile devices, those based on the Android operating system come in many variations and are widely used. This situation encourages certain parties to exploit loopholes for profit, one form of which is the creation of malicious software (malware). Malware is a serious concern because it is growing very quickly, and this continued growth has led researchers to focus on analyzing malware with artificial intelligence technology. The purpose of this study is to analyze Android APK files using static analysis and to classify them as belonging to a malware family or as non-malware (normal) APK files. Malware and non-malware APK files were downloaded from the Canadian Institute for Cybersecurity, Google Play and APKPure. The downloaded files were extracted and the extracted data stored as a malware dataset, which was then used to train three machine learning algorithms: Naïve Bayes, K-Nearest Neighbor and Decision Tree. The study measures the accuracy of each algorithm and compares the performance of the three.
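As a rough illustration of the comparison described above, the following minimal sketch trains the three classifiers on a tiny binary feature matrix and reports their accuracy. It is not the study's implementation: the data, the four feature columns (imagined here as permission flags), the Hamming distance for K-NN, the Bernoulli variant of Naïve Bayes and the Gini split criterion for the Decision Tree are all assumptions made for this example.

```python
from collections import Counter
import math

# Hypothetical toy data, invented for illustration. Each row is a binary vector
# of static features (e.g. whether an APK requests a given permission).
# Labels: 1 = malware, 0 = normal.
TRAIN_X = [
    [1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1],
    [0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]
TEST_X = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 0]]
TEST_Y = [1, 0, 1, 0]


def knn_predict(X, y, x, k=3):
    """K-Nearest Neighbor: majority label among the k closest rows (Hamming distance)."""
    dists = sorted((sum(a != b for a, b in zip(row, x)), label) for row, label in zip(X, y))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def nb_fit(X, y):
    """Bernoulli Naive Bayes: class prior and Laplace-smoothed P(feature = 1 | class)."""
    model = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (len(rows) / len(X), probs)
    return model


def nb_predict(model, x):
    """Pick the class with the highest log posterior."""
    def log_posterior(c):
        prior, probs = model[c]
        return math.log(prior) + sum(math.log(p if v else 1 - p) for p, v in zip(probs, x))
    return max(model, key=log_posterior)


def gini(labels):
    """Gini impurity of a list of labels."""
    return 1.0 - sum((n / len(labels)) ** 2 for n in Counter(labels).values())


def dt_fit(X, y, features=None):
    """Decision tree on binary features; each node uses the lowest weighted Gini split."""
    if features is None:
        features = list(range(len(X[0])))
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or not features:
        return majority  # leaf node

    def split_cost(f):
        left = [label for row, label in zip(X, y) if row[f] == 0]
        right = [label for row, label in zip(X, y) if row[f] == 1]
        if not left or not right:
            return float("inf")  # this feature does not separate anything
        return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

    best = min(features, key=split_cost)
    if split_cost(best) == float("inf"):
        return majority
    rest = [f for f in features if f != best]
    branches = {}
    for value in (0, 1):
        subset = [(row, label) for row, label in zip(X, y) if row[best] == value]
        branches[value] = dt_fit([r for r, _ in subset], [l for _, l in subset], rest)
    return (best, branches)


def dt_predict(tree, x):
    """Walk from the root to a leaf, following the branch for each tested feature."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[x[feature]]
    return tree


def accuracy(predict, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


nb_model = nb_fit(TRAIN_X, TRAIN_Y)
tree = dt_fit(TRAIN_X, TRAIN_Y)
results = {
    "K-NN": accuracy(lambda x: knn_predict(TRAIN_X, TRAIN_Y, x), TEST_X, TEST_Y),
    "Naive Bayes": accuracy(lambda x: nb_predict(nb_model, x), TEST_X, TEST_Y),
    "Decision Tree": accuracy(lambda x: dt_predict(tree, x), TEST_X, TEST_Y),
}
for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

The toy data is deliberately easy to separate, so the printed accuracies say nothing about how the algorithms rank on real APK files. A meaningful comparison would need features extracted from a real dataset and a held-out test split.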

