Comparison of Classification Algorithms Using a Data-Level Approach to Handle Imbalanced Class Data


Ahmad Ilham

Abstract

The class imbalance problem has a detrimental effect on prediction accuracy. Many previous studies have applied classification algorithms to address it. This study presents under-sampling and over-sampling techniques for handling imbalanced class data. These techniques are applied at the preprocessing stage to balance the class distribution of the data. Experimental results show that the neural network (NN) outperforms the decision tree (DT), linear regression (LR), naïve Bayes (NB), and support vector machine (SVM).
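The data-level approach described above can be illustrated with a minimal sketch of random over-sampling (duplicating minority-class samples) and random under-sampling (discarding majority-class samples) at the preprocessing stage. The function names and toy data below are illustrative, not taken from the paper:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class (random over-sampling)."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        for xi in samples + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Discard majority-class samples at random until every class
    matches the size of the smallest class (random under-sampling)."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = min(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        for xi in rng.sample(samples, target):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

# Imbalanced toy data: 6 negative samples vs. 2 positive samples.
X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
_, y_over = random_oversample(X, y)
_, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 6 samples
print(Counter(y_under))  # both classes now have 2 samples
```

Either balanced dataset would then be fed to the classifiers compared in the study (NN, DT, LR, NB, SVM) in place of the original imbalanced data.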

