Comparison of Classification Algorithms with a Data-Level Approach for Handling Class-Imbalanced Data
Abstract
Class imbalance degrades prediction accuracy. Many previous studies have addressed the problem at the level of the classification algorithm. This study instead presents under-sampling and over-sampling techniques for handling class-imbalanced data. The techniques are applied at the preprocessing stage to balance the class distribution of the data. The experimental results show that the neural network (NN) outperforms the decision tree (DT), linear regression (LR), naïve Bayes (NB), and support vector machine (SVM).
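As an illustration of the data-level idea, the sketch below balances a toy dataset by random under-sampling and random over-sampling. It is a minimal sketch under stated assumptions, not the procedure used in the study: the abstract does not say which sampling variants were applied, and the function names `random_undersample` and `random_oversample` as well as the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect the feature rows of each class label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Sample without replacement: keep `target` rows per class.
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # Draw with replacement to fill the gap to the largest class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 majority samples (class 0), 2 minority samples (class 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

_, y_under = random_undersample(X, y)
_, y_over = random_oversample(X, y)
print(Counter(y_under))  # 2 samples per class
print(Counter(y_over))   # 8 samples per class
```

In practice, re-sampling of this kind is applied to the training split only, so that the evaluation data keeps the original class distribution.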