Please use this identifier to cite or link to this item:
https://ruomoplus.lib.uom.gr/handle/8000/1790
Title: | Reduction Through Homogeneous Clustering: Variations for Categorical Data and Fast Data Reduction |
Authors: | Ougiaroglou, Stefanos; Papadimitriou, Nikolaos; Evangelidis, Georgios |
Author Department Affiliations: | Department of Applied Informatics; Department of Applied Informatics |
Author School Affiliations: | School of Information Sciences; School of Information Sciences |
Subjects: | FRASCATI__Natural sciences__Computer and information sciences |
Keywords: | Categorical data; Data reduction; k-means; k-modes; k-NN; Classification; Prototype generation; RHC |
Issue Date: | 25-Jun-2024 |
Publisher: | Springer |
Journal: | SN Computer Science |
ISSN: | 2662-995X |
Volume: | 5 |
Issue: | 6 |
Start page: | 671 |
Abstract: | Reduction through Homogeneous Clustering (RHC) and its editing variant (ERHC) are effective methods for reducing data in the context of instance-based classification. Both RHC and ERHC are based on an iterative k-means clustering procedure that builds homogeneous clusters. Therefore, they are inappropriate for data reduction tasks that need to be performed quickly, especially when run over large training datasets. Moreover, since they are based on k-means clustering, they are inappropriate for categorical data. This paper introduces a set of variations to the RHC and ERHC algorithms. More specifically, addressing the iterative nature of k-means clustering in RHC and ERHC, we present new adaptations known as RHC2 and ERHC2. These variations replace the complete execution of k-means clustering with a streamlined task, demonstrating significant improvements in speed. Additionally, we extend the scope of our study to categorical data by introducing new variations of RHC and ERHC. The adaptations designed for handling categorical data are denoted RHCM and ERHCM and are based on k-modes clustering. Our experimental study spans diverse datasets and includes statistical tests. The findings reveal a notable improvement in execution time for the adaptations we propose compared to RHC, ERHC and two other prominent data reduction techniques. Moreover, RHC2 and ERHC2 are found to outperform their predecessors in data reduction effectiveness. Concerning RHCM and ERHCM, performance evaluations conducted on various categorical datasets indicate that these variations efficiently reduce the dataset size, with a relatively modest compromise in accuracy. |
URI: | https://ruomoplus.lib.uom.gr/handle/8000/1790 |
DOI: | 10.1007/s42979-024-03007-9 |
Rights: | Attribution-NonCommercial-NoDerivatives 4.0 International |
Corresponding Item Departments: | Department of Applied Informatics |
Appears in Collections: | Articles |
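The abstract describes RHC only at a high level, as an iterative k-means procedure that builds homogeneous clusters. The Python sketch below is one possible reading of that description, not the authors' implementation. It rests on several assumptions that this record does not confirm: each non-homogeneous cluster is split by k-means seeded with its per-class means, distances are Euclidean, and a cluster that k-means cannot split is divided by class label so the loop terminates. The names `rhc_sketch`, `kmeans`, `mean` and `sq_dist` are hypothetical.

```python
from collections import deque


def mean(points):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, labels, seeds, max_iter=100):
    """Plain Lloyd-style k-means started from the given seeds.

    Returns the non-empty clusters as (points, labels) pairs.
    """
    centers = list(seeds)
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = mean(members)
    clusters = []
    for j in range(len(centers)):
        idx = [i for i, a in enumerate(assign) if a == j]
        if idx:
            clusters.append(([points[i] for i in idx], [labels[i] for i in idx]))
    return clusters


def rhc_sketch(points, labels):
    """Return (prototype, class) pairs, one per homogeneous cluster found."""
    queue = deque([(list(points), list(labels))])
    prototypes = []
    while queue:
        pts, labs = queue.popleft()
        classes = sorted(set(labs))
        if len(classes) == 1:
            # Homogeneous cluster: keep only its mean as a labelled prototype.
            prototypes.append((mean(pts), classes[0]))
            continue
        # Assumption: seed k-means with one mean per class in the cluster.
        seeds = [mean([p for p, l in zip(pts, labs) if l == c]) for c in classes]
        parts = kmeans(pts, labs, seeds)
        if len(parts) == 1:
            # Assumption: if k-means cannot split the cluster (for example,
            # coinciding class means), split by class label instead.
            parts = [
                ([p for p, l in zip(pts, labs) if l == c], [c] * labs.count(c))
                for c in classes
            ]
        queue.extend(parts)
    return prototypes


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    labs = ["a", "a", "b", "b"]
    print(rhc_sketch(pts, labs))
```

In this toy run the four training points reduce to two prototypes, one per class. The sketch applies only to numeric attributes because it relies on means; the k-modes based variants named in the abstract (RHCM, ERHCM) are not sketched here.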
Files in This Item:
File | Description | Size | Format | Access |
---|---|---|---|---|
SN_Computer_Science_2024.pdf | | 248.68 kB | Adobe PDF | Request a copy; embargoed until June 25, 2025 |
Page view(s): 6 (checked on Dec 11, 2024)
Download(s): 1 (checked on Dec 11, 2024)
This item is licensed under a Creative Commons License