Please use this identifier to cite or link to this item: https://ruomoplus.lib.uom.gr/handle/8000/1790
Title: Reduction Through Homogeneous Clustering: Variations for Categorical Data and Fast Data Reduction
Authors: Ougiaroglou, Stefanos 
Papadimitriou, Nikolaos 
Evangelidis, Georgios 
Author Department Affiliations: Department of Applied Informatics 
Department of Applied Informatics 
Author School Affiliations: School of Information Sciences 
School of Information Sciences 
Subjects: FRASCATI__Natural sciences__Computer and information sciences
Keywords: Categorical data
Data reduction
k-means
k-modes
k-NN Classification
Prototype generation
RHC
Issue Date: 25-Jun-2024
Publisher: Springer
Journal: SN Computer Science 
ISSN: 2662-995X
Volume: 5
Issue: 6
Start page: 671
Abstract: 
Reduction through Homogeneous Clustering (RHC) and its editing variant (ERHC) represent effective methods for reducing data in the context of instance-based classification. Both RHC and ERHC are based on an iterative k-means clustering procedure that builds homogeneous clusters. Therefore, they are inappropriate for data reduction tasks that need to be performed quickly, especially when run over large training datasets. Moreover, since they are based on k-means clustering, they are inappropriate for categorical data. This paper introduces a set of variations to the RHC and ERHC algorithms. More specifically, addressing the iterative nature of k-means clustering in RHC and ERHC, we present new adaptations known as RHC2 and ERHC2. These variations strategically replace the complete execution of k-means clustering with a streamlined task, demonstrating significant improvements in speed. Additionally, we extend the scope of our study to address categorical data by introducing new variations of RHC and ERHC. The adaptations designed for handling categorical data are denoted as RHCM and ERHCM and are based on k-modes clustering. Our experimental study spans diverse datasets and includes statistical tests. The findings reveal a notable improvement in execution time for the adaptations we propose compared to RHC, ERHC and two other prominent data reduction techniques. Moreover, RHC2 and ERHC2 are found to outperform their predecessors in data reduction effectiveness. Concerning RHCM and ERHCM, performance evaluations conducted on various categorical datasets indicate that these variations efficiently minimize the dataset size, with a relatively modest compromise in accuracy.
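The abstract does not spell out the steps of the baseline RHC procedure. The following is a minimal Python sketch of one way an "iterative k-means clustering procedure that builds homogeneous clusters" could look. It assumes the commonly described form of RHC: a cluster containing more than one class is split by k-means seeded with its class means, and each single-class cluster is replaced by its mean as a prototype. The seeding strategy, the fallback for clusters that cannot be split, and all names (`rhc_sketch`, `kmeans`, `mean`, `sq_dist`) are illustrative assumptions, not taken from this record. The sketch does not cover ERHC, RHC2, ERHC2, RHCM or ERHCM.

```python
from collections import defaultdict


def mean(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style k-means from the given initial centers.

    Returns one cluster index per point. `centers` is updated in place.
    """
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)
    return assign


def rhc_sketch(X, y):
    """Return (prototype, label) pairs, one per homogeneous cluster.

    Assumption: non-homogeneous clusters are split by k-means seeded
    with their class means (hypothetical reading of the RHC idea).
    """
    prototypes = []
    queue = [list(range(len(X)))]  # start with the whole training set
    while queue:
        idx = queue.pop()
        pts = [X[i] for i in idx]
        by_class = defaultdict(list)
        for i in idx:
            by_class[y[i]].append(X[i])
        if len(by_class) == 1:
            # Homogeneous cluster: its mean becomes a prototype.
            prototypes.append((mean(pts), y[idx[0]]))
            continue
        centers = [mean(v) for v in by_class.values()]
        assign = kmeans(pts, centers)
        groups = defaultdict(list)
        for i, a in zip(idx, assign):
            groups[a].append(i)
        if len(groups) == 1:
            # Assumed fallback: the cluster cannot be split (for example,
            # identical points with different labels), so emit one
            # prototype per class to guarantee termination.
            for label, v in by_class.items():
                prototypes.append((mean(v), label))
            continue
        queue.extend(groups.values())
    return prototypes
```

A small usage example with two well-separated classes: `rhc_sketch([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], [0, 0, 0, 1, 1, 1])` reduces six training items to two prototypes, one per class. Because every non-homogeneous cluster triggers a full k-means run, the cost grows with the number of splits, which is the kind of overhead the abstract says RHC2 and ERHC2 aim to avoid.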
URI: https://ruomoplus.lib.uom.gr/handle/8000/1790
DOI: 10.1007/s42979-024-03007-9
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Corresponding Item Departments: Department of Applied Informatics
Appears in Collections: Articles

Files in This Item:
File: SN_Computer_Science_2024.pdf
Size: 248.68 kB
Format: Adobe PDF
Embargoed until June 25, 2025