Please use this identifier to cite or link to this item: https://ruomoplus.lib.uom.gr/handle/8000/2037
Title: Vulnerability Classification on Source Code Using Text Mining and Deep Learning Techniques
Authors: Kalouptsoglou, Ilias 
Siavvas, Miltiadis 
Ampatzoglou, Apostolos 
Kehagias, Dionysios 
Chatzigeorgiou, Alexander 
Author Department Affiliations: Department of Applied Informatics 
Department of Applied Informatics 
Department of Applied Informatics 
Author School Affiliations: School of Information Sciences 
School of Information Sciences 
School of Information Sciences 
Subjects: FRASCATI__Natural sciences__Computer and information sciences
FRASCATI__Engineering and technology__Electrical engineering, Electronic engineering, Information engineering
Keywords: contextual word embedding
large language models
natural language processing
security testing
transfer learning
vulnerability classification
Issue Date: 29-Oct-2024
Publisher: IEEE
Volume Title: Proceedings of the 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C)
Start page: 47
End page: 56
Conference: 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C) 
Abstract: 
Nowadays, security testing is an integral part of the testing activities during the software development life-cycle. Over the years, various techniques have been proposed to identify security issues in source code, especially vulnerabilities, which can be exploited and cause severe damage. Recently, Machine Learning (ML) techniques capable of predicting vulnerable software components and indicating high-risk areas have appeared, accelerating the effort-demanding and time-consuming process of vulnerability localization. For effective subsequent vulnerability elimination, there is a need to automate the process of labeling detected vulnerabilities with vulnerability categories, i.e., identifying the type of each vulnerability. Several techniques have been proposed over the years for automating this labeling process. However, the vast majority of the proposed methods attempt to identify the type of a vulnerability based on its textual description provided by experts, such as the description in the vulnerability report of the National Vulnerability Database, and not on its actual source code, which hinders their full automation and the categorization of vulnerabilities during the software testing phase. This work examines vulnerability classification directly from the source code during the vulnerability detection step. This way, a vulnerability detection method will be able to provide complete information and interpretation of its findings. Leveraging the advances in the fields of Artificial Intelligence and Natural Language Processing, we construct and compare several multi-class classification models for categorizing vulnerable code snippets. The results highlight the importance of the context-aware embeddings of pre-trained Transformer-based models, as well as the significance of transfer learning from a programming language-related domain.
URI: https://ruomoplus.lib.uom.gr/handle/8000/2037
ISBN: 9798350365658
DOI: 10.1109/QRS-C63300.2024.00017
Rights: CC0 1.0 Universal
Attribution-NonCommercial-NoDerivatives 4.0 International
Corresponding Item Departments: Department of Applied Informatics
Department of Applied Informatics
Department of Applied Informatics
Appears in Collections: Conference proceedings

Files in This Item:
File: kalouptsoglou2024qrs.pdf
Size: 313.86 kB
Format: Adobe PDF

Scopus citations: 4 (checked on Apr 13, 2026)
Page view(s): 78 (checked on Apr 18, 2026)
Download(s): 78 (checked on Apr 18, 2026)

This item is licensed under a Creative Commons License.