An empirical study toward dealing with noise and class imbalance issues in software defect prediction

Abstract: The quality of defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained by mining software repositories. Recent studies have raised concerns about the quality of these datasets because of inconsistency between bug/clean fi...
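
The full abstract (field 520 of this record) describes adding noise at levels of 0–80% to defect data and measuring the True Positive Rate (TPR) and False Positive Rate (FPR). As a rough illustration of those two ideas only, here is a minimal Python sketch. It is not the authors' code: the helper names `flip_labels` and `tpr_fpr`, the toy labels, and the choice of uniformly random label flipping as the noise model are all assumptions made for this example, since the record does not state the paper's actual procedure.

```python
import random


def flip_labels(labels, noise_level, seed=0):
    """Return a copy of binary labels with a fraction `noise_level` flipped.

    Assumption: noise is modelled as uniform random label flipping.
    The paper's actual noise-injection procedure is not given in this record.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_level * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def tpr_fpr(y_true, y_pred):
    """Compute (TPR, FPR) for binary labels, where 1 = defective and 0 = clean."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard against empty classes, which are plausible under class imbalance.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Made-up, imbalanced ground truth: 2 defective modules out of 10.
    truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    for level in (0.0, 0.2, 0.4):
        noisy = flip_labels(truth, level)
        # The noisy labels stand in for predictions here, purely to show
        # how flipped labels shift the two rates relative to the clean labels.
        print(level, tpr_fpr(truth, noisy))
```

In a real study the noisy labels would be used to train a classifier, and TPR and FPR would be computed on clean held-out data. The sketch compares the labels directly only to keep the example short.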

Author:	Pandey, Sushant Kumar [author]; Tripathi, Anil Kumar [author]

Format:	E-article
Language:	English

Published:	2021

Keywords:	Software testing; Software fault prediction; Class imbalance; Noisy instance; Machine learning; Software metrics; Fault proneness

Note:	© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021

Parent work:	Contained in: Soft Computing - Springer-Verlag, 2003, 25(2021), no. 21 of 13 Aug., pages 13465-13492
Parent work:	volume:25 ; year:2021 ; number:21 ; day:13 ; month:08 ; pages:13465-13492

Links:	Full text

DOI / URN:	10.1007/s00500-021-06096-3

Catalog ID:	SPR045276366

Internal format


LEADER	01000naa a22002652 4500
001	SPR045276366
003	DE-627
005	20211013064745.0
007	cr uuu---uuuuu
008	211013s2021 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1007/s00500-021-06096-3 \|2 doi
035			\|a (DE-627)SPR045276366
035			\|a (SPR)s00500-021-06096-3-e
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Pandey, Sushant Kumar \|e verfasserin \|4 aut
245	1	3	\|a An empirical study toward dealing with noise and class imbalance issues in software defect prediction
264		1	\|c 2021
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
520			\|a Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. The suggested model leads to avoid noise between 40–60% as compared to other traditional models.
650		4	\|a Software testing \|7 (dpeaa)DE-He213
650		4	\|a Software fault prediction \|7 (dpeaa)DE-He213
650		4	\|a Class imbalance \|7 (dpeaa)DE-He213
650		4	\|a Noisy instance \|7 (dpeaa)DE-He213
650		4	\|a Machine learning \|7 (dpeaa)DE-He213
650		4	\|a Software metrics \|7 (dpeaa)DE-He213
650		4	\|a Fault proneness \|7 (dpeaa)DE-He213
700	1		\|a Tripathi, Anil Kumar \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t Soft Computing \|d Springer-Verlag, 2003 \|g 25(2021), 21 vom: 13. Aug., Seite 13465-13492 \|w (DE-627)SPR006469531 \|7 nnns
773	1	8	\|g volume:25 \|g year:2021 \|g number:21 \|g day:13 \|g month:08 \|g pages:13465-13492
856	4	0	\|u https://dx.doi.org/10.1007/s00500-021-06096-3 \|z lizenzpflichtig \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_SPRINGER
951			\|a AR
952			\|d 25 \|j 2021 \|e 21 \|b 13 \|c 08 \|h 13465-13492

Index fields

author_variant	s k p sk skp a k t ak akt
matchkey_str	pandeysushantkumartripathianilkumar:2021----:nmiiasuyoadelnwtniencasmaacisei
hierarchy_sort_str	2021
publishDate	2021
allfields	10.1007/s00500-021-06096-3 doi (DE-627)SPR045276366 (SPR)s00500-021-06096-3-e DE-627 ger DE-627 rakwb eng Pandey, Sushant Kumar verfasserin aut An empirical study toward dealing with noise and class imbalance issues in software defect prediction 2021 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021 Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. The suggested model leads to avoid noise between 40–60% as compared to other traditional models. 
Software testing (dpeaa)DE-He213 Software fault prediction (dpeaa)DE-He213 Class imbalance (dpeaa)DE-He213 Noisy instance (dpeaa)DE-He213 Machine learning (dpeaa)DE-He213 Software metrics (dpeaa)DE-He213 Fault proneness (dpeaa)DE-He213 Tripathi, Anil Kumar verfasserin aut Enthalten in Soft Computing Springer-Verlag, 2003 25(2021), 21 vom: 13. Aug., Seite 13465-13492 (DE-627)SPR006469531 nnns volume:25 year:2021 number:21 day:13 month:08 pages:13465-13492 https://dx.doi.org/10.1007/s00500-021-06096-3 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER AR 25 2021 21 13 08 13465-13492
language	English
source	Enthalten in Soft Computing 25(2021), 21 vom: 13. Aug., Seite 13465-13492 volume:25 year:2021 number:21 day:13 month:08 pages:13465-13492
sourceStr	Enthalten in Soft Computing 25(2021), 21 vom: 13. Aug., Seite 13465-13492 volume:25 year:2021 number:21 day:13 month:08 pages:13465-13492
format_phy_str_mv	Article
institution	findex.gbv.de
topic_facet	Software testing Software fault prediction Class imbalance Noisy instance Machine learning Software metrics Fault proneness
isfreeaccess_bool	false
container_title	Soft Computing
authorswithroles_txt_mv	Pandey, Sushant Kumar @@aut@@ Tripathi, Anil Kumar @@aut@@
publishDateDaySort_date	2021-08-13T00:00:00Z
hierarchy_top_id	SPR006469531
id	SPR045276366
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">SPR045276366</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20211013064745.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">211013s2021 xx \|\|\|\|\|o 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s00500-021-06096-3</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR045276366</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s00500-021-06096-3-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Pandey, Sushant Kumar</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="3"><subfield code="a">An empirical study toward dealing with noise and class imbalance issues in software defect prediction</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2021</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield 
tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. 
The suggested model leads to avoid noise between 40–60% as compared to other traditional models.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software testing</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software fault prediction</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Class imbalance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Noisy instance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software metrics</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Fault proneness</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Tripathi, Anil Kumar</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Soft Computing</subfield><subfield code="d">Springer-Verlag, 2003</subfield><subfield code="g">25(2021), 21 vom: 13. 
Aug., Seite 13465-13492</subfield><subfield code="w">(DE-627)SPR006469531</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:25</subfield><subfield code="g">year:2021</subfield><subfield code="g">number:21</subfield><subfield code="g">day:13</subfield><subfield code="g">month:08</subfield><subfield code="g">pages:13465-13492</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1007/s00500-021-06096-3</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">25</subfield><subfield code="j">2021</subfield><subfield code="e">21</subfield><subfield code="b">13</subfield><subfield code="c">08</subfield><subfield code="h">13465-13492</subfield></datafield></record></collection>
author	Pandey, Sushant Kumar
spellingShingle	Pandey, Sushant Kumar misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness An empirical study toward dealing with noise and class imbalance issues in software defect prediction
authorStr	Pandey, Sushant Kumar
ppnlink_with_tag_str_mv	@@773@@(DE-627)SPR006469531
format	electronic Article
delete_txt_mv	keep
author_role	aut aut
collection	springer
remote_str	true
illustrated	Not Illustrated
topic_title	An empirical study toward dealing with noise and class imbalance issues in software defect prediction Software testing (dpeaa)DE-He213 Software fault prediction (dpeaa)DE-He213 Class imbalance (dpeaa)DE-He213 Noisy instance (dpeaa)DE-He213 Machine learning (dpeaa)DE-He213 Software metrics (dpeaa)DE-He213 Fault proneness (dpeaa)DE-He213
topic	misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness
topic_unstemmed	misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness
topic_browse	misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness
format_facet	Elektronische Aufsätze Aufsätze Elektronische Ressource
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	cr
hierarchy_parent_title	Soft Computing
hierarchy_parent_id	SPR006469531
hierarchy_top_title	Soft Computing
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)SPR006469531
title	An empirical study toward dealing with noise and class imbalance issues in software defect prediction
ctrlnum	(DE-627)SPR045276366 (SPR)s00500-021-06096-3-e
title_full	An empirical study toward dealing with noise and class imbalance issues in software defect prediction
author_sort	Pandey, Sushant Kumar
journal	Soft Computing
journalStr	Soft Computing
lang_code	eng
isOA_bool	false
recordtype	marc
publishDateSort	2021
contenttype_str_mv	txt
container_start_page	13465
author_browse	Pandey, Sushant Kumar Tripathi, Anil Kumar
container_volume	25
format_se	Elektronische Aufsätze
author-letter	Pandey, Sushant Kumar
doi_str_mv	10.1007/s00500-021-06096-3
author2-role	verfasserin
title_sort	empirical study toward dealing with noise and class imbalance issues in software defect prediction
title_auth	An empirical study toward dealing with noise and class imbalance issues in software defect prediction
abstract	Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. The suggested model leads to avoid noise between 40–60% as compared to other traditional models. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
collection_details	GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER
container_issue	21
title_short	An empirical study toward dealing with noise and class imbalance issues in software defect prediction
url	https://dx.doi.org/10.1007/s00500-021-06096-3
remote_bool	true
author2	Tripathi, Anil Kumar
author2Str	Tripathi, Anil Kumar
ppnlink	SPR006469531
mediatype_str_mv	c
isOA_txt	false
hochschulschrift_bool	false
doi_str	10.1007/s00500-021-06096-3
up_date	2024-07-03T14:57:07.397Z
_version_	1803570252245630976
fullrecord_marcxml	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">SPR045276366</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20211013064745.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">211013s2021 xx \|\|\|\|\|o 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s00500-021-06096-3</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR045276366</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s00500-021-06096-3-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Pandey, Sushant Kumar</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="3"><subfield code="a">An empirical study toward dealing with noise and class imbalance issues in software defect prediction</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2021</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. 
The suggested model leads to avoid noise between 40–60% as compared to other traditional models.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software testing</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software fault prediction</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Class imbalance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Noisy instance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software metrics</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Fault proneness</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Tripathi, Anil Kumar</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Soft Computing</subfield><subfield code="d">Springer-Verlag, 2003</subfield><subfield code="g">25(2021), 21 vom: 13. 
Aug., Seite 13465-13492</subfield><subfield code="w">(DE-627)SPR006469531</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:25</subfield><subfield code="g">year:2021</subfield><subfield code="g">number:21</subfield><subfield code="g">day:13</subfield><subfield code="g">month:08</subfield><subfield code="g">pages:13465-13492</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1007/s00500-021-06096-3</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">25</subfield><subfield code="j">2021</subfield><subfield code="e">21</subfield><subfield code="b">13</subfield><subfield code="c">08</subfield><subfield code="h">13465-13492</subfield></datafield></record></collection>
score	7.397253
