An empirical study toward dealing with noise and class imbalance issues in software defect prediction
Abstract: The quality of defect datasets is a critical issue in software defect prediction (SDP). These datasets are obtained by mining software repositories. Recent studies have raised concerns about the quality of such datasets, owing to inconsistencies between bug/clean fi...
Detailed description
Author(s): Pandey, Sushant Kumar [author]; Tripathi, Anil Kumar [author]
Format: E-article
Language: English
Published: 2021
Keywords: Software testing; Software fault prediction; Class imbalance; Noisy instance; Machine learning; Software metrics; Fault proneness
Note: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021
Parent work: Contained in: Soft Computing - Springer-Verlag, 2003, 25(2021), 21 dated 13 Aug., pages 13465-13492
Parent work: volume:25 ; year:2021 ; number:21 ; day:13 ; month:08 ; pages:13465-13492
Links: https://dx.doi.org/10.1007/s00500-021-06096-3
DOI / URN: 10.1007/s00500-021-06096-3
Catalog ID: SPR045276366
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | SPR045276366 | ||
003 | DE-627 | ||
005 | 20211013064745.0 | ||
007 | cr uuu---uuuuu | ||
008 | 211013s2021 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1007/s00500-021-06096-3 |2 doi | |
035 | |a (DE-627)SPR045276366 | ||
035 | |a (SPR)s00500-021-06096-3-e | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Pandey, Sushant Kumar |e author |4 aut |
245 | 1 | 3 | |a An empirical study toward dealing with noise and class imbalance issues in software defect prediction |
264 | 1 | |c 2021 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
500 | |a © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021 | ||
520 | |a Abstract: The quality of defect datasets is a critical issue in software defect prediction (SDP). These datasets are obtained by mining software repositories. Recent studies have raised concerns about the quality of such datasets, owing to inconsistencies between bug/clean fix keywords in fault reports and the corresponding links in change management logs. The class imbalance (CI) problem is another major challenge for SDP models. Defect prediction models trained on noisy and imbalanced data produce inconsistent and unsatisfactory results, so a combined analysis of noisy instances and the CI problem is required. To the best of our knowledge, few studies have addressed these aspects. In this paper, we examine the impact of noise and the CI problem on five baseline SDP models: we manually injected noise at various levels (0–80%) and measured its impact on the performance of those models. We further provide guidelines on the range of noise that each baseline model can tolerate. We also propose an SDP model that has the highest noise tolerance and outperforms the other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) of the baseline models decrease by 20–30% after 10–40% noisy instances are added. Similarly, the ROC (Receiver Operating Characteristic) values of the SDP models drop to 40–50%. Compared with the other traditional models, the proposed model tolerates noise levels of 40–60%. | ||
650 | 4 | |a Software testing |7 (dpeaa)DE-He213 | |
650 | 4 | |a Software fault prediction |7 (dpeaa)DE-He213 | |
650 | 4 | |a Class imbalance |7 (dpeaa)DE-He213 | |
650 | 4 | |a Noisy instance |7 (dpeaa)DE-He213 | |
650 | 4 | |a Machine learning |7 (dpeaa)DE-He213 | |
650 | 4 | |a Software metrics |7 (dpeaa)DE-He213 | |
650 | 4 | |a Fault proneness |7 (dpeaa)DE-He213 | |
700 | 1 | |a Tripathi, Anil Kumar |e author |4 aut |
773 | 0 | 8 | |i Contained in |t Soft Computing |d Springer-Verlag, 2003 |g 25(2021), 21 dated 13 Aug., pages 13465-13492 |w (DE-627)SPR006469531 |7 nnns |
773 | 1 | 8 | |g volume:25 |g year:2021 |g number:21 |g day:13 |g month:08 |g pages:13465-13492 |
856 | 4 | 0 | |u https://dx.doi.org/10.1007/s00500-021-06096-3 |z license required |3 Full text |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_SPRINGER | ||
951 | |a AR | ||
952 | |d 25 |j 2021 |e 21 |b 13 |c 08 |h 13465-13492 |
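The abstract (MARC field 520) describes injecting noise at levels from 0 to 80% into defect datasets and tracking TPR and FPR. The Python sketch below illustrates those two steps under stated assumptions: noise is modelled as flipping the binary defective/clean label of a randomly chosen fraction of instances, and the helper names `inject_label_noise` and `tpr_fpr` are hypothetical, not taken from the paper. The authors' actual injection procedure, datasets, and models may differ.

```python
import random
from typing import List, Tuple


def inject_label_noise(labels: List[int], noise_rate: float, seed: int = 0) -> List[int]:
    """Return a copy of binary labels (1 = defective, 0 = clean) in which a
    randomly chosen fraction `noise_rate` of instances has its label flipped.

    Assumption: noise is modelled as label flipping; the study's own
    injection procedure may differ.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible experiments
    noisy = list(labels)  # leave the caller's labels untouched
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):  # distinct indices
        noisy[i] = 1 - noisy[i]
    return noisy


def tpr_fpr(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """TPR = TP / (TP + FN) and FPR = FP / (FP + TN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # 0.0 when no positives exist
    fpr = fp / (fp + tn) if fp + tn else 0.0  # 0.0 when no negatives exist
    return tpr, fpr


if __name__ == "__main__":
    # Hypothetical imbalanced dataset: 20 defective and 80 clean instances.
    clean = [1] * 20 + [0] * 80
    for rate in (0.0, 0.1, 0.2, 0.4, 0.6, 0.8):
        noisy = inject_label_noise(clean, rate, seed=42)
        flipped = sum(a != b for a, b in zip(clean, noisy))
        print(f"noise={rate:.0%} flipped={flipped} defective={sum(noisy)}/{len(noisy)}")
```

In an experiment of the kind the abstract outlines, each baseline model would be trained on the noisy labels and `tpr_fpr` computed against a clean held-out test set, once per noise level. On an imbalanced dataset, random flipping also changes the class ratio (in this hypothetical example the expected defective share becomes 0.2 + 0.6r for flip rate r), which is one way noise and class imbalance interact.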
author |
Pandey, Sushant Kumar |
spellingShingle |
Pandey, Sushant Kumar misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness An empirical study toward dealing with noise and class imbalance issues in software defect prediction |
authorStr |
Pandey, Sushant Kumar |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)SPR006469531 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut aut |
collection |
springer |
remote_str |
true |
illustrated |
Not Illustrated |
topic_title |
An empirical study toward dealing with noise and class imbalance issues in software defect prediction Software testing (dpeaa)DE-He213 Software fault prediction (dpeaa)DE-He213 Class imbalance (dpeaa)DE-He213 Noisy instance (dpeaa)DE-He213 Machine learning (dpeaa)DE-He213 Software metrics (dpeaa)DE-He213 Fault proneness (dpeaa)DE-He213 |
topic |
misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness |
topic_unstemmed |
misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness |
topic_browse |
misc Software testing misc Software fault prediction misc Class imbalance misc Noisy instance misc Machine learning misc Software metrics misc Fault proneness |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Soft Computing |
hierarchy_parent_id |
SPR006469531 |
hierarchy_top_title |
Soft Computing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)SPR006469531 |
title |
An empirical study toward dealing with noise and class imbalance issues in software defect prediction |
ctrlnum |
(DE-627)SPR045276366 (SPR)s00500-021-06096-3-e |
title_full |
An empirical study toward dealing with noise and class imbalance issues in software defect prediction |
author_sort |
Pandey, Sushant Kumar |
journal |
Soft Computing |
journalStr |
Soft Computing |
lang_code |
eng |
isOA_bool |
false |
recordtype |
marc |
publishDateSort |
2021 |
contenttype_str_mv |
txt |
container_start_page |
13465 |
author_browse |
Pandey, Sushant Kumar Tripathi, Anil Kumar |
container_volume |
25 |
format_se |
Elektronische Aufsätze |
author-letter |
Pandey, Sushant Kumar |
doi_str_mv |
10.1007/s00500-021-06096-3 |
author2-role |
verfasserin |
title_sort |
empirical study toward dealing with noise and class imbalance issues in software defect prediction |
title_auth |
An empirical study toward dealing with noise and class imbalance issues in software defect prediction |
abstract |
Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. The suggested model leads to avoid noise between 40–60% as compared to other traditional models. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021 |
abstractGer |
Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. The suggested model leads to avoid noise between 40–60% as compared to other traditional models. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021 |
abstract_unstemmed |
Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. The suggested model leads to avoid noise between 40–60% as compared to other traditional models. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER |
container_issue |
21 |
title_short |
An empirical study toward dealing with noise and class imbalance issues in software defect prediction |
url |
https://dx.doi.org/10.1007/s00500-021-06096-3 |
remote_bool |
true |
author2 |
Tripathi, Anil Kumar |
ppnlink |
SPR006469531 |
mediatype_str_mv |
c |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s00500-021-06096-3 |
up_date |
2024-07-03T14:57:07.397Z |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">SPR045276366</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20211013064745.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">211013s2021 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s00500-021-06096-3</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR045276366</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s00500-021-06096-3-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Pandey, Sushant Kumar</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="3"><subfield code="a">An empirical study toward dealing with noise and class imbalance issues in software defect prediction</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2021</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " 
ind2=" "><subfield code="a">© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract The quality of the defect datasets is a critical issue in the domain of software defect prediction (SDP). These datasets are obtained through the mining of software repositories. Recent studies claim over the quality of the defect dataset. It is because of inconsistency between bug/clean fix keyword in fault reports and the corresponding link in the change management logs. Class Imbalance (CI) problem is also a big challenging issue in SDP models. The defect prediction method trained using noisy and imbalanced data leads to inconsistent and unsatisfactory results. Combined analysis over noisy instances and CI problem needs to be required. To the best of our knowledge, there are insufficient studies that have been done over such aspects. In this paper, we deal with the impact of noise and CI problem on five baseline SDP models; we manually added the various noise level (0–80%) and identified its impact on the performance of those SDP models. Moreover, we further provide guidelines for the possible range of tolerable noise for baseline models. We have also suggested the SDP model, which has the highest noise tolerable ability and outperforms over other classical methods. The True Positive Rate (TPR) and False Positive Rate (FPR) values of the baseline models reduce between 20–30% after adding 10–40% noisy instances. Similarly, the ROC (Receiver Operating Characteristics) values of SDP models reduce to 40–50%. 
The suggested model leads to avoid noise between 40–60% as compared to other traditional models.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software testing</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software fault prediction</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Class imbalance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Noisy instance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software metrics</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Fault proneness</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Tripathi, Anil Kumar</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Soft Computing</subfield><subfield code="d">Springer-Verlag, 2003</subfield><subfield code="g">25(2021), 21 vom: 13. 
Aug., Seite 13465-13492</subfield><subfield code="w">(DE-627)SPR006469531</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:25</subfield><subfield code="g">year:2021</subfield><subfield code="g">number:21</subfield><subfield code="g">day:13</subfield><subfield code="g">month:08</subfield><subfield code="g">pages:13465-13492</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1007/s00500-021-06096-3</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">25</subfield><subfield code="j">2021</subfield><subfield code="e">21</subfield><subfield code="b">13</subfield><subfield code="c">08</subfield><subfield code="h">13465-13492</subfield></datafield></record></collection>
|