Which standard classification algorithm has more stable performance for imbalanced network traffic data?
Abstract Most standard classification algorithms are difficult to effectively learn and predict from imbalanced network traffic data, which usually leads to lower classification accuracy. To analyze the influence of imbalanced network traffic data on the performance of standard classification algori...
Detailed description
Author: |
Zheng, Ming [author] |
---|
Format: |
E-article |
---|---|
Language: |
English |
Published: |
2023 |
---|
Keywords: |
Imbalanced network traffic data |
---|
Note: |
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
---|
Parent work: |
Contained in: Soft Computing - Springer-Verlag, 2003, 28(2023), 1, dated 26 Oct., pages 217-234 |
---|---|
Parent work: |
volume:28 ; year:2023 ; number:1 ; day:26 ; month:10 ; pages:217-234 |
Links: |
---|
DOI / URN: |
10.1007/s00500-023-09331-1 |
---|
Catalog ID: |
SPR054258472 |
---|
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | SPR054258472 | ||
003 | DE-627 | ||
005 | 20240105064714.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240105s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1007/s00500-023-09331-1 |2 doi | |
035 | |a (DE-627)SPR054258472 | ||
035 | |a (SPR)s00500-023-09331-1-e | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Zheng, Ming |e verfasserin |0 (orcid)0000-0001-9001-0859 |4 aut | |
245 | 1 | 0 | |a Which standard classification algorithm has more stable performance for imbalanced network traffic data? |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. | ||
520 | |a Abstract Most standard classification algorithms are difficult to effectively learn and predict from imbalanced network traffic data, which usually leads to lower classification accuracy. To analyze the influence of imbalanced network traffic data on the performance of standard classification algorithms, the imbalanced data augmentation algorithms are first designed to obtain the imbalanced network traffic data set with gradually varying Imbalance Ratio (IR) and belonging to the same distribution. Then, to obtain more objective classification result and simplify the evaluation process, the evaluation metric AFG is used to evaluate the classification performance of standard classification algorithms based on area under the receiver operating characteristic curve (AUC), F-measure and G-mean. Finally, based on AFG and coefficient of variation (CV), performance stability of standard classification algorithms on imbalanced network traffic data is obtained. Experiments of eight widely used standard classification algorithms on 25 different imbalanced network traffic data demonstrate that the classification performance of GNB, RF and DT is unstable, while BNB, KNN, LR, GBDT, and SVC are relatively stable and not susceptible to imbalanced data. Especially, the KNN has the most stable classification performance. Also, the results are statistically confirmed by Friedman and Nemenyi post hoc statistical tests. | ||
650 | 4 | |a Imbalanced network traffic data |7 (dpeaa)DE-He213 | |
650 | 4 | |a Data augmentation algorithms |7 (dpeaa)DE-He213 | |
650 | 4 | |a Standard classification algorithms |7 (dpeaa)DE-He213 | |
650 | 4 | |a Stable classification performance |7 (dpeaa)DE-He213 | |
700 | 1 | |a Ma, Kai |4 aut | |
700 | 1 | |a Wang, Fei |4 aut | |
700 | 1 | |a Hu, Xiaowen |4 aut | |
700 | 1 | |a Yu, Qingying |4 aut | |
700 | 1 | |a Guo, Liangmin |4 aut | |
700 | 1 | |a Chen, Fulong |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Soft Computing |d Springer-Verlag, 2003 |g 28(2023), 1 vom: 26. Okt., Seite 217-234 |w (DE-627)SPR006469531 |7 nnns |
773 | 1 | 8 | |g volume:28 |g year:2023 |g number:1 |g day:26 |g month:10 |g pages:217-234 |
856 | 4 | 0 | |u https://dx.doi.org/10.1007/s00500-023-09331-1 |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_SPRINGER | ||
951 | |a AR | ||
952 | |d 28 |j 2023 |e 1 |b 26 |c 10 |h 217-234 |
author_variant |
m z mz k m km f w fw x h xh q y qy l g lg f c fc |
---|---|
matchkey_str |
zhengmingmakaiwangfeihuxiaowenyuqingying:2023----:hcsadrcasfctoagrtmamrsalpromneoib |
hierarchy_sort_str |
2023 |
publishDate |
2023 |
allfields |
10.1007/s00500-023-09331-1 doi (DE-627)SPR054258472 (SPR)s00500-023-09331-1-e DE-627 ger DE-627 rakwb eng Zheng, Ming verfasserin (orcid)0000-0001-9001-0859 aut Which standard classification algorithm has more stable performance for imbalanced network traffic data? 2023 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. Abstract Most standard classification algorithms are difficult to effectively learn and predict from imbalanced network traffic data, which usually leads to lower classification accuracy. To analyze the influence of imbalanced network traffic data on the performance of standard classification algorithms, the imbalanced data augmentation algorithms are first designed to obtain the imbalanced network traffic data set with gradually varying Imbalance Ratio (IR) and belonging to the same distribution. Then, to obtain more objective classification result and simplify the evaluation process, the evaluation metric AFG is used to evaluate the classification performance of standard classification algorithms based on area under the receiver operating characteristic curve (AUC), F-measure and G-mean. Finally, based on AFG and coefficient of variation (CV), performance stability of standard classification algorithms on imbalanced network traffic data is obtained. 
Experiments of eight widely used standard classification algorithms on 25 different imbalanced network traffic data demonstrate that the classification performance of GNB, RF and DT is unstable, while BNB, KNN, LR, GBDT, and SVC are relatively stable and not susceptible to imbalanced data. Especially, the KNN has the most stable classification performance. Also, the results are statistically confirmed by Friedman and Nemenyi post hoc statistical tests. Imbalanced network traffic data (dpeaa)DE-He213 Data augmentation algorithms (dpeaa)DE-He213 Standard classification algorithms (dpeaa)DE-He213 Stable classification performance (dpeaa)DE-He213 Ma, Kai aut Wang, Fei aut Hu, Xiaowen aut Yu, Qingying aut Guo, Liangmin aut Chen, Fulong aut Enthalten in Soft Computing Springer-Verlag, 2003 28(2023), 1 vom: 26. Okt., Seite 217-234 (DE-627)SPR006469531 nnns volume:28 year:2023 number:1 day:26 month:10 pages:217-234 https://dx.doi.org/10.1007/s00500-023-09331-1 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER AR 28 2023 1 26 10 217-234 |
language |
English |
source |
Enthalten in Soft Computing 28(2023), 1 vom: 26. Okt., Seite 217-234 volume:28 year:2023 number:1 day:26 month:10 pages:217-234 |
sourceStr |
Enthalten in Soft Computing 28(2023), 1 vom: 26. Okt., Seite 217-234 volume:28 year:2023 number:1 day:26 month:10 pages:217-234 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Imbalanced network traffic data Data augmentation algorithms Standard classification algorithms Stable classification performance |
isfreeaccess_bool |
false |
container_title |
Soft Computing |
authorswithroles_txt_mv |
Zheng, Ming @@aut@@ Ma, Kai @@aut@@ Wang, Fei @@aut@@ Hu, Xiaowen @@aut@@ Yu, Qingying @@aut@@ Guo, Liangmin @@aut@@ Chen, Fulong @@aut@@ |
publishDateDaySort_date |
2023-10-26T00:00:00Z |
hierarchy_top_id |
SPR006469531 |
id |
SPR054258472 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">SPR054258472</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240105064714.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">240105s2023 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s00500-023-09331-1</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR054258472</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s00500-023-09331-1-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Zheng, Ming</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0001-9001-0859</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Which standard classification algorithm has more stable performance for imbalanced network traffic data?</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2023</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Most standard classification algorithms are difficult to effectively learn and predict from imbalanced network traffic data, which usually leads to lower classification accuracy. To analyze the influence of imbalanced network traffic data on the performance of standard classification algorithms, the imbalanced data augmentation algorithms are first designed to obtain the imbalanced network traffic data set with gradually varying Imbalance Ratio (IR) and belonging to the same distribution. Then, to obtain more objective classification result and simplify the evaluation process, the evaluation metric AFG is used to evaluate the classification performance of standard classification algorithms based on area under the receiver operating characteristic curve (AUC), F-measure and G-mean. Finally, based on AFG and coefficient of variation (CV), performance stability of standard classification algorithms on imbalanced network traffic data is obtained. Experiments of eight widely used standard classification algorithms on 25 different imbalanced network traffic data demonstrate that the classification performance of GNB, RF and DT is unstable, while BNB, KNN, LR, GBDT, and SVC are relatively stable and not susceptible to imbalanced data. Especially, the KNN has the most stable classification performance. 
Also, the results are statistically confirmed by Friedman and Nemenyi post hoc statistical tests.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Imbalanced network traffic data</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data augmentation algorithms</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Standard classification algorithms</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Stable classification performance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ma, Kai</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Fei</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Hu, Xiaowen</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Yu, Qingying</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Guo, Liangmin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Chen, Fulong</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Soft Computing</subfield><subfield code="d">Springer-Verlag, 2003</subfield><subfield code="g">28(2023), 1 vom: 26. 
Okt., Seite 217-234</subfield><subfield code="w">(DE-627)SPR006469531</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:28</subfield><subfield code="g">year:2023</subfield><subfield code="g">number:1</subfield><subfield code="g">day:26</subfield><subfield code="g">month:10</subfield><subfield code="g">pages:217-234</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1007/s00500-023-09331-1</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">28</subfield><subfield code="j">2023</subfield><subfield code="e">1</subfield><subfield code="b">26</subfield><subfield code="c">10</subfield><subfield code="h">217-234</subfield></datafield></record></collection>
|
author |
Zheng, Ming |
spellingShingle |
Zheng, Ming misc Imbalanced network traffic data misc Data augmentation algorithms misc Standard classification algorithms misc Stable classification performance Which standard classification algorithm has more stable performance for imbalanced network traffic data? |
authorStr |
Zheng, Ming |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)SPR006469531 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut aut aut aut aut aut aut |
collection |
springer |
remote_str |
true |
illustrated |
Not Illustrated |
topic_title |
Which standard classification algorithm has more stable performance for imbalanced network traffic data? Imbalanced network traffic data (dpeaa)DE-He213 Data augmentation algorithms (dpeaa)DE-He213 Standard classification algorithms (dpeaa)DE-He213 Stable classification performance (dpeaa)DE-He213 |
topic |
misc Imbalanced network traffic data misc Data augmentation algorithms misc Standard classification algorithms misc Stable classification performance |
topic_unstemmed |
misc Imbalanced network traffic data misc Data augmentation algorithms misc Standard classification algorithms misc Stable classification performance |
topic_browse |
misc Imbalanced network traffic data misc Data augmentation algorithms misc Standard classification algorithms misc Stable classification performance |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Soft Computing |
hierarchy_parent_id |
SPR006469531 |
hierarchy_top_title |
Soft Computing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)SPR006469531 |
title |
Which standard classification algorithm has more stable performance for imbalanced network traffic data? |
ctrlnum |
(DE-627)SPR054258472 (SPR)s00500-023-09331-1-e |
title_full |
Which standard classification algorithm has more stable performance for imbalanced network traffic data? |
author_sort |
Zheng, Ming |
journal |
Soft Computing |
journalStr |
Soft Computing |
lang_code |
eng |
isOA_bool |
false |
recordtype |
marc |
publishDateSort |
2023 |
contenttype_str_mv |
txt |
container_start_page |
217 |
author_browse |
Zheng, Ming Ma, Kai Wang, Fei Hu, Xiaowen Yu, Qingying Guo, Liangmin Chen, Fulong |
container_volume |
28 |
format_se |
Elektronische Aufsätze |
author-letter |
Zheng, Ming |
doi_str_mv |
10.1007/s00500-023-09331-1 |
normlink |
(ORCID)0000-0001-9001-0859 |
normlink_prefix_str_mv |
(orcid)0000-0001-9001-0859 |
title_sort |
which standard classification algorithm has more stable performance for imbalanced network traffic data? |
title_auth |
Which standard classification algorithm has more stable performance for imbalanced network traffic data? |
abstract |
Abstract Most standard classification algorithms are difficult to effectively learn and predict from imbalanced network traffic data, which usually leads to lower classification accuracy. To analyze the influence of imbalanced network traffic data on the performance of standard classification algorithms, the imbalanced data augmentation algorithms are first designed to obtain the imbalanced network traffic data set with gradually varying Imbalance Ratio (IR) and belonging to the same distribution. Then, to obtain more objective classification result and simplify the evaluation process, the evaluation metric AFG is used to evaluate the classification performance of standard classification algorithms based on area under the receiver operating characteristic curve (AUC), F-measure and G-mean. Finally, based on AFG and coefficient of variation (CV), performance stability of standard classification algorithms on imbalanced network traffic data is obtained. Experiments of eight widely used standard classification algorithms on 25 different imbalanced network traffic data demonstrate that the classification performance of GNB, RF and DT is unstable, while BNB, KNN, LR, GBDT, and SVC are relatively stable and not susceptible to imbalanced data. Especially, the KNN has the most stable classification performance. Also, the results are statistically confirmed by Friedman and Nemenyi post hoc statistical tests. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER |
container_issue |
1 |
title_short |
Which standard classification algorithm has more stable performance for imbalanced network traffic data? |
url |
https://dx.doi.org/10.1007/s00500-023-09331-1 |
remote_bool |
true |
author2 |
Ma, Kai; Wang, Fei; Hu, Xiaowen; Yu, Qingying; Guo, Liangmin; Chen, Fulong |
ppnlink |
SPR006469531 |
mediatype_str_mv |
c |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s00500-023-09331-1 |
up_date |
2024-07-04T00:43:53.837Z |
_version_ |
1803607168870514688 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">SPR054258472</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240105064714.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">240105s2023 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s00500-023-09331-1</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR054258472</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s00500-023-09331-1-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Zheng, Ming</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0001-9001-0859</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Which standard classification algorithm has more stable performance for imbalanced network traffic data?</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2023</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Most standard classification algorithms are difficult to effectively learn and predict from imbalanced network traffic data, which usually leads to lower classification accuracy. To analyze the influence of imbalanced network traffic data on the performance of standard classification algorithms, the imbalanced data augmentation algorithms are first designed to obtain the imbalanced network traffic data set with gradually varying Imbalance Ratio (IR) and belonging to the same distribution. Then, to obtain more objective classification result and simplify the evaluation process, the evaluation metric AFG is used to evaluate the classification performance of standard classification algorithms based on area under the receiver operating characteristic curve (AUC), F-measure and G-mean. Finally, based on AFG and coefficient of variation (CV), performance stability of standard classification algorithms on imbalanced network traffic data is obtained. Experiments of eight widely used standard classification algorithms on 25 different imbalanced network traffic data demonstrate that the classification performance of GNB, RF and DT is unstable, while BNB, KNN, LR, GBDT, and SVC are relatively stable and not susceptible to imbalanced data. Especially, the KNN has the most stable classification performance. 
Also, the results are statistically confirmed by Friedman and Nemenyi post hoc statistical tests.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Imbalanced network traffic data</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data augmentation algorithms</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Standard classification algorithms</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Stable classification performance</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ma, Kai</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Fei</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Hu, Xiaowen</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Yu, Qingying</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Guo, Liangmin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Chen, Fulong</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Soft Computing</subfield><subfield code="d">Springer-Verlag, 2003</subfield><subfield code="g">28(2023), 1 vom: 26. 
Okt., Seite 217-234</subfield><subfield code="w">(DE-627)SPR006469531</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:28</subfield><subfield code="g">year:2023</subfield><subfield code="g">number:1</subfield><subfield code="g">day:26</subfield><subfield code="g">month:10</subfield><subfield code="g">pages:217-234</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1007/s00500-023-09331-1</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">28</subfield><subfield code="j">2023</subfield><subfield code="e">1</subfield><subfield code="b">26</subfield><subfield code="c">10</subfield><subfield code="h">217-234</subfield></datafield></record></collection>
|