Missing value imputation strategies for metabolomics data
Detailed description
Author: |
Armitage, Emily Grace [author] |
---|
Format: |
Article |
---|---|
Language: |
English |
Published: |
2015 |
---|
Rights information: |
© 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim |
---|
Keywords: |
CE‐MS; Data; False‐discovery rate; k‐nearest neighbour; Imputation; Missing values; Metabolomics |
---|
Parent work: |
Contained in: Electrophoresis - Weinheim : Wiley-VCH, 1980, vol. 36 (2015), no. 24, pp. 3050-3060 |
---|---|
Links: |
http://dx.doi.org/10.1002/elps.201500352 (full text) |
---|
DOI / URN: |
10.1002/elps.201500352 |
---|
Catalog ID: |
OLC1958967114 |
---|
LEADER | 01000caa a2200265 4500 | ||
---|---|---|---|
001 | OLC1958967114 | ||
003 | DE-627 | ||
005 | 20230519020931.0 | ||
007 | tu | ||
008 | 160206s2015 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1002/elps.201500352 |2 doi | |
028 | 5 | 2 | |a PQ20160617 |
035 | |a (DE-627)OLC1958967114 | ||
035 | |a (DE-599)GBVOLC1958967114 | ||
035 | |a (PRQ)p927-b5b5a994d8613c28550d7e445dce004084555a05a4eb5508265a49738119ef863 | ||
035 | |a (KEY)0204026320150000036002403050missingvalueimputationstrategiesformetabolomicsdat | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 540 |a 570 |q DNB |
082 | 0 | 4 | |a 570 |q AVZ |
084 | |a BIODIV |2 fid | ||
084 | |a 35.29 |2 bkl | ||
100 | 1 | |a Armitage, Emily Grace |e author |4 aut | |
245 | 1 | 0 | |a Missing value imputation strategies for metabolomics data |
264 | 1 | |c 2015 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a unmediated |b n |2 rdamedia | ||
338 | |a volume |b nc |2 rdacarrier | ||
520 | |a Missing values can arise for different reasons, and depending on their origin they should be interpreted and handled in different ways. In this study, four imputation methods were compared with respect to their effects on the normality and variance of the data, on statistical significance, and on the approximation of a suitable threshold for accepting missing data as truly missing. Additionally, strategies for controlling the familywise error rate or the false discovery rate were evaluated, together with how each interacts with the different imputation strategies. Missing values were found to affect the normality and variance of the data, and k-nearest neighbour (KNN) imputation was the best of the tested methods at restoring them. Bonferroni correction was the best method for maximizing true positives while minimizing false positives, and it was observed that features with as little as 40% missing data could be truly missing. The range between 40% and 70% missing values was defined as a “gray area”; therefore, a strategy is proposed that balances the optimal imputation strategy, KNN, against the best approximation of where real zeros lie. | ||
540 | |a Usage rights: © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim | ||
650 | 4 | |a CE‐MS | |
650 | 4 | |a Data | |
650 | 4 | |a False‐discovery rate | |
650 | 4 | |a k‐nearest neighbour | |
650 | 4 | |a Imputation | |
650 | 4 | |a Missing values | |
650 | 4 | |a Metabolomics | |
700 | 1 | |a Godzien, Joanna |4 oth | |
700 | 1 | |a Alonso‐Herranz, Vanesa |4 oth | |
700 | 1 | |a López‐Gonzálvez, Ángeles |4 oth | |
700 | 1 | |a Barbas, Coral |4 oth | |
773 | 0 | 8 | |i Contained in |t Electrophoresis |d Weinheim : Wiley-VCH, 1980 |g 36(2015), 24, pages 3050-3060 |w (DE-627)130409952 |w (DE-600)619001-7 |w (DE-576)015913732 |x 0173-0835 |7 nnns |
773 | 1 | 8 | |g volume:36 |g year:2015 |g number:24 |g pages:3050-3060 |
856 | 4 | 1 | |u http://dx.doi.org/10.1002/elps.201500352 |3 Full text |
856 | 4 | 2 | |u http://onlinelibrary.wiley.com/doi/10.1002/elps.201500352/abstract |
856 | 4 | 2 | |u http://www.ncbi.nlm.nih.gov/pubmed/26376450 |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a FID-BIODIV | ||
912 | |a SSG-OLC-TEC | ||
912 | |a SSG-OLC-CHE | ||
912 | |a SSG-OLC-PHA | ||
912 | |a SSG-OLC-DE-84 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_267 | ||
912 | |a GBV_ILN_2018 | ||
912 | |a GBV_ILN_2219 | ||
912 | |a GBV_ILN_4012 | ||
936 | b | k | |a 35.29 |q AVZ |
951 | |a AR | ||
952 | |d 36 |j 2015 |e 24 |h 3050-3060 |
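The abstract names three technical ingredients: KNN imputation, Bonferroni versus false-discovery-rate correction, and 40% / 70% missing-value bounds that separate imputable gaps from a “gray area” and from likely real zeros. The following is a minimal pure-Python sketch of those ideas under stated assumptions; it is not the authors' implementation. All function names are hypothetical, the NaN-aware distance rescaling in the KNN step is an assumed convention, Benjamini-Hochberg is assumed as the FDR procedure (the abstract does not name one), and labelling features above 70% missing as “truly missing”, as well as the handling of the exact 40% and 70% boundaries, are inferences rather than rules stated in the record.

```python
import math
from typing import List, Optional

# One sample (row) of a samples x features matrix; None marks a missing value.
Row = List[Optional[float]]


def missing_fraction(data: List[Row], j: int) -> float:
    """Fraction of samples in which feature j is missing."""
    return sum(row[j] is None for row in data) / len(data)


def classify_feature(frac: float, low: float = 0.40, high: float = 0.70) -> str:
    """Hypothetical triage rule built on the 40% / 70% bounds quoted in the abstract.

    Boundary handling and the 'truly missing' label above `high` are assumptions.
    """
    if frac < low:
        return "impute"         # gaps most likely technical: fill them (e.g. KNN)
    if frac <= high:
        return "gray area"      # could be imputable gaps or real zeros
    return "truly missing"      # most likely absent: candidate real zeros


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing cell with the mean of that feature over the k nearest samples.

    Distances use only features observed in both samples, rescaled by
    n_features / n_shared (an assumed, commonly used convention).
    """
    n_feat = len(data[0])

    def dist(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        sq = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(sq * n_feat / len(shared))

    out: List[Row] = []
    for i, row in enumerate(data):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other samples that observed feature j, nearest first.
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not None
            )[:k]
            if not donors:
                raise ValueError(f"feature {j} is missing in every sample")
            filled[j] = sum(v for _, v in donors) / len(donors)
        out.append(filled)
    return out


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0_i if p_i <= alpha / m; controls the familywise error rate."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; controls the false discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank whose p-value clears its threshold rank * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject
```

For example, with `data = [[1.0, 2.0], [1.1, 2.2], [5.0, 6.0], [0.9, None]]`, feature 1 is missing in 25% of samples and is classified as `"impute"`; `knn_impute(data, k=2)` fills the gap with roughly 2.1, the mean of the two closest samples. For the p-values `[0.001, 0.02, 0.03, 0.2]`, `bonferroni` rejects only the first hypothesis while `benjamini_hochberg` rejects the first three, illustrating that FWER control is generally stricter than FDR control. The abstract reports that Bonferroni performed best on the authors' data; this toy example does not attempt to reproduce that comparison.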
author |
Armitage, Emily Grace |
spellingShingle |
Armitage, Emily Grace ddc 540 ddc 570 fid BIODIV bkl 35.29 misc CE‐MS misc Data misc False‐discovery rate misc k‐nearest neighbour misc Imputation misc Missing values misc Metabolomics Missing value imputation strategies for metabolomics data |
authorStr |
Armitage, Emily Grace |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)130409952 |
format |
Article |
dewey-ones |
540 - Chemistry & allied sciences 570 - Life sciences; biology |
delete_txt_mv |
keep |
author_role |
aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0173-0835 |
topic_title |
540 570 DNB 570 AVZ BIODIV fid 35.29 bkl Missing value imputation strategies for metabolomics data CE‐MS Data False‐discovery rate k‐nearest neighbour Imputation Missing values Metabolomics |
topic |
ddc 540 ddc 570 fid BIODIV bkl 35.29 misc CE‐MS misc Data misc False‐discovery rate misc k‐nearest neighbour misc Imputation misc Missing values misc Metabolomics |
topic_unstemmed |
ddc 540 ddc 570 fid BIODIV bkl 35.29 misc CE‐MS misc Data misc False‐discovery rate misc k‐nearest neighbour misc Imputation misc Missing values misc Metabolomics |
topic_browse |
ddc 540 ddc 570 fid BIODIV bkl 35.29 misc CE‐MS misc Data misc False‐discovery rate misc k‐nearest neighbour misc Imputation misc Missing values misc Metabolomics |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
author2_variant |
j g jg v a va á l ál c b cb |
hierarchy_parent_title |
Electrophoresis |
hierarchy_parent_id |
130409952 |
dewey-tens |
540 - Chemistry 570 - Life sciences; biology |
hierarchy_top_title |
Electrophoresis |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)130409952 (DE-600)619001-7 (DE-576)015913732 |
title |
Missing value imputation strategies for metabolomics data |
ctrlnum |
(DE-627)OLC1958967114 (DE-599)GBVOLC1958967114 (PRQ)p927-b5b5a994d8613c28550d7e445dce004084555a05a4eb5508265a49738119ef863 (KEY)0204026320150000036002403050missingvalueimputationstrategiesformetabolomicsdat |
title_full |
Missing value imputation strategies for metabolomics data |
author_sort |
Armitage, Emily Grace |
journal |
Electrophoresis |
journalStr |
Electrophoresis |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
500 - Science |
recordtype |
marc |
publishDateSort |
2015 |
contenttype_str_mv |
txt |
container_start_page |
3050 |
author_browse |
Armitage, Emily Grace |
container_volume |
36 |
class |
540 570 DNB 570 AVZ BIODIV fid 35.29 bkl |
format_se |
Aufsätze |
author-letter |
Armitage, Emily Grace |
doi_str_mv |
10.1002/elps.201500352 |
dewey-full |
540 570 |
title_sort |
missing value imputation strategies for metabolomics data |
title_auth |
Missing value imputation strategies for metabolomics data |
abstract |
The origin of missing values can be caused by different reasons and depending on these origins missing values should be considered differently and dealt with in different ways. In this research, four methods of imputation have been compared with respect to revealing their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, the effects of different strategies for controlling familywise error rate or false discovery and how they work with the different strategies for missing value imputation have been evaluated. Missing values were found to affect normality and variance of data and k‐means nearest neighbour imputation was the best method tested for restoring this. Bonferroni correction was the best method for maximizing true positives and minimizing false positives and it was observed that as low as 40% missing data could be truly missing. The range between 40 and 70% missing values was defined as a “gray area” and therefore a strategy has been proposed that provides a balance between the optimal imputation strategy that was k‐means nearest neighbor and the best approximation of positioning real zeros. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC FID-BIODIV SSG-OLC-TEC SSG-OLC-CHE SSG-OLC-PHA SSG-OLC-DE-84 GBV_ILN_70 GBV_ILN_267 GBV_ILN_2018 GBV_ILN_2219 GBV_ILN_4012 |
container_issue |
24 |
title_short |
Missing value imputation strategies for metabolomics data |
url |
http://dx.doi.org/10.1002/elps.201500352 http://onlinelibrary.wiley.com/doi/10.1002/elps.201500352/abstract http://www.ncbi.nlm.nih.gov/pubmed/26376450 |
remote_bool |
false |
author2 |
Godzien, Joanna; Alonso‐Herranz, Vanesa; López‐Gonzálvez, Ángeles; Barbas, Coral
ppnlink |
130409952 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth oth oth oth |
doi_str |
10.1002/elps.201500352 |
up_date |
2024-07-03T15:15:53.794Z |
_version_ |
1803571433356394496 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1958967114</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230519020931.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160206s2015 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1002/elps.201500352</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160617</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1958967114</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1958967114</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)p927-b5b5a994d8613c28550d7e445dce004084555a05a4eb5508265a49738119ef863</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0204026320150000036002403050missingvalueimputationstrategiesformetabolomicsdat</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">540</subfield><subfield code="a">570</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">570</subfield><subfield code="q">AVZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">BIODIV</subfield><subfield code="2">fid</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">35.29</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield 
code="a">Armitage, Emily Grace</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Missing value imputation strategies for metabolomics data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2015</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">The origin of missing values can be caused by different reasons and depending on these origins missing values should be considered differently and dealt with in different ways. In this research, four methods of imputation have been compared with respect to revealing their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, the effects of different strategies for controlling familywise error rate or false discovery and how they work with the different strategies for missing value imputation have been evaluated. Missing values were found to affect normality and variance of data and k‐means nearest neighbour imputation was the best method tested for restoring this. Bonferroni correction was the best method for maximizing true positives and minimizing false positives and it was observed that as low as 40% missing data could be truly missing. 
The range between 40 and 70% missing values was defined as a “gray area” and therefore a strategy has been proposed that provides a balance between the optimal imputation strategy that was k‐means nearest neighbor and the best approximation of positioning real zeros.</subfield></datafield><datafield tag="540" ind1=" " ind2=" "><subfield code="a">Nutzungsrecht: © 2015 WILEY‐VCH Verlag GmbH &amp; Co. KGaA, Weinheim</subfield></datafield><datafield tag="540" ind1=" " ind2=" "><subfield code="a">© 2015 WILEY-VCH Verlag GmbH &amp; Co. KGaA, Weinheim.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">CE‐MS</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">False‐discovery rate</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">k‐nearest neighbour</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Imputation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Missing values</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Metabolomics</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Godzien, Joanna</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Alonso‐Herranz, Vanesa</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">López‐Gonzálvez, Ángeles</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Barbas, Coral</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Electrophoresis</subfield><subfield code="d">Weinheim : Wiley-VCH, 1980</subfield><subfield code="g">36(2015), 24, Seite 3050-3060</subfield><subfield 
code="w">(DE-627)130409952</subfield><subfield code="w">(DE-600)619001-7</subfield><subfield code="w">(DE-576)015913732</subfield><subfield code="x">0173-0835</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:36</subfield><subfield code="g">year:2015</subfield><subfield code="g">number:24</subfield><subfield code="g">pages:3050-3060</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1002/elps.201500352</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://onlinelibrary.wiley.com/doi/10.1002/elps.201500352/abstract</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://www.ncbi.nlm.nih.gov/pubmed/26376450</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">FID-BIODIV</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-CHE</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-DE-84</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_267</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2018</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2219</subfield></datafield><datafield tag="912" ind1=" " ind2=" 
"><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">35.29</subfield><subfield code="q">AVZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">36</subfield><subfield code="j">2015</subfield><subfield code="e">24</subfield><subfield code="h">3050-3060</subfield></datafield></record></collection>