Evaluation of a two-stage framework for prediction using big genomic data
We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies...
Detailed description
Author: |
Jiang, Xia [author] |
---|
Format: |
Article |
---|---|
Language: |
English |
Published: |
2015 |
---|
Rights information: |
Usage rights: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com. |
---|
Keywords: |
---|
Parent work: |
Contained in: Briefings in bioinformatics - Oxford : Oxford Univ. Press, 2000, 16(2015), 6, pages 912-921 |
---|---|
Parent work: |
volume:16 ; year:2015 ; number:6 ; pages:912-921 |
Links: |
---|
DOI / URN: |
10.1093/bib/bbv010 |
---|
Catalog ID: |
OLC1960986619 |
---|
LEADER | 01000caa a2200265 4500 | ||
---|---|---|---|
001 | OLC1960986619 | ||
003 | DE-627 | ||
005 | 20230512151419.0 | ||
007 | tu | ||
008 | 160206s2015 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1093/bib/bbv010 |2 doi | |
028 | 5 | 2 | |a PQ20160617 |
035 | |a (DE-627)OLC1960986619 | ||
035 | |a (DE-599)GBVOLC1960986619 | ||
035 | |a (PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90 | ||
035 | |a (KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 570 |a 004 |q DNB |
100 | 1 | |a Jiang, Xia |e verfasserin |4 aut | |
245 | 1 | 0 | |a Evaluation of a two-stage framework for prediction using big genomic data |
264 | 1 | |c 2015 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
520 | |a We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods. | ||
540 | |a Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com. | ||
650 | 4 | |a Algorithms | |
650 | 4 | |a Genomes | |
650 | 4 | |a Bioinformatics | |
650 | 4 | |a Bayesian analysis | |
650 | 4 | |a Polymorphism | |
700 | 1 | |a Neapolitan, Richard E |4 oth | |
773 | 0 | 8 | |i Enthalten in |t Briefings in bioinformatics |d Oxford : Oxford Univ. Press, 2000 |g 16(2015), 6, Seite 912-921 |w (DE-627)341354120 |w (DE-600)2068142-2 |w (DE-576)098546627 |x 1467-5463 |7 nnns |
773 | 1 | 8 | |g volume:16 |g year:2015 |g number:6 |g pages:912-921 |
856 | 4 | 1 | |u http://dx.doi.org/10.1093/bib/bbv010 |3 Volltext |
856 | 4 | 2 | |u http://www.ncbi.nlm.nih.gov/pubmed/25788325 |
856 | 4 | 2 | |u http://search.proquest.com/docview/1753220350 |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a SSG-OLC-PHA | ||
912 | |a SSG-OLC-DE-84 | ||
951 | |a AR | ||
952 | |d 16 |j 2015 |e 6 |h 912-921 |
author_variant |
x j xj |
---|---|
matchkey_str |
article:14675463:2015----::vlainftotgfaeokopeitou |
hierarchy_sort_str |
2015 |
publishDate |
2015 |
allfields |
10.1093/bib/bbv010 doi PQ20160617 (DE-627)OLC1960986619 (DE-599)GBVOLC1960986619 (PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90 (KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi DE-627 ger DE-627 rakwb eng 570 004 DNB Jiang, Xia verfasserin aut Evaluation of a two-stage framework for prediction using big genomic data 2015 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. 
Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods. Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com. Algorithms Genomes Bioinformatics Bayesian analysis Polymorphism Neapolitan, Richard E oth Enthalten in Briefings in bioinformatics Oxford : Oxford Univ. Press, 2000 16(2015), 6, Seite 912-921 (DE-627)341354120 (DE-600)2068142-2 (DE-576)098546627 1467-5463 nnns volume:16 year:2015 number:6 pages:912-921 http://dx.doi.org/10.1093/bib/bbv010 Volltext http://www.ncbi.nlm.nih.gov/pubmed/25788325 http://search.proquest.com/docview/1753220350 GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-PHA SSG-OLC-DE-84 AR 16 2015 6 912-921 |
language |
English |
source |
Contained in Briefings in bioinformatics 16(2015), 6, pages 912-921 volume:16 year:2015 number:6 pages:912-921 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Algorithms Genomes Bioinformatics Bayesian analysis Polymorphism |
dewey-raw |
570 |
isfreeaccess_bool |
false |
container_title |
Briefings in bioinformatics |
authorswithroles_txt_mv |
Jiang, Xia @@aut@@ Neapolitan, Richard E @@oth@@ |
publishDateDaySort_date |
2015-01-01T00:00:00Z |
hierarchy_top_id |
341354120 |
dewey-sort |
3570 |
id |
OLC1960986619 |
language_de |
English |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1960986619</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230512151419.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160206s2015 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1093/bib/bbv010</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160617</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">570</subfield><subfield code="a">004</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Jiang, Xia</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Evaluation of a two-stage framework for prediction using big genomic data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2015</subfield></datafield><datafield 
tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. 
Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.</subfield></datafield><datafield tag="540" ind1=" " ind2=" "><subfield code="a">Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Genomes</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bioinformatics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bayesian analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Polymorphism</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Neapolitan, Richard E</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Briefings in bioinformatics</subfield><subfield code="d">Oxford : Oxford Univ. 
Press, 2000</subfield><subfield code="g">16(2015), 6, Seite 912-921</subfield><subfield code="w">(DE-627)341354120</subfield><subfield code="w">(DE-600)2068142-2</subfield><subfield code="w">(DE-576)098546627</subfield><subfield code="x">1467-5463</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:16</subfield><subfield code="g">year:2015</subfield><subfield code="g">number:6</subfield><subfield code="g">pages:912-921</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1093/bib/bbv010</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://www.ncbi.nlm.nih.gov/pubmed/25788325</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://search.proquest.com/docview/1753220350</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-DE-84</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">16</subfield><subfield code="j">2015</subfield><subfield code="e">6</subfield><subfield code="h">912-921</subfield></datafield></record></collection>
|
author |
Jiang, Xia |
spellingShingle |
Jiang, Xia ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism Evaluation of a two-stage framework for prediction using big genomic data |
authorStr |
Jiang, Xia |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)341354120 |
format |
Article |
dewey-ones |
570 - Life sciences; biology 004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
1467-5463 |
topic_title |
570 004 DNB Evaluation of a two-stage framework for prediction using big genomic data Algorithms Genomes Bioinformatics Bayesian analysis Polymorphism |
topic |
ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism |
topic_unstemmed |
ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism |
topic_browse |
ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism |
format_facet |
Articles Printed articles |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
nc |
author2_variant |
r e n re ren |
hierarchy_parent_title |
Briefings in bioinformatics |
hierarchy_parent_id |
341354120 |
dewey-tens |
570 - Life sciences; biology 000 - Computer science, knowledge & systems |
hierarchy_top_title |
Briefings in bioinformatics |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)341354120 (DE-600)2068142-2 (DE-576)098546627 |
title |
Evaluation of a two-stage framework for prediction using big genomic data |
ctrlnum |
(DE-627)OLC1960986619 (DE-599)GBVOLC1960986619 (PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90 (KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi |
title_full |
Evaluation of a two-stage framework for prediction using big genomic data |
author_sort |
Jiang, Xia |
journal |
Briefings in bioinformatics |
journalStr |
Briefings in bioinformatics |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
500 - Science 000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2015 |
contenttype_str_mv |
txt |
container_start_page |
912 |
author_browse |
Jiang, Xia |
container_volume |
16 |
class |
570 004 DNB |
format_se |
Articles |
author-letter |
Jiang, Xia |
doi_str_mv |
10.1093/bib/bbv010 |
dewey-full |
570 004 |
title_sort |
evaluation of a two-stage framework for prediction using big genomic data |
title_auth |
Evaluation of a two-stage framework for prediction using big genomic data |
abstract |
We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods. |
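The two-stage idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the ReliefF variant is simplified (binary target, Hamming distance, a fixed number of nearest hits and misses), a naive Bayes classifier stands in for the second-stage predictor as a very simple Bayesian-network-style model (it does not use the Bayesian Dirichlet Equivalent Uniform score), and the data are synthetic binary features rather than real SNP genotypes. The names `relieff_scores`, `fit_naive_bayes`, and `make_data` are hypothetical.

```python
import math
import random
from collections import Counter


def relieff_scores(X, y, n_neighbors=3):
    """Simplified ReliefF weights for discrete features and a binary target."""
    n, m = len(X), len(X[0])
    scores = [0.0] * m
    for i in range(n):
        # Rank all other samples by Hamming distance to sample i.
        by_distance = sorted(
            (sum(a != b for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        hits = [j for _, j in by_distance if y[j] == y[i]][:n_neighbors]
        misses = [j for _, j in by_distance if y[j] != y[i]][:n_neighbors]
        for f in range(m):
            # A useful feature agrees with near hits and differs from near misses.
            if hits:
                scores[f] -= sum(X[i][f] != X[j][f] for j in hits) / (len(hits) * n)
            if misses:
                scores[f] += sum(X[i][f] != X[j][f] for j in misses) / (len(misses) * n)
    return scores


def fit_naive_bayes(X, y, features, alpha=1.0):
    """Naive Bayes on the selected feature indices, with Laplace smoothing."""
    classes = sorted(set(y))
    class_count = Counter(y)
    n_values = {f: len({row[f] for row in X}) for f in features}
    counts = {(c, f): Counter() for c in classes for f in features}
    for row, c in zip(X, y):
        for f in features:
            counts[(c, f)][row[f]] += 1

    def predict(row):
        def log_posterior(c):
            lp = math.log(class_count[c] / len(y))
            for f in features:
                lp += math.log((counts[(c, f)][row[f]] + alpha)
                               / (class_count[c] + alpha * n_values[f]))
            return lp
        return max(classes, key=log_posterior)

    return predict


# Synthetic data (assumption): 10 binary features, outcome driven by feature 0 only.
rng = random.Random(0)


def make_data(n_samples):
    X = [[rng.randint(0, 1) for _ in range(10)] for _ in range(n_samples)]
    return X, [row[0] for row in X]


X_train, y_train = make_data(120)
X_test, y_test = make_data(60)

# Stage 1: rank features with ReliefF and keep the top 3.
scores = relieff_scores(X_train, y_train)
ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
selected = ranked[:3]

# Stage 2: fit the classifier on the selected features only.
predict = fit_naive_bayes(X_train, y_train, selected)
accuracy = sum(predict(r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(selected, accuracy)
```

Skipping stage 1 amounts to passing every feature index to `fit_naive_bayes`; comparing the two accuracies is the kind of with-and-without-ReliefF comparison the abstract reports, though on real high-dimensional data the outcome will depend on the predictor used.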
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-PHA SSG-OLC-DE-84 |
container_issue |
6 |
title_short |
Evaluation of a two-stage framework for prediction using big genomic data |
url |
http://dx.doi.org/10.1093/bib/bbv010 http://www.ncbi.nlm.nih.gov/pubmed/25788325 http://search.proquest.com/docview/1753220350 |
remote_bool |
false |
author2 |
Neapolitan, Richard E |
author2Str |
Neapolitan, Richard E |
ppnlink |
341354120 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth |
doi_str |
10.1093/bib/bbv010 |
up_date |
2024-07-03T23:27:01.496Z |
_version_ |
1803602332482535424 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1960986619</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230512151419.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160206s2015 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1093/bib/bbv010</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160617</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">570</subfield><subfield code="a">004</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Jiang, Xia</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Evaluation of a two-stage framework for prediction using big genomic data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2015</subfield></datafield><datafield 
tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. 
Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.</subfield></datafield><datafield tag="540" ind1=" " ind2=" "><subfield code="a">Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Genomes</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bioinformatics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bayesian analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Polymorphism</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Neapolitan, Richard E</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Briefings in bioinformatics</subfield><subfield code="d">Oxford : Oxford Univ. 
Press, 2000</subfield><subfield code="g">16(2015), 6, Seite 912-921</subfield><subfield code="w">(DE-627)341354120</subfield><subfield code="w">(DE-600)2068142-2</subfield><subfield code="w">(DE-576)098546627</subfield><subfield code="x">1467-5463</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:16</subfield><subfield code="g">year:2015</subfield><subfield code="g">number:6</subfield><subfield code="g">pages:912-921</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1093/bib/bbv010</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://www.ncbi.nlm.nih.gov/pubmed/25788325</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://search.proquest.com/docview/1753220350</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-DE-84</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">16</subfield><subfield code="j">2015</subfield><subfield code="e">6</subfield><subfield code="h">912-921</subfield></datafield></record></collection>
|
score |
7.398793 |