Evaluation of a two-stage framework for prediction using big genomic data

We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies...

Author:	Jiang, Xia [author] Neapolitan, Richard E

Format:	Article
Language:	English

Published:	2015

Rights information:	Usage rights: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

Keywords:	Algorithms; Genomes; Bioinformatics; Bayesian analysis; Polymorphism

Parent work:	Contained in: Briefings in bioinformatics - Oxford : Oxford Univ. Press, 2000, 16(2015), 6, pages 912-921
Parent work:	volume:16 ; year:2015 ; number:6 ; pages:912-921

DOI / URN:	10.1093/bib/bbv010

Catalog ID:	OLC1960986619

Internal format


LEADER	01000caa a2200265 4500
001	OLC1960986619
003	DE-627
005	20230512151419.0
007	tu
008	160206s2015 xx \|\|\|\|\| 00\| \|\|eng c
024	7		\|a 10.1093/bib/bbv010 \|2 doi
028	5	2	\|a PQ20160617
035			\|a (DE-627)OLC1960986619
035			\|a (DE-599)GBVOLC1960986619
035			\|a (PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90
035			\|a (KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0	4	\|a 570 \|a 004 \|q DNB
100	1		\|a Jiang, Xia \|e verfasserin \|4 aut
245	1	0	\|a Evaluation of a two-stage framework for prediction using big genomic data
264		1	\|c 2015
336			\|a Text \|b txt \|2 rdacontent
337			\|a ohne Hilfsmittel zu benutzen \|b n \|2 rdamedia
338			\|a Band \|b nc \|2 rdacarrier
520			\|a We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.
540			\|a Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
650		4	\|a Algorithms
650		4	\|a Genomes
650		4	\|a Bioinformatics
650		4	\|a Bayesian analysis
650		4	\|a Polymorphism
700	1		\|a Neapolitan, Richard E \|4 oth
773	0	8	\|i Enthalten in \|t Briefings in bioinformatics \|d Oxford : Oxford Univ. Press, 2000 \|g 16(2015), 6, Seite 912-921 \|w (DE-627)341354120 \|w (DE-600)2068142-2 \|w (DE-576)098546627 \|x 1467-5463 \|7 nnns
773	1	8	\|g volume:16 \|g year:2015 \|g number:6 \|g pages:912-921
856	4	1	\|u http://dx.doi.org/10.1093/bib/bbv010 \|3 Volltext
856	4	2	\|u http://www.ncbi.nlm.nih.gov/pubmed/25788325
856	4	2	\|u http://search.proquest.com/docview/1753220350
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_OLC
912			\|a SSG-OLC-MAT
912			\|a SSG-OLC-PHA
912			\|a SSG-OLC-DE-84
951			\|a AR
952			\|d 16 \|j 2015 \|e 6 \|h 912-921

Index fields

author_variant	x j xj
matchkey_str	article:14675463:2015----::vlainftotgfaeokopeitou
hierarchy_sort_str	2015
publishDate	2015
allfields	10.1093/bib/bbv010 doi PQ20160617 (DE-627)OLC1960986619 (DE-599)GBVOLC1960986619 (PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90 (KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi DE-627 ger DE-627 rakwb eng 570 004 DNB Jiang, Xia verfasserin aut Evaluation of a two-stage framework for prediction using big genomic data 2015 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. 
Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods. Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com. Algorithms Genomes Bioinformatics Bayesian analysis Polymorphism Neapolitan, Richard E oth Enthalten in Briefings in bioinformatics Oxford : Oxford Univ. Press, 2000 16(2015), 6, Seite 912-921 (DE-627)341354120 (DE-600)2068142-2 (DE-576)098546627 1467-5463 nnns volume:16 year:2015 number:6 pages:912-921 http://dx.doi.org/10.1093/bib/bbv010 Volltext http://www.ncbi.nlm.nih.gov/pubmed/25788325 http://search.proquest.com/docview/1753220350 GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-PHA SSG-OLC-DE-84 AR 16 2015 6 912-921
language	English
source	Enthalten in Briefings in bioinformatics 16(2015), 6, Seite 912-921 volume:16 year:2015 number:6 pages:912-921
sourceStr	Enthalten in Briefings in bioinformatics 16(2015), 6, Seite 912-921 volume:16 year:2015 number:6 pages:912-921
format_phy_str_mv	Article
institution	findex.gbv.de
topic_facet	Algorithms Genomes Bioinformatics Bayesian analysis Polymorphism
dewey-raw	570
isfreeaccess_bool	false
container_title	Briefings in bioinformatics
authorswithroles_txt_mv	Jiang, Xia @@aut@@ Neapolitan, Richard E @@oth@@
publishDateDaySort_date	2015-01-01T00:00:00Z
hierarchy_top_id	341354120
dewey-sort	3570
id	OLC1960986619
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1960986619</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230512151419.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160206s2015 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1093/bib/bbv010</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160617</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">570</subfield><subfield code="a">004</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Jiang, Xia</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Evaluation of a two-stage framework for prediction using big genomic data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield 
code="c">2015</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. 
This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.</subfield></datafield><datafield tag="540" ind1=" " ind2=" "><subfield code="a">Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Genomes</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bioinformatics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bayesian analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Polymorphism</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Neapolitan, Richard E</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Briefings in bioinformatics</subfield><subfield code="d">Oxford : Oxford Univ. 
Press, 2000</subfield><subfield code="g">16(2015), 6, Seite 912-921</subfield><subfield code="w">(DE-627)341354120</subfield><subfield code="w">(DE-600)2068142-2</subfield><subfield code="w">(DE-576)098546627</subfield><subfield code="x">1467-5463</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:16</subfield><subfield code="g">year:2015</subfield><subfield code="g">number:6</subfield><subfield code="g">pages:912-921</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1093/bib/bbv010</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://www.ncbi.nlm.nih.gov/pubmed/25788325</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://search.proquest.com/docview/1753220350</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-DE-84</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">16</subfield><subfield code="j">2015</subfield><subfield code="e">6</subfield><subfield code="h">912-921</subfield></datafield></record></collection>
author	Jiang, Xia
spellingShingle	Jiang, Xia ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism Evaluation of a two-stage framework for prediction using big genomic data
authorStr	Jiang, Xia
ppnlink_with_tag_str_mv	@@773@@(DE-627)341354120
format	Article
dewey-ones	570 - Life sciences; biology 004 - Data processing & computer science
delete_txt_mv	keep
author_role	aut
collection	OLC
remote_str	false
illustrated	Not Illustrated
issn	1467-5463
topic_title	570 004 DNB Evaluation of a two-stage framework for prediction using big genomic data Algorithms Genomes Bioinformatics Bayesian analysis Polymorphism
topic	ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism
topic_unstemmed	ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism
topic_browse	ddc 570 misc Algorithms misc Genomes misc Bioinformatics misc Bayesian analysis misc Polymorphism
format_facet	Aufsätze Gedruckte Aufsätze
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	nc
author2_variant	r e n re ren
hierarchy_parent_title	Briefings in bioinformatics
hierarchy_parent_id	341354120
dewey-tens	570 - Life sciences; biology 000 - Computer science, knowledge & systems
hierarchy_top_title	Briefings in bioinformatics
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)341354120 (DE-600)2068142-2 (DE-576)098546627
title	Evaluation of a two-stage framework for prediction using big genomic data
ctrlnum	(DE-627)OLC1960986619 (DE-599)GBVOLC1960986619 (PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90 (KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi
title_full	Evaluation of a two-stage framework for prediction using big genomic data
author_sort	Jiang, Xia
journal	Briefings in bioinformatics
journalStr	Briefings in bioinformatics
lang_code	eng
isOA_bool	false
dewey-hundreds	500 - Science 000 - Computer science, information & general works
recordtype	marc
publishDateSort	2015
contenttype_str_mv	txt
container_start_page	912
author_browse	Jiang, Xia
container_volume	16
class	570 004 DNB
format_se	Aufsätze
author-letter	Jiang, Xia
doi_str_mv	10.1093/bib/bbv010
dewey-full	570 004
title_sort	evaluation of a two-stage framework for prediction using big genomic data
title_auth	Evaluation of a two-stage framework for prediction using big genomic data
abstract	We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.
collection_details	GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-PHA SSG-OLC-DE-84
container_issue	6
title_short	Evaluation of a two-stage framework for prediction using big genomic data
url	http://dx.doi.org/10.1093/bib/bbv010 http://www.ncbi.nlm.nih.gov/pubmed/25788325 http://search.proquest.com/docview/1753220350
remote_bool	false
author2	Neapolitan, Richard E
author2Str	Neapolitan, Richard E
ppnlink	341354120
mediatype_str_mv	n
isOA_txt	false
hochschulschrift_bool	false
author2_role	oth
doi_str	10.1093/bib/bbv010
up_date	2024-07-03T23:27:01.496Z
_version_	1803602332482535424
fullrecord_marcxml	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1960986619</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230512151419.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160206s2015 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1093/bib/bbv010</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160617</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1960986619</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)c1222-9de4f23792dcbf065f70ca547becc633cb8b4eb9ee57a93a4d64a44ad1a435d90</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0410330020150000016000600912evaluationofatwostageframeworkforpredictionusingbi</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">570</subfield><subfield code="a">004</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Jiang, Xia</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Evaluation of a two-stage framework for prediction using big genomic data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield 
code="c">2015</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. 
This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.</subfield></datafield><datafield tag="540" ind1=" " ind2=" "><subfield code="a">Nutzungsrecht: © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Genomes</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bioinformatics</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Bayesian analysis</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Polymorphism</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Neapolitan, Richard E</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Briefings in bioinformatics</subfield><subfield code="d">Oxford : Oxford Univ. 
Press, 2000</subfield><subfield code="g">16(2015), 6, Seite 912-921</subfield><subfield code="w">(DE-627)341354120</subfield><subfield code="w">(DE-600)2068142-2</subfield><subfield code="w">(DE-576)098546627</subfield><subfield code="x">1467-5463</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:16</subfield><subfield code="g">year:2015</subfield><subfield code="g">number:6</subfield><subfield code="g">pages:912-921</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1093/bib/bbv010</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://www.ncbi.nlm.nih.gov/pubmed/25788325</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://search.proquest.com/docview/1753220350</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-DE-84</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">16</subfield><subfield code="j">2015</subfield><subfield code="e">6</subfield><subfield code="h">912-921</subfield></datafield></record></collection>
