Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies
Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been prop...
Ausführliche Beschreibung
Autor*in: |
Manduchi, Elisabetta [verfasserIn] |
---|
Format: |
E-Artikel |
---|---|
Sprache: |
Englisch |
Erschienen: |
2019 |
---|
Schlagwörter: |
---|
Anmerkung: |
© The Author(s). 2019 |
---|
Übergeordnetes Werk: |
Enthalten in: BioData Mining - London : BioMed Central, 2008, 12(2019), 1 vom: 09. Juli |
---|---|
Übergeordnetes Werk: |
volume:12 ; year:2019 ; number:1 ; day:09 ; month:07 |
Links: |
---|
DOI / URN: |
10.1186/s13040-019-0201-4 |
---|
Katalog-ID: |
SPR029588634 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | SPR029588634 | ||
003 | DE-627 | ||
005 | 20230519154636.0 | ||
007 | cr uuu---uuuuu | ||
008 | 201007s2019 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/s13040-019-0201-4 |2 doi | |
035 | |a (DE-627)SPR029588634 | ||
035 | |a (SPR)s13040-019-0201-4-e | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Manduchi, Elisabetta |e verfasserin |0 (orcid)0000-0002-4110-3714 |4 aut | |
245 | 1 | 0 | |a Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies |
264 | 1 | |c 2019 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a © The Author(s). 2019 | ||
520 | |a Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. | ||
650 | 4 | |a GWAS |7 (dpeaa)DE-He213 | |
650 | 4 | |a Association analysis |7 (dpeaa)DE-He213 | |
650 | 4 | |a Univariate analysis |7 (dpeaa)DE-He213 | |
650 | 4 | |a Ranked list |7 (dpeaa)DE-He213 | |
650 | 4 | |a Canberra metric |7 (dpeaa)DE-He213 | |
700 | 1 | |a Orzechowski, Patryk R. |0 (orcid)0000-0003-3578-9809 |4 aut | |
700 | 1 | |a Ritchie, Marylyn D. |0 (orcid)0000-0002-1208-1720 |4 aut | |
700 | 1 | |a Moore, Jason H. |0 (orcid)0000-0002-5015-1099 |4 aut | |
773 | 0 | 8 | |i Enthalten in |t BioData Mining |d London : BioMed Central, 2008 |g 12(2019), 1 vom: 09. Juli |w (DE-627)572421893 |w (DE-600)2438773-3 |x 1756-0381 |7 nnns |
773 | 1 | 8 | |g volume:12 |g year:2019 |g number:1 |g day:09 |g month:07 |
856 | 4 | 0 | |u https://dx.doi.org/10.1186/s13040-019-0201-4 |z kostenfrei |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_SPRINGER | ||
912 | |a SSG-OLC-PHA | ||
912 | |a GBV_ILN_11 | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_31 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_74 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_170 | ||
912 | |a GBV_ILN_206 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_2003 | ||
912 | |a GBV_ILN_2005 | ||
912 | |a GBV_ILN_2009 | ||
912 | |a GBV_ILN_2011 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_2055 | ||
912 | |a GBV_ILN_2111 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4326 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4700 | ||
951 | |a AR | ||
952 | |d 12 |j 2019 |e 1 |b 09 |c 07 |
author_variant |
e m em p r o pr pro m d r md mdr j h m jh jhm |
---|---|
matchkey_str |
article:17560381:2019----::xlrtooaiestocmuainlnsaitclesrsfsoitof |
hierarchy_sort_str |
2019 |
publishDate |
2019 |
allfields |
10.1186/s13040-019-0201-4 doi (DE-627)SPR029588634 (SPR)s13040-019-0201-4-e DE-627 ger DE-627 rakwb eng Manduchi, Elisabetta verfasserin (orcid)0000-0002-4110-3714 aut Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s). 2019 Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. GWAS (dpeaa)DE-He213 Association analysis (dpeaa)DE-He213 Univariate analysis (dpeaa)DE-He213 Ranked list (dpeaa)DE-He213 Canberra metric (dpeaa)DE-He213 Orzechowski, Patryk R. (orcid)0000-0003-3578-9809 aut Ritchie, Marylyn D. (orcid)0000-0002-1208-1720 aut Moore, Jason H. (orcid)0000-0002-5015-1099 aut Enthalten in BioData Mining London : BioMed Central, 2008 12(2019), 1 vom: 09. Juli (DE-627)572421893 (DE-600)2438773-3 1756-0381 nnns volume:12 year:2019 number:1 day:09 month:07 https://dx.doi.org/10.1186/s13040-019-0201-4 kostenfrei Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 12 2019 1 09 07 |
spelling |
10.1186/s13040-019-0201-4 doi (DE-627)SPR029588634 (SPR)s13040-019-0201-4-e DE-627 ger DE-627 rakwb eng Manduchi, Elisabetta verfasserin (orcid)0000-0002-4110-3714 aut Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s). 2019 Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. GWAS (dpeaa)DE-He213 Association analysis (dpeaa)DE-He213 Univariate analysis (dpeaa)DE-He213 Ranked list (dpeaa)DE-He213 Canberra metric (dpeaa)DE-He213 Orzechowski, Patryk R. (orcid)0000-0003-3578-9809 aut Ritchie, Marylyn D. (orcid)0000-0002-1208-1720 aut Moore, Jason H. (orcid)0000-0002-5015-1099 aut Enthalten in BioData Mining London : BioMed Central, 2008 12(2019), 1 vom: 09. Juli (DE-627)572421893 (DE-600)2438773-3 1756-0381 nnns volume:12 year:2019 number:1 day:09 month:07 https://dx.doi.org/10.1186/s13040-019-0201-4 kostenfrei Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 12 2019 1 09 07 |
allfields_unstemmed |
10.1186/s13040-019-0201-4 doi (DE-627)SPR029588634 (SPR)s13040-019-0201-4-e DE-627 ger DE-627 rakwb eng Manduchi, Elisabetta verfasserin (orcid)0000-0002-4110-3714 aut Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s). 2019 Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. GWAS (dpeaa)DE-He213 Association analysis (dpeaa)DE-He213 Univariate analysis (dpeaa)DE-He213 Ranked list (dpeaa)DE-He213 Canberra metric (dpeaa)DE-He213 Orzechowski, Patryk R. (orcid)0000-0003-3578-9809 aut Ritchie, Marylyn D. (orcid)0000-0002-1208-1720 aut Moore, Jason H. (orcid)0000-0002-5015-1099 aut Enthalten in BioData Mining London : BioMed Central, 2008 12(2019), 1 vom: 09. Juli (DE-627)572421893 (DE-600)2438773-3 1756-0381 nnns volume:12 year:2019 number:1 day:09 month:07 https://dx.doi.org/10.1186/s13040-019-0201-4 kostenfrei Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 12 2019 1 09 07 |
allfieldsGer |
10.1186/s13040-019-0201-4 doi (DE-627)SPR029588634 (SPR)s13040-019-0201-4-e DE-627 ger DE-627 rakwb eng Manduchi, Elisabetta verfasserin (orcid)0000-0002-4110-3714 aut Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s). 2019 Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. GWAS (dpeaa)DE-He213 Association analysis (dpeaa)DE-He213 Univariate analysis (dpeaa)DE-He213 Ranked list (dpeaa)DE-He213 Canberra metric (dpeaa)DE-He213 Orzechowski, Patryk R. (orcid)0000-0003-3578-9809 aut Ritchie, Marylyn D. (orcid)0000-0002-1208-1720 aut Moore, Jason H. (orcid)0000-0002-5015-1099 aut Enthalten in BioData Mining London : BioMed Central, 2008 12(2019), 1 vom: 09. Juli (DE-627)572421893 (DE-600)2438773-3 1756-0381 nnns volume:12 year:2019 number:1 day:09 month:07 https://dx.doi.org/10.1186/s13040-019-0201-4 kostenfrei Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 12 2019 1 09 07 |
allfieldsSound |
10.1186/s13040-019-0201-4 doi (DE-627)SPR029588634 (SPR)s13040-019-0201-4-e DE-627 ger DE-627 rakwb eng Manduchi, Elisabetta verfasserin (orcid)0000-0002-4110-3714 aut Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies 2019 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s). 2019 Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. GWAS (dpeaa)DE-He213 Association analysis (dpeaa)DE-He213 Univariate analysis (dpeaa)DE-He213 Ranked list (dpeaa)DE-He213 Canberra metric (dpeaa)DE-He213 Orzechowski, Patryk R. (orcid)0000-0003-3578-9809 aut Ritchie, Marylyn D. (orcid)0000-0002-1208-1720 aut Moore, Jason H. (orcid)0000-0002-5015-1099 aut Enthalten in BioData Mining London : BioMed Central, 2008 12(2019), 1 vom: 09. Juli (DE-627)572421893 (DE-600)2438773-3 1756-0381 nnns volume:12 year:2019 number:1 day:09 month:07 https://dx.doi.org/10.1186/s13040-019-0201-4 kostenfrei Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 12 2019 1 09 07 |
language |
English |
source |
Enthalten in BioData Mining 12(2019), 1 vom: 09. Juli volume:12 year:2019 number:1 day:09 month:07 |
sourceStr |
Enthalten in BioData Mining 12(2019), 1 vom: 09. Juli volume:12 year:2019 number:1 day:09 month:07 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
GWAS Association analysis Univariate analysis Ranked list Canberra metric |
isfreeaccess_bool |
true |
container_title |
BioData Mining |
authorswithroles_txt_mv |
Manduchi, Elisabetta @@aut@@ Orzechowski, Patryk R. @@aut@@ Ritchie, Marylyn D. @@aut@@ Moore, Jason H. @@aut@@ |
publishDateDaySort_date |
2019-07-09T00:00:00Z |
hierarchy_top_id |
572421893 |
id |
SPR029588634 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">SPR029588634</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230519154636.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">201007s2019 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1186/s13040-019-0201-4</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR029588634</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s13040-019-0201-4-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Manduchi, Elisabetta</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-4110-3714</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2019</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s). 2019</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">GWAS</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Association analysis</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Univariate analysis</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Ranked list</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Canberra metric</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Orzechowski, Patryk R.</subfield><subfield code="0">(orcid)0000-0003-3578-9809</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ritchie, Marylyn D.</subfield><subfield code="0">(orcid)0000-0002-1208-1720</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Moore, Jason H.</subfield><subfield code="0">(orcid)0000-0002-5015-1099</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">BioData Mining</subfield><subfield code="d">London : BioMed Central, 2008</subfield><subfield code="g">12(2019), 1 vom: 09. Juli</subfield><subfield code="w">(DE-627)572421893</subfield><subfield code="w">(DE-600)2438773-3</subfield><subfield code="x">1756-0381</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:12</subfield><subfield code="g">year:2019</subfield><subfield code="g">number:1</subfield><subfield code="g">day:09</subfield><subfield code="g">month:07</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1186/s13040-019-0201-4</subfield><subfield code="z">kostenfrei</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_74</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_206</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2003</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2009</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2055</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">12</subfield><subfield code="j">2019</subfield><subfield code="e">1</subfield><subfield code="b">09</subfield><subfield code="c">07</subfield></datafield></record></collection>
|
author |
Manduchi, Elisabetta |
spellingShingle |
Manduchi, Elisabetta misc GWAS misc Association analysis misc Univariate analysis misc Ranked list misc Canberra metric Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies |
authorStr |
Manduchi, Elisabetta |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)572421893 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut aut aut aut |
collection |
springer |
remote_str |
true |
illustrated |
Not Illustrated |
issn |
1756-0381 |
topic_title |
Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies GWAS (dpeaa)DE-He213 Association analysis (dpeaa)DE-He213 Univariate analysis (dpeaa)DE-He213 Ranked list (dpeaa)DE-He213 Canberra metric (dpeaa)DE-He213 |
topic |
misc GWAS misc Association analysis misc Univariate analysis misc Ranked list misc Canberra metric |
topic_unstemmed |
misc GWAS misc Association analysis misc Univariate analysis misc Ranked list misc Canberra metric |
topic_browse |
misc GWAS misc Association analysis misc Univariate analysis misc Ranked list misc Canberra metric |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
BioData Mining |
hierarchy_parent_id |
572421893 |
hierarchy_top_title |
BioData Mining |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)572421893 (DE-600)2438773-3 |
title |
Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies |
ctrlnum |
(DE-627)SPR029588634 (SPR)s13040-019-0201-4-e |
title_full |
Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies |
author_sort |
Manduchi, Elisabetta |
journal |
BioData Mining |
journalStr |
BioData Mining |
lang_code |
eng |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2019 |
contenttype_str_mv |
txt |
author_browse |
Manduchi, Elisabetta Orzechowski, Patryk R. Ritchie, Marylyn D. Moore, Jason H. |
container_volume |
12 |
format_se |
Elektronische Aufsätze |
author-letter |
Manduchi, Elisabetta |
doi_str_mv |
10.1186/s13040-019-0201-4 |
normlink |
(ORCID)0000-0002-4110-3714 (ORCID)0000-0003-3578-9809 (ORCID)0000-0002-1208-1720 (ORCID)0000-0002-5015-1099 |
normlink_prefix_str_mv |
(orcid)0000-0002-4110-3714 (orcid)0000-0003-3578-9809 (orcid)0000-0002-1208-1720 (orcid)0000-0002-5015-1099 |
title_sort |
exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies |
title_auth |
Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies |
abstract |
Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. © The Author(s). 2019 |
abstractGer |
Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. © The Author(s). 2019 |
abstract_unstemmed |
Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. © The Author(s). 2019 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 |
container_issue |
1 |
title_short |
Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies |
url |
https://dx.doi.org/10.1186/s13040-019-0201-4 |
remote_bool |
true |
author2 |
Orzechowski, Patryk R. Ritchie, Marylyn D. Moore, Jason H. |
author2Str |
Orzechowski, Patryk R. Ritchie, Marylyn D. Moore, Jason H. |
ppnlink |
572421893 |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.1186/s13040-019-0201-4 |
up_date |
2024-07-04T01:35:54.076Z |
_version_ |
1803610440679292928 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">SPR029588634</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230519154636.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">201007s2019 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1186/s13040-019-0201-4</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR029588634</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s13040-019-0201-4-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Manduchi, Elisabetta</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-4110-3714</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2019</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s). 2019</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">GWAS</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Association analysis</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Univariate analysis</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Ranked list</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Canberra metric</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Orzechowski, Patryk R.</subfield><subfield code="0">(orcid)0000-0003-3578-9809</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ritchie, Marylyn D.</subfield><subfield code="0">(orcid)0000-0002-1208-1720</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Moore, Jason H.</subfield><subfield code="0">(orcid)0000-0002-5015-1099</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">BioData Mining</subfield><subfield code="d">London : BioMed Central, 2008</subfield><subfield code="g">12(2019), 1 vom: 09. Juli</subfield><subfield code="w">(DE-627)572421893</subfield><subfield code="w">(DE-600)2438773-3</subfield><subfield code="x">1756-0381</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:12</subfield><subfield code="g">year:2019</subfield><subfield code="g">number:1</subfield><subfield code="g">day:09</subfield><subfield code="g">month:07</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1186/s13040-019-0201-4</subfield><subfield code="z">kostenfrei</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_74</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_206</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2003</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2009</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2055</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">12</subfield><subfield code="j">2019</subfield><subfield code="e">1</subfield><subfield code="b">09</subfield><subfield code="c">07</subfield></datafield></record></collection>
|
score |
7.400483 |