Artificial and natural duplicates in pyrosequencing reads of metagenomic data
Abstract. Background: Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads were filtered out in many metagenomic projects. However, since the du...
Detailed description
Author(s): |
Li Weizhong [author] Sun Shulei [author] Fu Limin [author] Niu Beifang [author] |
---|
Format: |
E-article |
---|---|
Language: |
English |
Published: |
2010 |
---|
Parent work: |
In: BMC Bioinformatics - BMC, 2003, 11(2010), 1, p 187 |
---|---|
Parent work: |
volume:11 ; year:2010 ; number:1, p 187 |
Links: |
---|
DOI / URN: |
10.1186/1471-2105-11-187 |
---|
Catalog ID: |
DOAJ037627864 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | DOAJ037627864 | ||
003 | DE-627 | ||
005 | 20230308011953.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230227s2010 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/1471-2105-11-187 |2 doi | |
035 | |a (DE-627)DOAJ037627864 | ||
035 | |a (DE-599)DOAJ4521b6f16c114f38b537925139b18aa3 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
050 | 0 | |a R858-859.7 | |
050 | 0 | |a QH301-705.5 | |
100 | 0 | |a Li Weizhong |e verfasserin |4 aut | |
245 | 1 | 0 | |a Artificial and natural duplicates in pyrosequencing reads of metagenomic data |
264 | 1 | |c 2010 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Abstract. Background: Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads were filtered out in many metagenomic projects. However, since the duplicated reads observed in a pyrosequencing run also include natural (non-artificial) duplicates, simply removing all duplicates may also cause underestimation of abundance associated with natural duplicates. Results: We implemented a method for identification of exact and nearly identical duplicates from pyrosequencing reads. This method performs an all-against-all sequence comparison and clusters the duplicates into groups using an algorithm modified from our previous sequence clustering method cd-hit. This method can process a typical dataset in ~10 minutes; it also provides a consensus sequence for each group of duplicates. We applied this method to the underlying raw reads of 39 genomic projects and 10 metagenomic projects that used the pyrosequencing technique. We compared the occurrences of the duplicates identified by our method with those of natural duplicates generated by independent simulations. We observed that the duplicates, including both artificial and natural duplicates, make up 4-44% of reads. The number of natural duplicates highly correlates with the samples' read density (number of reads divided by genome size). For high-complexity metagenomic samples lacking dominant species, natural duplicates make up only <1% of all duplicates. But for some other samples, such as transcriptomic samples, the majority of the observed duplicates might be natural duplicates. Conclusions: Our method is available from http://cd-hit.org as a downloadable program and a web server. It is important not only to identify the duplicates from metagenomic datasets but also to distinguish whether they are artificial or natural duplicates. We provide a tool to estimate the number of natural duplicates according to user-defined sample types, so users can decide whether to retain or remove duplicates in their projects. | ||
653 | 0 | |a Computer applications to medicine. Medical informatics | |
653 | 0 | |a Biology (General) | |
700 | 0 | |a Sun Shulei |e verfasserin |4 aut | |
700 | 0 | |a Fu Limin |e verfasserin |4 aut | |
700 | 0 | |a Niu Beifang |e verfasserin |4 aut | |
773 | 0 | 8 | |i In |t BMC Bioinformatics |d BMC, 2003 |g 11(2010), 1, p 187 |w (DE-627)326644814 |w (DE-600)2041484-5 |x 14712105 |7 nnns |
773 | 1 | 8 | |g volume:11 |g year:2010 |g number:1, p 187 |
856 | 4 | 0 | |u https://doi.org/10.1186/1471-2105-11-187 |z kostenfrei |
856 | 4 | 0 | |u https://doaj.org/article/4521b6f16c114f38b537925139b18aa3 |z kostenfrei |
856 | 4 | 0 | |u http://www.biomedcentral.com/1471-2105/11/187 |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/1471-2105 |y Journal toc |z kostenfrei |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_DOAJ | ||
912 | |a GBV_ILN_11 | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_31 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_74 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_170 | ||
912 | |a GBV_ILN_206 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_370 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_702 | ||
912 | |a GBV_ILN_2001 | ||
912 | |a GBV_ILN_2003 | ||
912 | |a GBV_ILN_2005 | ||
912 | |a GBV_ILN_2006 | ||
912 | |a GBV_ILN_2008 | ||
912 | |a GBV_ILN_2009 | ||
912 | |a GBV_ILN_2010 | ||
912 | |a GBV_ILN_2011 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_2015 | ||
912 | |a GBV_ILN_2020 | ||
912 | |a GBV_ILN_2021 | ||
912 | |a GBV_ILN_2025 | ||
912 | |a GBV_ILN_2031 | ||
912 | |a GBV_ILN_2038 | ||
912 | |a GBV_ILN_2044 | ||
912 | |a GBV_ILN_2048 | ||
912 | |a GBV_ILN_2050 | ||
912 | |a GBV_ILN_2055 | ||
912 | |a GBV_ILN_2056 | ||
912 | |a GBV_ILN_2057 | ||
912 | |a GBV_ILN_2061 | ||
912 | |a GBV_ILN_2111 | ||
912 | |a GBV_ILN_2113 | ||
912 | |a GBV_ILN_2190 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4326 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4700 | ||
951 | |a AR | ||
952 | |d 11 |j 2010 |e 1, p 187 |
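The abstract (field 520) describes grouping exact and nearly identical reads with a clustering algorithm derived from cd-hit and reporting a consensus sequence for each group. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes that duplicates share a start position, compares reads without gaps, and uses a hypothetical identity threshold `min_identity`; the function names are illustrative. As in cd-hit's greedy incremental scheme, each read is compared only against existing cluster representatives rather than against every other read. Real 454 errors are dominated by homopolymer-length indels, which an ungapped comparison does not handle.

```python
from collections import Counter
from typing import List


def prefix_identity(short: str, long: str) -> float:
    """Fraction of positions where `short` matches the start of `long` (ungapped)."""
    if not short:
        return 0.0
    matches = sum(1 for a, b in zip(short, long) if a == b)
    return matches / len(short)


def cluster_duplicates(reads: List[str], min_identity: float = 0.96) -> List[List[int]]:
    """Greedy, cd-hit-style incremental clustering of near-identical reads.

    Reads are visited longest first (ties keep input order). Each read joins the first
    cluster whose representative (its first, longest member) it matches from position 0
    with identity >= min_identity; otherwise it founds a new cluster.
    Returns clusters as lists of indices into `reads`, representative first.
    """
    order = sorted(range(len(reads)), key=lambda i: len(reads[i]), reverse=True)
    clusters: List[List[int]] = []
    for i in order:
        for members in clusters:
            if prefix_identity(reads[i], reads[members[0]]) >= min_identity:
                members.append(i)
                break
        else:  # no representative was similar enough
            clusters.append([i])
    return clusters


def consensus(reads: List[str], members: List[int]) -> str:
    """Per-position majority vote over cluster members that share a start position."""
    length = max(len(reads[i]) for i in members)
    columns = []
    for pos in range(length):
        bases = Counter(reads[i][pos] for i in members if pos < len(reads[i]))
        columns.append(bases.most_common(1)[0][0])
    return "".join(columns)


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTTA", "ACGTAC", "TTTTGGGGCC"]
    groups = cluster_duplicates(reads, min_identity=0.9)
    print(groups)                        # [[0, 1, 2, 3], [4]]
    print(consensus(reads, groups[0]))   # ACGTACGTAA
```

In this toy run, the read with one mismatch and the shorter read that is a prefix of the representative both fall into the first group. Whether such a group reflects artificial or natural duplication cannot be decided from the reads alone, which is the question the abstract raises.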
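The abstract also reports that the number of natural duplicates correlates strongly with read density (number of reads divided by genome size). A simple random-placement model shows why this relationship is expected. The Python sketch below assumes a single genome of size G, read start positions drawn uniformly on both strands, and that two reads count as natural duplicates when they share a start position and strand. This is an illustrative model, not the simulation procedure used in the paper, and `simulate_natural_duplicates` and `expected_natural_duplicates` are hypothetical names.

```python
import math
import random


def simulate_natural_duplicates(num_reads: int, genome_size: int, seed: int = 0) -> int:
    """Count reads whose (start, strand) was already drawn by an earlier read.

    Assumes uniform, independent read starts over both strands of one genome.
    """
    rng = random.Random(seed)
    seen = set()
    duplicates = 0
    for _ in range(num_reads):
        key = (rng.randrange(genome_size), rng.randrange(2))  # (start position, strand)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def expected_natural_duplicates(num_reads: int, genome_size: int) -> float:
    """Poisson approximation of the same count: N - 2G * (1 - exp(-N / 2G))."""
    slots = 2 * genome_size  # possible (start, strand) pairs
    occupied = slots * (1.0 - math.exp(-num_reads / slots))
    return num_reads - occupied
```

For 100,000 reads from a 1 Mb genome (read density 0.1), the approximation gives about 2,459 natural duplicates, roughly 2.5% of reads. At low density the count grows approximately as N²/(4G), so doubling the read count roughly quadruples it. In a metagenome, each species contributes its own N and G, so a high-complexity sample made up of many low-coverage genomes yields few natural duplicates. This matches the pattern the abstract describes.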
author_variant |
l w lw s s ss f l fl n b nb |
---|---|
matchkey_str |
article:14712105:2010----::riiilnntrlulctsnyoeuniged |
hierarchy_sort_str |
2010 |
callnumber-subject-code |
R |
publishDate |
2010 |
language |
English |
source |
In BMC Bioinformatics 11(2010), 1, p 187 volume:11 year:2010 number:1, p 187 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Computer applications to medicine. Medical informatics Biology (General) |
isfreeaccess_bool |
true |
container_title |
BMC Bioinformatics |
authorswithroles_txt_mv |
Li Weizhong @@aut@@ Sun Shulei @@aut@@ Fu Limin @@aut@@ Niu Beifang @@aut@@ |
publishDateDaySort_date |
2010-01-01T00:00:00Z |
hierarchy_top_id |
326644814 |
id |
DOAJ037627864 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">DOAJ037627864</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230308011953.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230227s2010 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1186/1471-2105-11-187</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ037627864</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJ4521b6f16c114f38b537925139b18aa3</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">R858-859.7</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QH301-705.5</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Li Weizhong</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Artificial and natural duplicates in pyrosequencing reads of metagenomic data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2010</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " 
ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a"><p<Abstract</p< <p<Background</p< <p<Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads were filtered out in many metagenomic projects. However, since the duplicated reads observed in a pyrosequencing run also include natural (non-artificial) duplicates, simply removing all duplicates may also cause underestimation of abundance associated with natural duplicates.</p< <p<Results</p< <p<We implemented a method for identification of exact and nearly identical duplicates from pyrosequencing reads. This method performs an all-against-all sequence comparison and clusters the duplicates into groups using an algorithm modified from our previous sequence clustering method cd-hit. This method can process a typical dataset in ~10 minutes; it also provides a consensus sequence for each group of duplicates. We applied this method to the underlying raw reads of 39 genomic projects and 10 metagenomic projects that utilized pyrosequencing technique. We compared the occurrences of the duplicates identified by our method and the natural duplicates made by independent simulations. We observed that the duplicates, including both artificial and natural duplicates, make up 4-44% of reads. The number of natural duplicates highly correlates with the samples' read density (number of reads divided by genome size). For high-complexity metagenomic samples lacking dominant species, natural duplicates only make up <1% of all duplicates. But for some other samples like transcriptomic samples, majority of the observed duplicates might be natural duplicates.</p< <p<Conclusions</p< <p<Our method is available from <url<http://cd-hit.org</url< as a downloadable program and a web server. 
It is important not only to identify the duplicates from metagenomic datasets but also to distinguish whether they are artificial or natural duplicates. We provide a tool to estimate the number of natural duplicates according to user-defined sample types, so users can decide whether to retain or remove duplicates in their projects.</p<</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Computer applications to medicine. Medical informatics</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Biology (General)</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Sun Shulei</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Fu Limin</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Niu Beifang</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">BMC Bioinformatics</subfield><subfield code="d">BMC, 2003</subfield><subfield code="g">11(2010), 1, p 187</subfield><subfield code="w">(DE-627)326644814</subfield><subfield code="w">(DE-600)2041484-5</subfield><subfield code="x">14712105</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:11</subfield><subfield code="g">year:2010</subfield><subfield code="g">number:1, p 187</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1186/1471-2105-11-187</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/4521b6f16c114f38b537925139b18aa3</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield 
tag="856" ind1="4" ind2="0"><subfield code="u">http://www.biomedcentral.com/1471-2105/11/187</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/1471-2105</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield 
tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_74</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_206</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_702</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2001</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2003</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2006</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2008</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2009</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">GBV_ILN_2010</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2015</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2020</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2021</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2025</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2031</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2038</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2044</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2048</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2050</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2055</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2056</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2057</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2061</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2113</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2190</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">11</subfield><subfield code="j">2010</subfield><subfield code="e">1, p 187</subfield></datafield></record></collection>
|
callnumber-first |
R - Medicine |
author |
Li Weizhong |
spellingShingle |
Li Weizhong misc R858-859.7 misc QH301-705.5 misc Computer applications to medicine. Medical informatics misc Biology (General) Artificial and natural duplicates in pyrosequencing reads of metagenomic data |
authorStr |
Li Weizhong |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)326644814 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut aut aut aut |
collection |
DOAJ |
remote_str |
true |
callnumber-label |
R858-859 |
illustrated |
Not Illustrated |
issn |
14712105 |
topic_title |
R858-859.7 QH301-705.5 Artificial and natural duplicates in pyrosequencing reads of metagenomic data |
topic |
misc R858-859.7 misc QH301-705.5 misc Computer applications to medicine. Medical informatics misc Biology (General) |
topic_unstemmed |
misc R858-859.7 misc QH301-705.5 misc Computer applications to medicine. Medical informatics misc Biology (General) |
topic_browse |
misc R858-859.7 misc QH301-705.5 misc Computer applications to medicine. Medical informatics misc Biology (General) |
format_facet |
Electronic articles Articles Electronic resource |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
cr |
hierarchy_parent_title |
BMC Bioinformatics |
hierarchy_parent_id |
326644814 |
hierarchy_top_title |
BMC Bioinformatics |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)326644814 (DE-600)2041484-5 |
title |
Artificial and natural duplicates in pyrosequencing reads of metagenomic data |
ctrlnum |
(DE-627)DOAJ037627864 (DE-599)DOAJ4521b6f16c114f38b537925139b18aa3 |
title_full |
Artificial and natural duplicates in pyrosequencing reads of metagenomic data |
author_sort |
Li Weizhong |
journal |
BMC Bioinformatics |
journalStr |
BMC Bioinformatics |
callnumber-first-code |
R |
lang_code |
eng |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2010 |
contenttype_str_mv |
txt |
author_browse |
Li Weizhong Sun Shulei Fu Limin Niu Beifang |
container_volume |
11 |
class |
R858-859.7 QH301-705.5 |
format_se |
Electronic articles |
author-letter |
Li Weizhong |
doi_str_mv |
10.1186/1471-2105-11-187 |
author2-role |
author |
title_sort |
artificial and natural duplicates in pyrosequencing reads of metagenomic data |
callnumber |
R858-859.7 |
title_auth |
Artificial and natural duplicates in pyrosequencing reads of metagenomic data |
abstract |
Background: Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads have been filtered out in many metagenomic projects. However, since the duplicated reads observed in a pyrosequencing run also include natural (non-artificial) duplicates, simply removing all duplicates may also cause underestimation of the abundance associated with natural duplicates.
Results: We implemented a method for identifying exact and nearly identical duplicates in pyrosequencing reads. The method performs an all-against-all sequence comparison and clusters the duplicates into groups using an algorithm adapted from our earlier sequence clustering method, cd-hit. It can process a typical dataset in ~10 minutes and also provides a consensus sequence for each group of duplicates. We applied this method to the underlying raw reads of 39 genomic projects and 10 metagenomic projects that used pyrosequencing. We compared the occurrence of the duplicates identified by our method with that of natural duplicates generated by independent simulations. We observed that duplicates, including both artificial and natural duplicates, make up 4-44% of reads. The number of natural duplicates correlates strongly with a sample's read density (number of reads divided by genome size). For high-complexity metagenomic samples lacking dominant species, natural duplicates make up only <1% of all duplicates. For other samples, such as transcriptomic samples, however, the majority of the observed duplicates may be natural duplicates.
Conclusions: Our method is available from http://cd-hit.org as a downloadable program and a web server. It is important not only to identify the duplicates in metagenomic datasets but also to distinguish whether they are artificial or natural duplicates. We provide a tool to estimate the number of natural duplicates according to user-defined sample types, so users can decide whether to retain or remove duplicates in their projects. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_702 GBV_ILN_2001 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2006 GBV_ILN_2008 GBV_ILN_2009 GBV_ILN_2010 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2015 GBV_ILN_2020 GBV_ILN_2021 GBV_ILN_2025 GBV_ILN_2031 GBV_ILN_2038 GBV_ILN_2044 GBV_ILN_2048 GBV_ILN_2050 GBV_ILN_2055 GBV_ILN_2056 GBV_ILN_2057 GBV_ILN_2061 GBV_ILN_2111 GBV_ILN_2113 GBV_ILN_2190 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 |
container_issue |
1, p 187 |
title_short |
Artificial and natural duplicates in pyrosequencing reads of metagenomic data |
url |
https://doi.org/10.1186/1471-2105-11-187 https://doaj.org/article/4521b6f16c114f38b537925139b18aa3 http://www.biomedcentral.com/1471-2105/11/187 https://doaj.org/toc/1471-2105 |
remote_bool |
true |
author2 |
Sun Shulei Fu Limin Niu Beifang |
author2Str |
Sun Shulei Fu Limin Niu Beifang |
ppnlink |
326644814 |
callnumber-subject |
R - General Medicine |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.1186/1471-2105-11-187 |
callnumber-a |
R858-859.7 |
up_date |
2024-07-04T01:55:30.336Z |
_version_ |
1803611674077298688 |