T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In t...
Ausführliche Beschreibung
Autor*in: |
Yueming Hu [verfasserIn] Yejun Wang [verfasserIn] Xiaotian Hu [verfasserIn] Haoyu Chao [verfasserIn] Sida Li [verfasserIn] Qinyang Ni [verfasserIn] Yanyan Zhu [verfasserIn] Yixue Hu [verfasserIn] Ziyi Zhao [verfasserIn] Ming Chen [verfasserIn] |
---|
Format: |
E-Artikel |
---|---|
Sprache: |
Englisch |
Erschienen: |
2024 |
---|
Schlagwörter: |
---|
Übergeordnetes Werk: |
In: Computational and Structural Biotechnology Journal - Elsevier, 2013, 23(2024), Seite 801-812 |
---|---|
Übergeordnetes Werk: |
volume:23 ; year:2024 ; pages:801-812 |
Links: |
---|
DOI / URN: |
10.1016/j.csbj.2024.01.015 |
---|
Katalog-ID: |
DOAJ095908463 |
---|
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | DOAJ095908463 | ||
003 | DE-627 | ||
005 | 20240413125836.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240413s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1016/j.csbj.2024.01.015 |2 doi | |
035 | |a (DE-627)DOAJ095908463 | ||
035 | |a (DE-599)DOAJd38f51f29d5447beb54e80e50788113d | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
050 | 0 | |a TP248.13-248.65 | |
100 | 0 | |a Yueming Hu |e verfasserin |4 aut | |
245 | 1 | 0 | |a T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. | ||
650 | 4 | |a T4SEpp | |
650 | 4 | |a T4SE Prediction | |
650 | 4 | |a T4SS | |
650 | 4 | |a Deep learning | |
650 | 4 | |a Protein language model | |
650 | 4 | |a Helicobacter pylori T4SEs | |
653 | 0 | |a Biotechnology | |
700 | 0 | |a Yejun Wang |e verfasserin |4 aut | |
700 | 0 | |a Xiaotian Hu |e verfasserin |4 aut | |
700 | 0 | |a Haoyu Chao |e verfasserin |4 aut | |
700 | 0 | |a Sida Li |e verfasserin |4 aut | |
700 | 0 | |a Qinyang Ni |e verfasserin |4 aut | |
700 | 0 | |a Yanyan Zhu |e verfasserin |4 aut | |
700 | 0 | |a Yixue Hu |e verfasserin |4 aut | |
700 | 0 | |a Ziyi Zhao |e verfasserin |4 aut | |
700 | 0 | |a Ming Chen |e verfasserin |4 aut | |
773 | 0 | 8 | |i In |t Computational and Structural Biotechnology Journal |d Elsevier, 2013 |g 23(2024), Seite 801-812 |w (DE-627)731890086 |w (DE-600)2694435-2 |x 20010370 |7 nnns |
773 | 1 | 8 | |g volume:23 |g year:2024 |g pages:801-812 |
856 | 4 | 0 | |u https://doi.org/10.1016/j.csbj.2024.01.015 |z kostenfrei |
856 | 4 | 0 | |u https://doaj.org/article/d38f51f29d5447beb54e80e50788113d |z kostenfrei |
856 | 4 | 0 | |u http://www.sciencedirect.com/science/article/pii/S2001037024000151 |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/2001-0370 |y Journal toc |z kostenfrei |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_DOAJ | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_74 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_170 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4700 | ||
951 | |a AR | ||
952 | |d 23 |j 2024 |h 801-812 |
author_variant |
y h yh y w yw x h xh h c hc s l sl q n qn y z yz y h yh z z zz m c mc |
---|---|
matchkey_str |
article:20010370:2024----::4eppplnitgaigrtilnugmdltpeitatrat |
hierarchy_sort_str |
2024 |
callnumber-subject-code |
TP |
publishDate |
2024 |
allfields |
10.1016/j.csbj.2024.01.015 doi (DE-627)DOAJ095908463 (DE-599)DOAJd38f51f29d5447beb54e80e50788113d DE-627 ger DE-627 rakwb eng TP248.13-248.65 Yueming Hu verfasserin aut T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. T4SEpp T4SE Prediction T4SS Deep learning Protein language model Helicobacter pylori T4SEs Biotechnology Yejun Wang verfasserin aut Xiaotian Hu verfasserin aut Haoyu Chao verfasserin aut Sida Li verfasserin aut Qinyang Ni verfasserin aut Yanyan Zhu verfasserin aut Yixue Hu verfasserin aut Ziyi Zhao verfasserin aut Ming Chen verfasserin aut In Computational and Structural Biotechnology Journal Elsevier, 2013 23(2024), Seite 801-812 (DE-627)731890086 (DE-600)2694435-2 20010370 nnns volume:23 year:2024 pages:801-812 https://doi.org/10.1016/j.csbj.2024.01.015 kostenfrei https://doaj.org/article/d38f51f29d5447beb54e80e50788113d kostenfrei http://www.sciencedirect.com/science/article/pii/S2001037024000151 kostenfrei https://doaj.org/toc/2001-0370 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 23 2024 801-812 |
spelling |
10.1016/j.csbj.2024.01.015 doi (DE-627)DOAJ095908463 (DE-599)DOAJd38f51f29d5447beb54e80e50788113d DE-627 ger DE-627 rakwb eng TP248.13-248.65 Yueming Hu verfasserin aut T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. T4SEpp T4SE Prediction T4SS Deep learning Protein language model Helicobacter pylori T4SEs Biotechnology Yejun Wang verfasserin aut Xiaotian Hu verfasserin aut Haoyu Chao verfasserin aut Sida Li verfasserin aut Qinyang Ni verfasserin aut Yanyan Zhu verfasserin aut Yixue Hu verfasserin aut Ziyi Zhao verfasserin aut Ming Chen verfasserin aut In Computational and Structural Biotechnology Journal Elsevier, 2013 23(2024), Seite 801-812 (DE-627)731890086 (DE-600)2694435-2 20010370 nnns volume:23 year:2024 pages:801-812 https://doi.org/10.1016/j.csbj.2024.01.015 kostenfrei https://doaj.org/article/d38f51f29d5447beb54e80e50788113d kostenfrei http://www.sciencedirect.com/science/article/pii/S2001037024000151 kostenfrei https://doaj.org/toc/2001-0370 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 23 2024 801-812 |
allfields_unstemmed |
10.1016/j.csbj.2024.01.015 doi (DE-627)DOAJ095908463 (DE-599)DOAJd38f51f29d5447beb54e80e50788113d DE-627 ger DE-627 rakwb eng TP248.13-248.65 Yueming Hu verfasserin aut T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. T4SEpp T4SE Prediction T4SS Deep learning Protein language model Helicobacter pylori T4SEs Biotechnology Yejun Wang verfasserin aut Xiaotian Hu verfasserin aut Haoyu Chao verfasserin aut Sida Li verfasserin aut Qinyang Ni verfasserin aut Yanyan Zhu verfasserin aut Yixue Hu verfasserin aut Ziyi Zhao verfasserin aut Ming Chen verfasserin aut In Computational and Structural Biotechnology Journal Elsevier, 2013 23(2024), Seite 801-812 (DE-627)731890086 (DE-600)2694435-2 20010370 nnns volume:23 year:2024 pages:801-812 https://doi.org/10.1016/j.csbj.2024.01.015 kostenfrei https://doaj.org/article/d38f51f29d5447beb54e80e50788113d kostenfrei http://www.sciencedirect.com/science/article/pii/S2001037024000151 kostenfrei https://doaj.org/toc/2001-0370 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 23 2024 801-812 |
allfieldsGer |
10.1016/j.csbj.2024.01.015 doi (DE-627)DOAJ095908463 (DE-599)DOAJd38f51f29d5447beb54e80e50788113d DE-627 ger DE-627 rakwb eng TP248.13-248.65 Yueming Hu verfasserin aut T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. T4SEpp T4SE Prediction T4SS Deep learning Protein language model Helicobacter pylori T4SEs Biotechnology Yejun Wang verfasserin aut Xiaotian Hu verfasserin aut Haoyu Chao verfasserin aut Sida Li verfasserin aut Qinyang Ni verfasserin aut Yanyan Zhu verfasserin aut Yixue Hu verfasserin aut Ziyi Zhao verfasserin aut Ming Chen verfasserin aut In Computational and Structural Biotechnology Journal Elsevier, 2013 23(2024), Seite 801-812 (DE-627)731890086 (DE-600)2694435-2 20010370 nnns volume:23 year:2024 pages:801-812 https://doi.org/10.1016/j.csbj.2024.01.015 kostenfrei https://doaj.org/article/d38f51f29d5447beb54e80e50788113d kostenfrei http://www.sciencedirect.com/science/article/pii/S2001037024000151 kostenfrei https://doaj.org/toc/2001-0370 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 23 2024 801-812 |
allfieldsSound |
10.1016/j.csbj.2024.01.015 doi (DE-627)DOAJ095908463 (DE-599)DOAJd38f51f29d5447beb54e80e50788113d DE-627 ger DE-627 rakwb eng TP248.13-248.65 Yueming Hu verfasserin aut T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. T4SEpp T4SE Prediction T4SS Deep learning Protein language model Helicobacter pylori T4SEs Biotechnology Yejun Wang verfasserin aut Xiaotian Hu verfasserin aut Haoyu Chao verfasserin aut Sida Li verfasserin aut Qinyang Ni verfasserin aut Yanyan Zhu verfasserin aut Yixue Hu verfasserin aut Ziyi Zhao verfasserin aut Ming Chen verfasserin aut In Computational and Structural Biotechnology Journal Elsevier, 2013 23(2024), Seite 801-812 (DE-627)731890086 (DE-600)2694435-2 20010370 nnns volume:23 year:2024 pages:801-812 https://doi.org/10.1016/j.csbj.2024.01.015 kostenfrei https://doaj.org/article/d38f51f29d5447beb54e80e50788113d kostenfrei http://www.sciencedirect.com/science/article/pii/S2001037024000151 kostenfrei https://doaj.org/toc/2001-0370 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 23 2024 801-812 |
language |
English |
source |
In Computational and Structural Biotechnology Journal 23(2024), Seite 801-812 volume:23 year:2024 pages:801-812 |
sourceStr |
In Computational and Structural Biotechnology Journal 23(2024), Seite 801-812 volume:23 year:2024 pages:801-812 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
T4SEpp T4SE Prediction T4SS Deep learning Protein language model Helicobacter pylori T4SEs Biotechnology |
isfreeaccess_bool |
true |
container_title |
Computational and Structural Biotechnology Journal |
authorswithroles_txt_mv |
Yueming Hu @@aut@@ Yejun Wang @@aut@@ Xiaotian Hu @@aut@@ Haoyu Chao @@aut@@ Sida Li @@aut@@ Qinyang Ni @@aut@@ Yanyan Zhu @@aut@@ Yixue Hu @@aut@@ Ziyi Zhao @@aut@@ Ming Chen @@aut@@ |
publishDateDaySort_date |
2024-01-01T00:00:00Z |
hierarchy_top_id |
731890086 |
id |
DOAJ095908463 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">DOAJ095908463</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240413125836.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">240413s2024 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1016/j.csbj.2024.01.015</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ095908463</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJd38f51f29d5447beb54e80e50788113d</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TP248.13-248.65</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Yueming Hu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2024</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">T4SEpp</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">T4SE Prediction</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">T4SS</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Deep learning</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Protein language model</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Helicobacter pylori T4SEs</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Biotechnology</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yejun Wang</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Xiaotian Hu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Haoyu Chao</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Sida Li</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Qinyang Ni</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yanyan Zhu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yixue Hu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Ziyi Zhao</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Ming Chen</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Computational and Structural Biotechnology Journal</subfield><subfield code="d">Elsevier, 2013</subfield><subfield code="g">23(2024), Seite 801-812</subfield><subfield code="w">(DE-627)731890086</subfield><subfield code="w">(DE-600)2694435-2</subfield><subfield code="x">20010370</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:23</subfield><subfield code="g">year:2024</subfield><subfield code="g">pages:801-812</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1016/j.csbj.2024.01.015</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/d38f51f29d5447beb54e80e50788113d</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">http://www.sciencedirect.com/science/article/pii/S2001037024000151</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/2001-0370</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_74</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">23</subfield><subfield code="j">2024</subfield><subfield code="h">801-812</subfield></datafield></record></collection>
|
callnumber-first |
T - Technology |
author |
Yueming Hu |
spellingShingle |
Yueming Hu misc TP248.13-248.65 misc T4SEpp misc T4SE Prediction misc T4SS misc Deep learning misc Protein language model misc Helicobacter pylori T4SEs misc Biotechnology T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors |
authorStr |
Yueming Hu |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)731890086 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut aut aut aut aut aut aut aut aut aut |
collection |
DOAJ |
remote_str |
true |
callnumber-label |
TP248 |
illustrated |
Not Illustrated |
issn |
20010370 |
topic_title |
TP248.13-248.65 T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors T4SEpp T4SE Prediction T4SS Deep learning Protein language model Helicobacter pylori T4SEs |
topic |
misc TP248.13-248.65 misc T4SEpp misc T4SE Prediction misc T4SS misc Deep learning misc Protein language model misc Helicobacter pylori T4SEs misc Biotechnology |
topic_unstemmed |
misc TP248.13-248.65 misc T4SEpp misc T4SE Prediction misc T4SS misc Deep learning misc Protein language model misc Helicobacter pylori T4SEs misc Biotechnology |
topic_browse |
misc TP248.13-248.65 misc T4SEpp misc T4SE Prediction misc T4SS misc Deep learning misc Protein language model misc Helicobacter pylori T4SEs misc Biotechnology |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Computational and Structural Biotechnology Journal |
hierarchy_parent_id |
731890086 |
hierarchy_top_title |
Computational and Structural Biotechnology Journal |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)731890086 (DE-600)2694435-2 |
title |
T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors |
ctrlnum |
(DE-627)DOAJ095908463 (DE-599)DOAJd38f51f29d5447beb54e80e50788113d |
title_full |
T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors |
author_sort |
Yueming Hu |
journal |
Computational and Structural Biotechnology Journal |
journalStr |
Computational and Structural Biotechnology Journal |
callnumber-first-code |
T |
lang_code |
eng |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2024 |
contenttype_str_mv |
txt |
container_start_page |
801 |
author_browse |
Yueming Hu Yejun Wang Xiaotian Hu Haoyu Chao Sida Li Qinyang Ni Yanyan Zhu Yixue Hu Ziyi Zhao Ming Chen |
container_volume |
23 |
class |
TP248.13-248.65 |
format_se |
Elektronische Aufsätze |
author-letter |
Yueming Hu |
doi_str_mv |
10.1016/j.csbj.2024.01.015 |
author2-role |
verfasserin |
title_sort |
t4sepp: a pipeline integrating protein language models to predict bacterial type iv secreted effectors |
callnumber |
TP248.13-248.65 |
title_auth |
T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors |
abstract |
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. |
abstractGer |
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. |
abstract_unstemmed |
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_74 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 |
title_short |
T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors |
url |
https://doi.org/10.1016/j.csbj.2024.01.015 https://doaj.org/article/d38f51f29d5447beb54e80e50788113d http://www.sciencedirect.com/science/article/pii/S2001037024000151 https://doaj.org/toc/2001-0370 |
remote_bool |
true |
author2 |
Yejun Wang Xiaotian Hu Haoyu Chao Sida Li Qinyang Ni Yanyan Zhu Yixue Hu Ziyi Zhao Ming Chen |
author2Str |
Yejun Wang Xiaotian Hu Haoyu Chao Sida Li Qinyang Ni Yanyan Zhu Yixue Hu Ziyi Zhao Ming Chen |
ppnlink |
731890086 |
callnumber-subject |
TP - Chemical Technology |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.1016/j.csbj.2024.01.015 |
callnumber-a |
TP248.13-248.65 |
up_date |
2024-07-03T17:19:36.594Z |
_version_ |
1803579216725278720 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">DOAJ095908463</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240413125836.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">240413s2024 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1016/j.csbj.2024.01.015</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ095908463</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJd38f51f29d5447beb54e80e50788113d</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TP248.13-248.65</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Yueming Hu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2024</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">T4SEpp</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">T4SE Prediction</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">T4SS</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Deep learning</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Protein language model</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Helicobacter pylori T4SEs</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Biotechnology</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yejun Wang</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Xiaotian Hu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Haoyu Chao</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Sida Li</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Qinyang Ni</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yanyan Zhu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Yixue Hu</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Ziyi Zhao</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Ming Chen</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Computational and Structural Biotechnology Journal</subfield><subfield code="d">Elsevier, 2013</subfield><subfield code="g">23(2024), Seite 801-812</subfield><subfield code="w">(DE-627)731890086</subfield><subfield code="w">(DE-600)2694435-2</subfield><subfield code="x">20010370</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:23</subfield><subfield code="g">year:2024</subfield><subfield code="g">pages:801-812</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1016/j.csbj.2024.01.015</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/d38f51f29d5447beb54e80e50788113d</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">http://www.sciencedirect.com/science/article/pii/S2001037024000151</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/2001-0370</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_74</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">23</subfield><subfield code="j">2024</subfield><subfield code="h">801-812</subfield></datafield></record></collection>
|
score |
7.3993616 |