A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example
Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly fo...
Ausführliche Beschreibung
Autor*in: |
Lin, Run-Hsin [verfasserIn] Lin, Pinpin [verfasserIn] Wang, Chia-Chi [verfasserIn] Tung, Chun-Wei [verfasserIn] |
---|
Format: |
E-Artikel |
---|---|
Sprache: |
Englisch |
Erschienen: |
2024 |
---|
Schlagwörter: |
---|
Anmerkung: |
© The Author(s) 2024 |
---|
Übergeordnetes Werk: |
Enthalten in: Journal of cheminformatics - Springer International Publishing, 2009, 16(2024), 1 vom: 02. Aug. |
---|---|
Übergeordnetes Werk: |
volume:16 ; year:2024 ; number:1 ; day:02 ; month:08 |
Links: |
---|
DOI / URN: |
10.1186/s13321-024-00891-4 |
---|
Katalog-ID: |
SPR056833091 |
---|
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | SPR056833091 | ||
003 | DE-627 | ||
005 | 20240803064802.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240803s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/s13321-024-00891-4 |2 doi | |
035 | |a (DE-627)SPR056833091 | ||
035 | |a (SPR)s13321-024-00891-4-e | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 540 |q VZ |
100 | 1 | |a Lin, Run-Hsin |e verfasserin |0 (orcid)0000-0002-1205-2874 |4 aut | |
245 | 1 | 0 | |a A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a © The Author(s) 2024 | ||
520 | |a Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. | ||
650 | 4 | |a Chemical space |7 (dpeaa)DE-He213 | |
650 | 4 | |a Developmental toxicity |7 (dpeaa)DE-He213 | |
650 | 4 | |a Chemical toxicity |7 (dpeaa)DE-He213 | |
650 | 4 | |a Multitask learning |7 (dpeaa)DE-He213 | |
650 | 4 | |a Zebrafish |7 (dpeaa)DE-He213 | |
700 | 1 | |a Lin, Pinpin |e verfasserin |0 (orcid)0000-0001-5459-907X |4 aut | |
700 | 1 | |a Wang, Chia-Chi |e verfasserin |0 (orcid)0000-0002-1272-1513 |4 aut | |
700 | 1 | |a Tung, Chun-Wei |e verfasserin |0 (orcid)0000-0003-3011-8440 |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Journal of cheminformatics |d Springer International Publishing, 2009 |g 16(2024), 1 vom: 02. Aug. |w (DE-627)594779219 |w (DE-600)2486539-4 |x 1758-2946 |7 nnns |
773 | 1 | 8 | |g volume:16 |g year:2024 |g number:1 |g day:02 |g month:08 |
856 | 4 | 0 | |u https://dx.doi.org/10.1186/s13321-024-00891-4 |m X:SPRINGER |x Resolving-System |z kostenfrei |3 Volltext |
912 | |a SYSFLAG_0 | ||
912 | |a GBV_SPRINGER | ||
912 | |a SSG-OLC-PHA | ||
912 | |a GBV_ILN_11 | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_31 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_170 | ||
912 | |a GBV_ILN_206 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_2003 | ||
912 | |a GBV_ILN_2005 | ||
912 | |a GBV_ILN_2009 | ||
912 | |a GBV_ILN_2011 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_2027 | ||
912 | |a GBV_ILN_2050 | ||
912 | |a GBV_ILN_2055 | ||
912 | |a GBV_ILN_2111 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4326 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4700 | ||
951 | |a AR | ||
952 | |d 16 |j 2024 |e 1 |b 02 |c 08 |
author_variant |
r h l rhl p l pl c c w ccw c w t cwt |
---|---|
matchkey_str |
article:17582946:2024----::nvluttslannagrtmotssihitnthmclpczbaih |
hierarchy_sort_str |
2024 |
publishDate |
2024 |
allfields |
10.1186/s13321-024-00891-4 doi (DE-627)SPR056833091 (SPR)s13321-024-00891-4-e DE-627 ger DE-627 rakwb eng 540 VZ Lin, Run-Hsin verfasserin (orcid)0000-0002-1205-2874 aut A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s) 2024 Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. Chemical space (dpeaa)DE-He213 Developmental toxicity (dpeaa)DE-He213 Chemical toxicity (dpeaa)DE-He213 Multitask learning (dpeaa)DE-He213 Zebrafish (dpeaa)DE-He213 Lin, Pinpin verfasserin (orcid)0000-0001-5459-907X aut Wang, Chia-Chi verfasserin (orcid)0000-0002-1272-1513 aut Tung, Chun-Wei verfasserin (orcid)0000-0003-3011-8440 aut Enthalten in Journal of cheminformatics Springer International Publishing, 2009 16(2024), 1 vom: 02. Aug. (DE-627)594779219 (DE-600)2486539-4 1758-2946 nnns volume:16 year:2024 number:1 day:02 month:08 https://dx.doi.org/10.1186/s13321-024-00891-4 X:SPRINGER Resolving-System kostenfrei Volltext SYSFLAG_0 GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2027 GBV_ILN_2050 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 16 2024 1 02 08 |
spelling |
10.1186/s13321-024-00891-4 doi (DE-627)SPR056833091 (SPR)s13321-024-00891-4-e DE-627 ger DE-627 rakwb eng 540 VZ Lin, Run-Hsin verfasserin (orcid)0000-0002-1205-2874 aut A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s) 2024 Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. Chemical space (dpeaa)DE-He213 Developmental toxicity (dpeaa)DE-He213 Chemical toxicity (dpeaa)DE-He213 Multitask learning (dpeaa)DE-He213 Zebrafish (dpeaa)DE-He213 Lin, Pinpin verfasserin (orcid)0000-0001-5459-907X aut Wang, Chia-Chi verfasserin (orcid)0000-0002-1272-1513 aut Tung, Chun-Wei verfasserin (orcid)0000-0003-3011-8440 aut Enthalten in Journal of cheminformatics Springer International Publishing, 2009 16(2024), 1 vom: 02. Aug. (DE-627)594779219 (DE-600)2486539-4 1758-2946 nnns volume:16 year:2024 number:1 day:02 month:08 https://dx.doi.org/10.1186/s13321-024-00891-4 X:SPRINGER Resolving-System kostenfrei Volltext SYSFLAG_0 GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2027 GBV_ILN_2050 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 16 2024 1 02 08 |
allfields_unstemmed |
10.1186/s13321-024-00891-4 doi (DE-627)SPR056833091 (SPR)s13321-024-00891-4-e DE-627 ger DE-627 rakwb eng 540 VZ Lin, Run-Hsin verfasserin (orcid)0000-0002-1205-2874 aut A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s) 2024 Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. Chemical space (dpeaa)DE-He213 Developmental toxicity (dpeaa)DE-He213 Chemical toxicity (dpeaa)DE-He213 Multitask learning (dpeaa)DE-He213 Zebrafish (dpeaa)DE-He213 Lin, Pinpin verfasserin (orcid)0000-0001-5459-907X aut Wang, Chia-Chi verfasserin (orcid)0000-0002-1272-1513 aut Tung, Chun-Wei verfasserin (orcid)0000-0003-3011-8440 aut Enthalten in Journal of cheminformatics Springer International Publishing, 2009 16(2024), 1 vom: 02. Aug. (DE-627)594779219 (DE-600)2486539-4 1758-2946 nnns volume:16 year:2024 number:1 day:02 month:08 https://dx.doi.org/10.1186/s13321-024-00891-4 X:SPRINGER Resolving-System kostenfrei Volltext SYSFLAG_0 GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2027 GBV_ILN_2050 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 16 2024 1 02 08 |
allfieldsGer |
10.1186/s13321-024-00891-4 doi (DE-627)SPR056833091 (SPR)s13321-024-00891-4-e DE-627 ger DE-627 rakwb eng 540 VZ Lin, Run-Hsin verfasserin (orcid)0000-0002-1205-2874 aut A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s) 2024 Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. Chemical space (dpeaa)DE-He213 Developmental toxicity (dpeaa)DE-He213 Chemical toxicity (dpeaa)DE-He213 Multitask learning (dpeaa)DE-He213 Zebrafish (dpeaa)DE-He213 Lin, Pinpin verfasserin (orcid)0000-0001-5459-907X aut Wang, Chia-Chi verfasserin (orcid)0000-0002-1272-1513 aut Tung, Chun-Wei verfasserin (orcid)0000-0003-3011-8440 aut Enthalten in Journal of cheminformatics Springer International Publishing, 2009 16(2024), 1 vom: 02. Aug. (DE-627)594779219 (DE-600)2486539-4 1758-2946 nnns volume:16 year:2024 number:1 day:02 month:08 https://dx.doi.org/10.1186/s13321-024-00891-4 X:SPRINGER Resolving-System kostenfrei Volltext SYSFLAG_0 GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2027 GBV_ILN_2050 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 16 2024 1 02 08 |
allfieldsSound |
10.1186/s13321-024-00891-4 doi (DE-627)SPR056833091 (SPR)s13321-024-00891-4-e DE-627 ger DE-627 rakwb eng 540 VZ Lin, Run-Hsin verfasserin (orcid)0000-0002-1205-2874 aut A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s) 2024 Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. Chemical space (dpeaa)DE-He213 Developmental toxicity (dpeaa)DE-He213 Chemical toxicity (dpeaa)DE-He213 Multitask learning (dpeaa)DE-He213 Zebrafish (dpeaa)DE-He213 Lin, Pinpin verfasserin (orcid)0000-0001-5459-907X aut Wang, Chia-Chi verfasserin (orcid)0000-0002-1272-1513 aut Tung, Chun-Wei verfasserin (orcid)0000-0003-3011-8440 aut Enthalten in Journal of cheminformatics Springer International Publishing, 2009 16(2024), 1 vom: 02. Aug. (DE-627)594779219 (DE-600)2486539-4 1758-2946 nnns volume:16 year:2024 number:1 day:02 month:08 https://dx.doi.org/10.1186/s13321-024-00891-4 X:SPRINGER Resolving-System kostenfrei Volltext SYSFLAG_0 GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2027 GBV_ILN_2050 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 16 2024 1 02 08 |
language |
English |
source |
Enthalten in Journal of cheminformatics 16(2024), 1 vom: 02. Aug. volume:16 year:2024 number:1 day:02 month:08 |
sourceStr |
Enthalten in Journal of cheminformatics 16(2024), 1 vom: 02. Aug. volume:16 year:2024 number:1 day:02 month:08 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Chemical space Developmental toxicity Chemical toxicity Multitask learning Zebrafish |
dewey-raw |
540 |
isfreeaccess_bool |
true |
container_title |
Journal of cheminformatics |
authorswithroles_txt_mv |
Lin, Run-Hsin @@aut@@ Lin, Pinpin @@aut@@ Wang, Chia-Chi @@aut@@ Tung, Chun-Wei @@aut@@ |
publishDateDaySort_date |
2024-08-02T00:00:00Z |
hierarchy_top_id |
594779219 |
dewey-sort |
3540 |
id |
SPR056833091 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">SPR056833091</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240803064802.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">240803s2024 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1186/s13321-024-00891-4</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR056833091</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s13321-024-00891-4-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">540</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Lin, Run-Hsin</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-1205-2874</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2024</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2024</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Chemical space</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Developmental toxicity</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Chemical toxicity</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Multitask learning</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Zebrafish</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lin, Pinpin</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0001-5459-907X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Chia-Chi</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-1272-1513</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Tung, Chun-Wei</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0003-3011-8440</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Journal of cheminformatics</subfield><subfield code="d">Springer International Publishing, 2009</subfield><subfield code="g">16(2024), 1 vom: 02. Aug.</subfield><subfield code="w">(DE-627)594779219</subfield><subfield code="w">(DE-600)2486539-4</subfield><subfield code="x">1758-2946</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:16</subfield><subfield code="g">year:2024</subfield><subfield code="g">number:1</subfield><subfield code="g">day:02</subfield><subfield code="g">month:08</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1186/s13321-024-00891-4</subfield><subfield code="m">X:SPRINGER</subfield><subfield code="x">Resolving-System</subfield><subfield code="z">kostenfrei</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_0</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_206</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2003</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2009</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2027</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2050</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2055</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">16</subfield><subfield code="j">2024</subfield><subfield code="e">1</subfield><subfield code="b">02</subfield><subfield code="c">08</subfield></datafield></record></collection>
|
author |
Lin, Run-Hsin |
spellingShingle |
Lin, Run-Hsin ddc 540 misc Chemical space misc Developmental toxicity misc Chemical toxicity misc Multitask learning misc Zebrafish A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example |
authorStr |
Lin, Run-Hsin |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)594779219 |
format |
electronic Article |
dewey-ones |
540 - Chemistry & allied sciences |
delete_txt_mv |
keep |
author_role |
aut aut aut aut |
collection |
springer |
remote_str |
true |
illustrated |
Not Illustrated |
issn |
1758-2946 |
topic_title |
540 VZ A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example Chemical space (dpeaa)DE-He213 Developmental toxicity (dpeaa)DE-He213 Chemical toxicity (dpeaa)DE-He213 Multitask learning (dpeaa)DE-He213 Zebrafish (dpeaa)DE-He213 |
topic |
ddc 540 misc Chemical space misc Developmental toxicity misc Chemical toxicity misc Multitask learning misc Zebrafish |
topic_unstemmed |
ddc 540 misc Chemical space misc Developmental toxicity misc Chemical toxicity misc Multitask learning misc Zebrafish |
topic_browse |
ddc 540 misc Chemical space misc Developmental toxicity misc Chemical toxicity misc Multitask learning misc Zebrafish |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Journal of cheminformatics |
hierarchy_parent_id |
594779219 |
dewey-tens |
540 - Chemistry |
hierarchy_top_title |
Journal of cheminformatics |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)594779219 (DE-600)2486539-4 |
title |
A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example |
ctrlnum |
(DE-627)SPR056833091 (SPR)s13321-024-00891-4-e |
title_full |
A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example |
author_sort |
Lin, Run-Hsin |
journal |
Journal of cheminformatics |
journalStr |
Journal of cheminformatics |
lang_code |
eng |
isOA_bool |
true |
dewey-hundreds |
500 - Science |
recordtype |
marc |
publishDateSort |
2024 |
contenttype_str_mv |
txt |
author_browse |
Lin, Run-Hsin Lin, Pinpin Wang, Chia-Chi Tung, Chun-Wei |
container_volume |
16 |
class |
540 VZ |
format_se |
Elektronische Aufsätze |
author-letter |
Lin, Run-Hsin |
doi_str_mv |
10.1186/s13321-024-00891-4 |
normlink |
(ORCID)0000-0002-1205-2874 (ORCID)0000-0001-5459-907X (ORCID)0000-0002-1272-1513 (ORCID)0000-0003-3011-8440 |
normlink_prefix_str_mv |
(orcid)0000-0002-1205-2874 (orcid)0000-0001-5459-907X (orcid)0000-0002-1272-1513 (orcid)0000-0003-3011-8440 |
dewey-full |
540 |
author2-role |
verfasserin |
title_sort |
a novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example |
title_auth |
A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example |
abstract |
Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. © The Author(s) 2024 |
abstractGer |
Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. © The Author(s) 2024 |
abstract_unstemmed |
Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing. © The Author(s) 2024 |
collection_details |
SYSFLAG_0 GBV_SPRINGER SSG-OLC-PHA GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_602 GBV_ILN_2003 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2027 GBV_ILN_2050 GBV_ILN_2055 GBV_ILN_2111 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 |
container_issue |
1 |
title_short |
A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example |
url |
https://dx.doi.org/10.1186/s13321-024-00891-4 |
remote_bool |
true |
author2 |
Lin, Pinpin Wang, Chia-Chi Tung, Chun-Wei |
author2Str |
Lin, Pinpin Wang, Chia-Chi Tung, Chun-Wei |
ppnlink |
594779219 |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.1186/s13321-024-00891-4 |
up_date |
2024-08-03T07:48:17.524Z |
_version_ |
1806351778474622976 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">SPR056833091</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240803064802.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">240803s2024 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1186/s13321-024-00891-4</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR056833091</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s13321-024-00891-4-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">540</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Lin, Run-Hsin</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-1205-2874</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2024</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2024</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity and the proposed MTForestNet is expected to be useful for tasks with distinct chemical space that can be applied in other tasks. Scieific contribution A novel multitask learning algorithm MTForestNet was proposed to address the challenges of developing models using datasets with distinct chemical space that is a common issue of cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet which provide superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Chemical space</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Developmental toxicity</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Chemical toxicity</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Multitask learning</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Zebrafish</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lin, Pinpin</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0001-5459-907X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Chia-Chi</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0002-1272-1513</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Tung, Chun-Wei</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0003-3011-8440</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Journal of cheminformatics</subfield><subfield code="d">Springer International Publishing, 2009</subfield><subfield code="g">16(2024), 1 vom: 02. Aug.</subfield><subfield code="w">(DE-627)594779219</subfield><subfield code="w">(DE-600)2486539-4</subfield><subfield code="x">1758-2946</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:16</subfield><subfield code="g">year:2024</subfield><subfield code="g">number:1</subfield><subfield code="g">day:02</subfield><subfield code="g">month:08</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1186/s13321-024-00891-4</subfield><subfield code="m">X:SPRINGER</subfield><subfield code="x">Resolving-System</subfield><subfield code="z">kostenfrei</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_0</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_206</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2003</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2009</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2027</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2050</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2055</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">16</subfield><subfield code="j">2024</subfield><subfield code="e">1</subfield><subfield code="b">02</subfield><subfield code="c">08</subfield></datafield></record></collection>
|
score |
7.400591 |