Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data

Background There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and externa...

Author:	Yang, Cynthia [author]; Fridgeirsson, Egill A.; Kors, Jan A.; Reps, Jenna M.; Rijnbeek, Peter R.

Format:	E-article
Language:	English

Published:	2024

Keywords:	Patient-level prediction; Clinical prediction model; Class Imbalance Problem; Machine learning; External validation; Clinical decision support

Note:	© The Author(s) 2023

Parent work:	Contained in: Journal of Big Data - Berlin : SpringerOpen, 2014, 11(2024), 1, 3 Jan.
Parent work:	volume:11 ; year:2024 ; number:1 ; day:03 ; month:01

Links:	Full text

DOI / URN:	10.1186/s40537-023-00857-7

Catalog ID:	SPR054253594

Internal format


LEADER	01000caa a22002652 4500
001	SPR054253594
003	DE-627
005	20240216064738.0
007	cr uuu---uuuuu
008	240104s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1186/s40537-023-00857-7 \|2 doi
035			\|a (DE-627)SPR054253594
035			\|a (SPR)s40537-023-00857-7-e
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Yang, Cynthia \|e verfasserin \|0 (orcid)0000-0001-6769-3153 \|4 aut
245	1	0	\|a Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a © The Author(s) 2023
520			\|a Background There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and calibration was assessed using calibration plots. Results We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, we found overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. Conclusions Overall, we found that random oversampling or random undersampling generally does not improve the internal and external validation performance of prediction models developed in large observational health databases. Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases.
650		4	\|a Patient-level prediction \|7 (dpeaa)DE-He213
650		4	\|a Clinical prediction model \|7 (dpeaa)DE-He213
650		4	\|a Class Imbalance Problem \|7 (dpeaa)DE-He213
650		4	\|a Machine learning \|7 (dpeaa)DE-He213
650		4	\|a External validation \|7 (dpeaa)DE-He213
650		4	\|a Clinical decision support \|7 (dpeaa)DE-He213
700	1		\|a Fridgeirsson, Egill A. \|4 aut
700	1		\|a Kors, Jan A. \|4 aut
700	1		\|a Reps, Jenna M. \|4 aut
700	1		\|a Rijnbeek, Peter R. \|4 aut
773	0	8	\|i Enthalten in \|t Journal of Big Data \|d Berlin : SpringerOpen, 2014 \|g 11(2024), 1 vom: 03. Jan. \|w (DE-627)79213219X \|w (DE-600)2780218-8 \|x 2196-1115 \|7 nnns
773	1	8	\|g volume:11 \|g year:2024 \|g number:1 \|g day:03 \|g month:01
856	4	0	\|u https://dx.doi.org/10.1186/s40537-023-00857-7 \|z kostenfrei \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_SPRINGER
912			\|a GBV_ILN_11
912			\|a GBV_ILN_20
912			\|a GBV_ILN_22
912			\|a GBV_ILN_23
912			\|a GBV_ILN_24
912			\|a GBV_ILN_39
912			\|a GBV_ILN_40
912			\|a GBV_ILN_60
912			\|a GBV_ILN_62
912			\|a GBV_ILN_63
912			\|a GBV_ILN_65
912			\|a GBV_ILN_69
912			\|a GBV_ILN_70
912			\|a GBV_ILN_73
912			\|a GBV_ILN_95
912			\|a GBV_ILN_105
912			\|a GBV_ILN_110
912			\|a GBV_ILN_151
912			\|a GBV_ILN_161
912			\|a GBV_ILN_170
912			\|a GBV_ILN_213
912			\|a GBV_ILN_230
912			\|a GBV_ILN_285
912			\|a GBV_ILN_293
912			\|a GBV_ILN_370
912			\|a GBV_ILN_602
912			\|a GBV_ILN_2014
912			\|a GBV_ILN_4012
912			\|a GBV_ILN_4037
912			\|a GBV_ILN_4112
912			\|a GBV_ILN_4125
912			\|a GBV_ILN_4126
912			\|a GBV_ILN_4249
912			\|a GBV_ILN_4305
912			\|a GBV_ILN_4306
912			\|a GBV_ILN_4307
912			\|a GBV_ILN_4313
912			\|a GBV_ILN_4322
912			\|a GBV_ILN_4323
912			\|a GBV_ILN_4324
912			\|a GBV_ILN_4325
912			\|a GBV_ILN_4326
912			\|a GBV_ILN_4334
912			\|a GBV_ILN_4335
912			\|a GBV_ILN_4338
912			\|a GBV_ILN_4367
912			\|a GBV_ILN_4700
951			\|a AR
952			\|d 11 \|j 2024 \|e 1 \|b 03 \|c 01

Index fields

author_variant	c y cy e a f ea eaf j a k ja jak j m r jm jmr p r r pr prr
matchkey_str	article:21961115:2024----::matfadmvrapignrnoudrapignhpromnefrdcinoesee
hierarchy_sort_str	2024
publishDate	2024
allfields	10.1186/s40537-023-00857-7 doi (DE-627)SPR054253594 (SPR)s40537-023-00857-7-e DE-627 ger DE-627 rakwb eng Yang, Cynthia verfasserin (orcid)0000-0001-6769-3153 aut Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data 2024 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier © The Author(s) 2023 Background There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and calibration was assessed using calibration plots. Results We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, we found overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. 
Conclusions Overall, we found that random oversampling or random undersampling generally does not improve the internal and external validation performance of prediction models developed in large observational health databases. Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases. Patient-level prediction (dpeaa)DE-He213 Clinical prediction model (dpeaa)DE-He213 Class Imbalance Problem (dpeaa)DE-He213 Machine learning (dpeaa)DE-He213 External validation (dpeaa)DE-He213 Clinical decision support (dpeaa)DE-He213 Fridgeirsson, Egill A. aut Kors, Jan A. aut Reps, Jenna M. aut Rijnbeek, Peter R. aut Enthalten in Journal of Big Data Berlin : SpringerOpen, 2014 11(2024), 1 vom: 03. Jan. (DE-627)79213219X (DE-600)2780218-8 2196-1115 nnns volume:11 year:2024 number:1 day:03 month:01 https://dx.doi.org/10.1186/s40537-023-00857-7 kostenfrei Volltext GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 11 2024 1 03 01
language	English
source	Enthalten in Journal of Big Data 11(2024), 1 vom: 03. Jan. volume:11 year:2024 number:1 day:03 month:01
sourceStr	Enthalten in Journal of Big Data 11(2024), 1 vom: 03. Jan. volume:11 year:2024 number:1 day:03 month:01
format_phy_str_mv	Article
institution	findex.gbv.de
topic_facet	Patient-level prediction Clinical prediction model Class Imbalance Problem Machine learning External validation Clinical decision support
isfreeaccess_bool	true
container_title	Journal of Big Data
authorswithroles_txt_mv	Yang, Cynthia @@aut@@ Fridgeirsson, Egill A. @@aut@@ Kors, Jan A. @@aut@@ Reps, Jenna M. @@aut@@ Rijnbeek, Peter R. @@aut@@
publishDateDaySort_date	2024-01-03T00:00:00Z
hierarchy_top_id	79213219X
id	SPR054253594
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">SPR054253594</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240216064738.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">240104s2024 xx \|\|\|\|\|o 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1186/s40537-023-00857-7</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)SPR054253594</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SPR)s40537-023-00857-7-e</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Yang, Cynthia</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0001-6769-3153</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2024</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield 
code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2023</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Background There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and calibration was assessed using calibration plots. Results We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, we found overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. Conclusions Overall, we found that random oversampling or random undersampling generally does not improve the internal and external validation performance of prediction models developed in large observational health databases. 
Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Patient-level prediction</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Clinical prediction model</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Class Imbalance Problem</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Machine learning</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">External validation</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Clinical decision support</subfield><subfield code="7">(dpeaa)DE-He213</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Fridgeirsson, Egill A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Kors, Jan A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Reps, Jenna M.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Rijnbeek, Peter R.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Journal of Big Data</subfield><subfield code="d">Berlin : SpringerOpen, 2014</subfield><subfield code="g">11(2024), 1 vom: 03. 
Jan.</subfield><subfield code="w">(DE-627)79213219X</subfield><subfield code="w">(DE-600)2780218-8</subfield><subfield code="x">2196-1115</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:11</subfield><subfield code="g">year:2024</subfield><subfield code="g">number:1</subfield><subfield code="g">day:03</subfield><subfield code="g">month:01</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://dx.doi.org/10.1186/s40537-023-00857-7</subfield><subfield code="z">kostenfrei</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_SPRINGER</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_11</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" 
ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4326</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4334</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">11</subfield><subfield code="j">2024</subfield><subfield code="e">1</subfield><subfield code="b">03</subfield><subfield code="c">01</subfield></datafield></record></collection>
author	Yang, Cynthia
spellingShingle	Yang, Cynthia misc Patient-level prediction misc Clinical prediction model misc Class Imbalance Problem misc Machine learning misc External validation misc Clinical decision support Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
authorStr	Yang, Cynthia
ppnlink_with_tag_str_mv	@@773@@(DE-627)79213219X
format	electronic Article
delete_txt_mv	keep
author_role	aut aut aut aut aut
collection	springer
remote_str	true
illustrated	Not Illustrated
issn	2196-1115
topic_title	Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data Patient-level prediction (dpeaa)DE-He213 Clinical prediction model (dpeaa)DE-He213 Class Imbalance Problem (dpeaa)DE-He213 Machine learning (dpeaa)DE-He213 External validation (dpeaa)DE-He213 Clinical decision support (dpeaa)DE-He213
topic	misc Patient-level prediction misc Clinical prediction model misc Class Imbalance Problem misc Machine learning misc External validation misc Clinical decision support
topic_unstemmed	misc Patient-level prediction misc Clinical prediction model misc Class Imbalance Problem misc Machine learning misc External validation misc Clinical decision support
topic_browse	misc Patient-level prediction misc Clinical prediction model misc Class Imbalance Problem misc Machine learning misc External validation misc Clinical decision support
format_facet	Elektronische Aufsätze Aufsätze Elektronische Ressource
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	cr
hierarchy_parent_title	Journal of Big Data
hierarchy_parent_id	79213219X
hierarchy_top_title	Journal of Big Data
isfreeaccess_txt	true
familylinks_str_mv	(DE-627)79213219X (DE-600)2780218-8
title	Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
ctrlnum	(DE-627)SPR054253594 (SPR)s40537-023-00857-7-e
title_full	Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
author_sort	Yang, Cynthia
journal	Journal of Big Data
journalStr	Journal of Big Data
lang_code	eng
isOA_bool	true
recordtype	marc
publishDateSort	2024
contenttype_str_mv	txt
author_browse	Yang, Cynthia Fridgeirsson, Egill A. Kors, Jan A. Reps, Jenna M. Rijnbeek, Peter R.
container_volume	11
format_se	Elektronische Aufsätze
author-letter	Yang, Cynthia
doi_str_mv	10.1186/s40537-023-00857-7
normlink	(ORCID)0000-0001-6769-3153
normlink_prefix_str_mv	(orcid)0000-0001-6769-3153
title_sort	impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
title_auth	Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
abstract	Background: There is currently no consensus on the impact of class imbalance methods on the performance of clinical prediction models. We aimed to empirically investigate the impact of random oversampling and random undersampling, two commonly used class imbalance methods, on the internal and external validation performance of prediction models developed using observational health data. Methods: We developed and externally validated prediction models for various outcomes of interest within a target population of people with pharmaceutically treated depression across four large observational health databases. We used three different classifiers (lasso logistic regression, random forest, XGBoost) and varied the target imbalance ratio. We evaluated the impact on model performance in terms of discrimination and calibration. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC) and calibration was assessed using calibration plots. Results: We developed and externally validated a total of 1,566 prediction models. On internal and external validation, random oversampling and random undersampling generally did not result in higher AUROCs. Moreover, we found overestimated risks, although this miscalibration could largely be corrected by recalibrating the models towards the imbalance ratios in the original dataset. Conclusions: Overall, we found that random oversampling or random undersampling generally does not improve the internal and external validation performance of prediction models developed in large observational health databases. Based on our findings, we do not recommend applying random oversampling or random undersampling when developing prediction models in large observational health databases. © The Author(s) 2023
collection_details	GBV_USEFLAG_A SYSFLAG_A GBV_SPRINGER GBV_ILN_11 GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2014 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4326 GBV_ILN_4334 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700
container_issue	1
title_short	Impact of random oversampling and random undersampling on the performance of prediction models developed using observational health data
url	https://dx.doi.org/10.1186/s40537-023-00857-7
remote_bool	true
author2	Fridgeirsson, Egill A. Kors, Jan A. Reps, Jenna M. Rijnbeek, Peter R.
author2Str	Fridgeirsson, Egill A. Kors, Jan A. Reps, Jenna M. Rijnbeek, Peter R.
ppnlink	79213219X
mediatype_str_mv	c
isOA_txt	true
hochschulschrift_bool	false
doi_str	10.1186/s40537-023-00857-7
up_date	2024-07-04T00:42:33.234Z
_version_	1803607084356337664
score	7.4009495
