Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods
Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to p...
Ausführliche Beschreibung
Autor*in: |
Areti Karamanou [verfasserIn] Petros Brimos [verfasserIn] Evangelos Kalampokis [verfasserIn] Konstantinos Tarabanis [verfasserIn] |
---|
Format: |
E-Artikel |
---|---|
Sprache: |
Englisch |
Erschienen: |
2022 |
---|
Schlagwörter: |
---|
Übergeordnetes Werk: |
In: Sensors - MDPI AG, 2003, 22(2022), 24, p 9684 |
---|---|
Übergeordnetes Werk: |
volume:22 ; year:2022 ; number:24, p 9684 |
Links: |
---|
DOI / URN: |
10.3390/s22249684 |
---|
Katalog-ID: |
DOAJ082982597 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | DOAJ082982597 | ||
003 | DE-627 | ||
005 | 20240414145904.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230311s2022 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.3390/s22249684 |2 doi | |
035 | |a (DE-627)DOAJ082982597 | ||
035 | |a (DE-599)DOAJ30447ad4dec14f438558145ccc7fb860 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
050 | 0 | |a TP1-1185 | |
100 | 0 | |a Areti Karamanou |e verfasserin |4 aut | |
245 | 1 | 0 | |a Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods |
264 | 1 | |c 2022 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. | ||
650 | 4 | |a open government data | |
650 | 4 | |a dynamic government data | |
650 | 4 | |a high-valuable data | |
650 | 4 | |a real-time data | |
650 | 4 | |a traffic data | |
650 | 4 | |a data quality | |
653 | 0 | |a Chemical technology | |
700 | 0 | |a Petros Brimos |e verfasserin |4 aut | |
700 | 0 | |a Evangelos Kalampokis |e verfasserin |4 aut | |
700 | 0 | |a Konstantinos Tarabanis |e verfasserin |4 aut | |
773 | 0 | 8 | |i In |t Sensors |d MDPI AG, 2003 |g 22(2022), 24, p 9684 |w (DE-627)331640910 |w (DE-600)2052857-7 |x 14248220 |7 nnns |
773 | 1 | 8 | |g volume:22 |g year:2022 |g number:24, p 9684 |
856 | 4 | 0 | |u https://doi.org/10.3390/s22249684 |z kostenfrei |
856 | 4 | 0 | |u https://doaj.org/article/30447ad4dec14f438558145ccc7fb860 |z kostenfrei |
856 | 4 | 0 | |u https://www.mdpi.com/1424-8220/22/24/9684 |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/1424-8220 |y Journal toc |z kostenfrei |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_DOAJ | ||
912 | |a GBV_ILN_20 | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_23 | ||
912 | |a GBV_ILN_24 | ||
912 | |a GBV_ILN_31 | ||
912 | |a GBV_ILN_39 | ||
912 | |a GBV_ILN_40 | ||
912 | |a GBV_ILN_60 | ||
912 | |a GBV_ILN_62 | ||
912 | |a GBV_ILN_63 | ||
912 | |a GBV_ILN_65 | ||
912 | |a GBV_ILN_69 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_73 | ||
912 | |a GBV_ILN_95 | ||
912 | |a GBV_ILN_105 | ||
912 | |a GBV_ILN_110 | ||
912 | |a GBV_ILN_151 | ||
912 | |a GBV_ILN_161 | ||
912 | |a GBV_ILN_170 | ||
912 | |a GBV_ILN_206 | ||
912 | |a GBV_ILN_213 | ||
912 | |a GBV_ILN_230 | ||
912 | |a GBV_ILN_285 | ||
912 | |a GBV_ILN_293 | ||
912 | |a GBV_ILN_370 | ||
912 | |a GBV_ILN_602 | ||
912 | |a GBV_ILN_2005 | ||
912 | |a GBV_ILN_2009 | ||
912 | |a GBV_ILN_2011 | ||
912 | |a GBV_ILN_2014 | ||
912 | |a GBV_ILN_2055 | ||
912 | |a GBV_ILN_2057 | ||
912 | |a GBV_ILN_2111 | ||
912 | |a GBV_ILN_2507 | ||
912 | |a GBV_ILN_4012 | ||
912 | |a GBV_ILN_4037 | ||
912 | |a GBV_ILN_4112 | ||
912 | |a GBV_ILN_4125 | ||
912 | |a GBV_ILN_4126 | ||
912 | |a GBV_ILN_4249 | ||
912 | |a GBV_ILN_4305 | ||
912 | |a GBV_ILN_4306 | ||
912 | |a GBV_ILN_4307 | ||
912 | |a GBV_ILN_4313 | ||
912 | |a GBV_ILN_4322 | ||
912 | |a GBV_ILN_4323 | ||
912 | |a GBV_ILN_4324 | ||
912 | |a GBV_ILN_4325 | ||
912 | |a GBV_ILN_4335 | ||
912 | |a GBV_ILN_4338 | ||
912 | |a GBV_ILN_4367 | ||
912 | |a GBV_ILN_4700 | ||
951 | |a AR | ||
952 | |d 22 |j 2022 |e 24, p 9684 |
author_variant |
a k ak p b pb e k ek k t kt |
---|---|
matchkey_str |
article:14248220:2022----::xlrnteultodnmcpnoenetaasnsaitcl |
hierarchy_sort_str |
2022 |
callnumber-subject-code |
TP |
publishDate |
2022 |
allfields |
10.3390/s22249684 doi (DE-627)DOAJ082982597 (DE-599)DOAJ30447ad4dec14f438558145ccc7fb860 DE-627 ger DE-627 rakwb eng TP1-1185 Areti Karamanou verfasserin aut Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods 2022 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. open government data dynamic government data high-valuable data real-time data traffic data data quality Chemical technology Petros Brimos verfasserin aut Evangelos Kalampokis verfasserin aut Konstantinos Tarabanis verfasserin aut In Sensors MDPI AG, 2003 22(2022), 24, p 9684 (DE-627)331640910 (DE-600)2052857-7 14248220 nnns volume:22 year:2022 number:24, p 9684 https://doi.org/10.3390/s22249684 kostenfrei https://doaj.org/article/30447ad4dec14f438558145ccc7fb860 kostenfrei https://www.mdpi.com/1424-8220/22/24/9684 kostenfrei https://doaj.org/toc/1424-8220 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2057 GBV_ILN_2111 GBV_ILN_2507 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 22 2022 24, p 9684 |
spelling |
10.3390/s22249684 doi (DE-627)DOAJ082982597 (DE-599)DOAJ30447ad4dec14f438558145ccc7fb860 DE-627 ger DE-627 rakwb eng TP1-1185 Areti Karamanou verfasserin aut Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods 2022 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. open government data dynamic government data high-valuable data real-time data traffic data data quality Chemical technology Petros Brimos verfasserin aut Evangelos Kalampokis verfasserin aut Konstantinos Tarabanis verfasserin aut In Sensors MDPI AG, 2003 22(2022), 24, p 9684 (DE-627)331640910 (DE-600)2052857-7 14248220 nnns volume:22 year:2022 number:24, p 9684 https://doi.org/10.3390/s22249684 kostenfrei https://doaj.org/article/30447ad4dec14f438558145ccc7fb860 kostenfrei https://www.mdpi.com/1424-8220/22/24/9684 kostenfrei https://doaj.org/toc/1424-8220 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2057 GBV_ILN_2111 GBV_ILN_2507 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 22 2022 24, p 9684 |
allfields_unstemmed |
10.3390/s22249684 doi (DE-627)DOAJ082982597 (DE-599)DOAJ30447ad4dec14f438558145ccc7fb860 DE-627 ger DE-627 rakwb eng TP1-1185 Areti Karamanou verfasserin aut Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods 2022 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. open government data dynamic government data high-valuable data real-time data traffic data data quality Chemical technology Petros Brimos verfasserin aut Evangelos Kalampokis verfasserin aut Konstantinos Tarabanis verfasserin aut In Sensors MDPI AG, 2003 22(2022), 24, p 9684 (DE-627)331640910 (DE-600)2052857-7 14248220 nnns volume:22 year:2022 number:24, p 9684 https://doi.org/10.3390/s22249684 kostenfrei https://doaj.org/article/30447ad4dec14f438558145ccc7fb860 kostenfrei https://www.mdpi.com/1424-8220/22/24/9684 kostenfrei https://doaj.org/toc/1424-8220 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2057 GBV_ILN_2111 GBV_ILN_2507 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 22 2022 24, p 9684 |
allfieldsGer |
10.3390/s22249684 doi (DE-627)DOAJ082982597 (DE-599)DOAJ30447ad4dec14f438558145ccc7fb860 DE-627 ger DE-627 rakwb eng TP1-1185 Areti Karamanou verfasserin aut Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods 2022 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. open government data dynamic government data high-valuable data real-time data traffic data data quality Chemical technology Petros Brimos verfasserin aut Evangelos Kalampokis verfasserin aut Konstantinos Tarabanis verfasserin aut In Sensors MDPI AG, 2003 22(2022), 24, p 9684 (DE-627)331640910 (DE-600)2052857-7 14248220 nnns volume:22 year:2022 number:24, p 9684 https://doi.org/10.3390/s22249684 kostenfrei https://doaj.org/article/30447ad4dec14f438558145ccc7fb860 kostenfrei https://www.mdpi.com/1424-8220/22/24/9684 kostenfrei https://doaj.org/toc/1424-8220 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2057 GBV_ILN_2111 GBV_ILN_2507 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 22 2022 24, p 9684 |
allfieldsSound |
10.3390/s22249684 doi (DE-627)DOAJ082982597 (DE-599)DOAJ30447ad4dec14f438558145ccc7fb860 DE-627 ger DE-627 rakwb eng TP1-1185 Areti Karamanou verfasserin aut Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods 2022 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. open government data dynamic government data high-valuable data real-time data traffic data data quality Chemical technology Petros Brimos verfasserin aut Evangelos Kalampokis verfasserin aut Konstantinos Tarabanis verfasserin aut In Sensors MDPI AG, 2003 22(2022), 24, p 9684 (DE-627)331640910 (DE-600)2052857-7 14248220 nnns volume:22 year:2022 number:24, p 9684 https://doi.org/10.3390/s22249684 kostenfrei https://doaj.org/article/30447ad4dec14f438558145ccc7fb860 kostenfrei https://www.mdpi.com/1424-8220/22/24/9684 kostenfrei https://doaj.org/toc/1424-8220 Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2057 GBV_ILN_2111 GBV_ILN_2507 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 AR 22 2022 24, p 9684 |
language |
English |
source |
In Sensors 22(2022), 24, p 9684 volume:22 year:2022 number:24, p 9684 |
sourceStr |
In Sensors 22(2022), 24, p 9684 volume:22 year:2022 number:24, p 9684 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
open government data dynamic government data high-valuable data real-time data traffic data data quality Chemical technology |
isfreeaccess_bool |
true |
container_title |
Sensors |
authorswithroles_txt_mv |
Areti Karamanou @@aut@@ Petros Brimos @@aut@@ Evangelos Kalampokis @@aut@@ Konstantinos Tarabanis @@aut@@ |
publishDateDaySort_date |
2022-01-01T00:00:00Z |
hierarchy_top_id |
331640910 |
id |
DOAJ082982597 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">DOAJ082982597</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240414145904.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230311s2022 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.3390/s22249684</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ082982597</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJ30447ad4dec14f438558145ccc7fb860</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TP1-1185</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Areti Karamanou</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2022</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">open government data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">dynamic government data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">high-valuable data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">real-time data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">traffic data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">data quality</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Chemical technology</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Petros Brimos</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Evangelos Kalampokis</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Konstantinos Tarabanis</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Sensors</subfield><subfield code="d">MDPI AG, 2003</subfield><subfield code="g">22(2022), 24, p 9684</subfield><subfield code="w">(DE-627)331640910</subfield><subfield code="w">(DE-600)2052857-7</subfield><subfield code="x">14248220</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:22</subfield><subfield code="g">year:2022</subfield><subfield code="g">number:24, p 9684</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.3390/s22249684</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/30447ad4dec14f438558145ccc7fb860</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://www.mdpi.com/1424-8220/22/24/9684</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/1424-8220</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_206</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2009</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2055</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2057</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2507</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">22</subfield><subfield code="j">2022</subfield><subfield code="e">24, p 9684</subfield></datafield></record></collection>
|
callnumber-first |
T - Technology |
author |
Areti Karamanou |
spellingShingle |
Areti Karamanou misc TP1-1185 misc open government data misc dynamic government data misc high-valuable data misc real-time data misc traffic data misc data quality misc Chemical technology Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods |
authorStr |
Areti Karamanou |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)331640910 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut aut aut aut |
collection |
DOAJ |
remote_str |
true |
callnumber-label |
TP1-1185 |
illustrated |
Not Illustrated |
issn |
14248220 |
topic_title |
TP1-1185 Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods open government data dynamic government data high-valuable data real-time data traffic data data quality |
topic |
misc TP1-1185 misc open government data misc dynamic government data misc high-valuable data misc real-time data misc traffic data misc data quality misc Chemical technology |
topic_unstemmed |
misc TP1-1185 misc open government data misc dynamic government data misc high-valuable data misc real-time data misc traffic data misc data quality misc Chemical technology |
topic_browse |
misc TP1-1185 misc open government data misc dynamic government data misc high-valuable data misc real-time data misc traffic data misc data quality misc Chemical technology |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Sensors |
hierarchy_parent_id |
331640910 |
hierarchy_top_title |
Sensors |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)331640910 (DE-600)2052857-7 |
title |
Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods |
ctrlnum |
(DE-627)DOAJ082982597 (DE-599)DOAJ30447ad4dec14f438558145ccc7fb860 |
title_full |
Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods |
author_sort |
Areti Karamanou |
journal |
Sensors |
journalStr |
Sensors |
callnumber-first-code |
T |
lang_code |
eng |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2022 |
contenttype_str_mv |
txt |
author_browse |
Areti Karamanou Petros Brimos Evangelos Kalampokis Konstantinos Tarabanis |
container_volume |
22 |
class |
TP1-1185 |
format_se |
Elektronische Aufsätze |
author-letter |
Areti Karamanou |
doi_str_mv |
10.3390/s22249684 |
author2-role |
verfasserin |
title_sort |
exploring the quality of dynamic open government data using statistical and machine learning methods |
callnumber |
TP1-1185 |
title_auth |
Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods |
abstract |
Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. |
abstractGer |
Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. |
abstract_unstemmed |
Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ GBV_ILN_20 GBV_ILN_22 GBV_ILN_23 GBV_ILN_24 GBV_ILN_31 GBV_ILN_39 GBV_ILN_40 GBV_ILN_60 GBV_ILN_62 GBV_ILN_63 GBV_ILN_65 GBV_ILN_69 GBV_ILN_70 GBV_ILN_73 GBV_ILN_95 GBV_ILN_105 GBV_ILN_110 GBV_ILN_151 GBV_ILN_161 GBV_ILN_170 GBV_ILN_206 GBV_ILN_213 GBV_ILN_230 GBV_ILN_285 GBV_ILN_293 GBV_ILN_370 GBV_ILN_602 GBV_ILN_2005 GBV_ILN_2009 GBV_ILN_2011 GBV_ILN_2014 GBV_ILN_2055 GBV_ILN_2057 GBV_ILN_2111 GBV_ILN_2507 GBV_ILN_4012 GBV_ILN_4037 GBV_ILN_4112 GBV_ILN_4125 GBV_ILN_4126 GBV_ILN_4249 GBV_ILN_4305 GBV_ILN_4306 GBV_ILN_4307 GBV_ILN_4313 GBV_ILN_4322 GBV_ILN_4323 GBV_ILN_4324 GBV_ILN_4325 GBV_ILN_4335 GBV_ILN_4338 GBV_ILN_4367 GBV_ILN_4700 |
container_issue |
24, p 9684 |
title_short |
Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods |
url |
https://doi.org/10.3390/s22249684 https://doaj.org/article/30447ad4dec14f438558145ccc7fb860 https://www.mdpi.com/1424-8220/22/24/9684 https://doaj.org/toc/1424-8220 |
remote_bool |
true |
author2 |
Petros Brimos Evangelos Kalampokis Konstantinos Tarabanis |
author2Str |
Petros Brimos Evangelos Kalampokis Konstantinos Tarabanis |
ppnlink |
331640910 |
callnumber-subject |
TP - Chemical Technology |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.3390/s22249684 |
callnumber-a |
TP1-1185 |
up_date |
2024-07-03T14:53:13.480Z |
_version_ |
1803570006963781632 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">DOAJ082982597</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20240414145904.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230311s2022 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.3390/s22249684</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ082982597</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJ30447ad4dec14f438558145ccc7fb860</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">TP1-1185</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">Areti Karamanou</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2022</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors’ knowledge, this is the first time a study has explored the quality of dynamic OGD.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">open government data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">dynamic government data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">high-valuable data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">real-time data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">traffic data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">data quality</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Chemical technology</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Petros Brimos</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Evangelos Kalampokis</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="0" ind2=" "><subfield code="a">Konstantinos Tarabanis</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Sensors</subfield><subfield code="d">MDPI AG, 2003</subfield><subfield code="g">22(2022), 24, p 9684</subfield><subfield code="w">(DE-627)331640910</subfield><subfield code="w">(DE-600)2052857-7</subfield><subfield code="x">14248220</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:22</subfield><subfield code="g">year:2022</subfield><subfield code="g">number:24, p 9684</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.3390/s22249684</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/30447ad4dec14f438558145ccc7fb860</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://www.mdpi.com/1424-8220/22/24/9684</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/1424-8220</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_20</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_23</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_24</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_31</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_39</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_40</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_60</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_63</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_65</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_69</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_73</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_95</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_105</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_110</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_151</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_161</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_170</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_206</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_213</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_230</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_285</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_293</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_370</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_602</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2005</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2009</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2011</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2014</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2055</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2057</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2111</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2507</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4037</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4112</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4125</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4126</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4249</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4305</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4306</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4313</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4322</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4325</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4335</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4338</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4367</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4700</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">22</subfield><subfield code="j">2022</subfield><subfield code="e">24, p 9684</subfield></datafield></record></collection>
|
score |
7.399349 |