Visual Question Answering reasoning with external knowledge based on bimodal graph neural network
Visual Question Answering (VQA) with external knowledge requires both external knowledge and visual content to answer questions about images. A shortcoming of existing VQA solutions is that they must identify the task-relevant information in the given images, questions, and knowledge graphs, and then properly fuse and embed the information identified across the different modalities to reduce the noise and difficulty in the cross-modal reasoning of VQA models. How to rationally integrate information across modalities and reason over it jointly to find the evidence that correctly predicts the answer still deserves further study. This paper proposes a bimodal Graph Neural Network model combining pre-trained Language Models and Knowledge Graphs (BIGNN-LM-KG). We build a concept graph from the image and question concepts, exploiting the combined reasoning advantages of LM+KG: the KG is used to jointly infer the entity concepts of the images and questions and to construct the concept graph, and the LM computes relevance scores that are used to filter the nodes and paths of the concept graph. We then form a visual graph from the visual and spatial features of the filtered image entities. An improved GNN learns a representation of each of the two graphs, and a modality-fusion GNN fuses the information of the two modality graphs to predict the most likely answer. The proposed model obtains good experimental results on common VQA datasets, and the experiments verify the validity of each component of the model and the model's interpretability.
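The abstract outlines a concrete pipeline: KG-derived concept graph, LM relevance pruning, a visual graph, per-graph GNN encoding, and modality fusion. The following is a minimal, self-contained sketch of that pipeline, not the authors' implementation: every name (`lm_relevance`, `gnn_layer`, the toy concepts and answers) is hypothetical, and the LM score and GNN layer are simple stand-ins (token overlap and mean-neighbour message passing) for the pre-trained LM and the improved GNN the paper describes.

```python
# Illustrative sketch only -- not the BIGNN-LM-KG code. Toy data throughout.
import numpy as np

rng = np.random.default_rng(0)

# --- 1) Concept graph: nodes = image + question concepts linked via a KG ---
concepts = ["person", "umbrella", "rain", "weather"]   # from image + question
kg_edges = [(0, 1), (1, 2), (2, 3)]                    # KG relations between them

def lm_relevance(concept: str, question: str) -> float:
    """Stand-in for an LM relevance score (here: simple token overlap)."""
    q = set(question.lower().split())
    return len(q & set(concept.split())) / max(len(q), 1) + 0.1

question = "what is the weather like"
scores = np.array([lm_relevance(c, question) for c in concepts])
keep = scores >= np.median(scores)                 # prune low-relevance nodes
c_edges = [(u, v) for u, v in kg_edges if keep[u] and keep[v]]

# --- 2) Visual graph: visual + spatial features of the kept image entities ---
d = 8
c_feats = rng.normal(size=(len(concepts), d))      # concept embeddings
v_feats = rng.normal(size=(len(concepts), d))      # one region per concept, for simplicity
v_edges = [(0, 1), (0, 2)]                         # e.g. spatial adjacency of regions

def gnn_layer(feats: np.ndarray, edges) -> np.ndarray:
    """One round of mean-neighbour message passing with a ReLU update."""
    out = feats.copy()
    for i in range(len(feats)):
        nbrs = [v for u, v in edges if u == i] + [u for u, v in edges if v == i]
        if nbrs:
            out[i] = feats[i] + feats[nbrs].mean(axis=0)
    return np.maximum(out, 0.0)

h_c = gnn_layer(c_feats, c_edges)                  # concept-graph representation
h_v = gnn_layer(v_feats, v_edges)                  # visual-graph representation

# --- 3) Modality fusion: combine both graph summaries and score answers ---
answers = ["rainy", "sunny"]
a_feats = rng.normal(size=(len(answers), 2 * d))   # toy answer embeddings
fused = np.concatenate([h_c.mean(axis=0), h_v.mean(axis=0)])
logits = a_feats @ fused
print("predicted answer:", answers[int(np.argmax(logits))])
```

In the paper each of these stand-ins is learned end to end; the sketch only fixes the data flow the abstract describes.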
Detailed description

Author(s): Zhenyu Yang [author]; Lei Wu [author]; Peian Wen [author]; Peng Chen [author]
Format: Electronic article
Language: English
Published: 2023
Subjects: visual question answering; external knowledge; bimodal fusion; pre-trained language models; knowledge graphs
Contained in: Electronic Research Archive - AIMS Press, 2022, 31(2023), 4, pages 1948-1965
Contained in: volume:31 ; year:2023 ; number:4 ; pages:1948-1965
Links: https://doi.org/10.3934/era.2023100 ; https://doaj.org/article/78a0bf51799b43a5aee066699958a729 ; https://www.aimspress.com/article/doi/10.3934/era.2023100?viewType=HTML
DOI / URN: 10.3934/era.2023100
Catalog ID: DOAJ090118413
LEADER 01000naa a22002652 4500
001    DOAJ090118413
003    DE-627
005    20230526104517.0
007    cr uuu---uuuuu
008    230526s2023 xx |||||o 00| ||eng c
024 7_ |a 10.3934/era.2023100 |2 doi
035 __ |a (DE-627)DOAJ090118413
035 __ |a (DE-599)DOAJ78a0bf51799b43a5aee066699958a729
040 __ |a DE-627 |b ger |c DE-627 |e rakwb
041 __ |a eng
050 _0 |a QA1-939
050 _0 |a T57-57.97
100 0_ |a Zhenyu Yang |e verfasserin |4 aut
245 10 |a Visual Question Answering reasoning with external knowledge based on bimodal graph neural network
264 _1 |c 2023
336 __ |a Text |b txt |2 rdacontent
337 __ |a Computermedien |b c |2 rdamedia
338 __ |a Online-Ressource |b cr |2 rdacarrier
520 __ |a Visual Question Answering (VQA) with external knowledge requires both external knowledge and visual content to answer questions about images. A shortcoming of existing VQA solutions is that they must identify the task-relevant information in the given images, questions, and knowledge graphs, and then properly fuse and embed the information identified across the different modalities to reduce the noise and difficulty in the cross-modal reasoning of VQA models. How to rationally integrate information across modalities and reason over it jointly to find the evidence that correctly predicts the answer still deserves further study. This paper proposes a bimodal Graph Neural Network model combining pre-trained Language Models and Knowledge Graphs (BIGNN-LM-KG). We build a concept graph from the image and question concepts, exploiting the combined reasoning advantages of LM+KG: the KG is used to jointly infer the entity concepts of the images and questions and to construct the concept graph, and the LM computes relevance scores that are used to filter the nodes and paths of the concept graph. We then form a visual graph from the visual and spatial features of the filtered image entities. An improved GNN learns a representation of each of the two graphs, and a modality-fusion GNN fuses the information of the two modality graphs to predict the most likely answer. The proposed model obtains good experimental results on common VQA datasets, and the experiments verify the validity of each component of the model and the model's interpretability.
650 _4 |a visual question answering
650 _4 |a external knowledge
650 _4 |a bimodal fusion
650 _4 |a pre-trained language models
650 _4 |a knowledge graphs
653 _0 |a Mathematics
653 _0 |a Applied mathematics. Quantitative methods
700 0_ |a Lei Wu |e verfasserin |4 aut
700 0_ |a Peian Wen |e verfasserin |4 aut
700 0_ |a Peng Chen |e verfasserin |4 aut
773 08 |i In |t Electronic Research Archive |d AIMS Press, 2022 |g 31(2023), 4, Seite 1948-1965 |w (DE-627)1835128319 |x 26881594 |7 nnns
773 18 |g volume:31 |g year:2023 |g number:4 |g pages:1948-1965
856 40 |u https://doi.org/10.3934/era.2023100 |z kostenfrei
856 40 |u https://doaj.org/article/78a0bf51799b43a5aee066699958a729 |z kostenfrei
856 40 |u https://www.aimspress.com/article/doi/10.3934/era.2023100?viewType=HTML |z kostenfrei
856 42 |u https://doaj.org/toc/2688-1594 |y Journal toc |z kostenfrei
912 __ |a GBV_USEFLAG_A
912 __ |a SYSFLAG_A
912 __ |a GBV_DOAJ
912 __ |a GBV_ILN_20
912 __ |a GBV_ILN_22
912 __ |a GBV_ILN_23
912 __ |a GBV_ILN_24
912 __ |a GBV_ILN_31
912 __ |a GBV_ILN_39
912 __ |a GBV_ILN_40
912 __ |a GBV_ILN_60
912 __ |a GBV_ILN_62
912 __ |a GBV_ILN_63
912 __ |a GBV_ILN_65
912 __ |a GBV_ILN_69
912 __ |a GBV_ILN_70
912 __ |a GBV_ILN_73
912 __ |a GBV_ILN_95
912 __ |a GBV_ILN_105
912 __ |a GBV_ILN_110
912 __ |a GBV_ILN_151
912 __ |a GBV_ILN_161
912 __ |a GBV_ILN_170
912 __ |a GBV_ILN_213
912 __ |a GBV_ILN_230
912 __ |a GBV_ILN_285
912 __ |a GBV_ILN_293
912 __ |a GBV_ILN_370
912 __ |a GBV_ILN_602
912 __ |a GBV_ILN_2014
912 __ |a GBV_ILN_4012
912 __ |a GBV_ILN_4037
912 __ |a GBV_ILN_4112
912 __ |a GBV_ILN_4125
912 __ |a GBV_ILN_4126
912 __ |a GBV_ILN_4249
912 __ |a GBV_ILN_4305
912 __ |a GBV_ILN_4306
912 __ |a GBV_ILN_4307
912 __ |a GBV_ILN_4313
912 __ |a GBV_ILN_4322
912 __ |a GBV_ILN_4323
912 __ |a GBV_ILN_4324
912 __ |a GBV_ILN_4325
912 __ |a GBV_ILN_4326
912 __ |a GBV_ILN_4335
912 __ |a GBV_ILN_4338
912 __ |a GBV_ILN_4367
912 __ |a GBV_ILN_4700
951 __ |a AR
952 __ |d 31 |j 2023 |e 4 |h 1948-1965
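Since the record above is plain MARC21, it can be read programmatically. A minimal sketch follows, assuming the MARCXML version of this record has been saved as record.xml (a hypothetical filename) and that the third-party pymarc library is installed (pip install pymarc); the tags read below (245, 100/700, 773, 856) are the ones shown in the record above.

```python
# Sketch: read the key fields of the catalog record above with pymarc.
from pymarc import parse_xml_to_array

with open("record.xml", "rb") as fh:
    record = parse_xml_to_array(fh)[0]

# 245 $a -- title proper
title = record.get_fields("245")[0].get_subfields("a")[0]

# 100 $a / 700 $a -- main entry and added author entries
authors = [f.get_subfields("a")[0] for f in record.get_fields("100", "700")]

# 773 -- host item: $t journal title, $g enumeration (volume/issue/pages)
host = record.get_fields("773")[0]
journal = host.get_subfields("t")
enumeration = host.get_subfields("g")

# 856 $u -- free-access URLs
urls = [f.get_subfields("u")[0] for f in record.get_fields("856")]

print(title)
print(authors)
print(journal, enumeration)
print(urls)
```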