Pre-trained multilevel fuse network based on vision-conditioned reasoning and bilinear attentions for medical image visual question answering

Abstract: Current Medical Image Visual Question Answering (Med-VQA) models tend to exploit language bias instead of learning multimodal features from both vision and language, and they often suffer from sparse data and poor performance. In this paper, we propose a new pre-trained multileve...

Authors:

Cai, Linqin [author]

Fang, Haodu

Li, Zhiqing

Format:

E-article

Language:

English

Published:

2023

Keywords:

Medical image visual question answering

Vision conditional reasoning

Contrastive language-image pre-training

Transfer learning

Note:

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Host item:

Contained in: The journal of supercomputing - Dordrecht [et al.]: Springer Science + Business Media B.V, 1987, 79(2023), issue 12 (29 March), pages 13696-13723

Host item:

volume:79 ; year:2023 ; number:12 ; day:29 ; month:03 ; pages:13696-13723


DOI / URN:

10.1007/s11227-023-05195-2

Catalog ID:

SPR05196161X
