Multi-Shared Attention with Global and Local Pathways for Video Question Answering
Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method.
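The abstract describes two pathways sampling frames at different rates, attended by one shared attention function and then fused. The following is a minimal illustrative sketch of that idea only, not the authors' implementation: it assumes random feature vectors, a single linear projection `W` standing in for the shared attention function, and concatenation as the fusion step; the paper's actual encoders, dimensions, and fusion operator are not specified here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def shared_attention(frames, question, W):
    """Attend over frame features using one shared projection W.

    frames:   (T, d) frame-level features of one pathway
    question: (d,)   encoded question vector
    W:        (d, d) the single attention projection shared by both pathways
    """
    scores = frames @ W @ question   # (T,) question-conditioned scores
    weights = softmax(scores)        # attention distribution over frames
    return weights @ frames          # (d,) attended visual summary

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))     # ONE attention function, shared

question = rng.standard_normal(d)
global_frames = rng.standard_normal((4, d))   # global pathway: low frame rate
local_frames = rng.standard_normal((16, d))   # local pathway: high frame rate

g = shared_attention(global_frames, question, W)
l = shared_attention(local_frames, question, W)
fused = np.concatenate([g, l])      # toy fusion of the two pathways
print(fused.shape)                  # (16,)
```

Because both calls reuse the same `W`, the two pathways are scored by the same attention function, which is the "multi-shared" aspect the abstract highlights.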
Detailed description
Author: |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei [verfasserIn] |
---|
Format: |
E-article |
---|---|
Language: |
Chinese |
Published: |
2021 |
---|
Keywords: |
video question answering|shared attention mechanism|global and local pathways |
---|
Published in: |
In: Jisuanji kexue - Editorial office of Computer Science, 2021, 48(2021), 8, pages 145-149 |
---|---|
Published in: |
volume:48 ; year:2021 ; number:8 ; pages:145-149 |
Links: |
---|
DOI / URN: |
10.11896/jsjkx.200800207 |
---|
Catalogue ID: |
DOAJ077300378 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | DOAJ077300378 | ||
003 | DE-627 | ||
005 | 20230309151305.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230228s2021 xx |||||o 00| ||chi c | ||
024 | 7 | |a 10.11896/jsjkx.200800207 |2 doi | |
035 | |a (DE-627)DOAJ077300378 | ||
035 | |a (DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a chi | ||
050 | 0 | |a QA76.75-76.765 | |
050 | 0 | |a T1-995 | |
100 | 0 | |a WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |e verfasserin |4 aut | |
245 | 1 | 0 | |a Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
264 | 1 | |c 2021 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method. | ||
650 | 4 | |a video question answering|shared attention mechanism|global and local pathways | |
653 | 0 | |a Computer software | |
653 | 0 | |a Technology (General) | |
773 | 0 | 8 | |i In |t Jisuanji kexue |d Editorial office of Computer Science, 2021 |g 48(2021), 8, Seite 145-149 |w (DE-627)DOAJ078619254 |x 1002137X |7 nnns |
773 | 1 | 8 | |g volume:48 |g year:2021 |g number:8 |g pages:145-149 |
856 | 4 | 0 | |u https://doi.org/10.11896/jsjkx.200800207 |z kostenfrei |
856 | 4 | 0 | |u https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05 |z kostenfrei |
856 | 4 | 0 | |u http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/1002-137X |y Journal toc |z kostenfrei |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_DOAJ | ||
951 | |a AR | ||
952 | |d 48 |j 2021 |e 8 |h 145-149 |
author_variant |
l q h w y y s z z x l y w c l w lqhwyyszzxlywcl lqhwyyszzxlywclw |
---|---|
matchkey_str |
article:1002137X:2021----::utsaeatninihlblnlclahasov |
hierarchy_sort_str |
2021 |
callnumber-subject-code |
QA |
publishDate |
2021 |
allfields |
10.11896/jsjkx.200800207 doi (DE-627)DOAJ077300378 (DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05 DE-627 ger DE-627 rakwb chi QA76.75-76.765 T1-995 WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei verfasserin aut Multi-Shared Attention with Global and Local Pathways for Video Question Answering 2021 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method. video question answering|shared attention mechanism|global and local pathways Computer software Technology (General) In Jisuanji kexue Editorial office of Computer Science, 2021 48(2021), 8, Seite 145-149 (DE-627)DOAJ078619254 1002137X nnns volume:48 year:2021 number:8 pages:145-149 https://doi.org/10.11896/jsjkx.200800207 kostenfrei https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05 kostenfrei http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf kostenfrei https://doaj.org/toc/1002-137X Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ AR 48 2021 8 145-149
language |
Chinese |
source |
In Jisuanji kexue 48(2021), 8, Seite 145-149 volume:48 year:2021 number:8 pages:145-149 |
sourceStr |
In Jisuanji kexue 48(2021), 8, Seite 145-149 volume:48 year:2021 number:8 pages:145-149 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
video question answering|shared attention mechanism|global and local pathways Computer software Technology (General) |
isfreeaccess_bool |
true |
container_title |
Jisuanji kexue |
authorswithroles_txt_mv |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei @@aut@@ |
publishDateDaySort_date |
2021-01-01T00:00:00Z |
hierarchy_top_id |
DOAJ078619254 |
id |
DOAJ077300378 |
language_de |
chinesisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">DOAJ077300378</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230309151305.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230228s2021 xx |||||o 00| ||chi c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.11896/jsjkx.200800207</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ077300378</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">chi</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.75-76.765</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">T1-995</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Multi-Shared Attention with Global and Local Pathways for Video Question Answering</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2021</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield 
code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Video question answering is a challenging task of significant importance toward visual understanding.However,current visual question answering (VQA) methods mainly focus on a single static image,which is distinct from the sequential visual data we faced in the real world.In addition,due to the diversity of textual questions,the VideoQA task has to deal with various visual features to obtain the answers.This paper presents a multi-shared attention network by utilizing local and global frame-level visualinformation for video question answering (VideoQA).Specifically,a two-pathway model is proposed to capture the global and local frame-level features with different frame rates.The two pathways are fused together with the multi-shared attention by sharing the same attention funtion.Extensive experiments are conducted on Tianchi VideoQA dataset to validate the effectiveness of the proposed method.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">video question answering|shared attention mechanism|global and local pathways</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Computer software</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Technology (General)</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Jisuanji kexue</subfield><subfield code="d">Editorial office of Computer Science, 2021</subfield><subfield code="g">48(2021), 8, Seite 145-149</subfield><subfield code="w">(DE-627)DOAJ078619254</subfield><subfield code="x">1002137X</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield 
code="g">volume:48</subfield><subfield code="g">year:2021</subfield><subfield code="g">number:8</subfield><subfield code="g">pages:145-149</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.11896/jsjkx.200800207</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/1002-137X</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">48</subfield><subfield code="j">2021</subfield><subfield code="e">8</subfield><subfield code="h">145-149</subfield></datafield></record></collection>
|
callnumber-first |
Q - Science |
author |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
spellingShingle |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
authorStr |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)DOAJ078619254 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut |
collection |
DOAJ |
remote_str |
true |
callnumber-label |
QA76 |
illustrated |
Not Illustrated |
issn |
1002137X |
topic_title |
QA76.75-76.765 T1-995 Multi-Shared Attention with Global and Local Pathways for Video Question Answering video question answering|shared attention mechanism|global and local pathways |
topic |
misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) |
topic_unstemmed |
misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) |
topic_browse |
misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Jisuanji kexue |
hierarchy_parent_id |
DOAJ078619254 |
hierarchy_top_title |
Jisuanji kexue |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)DOAJ078619254 |
title |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
ctrlnum |
(DE-627)DOAJ077300378 (DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05 |
title_full |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
author_sort |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
journal |
Jisuanji kexue |
journalStr |
Jisuanji kexue |
callnumber-first-code |
Q |
lang_code |
chi |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2021 |
contenttype_str_mv |
txt |
container_start_page |
145 |
author_browse |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
container_volume |
48 |
class |
QA76.75-76.765 T1-995 |
format_se |
Elektronische Aufsätze |
author-letter |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
doi_str_mv |
10.11896/jsjkx.200800207 |
title_sort |
multi-shared attention with global and local pathways for video question answering |
callnumber |
QA76.75-76.765 |
title_auth |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
abstract |
Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method.
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ |
container_issue |
8 |
title_short |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
url |
https://doi.org/10.11896/jsjkx.200800207 https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05 http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf https://doaj.org/toc/1002-137X |
remote_bool |
true |
ppnlink |
DOAJ078619254 |
callnumber-subject |
QA - Mathematics |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.11896/jsjkx.200800207 |
callnumber-a |
QA76.75-76.765 |
up_date |
2024-07-04T00:41:22.246Z |
_version_ |
1803607009916878848 |