Multi-Shared Attention with Global and Local Pathways for Video Question Answering
Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method.
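The abstract describes two pathways sampling frames at different rates, attended by one shared attention function and then fused. The following is a minimal illustrative sketch of that idea only, not the authors' implementation: it assumes random feature vectors, a single linear projection `W` standing in for the shared attention function, and concatenation as the fusion step; the paper's actual encoders, dimensions, and fusion operator are not specified here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def shared_attention(frames, question, W):
    """Attend over frame features using one shared projection W.

    frames:   (T, d) frame-level features of one pathway
    question: (d,)   encoded question vector
    W:        (d, d) the single attention projection shared by both pathways
    """
    scores = frames @ W @ question   # (T,) question-conditioned scores
    weights = softmax(scores)        # attention distribution over frames
    return weights @ frames          # (d,) attended visual summary

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))     # ONE attention function, shared

question = rng.standard_normal(d)
global_frames = rng.standard_normal((4, d))   # global pathway: low frame rate
local_frames = rng.standard_normal((16, d))   # local pathway: high frame rate

g = shared_attention(global_frames, question, W)
l = shared_attention(local_frames, question, W)
fused = np.concatenate([g, l])      # toy fusion of the two pathways
print(fused.shape)                  # (16,)
```

Because both calls reuse the same `W`, the two pathways are scored by the same attention function, which is the "multi-shared" aspect the abstract highlights.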
Detailed description
Author: |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei [verfasserIn] |
---|
Format: |
E-article |
---|---|
Language: |
Chinese |
Published: |
2021 |
---|
Keywords: |
video question answering|shared attention mechanism|global and local pathways |
---|
Published in: |
In: Jisuanji kexue - Editorial office of Computer Science, 2021, 48(2021), 8, pages 145-149 |
---|---|
Published in: |
volume:48 ; year:2021 ; number:8 ; pages:145-149 |
Links: |
---|
DOI / URN: |
10.11896/jsjkx.200800207 |
---|
Catalogue ID: |
DOAJ077300378 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | DOAJ077300378 | ||
003 | DE-627 | ||
005 | 20230309151305.0 | ||
007 | cr uuu---uuuuu | ||
008 | 230228s2021 xx |||||o 00| ||chi c | ||
024 | 7 | |a 10.11896/jsjkx.200800207 |2 doi | |
035 | |a (DE-627)DOAJ077300378 | ||
035 | |a (DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a chi | ||
050 | 0 | |a QA76.75-76.765 | |
050 | 0 | |a T1-995 | |
100 | 0 | |a WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |e verfasserin |4 aut | |
245 | 1 | 0 | |a Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
264 | 1 | |c 2021 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
520 | |a Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method. | ||
650 | 4 | |a video question answering|shared attention mechanism|global and local pathways | |
653 | 0 | |a Computer software | |
653 | 0 | |a Technology (General) | |
773 | 0 | 8 | |i In |t Jisuanji kexue |d Editorial office of Computer Science, 2021 |g 48(2021), 8, Seite 145-149 |w (DE-627)DOAJ078619254 |x 1002137X |7 nnns |
773 | 1 | 8 | |g volume:48 |g year:2021 |g number:8 |g pages:145-149 |
856 | 4 | 0 | |u https://doi.org/10.11896/jsjkx.200800207 |z kostenfrei |
856 | 4 | 0 | |u https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05 |z kostenfrei |
856 | 4 | 0 | |u http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf |z kostenfrei |
856 | 4 | 2 | |u https://doaj.org/toc/1002-137X |y Journal toc |z kostenfrei |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_DOAJ | ||
951 | |a AR | ||
952 | |d 48 |j 2021 |e 8 |h 145-149 |
author_variant |
l q h w y y s z z x l y w c l w lqhwyyszzxlywcl lqhwyyszzxlywclw |
---|---|
matchkey_str |
article:1002137X:2021----::utsaeatninihlblnlclahasov |
hierarchy_sort_str |
2021 |
callnumber-subject-code |
QA |
publishDate |
2021 |
allfields |
10.11896/jsjkx.200800207 doi (DE-627)DOAJ077300378 (DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05 DE-627 ger DE-627 rakwb chi QA76.75-76.765 T1-995 WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei verfasserin aut Multi-Shared Attention with Global and Local Pathways for Video Question Answering 2021 Text txt rdacontent Computermedien c rdamedia Online-Ressource cr rdacarrier Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method. video question answering|shared attention mechanism|global and local pathways Computer software Technology (General) In Jisuanji kexue Editorial office of Computer Science, 2021 48(2021), 8, Seite 145-149 (DE-627)DOAJ078619254 1002137X nnns volume:48 year:2021 number:8 pages:145-149 https://doi.org/10.11896/jsjkx.200800207 kostenfrei https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05 kostenfrei http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf kostenfrei https://doaj.org/toc/1002-137X Journal toc kostenfrei GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ AR 48 2021 8 145-149
language |
Chinese |
source |
In Jisuanji kexue 48(2021), 8, Seite 145-149 volume:48 year:2021 number:8 pages:145-149 |
sourceStr |
In Jisuanji kexue 48(2021), 8, Seite 145-149 volume:48 year:2021 number:8 pages:145-149 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
video question answering|shared attention mechanism|global and local pathways Computer software Technology (General) |
isfreeaccess_bool |
true |
container_title |
Jisuanji kexue |
authorswithroles_txt_mv |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei @@aut@@ |
publishDateDaySort_date |
2021-01-01T00:00:00Z |
hierarchy_top_id |
DOAJ078619254 |
id |
DOAJ077300378 |
language_de |
chinesisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">DOAJ077300378</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230309151305.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">230228s2021 xx |||||o 00| ||chi c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.11896/jsjkx.200800207</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)DOAJ077300378</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">chi</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.75-76.765</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">T1-995</subfield></datafield><datafield tag="100" ind1="0" ind2=" "><subfield code="a">WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Multi-Shared Attention with Global and Local Pathways for Video Question Answering</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2021</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">Computermedien</subfield><subfield code="b">c</subfield><subfield 
code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Online-Ressource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Video question answering is a challenging task of significant importance toward visual understanding.However,current visual question answering (VQA) methods mainly focus on a single static image,which is distinct from the sequential visual data we faced in the real world.In addition,due to the diversity of textual questions,the VideoQA task has to deal with various visual features to obtain the answers.This paper presents a multi-shared attention network by utilizing local and global frame-level visualinformation for video question answering (VideoQA).Specifically,a two-pathway model is proposed to capture the global and local frame-level features with different frame rates.The two pathways are fused together with the multi-shared attention by sharing the same attention funtion.Extensive experiments are conducted on Tianchi VideoQA dataset to validate the effectiveness of the proposed method.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">video question answering|shared attention mechanism|global and local pathways</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Computer software</subfield></datafield><datafield tag="653" ind1=" " ind2="0"><subfield code="a">Technology (General)</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">In</subfield><subfield code="t">Jisuanji kexue</subfield><subfield code="d">Editorial office of Computer Science, 2021</subfield><subfield code="g">48(2021), 8, Seite 145-149</subfield><subfield code="w">(DE-627)DOAJ078619254</subfield><subfield code="x">1002137X</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield 
code="g">volume:48</subfield><subfield code="g">year:2021</subfield><subfield code="g">number:8</subfield><subfield code="g">pages:145-149</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.11896/jsjkx.200800207</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">https://doaj.org/toc/1002-137X</subfield><subfield code="y">Journal toc</subfield><subfield code="z">kostenfrei</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_DOAJ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">48</subfield><subfield code="j">2021</subfield><subfield code="e">8</subfield><subfield code="h">145-149</subfield></datafield></record></collection>
|
callnumber-first |
Q - Science |
author |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
spellingShingle |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
authorStr |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)DOAJ078619254 |
format |
electronic Article |
delete_txt_mv |
keep |
author_role |
aut |
collection |
DOAJ |
remote_str |
true |
callnumber-label |
QA76 |
illustrated |
Not Illustrated |
issn |
1002137X |
topic_title |
QA76.75-76.765 T1-995 Multi-Shared Attention with Global and Local Pathways for Video Question Answering video question answering|shared attention mechanism|global and local pathways |
topic |
misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) |
topic_unstemmed |
misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) |
topic_browse |
misc QA76.75-76.765 misc T1-995 misc video question answering|shared attention mechanism|global and local pathways misc Computer software misc Technology (General) |
format_facet |
Elektronische Aufsätze Aufsätze Elektronische Ressource |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
cr |
hierarchy_parent_title |
Jisuanji kexue |
hierarchy_parent_id |
DOAJ078619254 |
hierarchy_top_title |
Jisuanji kexue |
isfreeaccess_txt |
true |
familylinks_str_mv |
(DE-627)DOAJ078619254 |
title |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
ctrlnum |
(DE-627)DOAJ077300378 (DE-599)DOAJ70e3f302dc724b29af8a7094be60fa05 |
title_full |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
author_sort |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
journal |
Jisuanji kexue |
journalStr |
Jisuanji kexue |
callnumber-first-code |
Q |
lang_code |
chi |
isOA_bool |
true |
recordtype |
marc |
publishDateSort |
2021 |
contenttype_str_mv |
txt |
container_start_page |
145 |
author_browse |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
container_volume |
48 |
class |
QA76.75-76.765 T1-995 |
format_se |
Elektronische Aufsätze |
author-letter |
WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei |
doi_str_mv |
10.11896/jsjkx.200800207 |
title_sort |
multi-shared attention with global and local pathways for video question answering |
callnumber |
QA76.75-76.765 |
title_auth |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
abstract |
Video question answering is a challenging task of significant importance for visual understanding. However, current visual question answering (VQA) methods mainly focus on a single static image, which is distinct from the sequential visual data we face in the real world. In addition, due to the diversity of textual questions, the VideoQA task has to deal with various visual features to obtain the answers. This paper presents a multi-shared attention network that utilizes local and global frame-level visual information for video question answering (VideoQA). Specifically, a two-pathway model is proposed to capture global and local frame-level features at different frame rates. The two pathways are fused by the multi-shared attention, which shares the same attention function. Extensive experiments are conducted on the Tianchi VideoQA dataset to validate the effectiveness of the proposed method.
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_DOAJ |
container_issue |
8 |
title_short |
Multi-Shared Attention with Global and Local Pathways for Video Question Answering |
url |
https://doi.org/10.11896/jsjkx.200800207 https://doaj.org/article/70e3f302dc724b29af8a7094be60fa05 http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-145.pdf https://doaj.org/toc/1002-137X |
remote_bool |
true |
ppnlink |
DOAJ078619254 |
callnumber-subject |
QA - Mathematics |
mediatype_str_mv |
c |
isOA_txt |
true |
hochschulschrift_bool |
false |
doi_str |
10.11896/jsjkx.200800207 |
callnumber-a |
QA76.75-76.765 |
up_date |
2024-07-04T00:41:22.246Z |
_version_ |
1803607009916878848 |