Sign language translation with hierarchical memorized context in question answering scenarios
Detailed Description
Abstract Vision-based sign language translation (SLT) aims to translate sign language videos into understandable natural language sentences. Current SLT methods neglect the contextual information available in specific dialogue scenarios, which may lead to incorrect translations that do not match the dialogue content. Accordingly, this work proposes a novel framework for SLT in question answering scenarios, called SLQA, which learns contextual knowledge from multimodal QA pairs between hearing and deaf interlocutors to improve the reasoning capability of SLT models. The SLQA framework comprises two main components: one integrates local context under the guidance of semantic relevance within a QA pair, and the other mines hierarchical memorized context from a three-layer memory hierarchy, i.e., scenario, dialogue, and cue memory, by exploiting the logical dependencies between QA pairs. To facilitate SLQA research, we further contribute the SLQA dataset, which contains abundant natural language and sign language QA pairs. Extensive experimental results and analysis of our method are reported on SLQA and four public benchmark datasets. With the proposed SLQA framework, we obtain a substantial improvement over previous state-of-the-art SLT methods, gaining about 13.2 BLEU-4 points on the SLQA test set, which demonstrates the effectiveness of our method.
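The abstract does not specify how the three memory levels are realized. As a rough illustration only (not the authors' actual method), the sketch below models each level of a scenario/dialogue/cue hierarchy as a bank of embedding vectors read out by dot-product attention, with the three read-outs concatenated into a single context vector. All names (HierarchicalMemory, write, read) and the attention scheme are hypothetical assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

class HierarchicalMemory:
    """Hypothetical three-level memory (scenario, dialogue, cue).

    Each level stores embedding vectors of past QA context; a query
    (e.g., the current question embedding) attends over each level
    separately, and the three weighted sums are concatenated.
    """
    def __init__(self, dim):
        self.dim = dim
        self.levels = {"scenario": [], "dialogue": [], "cue": []}

    def write(self, level, vec):
        # Append one context embedding to the named level.
        self.levels[level].append(np.asarray(vec, dtype=np.float32))

    def read(self, query):
        # Attention read-out per level; empty levels contribute zeros.
        query = np.asarray(query, dtype=np.float32)
        readouts = []
        for name in ("scenario", "dialogue", "cue"):
            mem = self.levels[name]
            if not mem:
                readouts.append(np.zeros(self.dim, dtype=np.float32))
                continue
            M = np.stack(mem)                          # (n, dim)
            attn = softmax(M @ query / np.sqrt(self.dim))
            readouts.append(attn @ M)                  # (dim,)
        return np.concatenate(readouts)                # (3 * dim,)

# Usage: write one embedding per level, then query with the current
# question embedding to obtain a fused memorized-context vector.
rng = np.random.default_rng(0)
mem = HierarchicalMemory(dim=8)
mem.write("scenario", rng.normal(size=8))
mem.write("dialogue", rng.normal(size=8))
mem.write("cue", rng.normal(size=8))
context = mem.read(rng.normal(size=8))
print(context.shape)  # (24,)
```

In a full SLT model, such a read-out would presumably be fused with the visual features of the sign video before decoding; that integration step is beyond what the abstract describes.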