Neural topic-enhanced cross-lingual word embeddings for CLIR
Detailed Description
Cross-lingual information retrieval (CLIR) methods have quickly made the transition from translation-based approaches to semantic approaches. In this paper, we examine the limitations of current unsupervised neural CLIR methods, especially those leveraging aligned cross-lingual word embedding (CLWE) spaces. At present, CLWEs are typically induced from the monolingual corpora of the two languages through an iterative induction process, and homonymy and polysemy are major obstacles in this process. At the same time, contextual text representation methods often fail to significantly outperform static CLWE methods for CLIR. We propose a method that uses a novel neural generative model built on Wasserstein autoencoders to learn neural topic-enhanced CLWEs for CLIR, requiring minimal or no supervision. On the CLEF test collections, we perform a comparative evaluation of state-of-the-art semantic CLWE methods alongside our proposed method on neural CLIR tasks. We demonstrate that our method outperforms existing CLWE methods as well as multilingual contextual text encoders, and that it obtains significant improvements over CLWE methods based on representative topical embeddings.
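The abstract does not spell out the architecture, but a minimal sketch of a Wasserstein-autoencoder topic model of the general kind it names might look as follows. This is an illustrative assumption, not the authors' implementation: the class name WAETopicModel, the choice of an MMD penalty with an information-diffusion kernel, the Dirichlet prior parameter, and all hyperparameters are hypothetical. The encoder maps a bag-of-words vector to a document-topic distribution, the decoder reconstructs the word distribution, and the MMD term pushes the encoded topic distributions toward a Dirichlet prior, as in WAE-style training.

```python
# Minimal sketch of a Wasserstein-autoencoder topic model (illustrative only;
# the paper's actual model, objective, and hyperparameters may differ).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WAETopicModel(nn.Module):
    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.ReLU(),
            nn.Linear(hidden, num_topics),
        )
        # Topic-word matrix: each row of its softmax is one topic's word distribution.
        self.decoder = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, bow):
        theta = F.softmax(self.encoder(bow), dim=-1)  # document-topic mixture
        logits = self.decoder(theta)                  # reconstructed word logits
        return theta, logits

def mmd_penalty(theta, alpha=0.1):
    """MMD between encoded topic vectors and samples from a Dirichlet prior,
    using an information-diffusion kernel (a common choice on the simplex)."""
    prior = torch.distributions.Dirichlet(
        torch.full_like(theta[0], alpha)).sample((theta.size(0),))
    def kernel(a, b):
        # k(a, b) = exp(-arccos^2(<sqrt(a), sqrt(b)>)); clamp for numerical safety
        dot = (a.sqrt() @ b.sqrt().t()).clamp(0.0, 1.0 - 1e-6)
        return torch.exp(-torch.acos(dot).pow(2))
    return (kernel(theta, theta).mean() + kernel(prior, prior).mean()
            - 2.0 * kernel(theta, prior).mean())

def training_step(model, bow, lam=10.0):
    theta, logits = model(bow)
    # Reconstruction: cross-entropy of observed word counts under the decoder.
    recon = -(bow * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    return recon + lam * mmd_penalty(theta)

# Usage: one optimization step on a random bag-of-words batch.
model = WAETopicModel(vocab_size=2000, num_topics=50)
bow = torch.randint(0, 3, (8, 2000)).float()
loss = training_step(model, bow)
loss.backward()
```

Note that this sketch covers only the monolingual topic-modeling component; how the learned topic space is tied across languages to enhance the CLWEs is specific to the paper and not reconstructed here.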