A word embedding-based approach to cross-lingual topic modeling
Abstract: Cross-lingual topic analysis aims at extracting latent topics from corpora in different languages. Early approaches rely on high-cost multilingual resources (e.g., a parallel corpus), which are hard to come by in many real cases. Some works require only a translation dictionary as the linkage between languages; however, given an inappropriate dictionary (e.g., one with small coverage), the cross-lingual topic model shrinks to a monolingual topic model and generates less diversified topics. It is therefore imperative to investigate cross-lingual topic models that require fewer bilingual resources. Recently, space-mapping techniques have been proposed to align the word embeddings of different languages into a quality cross-lingual word embedding by referring to a small number of translation pairs. This work proposes a cross-lingual topic model, called Cb-CLTM, which incorporates cross-lingual word embeddings. To leverage the word semantics and the linkage between languages captured by the cross-lingual word embedding, Cb-CLTM treats each word as a continuous embedding vector rather than a discrete word type. The experiments demonstrate that, when the cross-lingual word space exhibits strong isomorphism, Cb-CLTM generates more coherent topics with higher diversity and induces better representations of documents across languages for downstream tasks such as cross-lingual document clustering and classification. When the cross-lingual word space is less isomorphic, Cb-CLTM generates less coherent topics yet still prevails in topic diversity and document classification.
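The space-mapping step the abstract refers to can be illustrated with a standard technique of that family: orthogonal Procrustes alignment, which learns a rotation between two monolingual embedding spaces from a small seed dictionary of translation pairs. The sketch below is illustrative only and is not code from the paper; the function and variable names (procrustes_align, src_vecs, tgt_vecs) and the toy data are assumptions, and Cb-CLTM's own model details are not reproduced here.

```python
import numpy as np

def procrustes_align(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """Return an orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F.

    Row i of src_vecs / tgt_vecs holds the embeddings of the i-th seed
    translation pair (source word, target word).
    """
    # The optimum of the orthogonal Procrustes problem is W = U @ Vt,
    # where U, S, Vt = svd(src^T @ tgt).
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

# Toy demo: recover a hidden rotation from 50 seed pairs in 10 dimensions.
rng = np.random.default_rng(0)
dim, n_pairs = 10, 50
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src = rng.normal(size=(n_pairs, dim))
tgt = src @ hidden_rotation
W = procrustes_align(src, tgt)
assert np.allclose(src @ W, tgt, atol=1e-6)

# In the aligned space, a source-language word vector can be compared
# directly with target-language vectors, e.g. by cosine similarity:
v = rng.normal(size=dim)
mapped = v @ W
cos = (mapped @ tgt[0]) / (np.linalg.norm(mapped) * np.linalg.norm(tgt[0]))
```

Once the two spaces share a coordinate system like this, a word from either language is simply a point in one vector space, which is what allows a model such as Cb-CLTM to treat every word as a continuous observation rather than a language-specific discrete type.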
Author(s): Chang, Chia-Hsuan [author]; Hwang, San-Yih [author]
Format: Article
Language: English
Published: 2021
Subjects: Cross-language; Cross-lingual topic model; Cross-lingual word embedding
Note: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021
Contained in: Knowledge and information systems - Springer London, 2000, 63(2021), no. 6, 24 Apr., pages 1529-1555
Contained in: volume:63 ; year:2021 ; number:6 ; day:24 ; month:04 ; pages:1529-1555
Links: Full text (license required): https://doi.org/10.1007/s10115-021-01555-7
DOI: 10.1007/s10115-021-01555-7
Catalog ID: OLC2125661241
LEADER  01000naa a22002652 4500
001     OLC2125661241
003     DE-627
005     20230505104325.0
007     tu
008     230505s2021 xx ||||| 00| ||eng c
024 7   |a 10.1007/s10115-021-01555-7 |2 doi
035     |a (DE-627)OLC2125661241
035     |a (DE-He213)s10115-021-01555-7-p
040     |a DE-627 |b ger |c DE-627 |e rakwb
041     |a eng
082 04  |a 004 |q VZ
082 04  |a 004 |q VZ
084     |a 06.74$jInformationssysteme |2 bkl
084     |a 54.64$jDatenbanken |2 bkl
100 1   |a Chang, Chia-Hsuan |e verfasserin |0 (orcid)0000-0001-9116-8244 |4 aut
245 10  |a A word embedding-based approach to cross-lingual topic modeling
264  1  |c 2021
336     |a Text |b txt |2 rdacontent
337     |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338     |a Band |b nc |2 rdacarrier
500     |a © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021
520     |a Cross-lingual topic analysis aims at extracting latent topics from corpora in different languages. Early approaches rely on high-cost multilingual resources (e.g., a parallel corpus), which are hard to come by in many real cases. Some works require only a translation dictionary as the linkage between languages; however, given an inappropriate dictionary (e.g., one with small coverage), the cross-lingual topic model shrinks to a monolingual topic model and generates less diversified topics. It is therefore imperative to investigate cross-lingual topic models that require fewer bilingual resources. Recently, space-mapping techniques have been proposed to align the word embeddings of different languages into a quality cross-lingual word embedding by referring to a small number of translation pairs. This work proposes a cross-lingual topic model, called Cb-CLTM, which incorporates cross-lingual word embeddings. To leverage the word semantics and the linkage between languages captured by the cross-lingual word embedding, Cb-CLTM treats each word as a continuous embedding vector rather than a discrete word type. The experiments demonstrate that, when the cross-lingual word space exhibits strong isomorphism, Cb-CLTM generates more coherent topics with higher diversity and induces better representations of documents across languages for downstream tasks such as cross-lingual document clustering and classification. When the cross-lingual word space is less isomorphic, Cb-CLTM generates less coherent topics yet still prevails in topic diversity and document classification.
650  4  |a Cross-language
650  4  |a Cross-lingual topic model
650  4  |a Cross-lingual word embedding
700 1   |a Hwang, San-Yih |4 aut
773 08  |i Enthalten in |t Knowledge and information systems |d Springer London, 2000 |g 63(2021), 6 vom: 24. Apr., Seite 1529-1555 |w (DE-627)323971725 |w (DE-600)2036569-X |w (DE-576)9323971723 |x 0219-1377 |7 nnns
773 18  |g volume:63 |g year:2021 |g number:6 |g day:24 |g month:04 |g pages:1529-1555
856 41  |u https://doi.org/10.1007/s10115-021-01555-7 |z lizenzpflichtig |3 Volltext
912     |a GBV_USEFLAG_A
912     |a SYSFLAG_A
912     |a GBV_OLC
912     |a SSG-OLC-MAT
912     |a SSG-OLC-BUB
936 bk  |a 06.74$jInformationssysteme |q VZ |0 106415212 |0 (DE-625)106415212
936 bk  |a 54.64$jDatenbanken |q VZ |0 106410865 |0 (DE-625)106410865
951     |a AR
952     |d 63 |j 2021 |e 6 |b 24 |c 04 |h 1529-1555