A word embedding-based approach to cross-lingual topic modeling
Abstract: Cross-lingual topic analysis aims at extracting latent topics from corpora in different languages. Early approaches rely on high-cost multilingual resources (e.g., a parallel corpus), which are hard to come by in many real cases. Some works require only a translation dictionary as the linkage between languages; however, given an inappropriate dictionary (e.g., one with small coverage), the cross-lingual topic model shrinks to a monolingual topic model and generates less diversified topics. It is therefore imperative to investigate cross-lingual topic models that require fewer bilingual resources. Recently, space-mapping techniques have been proposed to align the word embeddings of different languages into a quality cross-lingual word embedding by referring to a small number of translation pairs. This work proposes a cross-lingual topic model, called Cb-CLTM, which incorporates cross-lingual word embeddings. To leverage the word semantics and the linkage between languages captured by the cross-lingual word embedding, Cb-CLTM treats each word as a continuous embedding vector rather than a discrete word type. The experiments demonstrate that, when the cross-lingual word space exhibits strong isomorphism, Cb-CLTM generates more coherent topics with higher diversity and induces better representations of documents across languages for downstream tasks such as cross-lingual document clustering and classification. When the cross-lingual word space is less isomorphic, Cb-CLTM generates less coherent topics yet still prevails in topic diversity and document classification.
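The space-mapping step the abstract refers to can be illustrated with a standard technique of that family: orthogonal Procrustes alignment, which learns a rotation between two monolingual embedding spaces from a small seed dictionary of translation pairs. The sketch below is illustrative only and is not code from the paper; the function and variable names (procrustes_align, src_vecs, tgt_vecs) and the toy data are assumptions, and Cb-CLTM's own model details are not reproduced here.

```python
import numpy as np

def procrustes_align(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """Return an orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F.

    Row i of src_vecs / tgt_vecs holds the embeddings of the i-th seed
    translation pair (source word, target word).
    """
    # The optimum of the orthogonal Procrustes problem is W = U @ Vt,
    # where U, S, Vt = svd(src^T @ tgt).
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

# Toy demo: recover a hidden rotation from 50 seed pairs in 10 dimensions.
rng = np.random.default_rng(0)
dim, n_pairs = 10, 50
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src = rng.normal(size=(n_pairs, dim))
tgt = src @ hidden_rotation
W = procrustes_align(src, tgt)
assert np.allclose(src @ W, tgt, atol=1e-6)

# In the aligned space, a source-language word vector can be compared
# directly with target-language vectors, e.g. by cosine similarity:
v = rng.normal(size=dim)
mapped = v @ W
cos = (mapped @ tgt[0]) / (np.linalg.norm(mapped) * np.linalg.norm(tgt[0]))
```

Once the two spaces share a coordinate system like this, a word from either language is simply a point in one vector space, which is what allows a model such as Cb-CLTM to treat every word as a continuous observation rather than a language-specific discrete type.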
Author(s): Chang, Chia-Hsuan [author]; Hwang, San-Yih [author]
Format: Article
Language: English
Published: 2021
Subjects: Cross-language; Cross-lingual topic model; Cross-lingual word embedding
Note: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021
Contained in: Knowledge and information systems - Springer London, 2000, 63(2021), no. 6, 24 Apr., pages 1529-1555
Contained in: volume:63 ; year:2021 ; number:6 ; day:24 ; month:04 ; pages:1529-1555
Links: Full text (license required): https://doi.org/10.1007/s10115-021-01555-7
DOI: 10.1007/s10115-021-01555-7
Catalog ID: OLC2125661241
LEADER  01000naa a22002652 4500
001     OLC2125661241
003     DE-627
005     20230505104325.0
007     tu
008     230505s2021 xx ||||| 00| ||eng c
024 7   |a 10.1007/s10115-021-01555-7 |2 doi
035     |a (DE-627)OLC2125661241
035     |a (DE-He213)s10115-021-01555-7-p
040     |a DE-627 |b ger |c DE-627 |e rakwb
041     |a eng
082 04  |a 004 |q VZ
082 04  |a 004 |q VZ
084     |a 06.74$jInformationssysteme |2 bkl
084     |a 54.64$jDatenbanken |2 bkl
100 1   |a Chang, Chia-Hsuan |e verfasserin |0 (orcid)0000-0001-9116-8244 |4 aut
245 10  |a A word embedding-based approach to cross-lingual topic modeling
264  1  |c 2021
336     |a Text |b txt |2 rdacontent
337     |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338     |a Band |b nc |2 rdacarrier
500     |a © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021
520     |a Cross-lingual topic analysis aims at extracting latent topics from corpora in different languages. Early approaches rely on high-cost multilingual resources (e.g., a parallel corpus), which are hard to come by in many real cases. Some works require only a translation dictionary as the linkage between languages; however, given an inappropriate dictionary (e.g., one with small coverage), the cross-lingual topic model shrinks to a monolingual topic model and generates less diversified topics. It is therefore imperative to investigate cross-lingual topic models that require fewer bilingual resources. Recently, space-mapping techniques have been proposed to align the word embeddings of different languages into a quality cross-lingual word embedding by referring to a small number of translation pairs. This work proposes a cross-lingual topic model, called Cb-CLTM, which incorporates cross-lingual word embeddings. To leverage the word semantics and the linkage between languages captured by the cross-lingual word embedding, Cb-CLTM treats each word as a continuous embedding vector rather than a discrete word type. The experiments demonstrate that, when the cross-lingual word space exhibits strong isomorphism, Cb-CLTM generates more coherent topics with higher diversity and induces better representations of documents across languages for downstream tasks such as cross-lingual document clustering and classification. When the cross-lingual word space is less isomorphic, Cb-CLTM generates less coherent topics yet still prevails in topic diversity and document classification.
650  4  |a Cross-language
650  4  |a Cross-lingual topic model
650  4  |a Cross-lingual word embedding
700 1   |a Hwang, San-Yih |4 aut
773 08  |i Enthalten in |t Knowledge and information systems |d Springer London, 2000 |g 63(2021), 6 vom: 24. Apr., Seite 1529-1555 |w (DE-627)323971725 |w (DE-600)2036569-X |w (DE-576)9323971723 |x 0219-1377 |7 nnns
773 18  |g volume:63 |g year:2021 |g number:6 |g day:24 |g month:04 |g pages:1529-1555
856 41  |u https://doi.org/10.1007/s10115-021-01555-7 |z lizenzpflichtig |3 Volltext
912     |a GBV_USEFLAG_A
912     |a SYSFLAG_A
912     |a GBV_OLC
912     |a SSG-OLC-MAT
912     |a SSG-OLC-BUB
936 bk  |a 06.74$jInformationssysteme |q VZ |0 106415212 |0 (DE-625)106415212
936 bk  |a 54.64$jDatenbanken |q VZ |0 106410865 |0 (DE-625)106410865
951     |a AR
952     |d 63 |j 2021 |e 6 |b 24 |c 04 |h 1529-1555