A nonparametric model for online topic discovery with word embeddings

With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting som...

Author:	Chen, Junyang [author]; Gong, Zhiguo; Liu, Weiwen

Format:	E-article
Language:	English

Published:	2019 (transfer abstract)

Extent:	16

Host item:	Contained in: Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study - Petrruzziello, Carmelina ELSEVIER, 2013, an international journal, New York, NY
Host item:	volume:504 ; year:2019 ; pages:32-47 ; extent:16

Links:	Full text

DOI / URN:	10.1016/j.ins.2019.07.048

Catalog ID:	ELV047550503

Internal format


LEADER	01000caa a22002652 4500
001	ELV047550503
003	DE-627
005	20230626015927.0
007	cr uuu---uuuuu
008	191021s2019 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1016/j.ins.2019.07.048 \|2 doi
028	5	2	\|a GBV00000000000713.pica
035			\|a (DE-627)ELV047550503
035			\|a (ELSEVIER)S0020-0255(19)30654-1
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0	4	\|a 610 \|q VZ
082	0	4	\|a 570 \|q VZ
084			\|a BIODIV \|q DE-30 \|2 fid
084			\|a 35.70 \|2 bkl
084			\|a 42.12 \|2 bkl
084			\|a 42.15 \|2 bkl
100	1		\|a Chen, Junyang \|e verfasserin \|4 aut
245	1	0	\|a A nonparametric model for online topic discovery with word embeddings
264		1	\|c 2019transfer abstract
300			\|a 16
336			\|a nicht spezifiziert \|b zzz \|2 rdacontent
337			\|a nicht spezifiziert \|b z \|2 rdamedia
338			\|a nicht spezifiziert \|b zu \|2 rdacarrier
520			\|a With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting a hyper-parameter or threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by existing models often cover a wide portion of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) that exploits auxiliary word embeddings to infer the number of topics and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analysis. NPMM can automatically decide whether a given document belongs to an existing topic, as measured by the squared Mahalanobis distance. Hence, the proposed model does not require tuning a hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on the Cholesky decomposition of covariance matrices, which can be sped up further using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms state-of-the-art algorithms.
700	1		\|a Gong, Zhiguo \|4 oth
700	1		\|a Liu, Weiwen \|4 oth
773	0	8	\|i Enthalten in \|n Elsevier Science Inc \|a Petrruzziello, Carmelina ELSEVIER \|t Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study \|d 2013 \|d an international journal \|g New York, NY \|w (DE-627)ELV011843691
773	1	8	\|g volume:504 \|g year:2019 \|g pages:32-47 \|g extent:16
856	4	0	\|u https://doi.org/10.1016/j.ins.2019.07.048 \|3 Volltext
912			\|a GBV_USEFLAG_U
912			\|a GBV_ELV
912			\|a SYSFLAG_U
912			\|a FID-BIODIV
912			\|a SSG-OLC-PHA
936	b	k	\|a 35.70 \|j Biochemie: Allgemeines \|q VZ
936	b	k	\|a 42.12 \|j Biophysik \|q VZ
936	b	k	\|a 42.15 \|j Zellbiologie \|q VZ
951			\|a AR
952			\|d 504 \|j 2019 \|h 32-47 \|g 16

Index fields

author_variant	j c jc
matchkey_str	chenjunyanggongzhiguoliuweiwen:2019----:nnaaercoefrnieoidsoeyi
hierarchy_sort_str	2019transfer abstract
bklnumber	35.70 42.12 42.15
publishDate	2019
language	English
source	Enthalten in Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study New York, NY volume:504 year:2019 pages:32-47 extent:16
sourceStr	Enthalten in Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study New York, NY volume:504 year:2019 pages:32-47 extent:16
format_phy_str_mv	Article
bklname	Biochemie: Allgemeines Biophysik Zellbiologie
institution	findex.gbv.de
dewey-raw	610
isfreeaccess_bool	false
container_title	Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study
authorswithroles_txt_mv	Chen, Junyang @@aut@@ Gong, Zhiguo @@oth@@ Liu, Weiwen @@oth@@
publishDateDaySort_date	2019-01-01T00:00:00Z
hierarchy_top_id	ELV011843691
dewey-sort	3610
id	ELV047550503
language_de	englisch
author	Chen, Junyang
spellingShingle	Chen, Junyang ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15 A nonparametric model for online topic discovery with word embeddings
authorStr	Chen, Junyang
ppnlink_with_tag_str_mv	@@773@@(DE-627)ELV011843691
format	electronic Article
dewey-ones	610 - Medicine & health 570 - Life sciences; biology
delete_txt_mv	keep
author_role	aut
collection	elsevier
remote_str	true
illustrated	Not Illustrated
topic_title	610 VZ 570 VZ BIODIV DE-30 fid 35.70 bkl 42.12 bkl 42.15 bkl A nonparametric model for online topic discovery with word embeddings
topic	ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15
topic_unstemmed	ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15
topic_browse	ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15
format_facet	Electronic articles Articles Electronic resource
format_main_str_mv	Text Journal/Article
carriertype_str_mv	zu
author2_variant	z g zg w l wl
hierarchy_parent_title	Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study
hierarchy_parent_id	ELV011843691
dewey-tens	610 - Medicine & health 570 - Life sciences; biology
hierarchy_top_title	Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)ELV011843691
title	A nonparametric model for online topic discovery with word embeddings
ctrlnum	(DE-627)ELV047550503 (ELSEVIER)S0020-0255(19)30654-1
title_full	A nonparametric model for online topic discovery with word embeddings
author_sort	Chen, Junyang
journal	Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study
journalStr	Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study
lang_code	eng
isOA_bool	false
dewey-hundreds	600 - Technology 500 - Science
recordtype	marc
publishDateSort	2019
contenttype_str_mv	zzz
container_start_page	32
author_browse	Chen, Junyang
container_volume	504
physical	16
class	610 VZ 570 VZ BIODIV DE-30 fid 35.70 bkl 42.12 bkl 42.15 bkl
format_se	Electronic articles
author-letter	Chen, Junyang
doi_str_mv	10.1016/j.ins.2019.07.048
dewey-full	610 570
title_sort	a nonparametric model for online topic discovery with word embeddings
title_auth	A nonparametric model for online topic discovery with word embeddings
abstract	With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting some hyper-parameter/threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by using existing models often involve a wide coverage of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the topic number and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analyses. NPMM can automatically decide whether a given document belongs to existing topics, measured by the squared Mahalanobis distance. Hence, the proposed model is free from tuning the hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can further be sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms the state-of-the-art algorithms.
collection_details	GBV_USEFLAG_U GBV_ELV SYSFLAG_U FID-BIODIV SSG-OLC-PHA
title_short	A nonparametric model for online topic discovery with word embeddings
url	https://doi.org/10.1016/j.ins.2019.07.048
remote_bool	true
author2	Gong, Zhiguo Liu, Weiwen
author2Str	Gong, Zhiguo Liu, Weiwen
ppnlink	ELV011843691
mediatype_str_mv	z
isOA_txt	false
hochschulschrift_bool	false
author2_role	oth oth
doi_str	10.1016/j.ins.2019.07.048
up_date	2024-07-06T23:11:41.115Z
_version_	1803873158291259392
score	7.399596
