A nonparametric model for online topic discovery with word embeddings
With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting som...
Detailed description
Author: |
Chen, Junyang [author] |
---|
Format: |
E-article |
---|---|
Language: |
English |
Published: |
2019 |
---|
Extent: |
16 pages |
---|
Host item: |
Contained in: Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study - Petrruzziello, Carmelina ELSEVIER, 2013, an international journal, New York, NY |
---|---|
Host item: |
volume:504 ; year:2019 ; pages:32-47 ; extent:16 |
Links: |
---|
DOI / URN: |
10.1016/j.ins.2019.07.048 |
---|
Catalog ID: |
ELV047550503 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | ELV047550503 | ||
003 | DE-627 | ||
005 | 20230626015927.0 | ||
007 | cr uuu---uuuuu | ||
008 | 191021s2019 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1016/j.ins.2019.07.048 |2 doi | |
028 | 5 | 2 | |a GBV00000000000713.pica |
035 | |a (DE-627)ELV047550503 | ||
035 | |a (ELSEVIER)S0020-0255(19)30654-1 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 610 |q VZ |
082 | 0 | 4 | |a 570 |q VZ |
084 | |a BIODIV |q DE-30 |2 fid | ||
084 | |a 35.70 |2 bkl | ||
084 | |a 42.12 |2 bkl | ||
084 | |a 42.15 |2 bkl | ||
100 | 1 | |a Chen, Junyang |e verfasserin |4 aut | |
245 | 1 | 0 | |a A nonparametric model for online topic discovery with word embeddings |
264 | 1 | |c 2019transfer abstract | |
300 | |a 16 | ||
336 | |a nicht spezifiziert |b zzz |2 rdacontent | ||
337 | |a nicht spezifiziert |b z |2 rdamedia | ||
338 | |a nicht spezifiziert |b zu |2 rdacarrier | ||
520 | |a With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting some hyper-parameter/threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by using existing models often involve a wide coverage of the vocabulary which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the topic number and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analyses. NPMM can automatically decide whether a given document belongs to existing topics, measured by the squared Mahalanobis distance. Hence, the proposed model is free from tuning the hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can further be sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms the state-of-the-art algorithms. | ||
700 | 1 | |a Gong, Zhiguo |4 oth | |
700 | 1 | |a Liu, Weiwen |4 oth | |
773 | 0 | 8 | |i Enthalten in |n Elsevier Science Inc |a Petrruzziello, Carmelina ELSEVIER |t Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study |d 2013 |d an international journal |g New York, NY |w (DE-627)ELV011843691 |
773 | 1 | 8 | |g volume:504 |g year:2019 |g pages:32-47 |g extent:16 |
856 | 4 | 0 | |u https://doi.org/10.1016/j.ins.2019.07.048 |3 Volltext |
912 | |a GBV_USEFLAG_U | ||
912 | |a GBV_ELV | ||
912 | |a SYSFLAG_U | ||
912 | |a FID-BIODIV | ||
912 | |a SSG-OLC-PHA | ||
936 | b | k | |a 35.70 |j Biochemie: Allgemeines |q VZ |
936 | b | k | |a 42.12 |j Biophysik |q VZ |
936 | b | k | |a 42.15 |j Zellbiologie |q VZ |
951 | |a AR | ||
952 | |d 504 |j 2019 |h 32-47 |g 16 |
language |
English |
source |
Enthalten in Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study New York, NY volume:504 year:2019 pages:32-47 extent:16 |
format_phy_str_mv |
Article |
bklname |
Biochemie: Allgemeines Biophysik Zellbiologie |
institution |
findex.gbv.de |
dewey-raw |
610 |
isfreeaccess_bool |
false |
container_title |
Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study |
authorswithroles_txt_mv |
Chen, Junyang @@aut@@ Gong, Zhiguo @@oth@@ Liu, Weiwen @@oth@@ |
publishDateDaySort_date |
2019-01-01T00:00:00Z |
hierarchy_top_id |
ELV011843691 |
dewey-sort |
3610 |
id |
ELV047550503 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">ELV047550503</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230626015927.0</controlfield><controlfield tag="007">cr uuu---uuuuu</controlfield><controlfield tag="008">191021s2019 xx |||||o 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1016/j.ins.2019.07.048</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">GBV00000000000713.pica</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)ELV047550503</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(ELSEVIER)S0020-0255(19)30654-1</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">610</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">570</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">BIODIV</subfield><subfield code="q">DE-30</subfield><subfield code="2">fid</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">35.70</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">42.12</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">42.15</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Chen, Junyang</subfield><subfield 
code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">A nonparametric model for online topic discovery with word embeddings</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2019transfer abstract</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">16</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zzz</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">z</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">nicht spezifiziert</subfield><subfield code="b">zu</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting some hyper-parameter/threshold, which becomes barrier to achieve better topic discovery results. Moreover, topics generated by using existing models often involve a wide coverage of the vocabulary which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the topic number and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analyses. NPMM can automatically decide whether a given document belongs to existing topics, measured by the squared Mahalanobis distance. 
Hence, the proposed model is free from tuning the hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can further be sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms the state-of-the-art algorithms.</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Gong, Zhiguo</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Liu, Weiwen</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="n">Elsevier Science Inc</subfield><subfield code="a">Petrruzziello, Carmelina ELSEVIER</subfield><subfield code="t">Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study</subfield><subfield code="d">2013</subfield><subfield code="d">an international journal</subfield><subfield code="g">New York, NY</subfield><subfield code="w">(DE-627)ELV011843691</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:504</subfield><subfield code="g">year:2019</subfield><subfield code="g">pages:32-47</subfield><subfield code="g">extent:16</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://doi.org/10.1016/j.ins.2019.07.048</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ELV</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_U</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">FID-BIODIV</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-PHA</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">35.70</subfield><subfield code="j">Biochemie: Allgemeines</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="936" ind1="b" 
ind2="k"><subfield code="a">42.12</subfield><subfield code="j">Biophysik</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">42.15</subfield><subfield code="j">Zellbiologie</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">504</subfield><subfield code="j">2019</subfield><subfield code="h">32-47</subfield><subfield code="g">16</subfield></datafield></record></collection>
|
author |
Chen, Junyang |
spellingShingle |
Chen, Junyang ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15 A nonparametric model for online topic discovery with word embeddings |
authorStr |
Chen, Junyang |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)ELV011843691 |
format |
electronic Article |
dewey-ones |
610 - Medicine & health 570 - Life sciences; biology |
delete_txt_mv |
keep |
author_role |
aut |
collection |
elsevier |
remote_str |
true |
illustrated |
Not Illustrated |
topic_title |
610 VZ 570 VZ BIODIV DE-30 fid 35.70 bkl 42.12 bkl 42.15 bkl A nonparametric model for online topic discovery with word embeddings |
topic |
ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15 |
topic_unstemmed |
ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15 |
topic_browse |
ddc 610 ddc 570 fid BIODIV bkl 35.70 bkl 42.12 bkl 42.15 |
format_facet |
Electronic articles Articles Electronic resource |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
zu |
author2_variant |
z g zg w l wl |
hierarchy_parent_title |
Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study |
hierarchy_parent_id |
ELV011843691 |
dewey-tens |
610 - Medicine & health 570 - Life sciences; biology |
hierarchy_top_title |
Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)ELV011843691 |
title |
A nonparametric model for online topic discovery with word embeddings |
ctrlnum |
(DE-627)ELV047550503 (ELSEVIER)S0020-0255(19)30654-1 |
title_full |
A nonparametric model for online topic discovery with word embeddings |
author_sort |
Chen, Junyang |
journal |
Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study |
journalStr |
Mo1264 Clinical Characteristics of Inflammatory Bowel Disease May Influence the Cancer Risk When Using Immunomodulators: Incident Cases of Cancer in a Multicenter Case-Control Study |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
600 - Technology 500 - Science |
recordtype |
marc |
publishDateSort |
2019 |
contenttype_str_mv |
zzz |
container_start_page |
32 |
author_browse |
Chen, Junyang |
container_volume |
504 |
physical |
16 |
class |
610 VZ 570 VZ BIODIV DE-30 fid 35.70 bkl 42.12 bkl 42.15 bkl |
format_se |
Electronic articles |
author-letter |
Chen, Junyang |
doi_str_mv |
10.1016/j.ins.2019.07.048 |
dewey-full |
610 570 |
title_sort |
a nonparametric model for online topic discovery with word embeddings |
title_auth |
A nonparametric model for online topic discovery with word embeddings |
abstract |
With the explosive growth of short documents generated from streaming textual sources (e.g., Twitter), latent topic discovery has become a critical task for short text stream clustering. However, most online clustering models determine the probability of producing a new topic by manually setting some hyper-parameter/threshold, which becomes a barrier to achieving better topic discovery results. Moreover, topics generated by using existing models often involve a wide coverage of the vocabulary, which is not suitable for online social media analysis. Therefore, we propose a nonparametric model (NPMM) which exploits auxiliary word embeddings to infer the topic number and employs a “spike and slab” function to alleviate the sparsity problem of topic-word distributions in online short text analyses. NPMM can automatically decide whether a given document belongs to existing topics, measured by the squared Mahalanobis distance. Hence, the proposed model is free from tuning the hyper-parameter to obtain the probability of generating new topics. Additionally, we propose a nonparametric sampling strategy to discover representative terms for each topic. To perform inference, we introduce a one-pass Gibbs sampling algorithm based on Cholesky decomposition of covariance matrices, which can further be sped up using a Metropolis-Hastings step. Our experiments demonstrate that NPMM significantly outperforms the state-of-the-art algorithms. |
collection_details |
GBV_USEFLAG_U GBV_ELV SYSFLAG_U FID-BIODIV SSG-OLC-PHA |
title_short |
A nonparametric model for online topic discovery with word embeddings |
url |
https://doi.org/10.1016/j.ins.2019.07.048 |
remote_bool |
true |
author2 |
Gong, Zhiguo Liu, Weiwen |
author2Str |
Gong, Zhiguo Liu, Weiwen |
ppnlink |
ELV011843691 |
mediatype_str_mv |
z |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth oth |
doi_str |
10.1016/j.ins.2019.07.048 |
up_date |
2024-07-06T23:11:41.115Z |
_version_ |
1803873158291259392 |
score |
7.399596 |