Hybrid data labeling algorithm for clustering large mixed type data
Abstract Due to enormous growth in both volume and variety of data, clustering a very large database is a time-consuming process. To speed up clustering process, sampling has been recognized as a very utilitarian approach to reduce the dataset size in which a collection of data points are taken as a...
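The workflow the abstract describes (cluster a sample, then assign each unsampled point to the most similar cluster) can be illustrated with a short Python sketch. This record does not give the paper's actual hybrid similarity coefficient, so the formula below is only an assumption for illustration: categorical "importance" is approximated by the relative frequency of a value inside a cluster, and numerical closeness by a range-normalized distance to the cluster mean. The function names `summarize`, `similarity`, and `label_points`, the `ranges` parameter, and the toy data are all hypothetical and are not taken from HDLA.

```python
from collections import Counter
from statistics import mean


def summarize(cluster, num_idx, cat_idx):
    """Summarize a labeled cluster: numeric means and categorical value frequencies."""
    n = len(cluster)
    means = {j: mean(row[j] for row in cluster) for j in num_idx}
    freqs = {
        j: {v: c / n for v, c in Counter(row[j] for row in cluster).items()}
        for j in cat_idx
    }
    return means, freqs


def similarity(point, summary, num_idx, cat_idx, ranges):
    """Illustrative mixed-type similarity in [0, 1] (an assumption, not the paper's formula)."""
    means, freqs = summary
    # Categorical part: how common the point's value is inside the cluster.
    cat_part = sum(freqs[j].get(point[j], 0.0) for j in cat_idx)
    # Numerical part: 1 minus the distance to the cluster mean, scaled by the
    # attribute range (assumed positive) and capped at 1.
    num_part = sum(
        1.0 - min(abs(point[j] - means[j]) / ranges[j], 1.0) for j in num_idx
    )
    return (cat_part + num_part) / (len(cat_idx) + len(num_idx))


def label_points(unlabeled, clusters, num_idx, cat_idx, ranges):
    """Give each unlabeled point the label of its most similar cluster."""
    summaries = {name: summarize(rows, num_idx, cat_idx) for name, rows in clusters.items()}
    return [
        max(summaries, key=lambda name: similarity(p, summaries[name], num_idx, cat_idx, ranges))
        for p in unlabeled
    ]


# Toy data: attribute 0 is numerical, attribute 1 is categorical.
clusters = {
    "A": [(1.0, "red"), (2.0, "red"), (1.5, "blue")],
    "B": [(9.0, "green"), (10.0, "green")],
}
ranges = {0: 10.0}  # assumed value range of the numerical attribute
unlabeled = [(1.2, "red"), (9.5, "green")]

labels = label_points(unlabeled, clusters, [0], [1], ranges)
print(labels)
```

In this sketch each cluster is summarized once, so labeling n unlabeled points against k clusters with m attributes takes on the order of n·k·m operations. That cost estimate applies to the sketch only, not to the complexity analysis reported in the paper.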
Detailed description
Author: |
Sangam, Ravi Sankar [author] |
---|
Format: |
Article |
---|---|
Language: |
English |
Published: |
2014 |
---|
Keywords: |
Clustering; Data mining; Data labeling; Mixed type data |
---|
Note: |
© Springer Science+Business Media New York 2014 |
---|
Parent work: |
Contained in: Journal of intelligent information systems - Springer US, 1992, 45(2014), no. 2, 14 Dec., pages 273-293 |
---|---|
Parent work: |
volume:45 ; year:2014 ; number:2 ; day:14 ; month:12 ; pages:273-293 |
Links: |
---|
DOI / URN: |
10.1007/s10844-014-0348-x |
---|
Catalog ID: |
OLC2052419519 |
---|
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | OLC2052419519 | ||
003 | DE-627 | ||
005 | 20230503115448.0 | ||
007 | tu | ||
008 | 200820s2014 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1007/s10844-014-0348-x |2 doi | |
035 | |a (DE-627)OLC2052419519 | ||
035 | |a (DE-He213)s10844-014-0348-x-p | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 070 |a 020 |a 004 |q VZ |
084 | |a 24,1 |2 ssgn | ||
084 | |a 54.00 |2 bkl | ||
100 | 1 | |a Sangam, Ravi Sankar |e verfasserin |4 aut | |
245 | 1 | 0 | |a Hybrid data labeling algorithm for clustering large mixed type data |
264 | 1 | |c 2014 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
500 | |a © Springer Science+Business Media New York 2014 | ||
520 | |a Abstract Due to enormous growth in both volume and variety of data, clustering a very large database is a time-consuming process. To speed up clustering process, sampling has been recognized as a very utilitarian approach to reduce the dataset size in which a collection of data points are taken as a sample and then a clustering algorithm is applied to partitioning the data points in that sample into clusters. In this approach, the data points, that are not sampled, do not get their cluster labels. The process of allocating unlabeled data points into proper clusters has been well explored purely in numerical or categorical domain only, but not the both. In this paper, we propose a hybrid similarity coefficient to find the resemblance between an unlabeled data point and a cluster, based on the importance of categorical attribute values and the mean values of numerical attributes. Furthermore, we propose a Hybrid Data Labeling Algorithm (HDLA), based on this similarity coefficient to designate an appropriate cluster label to each unlabeled data point. We analyze its time complexity and perform various experiments using synthetic and real world datasets to demonstrate the efficacy of HDLA. | ||
650 | 4 | |a Clustering | |
650 | 4 | |a Data mining | |
650 | 4 | |a Data labeling | |
650 | 4 | |a Mixed type data | |
700 | 1 | |a Om, Hari |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Journal of intelligent information systems |d Springer US, 1992 |g 45(2014), 2 vom: 14. Dez., Seite 273-293 |w (DE-627)171028333 |w (DE-600)1141899-0 |w (DE-576)03304032X |x 0925-9902 |7 nnns |
773 | 1 | 8 | |g volume:45 |g year:2014 |g number:2 |g day:14 |g month:12 |g pages:273-293 |
856 | 4 | 1 | |u https://doi.org/10.1007/s10844-014-0348-x |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a SSG-OLC-BUB | ||
912 | |a SSG-OPC-BBI | ||
912 | |a GBV_ILN_32 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_2244 | ||
912 | |a GBV_ILN_4012 | ||
936 | b | k | |a 54.00 |q VZ |
951 | |a AR | ||
952 | |d 45 |j 2014 |e 2 |b 14 |c 12 |h 273-293 |
author_variant |
r s s rs rss h o ho |
---|---|
matchkey_str |
article:09259902:2014----::yrdaaaeigloihfrlseiga |
hierarchy_sort_str |
2014 |
bklnumber |
54.00 |
publishDate |
2014 |
language |
English |
source |
Contained in: Journal of intelligent information systems 45(2014), no. 2, 14 Dec., pages 273-293 volume:45 year:2014 number:2 day:14 month:12 pages:273-293 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Clustering Data mining Data labeling Mixed type data |
dewey-raw |
070 |
isfreeaccess_bool |
false |
container_title |
Journal of intelligent information systems |
authorswithroles_txt_mv |
Sangam, Ravi Sankar @@aut@@ Om, Hari @@aut@@ |
publishDateDaySort_date |
2014-12-14T00:00:00Z |
hierarchy_top_id |
171028333 |
dewey-sort |
270 |
id |
OLC2052419519 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2052419519</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503115448.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200820s2014 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10844-014-0348-x</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2052419519</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10844-014-0348-x-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">020</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">24,1</subfield><subfield code="2">ssgn</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.00</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Sangam, Ravi Sankar</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Hybrid data labeling algorithm for clustering large mixed type data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield 
code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media New York 2014</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Due to enormous growth in both volume and variety of data, clustering a very large database is a time-consuming process. To speed up clustering process, sampling has been recognized as a very utilitarian approach to reduce the dataset size in which a collection of data points are taken as a sample and then a clustering algorithm is applied to partitioning the data points in that sample into clusters. In this approach, the data points, that are not sampled, do not get their cluster labels. The process of allocating unlabeled data points into proper clusters has been well explored purely in numerical or categorical domain only, but not the both. In this paper, we propose a hybrid similarity coefficient to find the resemblance between an unlabeled data point and a cluster, based on the importance of categorical attribute values and the mean values of numerical attributes. Furthermore, we propose a Hybrid Data Labeling Algorithm (HDLA), based on this similarity coefficient to designate an appropriate cluster label to each unlabeled data point. 
We analyze its time complexity and perform various experiments using synthetic and real world datasets to demonstrate the efficacy of HDLA.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Clustering</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Data labeling</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Mixed type data</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Om, Hari</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Journal of intelligent information systems</subfield><subfield code="d">Springer US, 1992</subfield><subfield code="g">45(2014), 2 vom: 14. Dez., Seite 273-293</subfield><subfield code="w">(DE-627)171028333</subfield><subfield code="w">(DE-600)1141899-0</subfield><subfield code="w">(DE-576)03304032X</subfield><subfield code="x">0925-9902</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:45</subfield><subfield code="g">year:2014</subfield><subfield code="g">number:2</subfield><subfield code="g">day:14</subfield><subfield code="g">month:12</subfield><subfield code="g">pages:273-293</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10844-014-0348-x</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_32</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2244</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4012</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.00</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">45</subfield><subfield code="j">2014</subfield><subfield code="e">2</subfield><subfield code="b">14</subfield><subfield code="c">12</subfield><subfield code="h">273-293</subfield></datafield></record></collection>
|
author |
Sangam, Ravi Sankar |
spellingShingle |
Sangam, Ravi Sankar ddc 070 ssgn 24,1 bkl 54.00 misc Clustering misc Data mining misc Data labeling misc Mixed type data Hybrid data labeling algorithm for clustering large mixed type data |
authorStr |
Sangam, Ravi Sankar |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)171028333 |
format |
Article |
dewey-ones |
070 - News media, journalism & publishing 020 - Library & information sciences 004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0925-9902 |
topic_title |
070 020 004 VZ 24,1 ssgn 54.00 bkl Hybrid data labeling algorithm for clustering large mixed type data Clustering Data mining Data labeling Mixed type data |
topic |
ddc 070 ssgn 24,1 bkl 54.00 misc Clustering misc Data mining misc Data labeling misc Mixed type data |
format_facet |
Articles Printed articles |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
nc |
hierarchy_parent_title |
Journal of intelligent information systems |
hierarchy_parent_id |
171028333 |
dewey-tens |
070 - News media, journalism & publishing 020 - Library & information sciences 000 - Computer science, knowledge & systems |
hierarchy_top_title |
Journal of intelligent information systems |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)171028333 (DE-600)1141899-0 (DE-576)03304032X |
title |
Hybrid data labeling algorithm for clustering large mixed type data |
ctrlnum |
(DE-627)OLC2052419519 (DE-He213)s10844-014-0348-x-p |
title_full |
Hybrid data labeling algorithm for clustering large mixed type data |
author_sort |
Sangam, Ravi Sankar |
journal |
Journal of intelligent information systems |
journalStr |
Journal of intelligent information systems |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2014 |
contenttype_str_mv |
txt |
container_start_page |
273 |
author_browse |
Sangam, Ravi Sankar Om, Hari |
container_volume |
45 |
class |
070 020 004 VZ 24,1 ssgn 54.00 bkl |
format_se |
Articles |
author-letter |
Sangam, Ravi Sankar |
doi_str_mv |
10.1007/s10844-014-0348-x |
dewey-full |
070 020 004 |
title_sort |
hybrid data labeling algorithm for clustering large mixed type data |
title_auth |
Hybrid data labeling algorithm for clustering large mixed type data |
abstract |
Abstract Due to enormous growth in both volume and variety of data, clustering a very large database is a time-consuming process. To speed up clustering process, sampling has been recognized as a very utilitarian approach to reduce the dataset size in which a collection of data points are taken as a sample and then a clustering algorithm is applied to partitioning the data points in that sample into clusters. In this approach, the data points, that are not sampled, do not get their cluster labels. The process of allocating unlabeled data points into proper clusters has been well explored purely in numerical or categorical domain only, but not the both. In this paper, we propose a hybrid similarity coefficient to find the resemblance between an unlabeled data point and a cluster, based on the importance of categorical attribute values and the mean values of numerical attributes. Furthermore, we propose a Hybrid Data Labeling Algorithm (HDLA), based on this similarity coefficient to designate an appropriate cluster label to each unlabeled data point. We analyze its time complexity and perform various experiments using synthetic and real world datasets to demonstrate the efficacy of HDLA. © Springer Science+Business Media New York 2014 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB SSG-OPC-BBI GBV_ILN_32 GBV_ILN_70 GBV_ILN_2244 GBV_ILN_4012 |
container_issue |
2 |
title_short |
Hybrid data labeling algorithm for clustering large mixed type data |
url |
https://doi.org/10.1007/s10844-014-0348-x |
remote_bool |
false |
author2 |
Om, Hari |
author2Str |
Om, Hari |
ppnlink |
171028333 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s10844-014-0348-x |
up_date |
2024-07-03T15:00:26.902Z |
_version_ |
1803570461438640129 |