Hybrid data labeling algorithm for clustering large mixed type data

Abstract: Due to the enormous growth in both the volume and variety of data, clustering a very large database is a time-consuming process. To speed up the clustering process, sampling has been recognized as a very useful approach to reducing the dataset size: a collection of data points is taken as a...

Authors:	Sangam, Ravi Sankar [author]; Om, Hari

Format:	Article
Language:	English

Published:	2014

Keywords:	Clustering; Data mining; Data labeling; Mixed type data

Note:	© Springer Science+Business Media New York 2014

Parent work:	Contained in: Journal of intelligent information systems - Springer US, 1992, 45(2014), 2, dated 14 Dec., pages 273-293
Parent work:	volume:45 ; year:2014 ; number:2 ; day:14 ; month:12 ; pages:273-293

Links:	Full text: https://doi.org/10.1007/s10844-014-0348-x

DOI / URN:	10.1007/s10844-014-0348-x

Catalog ID:	OLC2052419519

Internal format (MARC 21)


LEADER	01000caa a22002652 4500
001	OLC2052419519
003	DE-627
005	20230503115448.0
007	tu
008	200820s2014 xx ||||| 00| ||eng c
024	7		|a 10.1007/s10844-014-0348-x |2 doi
035			|a (DE-627)OLC2052419519
035			|a (DE-He213)s10844-014-0348-x-p
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
082	0	4	|a 070 |a 020 |a 004 |q VZ
084			|a 24,1 |2 ssgn
084			|a 54.00 |2 bkl
100	1		|a Sangam, Ravi Sankar |e author |4 aut
245	1	0	|a Hybrid data labeling algorithm for clustering large mixed type data
264		1	|c 2014
336			|a Text |b txt |2 rdacontent
337			|a unmediated |b n |2 rdamedia
338			|a volume |b nc |2 rdacarrier
500			|a © Springer Science+Business Media New York 2014
520			|a Due to the enormous growth in both the volume and variety of data, clustering a very large database is a time-consuming process. To speed up the clustering process, sampling has been recognized as a very useful approach to reducing the dataset size: a collection of data points is taken as a sample, and a clustering algorithm is then applied to partition the data points in that sample into clusters. In this approach, the data points that are not sampled do not receive cluster labels. The process of allocating unlabeled data points to proper clusters has been well explored in purely numerical or purely categorical domains, but not for data that mix both types. In this paper, we propose a hybrid similarity coefficient to measure the resemblance between an unlabeled data point and a cluster, based on the importance of categorical attribute values and the mean values of numerical attributes. Furthermore, we propose a Hybrid Data Labeling Algorithm (HDLA), based on this similarity coefficient, to assign an appropriate cluster label to each unlabeled data point. We analyze its time complexity and perform various experiments on synthetic and real-world datasets to demonstrate the efficacy of HDLA.
650		4	|a Clustering
650		4	|a Data mining
650		4	|a Data labeling
650		4	|a Mixed type data
700	1		|a Om, Hari |4 aut
773	0	8	|i Contained in |t Journal of intelligent information systems |d Springer US, 1992 |g 45(2014), 2, dated 14 Dec., pages 273-293 |w (DE-627)171028333 |w (DE-600)1141899-0 |w (DE-576)03304032X |x 0925-9902 |7 nnns
773	1	8	|g volume:45 |g year:2014 |g number:2 |g day:14 |g month:12 |g pages:273-293
856	4	1	|u https://doi.org/10.1007/s10844-014-0348-x |z license required |3 Full text
912			|a GBV_USEFLAG_A
912			|a SYSFLAG_A
912			|a GBV_OLC
912			|a SSG-OLC-MAT
912			|a SSG-OLC-BUB
912			|a SSG-OPC-BBI
912			|a GBV_ILN_32
912			|a GBV_ILN_70
912			|a GBV_ILN_2244
912			|a GBV_ILN_4012
936	b	k	|a 54.00 |q VZ
951			|a AR
952			|d 45 |j 2014 |e 2 |b 14 |c 12 |h 273-293
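The abstract in MARC field 520 above outlines the labeling idea: clusters are built on a sample, and each unsampled point is then assigned to the cluster it most resembles under a coefficient that combines categorical attribute values with the means of numerical attributes. This record does not give the paper's actual coefficient or the HDLA steps, so the Python sketch below is a hypothetical illustration, not the authors' method. It assumes that the "importance" of a categorical value is its relative frequency within the cluster, measures numerical closeness as 1 - |x - mean| / range, and averages all per-attribute terms with equal weight. The names `cluster_summary`, `hybrid_similarity`, and `label_unsampled` are invented for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A mixed-type data point: (categorical values, numerical values).
Point = Tuple[Tuple[str, ...], Tuple[float, ...]]


def cluster_summary(members: Sequence[Point]):
    """Summarize one labeled cluster: size, per-attribute value counts, numeric means."""
    size = len(members)
    n_cat, n_num = len(members[0][0]), len(members[0][1])
    counts = [Counter(p[0][j] for p in members) for j in range(n_cat)]
    means = [sum(p[1][j] for p in members) / size for j in range(n_num)]
    return size, counts, means


def hybrid_similarity(point: Point, summary, ranges: Sequence[float]) -> float:
    """Hypothetical hybrid coefficient in [0, 1]; NOT the formula from the paper.

    Categorical term: relative frequency of the point's value in the cluster
    (an assumed stand-in for the "importance" of that value).
    Numerical term: 1 - |x - mean| / range, clipped at 0.
    All terms are averaged with equal weight (also an assumption).
    """
    size, counts, means = summary
    cats, nums = point
    # Counter returns 0 for values never seen in this cluster.
    terms = [counts[j][v] / size for j, v in enumerate(cats)]
    for j, x in enumerate(nums):
        if ranges[j] > 0:
            terms.append(max(0.0, 1.0 - abs(x - means[j]) / ranges[j]))
        else:
            terms.append(1.0)  # zero range in the sample: treated as uninformative
    return sum(terms) / len(terms)


def label_unsampled(unlabeled: Sequence[Point],
                    clusters: Dict[str, List[Point]],
                    ranges: Sequence[float]) -> List[str]:
    """Give each unsampled point the label of its most similar cluster."""
    summaries = {label: cluster_summary(m) for label, m in clusters.items()}
    return [max(summaries, key=lambda lab: hybrid_similarity(p, summaries[lab], ranges))
            for p in unlabeled]


# Usage: two clusters obtained (by any clustering algorithm) from a small sample.
clusters = {
    "A": [(("red", "small"), (1.0, 10.0)), (("red", "small"), (2.0, 12.0))],
    "B": [(("blue", "large"), (8.0, 40.0)), (("blue", "small"), (9.0, 38.0))],
}
ranges = [8.0, 30.0]  # max - min of each numerical attribute in the sample
unsampled = [(("red", "large"), (1.5, 11.0)), (("blue", "large"), (8.5, 39.0))]
print(label_unsampled(unsampled, clusters, ranges))  # ['A', 'B']
```

Because each cluster is summarized once (value counts and means), labeling n unsampled points against k clusters with d attributes costs O(n·k·d) in this sketch. That figure describes the sketch only; it is not a restatement of the time-complexity analysis in the paper.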
