Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data

Abstract Detecting outliers in a dataset is an important data mining task with many applications, such as detection of credit card fraud or network intrusions. Traditional methods assume numerical data and compute pair-wise distances among points. Recently, outlier detection methods were proposed fo...
Detailed description

Author:	Koufakou, Anna [author]; Secretan, Jimmy; Georgiopoulos, Michael

Format:	Article
Language:	English

Published:	2010

Keywords:	Outlier detection; Anomaly detection; Frequent itemset mining; Non-Derivable itemsets; Categorical datasets

Note:	© Springer-Verlag London Limited 2010

Parent work:	Contained in: Knowledge and information systems - Springer-Verlag, 2000, 29(2010), no. 3, 8 Dec., pages 697-725
Parent work:	volume:29 ; year:2010 ; number:3 ; day:08 ; month:12 ; pages:697-725

Links:	Full text

DOI / URN:	10.1007/s10115-010-0343-7

Catalog ID:	OLC2063375380

Internal format


LEADER	01000caa a22002652 4500
001	OLC2063375380
003	DE-627
005	20230502172711.0
007	tu
008	200820s2010 xx \|\|\|\|\| 00\| \|\|eng c
024	7		\|a 10.1007/s10115-010-0343-7 \|2 doi
035			\|a (DE-627)OLC2063375380
035			\|a (DE-He213)s10115-010-0343-7-p
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0	4	\|a 004 \|q VZ
082	0	4	\|a 004 \|q VZ
084			\|a 06.74$jInformationssysteme \|2 bkl
084			\|a 54.64$jDatenbanken \|2 bkl
100	1		\|a Koufakou, Anna \|e verfasserin \|4 aut
245	1	0	\|a Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
264		1	\|c 2010
336			\|a Text \|b txt \|2 rdacontent
337			\|a ohne Hilfsmittel zu benutzen \|b n \|2 rdamedia
338			\|a Band \|b nc \|2 rdacarrier
500			\|a © Springer-Verlag London Limited 2010
520			\|a Abstract Detecting outliers in a dataset is an important data mining task with many applications, such as detection of credit card fraud or network intrusions. Traditional methods assume numerical data and compute pair-wise distances among points. Recently, outlier detection methods were proposed for categorical and mixed-attribute data using the concept of Frequent Itemsets (FIs). These methods face challenges when dealing with large high-dimensional data, where the number of generated FIs can be extremely large. To address this issue, we propose several outlier detection schemes inspired by the well-known condensed representation of FIs, Non-Derivable Itemsets (NDIs). Specifically, we contrast a method based on frequent NDIs, FNDI-OD, and a method based on the negative border of NDIs, NBNDI-OD, with their previously proposed FI-based counterparts. We also explore outlier detection based on Non-Almost Derivable Itemsets (NADIs), which approximate the NDIs in the data given a δ parameter. Our proposed methods use a far smaller collection of sets than the FI collection in order to compute an anomaly score for each data point. Experiments on real-life data show that, as expected, methods based on NDIs and NADIs offer substantial advantages in terms of speed and scalability over FI-based Outlier Detection method. What is significant is that NDI-based methods exhibit similar or better detection accuracy compared to the FI-based methods, which supports our claim that the NDI representation is especially well-suited for the task of detecting outliers. At the same time, the NDI approximation scheme, NADIs is shown to exhibit similar accuracy to the NDI-based method for various δ values and further runtime performance gains. Finally, we offer an in-depth discussion and experimentation regarding the trade-offs of the proposed algorithms and the choice of parameter values.
650		4	\|a Outlier detection
650		4	\|a Anomaly detection
650		4	\|a Frequent itemset mining
650		4	\|a Non-Derivable itemsets
650		4	\|a Categorical datasets
700	1		\|a Secretan, Jimmy \|4 aut
700	1		\|a Georgiopoulos, Michael \|4 aut
773	0	8	\|i Enthalten in \|t Knowledge and information systems \|d Springer-Verlag, 2000 \|g 29(2010), 3 vom: 08. Dez., Seite 697-725 \|w (DE-627)323971725 \|w (DE-600)2036569-X \|w (DE-576)9323971723 \|x 0219-1377 \|7 nnns
773	1	8	\|g volume:29 \|g year:2010 \|g number:3 \|g day:08 \|g month:12 \|g pages:697-725
856	4	1	\|u https://doi.org/10.1007/s10115-010-0343-7 \|z lizenzpflichtig \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_OLC
912			\|a SSG-OLC-MAT
912			\|a SSG-OLC-BUB
912			\|a GBV_ILN_26
912			\|a GBV_ILN_70
912			\|a GBV_ILN_100
912			\|a GBV_ILN_267
912			\|a GBV_ILN_4277
936	b	k	\|a 06.74$jInformationssysteme \|q VZ \|0 106415212 \|0 (DE-625)106415212
936	b	k	\|a 54.64$jDatenbanken \|q VZ \|0 106410865 \|0 (DE-625)106410865
951			\|a AR
952			\|d 29 \|j 2010 \|e 3 \|b 08 \|c 12 \|h 697-725

Index fields

author_variant	a k ak j s js m g mg
matchkey_str	article:02191377:2010----::odrvbetmesofsotireetoilreihies
hierarchy_sort_str	2010
bklnumber	06.74$jInformationssysteme 54.64$jDatenbanken
publishDate	2010
allfields	10.1007/s10115-010-0343-7 doi (DE-627)OLC2063375380 (DE-He213)s10115-010-0343-7-p DE-627 ger DE-627 rakwb eng 004 VZ 004 VZ 06.74$jInformationssysteme bkl 54.64$jDatenbanken bkl Koufakou, Anna verfasserin aut Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data 2010 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer-Verlag London Limited 2010 Abstract Detecting outliers in a dataset is an important data mining task with many applications, such as detection of credit card fraud or network intrusions. Traditional methods assume numerical data and compute pair-wise distances among points. Recently, outlier detection methods were proposed for categorical and mixed-attribute data using the concept of Frequent Itemsets (FIs). These methods face challenges when dealing with large high-dimensional data, where the number of generated FIs can be extremely large. To address this issue, we propose several outlier detection schemes inspired by the well-known condensed representation of FIs, Non-Derivable Itemsets (NDIs). Specifically, we contrast a method based on frequent NDIs, FNDI-OD, and a method based on the negative border of NDIs, NBNDI-OD, with their previously proposed FI-based counterparts. We also explore outlier detection based on Non-Almost Derivable Itemsets (NADIs), which approximate the NDIs in the data given a δ parameter. Our proposed methods use a far smaller collection of sets than the FI collection in order to compute an anomaly score for each data point. Experiments on real-life data show that, as expected, methods based on NDIs and NADIs offer substantial advantages in terms of speed and scalability over FI-based Outlier Detection method. 
What is significant is that NDI-based methods exhibit similar or better detection accuracy compared to the FI-based methods, which supports our claim that the NDI representation is especially well-suited for the task of detecting outliers. At the same time, the NDI approximation scheme, NADIs is shown to exhibit similar accuracy to the NDI-based method for various δ values and further runtime performance gains. Finally, we offer an in-depth discussion and experimentation regarding the trade-offs of the proposed algorithms and the choice of parameter values. Outlier detection Anomaly detection Frequent itemset mining Non-Derivable itemsets Categorical datasets Secretan, Jimmy aut Georgiopoulos, Michael aut Enthalten in Knowledge and information systems Springer-Verlag, 2000 29(2010), 3 vom: 08. Dez., Seite 697-725 (DE-627)323971725 (DE-600)2036569-X (DE-576)9323971723 0219-1377 nnns volume:29 year:2010 number:3 day:08 month:12 pages:697-725 https://doi.org/10.1007/s10115-010-0343-7 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB GBV_ILN_26 GBV_ILN_70 GBV_ILN_100 GBV_ILN_267 GBV_ILN_4277 06.74$jInformationssysteme VZ 106415212 (DE-625)106415212 54.64$jDatenbanken VZ 106410865 (DE-625)106410865 AR 29 2010 3 08 12 697-725
language	English
source	Enthalten in Knowledge and information systems 29(2010), 3 vom: 08. Dez., Seite 697-725 volume:29 year:2010 number:3 day:08 month:12 pages:697-725
sourceStr	Enthalten in Knowledge and information systems 29(2010), 3 vom: 08. Dez., Seite 697-725 volume:29 year:2010 number:3 day:08 month:12 pages:697-725
format_phy_str_mv	Article
institution	findex.gbv.de
topic_facet	Outlier detection Anomaly detection Frequent itemset mining Non-Derivable itemsets Categorical datasets
dewey-raw	004
isfreeaccess_bool	false
container_title	Knowledge and information systems
authorswithroles_txt_mv	Koufakou, Anna @@aut@@ Secretan, Jimmy @@aut@@ Georgiopoulos, Michael @@aut@@
publishDateDaySort_date	2010-12-08T00:00:00Z
hierarchy_top_id	323971725
dewey-sort	14
id	OLC2063375380
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2063375380</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230502172711.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200820s2010 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10115-010-0343-7</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2063375380</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10115-010-0343-7-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">06.74$jInformationssysteme</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.64$jDatenbanken</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Koufakou, Anna</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2010</subfield></datafield><datafield tag="336" 
ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer-Verlag London Limited 2010</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Detecting outliers in a dataset is an important data mining task with many applications, such as detection of credit card fraud or network intrusions. Traditional methods assume numerical data and compute pair-wise distances among points. Recently, outlier detection methods were proposed for categorical and mixed-attribute data using the concept of Frequent Itemsets (FIs). These methods face challenges when dealing with large high-dimensional data, where the number of generated FIs can be extremely large. To address this issue, we propose several outlier detection schemes inspired by the well-known condensed representation of FIs, Non-Derivable Itemsets (NDIs). Specifically, we contrast a method based on frequent NDIs, FNDI-OD, and a method based on the negative border of NDIs, NBNDI-OD, with their previously proposed FI-based counterparts. We also explore outlier detection based on Non-Almost Derivable Itemsets (NADIs), which approximate the NDIs in the data given a δ parameter. Our proposed methods use a far smaller collection of sets than the FI collection in order to compute an anomaly score for each data point. Experiments on real-life data show that, as expected, methods based on NDIs and NADIs offer substantial advantages in terms of speed and scalability over FI-based Outlier Detection method. 
What is significant is that NDI-based methods exhibit similar or better detection accuracy compared to the FI-based methods, which supports our claim that the NDI representation is especially well-suited for the task of detecting outliers. At the same time, the NDI approximation scheme, NADIs is shown to exhibit similar accuracy to the NDI-based method for various δ values and further runtime performance gains. Finally, we offer an in-depth discussion and experimentation regarding the trade-offs of the proposed algorithms and the choice of parameter values.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Outlier detection</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Anomaly detection</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Frequent itemset mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Non-Derivable itemsets</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Categorical datasets</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Secretan, Jimmy</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Georgiopoulos, Michael</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Knowledge and information systems</subfield><subfield code="d">Springer-Verlag, 2000</subfield><subfield code="g">29(2010), 3 vom: 08. 
Dez., Seite 697-725</subfield><subfield code="w">(DE-627)323971725</subfield><subfield code="w">(DE-600)2036569-X</subfield><subfield code="w">(DE-576)9323971723</subfield><subfield code="x">0219-1377</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:29</subfield><subfield code="g">year:2010</subfield><subfield code="g">number:3</subfield><subfield code="g">day:08</subfield><subfield code="g">month:12</subfield><subfield code="g">pages:697-725</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10115-010-0343-7</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_26</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_100</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_267</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4277</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">06.74$jInformationssysteme</subfield><subfield code="q">VZ</subfield><subfield code="0">106415212</subfield><subfield code="0">(DE-625)106415212</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.64$jDatenbanken</subfield><subfield 
code="q">VZ</subfield><subfield code="0">106410865</subfield><subfield code="0">(DE-625)106410865</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">29</subfield><subfield code="j">2010</subfield><subfield code="e">3</subfield><subfield code="b">08</subfield><subfield code="c">12</subfield><subfield code="h">697-725</subfield></datafield></record></collection>
author	Koufakou, Anna
spellingShingle	Koufakou, Anna ddc 004 bkl 06.74$jInformationssysteme bkl 54.64$jDatenbanken misc Outlier detection misc Anomaly detection misc Frequent itemset mining misc Non-Derivable itemsets misc Categorical datasets Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
authorStr	Koufakou, Anna
ppnlink_with_tag_str_mv	@@773@@(DE-627)323971725
format	Article
dewey-ones	004 - Data processing & computer science
delete_txt_mv	keep
author_role	aut aut aut
collection	OLC
remote_str	false
illustrated	Not Illustrated
issn	0219-1377
topic_title	004 VZ 06.74$jInformationssysteme bkl 54.64$jDatenbanken bkl Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data Outlier detection Anomaly detection Frequent itemset mining Non-Derivable itemsets Categorical datasets
topic	ddc 004 bkl 06.74$jInformationssysteme bkl 54.64$jDatenbanken misc Outlier detection misc Anomaly detection misc Frequent itemset mining misc Non-Derivable itemsets misc Categorical datasets
topic_unstemmed	ddc 004 bkl 06.74$jInformationssysteme bkl 54.64$jDatenbanken misc Outlier detection misc Anomaly detection misc Frequent itemset mining misc Non-Derivable itemsets misc Categorical datasets
topic_browse	ddc 004 bkl 06.74$jInformationssysteme bkl 54.64$jDatenbanken misc Outlier detection misc Anomaly detection misc Frequent itemset mining misc Non-Derivable itemsets misc Categorical datasets
format_facet	Aufsätze Gedruckte Aufsätze
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	nc
hierarchy_parent_title	Knowledge and information systems
hierarchy_parent_id	323971725
dewey-tens	000 - Computer science, knowledge & systems
hierarchy_top_title	Knowledge and information systems
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)323971725 (DE-600)2036569-X (DE-576)9323971723
title	Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
ctrlnum	(DE-627)OLC2063375380 (DE-He213)s10115-010-0343-7-p
title_full	Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
author_sort	Koufakou, Anna
journal	Knowledge and information systems
journalStr	Knowledge and information systems
lang_code	eng
isOA_bool	false
dewey-hundreds	000 - Computer science, information & general works
recordtype	marc
publishDateSort	2010
contenttype_str_mv	txt
container_start_page	697
author_browse	Koufakou, Anna Secretan, Jimmy Georgiopoulos, Michael
container_volume	29
class	004 VZ 06.74$jInformationssysteme bkl 54.64$jDatenbanken bkl
format_se	Aufsätze
author-letter	Koufakou, Anna
doi_str_mv	10.1007/s10115-010-0343-7
normlink	106415212 106410865
normlink_prefix_str_mv	106415212 (DE-625)106415212 106410865 (DE-625)106410865
dewey-full	004
title_sort	non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
title_auth	Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
abstract	Abstract Detecting outliers in a dataset is an important data mining task with many applications, such as detection of credit card fraud or network intrusions. Traditional methods assume numerical data and compute pair-wise distances among points. Recently, outlier detection methods were proposed for categorical and mixed-attribute data using the concept of Frequent Itemsets (FIs). These methods face challenges when dealing with large high-dimensional data, where the number of generated FIs can be extremely large. To address this issue, we propose several outlier detection schemes inspired by the well-known condensed representation of FIs, Non-Derivable Itemsets (NDIs). Specifically, we contrast a method based on frequent NDIs, FNDI-OD, and a method based on the negative border of NDIs, NBNDI-OD, with their previously proposed FI-based counterparts. We also explore outlier detection based on Non-Almost Derivable Itemsets (NADIs), which approximate the NDIs in the data given a δ parameter. Our proposed methods use a far smaller collection of sets than the FI collection in order to compute an anomaly score for each data point. Experiments on real-life data show that, as expected, methods based on NDIs and NADIs offer substantial advantages in terms of speed and scalability over FI-based outlier detection methods. What is significant is that NDI-based methods exhibit similar or better detection accuracy compared to the FI-based methods, which supports our claim that the NDI representation is especially well-suited for the task of detecting outliers. At the same time, the NDI approximation scheme, NADIs, is shown to exhibit similar accuracy to the NDI-based method for various δ values and further runtime performance gains. Finally, we offer an in-depth discussion and experimentation regarding the trade-offs of the proposed algorithms and the choice of parameter values. © Springer-Verlag London Limited 2010
collection_details	GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OLC-BUB GBV_ILN_26 GBV_ILN_70 GBV_ILN_100 GBV_ILN_267 GBV_ILN_4277
container_issue	3
title_short	Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
url	https://doi.org/10.1007/s10115-010-0343-7
remote_bool	false
author2	Secretan, Jimmy Georgiopoulos, Michael
author2Str	Secretan, Jimmy Georgiopoulos, Michael
ppnlink	323971725
mediatype_str_mv	n
isOA_txt	false
hochschulschrift_bool	false
doi_str	10.1007/s10115-010-0343-7
up_date	2024-07-03T18:48:15.229Z
_version_	1803584793719341057
fullrecord_marcxml	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2063375380</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230502172711.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200820s2010 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10115-010-0343-7</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2063375380</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10115-010-0343-7-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">06.74$jInformationssysteme</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.64$jDatenbanken</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Koufakou, Anna</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2010</subfield></datafield><datafield 
tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer-Verlag London Limited 2010</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Detecting outliers in a dataset is an important data mining task with many applications, such as detection of credit card fraud or network intrusions. Traditional methods assume numerical data and compute pair-wise distances among points. Recently, outlier detection methods were proposed for categorical and mixed-attribute data using the concept of Frequent Itemsets (FIs). These methods face challenges when dealing with large high-dimensional data, where the number of generated FIs can be extremely large. To address this issue, we propose several outlier detection schemes inspired by the well-known condensed representation of FIs, Non-Derivable Itemsets (NDIs). Specifically, we contrast a method based on frequent NDIs, FNDI-OD, and a method based on the negative border of NDIs, NBNDI-OD, with their previously proposed FI-based counterparts. We also explore outlier detection based on Non-Almost Derivable Itemsets (NADIs), which approximate the NDIs in the data given a δ parameter. Our proposed methods use a far smaller collection of sets than the FI collection in order to compute an anomaly score for each data point. Experiments on real-life data show that, as expected, methods based on NDIs and NADIs offer substantial advantages in terms of speed and scalability over FI-based Outlier Detection method. 
What is significant is that NDI-based methods exhibit similar or better detection accuracy compared to the FI-based methods, which supports our claim that the NDI representation is especially well-suited for the task of detecting outliers. At the same time, the NDI approximation scheme, NADIs is shown to exhibit similar accuracy to the NDI-based method for various δ values and further runtime performance gains. Finally, we offer an in-depth discussion and experimentation regarding the trade-offs of the proposed algorithms and the choice of parameter values.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Outlier detection</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Anomaly detection</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Frequent itemset mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Non-Derivable itemsets</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Categorical datasets</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Secretan, Jimmy</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Georgiopoulos, Michael</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Knowledge and information systems</subfield><subfield code="d">Springer-Verlag, 2000</subfield><subfield code="g">29(2010), 3 vom: 08. 
Dez., Seite 697-725</subfield><subfield code="w">(DE-627)323971725</subfield><subfield code="w">(DE-600)2036569-X</subfield><subfield code="w">(DE-576)9323971723</subfield><subfield code="x">0219-1377</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:29</subfield><subfield code="g">year:2010</subfield><subfield code="g">number:3</subfield><subfield code="g">day:08</subfield><subfield code="g">month:12</subfield><subfield code="g">pages:697-725</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10115-010-0343-7</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-BUB</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_26</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_100</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_267</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4277</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">06.74$jInformationssysteme</subfield><subfield code="q">VZ</subfield><subfield code="0">106415212</subfield><subfield code="0">(DE-625)106415212</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.64$jDatenbanken</subfield><subfield 
code="q">VZ</subfield><subfield code="0">106410865</subfield><subfield code="0">(DE-625)106410865</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">29</subfield><subfield code="j">2010</subfield><subfield code="e">3</subfield><subfield code="b">08</subfield><subfield code="c">12</subfield><subfield code="h">697-725</subfield></datafield></record></collection>
score	7.399473
