Expected similarity estimation for large-scale batch and streaming anomaly detection
Abstract: We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time, while online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time while requiring only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
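In the mean-map formulation referenced in the abstract, the score of a query z is the inner product between its feature-space image and the empirical embedding of the regular data, roughly eta(z) = <phi(z), mu> with mu = (1/n) * sum_i phi(x_i). The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an RBF kernel approximated with random Fourier features so that mu is an explicit finite-dimensional vector, which is one way to realize the linear-time batch fit, constant-time streaming update, and constant-time, constant-memory scoring described in the abstract. The class name ExposeSketch and all parameter choices are illustrative.

```python
import numpy as np


class ExposeSketch:
    """Minimal EXPoSE-style anomaly scorer (illustrative sketch, not the authors' code).

    Assumes an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) approximated with
    random Fourier features, so the kernel mean embedding mu becomes an explicit
    vector: batch fitting is one pass over the data (linear time), while streaming
    updates and scoring touch only a fixed-size vector (constant time and memory).
    """

    def __init__(self, dim, n_features=512, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Spectral sampling for the RBF kernel: w ~ N(0, 2*gamma*I), b ~ U[0, 2*pi)
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        self.mu = np.zeros(n_features)  # empirical mean map of the regular data
        self.n = 0                      # number of points absorbed so far

    def _phi(self, X):
        # Approximate feature map phi(x); X has shape (m, dim)
        return self.scale * np.cos(np.atleast_2d(X) @ self.W + self.b)

    def fit(self, X):
        # Batch estimate of the mean embedding: mu = mean_i phi(x_i), O(n) time
        Z = self._phi(X)
        self.mu = Z.mean(axis=0)
        self.n = Z.shape[0]
        return self

    def partial_fit(self, x):
        # Streaming update of the running mean: O(1) work per instance
        z = self._phi(x)[0]
        self.n += 1
        self.mu += (z - self.mu) / self.n
        return self

    def score(self, X):
        # Expected similarity <phi(z), mu>; low values flag likely anomalies
        return self._phi(X) @ self.mu
```

For example, still under the same assumptions:

```python
rng = np.random.default_rng(1)
normal = rng.normal(size=(10_000, 2))                 # regular data
detector = ExposeSketch(dim=2).fit(normal)
print(detector.score(np.array([[0.0, 0.0], [8.0, 8.0]])))  # inlier scores higher than the outlier
```

Low scores mark points dissimilar to the mass of regular data. In a streaming setting, replacing the running mean in partial_fit with a windowed or exponentially decaying mean would be one place to add the concept-drift adaptations the abstract mentions; the exact mechanisms are described in the paper and are not reproduced here.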
Detailed description
Author: Schneider, Markus [author]
Format: Article
Language: English
Published: 2016
Keywords: Anomaly detection; Large-scale data; Kernel methods; Hilbert space embedding; Mean map
Note: © The Author(s) 2016
Parent work: Contained in: Machine learning - Springer US, 1986, 105(2016), issue 3, 18 May, pages 305-333
Parent work: volume:105 ; year:2016 ; number:3 ; day:18 ; month:05 ; pages:305-333
Links:
DOI / URN: 10.1007/s10994-016-5567-7
Catalog ID: OLC202652694X
LEADER 01000caa a22002652 4500
001 OLC202652694X
003 DE-627
005 20230503172303.0
007 tu
008 200820s2016 xx ||||| 00| ||eng c
024 7 |a 10.1007/s10994-016-5567-7 |2 doi
035 |a (DE-627)OLC202652694X
035 |a (DE-He213)s10994-016-5567-7-p
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
082 0 4 |a 150 |a 004 |q VZ
100 1 |a Schneider, Markus |e verfasserin |4 aut
245 1 0 |a Expected similarity estimation for large-scale batch and streaming anomaly detection
264 1 |c 2016
336 |a Text |b txt |2 rdacontent
337 |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 |a Band |b nc |2 rdacarrier
500 |a © The Author(s) 2016
520 |a Abstract: We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
650 4 |a Anomaly detection
650 4 |a Large-scale data
650 4 |a Kernel methods
650 4 |a Hilbert space embedding
650 4 |a Mean map
700 1 |a Ertel, Wolfgang |4 aut
700 1 |a Ramos, Fabio |4 aut
773 0 8 |i Enthalten in |t Machine learning |d Springer US, 1986 |g 105(2016), 3 vom: 18. Mai, Seite 305-333 |w (DE-627)12920403X |w (DE-600)54638-0 |w (DE-576)014457377 |x 0885-6125 |7 nnns
773 1 8 |g volume:105 |g year:2016 |g number:3 |g day:18 |g month:05 |g pages:305-333
856 4 1 |u https://doi.org/10.1007/s10994-016-5567-7 |z lizenzpflichtig |3 Volltext
912 |a GBV_USEFLAG_A
912 |a SYSFLAG_A
912 |a GBV_OLC
912 |a SSG-OLC-MAT
912 |a GBV_ILN_24
912 |a GBV_ILN_70
912 |a GBV_ILN_4012
912 |a GBV_ILN_4046
912 |a GBV_ILN_4318
951 |a AR
952 |d 105 |j 2016 |e 3 |b 18 |c 05 |h 305-333