Expected similarity estimation for large-scale batch and streaming anomaly detection
Abstract: We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time, while online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time while requiring only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
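In the mean-map formulation referenced in the abstract, the score of a query z is the inner product between its feature-space image and the empirical embedding of the regular data, roughly eta(z) = <phi(z), mu> with mu = (1/n) * sum_i phi(x_i). The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an RBF kernel approximated with random Fourier features so that mu is an explicit finite-dimensional vector, which is one way to realize the linear-time batch fit, constant-time streaming update, and constant-time, constant-memory scoring described in the abstract. The class name ExposeSketch and all parameter choices are illustrative.

```python
import numpy as np


class ExposeSketch:
    """Minimal EXPoSE-style anomaly scorer (illustrative sketch, not the authors' code).

    Assumes an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) approximated with
    random Fourier features, so the kernel mean embedding mu becomes an explicit
    vector: batch fitting is one pass over the data (linear time), while streaming
    updates and scoring touch only a fixed-size vector (constant time and memory).
    """

    def __init__(self, dim, n_features=512, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Spectral sampling for the RBF kernel: w ~ N(0, 2*gamma*I), b ~ U[0, 2*pi)
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        self.mu = np.zeros(n_features)  # empirical mean map of the regular data
        self.n = 0                      # number of points absorbed so far

    def _phi(self, X):
        # Approximate feature map phi(x); X has shape (m, dim)
        return self.scale * np.cos(np.atleast_2d(X) @ self.W + self.b)

    def fit(self, X):
        # Batch estimate of the mean embedding: mu = mean_i phi(x_i), O(n) time
        Z = self._phi(X)
        self.mu = Z.mean(axis=0)
        self.n = Z.shape[0]
        return self

    def partial_fit(self, x):
        # Streaming update of the running mean: O(1) work per instance
        z = self._phi(x)[0]
        self.n += 1
        self.mu += (z - self.mu) / self.n
        return self

    def score(self, X):
        # Expected similarity <phi(z), mu>; low values flag likely anomalies
        return self._phi(X) @ self.mu
```

For example, still under the same assumptions:

```python
rng = np.random.default_rng(1)
normal = rng.normal(size=(10_000, 2))                 # regular data
detector = ExposeSketch(dim=2).fit(normal)
print(detector.score(np.array([[0.0, 0.0], [8.0, 8.0]])))  # inlier scores higher than the outlier
```

Low scores mark points dissimilar to the mass of regular data. In a streaming setting, replacing the running mean in partial_fit with a windowed or exponentially decaying mean would be one place to add the concept-drift adaptations the abstract mentions; the exact mechanisms are described in the paper and are not reproduced here.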
Detailed description
Author: Schneider, Markus [author]
Format: Article
Language: English
Published: 2016
Keywords: Anomaly detection; Large-scale data; Kernel methods; Hilbert space embedding; Mean map
Note: © The Author(s) 2016
Parent work: Contained in: Machine learning - Springer US, 1986, 105(2016), issue 3, 18 May, pages 305-333
Parent work: volume:105 ; year:2016 ; number:3 ; day:18 ; month:05 ; pages:305-333
Links:
DOI / URN: 10.1007/s10994-016-5567-7
Catalog ID: OLC202652694X
LEADER 01000caa a22002652 4500
001 OLC202652694X
003 DE-627
005 20230503172303.0
007 tu
008 200820s2016 xx ||||| 00| ||eng c
024 7 |a 10.1007/s10994-016-5567-7 |2 doi
035 |a (DE-627)OLC202652694X
035 |a (DE-He213)s10994-016-5567-7-p
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
082 0 4 |a 150 |a 004 |q VZ
100 1 |a Schneider, Markus |e verfasserin |4 aut
245 1 0 |a Expected similarity estimation for large-scale batch and streaming anomaly detection
264 1 |c 2016
336 |a Text |b txt |2 rdacontent
337 |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338 |a Band |b nc |2 rdacarrier
500 |a © The Author(s) 2016
520 |a Abstract: We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
650 4 |a Anomaly detection
650 4 |a Large-scale data
650 4 |a Kernel methods
650 4 |a Hilbert space embedding
650 4 |a Mean map
700 1 |a Ertel, Wolfgang |4 aut
700 1 |a Ramos, Fabio |4 aut
773 0 8 |i Enthalten in |t Machine learning |d Springer US, 1986 |g 105(2016), 3 vom: 18. Mai, Seite 305-333 |w (DE-627)12920403X |w (DE-600)54638-0 |w (DE-576)014457377 |x 0885-6125 |7 nnns
773 1 8 |g volume:105 |g year:2016 |g number:3 |g day:18 |g month:05 |g pages:305-333
856 4 1 |u https://doi.org/10.1007/s10994-016-5567-7 |z lizenzpflichtig |3 Volltext
912 |a GBV_USEFLAG_A
912 |a SYSFLAG_A
912 |a GBV_OLC
912 |a SSG-OLC-MAT
912 |a GBV_ILN_24
912 |a GBV_ILN_70
912 |a GBV_ILN_4012
912 |a GBV_ILN_4046
912 |a GBV_ILN_4318
951 |a AR
952 |d 105 |j 2016 |e 3 |b 18 |c 05 |h 305-333