SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming
Abstract Finding frequent itemsets in a continuous streaming data is an important data mining task which is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the ne...
Detailed description
Author: |
Xiao, Wen [author] |
---|
Format: |
Article |
---|---|
Language: |
English |
Published: |
2020 |
---|
Keywords: |
---|
Note: |
© The Author(s) 2020 |
---|
Parent work: |
Contained in: The journal of supercomputing - Springer US, 1987, 76(2020), 10 of 04 Feb., pages 7619-7634 |
---|---|
Parent work: |
volume:76 ; year:2020 ; number:10 ; day:04 ; month:02 ; pages:7619-7634 |
Links: |
---|
DOI / URN: |
10.1007/s11227-020-03190-5 |
---|
Catalog ID: |
OLC2119442053 |
---|
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | OLC2119442053 | ||
003 | DE-627 | ||
005 | 20230504170435.0 | ||
007 | tu | ||
008 | 230504s2020 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1007/s11227-020-03190-5 |2 doi | |
035 | |a (DE-627)OLC2119442053 | ||
035 | |a (DE-He213)s11227-020-03190-5-p | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 004 |a 620 |q VZ |
100 | 1 | |a Xiao, Wen |e verfasserin |0 (orcid)0000-0003-1444-908X |4 aut | |
245 | 1 | 0 | |a SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming |
264 | 1 | |c 2020 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
500 | |a © The Author(s) 2020 | ||
520 | |a Abstract Finding frequent itemsets in a continuous streaming data is an important data mining task which is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the needs of massive streaming data processing. Apache Spark is a unified analytic engine for massive data processing which has been successfully used in many data mining fields. In this paper, we propose a distributed algorithm for mining frequent itemsets over massive streaming data named SWEclat. The algorithm uses sliding window to process streaming data and uses vertical data structure to store the dataset in the sliding window. This algorithm is implemented by Apache Spark and uses Spark RDD to store streaming data and dataset in vertical data format, so as to divide these RDDs into partitions for distributed processing. Experimental results show that SWEclat algorithm has good acceleration, parallel scalability and load balancing. | ||
650 | 4 | |a Frequent itemset mining | |
650 | 4 | |a Streaming data | |
650 | 4 | |a Sliding window | |
650 | 4 | |a Distributed | |
650 | 4 | |a Spark Streaming | |
700 | 1 | |a Hu, Juan |4 aut | |
773 | 0 | 8 | |i Enthalten in |t The journal of supercomputing |d Springer US, 1987 |g 76(2020), 10 vom: 04. Feb., Seite 7619-7634 |w (DE-627)13046466X |w (DE-600)740510-8 |w (DE-576)018667775 |x 0920-8542 |7 nnns |
773 | 1 | 8 | |g volume:76 |g year:2020 |g number:10 |g day:04 |g month:02 |g pages:7619-7634 |
856 | 4 | 1 | |u https://doi.org/10.1007/s11227-020-03190-5 |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-TEC | ||
912 | |a SSG-OLC-MAT | ||
951 | |a AR | ||
952 | |d 76 |j 2020 |e 10 |b 04 |c 02 |h 7619-7634 |
author_variant |
w x wx j h jh |
---|---|
matchkey_str |
article:09208542:2020----::wcaarqettmemnnagrtmvrtemndt |
hierarchy_sort_str |
2020 |
publishDate |
2020 |
allfields |
10.1007/s11227-020-03190-5 doi (DE-627)OLC2119442053 (DE-He213)s11227-020-03190-5-p DE-627 ger DE-627 rakwb eng 004 620 VZ Xiao, Wen verfasserin (orcid)0000-0003-1444-908X aut SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming 2020 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © The Author(s) 2020 Abstract Finding frequent itemsets in a continuous streaming data is an important data mining task which is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the needs of massive streaming data processing. Apache Spark is a unified analytic engine for massive data processing which has been successfully used in many data mining fields. In this paper, we propose a distributed algorithm for mining frequent itemsets over massive streaming data named SWEclat. The algorithm uses sliding window to process streaming data and uses vertical data structure to store the dataset in the sliding window. This algorithm is implemented by Apache Spark and uses Spark RDD to store streaming data and dataset in vertical data format, so as to divide these RDDs into partitions for distributed processing. Experimental results show that SWEclat algorithm has good acceleration, parallel scalability and load balancing. Frequent itemset mining Streaming data Sliding window Distributed Spark Streaming Hu, Juan aut Enthalten in The journal of supercomputing Springer US, 1987 76(2020), 10 vom: 04. Feb., Seite 7619-7634 (DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 0920-8542 nnns volume:76 year:2020 number:10 day:04 month:02 pages:7619-7634 https://doi.org/10.1007/s11227-020-03190-5 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT AR 76 2020 10 04 02 7619-7634 |
language |
English |
source |
Contained in The journal of supercomputing 76(2020), 10 of 04 Feb., pages 7619-7634 volume:76 year:2020 number:10 day:04 month:02 pages:7619-7634 |
sourceStr |
Contained in The journal of supercomputing 76(2020), 10 of 04 Feb., pages 7619-7634 volume:76 year:2020 number:10 day:04 month:02 pages:7619-7634 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Frequent itemset mining Streaming data Sliding window Distributed Spark Streaming |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
The journal of supercomputing |
authorswithroles_txt_mv |
Xiao, Wen @@aut@@ Hu, Juan @@aut@@ |
publishDateDaySort_date |
2020-02-04T00:00:00Z |
hierarchy_top_id |
13046466X |
dewey-sort |
14 |
id |
OLC2119442053 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000naa a22002652 4500</leader><controlfield tag="001">OLC2119442053</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504170435.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">230504s2020 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-020-03190-5</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2119442053</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-020-03190-5-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Xiao, Wen</subfield><subfield code="e">verfasserin</subfield><subfield code="0">(orcid)0000-0003-1444-908X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2020</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield 
tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2020</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Finding frequent itemsets in a continuous streaming data is an important data mining task which is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the needs of massive streaming data processing. Apache Spark is a unified analytic engine for massive data processing which has been successfully used in many data mining fields. In this paper, we propose a distributed algorithm for mining frequent itemsets over massive streaming data named SWEclat. The algorithm uses sliding window to process streaming data and uses vertical data structure to store the dataset in the sliding window. This algorithm is implemented by Apache Spark and uses Spark RDD to store streaming data and dataset in vertical data format, so as to divide these RDDs into partitions for distributed processing. 
Experimental results show that SWEclat algorithm has good acceleration, parallel scalability and load balancing.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Frequent itemset mining</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Streaming data</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Sliding window</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Distributed</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Spark Streaming</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Hu, Juan</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">76(2020), 10 vom: 04. Feb., Seite 7619-7634</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:76</subfield><subfield code="g">year:2020</subfield><subfield code="g">number:10</subfield><subfield code="g">day:04</subfield><subfield code="g">month:02</subfield><subfield code="g">pages:7619-7634</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-020-03190-5</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield 
tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">76</subfield><subfield code="j">2020</subfield><subfield code="e">10</subfield><subfield code="b">04</subfield><subfield code="c">02</subfield><subfield code="h">7619-7634</subfield></datafield></record></collection>
|
author |
Xiao, Wen |
spellingShingle |
Xiao, Wen ddc 004 misc Frequent itemset mining misc Streaming data misc Sliding window misc Distributed misc Spark Streaming SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming |
authorStr |
Xiao, Wen |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)13046466X |
format |
Article |
dewey-ones |
004 - Data processing & computer science 620 - Engineering & allied operations |
delete_txt_mv |
keep |
author_role |
aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0920-8542 |
topic_title |
004 620 VZ SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming Frequent itemset mining Streaming data Sliding window Distributed Spark Streaming |
topic |
ddc 004 misc Frequent itemset mining misc Streaming data misc Sliding window misc Distributed misc Spark Streaming |
topic_unstemmed |
ddc 004 misc Frequent itemset mining misc Streaming data misc Sliding window misc Distributed misc Spark Streaming |
topic_browse |
ddc 004 misc Frequent itemset mining misc Streaming data misc Sliding window misc Distributed misc Spark Streaming |
format_facet |
Articles Printed articles |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
nc |
hierarchy_parent_title |
The journal of supercomputing |
hierarchy_parent_id |
13046466X |
dewey-tens |
000 - Computer science, knowledge & systems 620 - Engineering |
hierarchy_top_title |
The journal of supercomputing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 |
title |
SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming |
ctrlnum |
(DE-627)OLC2119442053 (DE-He213)s11227-020-03190-5-p |
title_full |
SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming |
author_sort |
Xiao, Wen |
journal |
The journal of supercomputing |
journalStr |
The journal of supercomputing |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works 600 - Technology |
recordtype |
marc |
publishDateSort |
2020 |
contenttype_str_mv |
txt |
container_start_page |
7619 |
author_browse |
Xiao, Wen Hu, Juan |
container_volume |
76 |
class |
004 620 VZ |
format_se |
Articles |
author-letter |
Xiao, Wen |
doi_str_mv |
10.1007/s11227-020-03190-5 |
normlink |
(ORCID)0000-0003-1444-908X |
normlink_prefix_str_mv |
(orcid)0000-0003-1444-908X |
dewey-full |
004 620 |
title_sort |
sweclat: a frequent itemset mining algorithm over streaming data using spark streaming |
title_auth |
SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming |
abstract |
Abstract Finding frequent itemsets in a continuous streaming data is an important data mining task which is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the needs of massive streaming data processing. Apache Spark is a unified analytic engine for massive data processing which has been successfully used in many data mining fields. In this paper, we propose a distributed algorithm for mining frequent itemsets over massive streaming data named SWEclat. The algorithm uses sliding window to process streaming data and uses vertical data structure to store the dataset in the sliding window. This algorithm is implemented by Apache Spark and uses Spark RDD to store streaming data and dataset in vertical data format, so as to divide these RDDs into partitions for distributed processing. Experimental results show that SWEclat algorithm has good acceleration, parallel scalability and load balancing. © The Author(s) 2020 |
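The abstract describes a sliding window over the stream, with the window contents held in a vertical data format (each item mapped to the ids of the transactions that contain it). The following is a minimal single-machine sketch of that idea in plain Python. It is not the authors' SWEclat implementation and uses neither Spark nor RDDs; the function names (`to_vertical`, `eclat`, `mine_window`, `mine_stream`), the count-based window, and the absolute `min_support` threshold are all assumptions made for illustration.

```python
from collections import deque


def to_vertical(window):
    """Convert (tid, transaction) pairs into vertical format: item -> set of tids."""
    vertical = {}
    for tid, transaction in window:
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(prefix, candidates, min_support, out):
    """Depth-first Eclat: grow `prefix` by intersecting tidsets of later items."""
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support = size of the tidset
        extensions = []
        # Join only with items that come later, so each itemset appears once.
        for other, other_tids in candidates[i + 1:]:
            shared = tids & other_tids
            if len(shared) >= min_support:
                extensions.append((other, shared))
        if extensions:
            eclat(itemset, extensions, min_support, out)


def mine_window(window, min_support):
    """Return {itemset tuple: support} for one window of transactions."""
    vertical = to_vertical(window)
    frequent = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    out = {}
    eclat((), frequent, min_support, out)
    return out


def mine_stream(transactions, window_size, min_support):
    """Slide a fixed-size window over the stream and mine each full window."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for tid, transaction in enumerate(transactions):
        window.append((tid, frozenset(transaction)))
        if len(window) == window_size:
            yield mine_window(window, min_support)


# Hypothetical toy stream of four transactions.
stream = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]
for frequent in mine_stream(stream, window_size=3, min_support=2):
    print(frequent)
```

This sketch rebuilds the vertical structure for every window, which keeps it short. A distributed version would more plausibly update the tidsets incrementally and partition them across workers, as the abstract suggests for the Spark-based algorithm.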
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT |
container_issue |
10 |
title_short |
SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming |
url |
https://doi.org/10.1007/s11227-020-03190-5 |
remote_bool |
false |
author2 |
Hu, Juan |
author2Str |
Hu, Juan |
ppnlink |
13046466X |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s11227-020-03190-5 |
up_date |
2024-07-04T00:59:59.608Z |
_version_ |
1803608181554806784 |
score |
7.401513 |