An experimental analysis of limitations of MapReduce for iterative algorithms on Spark
Abstract: MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached.
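The static-data caching point in the abstract can be illustrated with a small, self-contained Python sketch (a toy model, not the paper's code or Spark's API; all names here are hypothetical): an iterative job that re-reads its static input every pass pays the read cost once per iteration, while a job that caches it after the first read pays it once.

```python
# Toy model of the overhead described in the abstract: an iterative
# algorithm (e.g. PageRank) needs the same static data (the link
# structure) in every iteration. MapReduce-style systems reload it
# from disk each pass; Spark can keep it cached in memory.
# Hypothetical names; purely illustrative.

def load_static():
    # Stand-in for a (slow) disk read of the static input.
    return {"links": [(1, 2), (2, 3), (3, 1)]}

def run_iterations(num_iters, cache_static):
    reads = 0          # count of simulated disk reads of the static data
    static_data = None
    for _ in range(num_iters):
        if static_data is None or not cache_static:
            static_data = load_static()   # simulated disk read
            reads += 1
        # ... per-iteration computation using static_data would go here ...
    return reads

# Re-reading every iteration costs 10 reads; caching costs 1.
print(run_iterations(10, cache_static=False))  # → 10
print(run_iterations(10, cache_static=True))   # → 1
```

In Spark terms, the cached case corresponds to persisting the static dataset once before the loop, which is why the paper observes synchronization overhead only when that caching is absent.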
Detailed description
Author: Kang, Minseo [verfasserIn]
Format: Article
Language: English
Published: 2017
Subjects: Iterative algorithms; Hadoop; Spark; HaLoop; iMapReduce; Twister
Note: © Springer Science+Business Media, LLC 2017
Contained in: Cluster computing - Springer US, 1998, 20(2017), no. 4, 19 Sept., pages 3593-3604
Contained in: volume:20 ; year:2017 ; number:4 ; day:19 ; month:09 ; pages:3593-3604
DOI / URN: 10.1007/s10586-017-1167-y
Catalog ID: OLC2066389137
LEADER 01000caa a22002652 4500
001    OLC2066389137
003    DE-627
005    20230503024829.0
007    tu
008    200819s2017 xx ||||| 00| ||eng c
024 7  |a 10.1007/s10586-017-1167-y |2 doi
035    |a (DE-627)OLC2066389137
035    |a (DE-He213)s10586-017-1167-y-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 004 |q VZ
084    |a 54.50$jProgrammierung: Allgemeines |2 bkl
084    |a 54.32$jRechnerkommunikation |2 bkl
084    |a 54.25$jParallele Datenverarbeitung |2 bkl
100 1  |a Kang, Minseo |e verfasserin |4 aut
245 10 |a An experimental analysis of limitations of MapReduce for iterative algorithms on Spark
264  1 |c 2017
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © Springer Science+Business Media, LLC 2017
520    |a Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached.
650  4 |a Iterative algorithms
650  4 |a Hadoop
650  4 |a Spark
650  4 |a HaLoop
650  4 |a iMapReduce
650  4 |a Twister
700 1  |a Lee, Jae-Gil |4 aut
773 08 |i Enthalten in |t Cluster computing |d Springer US, 1998 |g 20(2017), 4 vom: 19. Sept., Seite 3593-3604 |w (DE-627)265187907 |w (DE-600)1465290-0 |w (DE-576)9265187905 |x 1386-7857 |7 nnns
773 18 |g volume:20 |g year:2017 |g number:4 |g day:19 |g month:09 |g pages:3593-3604
856 41 |u https://doi.org/10.1007/s10586-017-1167-y |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-MAT
912    |a GBV_ILN_70
936 bk |a 54.50$jProgrammierung: Allgemeines |q VZ |0 181569876 |0 (DE-625)181569876
936 bk |a 54.32$jRechnerkommunikation |q VZ |0 10640623X |0 (DE-625)10640623X
936 bk |a 54.25$jParallele Datenverarbeitung |q VZ |0 181569892 |0 (DE-625)181569892
951    |a AR
952    |d 20 |j 2017 |e 4 |b 19 |c 09 |h 3593-3604
author_variant |
m k mk j g l jgl |
matchkey_str |
article:13867857:2017----::nxeietlnlssfiiainompeueoiea |
hierarchy_sort_str |
2017 |
bklnumber |
54.50$jProgrammierung: Allgemeines 54.32$jRechnerkommunikation 54.25$jParallele Datenverarbeitung |
publishDate |
2017 |
allfields |
10.1007/s10586-017-1167-y doi (DE-627)OLC2066389137 (DE-He213)s10586-017-1167-y-p DE-627 ger DE-627 rakwb eng 004 VZ 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung bkl Kang, Minseo verfasserin aut An experimental analysis of limitations of MapReduce for iterative algorithms on Spark 2017 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media, LLC 2017 Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached. Iterative algorithms Hadoop Spark HaLoop iMapReduce Twister Lee, Jae-Gil aut Enthalten in Cluster computing Springer US, 1998 20(2017), 4 vom: 19. 
Sept., Seite 3593-3604 (DE-627)265187907 (DE-600)1465290-0 (DE-576)9265187905 1386-7857 nnns volume:20 year:2017 number:4 day:19 month:09 pages:3593-3604 https://doi.org/10.1007/s10586-017-1167-y lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 54.50$jProgrammierung: Allgemeines VZ 181569876 (DE-625)181569876 54.32$jRechnerkommunikation VZ 10640623X (DE-625)10640623X 54.25$jParallele Datenverarbeitung VZ 181569892 (DE-625)181569892 AR 20 2017 4 19 09 3593-3604 |
language |
English |
source |
Enthalten in Cluster computing 20(2017), 4 vom: 19. Sept., Seite 3593-3604 volume:20 year:2017 number:4 day:19 month:09 pages:3593-3604 |
sourceStr |
Enthalten in Cluster computing 20(2017), 4 vom: 19. Sept., Seite 3593-3604 volume:20 year:2017 number:4 day:19 month:09 pages:3593-3604 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Iterative algorithms Hadoop Spark HaLoop iMapReduce Twister |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
Cluster computing |
authorswithroles_txt_mv |
Kang, Minseo @@aut@@ Lee, Jae-Gil @@aut@@ |
publishDateDaySort_date |
2017-09-19T00:00:00Z |
hierarchy_top_id |
265187907 |
dewey-sort |
14 |
id |
OLC2066389137 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2066389137</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503024829.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2017 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10586-017-1167-y</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2066389137</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10586-017-1167-y-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Kang, Minseo</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">An experimental analysis of limitations of MapReduce for iterative algorithms on Spark</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield 
code="c">2017</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2017</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. 
For the synchronization overhead, it affected system performance only when the static data was not cached.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Iterative algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Hadoop</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Spark</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">HaLoop</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">iMapReduce</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Twister</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lee, Jae-Gil</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Cluster computing</subfield><subfield code="d">Springer US, 1998</subfield><subfield code="g">20(2017), 4 vom: 19. 
Sept., Seite 3593-3604</subfield><subfield code="w">(DE-627)265187907</subfield><subfield code="w">(DE-600)1465290-0</subfield><subfield code="w">(DE-576)9265187905</subfield><subfield code="x">1386-7857</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:20</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:4</subfield><subfield code="g">day:19</subfield><subfield code="g">month:09</subfield><subfield code="g">pages:3593-3604</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10586-017-1167-y</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="q">VZ</subfield><subfield code="0">181569876</subfield><subfield code="0">(DE-625)181569876</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="q">VZ</subfield><subfield code="0">10640623X</subfield><subfield code="0">(DE-625)10640623X</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="q">VZ</subfield><subfield code="0">181569892</subfield><subfield code="0">(DE-625)181569892</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield 
tag="952" ind1=" " ind2=" "><subfield code="d">20</subfield><subfield code="j">2017</subfield><subfield code="e">4</subfield><subfield code="b">19</subfield><subfield code="c">09</subfield><subfield code="h">3593-3604</subfield></datafield></record></collection>
author |
Kang, Minseo |
spellingShingle |
Kang, Minseo ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
authorStr |
Kang, Minseo |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)265187907 |
format |
Article |
dewey-ones |
004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
1386-7857 |
topic_title |
004 VZ 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung bkl An experimental analysis of limitations of MapReduce for iterative algorithms on Spark Iterative algorithms Hadoop Spark HaLoop iMapReduce Twister |
topic |
ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister |
topic_unstemmed |
ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister |
topic_browse |
ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
hierarchy_parent_title |
Cluster computing |
hierarchy_parent_id |
265187907 |
dewey-tens |
000 - Computer science, knowledge & systems |
hierarchy_top_title |
Cluster computing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)265187907 (DE-600)1465290-0 (DE-576)9265187905 |
title |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
ctrlnum |
(DE-627)OLC2066389137 (DE-He213)s10586-017-1167-y-p |
title_full |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
author_sort |
Kang, Minseo |
journal |
Cluster computing |
journalStr |
Cluster computing |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2017 |
contenttype_str_mv |
txt |
container_start_page |
3593 |
author_browse |
Kang, Minseo Lee, Jae-Gil |
container_volume |
20 |
class |
004 VZ 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung bkl |
format_se |
Aufsätze |
author-letter |
Kang, Minseo |
doi_str_mv |
10.1007/s10586-017-1167-y |
normlink |
181569876 10640623X 181569892 |
normlink_prefix_str_mv |
181569876 (DE-625)181569876 10640623X (DE-625)10640623X 181569892 (DE-625)181569892 |
dewey-full |
004 |
title_sort |
an experimental analysis of limitations of mapreduce for iterative algorithms on spark |
title_auth |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
abstract |
Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached. © Springer Science+Business Media, LLC 2017 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 |
container_issue |
4 |
title_short |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
url |
https://doi.org/10.1007/s10586-017-1167-y |
remote_bool |
false |
author2 |
Lee, Jae-Gil |
author2Str |
Lee, Jae-Gil |
ppnlink |
265187907 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s10586-017-1167-y |
up_date |
2024-07-04T04:25:23.332Z |
_version_ |
1803621103916023808 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01000caa a22002652 4500</leader>
    <controlfield tag="001">OLC2066389137</controlfield>
    <controlfield tag="003">DE-627</controlfield>
    <controlfield tag="005">20230503024829.0</controlfield>
    <controlfield tag="007">tu</controlfield>
    <controlfield tag="008">200819s2017 xx ||||| 00| ||eng c</controlfield>
    <datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10586-017-1167-y</subfield><subfield code="2">doi</subfield></datafield>
    <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2066389137</subfield></datafield>
    <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10586-017-1167-y-p</subfield></datafield>
    <datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield>
    <datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield>
    <datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield>
    <datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="2">bkl</subfield></datafield>
    <datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="2">bkl</subfield></datafield>
    <datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="2">bkl</subfield></datafield>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Kang, Minseo</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An experimental analysis of limitations of MapReduce for iterative algorithms on Spark</subfield></datafield>
    <datafield tag="264" ind1=" " ind2="1"><subfield code="c">2017</subfield></datafield>
    <datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield>
    <datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield>
    <datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield>
    <datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2017</subfield></datafield>
    <datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached.</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Iterative algorithms</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Hadoop</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Spark</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">HaLoop</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">iMapReduce</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Twister</subfield></datafield>
    <datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lee, Jae-Gil</subfield><subfield code="4">aut</subfield></datafield>
    <datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Cluster computing</subfield><subfield code="d">Springer US, 1998</subfield><subfield code="g">20(2017), 4 vom: 19. Sept., Seite 3593-3604</subfield><subfield code="w">(DE-627)265187907</subfield><subfield code="w">(DE-600)1465290-0</subfield><subfield code="w">(DE-576)9265187905</subfield><subfield code="x">1386-7857</subfield><subfield code="7">nnns</subfield></datafield>
    <datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:20</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:4</subfield><subfield code="g">day:19</subfield><subfield code="g">month:09</subfield><subfield code="g">pages:3593-3604</subfield></datafield>
    <datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10586-017-1167-y</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield>
    <datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="q">VZ</subfield><subfield code="0">181569876</subfield><subfield code="0">(DE-625)181569876</subfield></datafield>
    <datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="q">VZ</subfield><subfield code="0">10640623X</subfield><subfield code="0">(DE-625)10640623X</subfield></datafield>
    <datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="q">VZ</subfield><subfield code="0">181569892</subfield><subfield code="0">(DE-625)181569892</subfield></datafield>
    <datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield>
    <datafield tag="952" ind1=" " ind2=" "><subfield code="d">20</subfield><subfield code="j">2017</subfield><subfield code="e">4</subfield><subfield code="b">19</subfield><subfield code="c">09</subfield><subfield code="h">3593-3604</subfield></datafield>
  </record>
</collection>