An experimental analysis of limitations of MapReduce for iterative algorithms on Spark
Abstract: MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached.
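The static-data caching point in the abstract can be illustrated with a small, self-contained Python sketch (a toy model, not the paper's code or Spark's API; all names here are hypothetical): an iterative job that re-reads its static input every pass pays the read cost once per iteration, while a job that caches it after the first read pays it once.

```python
# Toy model of the overhead described in the abstract: an iterative
# algorithm (e.g. PageRank) needs the same static data (the link
# structure) in every iteration. MapReduce-style systems reload it
# from disk each pass; Spark can keep it cached in memory.
# Hypothetical names; purely illustrative.

def load_static():
    # Stand-in for a (slow) disk read of the static input.
    return {"links": [(1, 2), (2, 3), (3, 1)]}

def run_iterations(num_iters, cache_static):
    reads = 0          # count of simulated disk reads of the static data
    static_data = None
    for _ in range(num_iters):
        if static_data is None or not cache_static:
            static_data = load_static()   # simulated disk read
            reads += 1
        # ... per-iteration computation using static_data would go here ...
    return reads

# Re-reading every iteration costs 10 reads; caching costs 1.
print(run_iterations(10, cache_static=False))  # → 10
print(run_iterations(10, cache_static=True))   # → 1
```

In Spark terms, the cached case corresponds to persisting the static dataset once before the loop, which is why the paper observes synchronization overhead only when that caching is absent.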
Detailed description
Author: Kang, Minseo [verfasserIn]
Format: Article
Language: English
Published: 2017
Subjects: Iterative algorithms; Hadoop; Spark; HaLoop; iMapReduce; Twister
Note: © Springer Science+Business Media, LLC 2017
Contained in: Cluster computing - Springer US, 1998, 20(2017), no. 4, 19 Sept., pages 3593-3604
Contained in: volume:20 ; year:2017 ; number:4 ; day:19 ; month:09 ; pages:3593-3604
DOI / URN: 10.1007/s10586-017-1167-y
Catalog ID: OLC2066389137
LEADER 01000caa a22002652 4500
001    OLC2066389137
003    DE-627
005    20230503024829.0
007    tu
008    200819s2017 xx ||||| 00| ||eng c
024 7  |a 10.1007/s10586-017-1167-y |2 doi
035    |a (DE-627)OLC2066389137
035    |a (DE-He213)s10586-017-1167-y-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 004 |q VZ
084    |a 54.50$jProgrammierung: Allgemeines |2 bkl
084    |a 54.32$jRechnerkommunikation |2 bkl
084    |a 54.25$jParallele Datenverarbeitung |2 bkl
100 1  |a Kang, Minseo |e verfasserin |4 aut
245 10 |a An experimental analysis of limitations of MapReduce for iterative algorithms on Spark
264  1 |c 2017
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © Springer Science+Business Media, LLC 2017
520    |a Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached.
650  4 |a Iterative algorithms
650  4 |a Hadoop
650  4 |a Spark
650  4 |a HaLoop
650  4 |a iMapReduce
650  4 |a Twister
700 1  |a Lee, Jae-Gil |4 aut
773 08 |i Enthalten in |t Cluster computing |d Springer US, 1998 |g 20(2017), 4 vom: 19. Sept., Seite 3593-3604 |w (DE-627)265187907 |w (DE-600)1465290-0 |w (DE-576)9265187905 |x 1386-7857 |7 nnns
773 18 |g volume:20 |g year:2017 |g number:4 |g day:19 |g month:09 |g pages:3593-3604
856 41 |u https://doi.org/10.1007/s10586-017-1167-y |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-MAT
912    |a GBV_ILN_70
936 bk |a 54.50$jProgrammierung: Allgemeines |q VZ |0 181569876 |0 (DE-625)181569876
936 bk |a 54.32$jRechnerkommunikation |q VZ |0 10640623X |0 (DE-625)10640623X
936 bk |a 54.25$jParallele Datenverarbeitung |q VZ |0 181569892 |0 (DE-625)181569892
951    |a AR
952    |d 20 |j 2017 |e 4 |b 19 |c 09 |h 3593-3604
author_variant |
m k mk j g l jgl |
matchkey_str |
article:13867857:2017----::nxeietlnlssfiiainompeueoiea |
hierarchy_sort_str |
2017 |
bklnumber |
54.50$jProgrammierung: Allgemeines 54.32$jRechnerkommunikation 54.25$jParallele Datenverarbeitung |
publishDate |
2017 |
allfields |
10.1007/s10586-017-1167-y doi (DE-627)OLC2066389137 (DE-He213)s10586-017-1167-y-p DE-627 ger DE-627 rakwb eng 004 VZ 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung bkl Kang, Minseo verfasserin aut An experimental analysis of limitations of MapReduce for iterative algorithms on Spark 2017 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media, LLC 2017 Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached. Iterative algorithms Hadoop Spark HaLoop iMapReduce Twister Lee, Jae-Gil aut Enthalten in Cluster computing Springer US, 1998 20(2017), 4 vom: 19. 
Sept., Seite 3593-3604 (DE-627)265187907 (DE-600)1465290-0 (DE-576)9265187905 1386-7857 nnns volume:20 year:2017 number:4 day:19 month:09 pages:3593-3604 https://doi.org/10.1007/s10586-017-1167-y lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 54.50$jProgrammierung: Allgemeines VZ 181569876 (DE-625)181569876 54.32$jRechnerkommunikation VZ 10640623X (DE-625)10640623X 54.25$jParallele Datenverarbeitung VZ 181569892 (DE-625)181569892 AR 20 2017 4 19 09 3593-3604 |
language |
English |
source |
Enthalten in Cluster computing 20(2017), 4 vom: 19. Sept., Seite 3593-3604 volume:20 year:2017 number:4 day:19 month:09 pages:3593-3604 |
sourceStr |
Enthalten in Cluster computing 20(2017), 4 vom: 19. Sept., Seite 3593-3604 volume:20 year:2017 number:4 day:19 month:09 pages:3593-3604 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Iterative algorithms Hadoop Spark HaLoop iMapReduce Twister |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
Cluster computing |
authorswithroles_txt_mv |
Kang, Minseo @@aut@@ Lee, Jae-Gil @@aut@@ |
publishDateDaySort_date |
2017-09-19T00:00:00Z |
hierarchy_top_id |
265187907 |
dewey-sort |
14 |
id |
OLC2066389137 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2066389137</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503024829.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2017 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10586-017-1167-y</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2066389137</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10586-017-1167-y-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Kang, Minseo</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">An experimental analysis of limitations of MapReduce for iterative algorithms on Spark</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield 
code="c">2017</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2017</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. 
For the synchronization overhead, it affected system performance only when the static data was not cached.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Iterative algorithms</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Hadoop</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Spark</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">HaLoop</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">iMapReduce</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Twister</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lee, Jae-Gil</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Cluster computing</subfield><subfield code="d">Springer US, 1998</subfield><subfield code="g">20(2017), 4 vom: 19. 
Sept., Seite 3593-3604</subfield><subfield code="w">(DE-627)265187907</subfield><subfield code="w">(DE-600)1465290-0</subfield><subfield code="w">(DE-576)9265187905</subfield><subfield code="x">1386-7857</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:20</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:4</subfield><subfield code="g">day:19</subfield><subfield code="g">month:09</subfield><subfield code="g">pages:3593-3604</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10586-017-1167-y</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="q">VZ</subfield><subfield code="0">181569876</subfield><subfield code="0">(DE-625)181569876</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="q">VZ</subfield><subfield code="0">10640623X</subfield><subfield code="0">(DE-625)10640623X</subfield></datafield><datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="q">VZ</subfield><subfield code="0">181569892</subfield><subfield code="0">(DE-625)181569892</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield 
tag="952" ind1=" " ind2=" "><subfield code="d">20</subfield><subfield code="j">2017</subfield><subfield code="e">4</subfield><subfield code="b">19</subfield><subfield code="c">09</subfield><subfield code="h">3593-3604</subfield></datafield></record></collection>
author |
Kang, Minseo |
spellingShingle |
Kang, Minseo ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
authorStr |
Kang, Minseo |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)265187907 |
format |
Article |
dewey-ones |
004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
1386-7857 |
topic_title |
004 VZ 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung bkl An experimental analysis of limitations of MapReduce for iterative algorithms on Spark Iterative algorithms Hadoop Spark HaLoop iMapReduce Twister |
topic |
ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister |
topic_unstemmed |
ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister |
topic_browse |
ddc 004 bkl 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung misc Iterative algorithms misc Hadoop misc Spark misc HaLoop misc iMapReduce misc Twister |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
hierarchy_parent_title |
Cluster computing |
hierarchy_parent_id |
265187907 |
dewey-tens |
000 - Computer science, knowledge & systems |
hierarchy_top_title |
Cluster computing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)265187907 (DE-600)1465290-0 (DE-576)9265187905 |
title |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
ctrlnum |
(DE-627)OLC2066389137 (DE-He213)s10586-017-1167-y-p |
title_full |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
author_sort |
Kang, Minseo |
journal |
Cluster computing |
journalStr |
Cluster computing |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2017 |
contenttype_str_mv |
txt |
container_start_page |
3593 |
author_browse |
Kang, Minseo Lee, Jae-Gil |
container_volume |
20 |
class |
004 VZ 54.50$jProgrammierung: Allgemeines bkl 54.32$jRechnerkommunikation bkl 54.25$jParallele Datenverarbeitung bkl |
format_se |
Aufsätze |
author-letter |
Kang, Minseo |
doi_str_mv |
10.1007/s10586-017-1167-y |
normlink |
181569876 10640623X 181569892 |
normlink_prefix_str_mv |
181569876 (DE-625)181569876 10640623X (DE-625)10640623X 181569892 (DE-625)181569892 |
dewey-full |
004 |
title_sort |
an experimental analysis of limitations of mapreduce for iterative algorithms on spark |
title_auth |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
abstract |
Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached. © Springer Science+Business Media, LLC 2017 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT GBV_ILN_70 |
container_issue |
4 |
title_short |
An experimental analysis of limitations of MapReduce for iterative algorithms on Spark |
url |
https://doi.org/10.1007/s10586-017-1167-y |
remote_bool |
false |
author2 |
Lee, Jae-Gil |
author2Str |
Lee, Jae-Gil |
ppnlink |
265187907 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s10586-017-1167-y |
up_date |
2024-07-04T04:25:23.332Z |
_version_ |
1803621103916023808 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01000caa a22002652 4500</leader>
    <controlfield tag="001">OLC2066389137</controlfield>
    <controlfield tag="003">DE-627</controlfield>
    <controlfield tag="005">20230503024829.0</controlfield>
    <controlfield tag="007">tu</controlfield>
    <controlfield tag="008">200819s2017 xx ||||| 00| ||eng c</controlfield>
    <datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10586-017-1167-y</subfield><subfield code="2">doi</subfield></datafield>
    <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2066389137</subfield></datafield>
    <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10586-017-1167-y-p</subfield></datafield>
    <datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield>
    <datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield>
    <datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield>
    <datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="2">bkl</subfield></datafield>
    <datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="2">bkl</subfield></datafield>
    <datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="2">bkl</subfield></datafield>
    <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Kang, Minseo</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An experimental analysis of limitations of MapReduce for iterative algorithms on Spark</subfield></datafield>
    <datafield tag="264" ind1=" " ind2="1"><subfield code="c">2017</subfield></datafield>
    <datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield>
    <datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield>
    <datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield>
    <datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2017</subfield></datafield>
    <datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract MapReduce is the most popular framework for distributed processing. Recently, the scalability of data mining and machine learning algorithms has significantly improved with help from MapReduce. However, MapReduce does not handle iterative algorithms very efficiently. The problem is that many data mining and machine learning algorithms are iterative by nature. In order to overcome the limitations of MapReduce, many advanced distributed systems have been developed, including HaLoop, iMapReduce, Twister, and Spark. In this paper, we identify and categorize the limitations of MapReduce in handling iterative algorithms, and then, experimentally investigate the consequences of these limitations by using the most flexible and stable distributed system, Spark. According to our experiment results, the network I/O overhead was the primary factor that affected system performance the most. The disk I/O overhead also affected system performance, but it was less significant than the network I/O overhead. For the synchronization overhead, it affected system performance only when the static data was not cached.</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Iterative algorithms</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Hadoop</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Spark</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">HaLoop</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">iMapReduce</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="4"><subfield code="a">Twister</subfield></datafield>
    <datafield tag="700" ind1="1" ind2=" "><subfield code="a">Lee, Jae-Gil</subfield><subfield code="4">aut</subfield></datafield>
    <datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Cluster computing</subfield><subfield code="d">Springer US, 1998</subfield><subfield code="g">20(2017), 4 vom: 19. Sept., Seite 3593-3604</subfield><subfield code="w">(DE-627)265187907</subfield><subfield code="w">(DE-600)1465290-0</subfield><subfield code="w">(DE-576)9265187905</subfield><subfield code="x">1386-7857</subfield><subfield code="7">nnns</subfield></datafield>
    <datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:20</subfield><subfield code="g">year:2017</subfield><subfield code="g">number:4</subfield><subfield code="g">day:19</subfield><subfield code="g">month:09</subfield><subfield code="g">pages:3593-3604</subfield></datafield>
    <datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10586-017-1167-y</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield>
    <datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield>
    <datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.50$jProgrammierung: Allgemeines</subfield><subfield code="q">VZ</subfield><subfield code="0">181569876</subfield><subfield code="0">(DE-625)181569876</subfield></datafield>
    <datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.32$jRechnerkommunikation</subfield><subfield code="q">VZ</subfield><subfield code="0">10640623X</subfield><subfield code="0">(DE-625)10640623X</subfield></datafield>
    <datafield tag="936" ind1="b" ind2="k"><subfield code="a">54.25$jParallele Datenverarbeitung</subfield><subfield code="q">VZ</subfield><subfield code="0">181569892</subfield><subfield code="0">(DE-625)181569892</subfield></datafield>
    <datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield>
    <datafield tag="952" ind1=" " ind2=" "><subfield code="d">20</subfield><subfield code="j">2017</subfield><subfield code="e">4</subfield><subfield code="b">19</subfield><subfield code="c">09</subfield><subfield code="h">3593-3604</subfield></datafield>
  </record>
</collection>