Exploring the performance limits of simultaneous multithreading for memory intensive applications

Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance g...

Author:	Athanasaki, Evangelia [author]; Anastopoulos, Nikos; Kourtis, Kornilios; Koziris, Nectarios

Format:	Article
Language:	English

Published:	2007

Keywords:	Simultaneous multithreading; Thread-level parallelism; Instruction-level parallelism; Software prefetching; Speculative precomputation; Performance analysis

Note:	© Springer Science+Business Media, LLC 2007

Parent work:	Contained in: The journal of supercomputing - Springer US, 1987, 44(2007), no. 1, 6 Oct., pages 64-97
Parent work:	volume:44 ; year:2007 ; number:1 ; day:06 ; month:10 ; pages:64-97

Links:	Full text

DOI / URN:	10.1007/s11227-007-0149-x

Catalog ID:	OLC203393537X

Internal format


LEADER	01000caa a22002652 4500
001	OLC203393537X
003	DE-627
005	20230504053641.0
007	tu
008	200819s2007 xx \|\|\|\|\| 00\| \|\|eng c
024	7		\|a 10.1007/s11227-007-0149-x \|2 doi
035			\|a (DE-627)OLC203393537X
035			\|a (DE-He213)s11227-007-0149-x-p
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0	4	\|a 004 \|a 620 \|q VZ
100	1		\|a Athanasaki, Evangelia \|e verfasserin \|4 aut
245	1	0	\|a Exploring the performance limits of simultaneous multithreading for memory intensive applications
264		1	\|c 2007
336			\|a Text \|b txt \|2 rdacontent
337			\|a ohne Hilfsmittel zu benutzen \|b n \|2 rdamedia
338			\|a Band \|b nc \|2 rdacarrier
500			\|a © Springer Science+Business Media, LLC 2007
520			\|a Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor.
650		4	\|a Simultaneous multithreading
650		4	\|a Thread-level parallelism
650		4	\|a Instruction-level parallelism
650		4	\|a Software prefetching
650		4	\|a Speculative precomputation
650		4	\|a Performance analysis
700	1		\|a Anastopoulos, Nikos \|4 aut
700	1		\|a Kourtis, Kornilios \|4 aut
700	1		\|a Koziris, Nectarios \|4 aut
773	0	8	\|i Enthalten in \|t The journal of supercomputing \|d Springer US, 1987 \|g 44(2007), 1 vom: 06. Okt., Seite 64-97 \|w (DE-627)13046466X \|w (DE-600)740510-8 \|w (DE-576)018667775 \|x 0920-8542 \|7 nnns
773	1	8	\|g volume:44 \|g year:2007 \|g number:1 \|g day:06 \|g month:10 \|g pages:64-97
856	4	1	\|u https://doi.org/10.1007/s11227-007-0149-x \|z lizenzpflichtig \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_OLC
912			\|a SSG-OLC-TEC
912			\|a SSG-OLC-MAT
912			\|a GBV_ILN_62
912			\|a GBV_ILN_70
912			\|a GBV_ILN_2010
912			\|a GBV_ILN_4307
912			\|a GBV_ILN_4324
951			\|a AR
952			\|d 44 \|j 2007 \|e 1 \|b 06 \|c 10 \|h 64-97

Index fields

author_variant	e a ea n a na k k kk n k nk
matchkey_str	article:09208542:2007----::xlrnteefraclmtosmlaeumlihednfre
hierarchy_sort_str	2007
publishDate	2007
allfields	10.1007/s11227-007-0149-x doi (DE-627)OLC203393537X (DE-He213)s11227-007-0149-x-p DE-627 ger DE-627 rakwb eng 004 620 VZ Athanasaki, Evangelia verfasserin aut Exploring the performance limits of simultaneous multithreading for memory intensive applications 2007 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media, LLC 2007 Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor. 
Simultaneous multithreading Thread-level parallelism Instruction-level parallelism Software prefetching Speculative precomputation Performance analysis Anastopoulos, Nikos aut Kourtis, Kornilios aut Koziris, Nectarios aut Enthalten in The journal of supercomputing Springer US, 1987 44(2007), 1 vom: 06. Okt., Seite 64-97 (DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 0920-8542 nnns volume:44 year:2007 number:1 day:06 month:10 pages:64-97 https://doi.org/10.1007/s11227-007-0149-x lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_62 GBV_ILN_70 GBV_ILN_2010 GBV_ILN_4307 GBV_ILN_4324 AR 44 2007 1 06 10 64-97
language	English
source	Enthalten in The journal of supercomputing 44(2007), 1 vom: 06. Okt., Seite 64-97 volume:44 year:2007 number:1 day:06 month:10 pages:64-97
sourceStr	Enthalten in The journal of supercomputing 44(2007), 1 vom: 06. Okt., Seite 64-97 volume:44 year:2007 number:1 day:06 month:10 pages:64-97
format_phy_str_mv	Article
institution	findex.gbv.de
topic_facet	Simultaneous multithreading Thread-level parallelism Instruction-level parallelism Software prefetching Speculative precomputation Performance analysis
dewey-raw	004
isfreeaccess_bool	false
container_title	The journal of supercomputing
authorswithroles_txt_mv	Athanasaki, Evangelia @@aut@@ Anastopoulos, Nikos @@aut@@ Kourtis, Kornilios @@aut@@ Koziris, Nectarios @@aut@@
publishDateDaySort_date	2007-10-06T00:00:00Z
hierarchy_top_id	13046466X
dewey-sort	14
id	OLC203393537X
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC203393537X</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504053641.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2007 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-007-0149-x</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC203393537X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-007-0149-x-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Athanasaki, Evangelia</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Exploring the performance limits of simultaneous multithreading for memory intensive applications</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2007</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " 
ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2007</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. 
In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Simultaneous multithreading</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Thread-level parallelism</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Instruction-level parallelism</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software prefetching</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speculative precomputation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Performance analysis</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Anastopoulos, Nikos</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Kourtis, Kornilios</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Koziris, Nectarios</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">44(2007), 1 vom: 06. 
Okt., Seite 64-97</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:44</subfield><subfield code="g">year:2007</subfield><subfield code="g">number:1</subfield><subfield code="g">day:06</subfield><subfield code="g">month:10</subfield><subfield code="g">pages:64-97</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-007-0149-x</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2010</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">44</subfield><subfield code="j">2007</subfield><subfield code="e">1</subfield><subfield code="b">06</subfield><subfield code="c">10</subfield><subfield 
code="h">64-97</subfield></datafield></record></collection>
author	Athanasaki, Evangelia
spellingShingle	Athanasaki, Evangelia ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis Exploring the performance limits of simultaneous multithreading for memory intensive applications
authorStr	Athanasaki, Evangelia
ppnlink_with_tag_str_mv	@@773@@(DE-627)13046466X
format	Article
dewey-ones	004 - Data processing & computer science 620 - Engineering & allied operations
delete_txt_mv	keep
author_role	aut aut aut aut
collection	OLC
remote_str	false
illustrated	Not Illustrated
issn	0920-8542
topic_title	004 620 VZ Exploring the performance limits of simultaneous multithreading for memory intensive applications Simultaneous multithreading Thread-level parallelism Instruction-level parallelism Software prefetching Speculative precomputation Performance analysis
topic	ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis
topic_unstemmed	ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis
topic_browse	ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis
format_facet	Aufsätze Gedruckte Aufsätze
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	nc
hierarchy_parent_title	The journal of supercomputing
hierarchy_parent_id	13046466X
dewey-tens	000 - Computer science, knowledge & systems 620 - Engineering
hierarchy_top_title	The journal of supercomputing
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)13046466X (DE-600)740510-8 (DE-576)018667775
title	Exploring the performance limits of simultaneous multithreading for memory intensive applications
ctrlnum	(DE-627)OLC203393537X (DE-He213)s11227-007-0149-x-p
title_full	Exploring the performance limits of simultaneous multithreading for memory intensive applications
author_sort	Athanasaki, Evangelia
journal	The journal of supercomputing
journalStr	The journal of supercomputing
lang_code	eng
isOA_bool	false
dewey-hundreds	000 - Computer science, information & general works 600 - Technology
recordtype	marc
publishDateSort	2007
contenttype_str_mv	txt
container_start_page	64
author_browse	Athanasaki, Evangelia Anastopoulos, Nikos Kourtis, Kornilios Koziris, Nectarios
container_volume	44
class	004 620 VZ
format_se	Aufsätze
author-letter	Athanasaki, Evangelia
doi_str_mv	10.1007/s11227-007-0149-x
dewey-full	004 620
title_sort	exploring the performance limits of simultaneous multithreading for memory intensive applications
title_auth	Exploring the performance limits of simultaneous multithreading for memory intensive applications
abstract	Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor. © Springer Science+Business Media, LLC 2007
collection_details	GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_62 GBV_ILN_70 GBV_ILN_2010 GBV_ILN_4307 GBV_ILN_4324
container_issue	1
title_short	Exploring the performance limits of simultaneous multithreading for memory intensive applications
url	https://doi.org/10.1007/s11227-007-0149-x
remote_bool	false
author2	Anastopoulos, Nikos Kourtis, Kornilios Koziris, Nectarios
author2Str	Anastopoulos, Nikos Kourtis, Kornilios Koziris, Nectarios
ppnlink	13046466X
mediatype_str_mv	n
isOA_txt	false
hochschulschrift_bool	false
doi_str	10.1007/s11227-007-0149-x
up_date	2024-07-03T18:58:00.493Z
_version_	1803585407429902336
fullrecord_marcxml	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC203393537X</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504053641.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2007 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-007-0149-x</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC203393537X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-007-0149-x-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Athanasaki, Evangelia</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Exploring the performance limits of simultaneous multithreading for memory intensive applications</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2007</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" 
ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2007</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. 
In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Simultaneous multithreading</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Thread-level parallelism</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Instruction-level parallelism</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software prefetching</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speculative precomputation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Performance analysis</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Anastopoulos, Nikos</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Kourtis, Kornilios</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Koziris, Nectarios</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">44(2007), 1 vom: 06. 
Okt., Seite 64-97</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:44</subfield><subfield code="g">year:2007</subfield><subfield code="g">number:1</subfield><subfield code="g">day:06</subfield><subfield code="g">month:10</subfield><subfield code="g">pages:64-97</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-007-0149-x</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2010</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">44</subfield><subfield code="j">2007</subfield><subfield code="e">1</subfield><subfield code="b">06</subfield><subfield code="c">10</subfield><subfield 
code="h">64-97</subfield></datafield></record></collection>
score	7.400193
