Exploring the performance limits of simultaneous multithreading for memory intensive applications
Detailed description

| Author | Athanasaki, Evangelia [author] |
|---|---|
| Format | Article |
| Language | English |
| Published | 2007 |
| Keywords | Simultaneous multithreading; Thread-level parallelism; Instruction-level parallelism; Software prefetching; Speculative precomputation; Performance analysis |
| Note | © Springer Science+Business Media, LLC 2007 |
| Parent work | Contained in: The journal of supercomputing - Springer US, 1987, 44(2007), no. 1 (06 Oct.), pp. 64-97 |
| Parent work | volume:44 ; year:2007 ; number:1 ; day:06 ; month:10 ; pages:64-97 |
| Links | https://doi.org/10.1007/s11227-007-0149-x |
| DOI / URN | 10.1007/s11227-007-0149-x |
| Catalog ID | OLC203393537X |
| Tag | Ind. 1 | Ind. 2 | Content |
|---|---|---|---|
| LEADER | | | 01000caa a22002652 4500 |
| 001 | | | OLC203393537X |
| 003 | | | DE-627 |
| 005 | | | 20230504053641.0 |
| 007 | | | tu |
| 008 | | | 200819s2007 xx \|\|\|\|\| 00\| \|\|eng c |
| 024 | 7 | | \|a 10.1007/s11227-007-0149-x \|2 doi |
| 035 | | | \|a (DE-627)OLC203393537X |
| 035 | | | \|a (DE-He213)s11227-007-0149-x-p |
| 040 | | | \|a DE-627 \|b ger \|c DE-627 \|e rakwb |
| 041 | | | \|a eng |
| 082 | 0 | 4 | \|a 004 \|a 620 \|q VZ |
| 100 | 1 | | \|a Athanasaki, Evangelia \|e author \|4 aut |
| 245 | 1 | 0 | \|a Exploring the performance limits of simultaneous multithreading for memory intensive applications |
| 264 | | 1 | \|c 2007 |
| 336 | | | \|a Text \|b txt \|2 rdacontent |
| 337 | | | \|a unmediated \|b n \|2 rdamedia |
| 338 | | | \|a volume \|b nc \|2 rdacarrier |
| 500 | | | \|a © Springer Science+Business Media, LLC 2007 |
| 520 | | | \|a Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor. |
| 650 | | 4 | \|a Simultaneous multithreading |
| 650 | | 4 | \|a Thread-level parallelism |
| 650 | | 4 | \|a Instruction-level parallelism |
| 650 | | 4 | \|a Software prefetching |
| 650 | | 4 | \|a Speculative precomputation |
| 650 | | 4 | \|a Performance analysis |
| 700 | 1 | | \|a Anastopoulos, Nikos \|4 aut |
| 700 | 1 | | \|a Kourtis, Kornilios \|4 aut |
| 700 | 1 | | \|a Koziris, Nectarios \|4 aut |
| 773 | 0 | 8 | \|i Contained in \|t The journal of supercomputing \|d Springer US, 1987 \|g 44(2007), no. 1 (06 Oct.), pp. 64-97 \|w (DE-627)13046466X \|w (DE-600)740510-8 \|w (DE-576)018667775 \|x 0920-8542 \|7 nnns |
| 773 | 1 | 8 | \|g volume:44 \|g year:2007 \|g number:1 \|g day:06 \|g month:10 \|g pages:64-97 |
| 856 | 4 | 1 | \|u https://doi.org/10.1007/s11227-007-0149-x \|z license required \|3 full text |
| 912 | | | \|a GBV_USEFLAG_A |
| 912 | | | \|a SYSFLAG_A |
| 912 | | | \|a GBV_OLC |
| 912 | | | \|a SSG-OLC-TEC |
| 912 | | | \|a SSG-OLC-MAT |
| 912 | | | \|a GBV_ILN_62 |
| 912 | | | \|a GBV_ILN_70 |
| 912 | | | \|a GBV_ILN_2010 |
| 912 | | | \|a GBV_ILN_4307 |
| 912 | | | \|a GBV_ILN_4324 |
| 951 | | | \|a AR |
| 952 | | | \|d 44 \|j 2007 \|e 1 \|b 06 \|c 10 \|h 64-97 |
author_variant |
e a ea n a na k k kk n k nk |
---|---|
matchkey_str |
article:09208542:2007----::xlrnteefraclmtosmlaeumlihednfre |
hierarchy_sort_str |
2007 |
publishDate |
2007 |
language |
English |
source |
Contained in The journal of supercomputing 44(2007), no. 1 (06 Oct.), pp. 64-97 volume:44 year:2007 number:1 day:06 month:10 pages:64-97 |
sourceStr |
Contained in The journal of supercomputing 44(2007), no. 1 (06 Oct.), pp. 64-97 volume:44 year:2007 number:1 day:06 month:10 pages:64-97 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Simultaneous multithreading Thread-level parallelism Instruction-level parallelism Software prefetching Speculative precomputation Performance analysis |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
The journal of supercomputing |
authorswithroles_txt_mv |
Athanasaki, Evangelia @@aut@@ Anastopoulos, Nikos @@aut@@ Kourtis, Kornilios @@aut@@ Koziris, Nectarios @@aut@@ |
publishDateDaySort_date |
2007-10-06T00:00:00Z |
hierarchy_top_id |
13046466X |
dewey-sort |
14 |
id |
OLC203393537X |
language_de |
English |
author |
Athanasaki, Evangelia |
spellingShingle |
Athanasaki, Evangelia ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis Exploring the performance limits of simultaneous multithreading for memory intensive applications |
authorStr |
Athanasaki, Evangelia |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)13046466X |
format |
Article |
dewey-ones |
004 - Data processing & computer science 620 - Engineering & allied operations |
delete_txt_mv |
keep |
author_role |
aut aut aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0920-8542 |
topic_title |
004 620 VZ Exploring the performance limits of simultaneous multithreading for memory intensive applications Simultaneous multithreading Thread-level parallelism Instruction-level parallelism Software prefetching Speculative precomputation Performance analysis |
topic |
ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis |
topic_unstemmed |
ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis |
topic_browse |
ddc 004 misc Simultaneous multithreading misc Thread-level parallelism misc Instruction-level parallelism misc Software prefetching misc Speculative precomputation misc Performance analysis |
format_facet |
Articles Printed articles |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
nc |
hierarchy_parent_title |
The journal of supercomputing |
hierarchy_parent_id |
13046466X |
dewey-tens |
000 - Computer science, knowledge & systems 620 - Engineering |
hierarchy_top_title |
The journal of supercomputing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 |
title |
Exploring the performance limits of simultaneous multithreading for memory intensive applications |
ctrlnum |
(DE-627)OLC203393537X (DE-He213)s11227-007-0149-x-p |
title_full |
Exploring the performance limits of simultaneous multithreading for memory intensive applications |
author_sort |
Athanasaki, Evangelia |
journal |
The journal of supercomputing |
journalStr |
The journal of supercomputing |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works 600 - Technology |
recordtype |
marc |
publishDateSort |
2007 |
contenttype_str_mv |
txt |
container_start_page |
64 |
author_browse |
Athanasaki, Evangelia Anastopoulos, Nikos Kourtis, Kornilios Koziris, Nectarios |
container_volume |
44 |
class |
004 620 VZ |
format_se |
Articles |
author-letter |
Athanasaki, Evangelia |
doi_str_mv |
10.1007/s11227-007-0149-x |
dewey-full |
004 620 |
title_sort |
exploring the performance limits of simultaneous multithreading for memory intensive applications |
title_auth |
Exploring the performance limits of simultaneous multithreading for memory intensive applications |
abstract |
Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor. © Springer Science+Business Media, LLC 2007 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_62 GBV_ILN_70 GBV_ILN_2010 GBV_ILN_4307 GBV_ILN_4324 |
container_issue |
1 |
title_short |
Exploring the performance limits of simultaneous multithreading for memory intensive applications |
url |
https://doi.org/10.1007/s11227-007-0149-x |
remote_bool |
false |
author2 |
Anastopoulos, Nikos; Kourtis, Kornilios; Koziris, Nectarios |
ppnlink |
13046466X |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s11227-007-0149-x |
up_date |
2024-07-03T18:58:00.493Z |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC203393537X</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504053641.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2007 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-007-0149-x</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC203393537X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-007-0149-x-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Athanasaki, Evangelia</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Exploring the performance limits of simultaneous multithreading for memory intensive applications</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2007</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield 
code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2007</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization and communication mechanisms between its separate, but possibly dependent threads. Moreover, as these separate threads tend to put pressure on the same architectural resources, no significant speedup can be observed. In this paper, we evaluate and contrast thread-level parallelism (TLP) and speculative precomputation (SPR) techniques for a series of memory intensive codes executed on a specific SMT processor implementation. We explore the performance limits by evaluating the tradeoffs between ILP and TLP for various kinds of instruction streams. By obtaining knowledge on how such streams interact when executed simultaneously on the processor, and quantifying their presence within each application’s threads, we try to interpret the observed performance for each application when parallelized according to the aforementioned techniques. 
In order to amplify this evaluation process, we also present results gathered from the performance monitoring hardware of the processor.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Simultaneous multithreading</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Thread-level parallelism</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Instruction-level parallelism</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Software prefetching</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Speculative precomputation</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Performance analysis</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Anastopoulos, Nikos</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Kourtis, Kornilios</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Koziris, Nectarios</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">44(2007), 1 vom: 06. 
Okt., Seite 64-97</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:44</subfield><subfield code="g">year:2007</subfield><subfield code="g">number:1</subfield><subfield code="g">day:06</subfield><subfield code="g">month:10</subfield><subfield code="g">pages:64-97</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-007-0149-x</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_62</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2010</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4307</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4324</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">44</subfield><subfield code="j">2007</subfield><subfield code="e">1</subfield><subfield code="b">06</subfield><subfield code="c">10</subfield><subfield 
code="h">64-97</subfield></datafield></record></collection>