Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems
The relaxed semantics and rich functionality of the one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such implementations suffers from the fact that current MPI RMA implementations typically have a large over...
Detailed description
Author: | Zhou, Huan [author] |
Format: | Article |
Language: | English |
Published: | 2016 |
Subjects: | Cluster Computing; Parallel; Distributed; Computer Science |
Parent work: | Contained in: Lecture notes in computer science - Berlin, Germany : Springer, 1973, (2016) |
Parent work: | year:2016 |
Links: | http://dx.doi.org/10.1007/978-3-662-48096-0_29 (full text); http://arxiv.org/abs/1603.02226 |
DOI / URN: | 10.1007/978-3-662-48096-0_29 |
Catalog ID: | OLC1973550369 |
LEADER | 01000caa a2200265 4500 | ||
---|---|---|---|
001 | OLC1973550369 | ||
003 | DE-627 | ||
005 | 20220224094737.0 | ||
007 | tu | ||
008 | 160430s2016 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1007/978-3-662-48096-0_29 |2 doi | |
028 | 5 | 2 | |a PQ20160430 |
035 | |a (DE-627)OLC1973550369 | ||
035 | |a (DE-599)GBVOLC1973550369 | ||
035 | |a (PRQ)a627-70ebbc031968e56581418beefc0b1eaccf00f1366b9105edf9a1fe8f523cd91c0 | ||
035 | |a (KEY)0013707320160000000000000000leveragingmpi3sharedmemoryextensionsforefficientpg | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 004 |q DNB |
082 | 0 | 4 | |a 620 |q AVZ |
100 | 1 | |a Zhou, Huan |e verfasserin |4 aut | |
245 | 1 | 0 | |a Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems |
264 | 1 | |c 2016 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
520 | |a The relaxed semantics and rich functionality of one-sided communication primitives of MPI-3 makes MPI an attractive candidate for the implementation of PGAS models. However, the performance of such implementation suffers from the fact, that current MPI RMA implementations typically have a large overhead when source and target of a communication request share a common, local physical memory. In this paper, we present an optimized PGAS-like runtime system which uses the new MPI-3 shared-memory extensions to serve intra-node communication requests and MPI-3 one-sided communication primitives to serve inter-node communication requests. The performance of our runtime system is evaluated on a Cray XC40 system through low-level communication benchmarks, a random-access benchmark and a stencil kernel. The results of the experiments demonstrate that the performance of our hybrid runtime system matches the performance of low-level RMA libraries for intra-node transfers, and that of MPI-3 for inter-node transfers. | ||
650 | 4 | |a Cluster Computing | |
650 | 4 | |a Parallel | |
650 | 4 | |a Distributed | |
650 | 4 | |a Computer Science | |
700 | 1 | |a Idrees, Kamran |4 oth | |
700 | 1 | |a Gracia, José |4 oth | |
773 | 0 | 8 | |i Enthalten in |t Lecture notes in computer science |d Berlin, Germany : Springer, 1973 |g (2016) |w (DE-627)129300152 |w (DE-600)121909-1 |w (DE-576)014492687 |x 0302-9743 |
773 | 1 | 8 | |g year:2016 |
856 | 4 | 1 | |u http://dx.doi.org/10.1007/978-3-662-48096-0_29 |3 Volltext |
856 | 4 | 2 | |u http://arxiv.org/abs/1603.02226 |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-TEC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a SSG-OPC-BBI | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_2018 | ||
951 | |a AR | ||
952 | |j 2016 |
author_variant |
h z hz |
---|---|
matchkey_str |
article:03029743:2016----::eeaigp3hrdeoyxesosoefcet |
hierarchy_sort_str |
2016 |
publishDate |
2016 |
language |
English |
source |
Enthalten in Lecture notes in computer science (2016) year:2016 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Cluster Computing Parallel Distributed Computer Science |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
Lecture notes in computer science |
authorswithroles_txt_mv |
Zhou, Huan @@aut@@ Idrees, Kamran @@oth@@ Gracia, José @@oth@@ |
publishDateDaySort_date |
2016-01-01T00:00:00Z |
hierarchy_top_id |
129300152 |
dewey-sort |
14 |
id |
OLC1973550369 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a2200265 4500</leader><controlfield tag="001">OLC1973550369</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20220224094737.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">160430s2016 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/978-3-662-48096-0_29</subfield><subfield code="2">doi</subfield></datafield><datafield tag="028" ind1="5" ind2="2"><subfield code="a">PQ20160430</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC1973550369</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)GBVOLC1973550369</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(PRQ)a627-70ebbc031968e56581418beefc0b1eaccf00f1366b9105edf9a1fe8f523cd91c0</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(KEY)0013707320160000000000000000leveragingmpi3sharedmemoryextensionsforefficientpg</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="q">DNB</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">620</subfield><subfield code="q">AVZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Zhou, Huan</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime 
Systems</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2016</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">The relaxed semantics and rich functionality of one-sided communication primitives of MPI-3 makes MPI an attractive candidate for the implementation of PGAS models. However, the performance of such implementation suffers from the fact, that current MPI RMA implementations typically have a large overhead when source and target of a communication request share a common, local physical memory. In this paper, we present an optimized PGAS-like runtime system which uses the new MPI-3 shared-memory extensions to serve intra-node communication requests and MPI-3 one-sided communication primitives to serve inter-node communication requests. The performance of our runtime system is evaluated on a Cray XC40 system through low-level communication benchmarks, a random-access benchmark and a stencil kernel. 
The results of the experiments demonstrate that the performance of our hybrid runtime system matches the performance of low-level RMA libraries for intra-node transfers, and that of MPI-3 for inter-node transfers.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cluster Computing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Parallel</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Distributed</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer Science</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Idrees, Kamran</subfield><subfield code="4">oth</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Gracia, José</subfield><subfield code="4">oth</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">Lecture notes in computer science</subfield><subfield code="d">Berlin, Germany : Springer, 1973</subfield><subfield code="g">(2016)</subfield><subfield code="w">(DE-627)129300152</subfield><subfield code="w">(DE-600)121909-1</subfield><subfield code="w">(DE-576)014492687</subfield><subfield code="x">0302-9743</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">year:2016</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">http://dx.doi.org/10.1007/978-3-662-48096-0_29</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="u">http://arxiv.org/abs/1603.02226</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield 
code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_2018</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="j">2016</subfield></datafield></record></collection>
|
author |
Zhou, Huan |
spellingShingle |
Zhou, Huan ddc 004 ddc 620 misc Cluster Computing misc Parallel misc Distributed misc Computer Science Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)129300152 |
format |
Article |
dewey-ones |
004 - Data processing & computer science 620 - Engineering & allied operations |
delete_txt_mv |
keep |
author_role |
aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0302-9743 |
topic_title |
004 DNB 620 AVZ Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems Cluster Computing Parallel Distributed Computer Science |
topic |
ddc 004 ddc 620 misc Cluster Computing misc Parallel misc Distributed misc Computer Science |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
author2_variant |
k i ki j g jg |
hierarchy_parent_title |
Lecture notes in computer science |
hierarchy_parent_id |
129300152 |
dewey-tens |
000 - Computer science, knowledge & systems 620 - Engineering |
hierarchy_top_title |
Lecture notes in computer science |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)129300152 (DE-600)121909-1 (DE-576)014492687 |
title |
Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems |
ctrlnum |
(DE-627)OLC1973550369 (DE-599)GBVOLC1973550369 (PRQ)a627-70ebbc031968e56581418beefc0b1eaccf00f1366b9105edf9a1fe8f523cd91c0 (KEY)0013707320160000000000000000leveragingmpi3sharedmemoryextensionsforefficientpg |
journal |
Lecture notes in computer science |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works 600 - Technology |
recordtype |
marc |
publishDateSort |
2016 |
contenttype_str_mv |
txt |
class |
004 DNB 620 AVZ |
format_se |
Aufsätze |
doi_str_mv |
10.1007/978-3-662-48096-0_29 |
dewey-full |
004 620 |
title_sort |
leveraging mpi-3 shared-memory extensions for efficient pgas runtime systems |
abstract |
The relaxed semantics and rich functionality of the one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such implementations suffers from the fact that current MPI RMA implementations typically have a large overhead when the source and target of a communication request share a common, local physical memory. In this paper, we present an optimized PGAS-like runtime system which uses the new MPI-3 shared-memory extensions to serve intra-node communication requests and MPI-3 one-sided communication primitives to serve inter-node communication requests. The performance of our runtime system is evaluated on a Cray XC40 system through low-level communication benchmarks, a random-access benchmark, and a stencil kernel. The results of the experiments demonstrate that the performance of our hybrid runtime system matches that of low-level RMA libraries for intra-node transfers, and that of MPI-3 for inter-node transfers. |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT SSG-OPC-BBI GBV_ILN_70 GBV_ILN_2018 |
url |
http://dx.doi.org/10.1007/978-3-662-48096-0_29 http://arxiv.org/abs/1603.02226 |
remote_bool |
false |
author2 |
Idrees, Kamran Gracia, José |
ppnlink |
129300152 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
author2_role |
oth oth |
up_date |
2024-07-04T02:40:49.777Z |
_version_ |
1803614525621010432 |
score |
7.400075 |