Cluster Cache Monitor: Leveraging the Proximity Data in CMP
Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance...
Detailed description
| Author: | Li, Guohong [author] |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | 2014 |
| Keywords: | Network Interface; Memory Block; Home Node; Cache Block; Cooperative Cache |
| Note: | © The Author(s) 2014 |
| Parent work: | Contained in: International journal of parallel programming - Springer US, 1986, 43(2014), 6, 04 Nov., pages 1054-1077 |
| Parent work: | volume:43 ; year:2014 ; number:6 ; day:04 ; month:11 ; pages:1054-1077 |
| Links: | |
| DOI / URN: | 10.1007/s10766-014-0339-0 |
| Catalog ID: | OLC2044605805 |
LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | OLC2044605805 | ||
003 | DE-627 | ||
005 | 20230503081002.0 | ||
007 | tu | ||
008 | 200820s2014 xx ||||| 00| ||eng c | ||
024 | 7 | |a 10.1007/s10766-014-0339-0 |2 doi | |
035 | |a (DE-627)OLC2044605805 | ||
035 | |a (DE-He213)s10766-014-0339-0-p | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
082 | 0 | 4 | |a 070 |a 004 |q VZ |
100 | 1 | |a Li, Guohong |e verfasserin |4 aut | |
245 | 1 | 0 | |a Cluster Cache Monitor: Leveraging the Proximity Data in CMP |
264 | 1 | |c 2014 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia | ||
338 | |a Band |b nc |2 rdacarrier | ||
500 | |a © The Author(s) 2014 | ||
520 | |a Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA. | ||
650 | 4 | |a Network Interface | |
650 | 4 | |a Memory Block | |
650 | 4 | |a Home Node | |
650 | 4 | |a Cache Block | |
650 | 4 | |a Cooperative Cache | |
700 | 1 | |a Temam, Olivier |4 aut | |
700 | 1 | |a Liu, Zhenyu |4 aut | |
700 | 1 | |a Guo, Sanchuan |4 aut | |
700 | 1 | |a Wang, Dongsheng |4 aut | |
773 | 0 | 8 | |i Enthalten in |t International journal of parallel programming |d Springer US, 1986 |g 43(2014), 6 vom: 04. Nov., Seite 1054-1077 |w (DE-627)129622028 |w (DE-600)246656-9 |w (DE-576)015131793 |x 0885-7458 |7 nnns |
773 | 1 | 8 | |g volume:43 |g year:2014 |g number:6 |g day:04 |g month:11 |g pages:1054-1077 |
856 | 4 | 1 | |u https://doi.org/10.1007/s10766-014-0339-0 |z lizenzpflichtig |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a SYSFLAG_A | ||
912 | |a GBV_OLC | ||
912 | |a SSG-OLC-MAT | ||
912 | |a SSG-OPC-BBI | ||
912 | |a GBV_ILN_22 | ||
912 | |a GBV_ILN_70 | ||
912 | |a GBV_ILN_4318 | ||
912 | |a GBV_ILN_4323 | ||
951 | |a AR | ||
952 | |d 43 |j 2014 |e 6 |b 04 |c 11 |h 1054-1077 |
author_variant |
g l gl o t ot z l zl s g sg d w dw |
---|---|
matchkey_str |
article:08857458:2014----::lsecceoiolvrgnterx |
hierarchy_sort_str |
2014 |
publishDate |
2014 |
allfields |
10.1007/s10766-014-0339-0 doi (DE-627)OLC2044605805 (DE-He213)s10766-014-0339-0-p DE-627 ger DE-627 rakwb eng 070 004 VZ Li, Guohong verfasserin aut Cluster Cache Monitor: Leveraging the Proximity Data in CMP 2014 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © The Author(s) 2014 Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA. 
Network Interface Memory Block Home Node Cache Block Cooperative Cache Temam, Olivier aut Liu, Zhenyu aut Guo, Sanchuan aut Wang, Dongsheng aut Enthalten in International journal of parallel programming Springer US, 1986 43(2014), 6 vom: 04. Nov., Seite 1054-1077 (DE-627)129622028 (DE-600)246656-9 (DE-576)015131793 0885-7458 nnns volume:43 year:2014 number:6 day:04 month:11 pages:1054-1077 https://doi.org/10.1007/s10766-014-0339-0 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OPC-BBI GBV_ILN_22 GBV_ILN_70 GBV_ILN_4318 GBV_ILN_4323 AR 43 2014 6 04 11 1054-1077 |
language |
English |
source |
Enthalten in International journal of parallel programming 43(2014), 6 vom: 04. Nov., Seite 1054-1077 volume:43 year:2014 number:6 day:04 month:11 pages:1054-1077 |
sourceStr |
Enthalten in International journal of parallel programming 43(2014), 6 vom: 04. Nov., Seite 1054-1077 volume:43 year:2014 number:6 day:04 month:11 pages:1054-1077 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Network Interface Memory Block Home Node Cache Block Cooperative Cache |
dewey-raw |
070 |
isfreeaccess_bool |
false |
container_title |
International journal of parallel programming |
authorswithroles_txt_mv |
Li, Guohong @@aut@@ Temam, Olivier @@aut@@ Liu, Zhenyu @@aut@@ Guo, Sanchuan @@aut@@ Wang, Dongsheng @@aut@@ |
publishDateDaySort_date |
2014-11-04T00:00:00Z |
hierarchy_top_id |
129622028 |
dewey-sort |
270 |
id |
OLC2044605805 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2044605805</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503081002.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200820s2014 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10766-014-0339-0</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2044605805</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10766-014-0339-0-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Li, Guohong</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Cluster Cache Monitor: Leveraging the Proximity Data in CMP</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield 
code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2014</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. 
We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Network Interface</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Home Node</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cooperative Cache</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Temam, Olivier</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Liu, Zhenyu</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Guo, Sanchuan</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Dongsheng</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">International journal of parallel programming</subfield><subfield code="d">Springer US, 1986</subfield><subfield code="g">43(2014), 6 vom: 04. 
Nov., Seite 1054-1077</subfield><subfield code="w">(DE-627)129622028</subfield><subfield code="w">(DE-600)246656-9</subfield><subfield code="w">(DE-576)015131793</subfield><subfield code="x">0885-7458</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:43</subfield><subfield code="g">year:2014</subfield><subfield code="g">number:6</subfield><subfield code="g">day:04</subfield><subfield code="g">month:11</subfield><subfield code="g">pages:1054-1077</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10766-014-0339-0</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4318</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">43</subfield><subfield code="j">2014</subfield><subfield code="e">6</subfield><subfield code="b">04</subfield><subfield code="c">11</subfield><subfield code="h">1054-1077</subfield></datafield></record></collection>
|
author |
Li, Guohong |
spellingShingle |
Li, Guohong ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache Cluster Cache Monitor: Leveraging the Proximity Data in CMP |
authorStr |
Li, Guohong |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)129622028 |
format |
Article |
dewey-ones |
070 - News media, journalism & publishing 004 - Data processing & computer science |
delete_txt_mv |
keep |
author_role |
aut aut aut aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0885-7458 |
topic_title |
070 004 VZ Cluster Cache Monitor: Leveraging the Proximity Data in CMP Network Interface Memory Block Home Node Cache Block Cooperative Cache |
topic |
ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache |
topic_unstemmed |
ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache |
topic_browse |
ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache |
format_facet |
Articles Printed articles |
format_main_str_mv |
Text Journal/Article |
carriertype_str_mv |
nc |
hierarchy_parent_title |
International journal of parallel programming |
hierarchy_parent_id |
129622028 |
dewey-tens |
070 - News media, journalism & publishing 000 - Computer science, knowledge & systems |
hierarchy_top_title |
International journal of parallel programming |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)129622028 (DE-600)246656-9 (DE-576)015131793 |
title |
Cluster Cache Monitor: Leveraging the Proximity Data in CMP |
ctrlnum |
(DE-627)OLC2044605805 (DE-He213)s10766-014-0339-0-p |
title_full |
Cluster Cache Monitor: Leveraging the Proximity Data in CMP |
author_sort |
Li, Guohong |
journal |
International journal of parallel programming |
journalStr |
International journal of parallel programming |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works |
recordtype |
marc |
publishDateSort |
2014 |
contenttype_str_mv |
txt |
container_start_page |
1054 |
author_browse |
Li, Guohong Temam, Olivier Liu, Zhenyu Guo, Sanchuan Wang, Dongsheng |
container_volume |
43 |
class |
070 004 VZ |
format_se |
Aufsätze |
author-letter |
Li, Guohong |
doi_str_mv |
10.1007/s10766-014-0339-0 |
dewey-full |
070 004 |
title_sort |
cluster cache monitor: leveraging the proximity data in cmp |
title_auth |
Cluster Cache Monitor: Leveraging the Proximity Data in CMP |
abstract |
Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA. © The Author(s) 2014 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OPC-BBI GBV_ILN_22 GBV_ILN_70 GBV_ILN_4318 GBV_ILN_4323 |
container_issue |
6 |
title_short |
Cluster Cache Monitor: Leveraging the Proximity Data in CMP |
url |
https://doi.org/10.1007/s10766-014-0339-0 |
remote_bool |
false |
author2 |
Temam, Olivier Liu, Zhenyu Guo, Sanchuan Wang, Dongsheng |
author2Str |
Temam, Olivier Liu, Zhenyu Guo, Sanchuan Wang, Dongsheng |
ppnlink |
129622028 |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s10766-014-0339-0 |
up_date |
2024-07-04T00:08:10.026Z |
_version_ |
1803604920921751552 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2044605805</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503081002.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200820s2014 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10766-014-0339-0</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2044605805</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10766-014-0339-0-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Li, Guohong</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Cluster Cache Monitor: Leveraging the Proximity Data in CMP</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield 
code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2014</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. 
We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Network Interface</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Home Node</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cooperative Cache</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Temam, Olivier</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Liu, Zhenyu</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Guo, Sanchuan</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Dongsheng</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">International journal of parallel programming</subfield><subfield code="d">Springer US, 1986</subfield><subfield code="g">43(2014), 6 vom: 04. 
Nov., Seite 1054-1077</subfield><subfield code="w">(DE-627)129622028</subfield><subfield code="w">(DE-600)246656-9</subfield><subfield code="w">(DE-576)015131793</subfield><subfield code="x">0885-7458</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:43</subfield><subfield code="g">year:2014</subfield><subfield code="g">number:6</subfield><subfield code="g">day:04</subfield><subfield code="g">month:11</subfield><subfield code="g">pages:1054-1077</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10766-014-0339-0</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4318</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">43</subfield><subfield code="j">2014</subfield><subfield code="e">6</subfield><subfield code="b">04</subfield><subfield code="c">11</subfield><subfield code="h">1054-1077</subfield></datafield></record></collection>
|
score |
7.4016323 |