Cluster Cache Monitor: Leveraging the Proximity Data in CMP

Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance...

Authors:	Li, Guohong [author]; Temam, Olivier; Liu, Zhenyu; Guo, Sanchuan; Wang, Dongsheng

Format:	Article
Language:	English

Published:	2014

Keywords:	Network Interface; Memory Block; Home Node; Cache Block; Cooperative Cache

Note:	© The Author(s) 2014

Parent work:	Contained in: International journal of parallel programming - Springer US, 1986, 43(2014), no. 6 of 04 Nov., pages 1054-1077
Parent work:	volume:43 ; year:2014 ; number:6 ; day:04 ; month:11 ; pages:1054-1077

Links:	Full text

DOI / URN:	10.1007/s10766-014-0339-0

Catalog ID:	OLC2044605805

Internal format


LEADER	01000caa a22002652 4500
001	OLC2044605805
003	DE-627
005	20230503081002.0
007	tu
008	200820s2014 xx \|\|\|\|\| 00\| \|\|eng c
024	7		\|a 10.1007/s10766-014-0339-0 \|2 doi
035			\|a (DE-627)OLC2044605805
035			\|a (DE-He213)s10766-014-0339-0-p
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0	4	\|a 070 \|a 004 \|q VZ
100	1		\|a Li, Guohong \|e verfasserin \|4 aut
245	1	0	\|a Cluster Cache Monitor: Leveraging the Proximity Data in CMP
264		1	\|c 2014
336			\|a Text \|b txt \|2 rdacontent
337			\|a ohne Hilfsmittel zu benutzen \|b n \|2 rdamedia
338			\|a Band \|b nc \|2 rdacarrier
500			\|a © The Author(s) 2014
520			\|a Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA.
650		4	\|a Network Interface
650		4	\|a Memory Block
650		4	\|a Home Node
650		4	\|a Cache Block
650		4	\|a Cooperative Cache
700	1		\|a Temam, Olivier \|4 aut
700	1		\|a Liu, Zhenyu \|4 aut
700	1		\|a Guo, Sanchuan \|4 aut
700	1		\|a Wang, Dongsheng \|4 aut
773	0	8	\|i Enthalten in \|t International journal of parallel programming \|d Springer US, 1986 \|g 43(2014), 6 vom: 04. Nov., Seite 1054-1077 \|w (DE-627)129622028 \|w (DE-600)246656-9 \|w (DE-576)015131793 \|x 0885-7458 \|7 nnns
773	1	8	\|g volume:43 \|g year:2014 \|g number:6 \|g day:04 \|g month:11 \|g pages:1054-1077
856	4	1	\|u https://doi.org/10.1007/s10766-014-0339-0 \|z lizenzpflichtig \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_OLC
912			\|a SSG-OLC-MAT
912			\|a SSG-OPC-BBI
912			\|a GBV_ILN_22
912			\|a GBV_ILN_70
912			\|a GBV_ILN_4318
912			\|a GBV_ILN_4323
951			\|a AR
952			\|d 43 \|j 2014 \|e 6 \|b 04 \|c 11 \|h 1054-1077

Index fields

author_variant	g l gl o t ot z l zl s g sg d w dw
matchkey_str	article:08857458:2014----::lsecceoiolvrgnterx
hierarchy_sort_str	2014
publishDate	2014
allfields	10.1007/s10766-014-0339-0 doi (DE-627)OLC2044605805 (DE-He213)s10766-014-0339-0-p DE-627 ger DE-627 rakwb eng 070 004 VZ Li, Guohong verfasserin aut Cluster Cache Monitor: Leveraging the Proximity Data in CMP 2014 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © The Author(s) 2014 Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA. 
Network Interface Memory Block Home Node Cache Block Cooperative Cache Temam, Olivier aut Liu, Zhenyu aut Guo, Sanchuan aut Wang, Dongsheng aut Enthalten in International journal of parallel programming Springer US, 1986 43(2014), 6 vom: 04. Nov., Seite 1054-1077 (DE-627)129622028 (DE-600)246656-9 (DE-576)015131793 0885-7458 nnns volume:43 year:2014 number:6 day:04 month:11 pages:1054-1077 https://doi.org/10.1007/s10766-014-0339-0 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OPC-BBI GBV_ILN_22 GBV_ILN_70 GBV_ILN_4318 GBV_ILN_4323 AR 43 2014 6 04 11 1054-1077
language	English
source	Enthalten in International journal of parallel programming 43(2014), 6 vom: 04. Nov., Seite 1054-1077 volume:43 year:2014 number:6 day:04 month:11 pages:1054-1077
sourceStr	Enthalten in International journal of parallel programming 43(2014), 6 vom: 04. Nov., Seite 1054-1077 volume:43 year:2014 number:6 day:04 month:11 pages:1054-1077
format_phy_str_mv	Article
institution	findex.gbv.de
topic_facet	Network Interface Memory Block Home Node Cache Block Cooperative Cache
dewey-raw	070
isfreeaccess_bool	false
container_title	International journal of parallel programming
authorswithroles_txt_mv	Li, Guohong @@aut@@ Temam, Olivier @@aut@@ Liu, Zhenyu @@aut@@ Guo, Sanchuan @@aut@@ Wang, Dongsheng @@aut@@
publishDateDaySort_date	2014-11-04T00:00:00Z
hierarchy_top_id	129622028
dewey-sort	270
id	OLC2044605805
language_de	englisch
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2044605805</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503081002.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200820s2014 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10766-014-0339-0</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2044605805</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10766-014-0339-0-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Li, Guohong</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Cluster Cache Monitor: Leveraging the Proximity Data in CMP</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield 
code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2014</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. 
We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Network Interface</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Home Node</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cooperative Cache</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Temam, Olivier</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Liu, Zhenyu</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Guo, Sanchuan</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Dongsheng</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">International journal of parallel programming</subfield><subfield code="d">Springer US, 1986</subfield><subfield code="g">43(2014), 6 vom: 04. 
Nov., Seite 1054-1077</subfield><subfield code="w">(DE-627)129622028</subfield><subfield code="w">(DE-600)246656-9</subfield><subfield code="w">(DE-576)015131793</subfield><subfield code="x">0885-7458</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:43</subfield><subfield code="g">year:2014</subfield><subfield code="g">number:6</subfield><subfield code="g">day:04</subfield><subfield code="g">month:11</subfield><subfield code="g">pages:1054-1077</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10766-014-0339-0</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4318</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">43</subfield><subfield code="j">2014</subfield><subfield code="e">6</subfield><subfield code="b">04</subfield><subfield code="c">11</subfield><subfield code="h">1054-1077</subfield></datafield></record></collection>
author	Li, Guohong
spellingShingle	Li, Guohong ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache Cluster Cache Monitor: Leveraging the Proximity Data in CMP
authorStr	Li, Guohong
ppnlink_with_tag_str_mv	@@773@@(DE-627)129622028
format	Article
dewey-ones	070 - News media, journalism & publishing 004 - Data processing & computer science
delete_txt_mv	keep
author_role	aut aut aut aut aut
collection	OLC
remote_str	false
illustrated	Not Illustrated
issn	0885-7458
topic_title	070 004 VZ Cluster Cache Monitor: Leveraging the Proximity Data in CMP Network Interface Memory Block Home Node Cache Block Cooperative Cache
topic	ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache
topic_unstemmed	ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache
topic_browse	ddc 070 misc Network Interface misc Memory Block misc Home Node misc Cache Block misc Cooperative Cache
format_facet	Aufsätze Gedruckte Aufsätze
format_main_str_mv	Text Zeitschrift/Artikel
carriertype_str_mv	nc
hierarchy_parent_title	International journal of parallel programming
hierarchy_parent_id	129622028
dewey-tens	070 - News media, journalism & publishing 000 - Computer science, knowledge & systems
hierarchy_top_title	International journal of parallel programming
isfreeaccess_txt	false
familylinks_str_mv	(DE-627)129622028 (DE-600)246656-9 (DE-576)015131793
title	Cluster Cache Monitor: Leveraging the Proximity Data in CMP
ctrlnum	(DE-627)OLC2044605805 (DE-He213)s10766-014-0339-0-p
title_full	Cluster Cache Monitor: Leveraging the Proximity Data in CMP
author_sort	Li, Guohong
journal	International journal of parallel programming
journalStr	International journal of parallel programming
lang_code	eng
isOA_bool	false
dewey-hundreds	000 - Computer science, information & general works
recordtype	marc
publishDateSort	2014
contenttype_str_mv	txt
container_start_page	1054
author_browse	Li, Guohong Temam, Olivier Liu, Zhenyu Guo, Sanchuan Wang, Dongsheng
container_volume	43
class	070 004 VZ
format_se	Aufsätze
author-letter	Li, Guohong
doi_str_mv	10.1007/s10766-014-0339-0
dewey-full	070 004
title_sort	cluster cache monitor: leveraging the proximity data in cmp
title_auth	Cluster Cache Monitor: Leveraging the Proximity Data in CMP
abstract	Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA. © The Author(s) 2014
collection_details	GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-MAT SSG-OPC-BBI GBV_ILN_22 GBV_ILN_70 GBV_ILN_4318 GBV_ILN_4323
container_issue	6
title_short	Cluster Cache Monitor: Leveraging the Proximity Data in CMP
url	https://doi.org/10.1007/s10766-014-0339-0
remote_bool	false
author2	Temam, Olivier Liu, Zhenyu Guo, Sanchuan Wang, Dongsheng
author2Str	Temam, Olivier Liu, Zhenyu Guo, Sanchuan Wang, Dongsheng
ppnlink	129622028
mediatype_str_mv	n
isOA_txt	false
hochschulschrift_bool	false
doi_str	10.1007/s10766-014-0339-0
up_date	2024-07-04T00:08:10.026Z
_version_	1803604920921751552
fullrecord_marcxml	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2044605805</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230503081002.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200820s2014 xx \|\|\|\|\| 00\| \|\|eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s10766-014-0339-0</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2044605805</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s10766-014-0339-0-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">070</subfield><subfield code="a">004</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Li, Guohong</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Cluster Cache Monitor: Leveraging the Proximity Data in CMP</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2014</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield 
code="a">Band</subfield><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© The Author(s) 2014</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of $$2\times 2$$ nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. 
We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Network Interface</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Home Node</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache Block</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cooperative Cache</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Temam, Olivier</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Liu, Zhenyu</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Guo, Sanchuan</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wang, Dongsheng</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">International journal of parallel programming</subfield><subfield code="d">Springer US, 1986</subfield><subfield code="g">43(2014), 6 vom: 04. 
Nov., Seite 1054-1077</subfield><subfield code="w">(DE-627)129622028</subfield><subfield code="w">(DE-600)246656-9</subfield><subfield code="w">(DE-576)015131793</subfield><subfield code="x">0885-7458</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:43</subfield><subfield code="g">year:2014</subfield><subfield code="g">number:6</subfield><subfield code="g">day:04</subfield><subfield code="g">month:11</subfield><subfield code="g">pages:1054-1077</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s10766-014-0339-0</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OPC-BBI</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_22</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4318</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_4323</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">43</subfield><subfield code="j">2014</subfield><subfield code="e">6</subfield><subfield code="b">04</subfield><subfield code="c">11</subfield><subfield code="h">1054-1077</subfield></datafield></record></collection>
score	7.4016323
