Active memory controller
Abstract: Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.
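The core idea in the abstract — shipping a short operation to the data's home memory controller instead of migrating the cache line to each requesting processor — can be illustrated with a minimal toy model. All class and method names below are illustrative assumptions, not the paper's actual AMC interface; the message counts are rough stand-ins for the coherence traffic the two approaches generate.

```python
class HomeMemoryController:
    """Toy model of a home-node memory controller that can execute
    shipped Active Memory Operations (AMOs) on resident data."""

    def __init__(self):
        self.memory = {}    # address -> value
        self.messages = 0   # network messages counted

    def store(self, addr, value):
        self.memory[addr] = value

    def amo_fetch_add(self, addr, delta):
        # AMO style: one request in, one reply out. The add executes
        # at the home controller, so the cache line never migrates.
        self.messages += 2
        old = self.memory[addr]
        self.memory[addr] = old + delta
        return old

    def coherent_rmw(self, addr, delta):
        # Conventional remote read-modify-write: fetch the line in
        # exclusive state (request + data reply), modify it locally,
        # and write it back later (~3 messages per requester).
        self.messages += 3
        old = self.memory[addr]
        self.memory[addr] = old + delta
        return old


# Eight processors incrementing a shared counter, e.g. at a barrier:
ctrl = HomeMemoryController()
ctrl.store(0x100, 0)
for _ in range(8):
    ctrl.amo_fetch_add(0x100, 1)
amo_msgs = ctrl.messages

ctrl2 = HomeMemoryController()
ctrl2.store(0x100, 0)
for _ in range(8):
    ctrl2.coherent_rmw(0x100, 1)

print(ctrl.memory[0x100], amo_msgs, ctrl2.messages)  # -> 8 16 24
```

Under these toy message costs the AMO version reaches the same final counter value with a third less traffic, and the gap widens as contention grows, since the conventional version also ping-pongs the line between sharers; this is the intuition behind the barrier and spinlock speedups the abstract reports.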
Detailed description

Author: Fang, Zhen [author]
Format: Article
Language: English
Published: 2012
Subjects: Distributed shared memory; Cache coherence; Memory architecture; Interprocessor synchronization; DRAM organization
Note: © Springer Science+Business Media, LLC 2012
Contained in: The journal of supercomputing - Springer US, 1987, 62(2012), 1, 17 Jan., pages 510-549
Contained in: volume:62 ; year:2012 ; number:1 ; day:17 ; month:01 ; pages:510-549
DOI: 10.1007/s11227-011-0735-9
Catalog ID: OLC2033939782
LEADER 01000caa a22002652 4500
001 OLC2033939782
003 DE-627
005 20230504053732.0
007 tu
008 200819s2012 xx ||||| 00| ||eng c
024 7  |a 10.1007/s11227-011-0735-9 |2 doi
035    |a (DE-627)OLC2033939782
035    |a (DE-He213)s11227-011-0735-9-p
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
082 04 |a 004 |a 620 |q VZ
100 1  |a Fang, Zhen |e verfasserin |4 aut
245 10 |a Active memory controller
264  1 |c 2012
336    |a Text |b txt |2 rdacontent
337    |a ohne Hilfsmittel zu benutzen |b n |2 rdamedia
338    |a Band |b nc |2 rdacarrier
500    |a © Springer Science+Business Media, LLC 2012
520    |a Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.
650  4 |a Distributed shared memory
650  4 |a Cache coherence
650  4 |a Memory architecture
650  4 |a Interprocessor synchronization
650  4 |a DRAM organization
700 1  |a Zhang, Lixin |4 aut
700 1  |a Carter, John B. |4 aut
700 1  |a McKee, Sally A. |4 aut
700 1  |a Ibrahim, Ali |4 aut
700 1  |a Parker, Michael A. |4 aut
700 1  |a Jiang, Xiaowei |4 aut
773 08 |i Enthalten in |t The journal of supercomputing |d Springer US, 1987 |g 62(2012), 1 vom: 17. Jan., Seite 510-549 |w (DE-627)13046466X |w (DE-600)740510-8 |w (DE-576)018667775 |x 0920-8542 |7 nnns
773 18 |g volume:62 |g year:2012 |g number:1 |g day:17 |g month:01 |g pages:510-549
856 41 |u https://doi.org/10.1007/s11227-011-0735-9 |z lizenzpflichtig |3 Volltext
912    |a GBV_USEFLAG_A
912    |a SYSFLAG_A
912    |a GBV_OLC
912    |a SSG-OLC-TEC
912    |a SSG-OLC-MAT
912    |a GBV_ILN_70
951    |a AR
952    |d 62 |j 2012 |e 1 |b 17 |c 01 |h 510-549
author_variant |
z f zf l z lz j b c jb jbc s a m sa sam a i ai m a p ma map x j xj |
matchkey_str |
article:09208542:2012----::cieeoyo |
hierarchy_sort_str |
2012 |
publishDate |
2012 |
allfields |
10.1007/s11227-011-0735-9 doi (DE-627)OLC2033939782 (DE-He213)s11227-011-0735-9-p DE-627 ger DE-627 rakwb eng 004 620 VZ Fang, Zhen verfasserin aut Active memory controller 2012 Text txt rdacontent ohne Hilfsmittel zu benutzen n rdamedia Band nc rdacarrier © Springer Science+Business Media, LLC 2012 Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation. Distributed shared memory Cache coherence Memory architecture Interprocessor synchronization DRAM organization Zhang, Lixin aut Carter, John B. aut McKee, Sally A. 
aut Ibrahim, Ali aut Parker, Michael A. aut Jiang, Xiaowei aut Enthalten in The journal of supercomputing Springer US, 1987 62(2012), 1 vom: 17. Jan., Seite 510-549 (DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 0920-8542 nnns volume:62 year:2012 number:1 day:17 month:01 pages:510-549 https://doi.org/10.1007/s11227-011-0735-9 lizenzpflichtig Volltext GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_70 AR 62 2012 1 17 01 510-549 |
language |
English |
source |
Enthalten in The journal of supercomputing 62(2012), 1 vom: 17. Jan., Seite 510-549 volume:62 year:2012 number:1 day:17 month:01 pages:510-549 |
format_phy_str_mv |
Article |
institution |
findex.gbv.de |
topic_facet |
Distributed shared memory Cache coherence Memory architecture Interprocessor synchronization DRAM organization |
dewey-raw |
004 |
isfreeaccess_bool |
false |
container_title |
The journal of supercomputing |
authorswithroles_txt_mv |
Fang, Zhen @@aut@@ Zhang, Lixin @@aut@@ Carter, John B. @@aut@@ McKee, Sally A. @@aut@@ Ibrahim, Ali @@aut@@ Parker, Michael A. @@aut@@ Jiang, Xiaowei @@aut@@ |
publishDateDaySort_date |
2012-01-17T00:00:00Z |
hierarchy_top_id |
13046466X |
dewey-sort |
14 |
id |
OLC2033939782 |
language_de |
englisch |
fullrecord |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2033939782</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504053732.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2012 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-011-0735-9</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2033939782</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-011-0735-9-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Fang, Zhen</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Active memory controller</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2012</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2012</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. 
The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Distributed shared memory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache coherence</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory architecture</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Interprocessor synchronization</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">DRAM organization</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Zhang, Lixin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Carter, John B.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">McKee, Sally A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ibrahim, Ali</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Parker, Michael A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Jiang, Xiaowei</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">62(2012), 1 vom: 17. 
Jan., Seite 510-549</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:62</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:1</subfield><subfield code="g">day:17</subfield><subfield code="g">month:01</subfield><subfield code="g">pages:510-549</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-011-0735-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">62</subfield><subfield code="j">2012</subfield><subfield code="e">1</subfield><subfield code="b">17</subfield><subfield code="c">01</subfield><subfield code="h">510-549</subfield></datafield></record></collection>
author |
Fang, Zhen |
spellingShingle |
Fang, Zhen ddc 004 misc Distributed shared memory misc Cache coherence misc Memory architecture misc Interprocessor synchronization misc DRAM organization Active memory controller |
authorStr |
Fang, Zhen |
ppnlink_with_tag_str_mv |
@@773@@(DE-627)13046466X |
format |
Article |
dewey-ones |
004 - Data processing & computer science 620 - Engineering & allied operations |
delete_txt_mv |
keep |
author_role |
aut aut aut aut aut aut aut |
collection |
OLC |
remote_str |
false |
illustrated |
Not Illustrated |
issn |
0920-8542 |
topic_title |
004 620 VZ Active memory controller Distributed shared memory Cache coherence Memory architecture Interprocessor synchronization DRAM organization |
topic |
ddc 004 misc Distributed shared memory misc Cache coherence misc Memory architecture misc Interprocessor synchronization misc DRAM organization |
format_facet |
Aufsätze Gedruckte Aufsätze |
format_main_str_mv |
Text Zeitschrift/Artikel |
carriertype_str_mv |
nc |
hierarchy_parent_title |
The journal of supercomputing |
hierarchy_parent_id |
13046466X |
dewey-tens |
000 - Computer science, knowledge & systems 620 - Engineering |
hierarchy_top_title |
The journal of supercomputing |
isfreeaccess_txt |
false |
familylinks_str_mv |
(DE-627)13046466X (DE-600)740510-8 (DE-576)018667775 |
title |
Active memory controller |
ctrlnum |
(DE-627)OLC2033939782 (DE-He213)s11227-011-0735-9-p |
title_full |
Active memory controller |
author_sort |
Fang, Zhen |
journal |
The journal of supercomputing |
journalStr |
The journal of supercomputing |
lang_code |
eng |
isOA_bool |
false |
dewey-hundreds |
000 - Computer science, information & general works 600 - Technology |
recordtype |
marc |
publishDateSort |
2012 |
contenttype_str_mv |
txt |
container_start_page |
510 |
author_browse |
Fang, Zhen Zhang, Lixin Carter, John B. McKee, Sally A. Ibrahim, Ali Parker, Michael A. Jiang, Xiaowei |
container_volume |
62 |
class |
004 620 VZ |
format_se |
Aufsätze |
author-letter |
Fang, Zhen |
doi_str_mv |
10.1007/s11227-011-0735-9 |
dewey-full |
004 620 |
title_sort |
active memory controller |
title_auth |
Active memory controller |
abstract |
Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation. © Springer Science+Business Media, LLC 2012 |
collection_details |
GBV_USEFLAG_A SYSFLAG_A GBV_OLC SSG-OLC-TEC SSG-OLC-MAT GBV_ILN_70 |
container_issue |
1 |
title_short |
Active memory controller |
url |
https://doi.org/10.1007/s11227-011-0735-9 |
remote_bool |
false |
author2 |
Zhang, Lixin Carter, John B. McKee, Sally A. Ibrahim, Ali Parker, Michael A. Jiang, Xiaowei |
author2Str |
Zhang, Lixin Carter, John B. McKee, Sally A. Ibrahim, Ali Parker, Michael A. Jiang, Xiaowei |
ppnlink |
13046466X |
mediatype_str_mv |
n |
isOA_txt |
false |
hochschulschrift_bool |
false |
doi_str |
10.1007/s11227-011-0735-9 |
up_date |
2024-07-03T18:59:12.239Z |
_version_ |
1803585482642161665 |
fullrecord_marcxml |
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01000caa a22002652 4500</leader><controlfield tag="001">OLC2033939782</controlfield><controlfield tag="003">DE-627</controlfield><controlfield tag="005">20230504053732.0</controlfield><controlfield tag="007">tu</controlfield><controlfield tag="008">200819s2012 xx ||||| 00| ||eng c</controlfield><datafield tag="024" ind1="7" ind2=" "><subfield code="a">10.1007/s11227-011-0735-9</subfield><subfield code="2">doi</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-627)OLC2033939782</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-He213)s11227-011-0735-9-p</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-627</subfield><subfield code="b">ger</subfield><subfield code="c">DE-627</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1=" " ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="082" ind1="0" ind2="4"><subfield code="a">004</subfield><subfield code="a">620</subfield><subfield code="q">VZ</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Fang, Zhen</subfield><subfield code="e">verfasserin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Active memory controller</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="c">2012</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">Text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">ohne Hilfsmittel zu benutzen</subfield><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">Band</subfield><subfield code="b">nc</subfield><subfield 
code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">© Springer Science+Business Media, LLC 2012</subfield></datafield><datafield tag="520" ind1=" " ind2=" "><subfield code="a">Abstract Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs’ performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5×–15× faster stream/array operations, and 3× faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. 
The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Distributed shared memory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache coherence</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Memory architecture</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Interprocessor synchronization</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">DRAM organization</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Zhang, Lixin</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Carter, John B.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">McKee, Sally A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Ibrahim, Ali</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Parker, Michael A.</subfield><subfield code="4">aut</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Jiang, Xiaowei</subfield><subfield code="4">aut</subfield></datafield><datafield tag="773" ind1="0" ind2="8"><subfield code="i">Enthalten in</subfield><subfield code="t">The journal of supercomputing</subfield><subfield code="d">Springer US, 1987</subfield><subfield code="g">62(2012), 1 vom: 17. 
Jan., Seite 510-549</subfield><subfield code="w">(DE-627)13046466X</subfield><subfield code="w">(DE-600)740510-8</subfield><subfield code="w">(DE-576)018667775</subfield><subfield code="x">0920-8542</subfield><subfield code="7">nnns</subfield></datafield><datafield tag="773" ind1="1" ind2="8"><subfield code="g">volume:62</subfield><subfield code="g">year:2012</subfield><subfield code="g">number:1</subfield><subfield code="g">day:17</subfield><subfield code="g">month:01</subfield><subfield code="g">pages:510-549</subfield></datafield><datafield tag="856" ind1="4" ind2="1"><subfield code="u">https://doi.org/10.1007/s11227-011-0735-9</subfield><subfield code="z">lizenzpflichtig</subfield><subfield code="3">Volltext</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_USEFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SYSFLAG_A</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_OLC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-TEC</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">SSG-OLC-MAT</subfield></datafield><datafield tag="912" ind1=" " ind2=" "><subfield code="a">GBV_ILN_70</subfield></datafield><datafield tag="951" ind1=" " ind2=" "><subfield code="a">AR</subfield></datafield><datafield tag="952" ind1=" " ind2=" "><subfield code="d">62</subfield><subfield code="j">2012</subfield><subfield code="e">1</subfield><subfield code="b">17</subfield><subfield code="c">01</subfield><subfield code="h">510-549</subfield></datafield></record></collection>
|